  • Bucket tables in Hive

    Background

    A bucket table hashes rows on a chosen column and stores them in separate files according to the hash value.
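    As a rough illustration (a sketch: Hive's built-in hash() of an INT is the value itself, and the bucket number is the hash modulo the bucket count), you can preview the assignment with pmod(); some_table is a placeholder name:

    -- preview which of 32 buckets each row would land in
    SELECT id, pmod(hash(id), 32) AS bucket
    FROM some_table;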

    Use case

    When the data volume is large and a job needs to finish faster, running multiple map and reduce tasks in parallel is effectively the only option.
    But if the input is a single small (or unsplittable) file, only one map task can be launched.
    A bucket table is a good choice here: by specifying a CLUSTERED BY column, the data is hash-split across many smaller files.

    create table test
    (id int,
     name string
    )
    CLUSTERED BY(id) SORTED BY(name) INTO 32 BUCKETS
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';
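    The INSERT below reads from a source table named test09, whose definition is not shown in the original. A minimal sketch that would fit (the local file path is hypothetical):

    create table test09
    (id int,
     name string
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';

    LOAD DATA LOCAL INPATH '/tmp/test09.txt' INTO TABLE test09;  -- hypothetical path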

    Before running the INSERT, don't forget to set

    set hive.enforce.bucketing = true;

    This forces Hive to use as many reducers as there are buckets, so the output is written as multiple files.
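    The flag simply automates what can be done by hand. A rough manual equivalent (a sketch: you set the reducer count to the bucket count yourself and distribute rows by the clustering column):

    set mapred.reduce.tasks = 32;
    INSERT OVERWRITE TABLE test
    SELECT id, name FROM test09
    CLUSTER BY id;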

    hive> INSERT OVERWRITE TABLE test select * from test09;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 32
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_201103070826_0018, Tracking URL = http://hadoop00:50030/jobdetails.jsp?jobid=job_201103070826_0018 
    Kill Command = /home/hjl/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hadoop00:9001 -kill job_201103070826_0018
    2011-03-08 11:34:23,055 Stage-1 map = 0%,  reduce = 0%
    2011-03-08 11:34:27,084 Stage-1 map = 6%,  reduce = 0%
    *************************************************
    Ended Job = job_201103070826_0018
    Loading data to table test
    5 Rows loaded to test
    OK
    Time taken: 175.036 seconds

    The table's directory on HDFS (/ticketdev/test) now contains 32 files rather than one:

    [hadoop@hadoop00 ~]$ hadoop fs -ls /ticketdev/test
    Found 32 items
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000000_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000001_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000002_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000003_0
    -rw-r--r--   3 ticketdev hadoop          8 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000004_0
    -rw-r--r--   3 ticketdev hadoop          9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000005_0
    -rw-r--r--   3 ticketdev hadoop          8 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000006_0
    -rw-r--r--   3 ticketdev hadoop          9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000007_0
    -rw-r--r--   3 ticketdev hadoop          9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000008_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000009_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000010_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000011_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000012_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000013_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000014_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000015_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000016_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000017_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000018_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000019_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000020_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000021_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000022_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000023_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000024_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000025_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000026_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000027_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000028_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000029_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000030_0
    -rw-r--r--   3 ticketdev hadoop          0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000031_0
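    Note that most of the 32 files are 0 bytes: only 5 rows were loaded, so only 5 of the 32 buckets actually received data. A realistically sized table would spread rows across all buckets.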

    Once the data is hash-split across many files, multiple MapReduce tasks can work on it in parallel.
    When you run subsequent queries against this table, you will see the system launch 32 map tasks.
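    Beyond parallelism, bucketing also enables cheap sampling, since Hive can read a single bucket file instead of scanning the whole table:

    -- sample roughly 1/32 of the data by reading only the first bucket
    SELECT * FROM test TABLESAMPLE(BUCKET 1 OUT OF 32 ON id);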
