  • Bucket tables in Hive (if the input is a single file, only one map task can start)

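    When the data volume is large and a job has to finish faster, running multiple map and reduce tasks in parallel is the way to get there.
    But if the input is a single file that ends up as a single input split (a small file, or a file in a non-splittable compressed format), only one map task can start.
    In that case a bucket table is a good choice: you name a column in CLUSTERED BY, and the rows are hashed on that column and scattered across multiple smaller files.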

    create table sunwg_test11(id int,name string)
    CLUSTERED BY(id) SORTED BY(name) INTO 32 BUCKETS
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';

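    Note in particular: CLUSTERED BY and SORTED BY do not affect how data is loaded into the table. The user is therefore responsible for how the data is loaded, including bucketing and sorting it.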

    Before running the insert, don't forget to set
    set hive.enforce.bucketing = true;
    which forces the output to be written by multiple reducers, one per bucket.
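
    If that setting is left off, the bucketing can be done by hand, which is what the note about the user being responsible for loading implies. The following is a minimal sketch, not the author's method. It assumes test09 has columns named id and name (the post only shows select *), and that distributing on id routes rows the same way Hive's own bucketing would, which is worth verifying on your version.

    ```sql
    -- Assumption: test09 exposes columns id and name, matching sunwg_test11.
    -- One reducer per bucket, so the job writes exactly 32 output files.
    set mapred.reduce.tasks = 32;

    -- DISTRIBUTE BY sends rows with the same id hash to the same reducer.
    -- SORT BY orders rows within each reducer, matching SORTED BY(name).
    INSERT OVERWRITE TABLE sunwg_test11
    SELECT id, name
    FROM test09
    DISTRIBUTE BY id
    SORT BY name;
    ```

    In Hive versions before 2.0, hive.enforce.bucketing covers only the bucketing. Sorting within buckets is controlled by the separate hive.enforce.sorting property. From Hive 2.0 on, both properties were removed and the behavior is always enforced.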

    hive> INSERT OVERWRITE TABLE sunwg_test11 select * from test09;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 32
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_201103070826_0018, Tracking URL = http://hadoop00:50030/jobdetails.jsp?jobid=job_201103070826_0018 
    Kill Command = /home/hjl/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hadoop00:9001 -kill job_201103070826_0018
    2011-03-08 11:34:23,055 Stage-1 map = 0%,  reduce = 0%
    2011-03-08 11:34:27,084 Stage-1 map = 6%,  reduce = 0%
    2011-03-08 11:34:29,100 Stage-1 map = 13%,  reduce = 0%
    2011-03-08 11:34:32,124 Stage-1 map = 19%,  reduce = 0%
    2011-03-08 11:34:34,142 Stage-1 map = 22%,  reduce = 0%
    2011-03-08 11:34:35,151 Stage-1 map = 25%,  reduce = 0%
    2011-03-08 11:34:37,167 Stage-1 map = 28%,  reduce = 0%
    2011-03-08 11:34:39,182 Stage-1 map = 31%,  reduce = 0%
    2011-03-08 11:34:41,199 Stage-1 map = 34%,  reduce = 1%
    2011-03-08 11:34:42,211 Stage-1 map = 38%,  reduce = 1%
    2011-03-08 11:34:44,233 Stage-1 map = 41%,  reduce = 1%
    2011-03-08 11:34:46,250 Stage-1 map = 44%,  reduce = 1%
    2011-03-08 11:34:48,270 Stage-1 map = 47%,  reduce = 1%
    2011-03-08 11:34:49,280 Stage-1 map = 50%,  reduce = 1%
    2011-03-08 11:34:51,300 Stage-1 map = 53%,  reduce = 1%
    2011-03-08 11:34:53,316 Stage-1 map = 56%,  reduce = 1%
    2011-03-08 11:34:55,330 Stage-1 map = 59%,  reduce = 1%
    2011-03-08 11:34:56,340 Stage-1 map = 63%,  reduce = 1%
    2011-03-08 11:34:58,357 Stage-1 map = 66%,  reduce = 1%
    2011-03-08 11:35:00,378 Stage-1 map = 69%,  reduce = 1%
    2011-03-08 11:35:02,393 Stage-1 map = 72%,  reduce = 1%
    2011-03-08 11:35:04,409 Stage-1 map = 75%,  reduce = 1%
    2011-03-08 11:35:05,419 Stage-1 map = 78%,  reduce = 1%
    2011-03-08 11:35:07,435 Stage-1 map = 81%,  reduce = 1%
    2011-03-08 11:35:09,451 Stage-1 map = 84%,  reduce = 2%
    2011-03-08 11:35:12,475 Stage-1 map = 88%,  reduce = 2%
    2011-03-08 11:35:14,496 Stage-1 map = 91%,  reduce = 2%
    2011-03-08 11:35:16,513 Stage-1 map = 94%,  reduce = 2%
    2011-03-08 11:35:18,528 Stage-1 map = 97%,  reduce = 2%
    2011-03-08 11:35:20,552 Stage-1 map = 100%,  reduce = 2%
    2011-03-08 11:35:25,589 Stage-1 map = 100%,  reduce = 6%
    2011-03-08 11:35:33,645 Stage-1 map = 100%,  reduce = 9%
    2011-03-08 11:35:34,654 Stage-1 map = 100%,  reduce = 13%
    2011-03-08 11:35:39,693 Stage-1 map = 100%,  reduce = 16%
    2011-03-08 11:35:41,710 Stage-1 map = 100%,  reduce = 19%
    2011-03-08 11:35:45,740 Stage-1 map = 100%,  reduce = 22%
    2011-03-08 11:35:47,757 Stage-1 map = 100%,  reduce = 25%
    2011-03-08 11:35:52,793 Stage-1 map = 100%,  reduce = 28%
    2011-03-08 11:35:54,808 Stage-1 map = 100%,  reduce = 31%
    2011-03-08 11:35:59,844 Stage-1 map = 100%,  reduce = 34%
    2011-03-08 11:36:01,861 Stage-1 map = 100%,  reduce = 38%
    2011-03-08 11:36:05,891 Stage-1 map = 100%,  reduce = 41%
    2011-03-08 11:36:07,911 Stage-1 map = 100%,  reduce = 44%
    2011-03-08 11:36:12,947 Stage-1 map = 100%,  reduce = 47%
    2011-03-08 11:36:13,958 Stage-1 map = 100%,  reduce = 50%
    2011-03-08 11:36:19,002 Stage-1 map = 100%,  reduce = 53%
    2011-03-08 11:36:21,017 Stage-1 map = 100%,  reduce = 56%
    2011-03-08 11:36:26,053 Stage-1 map = 100%,  reduce = 59%
    2011-03-08 11:36:28,068 Stage-1 map = 100%,  reduce = 63%
    2011-03-08 11:36:33,106 Stage-1 map = 100%,  reduce = 66%
    2011-03-08 11:36:35,122 Stage-1 map = 100%,  reduce = 69%
    2011-03-08 11:36:39,152 Stage-1 map = 100%,  reduce = 72%
    2011-03-08 11:36:41,169 Stage-1 map = 100%,  reduce = 75%
    2011-03-08 11:36:46,208 Stage-1 map = 100%,  reduce = 78%
    2011-03-08 11:36:48,227 Stage-1 map = 100%,  reduce = 81%
    2011-03-08 11:36:53,262 Stage-1 map = 100%,  reduce = 84%
    2011-03-08 11:36:54,271 Stage-1 map = 100%,  reduce = 88%
    2011-03-08 11:36:59,309 Stage-1 map = 100%,  reduce = 91%
    2011-03-08 11:37:01,328 Stage-1 map = 100%,  reduce = 94%
    2011-03-08 11:37:06,365 Stage-1 map = 100%,  reduce = 97%
    2011-03-08 11:37:08,382 Stage-1 map = 100%,  reduce = 100%
    Ended Job = job_201103070826_0018
    Loading data to table sunwg_test11
    5 Rows loaded to sunwg_test11
    OK
    Time taken: 175.036 seconds

    The sunwg_test11 directory in Hive now contains 32 files instead of one:
    [hadoop@hadoop00 ~]$ hadoop fs -ls /hjl/sunwg_test11
    Found 32 items
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000000_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000001_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000002_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000003_0
    -rw-r--r--   3 hjl hadoop          8 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000004_0
    -rw-r--r--   3 hjl hadoop          9 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000005_0
    -rw-r--r--   3 hjl hadoop          8 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000006_0
    -rw-r--r--   3 hjl hadoop          9 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000007_0
    -rw-r--r--   3 hjl hadoop          9 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000008_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000009_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000010_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000011_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000012_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:20 /hjl/sunwg_test11/attempt_201103070826_0018_r_000013_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000014_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000015_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000016_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000017_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000018_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000019_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000020_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000021_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000022_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000023_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000024_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000025_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000026_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000027_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000028_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000029_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000030_0
    -rw-r--r--   3 hjl hadoop          0 2011-03-08 11:21 /hjl/sunwg_test11/attempt_201103070826_0018_r_000031_0
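
    Only five of the 32 files are non-empty because the source table holds just 5 rows. A query along the following lines can approximate which bucket each row lands in. Treat it as a sketch: hash() and pmod() are Hive built-in functions, but that they reproduce Hive's internal bucket assignment exactly is an assumption (it should hold for an int column and 32 buckets on Hive versions of this era), and test09 is assumed to have columns named id and name.

    ```sql
    -- Assumption: test09 has columns id and name.
    -- bucket_no is the expected zero-based bucket index for each row.
    SELECT id, name, pmod(hash(id), 32) AS bucket_no
    FROM test09;
    ```

    A row reported with bucket_no = 4 would then be expected in the file whose name ends in r_000004_0.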

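    Once the data has been scattered across files, multiple MapReduce tasks can be started.
    When you run certain operations against the table, you will see the system launch 32 map tasks.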
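
    As an illustration, the first query below is the kind of full scan that reads all 32 files. The second is a sketch of bucket sampling, which the original post does not cover: because the table is clustered on id, sampling on that same column may let Hive read a single bucket file instead of the whole table. The alias s is arbitrary, and the mapping of bucket 5 to the file ending in r_000004_0 assumes one-based bucket numbering in TABLESAMPLE.

    ```sql
    -- Full scan: every bucket file is an input.
    SELECT count(*) FROM sunwg_test11;

    -- Sample one bucket out of 32, on the clustering column.
    SELECT *
    FROM sunwg_test11 TABLESAMPLE(BUCKET 5 OUT OF 32 ON id) s;
    ```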

  • Original post: https://www.cnblogs.com/tangtianfly/p/2818315.html