  • Hive Bucket Columns (Bucketed Tables)

    The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table, only how it is read. Users must therefore insert data correctly themselves: set the number of reducers equal to the number of buckets, and use CLUSTER BY and SORT BY in the inserting query. Alternatively, setting hive.enforce.bucketing = true, as the example below does, tells Hive to route rows to the correct bucket files on insert.
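    When Hive is not asked to enforce bucketing, the manual approach described above might look like the sketch below. It only writes a HiveQL script to disk so that it can be inspected before being run. The source table user_info (with columns user_id, firstname, lastname) and the file name populate_bucketed.hql are assumptions made for illustration; user_info_bucketed is the table created later in this post.

```shell
#!/usr/bin/env bash
# Write a HiveQL script that populates the bucketed table by hand.
# Assumptions: a source table `user_info` with matching columns exists,
# and the script name `populate_bucketed.hql` is arbitrary.
cat > populate_bucketed.hql <<'EOF'
-- One reducer per bucket, so each reducer writes exactly one bucket file.
SET mapred.reduce.tasks = 3;

-- CLUSTER BY sends all rows with the same hashed user_id to the same reducer.
INSERT OVERWRITE TABLE user_info_bucketed
PARTITION (ds='2015-07-25')
SELECT user_id, firstname, lastname
FROM user_info
CLUSTER BY user_id;
EOF

echo "wrote populate_bucketed.hql"
```

    On a machine with Hive installed, the generated file could then be run with hive -f populate_bucketed.hql.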

    In general, distributing rows by the hash of the bucketing column gives an even distribution across the buckets.
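    To make the hash rule concrete, here is a small sketch of the bucket arithmetic in plain shell. It assumes that Hive picks the bucket as (hash & 2147483647) % number_of_buckets, and that the hash of a small non-negative BIGINT is the value itself. Both are assumptions about Hive's default behaviour, not something stated above, and bucket_for is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of how a BIGINT bucketing key maps to a bucket file.
# Assumption: bucket = (hash & Integer.MAX_VALUE) % num_buckets, and for a
# non-negative BIGINT below 2^31 the hash equals the value itself.
bucket_for() {
    local user_id=$1 num_buckets=$2
    echo $(( (user_id & 2147483647) % num_buckets ))
}

# A few user_id values from the example table, with 3 buckets.
for user_id in 100 101 102 2030 2040 2050; do
    printf '%s -> 00000%s_0\n' "$user_id" "$(bucket_for "$user_id" 3)"
done
```

    Under these assumptions, 102 maps to 000000_0, 100 to 000001_0 and 101 to 000002_0, which agrees with the file contents listed later in this post.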

    -- One reducer per bucket, and let Hive enforce bucketing on insert.
    set mapred.reduce.tasks = 3;
    set hive.enforce.bucketing = true;

    CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)
    COMMENT 'A bucketed copy of user_info'
    PARTITIONED BY(ds STRING)
    CLUSTERED BY(user_id) INTO 3 BUCKETS;

    INSERT INTO TABLE user_info_bucketed
    PARTITION (ds='2015-07-25')
    VALUES
    (100,'python','postgresql'), (101,'python','postgresql'), (102,'python','postgresql'), (103,'python','postgresql'),
    (104,'python','postgresql'), (105,'python','postgresql'), (106,'python','postgresql'), (107,'python','postgresql'),
    (108,'python','postgresql'), (109,'python','postgresql'), (111,'python','postgresql'), (112,'python','postgresql'),
    (113,'python','postgresql'), (114,'python','postgresql'), (115,'python','postgresql'), (116,'python','postgresql'),
    (117,'python','postgresql'), (118,'python','postgresql'), (119,'python','postgresql'), (120,'python','postgresql'),
    (121,'python','postgresql'), (122,'python','postgresql'),
    (2000,'R','Oracle'), (2001,'R','Oracle'), (2002,'R','Oracle'), (2003,'R','Oracle'), (2004,'R','Oracle'),
    (2005,'R','Oracle'), (2006,'R','Oracle'), (2007,'R','Oracle'), (2008,'R','Oracle'), (2009,'R','Oracle'),
    (2010,'R','Oracle'), (2011,'R','Oracle'), (2012,'R','Oracle'), (2013,'R','Oracle'), (2014,'R','Oracle'),
    (2015,'R','Oracle'), (2016,'R','Oracle'), (2017,'R','Oracle'), (2018,'R','Oracle'), (2019,'R','Oracle'),
    (2020,'R','Oracle'), (2030,'R','Oracle'), (2040,'R','Oracle'), (2050,'R','Oracle');

    [spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
    drwxrwxrwx   - huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
    -rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
    102pythonpostgresql
    105pythonpostgresql
    108pythonpostgresql
    111pythonpostgresql
    114pythonpostgresql
    117pythonpostgresql
    120pythonpostgresql
    2001ROracle
    2004ROracle
    2007ROracle
    2010ROracle
    2013ROracle
    2016ROracle
    2019ROracle
    2040ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
    100pythonpostgresql
    103pythonpostgresql
    106pythonpostgresql
    109pythonpostgresql
    112pythonpostgresql
    115pythonpostgresql
    118pythonpostgresql
    121pythonpostgresql
    2002ROracle
    2005ROracle
    2008ROracle
    2011ROracle
    2014ROracle
    2017ROracle
    2020ROracle
    2050ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
    101pythonpostgresql
    104pythonpostgresql
    107pythonpostgresql
    113pythonpostgresql
    116pythonpostgresql
    119pythonpostgresql
    122pythonpostgresql
    2000ROracle
    2003ROracle
    2006ROracle
    2009ROracle
    2012ROracle
    2015ROracle
    2018ROracle
    2030ROracle
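
    The columns in the listings above look run together because Hive's default field delimiter for text tables is the non-printing Ctrl-A character (\001). The sketch below uses that delimiter to check that every row of a bucket file belongs there. It assumes user_id is the first field and that the bucket is simply user_id modulo the bucket count for these small positive ids; check_bucket is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_bucket EXPECTED NUM_BUCKETS: read rows of a bucket file on stdin and
# fail if any user_id belongs to a bucket other than EXPECTED.
# Assumptions: fields are separated by Ctrl-A (\001), user_id is the first
# field, and bucket = user_id % NUM_BUCKETS for these small positive ids.
check_bucket() {
    awk -F '\001' -v expected="$1" -v n="$2" '
        ($1 % n) != expected { print "misplaced: " $1; bad = 1 }
        END { exit bad + 0 }
    '
}

# Self-contained demo with two rows shaped like the ones in 000000_0.
printf '102\001python\001postgresql\n2040\001R\001Oracle\n' | check_bucket 0 3 \
    && echo "bucket 0 looks consistent"
```

    On the cluster, the same function could be fed from the real file, for example: hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 | check_bucket 0 3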

  • Original article: https://www.cnblogs.com/wwxbi/p/4662996.html