zoukankan      html  css  js  c++  java
  • Hive桶列BucketedTables

    The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

     In general, distributing rows based on the hash will give you a even distribution(均匀分布) in the buckets.

    set mapred.reduce.tasks = 3;

    set hive.enforce.bucketing = true;

    CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)

    COMMENT 'A bucketed copy of user_info'

    PARTITIONED BY(ds STRING)

    CLUSTERED BY(user_id) INTO 3 BUCKETS;

    INSERT into TABLE user_info_bucketed

    PARTITION (ds='2015-07-25')

    values

    (100,'python','postgresql'), (101,'python','postgresql'), (102,'python','postgresql'), (103,'python','postgresql'), (104,'python','postgresql'), (105,'python','postgresql'), (106,'python','postgresql'), (107,'python','postgresql'), (108,'python','postgresql'), (109,'python','postgresql'), (111,'python','postgresql'), (112,'python','postgresql'), (113,'python','postgresql'), (114,'python','postgresql'), (115,'python','postgresql'), (116,'python','postgresql'), (117,'python','postgresql'), (118,'python','postgresql'), (119,'python','postgresql'), (120,'python','postgresql'), (121,'python','postgresql'), (122,'python','postgresql'), (2000,'R','Oracle'), (2001,'R','Oracle'), (2002,'R','Oracle'), (2003,'R','Oracle'), (2004,'R','Oracle'), (2005,'R','Oracle'), (2006,'R','Oracle'), (2007,'R','Oracle'), (2008,'R','Oracle'), (2009,'R','Oracle'), (2010,'R','Oracle'), (2011,'R','Oracle'), (2012,'R','Oracle'), (2013,'R','Oracle'), (2014,'R','Oracle'), (2015,'R','Oracle'), (2016,'R','Oracle'), (2017,'R','Oracle'), (2018,'R','Oracle'), (2019,'R','Oracle'), (2020,'R','Oracle'), (2030,'R','Oracle'), (2040,'R','Oracle'), (2050,'R','Oracle');

    [spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
    drwxrwxrwx   - huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
    -rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
    102pythonpostgresql
    105pythonpostgresql
    108pythonpostgresql
    111pythonpostgresql
    114pythonpostgresql
    117pythonpostgresql
    120pythonpostgresql
    2001ROracle
    2004ROracle
    2007ROracle
    2010ROracle
    2013ROracle
    2016ROracle
    2019ROracle
    2040ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
    100pythonpostgresql
    103pythonpostgresql
    106pythonpostgresql
    109pythonpostgresql
    112pythonpostgresql
    115pythonpostgresql
    118pythonpostgresql
    121pythonpostgresql
    2002ROracle
    2005ROracle
    2008ROracle
    2011ROracle
    2014ROracle
    2017ROracle
    2020ROracle
    2050ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
    101pythonpostgresql
    104pythonpostgresql
    107pythonpostgresql
    113pythonpostgresql
    116pythonpostgresql
    119pythonpostgresql
    122pythonpostgresql
    2000ROracle
    2003ROracle
    2006ROracle
    2009ROracle
    2012ROracle
    2015ROracle
    2018ROracle
    2030ROracle

  • 相关阅读:
    购买 Linux VPS 服务器后简单的安全设置
    VPS性能测试:CPU内存,硬盘IO读写,带宽速度,UnixBench和压力测试
    Polysh实现多服务器批量执行shell
    第十一周编程总结
    第十周作业
    第九周编程总结修改
    第九周编程总结
    第八周编程总结
    第7周编程总结
    第七周编程总结啊
  • 原文地址:https://www.cnblogs.com/wwxbi/p/4662996.html
Copyright © 2011-2022 走看看