zoukankan      html  css  js  c++  java
  • Hive桶列BucketedTables

    The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

     In general, distributing rows based on the hash will give you a even distribution(均匀分布) in the buckets.

    set mapred.reduce.tasks = 3;

    set hive.enforce.bucketing = true;

    CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)

    COMMENT 'A bucketed copy of user_info'

    PARTITIONED BY(ds STRING)

    CLUSTERED BY(user_id) INTO 3 BUCKETS;

    INSERT into TABLE user_info_bucketed

    PARTITION (ds='2015-07-25')

    values

    (100,'python','postgresql'), (101,'python','postgresql'), (102,'python','postgresql'), (103,'python','postgresql'), (104,'python','postgresql'), (105,'python','postgresql'), (106,'python','postgresql'), (107,'python','postgresql'), (108,'python','postgresql'), (109,'python','postgresql'), (111,'python','postgresql'), (112,'python','postgresql'), (113,'python','postgresql'), (114,'python','postgresql'), (115,'python','postgresql'), (116,'python','postgresql'), (117,'python','postgresql'), (118,'python','postgresql'), (119,'python','postgresql'), (120,'python','postgresql'), (121,'python','postgresql'), (122,'python','postgresql'), (2000,'R','Oracle'), (2001,'R','Oracle'), (2002,'R','Oracle'), (2003,'R','Oracle'), (2004,'R','Oracle'), (2005,'R','Oracle'), (2006,'R','Oracle'), (2007,'R','Oracle'), (2008,'R','Oracle'), (2009,'R','Oracle'), (2010,'R','Oracle'), (2011,'R','Oracle'), (2012,'R','Oracle'), (2013,'R','Oracle'), (2014,'R','Oracle'), (2015,'R','Oracle'), (2016,'R','Oracle'), (2017,'R','Oracle'), (2018,'R','Oracle'), (2019,'R','Oracle'), (2020,'R','Oracle'), (2030,'R','Oracle'), (2040,'R','Oracle'), (2050,'R','Oracle');

    [spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
    drwxrwxrwx   - huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
    -rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
    102pythonpostgresql
    105pythonpostgresql
    108pythonpostgresql
    111pythonpostgresql
    114pythonpostgresql
    117pythonpostgresql
    120pythonpostgresql
    2001ROracle
    2004ROracle
    2007ROracle
    2010ROracle
    2013ROracle
    2016ROracle
    2019ROracle
    2040ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
    100pythonpostgresql
    103pythonpostgresql
    106pythonpostgresql
    109pythonpostgresql
    112pythonpostgresql
    115pythonpostgresql
    118pythonpostgresql
    121pythonpostgresql
    2002ROracle
    2005ROracle
    2008ROracle
    2011ROracle
    2014ROracle
    2017ROracle
    2020ROracle
    2050ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
    101pythonpostgresql
    104pythonpostgresql
    107pythonpostgresql
    113pythonpostgresql
    116pythonpostgresql
    119pythonpostgresql
    122pythonpostgresql
    2000ROracle
    2003ROracle
    2006ROracle
    2009ROracle
    2012ROracle
    2015ROracle
    2018ROracle
    2030ROracle

  • 相关阅读:
    MVC中使用EF(2):实现基本的CRUD功能
    ibatis学习之道:ibatis的<[CDATA]>dynamic属性跟#$的应用
    css-选择器
    postman进行http接口测试
    使用HttpClient测试SpringMVC的接口
    http接口测试—自动化测试框架设计
    接口测试自动化框架搭建
    JAVA利用HttpClient进行POST请求(HTTPS)
    java.io.IOException: Attempted read from closed stream. 异常,解决
    java.lang.OutOfMemoryError:GC overhead limit exceeded填坑心得
  • 原文地址:https://www.cnblogs.com/wwxbi/p/4662996.html
Copyright © 2011-2022 走看看