zoukankan      html  css  js  c++  java
  • Hive桶列BucketedTables

    The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

     In general, distributing rows based on the hash will give you a even distribution(均匀分布) in the buckets.

    set mapred.reduce.tasks = 3;

    set hive.enforce.bucketing = true;

    CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)

    COMMENT 'A bucketed copy of user_info'

    PARTITIONED BY(ds STRING)

    CLUSTERED BY(user_id) INTO 3 BUCKETS;

    INSERT into TABLE user_info_bucketed

    PARTITION (ds='2015-07-25')

    values

    (100,'python','postgresql'), (101,'python','postgresql'), (102,'python','postgresql'), (103,'python','postgresql'), (104,'python','postgresql'), (105,'python','postgresql'), (106,'python','postgresql'), (107,'python','postgresql'), (108,'python','postgresql'), (109,'python','postgresql'), (111,'python','postgresql'), (112,'python','postgresql'), (113,'python','postgresql'), (114,'python','postgresql'), (115,'python','postgresql'), (116,'python','postgresql'), (117,'python','postgresql'), (118,'python','postgresql'), (119,'python','postgresql'), (120,'python','postgresql'), (121,'python','postgresql'), (122,'python','postgresql'), (2000,'R','Oracle'), (2001,'R','Oracle'), (2002,'R','Oracle'), (2003,'R','Oracle'), (2004,'R','Oracle'), (2005,'R','Oracle'), (2006,'R','Oracle'), (2007,'R','Oracle'), (2008,'R','Oracle'), (2009,'R','Oracle'), (2010,'R','Oracle'), (2011,'R','Oracle'), (2012,'R','Oracle'), (2013,'R','Oracle'), (2014,'R','Oracle'), (2015,'R','Oracle'), (2016,'R','Oracle'), (2017,'R','Oracle'), (2018,'R','Oracle'), (2019,'R','Oracle'), (2020,'R','Oracle'), (2030,'R','Oracle'), (2040,'R','Oracle'), (2050,'R','Oracle');

    [spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
    drwxrwxrwx   - huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
    -rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
    -rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
    102pythonpostgresql
    105pythonpostgresql
    108pythonpostgresql
    111pythonpostgresql
    114pythonpostgresql
    117pythonpostgresql
    120pythonpostgresql
    2001ROracle
    2004ROracle
    2007ROracle
    2010ROracle
    2013ROracle
    2016ROracle
    2019ROracle
    2040ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
    100pythonpostgresql
    103pythonpostgresql
    106pythonpostgresql
    109pythonpostgresql
    112pythonpostgresql
    115pythonpostgresql
    118pythonpostgresql
    121pythonpostgresql
    2002ROracle
    2005ROracle
    2008ROracle
    2011ROracle
    2014ROracle
    2017ROracle
    2020ROracle
    2050ROracle
    [spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
    101pythonpostgresql
    104pythonpostgresql
    107pythonpostgresql
    113pythonpostgresql
    116pythonpostgresql
    119pythonpostgresql
    122pythonpostgresql
    2000ROracle
    2003ROracle
    2006ROracle
    2009ROracle
    2012ROracle
    2015ROracle
    2018ROracle
    2030ROracle

  • 相关阅读:
    MySql多表循环遍历更新
    GridView控件的选择功能,代码实现CheckBox控件的全选、反选以及取消
    使用HTTP POST请求12306网站接口查询火车车次API
    GridView控件的绑定分页功能
    使用HTTP GET请求12306网站接口获取车站名和车站Code
    浅谈从Oracle数据库中取出10条数据的Select语句与Sql Server、MySql的区别
    2022 程序员口语提升指南
    R语言与java整合
    新浪的股票接口 c# (收藏)
    摘记
  • 原文地址:https://www.cnblogs.com/wwxbi/p/4662996.html
Copyright © 2011-2022 走看看