  • Hive Table Partitioning

    A partition of a Hive table is simply a directory in HDFS; the partition column must not duplicate any column already defined in the table.

    Create a partitioned table:

    create table tb_partition(id string, name string)
    PARTITIONED BY (month string)
    row format delimited fields terminated by '\t';
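
    For reference, the examples below load a small tab-delimited file. The original post never shows nameinfo.txt; judging from the query results further down, it presumably looks like this (the exact contents are an assumption):

    $ cat /home/hadoop/files/nameinfo.txt
    1	Lily
    2	Andy
    3	Tom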

    Loading data into a Hive partitioned table

    Method 1: load with the LOAD DATA statement

    load data local inpath '/home/hadoop/files/nameinfo.txt' overwrite into table tb_partition partition(month='201709');

    Method 2: INSERT ... SELECT

    insert overwrite table tb_partition partition(month='201707') select id, name from name;
    hive> insert into table tb_partition partition(month='201707') select id, name from name;
    Query ID = hadoop_20170918222525_7d074ba1-bff9-44fc-a664-508275175849
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator

    Method 3: manually upload the file into the partition directory on HDFS

    hdfs dfs -mkdir /user/hive/warehouse/tb_partition/month=201710
    hdfs dfs -put nameinfo.txt /user/hive/warehouse/tb_partition/month=201710

    With method 3 the file lands in the partition directory, but querying the table still returns no data for it: the metastore knows nothing about the new partition until its metadata is updated.
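
    As an illustration (a sketch, not captured from the original session), querying the manually created partition before the metadata refresh comes back with zero rows:

    hive> select * from tb_partition where month='201710';
    OK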

    Two ways to update the metadata:

    Method 1: msck repair table <table name>, which scans the table's HDFS directory and registers any partitions missing from the metastore

    hive> msck repair table tb_partition;
    OK
    Partitions not in metastore:    tb_partition:month=201710
    Repair: Added partition to metastore tb_partition:month=201710
    Time taken: 0.265 seconds, Fetched: 2 row(s)

    Method 2: alter table tb_partition add partition(month='201708');

    hive> alter table tb_partition add partition(month='201708');
    OK
    Time taken: 0.126 seconds
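
    The inverse operation exists as well: drop partition deregisters a partition (and, for a managed table like this one, also deletes its data). This example is illustrative and was not executed in the original session:

    hive> alter table tb_partition drop partition(month='201708');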

    Query the table:

    hive> select * from tb_partition;
    OK
    1       Lily    201708
    2       Andy    201708
    3       Tom     201708
    1       Lily    201709
    2       Andy    201709
    3       Tom     201709
    1       Lily    201710
    2       Andy    201710
    3       Tom     201710
    Time taken: 0.161 seconds, Fetched: 9 row(s)

    List the partitions: show partitions <table name>

    hive> show partitions tb_partition;
    OK
    month=201708
    month=201709
    month=201710
    Time taken: 0.154 seconds, Fetched: 3 row(s)
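
    show partitions also accepts a partition spec to filter the listing. An illustrative sketch, not from the original session:

    hive> show partitions tb_partition partition(month='201710');
    month=201710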

    Inspect the file structure in HDFS:

    [hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_partition/
    17/09/18 22:33:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 4 items
    drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:25 /user/hive/warehouse/tb_partition/month=201707
    drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:15 /user/hive/warehouse/tb_partition/month=201708
    drwxr-xr-x   - hadoop supergroup          0 2017-09-18 05:55 /user/hive/warehouse/tb_partition/month=201709
    drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:03 /user/hive/warehouse/tb_partition/month=201710

    Creating multi-level partitions

    create table tb_mul_partition(id string, name string)
    PARTITIONED BY (month string, code string)
    row format delimited fields terminated by '\t';

    Load the data:

    load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201709',code='10000'); 
    load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201710',code='10000'); 

    Query the data:

    hive> select * from tb_mul_partition where code='10000';
    OK
    1       Lily    201709  10000
    2       Andy    201709  10000
    3       Tom     201709  10000
    1       Lily    201710  10000
    2       Andy    201710  10000
    3       Tom     201710  10000
    Time taken: 0.208 seconds, Fetched: 6 row(s)

    Now test what happens when only one of the partition columns is specified:

    hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201708'); 
    FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''201708''
    hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(code='20000'); 
    FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''20000''

    For a table created with multi-level partitions, naming only one partition column in LOAD DATA is not allowed; every partition column must be specified.
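
    An INSERT, by contrast, can fix the leading partition column statically and leave the trailing one dynamic, provided dynamic partitioning (covered in the next section) is enabled with hive.exec.dynamic.partition=true. A minimal sketch under that assumption; even strict mode accepts it because month is static:

    insert into table tb_mul_partition partition(month='201708', code)
    select id, name, code from tb_mul_partition where month='201710';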

    Check how this is stored in HDFS:

    [hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_mul_partition/month=201710
    drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:36 /user/hive/warehouse/tb_mul_partition/month=201710/code=10000

    Dynamic partitions

    Recall how data was inserted into a partition earlier:

    insert overwrite table tb_partition partition(month='201707') select id, name from name;

    That statement hard-codes the partition value '201707'. With dynamic partitioning, Hive derives the partition values from the data while inserting.

    Create a new table:

    hive> create table tb_copy_partition like tb_partition;
    OK
    Time taken: 0.118 seconds

    Check its structure:

    hive> desc tb_copy_partition;
    OK
    id                      string                                      
    name                    string                                      
    month                   string                                      
                     
    # Partition Information          
    # col_name              data_type               comment             
                     
    month                   string                                      
    Time taken: 0.127 seconds, Fetched: 8 row(s)

    Next, use dynamic partitioning to insert into tb_copy_partition. Note that the partition column month must come last in the SELECT list:

    hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
    FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

    The statement fails: in strict mode, dynamic partitioning requires at least one static partition column. As the error message says, the fix is to set hive.exec.dynamic.partition.mode=nonstrict.

    Apply the setting suggested by the error message:

    hive> set hive.exec.dynamic.partition.mode=nonstrict;

    Query the setting to confirm it took effect:

    hive> set hive.exec.dynamic.partition.mode;
    hive.exec.dynamic.partition.mode=nonstrict
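
    A few related settings also govern dynamic partitioning. The values shown are the usual defaults for this generation of Hive, quoted from memory rather than taken from the original session:

    -- master switch for dynamic partitioning (usually on by default)
    set hive.exec.dynamic.partition=true;
    -- upper bound on the total partitions one statement may create
    set hive.exec.max.dynamic.partitions=1000;
    -- upper bound per mapper/reducer task
    set hive.exec.max.dynamic.partitions.pernode=100;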

    Run the insert again:

    hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
    Query ID = hadoop_20170918230808_0bf202da-279f-4df3-a153-ece0e457c905
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1505785612206_0002, Tracking URL = http://node11:8088/proxy/application_1505785612206_0002/
    Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.10.0/bin/hadoop job  -kill job_1505785612206_0002
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
    2017-09-18 23:08:13,698 Stage-1 map = 0%,  reduce = 0%
    2017-09-18 23:08:23,896 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.94 sec
    2017-09-18 23:08:27,172 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.63 sec
    MapReduce Total cumulative CPU time: 3 seconds 630 msec
    Ended Job = job_1505785612206_0002
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to: hdfs://cluster1/user/hive/warehouse/tb_copy_partition/.hive-staging_hive_2017-09-18_23-08-01_475_7542657053989652968-1/-ext-10000
    Loading data to table default.tb_copy_partition partition (month=null)
             Time taken for load dynamic partitions : 381
            Loading partition {month=201709}
            Loading partition {month=201708}
            Loading partition {month=201710}
            Loading partition {month=201707}
             Time taken for adding to write entity : 0
    Partition default.tb_copy_partition{month=201707} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
    Partition default.tb_copy_partition{month=201708} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
    Partition default.tb_copy_partition{month=201709} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
    Partition default.tb_copy_partition{month=201710} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 2   Cumulative CPU: 3.63 sec   HDFS Read: 8926 HDFS Write: 382 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 630 msec
    OK
    Time taken: 28.932 seconds

    Query the data:

    hive> select * from tb_copy_partition;
    OK
    1       Lily    201707
    2       Andy    201707
    3       Tom     201707
    1       Lily    201708
    2       Andy    201708
    3       Tom     201708
    1       Lily    201709
    2       Andy    201709
    3       Tom     201709
    1       Lily    201710
    2       Andy    201710
    3       Tom     201710
    Time taken: 0.121 seconds, Fetched: 12 row(s)

    Done.
