zoukankan      html  css  js  c++  java
  • Hive_Hive的数据模型_分区表

    Hive的数据模型之分区表

    准备数据表:

    create table sampledata
    (sid int, sname string, gender string, language int, math int, english int)
    row format delimited fields terminated by ',' stored as textfile;

    准备文本数据:

    sampledata.txt
    1,Tom,M,60,80,96
    2,Mary,F,11,22,33
    3,Jerry,M,90,11,23
    4,Rose,M,78,77,76
    5,Mike,F,99,98,98

    将文本数据插入到数据表:

    hive> load data local inpath '/root/pl62716/hive/sampledata.txt' into table sampledata;

    -partition对应于数据库中的Partition 列的密集索引

    -在Hive中,表中的一个Partition对应于表下的一个目录,所有的Partition的数据都存储在对应的目录中。

    创建分区表:

    create table partition_table
    (sid int, sname string)
    partitioned by (gender string)
    row format delimited fields terminated by ',';

    向分区表中插入数据:

    hive> insert into table partition_table partition(gender='M') select sid, sname from sampledata where gender='M';
    hive> insert into table partition_table partition(gender='F') select sid, sname from sampledata where gender='F';

    从内部表解析比从分区表解析效率低:

    (1)内部表:

    hive> explain select * from sampledata where gender='M';
    OK
    STAGE DEPENDENCIES:
      Stage-0 is a root stage
    
    STAGE PLANS:
      Stage: Stage-0
        Fetch Operator
          limit: -1
          Processor Tree:
            TableScan
              alias: sampledata
              Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: (gender = 'M') (type: boolean)
                Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: sid (type: int), sname (type: string), 'M' (type: string), language (type: int), math (type: int), english (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
                  ListSink
    
    Time taken: 0.046 seconds, Fetched: 20 row(s)

    (2)分区表:

    hive> explain select * from partition_table where gender='M';
    OK
    STAGE DEPENDENCIES:
      Stage-0 is a root stage
    
    STAGE PLANS:
      Stage: Stage-0
        Fetch Operator
          limit: -1
          Processor Tree:
            TableScan
              alias: partition_table
              Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: sid (type: int), sname (type: string), 'M' (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
                ListSink
    
    Time taken: 0.187 seconds, Fetched: 17 row(s)
  • 相关阅读:
    C#新功能--命名参数与可选参数
    C#新功能--命名参数
    初识SqlLite ---.net连接数据库
    混合模式程序集是针对“v2.0.50727”版的运行时生成的,在没有配置其他信息的情况下,无法在 4.0 运行时中加载该...
    看到他我一下子就悟了-- 泛型(2)
    ExtJS4中设置tabpanel的tab高度问题
    java学习1-环境搭建
    oracle获取clob调优
    oracle job定时执行存储过程
    【转】【真正福利】成为专业程序员路上用到的各种优秀资料、神器及框架
  • 原文地址:https://www.cnblogs.com/liupuLearning/p/6322609.html
Copyright © 2011-2022 走看看