    Using EMR on AWS Cloud Services

    Creating the table structure

    The Hive script s3://abc-server/abc-emr/shell/ABC_USER_HIVE.q creates the abc_user_i table in the abc database:

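    • EXTERNAL declares the table as an external table
    • partitioned by (createTime Date) declares a partitioned table whose partition column is createTime
    • LOCATION '${INPUT}' sets where the table's data is stored; ${INPUT} is a variable filled in when the script is run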
    CREATE EXTERNAL TABLE IF NOT EXISTS abc.abc_user_i (
        devId STRING,
        appId INT,
        paName STRING,
        appVersion STRING,
        appVercode STRING,
        sdkVersion STRING,
        sdkVerCode STRING,
        phoneVersion STRING,
        mac STRING,
        source STRING,
        content STRING,
        logDate DATE,
        ip STRING
    )
    PARTITIONED BY (createTime DATE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'
    LOCATION '${INPUT}';
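
To make the ROW FORMAT clause more concrete, here is a minimal Python sketch that imitates how a text line delimited by ':' is mapped onto the 13 declared columns: the line is split on the delimiter, missing trailing fields become NULL (None here), and surplus fields are dropped. The sample lines and the name split_row are invented for illustration, and this is not Hive's actual implementation. It also shows a caveat worth checking against real data: a value that itself contains ':' (for example a MAC address written with colons) shifts every later column.

```python
# Column names copied from the CREATE TABLE statement (partition column excluded).
COLUMNS = [
    "devId", "appId", "paName", "appVersion", "appVercode", "sdkVersion",
    "sdkVerCode", "phoneVersion", "mac", "source", "content", "logDate", "ip",
]


def split_row(line, delimiter=":"):
    """Map one delimited text line onto COLUMNS (illustrative helper, not Hive code)."""
    values = line.rstrip("\n").split(delimiter)
    values = values[:len(COLUMNS)]                     # surplus fields are dropped
    values += [None] * (len(COLUMNS) - len(values))    # missing fields become None
    return dict(zip(COLUMNS, values))


# Hypothetical sample line with exactly 13 fields.
clean = "dev1:7:com.example:1.0:100:2.1:21:8.0:AABBCCDDEEFF:web:hello:2017-10-20:10.0.0.1"
row = split_row(clean)
print(row["mac"], row["ip"])

# Same record, but the MAC address contains the delimiter itself.
shifted = split_row(
    "dev1:7:com.example:1.0:100:2.1:21:8.0:AA:BB:CC:DD:EE:FF:web:hello:2017-10-20:10.0.0.1"
)
print(shifted["mac"], shifted["ip"])
```

In the second call the mac column receives only the first piece of the address and ip receives a later piece of it, so if the logs can contain ':' inside values, a different delimiter may be the safer choice.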
    

    Add a step to the cluster to create the table:
    [Screenshot: Add step]
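
The step can presumably also be submitted from code instead of the console. The sketch below builds a Hive step definition in the form generally used with command-runner.jar and passes the script path plus the INPUT variable to it. The helper names, the step name, the cluster ID, and the value chosen for INPUT are all assumptions for illustration, not details from the original setup; boto3 is only imported if submit_step is actually called.

```python
def build_hive_step(script_uri, input_uri, name="Create abc_user_i table"):
    """Return an EMR step definition that runs a Hive script with -d INPUT=<input_uri>."""
    return {
        "Name": name,                       # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", script_uri,
                "-d", f"INPUT={input_uri}",  # becomes ${INPUT} inside the script
            ],
        },
    }


def submit_step(cluster_id, step):
    """Submit the step to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the step works without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"]


step = build_hive_step(
    "s3://abc-server/abc-emr/shell/ABC_USER_HIVE.q",
    "s3://abc-server/abc-emr/InputDate/",  # assumed table location
)
print(step["HadoopJarStep"]["Args"])
```

Calling submit_step with a real cluster ID and this step would queue the script on that cluster; keeping the step builder separate makes it easy to inspect the arguments first.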

    Hive operations

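    # Create partitions:

    • location points to where the partition's files are actually stored; here the compressed log archives are kept in one directory per date
    • Each directory corresponds to one partition of the table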
    alter table abc.abc_user_i add partition (createTime='2017-10-20') location 's3://abc-server/abc-emr/InputDate/2017-10-20/';
    alter table abc.abc_user_i add partition (createTime='2017-10-21') location 's3://abc-server/abc-emr/InputDate/2017-10-21/';
    alter table abc.abc_user_i add partition (createTime='2017-10-22') location 's3://abc-server/abc-emr/InputDate/2017-10-22/';
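
Since every date directory needs its own statement, writing them by hand invites copy-and-paste mistakes in the partition value. A minimal Python sketch that generates the statements for a date range is shown below; the function name and its parameters are illustrative, and it assumes the directory layout used above (one directory per date under InputDate).

```python
from datetime import date, timedelta


def add_partition_statements(table, base_uri, start, end):
    """Yield one ADD PARTITION statement per day from start to end (inclusive)."""
    base = base_uri.rstrip("/")
    day = start
    while day <= end:
        d = day.isoformat()  # e.g. 2017-10-20
        yield (
            f"alter table {table} add partition (createTime='{d}') "
            f"location '{base}/{d}/';"
        )
        day += timedelta(days=1)


statements = list(
    add_partition_statements(
        "abc.abc_user_i",
        "s3://abc-server/abc-emr/InputDate",
        date(2017, 10, 20),
        date(2017, 10, 22),
    )
)
for stmt in statements:
    print(stmt)
```

The printed lines can be saved to a file and run with hive -f, so the partition value and the directory name always come from the same date.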
    

    # List the partitions that have been created:

    show partitions abc.abc_user_i;
    createtime=2017-10-20
    createtime=2017-10-21
    createtime=2017-10-22
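
A listing like this can also be checked automatically for gaps. The following sketch, with an invented function name and an arbitrary date range, parses the show partitions output and reports the dates in the range that have no partition yet; it assumes the output consists of lines in the createtime=YYYY-MM-DD form shown above.

```python
from datetime import date, timedelta


def missing_partitions(show_output, start, end, key="createtime"):
    """Return the dates between start and end (inclusive) with no partition listed."""
    present = set()
    for line in show_output.splitlines():
        line = line.strip()
        if line.startswith(key + "="):
            present.add(date.fromisoformat(line.split("=", 1)[1]))

    missing = []
    day = start
    while day <= end:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing


listing = """createtime=2017-10-20
createtime=2017-10-21
createtime=2017-10-22"""

gaps = missing_partitions(listing, date(2017, 10, 20), date(2017, 10, 24))
print([d.isoformat() for d in gaps])
```

For the three partitions listed above and a range ending on 2017-10-24, the two dates after 2017-10-22 are reported as missing.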
    

    # Query results by partition:

    hive> select count(*),createTime from abc.abc_user_i where createTime='2017-10-01' group by createTime;
    Query ID = hadoop_20171102062813_7cccccxxx-c311-411e-de30-1xxxxaaaaa4
    Total jobs = 1
    Launching Job 1 out of 1
    Status: Running (Executing on YARN cluster with App id application_1508122225619_0272)
    
    ----------------------------------------------------------------------------------------------
            VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
    ----------------------------------------------------------------------------------------------
    Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
    Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
    ----------------------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 15.65 s    
    ----------------------------------------------------------------------------------------------
    OK
    5404869 2017-10-01
    Time taken: 17.211 seconds, Fetched: 1 row(s)
    

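    # Drop a partition (for an external table this removes only the partition metadata and leaves the data in place; for a managed (internal) table it removes both the metadata and the data):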

    alter table abc.abc_user_i drop partition(createTime='2017-10-24');
    

    Reference on creating external tables and partitions in Hive:
    http://blog.csdn.net/csfreebird/article/details/27874943

  • Original article: https://www.cnblogs.com/baolin2200/p/7772309.html