zoukankan      html  css  js  c++  java
  • impala+kudu

    【impala建表】kudu的表必须有主键,作为分区的字段需排在其他字段前面

    【range分区】(不推荐)
    CREATE TABLE KUDU_WATER_HISTORY ( 
    id STRING, 
    year INT,
    device STRING,
    reading INT,
    time STRING,
    PRIMARY KEY (id,year) 
    ) PARTITION BY RANGE (year)

    PARTITION VALUES < 2017, 
    PARTITION 2017 <= VALUES < 2018, 
    PARTITION 2018 <= VALUES
    )
    STORED AS KUDU 
    TBLPROPERTIES('kudu.master_addresses' = '10.10.30.200:7051');

    【hash分区】(不推荐)
    CREATE TABLE KUDU_WATER_HISTORY_PARTITION_BY_ID ( 
    id STRING, 
    year INT,
    device STRING,
    reading INT,
    time STRING,
    PRIMARY KEY (id) 
    ) PARTITION BY HASH (id) PARTITIONS 4
    STORED AS KUDU 
    TBLPROPERTIES('kudu.master_addresses' = '10.10.30.200:7051');

    【hash range混合分区】推荐是用混合分区方式
    CREATE TABLE KUDU_WATER_HISTORY (
    id STRING,
    year INT,
    device STRING,
    reading INT,
    time STRING,
    PRIMARY KEY (id,device,year)
    ) PARTITION BY HASH (device) PARTITIONS 3,
    RANGE (year)
    (
    PARTITION VALUE = 2016,
    PARTITION VALUE = 2017,
    PARTITION VALUE = 2018,
    PARTITION VALUE = 2019
    )
    STORED AS KUDU
    TBLPROPERTIES('kudu.master_addresses' = '10.10.30.200:7051');

    CREATE TABLE DEVICE_KUDU (
    id STRING,
    device STRING,
    name STRING,
    orgId INT,
    PRIMARY KEY (id)
    ) PARTITION BY HASH (id) PARTITIONS 4
    STORED AS KUDU
    TBLPROPERTIES('kudu.master_addresses' = '10.10.30.200:7051');

    【增加分区】
    ALTER TABLE KUDU_WATER_HISTORY ADD RANGE PARTITION VALUE = 2020;

    【查询语句】

    select
    T_3C75F1.`device`,
    year(T_3C75F1.`time`),
    month(T_3C75F1.`time`),
    sum(T_3C75F1.`reading`),
    count(1)
    from (select DEVICE_KUDU.device,reading,to_timestamp(time,'yyyy-MM-dd HH:mm:ss') as time from KUDU_WATER_HISTORY,DEVICE_KUDU where KUDU_WATER_HISTORY.device=DEVICE_KUDU.device) as `T_3C75F1`
    group by
    T_3C75F1.`device`,
    year(T_3C75F1.`time`),
    month(T_3C75F1.`time`);

    耗时:DEVICE_KUDU表50条记录,KUDU_WATER_HISTORY表1亿条记录,执行上面的查询语句耗时12秒。同样的数据量和查询方式,比impala+hdfs+parquet快10倍。

    【跨数据源的查询】

    device表的数据存储在hdfs中,抄表读数的数据存储在kudu中。分别在impala中创建对应的表

    explain select
    T_3C75F1.`deviceid`,
    year(T_3C75F1.`time`),
    month(T_3C75F1.`time`),
    sum(T_3C75F1.`reading`),
    count(1)
    from (select device_parquet.deviceid,reading,to_timestamp(time,'yyyy-MM-dd HH:mm:ss') as time from KUDU_WATER_HISTORY,device_parquet where KUDU_WATER_HISTORY.device=device_parquet.deviceid) as `T_3C75F1`
    group by
    T_3C75F1.`deviceid`,
    year(T_3C75F1.`time`),
    month(T_3C75F1.`time`);

    执行上面的语句,查看执行计划

    +-----------------------------------------------------------------------------------------------------------------------------------------+
    | Explain String                                                                                                                          |
    +-----------------------------------------------------------------------------------------------------------------------------------------+
    | Per-Host Resource Reservation: Memory=400.00MB                                                                                          |
    | Per-Host Resource Estimates: Memory=2.25GB                                                                                              |
    | WARNING: The following tables are missing relevant table and/or column statistics.                                                      |
    | default.device_parquet, default.kudu_water_history                                                                                      |
    |                                                                                                                                         |
    | PLAN-ROOT SINK                                                                                                                          |
    | |                                                                                                                                       |
    | 07:EXCHANGE [UNPARTITIONED]                                                                                                             |
    | |                                                                                                                                       |
    | 06:AGGREGATE [FINALIZE]                                                                                                                 |
    | |  output: sum:merge(T_3C75F1.reading), count:merge(1)                                                                                  |
    | |  group by: T_3C75F1.deviceid, year(T_3C75F1.time), month(T_3C75F1.time)                                                               |
    | |                                                                                                                                       |
    | 05:EXCHANGE [HASH(T_3C75F1.deviceid,year(T_3C75F1.time),month(T_3C75F1.time))]                                                          |
    | |                                                                                                                                       |
    | 03:AGGREGATE [STREAMING]                                                                                                                |
    | |  output: sum(reading), count(1)                                                                                                       |
    | |  group by: device_parquet.deviceid, year(to_timestamp(time, 'yyyy-MM-dd HH:mm:ss')), month(to_timestamp(time, 'yyyy-MM-dd HH:mm:ss')) |
    | |                                                                                                                                       |
    | 02:HASH JOIN [INNER JOIN, BROADCAST]                                                                                                    |
    | |  hash predicates: KUDU_WATER_HISTORY.device = device_parquet.deviceid                                                                 |
    | |                                                                                                                                       |
    | |--04:EXCHANGE [BROADCAST]                                                                                                              |
    | |  |                                                                                                                                    |
    | |  01:SCAN HDFS [default.device_parquet]                                                                                                |
    | |     partitions=1/1 files=1 size=1.00KB                                                                                                |
    | |                                                                                                                                       |
    | 00:SCAN KUDU [default.kudu_water_history]                                                                                               |
    +-----------------------------------------------------------------------------------------------------------------------------------------+

    能看到00:SCAN KUDU和01:SCAN HDFS,分别从kudu和hdfs中查询数据。

  • 相关阅读:
    Windows共享作为公司文件服务器的案例
    关于限制域用户登陆计算机的几点总结
    防止加入域主机脱离域的控制(管理员权限)
    域组策略禁止执行指定软件
    域用户执行金蝶K/3报错解决方法
    WinXP/Win7/Win8本地用户配置文件迁移至域用户
    Windows批处理:请求远程协助
    浅谈JavaScript闭包
    新蛋那几月
    JavaScript 32位整型无符号操作
  • 原文地址:https://www.cnblogs.com/puroc/p/9234304.html
Copyright © 2011-2022 走看看