  • Using CASE and IF in Hive: a region statistics job (with an introduction to the syntax of the Hive conditional functions CASE, IF and COALESCE: CONDITIONAL FUNCTIONS IN HIVE)

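    Preface: my own takeaways on designing Hive QL
    1) When a query gets complicated, process it in steps: split one complex piece of logic into several simple sub-steps.
    2) But whatever can be combined should be combined. For example, several CONCAT calls at the same level belong in one SELECT.

    In other words, when fields are processed independently at the same level, put them in one Hive QL statement. When the processing of fields depends on earlier logic (conditions, filling in missing values, calculations), run it in steps: prepare each field separately first, then apply the simple logic step by step.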

    1. Scenario: processing region data in logs (country, province, city)
    select city_id,province_id,country_id
    from wizad_mdm_cleaned_hdfs
    where city_id = '' or country_id = '' or province_id = ''
    group by city_id,province_id,country_id
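    2. The logs turn out to contain empty values:
    city_id  province_id  country_id
    38       ''           1
    ''       73           1
    ''       75           1
    64       81           ''
    ''       76           1
    ''       ''           ''          (all empty)
    ''       77           ''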
    3. Define the filtering logic
    if country_id = ''
             if province_id != '' then
                       if city_id = '' then CONCAT('region_','1','_',province_id)
                       else CONCAT('region_','1','_',province_id,'_',city_id)
             else
                       if city_id != '' then CONCAT('region_','1','_',parent_region_id,'_',city_id)
    else
             if province_id = '' then
                       if city_id != '' then CONCAT('region_',country_id,'_',parent_region_id,'_',city_id)
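The rules above can also be read as one small decision function. The following is a minimal Python sketch, not part of the original job. The function name `build_region` is hypothetical, and the branches the rules leave unstated (all three fields empty, or only country_id present) are filled in the way the Hive QL in section 4 handles them, which is an assumption about the intent. It also assumes every argument is a string; in Hive, a NULL parent_region_id would make CONCAT return NULL instead.

```python
def build_region(country_id, province_id, city_id, parent_region_id):
    """Build the region key from ID strings, where '' means 'missing'."""
    # An empty country falls back to '1', as in the rules above.
    country = country_id if country_id != '' else '1'
    if province_id != '':
        parts = [country, province_id]
        if city_id != '':
            parts.append(city_id)
    elif city_id != '':
        # No province: use the city's parent region (looked up elsewhere).
        parts = [country, parent_region_id, city_id]
    elif country_id != '':
        # Only the country is known (assumed from the section 4 query).
        parts = [country]
    else:
        # Nothing is known (assumed from the section 4 query).
        return ''
    return 'region_' + '_'.join(parts)
```

For example, a row with only province 73 and country 1 gives `build_region('1', '73', '', '')`, which evaluates to `'region_1_73'`.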
    4. Hive QL implementation
    SET mapred.queue.names=queue3;
    SET mapred.reduce.tasks=14;
    DROP TABLE IF EXISTS test_lmj_mdm_tmp1;
    CREATE TABLE test_lmj_mdm_tmp1 AS
    SELECT
    guid,
    (CASE country_id
    WHEN '' THEN (CASE WHEN province_id = ''
                  THEN IF(city_id = '', '', CONCAT('region_','1','_',parent_region_id,'_',city_id))
                  ELSE IF(city_id = '', CONCAT('region_','1','_',province_id), CONCAT('region_','1','_',province_id,'_',city_id))
                  END)
    ELSE (CASE WHEN province_id = ''
          THEN IF(city_id = '', CONCAT('region_',country_id), CONCAT('region_',country_id,'_',parent_region_id,'_',city_id))
          ELSE IF(city_id = '', CONCAT('region_',country_id,'_',province_id), CONCAT('region_',country_id,'_',province_id,'_',city_id))
          END)
    END) AS region,
    (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END) AS carrier,
    SUM(CASE WHEN logtype = '1' THEN 1 ELSE 0 END) AS imp_pv,
    SUM(CASE WHEN logtype = '2' THEN 1 ELSE 0 END) AS clk_pv
    FROM wizad_mdm_cleaned_hdfs a
    left outer join wizad_mdm_dev_lmj_ad_campaign_industry_brand b
    ON (a.wizad_ad_id = b.ad_id)
    left outer join (SELECT * FROM wizad_mdm_dev_lmj_mapping_table_analytics WHERE TYPE = '7') c
    ON (a.adn_id = c.ad_network_id AND a.carrier_id = c.mapping_id)
    left outer join wizad_mdm_dev_lmj_app_category_analytics d
    ON (a.app_category_id = d.adn_category)
    left outer join (select region_template_id, parent_region_id from wizad_mdm_dev_lmj_region_template) e
    ON (a.city_id = e.region_template_id)
    WHERE a.day = '2015-01-01'
    GROUP BY guid,
    (CASE country_id
    WHEN '' THEN (CASE WHEN province_id = ''
                  THEN IF(city_id = '', '', CONCAT('region_','1','_',parent_region_id,'_',city_id))
                  ELSE IF(city_id = '', CONCAT('region_','1','_',province_id), CONCAT('region_','1','_',province_id,'_',city_id))
                  END)
    ELSE (CASE WHEN province_id = ''
          THEN IF(city_id = '', CONCAT('region_',country_id), CONCAT('region_',country_id,'_',parent_region_id,'_',city_id))
          ELSE IF(city_id = '', CONCAT('region_',country_id,'_',province_id), CONCAT('region_',country_id,'_',province_id,'_',city_id))
          END)
    END),
    (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END);
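    5. Analysis of the Hive QL statement

    The example above uses CASE and IF; for the syntax, see the last section {7. CONDITIONAL FUNCTIONS IN HIVE}.
    Notes:
    1) A special form of CASE: CASE may be written without an operand, with the condition placed after WHEN instead, e.g. case when a=1 then true else false end;
    2) Every transformed field in the SELECT list must also appear in GROUP BY, as with the CASE expression for carrier (otherwise it cannot be selected). However, an alias (AS) can only be given in SELECT, not in GROUP BY. The same holds when time is processed with substring: the expression appears in both SELECT and GROUP BY.
    3) In a LEFT OUTER JOIN, joining against a subquery that keeps only the needed rows or columns reduces the memory used by the join.
    4) Whether IF can be used depends on the Hive version.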
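Notes 1) and 2) can be tried out on a tiny data set. The following is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for Hive, the table `logs` and its rows are made up for illustration, and `||` replaces CONCAT because not every SQLite version provides CONCAT. The searched CASE inside SUM and the repetition of the carrier expression in GROUP BY follow the same pattern as the query in section 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (guid TEXT, connection_type TEXT, element_id TEXT, logtype TEXT)"
)
# Hypothetical sample rows: three wifi events and one carrier event.
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("g1", "2", "7", "1"),
        ("g1", "2", "7", "1"),
        ("g1", "2", "7", "2"),
        ("g1", "1", "7", "1"),
    ],
)

QUERY = """
SELECT guid,
       CASE connection_type WHEN '2' THEN 'carrier_wifi'
            ELSE 'carrier_' || element_id END AS carrier,
       SUM(CASE WHEN logtype = '1' THEN 1 ELSE 0 END) AS imp_pv,
       SUM(CASE WHEN logtype = '2' THEN 1 ELSE 0 END) AS clk_pv
FROM logs
GROUP BY guid,
         -- the derived field is repeated here in full, without the alias
         CASE connection_type WHEN '2' THEN 'carrier_wifi'
              ELSE 'carrier_' || element_id END
ORDER BY carrier
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each output row is one (guid, carrier) group with its impression and click counts.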

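    6. Redesigning the Hive QL
    A beginner like me tends to design complicated logic and monstrous statements.
    In practice, an experienced person who faces overly complex logic works in steps. A senior colleague who knows SQL well refactored the example above into two steps:
     - 1) First fill in a sensible value for each field (fill in what can be filled, set the rest to empty)
     - 2) Then, when building the region, simply filter out the records with invalid values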
    
    6.1 Step 1 statement
    DROP TABLE IF EXISTS test_lmj_mdm_tmp;
    CREATE TABLE test_lmj_mdm_tmp AS
    SELECT
    guid,
    CONCAT('adn_',adn_id) AS adn,
    CONCAT('time_',substr(createtime,12,2)) AS hour,
    CONCAT('os_',os_id) AS os,
    -- country: '' if nothing is known, '1' if only province or city is known
    case when (country_id = '' or country_id = 'NULL' or country_id is null)
                and (province_id = '' or province_id = 'NULL' or province_id is null)
                and (city_id = '' or city_id = 'NULL' or city_id is null)
            then ''
         when (country_id = '' or country_id = 'NULL' or country_id is null)
                and ((province_id <> '' and province_id <> 'NULL' and province_id is not null)
                  or (city_id <> '' and city_id <> 'NULL' and city_id is not null))
            then '1'
         else country_id end as country_id,
    -- province: fall back to the city's parent region when it is missing
    case when (province_id = '' or province_id = 'NULL' or province_id is null)
                and e.parent_region_id <> '' and e.parent_region_id <> 'NULL' and e.parent_region_id is not null
            then e.parent_region_id
         else province_id end as province_id,
    city_id,
    CONCAT('campaign_',b.campaign_id) AS campaign,
    CONCAT('interest_',b.industry_id) AS interest,
    CONCAT('brand_',b.brand_id) AS brand,
    (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END) AS carrier,
    CONCAT('appcategory_',d.wizad_category) AS appcategory,
    uid,
    SUM(CASE WHEN logtype = '1' THEN 1 ELSE 0 END) AS imp_pv,
    SUM(CASE WHEN logtype = '2' THEN 1 ELSE 0 END) AS clk_pv
    FROM ${clean_log_table} a
    left outer join wizad_mdm_dev_lmj_ad_campaign_industry_brand b
    ON (a.wizad_ad_id = b.ad_id)
    left outer join (SELECT * FROM wizad_mdm_dev_lmj_mapping_table_analytics WHERE TYPE = '7') c
    ON (a.adn_id = c.ad_network_id AND a.carrier_id = c.mapping_id)
    left outer join wizad_mdm_dev_lmj_app_category_analytics d
    ON (a.app_category_id = d.adn_category)
    left outer join (select region_template_id, parent_region_id from wizad_mdm_dev_lmj_region_template) e
    ON (a.city_id = e.region_template_id)
    WHERE a.day < '${pt}' and a.day >= '${time_span}'
    GROUP BY guid,
    CONCAT('adn_',adn_id),
    CONCAT('time_',substr(createtime,12,2)),
    CONCAT('os_',os_id),
    case when (country_id = '' or country_id = 'NULL' or country_id is null)
              and (province_id = '' or province_id = 'NULL' or province_id is null)
              and (city_id = '' or city_id = 'NULL' or city_id is null)
              then ''
         when (country_id = '' or country_id = 'NULL' or country_id is null)
              and ((province_id <> '' and province_id <> 'NULL' and province_id is not null)
                or (city_id <> '' and city_id <> 'NULL' and city_id is not null))
              then '1'
         else country_id end,
    case when (province_id = '' or province_id = 'NULL' or province_id is null)
              and e.parent_region_id <> '' and e.parent_region_id <> 'NULL' and e.parent_region_id is not null
              then e.parent_region_id
         else province_id end,
    city_id,
    CONCAT('campaign_',b.campaign_id),
    CONCAT('interest_',b.industry_id),
    CONCAT('brand_',b.brand_id),
    (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END),
    CONCAT('appcategory_',d.wizad_category),
    UID;
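The two filling rules in the step 1 statement are easier to check when written as plain functions. The following Python sketch is an illustration under assumptions, not the colleague's code: the names `is_blank`, `fill_country` and `fill_province` are hypothetical, and Python's `None` stands in for SQL NULL.

```python
def is_blank(value):
    """True for the three 'missing' encodings the query tests: NULL, '' and 'NULL'."""
    return value is None or value == '' or value == 'NULL'


def fill_country(country_id, province_id, city_id):
    """Mirror the first CASE expression of step 1."""
    if not is_blank(country_id):
        return country_id
    # Country is missing: default to '1' only if province or city is known.
    if is_blank(province_id) and is_blank(city_id):
        return ''
    return '1'


def fill_province(province_id, parent_region_id):
    """Mirror the second CASE expression of step 1."""
    # Use the city's parent region when the province itself is missing.
    if is_blank(province_id) and not is_blank(parent_region_id):
        return parent_region_id
    return province_id
```

A record with province 73 and no country, for instance, gets `fill_country('', '73', '')`, which evaluates to `'1'`. A province that cannot be filled is returned unchanged, so it still looks blank to the next step.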
    6.2 Step 2 statement
    SELECT guid,
    CONCAT('region_',country_id,'_',province_id,(case when city_id <> '' and city_id <> 'NULL' and city_id is not null then concat('_',city_id) else '' end)) AS fixeddim,
    UID,
    SUM(imp_pv) AS pv
    FROM test_lmj_mdm_tmp
    where imp_pv > 0
    and country_id <> ''
    and country_id <> 'NULL'
    and country_id is not null
    and province_id <> ''
    and province_id <> 'NULL'
    and province_id is not null
    GROUP BY guid,
    CONCAT('region_',country_id,'_',province_id,(case when city_id <> '' and city_id <> 'NULL' and city_id is not null then concat('_',city_id) else '' end)),
    UID;
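Step 2 amounts to a filter followed by a grouped sum. The sketch below models it in Python over a list of dictionaries. The function name `region_pv` and the sample rows are made up for illustration, and `None` again stands in for SQL NULL.

```python
from collections import defaultdict


def is_blank(value):
    """True for NULL, '' and the literal string 'NULL'."""
    return value is None or value == '' or value == 'NULL'


def region_pv(rows):
    """Sum imp_pv per (guid, fixeddim, uid), skipping invalid records."""
    totals = defaultdict(int)
    for row in rows:
        # WHERE clause: need impressions plus a usable country and province.
        if row['imp_pv'] <= 0:
            continue
        if is_blank(row['country_id']) or is_blank(row['province_id']):
            continue
        fixeddim = 'region_' + row['country_id'] + '_' + row['province_id']
        # The city suffix is optional.
        if not is_blank(row['city_id']):
            fixeddim += '_' + row['city_id']
        totals[(row['guid'], fixeddim, row['uid'])] += row['imp_pv']
    return dict(totals)


sample = [
    {'guid': 'g1', 'country_id': '1', 'province_id': '73', 'city_id': '', 'uid': 'u1', 'imp_pv': 2},
    {'guid': 'g1', 'country_id': '1', 'province_id': '73', 'city_id': '', 'uid': 'u1', 'imp_pv': 3},
    {'guid': 'g1', 'country_id': '1', 'province_id': '81', 'city_id': '64', 'uid': 'u1', 'imp_pv': 1},
    {'guid': 'g1', 'country_id': '', 'province_id': '', 'city_id': '', 'uid': 'u1', 'imp_pv': 4},
    {'guid': 'g1', 'country_id': '1', 'province_id': '73', 'city_id': '', 'uid': 'u1', 'imp_pv': 0},
]
result = region_pv(sample)
```

Because step 1 already normalized country and province, this step needs no nested conditions, only the blank checks.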

    The following is quoted from the web.

    7. CONDITIONAL FUNCTIONS IN HIVE

    Hive supports three types of conditional functions. These functions
    are listed below:

    IF( Test Condition, True Value, False Value )

    The IF condition evaluates the "Test Condition" and if the "Test
    Condition" is true, then it returns the "True Value". Otherwise, it
    returns the "False Value". Example: IF(1=1, 'working', 'not working')
    returns 'working'
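As a rough model of this behaviour, the Python sketch below treats `None` as SQL NULL. The helper name `hive_if` is hypothetical. It assumes that a NULL test condition selects the false value, since the true value is returned only when the condition is actually true.

```python
def hive_if(condition, true_value, false_value):
    """Model of IF(test, true_value, false_value); None stands in for NULL."""
    # Only a condition that is really true selects true_value;
    # both False and None (NULL) fall through to false_value.
    return true_value if condition is True else false_value


print(hive_if(1 == 1, 'working', 'not working'))
```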

    COALESCE( value1,value2,… )

    The COALESCE function returns the first non-NULL value from the list of
    values. If all the values in the list are NULL, then it returns NULL.
    Example: COALESCE(NULL,NULL,5,NULL,4) returns 5
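The same rule can be sketched in a few lines of Python, with `None` standing in for NULL. The helper name `coalesce` is chosen for illustration. The last call shows why the step 1 statement in section 6 tests for '' and 'NULL' explicitly: an empty string is a value, not NULL, so COALESCE alone would not replace it.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    return next((v for v in values if v is not None), None)


print(coalesce(None, None, 5, None, 4))
print(coalesce(None, None))
print(repr(coalesce('', 'fallback')))
```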

    CASE Statement

    The syntax for the case statement is:

        CASE [ expression ]
            WHEN condition1 THEN result1
            WHEN condition2 THEN result2
            ...
            WHEN conditionn THEN resultn
            ELSE result
        END

    Here expression is optional. It is the value that you are comparing to
    the list of conditions (i.e. condition1, condition2, … conditionn).

    All the conditions must be of the same datatype. Conditions are evaluated
    in the order listed. Once a condition is found to be true, the case
    statement returns the result and does not evaluate any further conditions.

    All the results must be of the same datatype. This is the value returned
    once a condition is found to be true.

    If no condition is found to be true, then the case statement returns
    the value in the ELSE clause. If the ELSE clause is omitted and
    no condition is found to be true, then the case statement returns
    NULL.

    Source: http://www.folkstalk.com/2011/11/conditional-functions-in-hive.html

    Example:

        CASE   Fruit
            WHEN 'APPLE' THEN 'The owner is APPLE'
            WHEN 'ORANGE' THEN 'The owner is ORANGE'
            ELSE 'It is another Fruit'
        END

    The other form of CASE is

        CASE 
             WHEN Fruit = 'APPLE' THEN 'The owner is APPLE'
             WHEN Fruit = 'ORANGE' THEN 'The owner is ORANGE'
             ELSE 'It is another Fruit'
        END
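Both forms can be run side by side to confirm that they agree. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive, on the assumption that these standard CASE forms behave the same way there. The table `fruits` and its rows are made up for illustration. The third query omits ELSE to show the NULL result, which Python reports as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (fruit TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?)", [("APPLE",), ("ORANGE",), ("PEAR",)]
)

# Simple form: the operand after CASE is compared with each WHEN value.
SIMPLE = """
SELECT CASE fruit
           WHEN 'APPLE' THEN 'The owner is APPLE'
           WHEN 'ORANGE' THEN 'The owner is ORANGE'
           ELSE 'It is another Fruit'
       END
FROM fruits ORDER BY fruit
"""

# Searched form: no operand, each WHEN carries a full condition.
SEARCHED = """
SELECT CASE
           WHEN fruit = 'APPLE' THEN 'The owner is APPLE'
           WHEN fruit = 'ORANGE' THEN 'The owner is ORANGE'
           ELSE 'It is another Fruit'
       END
FROM fruits ORDER BY fruit
"""

# Without ELSE, rows that match no WHEN yield NULL.
NO_ELSE = """
SELECT CASE fruit WHEN 'APPLE' THEN 'The owner is APPLE' END
FROM fruits ORDER BY fruit
"""

simple = [r[0] for r in conn.execute(SIMPLE)]
searched = [r[0] for r in conn.execute(SEARCHED)]
no_else = [r[0] for r in conn.execute(NO_ELSE)]
print(simple)
print(no_else)
```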
  • Original article: https://www.cnblogs.com/cl1024cl/p/6205368.html