zoukankan      html  css  js  c++  java
  • dwd_fact_order_info

    订单事实表

    累积型快照事实表

    分析:

    drop table if exists dwd_fact_order_info; 
    create external table dwd_fact_order_info 
    ( 
        `id` string COMMENT '订单编号', 
        `order_status` string COMMENT '订单状态', 
        `user_id` string COMMENT '用户 id', 
        `out_trade_no` string COMMENT '支付流水号', 
        `create_time` string COMMENT '创建时间(未支付状态)', 
        `payment_time` string COMMENT '支付时间(已支付状态)', 
        `cancel_time` string COMMENT '取消时间(已取消状态)', 
        `finish_time` string COMMENT '完成时间(已完成状态)', 
        `refund_time` string COMMENT '退款时间(退款中状态)', 
        `refund_finish_time` string COMMENT '退款完成时间(退款完成状态)', 
        `province_id` string COMMENT '省份 ID', 
        `activity_id` string COMMENT '活动 ID', 
        `original_total_amount` string COMMENT '原价金额', 
        `benefit_reduce_amount` string COMMENT '优惠金额', 
        `feight_fee` string COMMENT '运费', 
        `final_total_amount` decimal(10,2) COMMENT '订单金额' )
    PARTITIONED BY (`dt` string) 
    stored as parquet 
    location '/ecdw/dwd/dwd_fact_order_info/' 
    tblproperties ("parquet.compression"="lzo");
    订单生命周期:创建时间=》支付时间=》取消时间=》完成时间=》退款时间=》退款完成时间
     
    (1)由于 ODS 层订单表只有创建时间和操作时间两个状态,不能表达所有时间含义,所以
    需要关联订单状态表。订单事实表里面增加了活动 id,所以需要关联活动订单表
     
    (2)会用代码1001表示创建时间,以此类推
     
     

    技术重点:

    1.contact_ws函数

    与concat对比,concat_ws 函数在连接字符串的时候,只要有一个字符串不是 NULL,就不会返回 NULL。
    concat_ws 函数需要指定分隔符。

    2.STR_TO_MAP 函数

    举例 str_to_map(  'text',   ','   ,'='  )

    将text制成kv,= 变成 :,kv之间以,间隔

    插入sql:

    set hive.exec.dynamic.partition.mode = nonstrict;
    insert overwrite table dwd_fact_order_info partition(dt) 
    select
        if (new.id is null, old.id, new.id),
        if (new.order_status is null, old.order_status, new.order_status),
        if (new.user_id is null, old.user_id, new.user_id),
        if (new.out_trade_no is null, old.out_trade_no, new.out_trade_no),
        if (new.tms['1001'] is null, old.create_time, new.tms['1001']),
        if (new.tms['1002'] is null, old.payment_time, new.tms['1002']),
        if (new.tms['1003'] is null, old.cancel_time, new.tms['1003']),
        if (new.tms['1004'] is null, old.finish_time, new.tms['1004']),
        if (new.tms['1005'] is null, old.refund_time, new.tms['1005']),
        if (new.tms['1006'] is null, old.refund_finish_time, new.tms['1006']),
        if (new.province_id is null, old.province_id, new.province_id),
        if (new.activity_id is null, old.activity_id, new.activity_id),
        if (new.original_total_amount is null, old.original_total_amount, new.original_total_amount),
        if (new.benefit_reduce_amount is null, old.benefit_reduce_amount, new.benefit_reduce_amount),
        if (new.feight_fee is null, old.feight_fee, new.feight_fee),
        if (new.final_total_amount is null, old.final_total_amount, new.final_total_amount), 
        date_format(if (new.tms['1001'] is null, old.create_time, new.tms['1001']), 'yyyy-MM-dd') 
    from
    (
        select 
            id, 
            order_status, 
            user_id, 
            out_trade_no, 
            create_time, 
            payment_time, 
            cancel_time, 
            finish_time, 
            refund_time, 
            refund_finish_time, 
            province_id, 
            activity_id, 
            original_total_amount, 
            benefit_reduce_amount, 
            feight_fee, 
            final_total_amount 
        from dwd_fact_order_info 
        where dt in (
            select date_format(create_time, 'yyyy-MM-dd')
            from ods_order_info
            where dt='2020-03-10'
        )
    ) 
    old 
    
    full outer join
    (
        select 
            info.id, 
            info.order_status, 
            info.user_id, 
            info.out_trade_no, 
            info.province_id, 
            act.activity_id, 
            log.tms, 
            info.original_total_amount, 
            info.benefit_reduce_amount, 
            info.feight_fee, 
            info.final_total_amount 
        from
        (
            select 
                order_id, 
                str_to_map(concat_ws(',', collect_set(concat(order_status, '=', operate_time))), ',', '=') tms 
            from ods_order_status_log 
            where dt = '2020-03-10'
            group by order_id
        ) log 
        
        join
        (
            select 
                * 
            from ods_order_info 
            where dt = '2020-03-10'
        ) info 
        on log.order_id = info.id 
        
        left join
        (
            select 
                * 
            from ods_activity_order 
            where dt = '2020-03-10'
        ) act 
        on log.order_id = act.order_id
    ) 
    new 
    on old.id = new.id;

    累积表肯定需要full outer join

    旧数据:检索ods_order_info的creat_time,此时会获取到10号及其之前的数据creat_time,但因为dwd还未生成对应数据,所以只会获取10号之前的数据

    新数据:只要最新分区就完事了

    要将订单时间变成map, 是为了和ods_order_info, ods_acticity_order表  join,获取其他信息,

    不然ods_order_status_log 表存在多个order_id 就很难join

  • 相关阅读:
    Django学习之路
    Bootstrap插件及其应用方法网址
    BOM和DOM
    jQuery学习
    初识JS之数据类型
    前端学习之路CSS基础学习二
    前端学习之路CSS基础学习一
    前端学习之路
    ORM
    Python Twisted 学习系列4(转载stulife最棒的Twisted入门教程)
  • 原文地址:https://www.cnblogs.com/ldy233/p/14443430.html
Copyright © 2011-2022 走看看