订单事实表
累积型快照事实表
分析:
drop table if exists dwd_fact_order_info; create external table dwd_fact_order_info ( `id` string COMMENT '订单编号', `order_status` string COMMENT '订单状态', `user_id` string COMMENT '用户 id', `out_trade_no` string COMMENT '支付流水号', `create_time` string COMMENT '创建时间(未支付状态)', `payment_time` string COMMENT '支付时间(已支付状态)', `cancel_time` string COMMENT '取消时间(已取消状态)', `finish_time` string COMMENT '完成时间(已完成状态)', `refund_time` string COMMENT '退款时间(退款中状态)', `refund_finish_time` string COMMENT '退款完成时间(退款完成状态)', `province_id` string COMMENT '省份 ID', `activity_id` string COMMENT '活动 ID', `original_total_amount` string COMMENT '原价金额', `benefit_reduce_amount` string COMMENT '优惠金额', `feight_fee` string COMMENT '运费', `final_total_amount` decimal(10,2) COMMENT '订单金额' ) PARTITIONED BY (`dt` string) stored as parquet location '/ecdw/dwd/dwd_fact_order_info/' tblproperties ("parquet.compression"="lzo");
订单生命周期:创建时间=》支付时间=》取消时间=》完成时间=》退款时间=》退款完成时间
(1)由于 ODS 层订单表只有创建时间和操作时间两个状态,不能表达所有时间含义,所以
需要关联订单状态表。订单事实表里面增加了活动 id,所以需要关联活动订单表
(2)会用代码1001表示创建时间,以此类推
技术重点:
1.contact_ws函数
与concat对比,concat_ws 函数在连接字符串的时候,只要有一个字符串不是 NULL,就不会返回 NULL。
concat_ws 函数需要指定分隔符。
2.STR_TO_MAP 函数
举例 str_to_map( 'text', ',' ,'=' )
将text制成kv,= 变成 :,kv之间以,间隔
插入sql:
set hive.exec.dynamic.partition.mode = nonstrict; insert overwrite table dwd_fact_order_info partition(dt) select if (new.id is null, old.id, new.id), if (new.order_status is null, old.order_status, new.order_status), if (new.user_id is null, old.user_id, new.user_id), if (new.out_trade_no is null, old.out_trade_no, new.out_trade_no), if (new.tms['1001'] is null, old.create_time, new.tms['1001']), if (new.tms['1002'] is null, old.payment_time, new.tms['1002']), if (new.tms['1003'] is null, old.cancel_time, new.tms['1003']), if (new.tms['1004'] is null, old.finish_time, new.tms['1004']), if (new.tms['1005'] is null, old.refund_time, new.tms['1005']), if (new.tms['1006'] is null, old.refund_finish_time, new.tms['1006']), if (new.province_id is null, old.province_id, new.province_id), if (new.activity_id is null, old.activity_id, new.activity_id), if (new.original_total_amount is null, old.original_total_amount, new.original_total_amount), if (new.benefit_reduce_amount is null, old.benefit_reduce_amount, new.benefit_reduce_amount), if (new.feight_fee is null, old.feight_fee, new.feight_fee), if (new.final_total_amount is null, old.final_total_amount, new.final_total_amount), date_format(if (new.tms['1001'] is null, old.create_time, new.tms['1001']), 'yyyy-MM-dd') from ( select id, order_status, user_id, out_trade_no, create_time, payment_time, cancel_time, finish_time, refund_time, refund_finish_time, province_id, activity_id, original_total_amount, benefit_reduce_amount, feight_fee, final_total_amount from dwd_fact_order_info where dt in ( select date_format(create_time, 'yyyy-MM-dd') from ods_order_info where dt='2020-03-10' ) ) old full outer join ( select info.id, info.order_status, info.user_id, info.out_trade_no, info.province_id, act.activity_id, log.tms, info.original_total_amount, info.benefit_reduce_amount, info.feight_fee, info.final_total_amount from ( select order_id, str_to_map(concat_ws(',', collect_set(concat(order_status, '=', operate_time))), ',', '=') tms from ods_order_status_log where dt = '2020-03-10' group by order_id ) log join ( select * from ods_order_info where dt = '2020-03-10' ) info on log.order_id = info.id left join ( select * from ods_activity_order where dt = '2020-03-10' ) act on log.order_id = act.order_id ) new on old.id = new.id;
累积表肯定需要full outer join
旧数据:检索ods_order_info的creat_time,此时会获取到10号及其之前的数据creat_time,但因为dwd还未生成对应数据,所以只会获取10号之前的数据
新数据:只要最新分区就完事了
要将订单时间变成map, 是为了和ods_order_info, ods_acticity_order表 join,获取其他信息,
不然ods_order_status_log 表存在多个order_id 就很难join