zoukankan      html  css  js  c++  java
  • 项目实战从0到1之hive(27)数仓项目(九)数仓搭建

    一、数仓搭建 - DWS 层
    1.1 业务术语
    1)用户
    用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android
    系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户

    2)新增用户
    首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用
    户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月
    新增用户

    3)活跃用户
    打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计
    为一个活跃用户

    4)周(月)活跃用户
    某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户

    5)月活跃率
    月活跃用户与截止到该月累计的用户总和之间的比例

    6)沉默用户
    用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用
    户质量和用户与 APP 的匹配程度

    7)版本分布
    不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之
    间的优劣和用户行为习惯

    8)本周回流用户
    上周未启动过应用,本周启动了应用的用户

    9)连续 n 周活跃用户
    连续 n 周,每周至少启动一次

    10)忠诚用户
    连续活跃 5 周以上的用户

    11)连续活跃用户
    连续 2 周及以上活跃的用户

    12)近期流失用户
    连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)

    13)留存用户
    某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分
    用户占当时新增用户的比例即是留存率
    例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%,二个月后的留存率是 40%,三个月后的留存率是 25%

    14)用户新鲜度
    每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例

    15)单次使用时长
    每次启动使用的时间长度

    16)日使用时长
    累计一天内的使用时间长度

    17)启动次数计算标准
    IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔


    1.2 系统函数
    1.2.1 collect_set 函数
    1)创建原数据表

    drop table if exists stud;
    create table stud (name string, area string, course string, score int);

    2)向原数据表中插入数据

    insert into table stud values('zhang3','bj','math',88);
    insert into table stud values('li4','bj','math',99);
    insert into table stud values('wang5','sh','chinese',92);
    insert into table stud values('zhao6','sh','chinese',54);
    insert into table stud values('tian7','bj','chinese',91);

    3)查询表中数据

    select * from stud;
    
    stud.name stud.area stud.course stud.score
    zhang3 bj math 88
    li4 bj math 99
    wang5 sh chinese 92
    zhao6 sh chinese 54
    tian7 bj chinese 91

    4)把同一分组的不同行的数据聚合成一个集合

    select course, collect_set(area), avg(score) from stud group by course;
    
    chinese ["sh","bj"] 79.0
    math ["bj"] 93.5

    5) 用下标可以取某一个

    select course, collect_set(area)[0], avg(score) from
    
    stud group by course;
    chinese sh 79.0
    math bj 93.5

    1.2.2 nvl 函数
    1)基本语法

    NVL(表达式 1,表达式 2)

    如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型

    1.2.3 日期处理函数
    1)date_format 函数(根据格式整理日期)

    hive (gmall)> select date_format('2020-03-10','yyyy-MM');
    2020-03

    2)date_add 函数(加减日期)

    hive (gmall)> select date_add('2020-03-10',-1);
    2020-03-09
    hive (gmall)> select date_add('2020-03-10',1);
    2020-03-11

    3)next_day 函数
    (1)取当前天的下一个周一

    hive (gmall)> select next_day('2020-03-12','MO');
    2020-03-16
    
    说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)

    (2)取当前周的周一

    hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);
    2020-03-11

    4)last_day 函数(求当月最后一天日期)

    hive (gmall)> select last_day('2020-03-10');
    2020-03-31

    1.3 DWS 层(用户行为)
    1.3.1 每日设备行为
    每日设备行为,主要按照 设备 id 统计




    1)建表语句

    drop table if exists dws_uv_detail_daycount;
    create external table dws_uv_detail_daycount
    (
    `mid_id` string COMMENT '设备唯一标识',
    `user_id` string COMMENT '用户标识',
    `version_code` string COMMENT '程序版本号',
    `version_name` string COMMENT '程序版本名',
    `lang` string COMMENT '系统语言',
    `source` string COMMENT '渠道号',
    `os` string COMMENT '安卓系统版本',
    `area` string COMMENT '区域',
    `model` string COMMENT '手机型号',
    `brand` string COMMENT '手机品牌',
    `sdk_version` string COMMENT 'sdkVersion',
    `gmail` string COMMENT 'gmail',
    `height_width` string COMMENT '屏幕宽高',
    `app_time` string COMMENT '客户端日志产生时的时间',
    `network` string COMMENT '网络模式',
    `lng` string COMMENT '经度',
    `lat` string COMMENT '纬度',
    `login_count` bigint COMMENT '活跃次数'
    )
    partitioned by(dt string)
    stored as parquet
    location '/warehouse/gmall/dws/dws_uv_detail_daycount';

    2)数据装载

    insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')
    select
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang))lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat,
    count(*) login_count
    from dwd_start_log
    where dt='2020-03-10'
    group by mid_id;

    3)查询加载结果

    select * from dws_uv_detail_daycount where dt='2020-03-10';

    1.4 DWS 层(业务)
    DWS 层的宽表字段,是站在不同维度的视角去看事实表,重点关注事实表的度量值




    1.4.1 每日会员行为
    1)建表语句

    drop table if exists dws_user_action_daycount;
    create external table dws_user_action_daycount
    (
    user_id string comment '用户 id',
    login_count bigint comment '登录次数',
    cart_count bigint comment '加入购物车次数',
    cart_amount double comment '加入购物车金额',
    order_count bigint comment '下单次数',
    order_amount decimal(16,2) comment '下单金额',
    payment_count bigint comment '支付次数',
    payment_amount decimal(16,2) comment '支付金额'
    ) COMMENT '每日用户行为'
    PARTITIONED BY (`dt` string)
    stored as parquet
    location '/warehouse/gmall/dws/dws_user_action_daycount/'
    tblproperties ("parquet.compression"="lzo");

    2)数据装载

    with
    tmp_login as
    (
    select
    user_id,
    count(*) login_count
    from dwd_start_log
    where dt='2020-03-10'
    and user_id is not null
    group by user_id
    ),
    tmp_cart as
    (
    select
    user_id,
    count(*) cart_count,
    sum(cart_price*sku_num) cart_amount
    from dwd_fact_cart_info
    where dt='2020-03-10'
    and user_id is not null
    and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
    group by user_id
    ),
    tmp_order as
    (
    select
    user_id,
    count(*) order_count,
    sum(final_total_amount) order_amount
    from dwd_fact_order_info
    where dt='2020-03-10'
    group by user_id
    ) ,
    tmp_payment as
    (
    select
    user_id,
    count(*) payment_count,
    sum(payment_amount) payment_amount
    from dwd_fact_payment_info
    where dt='2020-03-10'
    group by user_id
    )
    insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')
    select
    user_actions.user_id,
    sum(user_actions.login_count),
    sum(user_actions.cart_count),
    sum(user_actions.cart_amount),
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount)
    from
    (
    select
    user_id,
    login_count,
    0 cart_count,
    0 cart_amount,
    0 order_count,
    0 order_amount,
    0 payment_count,
    0 payment_amount
    from
    tmp_login
    union all
    select
    user_id,
    0 login_count,
    cart_count,
    cart_amount,
    0 order_count,
    0 order_amount,
    0 payment_count,
    0 payment_amount
    from
    tmp_cart
    union all
    select
    user_id,
    0 login_count,
    0 cart_count,
    0 cart_amount,
    order_count,
    order_amount,
    0 payment_count,
    0 payment_amount
    from tmp_order
    union all
    select
    user_id,
    0 login_count,
    0 cart_count,
    0 cart_amount,
    0 order_count,
    0 order_amount,
    payment_count,
    payment_amount
    from tmp_payment
    ) user_actions
    group by user_id;

    3)查询加载结果

    select * from dws_user_action_daycount where dt=2020-03-10’;

    1.4.2 每日商品行为

    1)建表语句

    drop table if exists dws_sku_action_daycount;
    create external table dws_sku_action_daycount
    (
    sku_id string comment 'sku_id',
    order_count bigint comment '被下单次数',
    order_num bigint comment '被下单件数',
    order_amount decimal(16,2) comment '被下单金额',
    payment_count bigint comment '被支付次数',
    payment_num bigint comment '被支付件数',
    payment_amount decimal(16,2) comment '被支付金额',
    refund_count bigint comment '被退款次数',
    refund_num bigint comment '被退款件数',
    refund_amount decimal(16,2) comment '被退款金额',
    cart_count bigint comment '被加入购物车次数',
    cart_num bigint comment '被加入购物车件数',
    favor_count bigint comment '被收藏次数',
    appraise_good_count bigint comment '好评数',
    appraise_mid_count bigint comment '中评数',
    appraise_bad_count bigint comment '差评数',
    appraise_default_count bigint comment '默认评价数'
    ) COMMENT '每日商品行为'
    PARTITIONED BY (`dt` string)
    stored as parquet
    location '/warehouse/gmall/dws/dws_sku_action_daycount/'
    tblproperties ("parquet.compression"="lzo");

    2)数据装载
    注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单

    with
    tmp_order as
    (
    select
    sku_id,
    count(*) order_count,
    sum(sku_num) order_num,
    sum(total_amount) order_amount
    from dwd_fact_order_detail
    where dt='2020-03-10'
    group by sku_id
    ),
    tmp_payment as
    (
    select
    sku_id,
    count(*) payment_count,
    sum(sku_num) payment_num,
    sum(total_amount) payment_amount
    from dwd_fact_order_detail
    where dt='2020-03-10'
    and order_id in
    (
    select
    id
    from dwd_fact_order_info
    where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))
    and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'
    )
    group by sku_id
    ),
    tmp_refund as
    (
    select
    sku_id,
    count(*) refund_count,
    sum(refund_num) refund_num,
    sum(refund_amount) refund_amount
    from dwd_fact_order_refund_info
    where dt='2020-03-10'
    group by sku_id
    ),
    tmp_cart as
    (
    select
    sku_id,
    count(*) cart_count,
    sum(sku_num) cart_num
    from dwd_fact_cart_info
    where dt='2020-03-10'
    and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
    group by sku_id
    ),
    tmp_favor as
    (
    select
    sku_id,
    count(*) favor_count
    from dwd_fact_favor_info
    where dt='2020-03-10'
    and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
    group by sku_id
    ),
    tmp_appraise as
    (
    select
    sku_id,
    sum(if(appraise='1201',1,0)) appraise_good_count,
    sum(if(appraise='1202',1,0)) appraise_mid_count,
    sum(if(appraise='1203',1,0)) appraise_bad_count,
    sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_fact_comment_info
    where dt='2020-03-10'
    group by sku_id
    )
    insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')
    select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_count),
    sum(refund_num),
    sum(refund_amount),
    sum(cart_count),
    sum(cart_num),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
    from
    (
    select
    sku_id,
    order_count,
    order_num,
    order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_order
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    payment_count,
    payment_num,
    payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_payment
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    refund_count,
    refund_num,
    refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_refund
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    cart_count,
    cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_cart
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_favor
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    appraise_good_count,
    appraise_mid_count,
    appraise_bad_count,
    appraise_default_count
    from tmp_appraise
    )tmp
    group by sku_id;

    3)查询加载结果

    select * from dws_sku_action_daycount where dt='2020-03-10';

    1.4.5 每日购买行为




    1)建表语句

    drop table if exists dws_sale_detail_daycount;
    create external table dws_sale_detail_daycount
    (
    user_id string comment '用户 id',
    sku_id string comment '商品 id',
    user_gender string comment '用户性别',
    user_age string comment '用户年龄',
    user_level string comment '用户等级',
    order_price decimal(10,2) comment '商品价格',
    sku_name string comment '商品名称',
    sku_tm_id string comment '品牌 id',
    sku_category3_id string comment '商品三级品类 id',
    sku_category2_id string comment '商品二级品类 id',
    sku_category1_id string comment '商品一级品类 id',
    sku_category3_name string comment '商品三级品类名称',
    sku_category2_name string comment '商品二级品类名称',
    sku_category1_name string comment '商品一级品类名称',
    spu_id string comment '商品 spu',
    sku_num int comment '购买个数',
    order_count bigint comment '当日下单单数',
    order_amount decimal(16,2) comment '当日下单金额'
    ) COMMENT '每日购买行为'
    PARTITIONED BY (`dt` string)
    stored as parquet
    location '/warehouse/gmall/dws/dws_sale_detail_daycount/'
    tblproperties ("parquet.compression"="lzo");

    2)数据装载

    insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')
    select
    op.user_id,
    op.sku_id,
    ui.gender,
    months_between('2020-03-10', ui.birthday)/12 age,
    ui.user_level,
    si.price,
    si.sku_name,
    si.tm_id,
    si.category3_id,
    si.category2_id,
    si.category1_id,
    si.category3_name,
    si.category2_name,
    si.category1_name,
    si.spu_id,
    op.sku_num,
    op.order_count,
    op.order_amount
    from
    (
    select
    user_id,
    sku_id,
    sum(sku_num) sku_num,
    count(*) order_count,
    sum(total_amount) order_amount
    from dwd_fact_order_detail
    where dt='2020-03-10'
    group by user_id, sku_id
    )op
    join
    (
    select
    *
    from dwd_dim_user_info_his
    where end_date='9999-99-99'
    )ui on op.user_id = ui.id
    join
    (
    select
    *
    from dwd_dim_sku_info
    where dt='2020-03-10'
    )si on op.sku_id = si.id;

    3)查询加载结果

    select * from dws_sale_detail_daycount where dt='2020-03-10';

    1.5 DWS 层数据导入脚本
    1)vim dwd_to_dws.sh
    在脚本中填写如下内容

    #!/bin/bash
    APP=gmall
    hive=/opt/modules/hive/bin/hive
    # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
    if [ -n "$1" ] ;then
    do_date=$1
    else
    do_date=`date -d "-1 day" +%F`
    fi
    sql="
    insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
    select
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang))lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat,
    count(*) login_count
    from ${APP}.dwd_start_log
    where dt='$do_date'
    group by mid_id;
    with
    tmp_login as
    (
    select
    user_id,
    count(*) login_count
    from ${APP}.dwd_start_log
    where dt='$do_date'
    and user_id is not null
    group by user_id
    ),
    tmp_cart as
    (
    select
    user_id,
    count(*) cart_count,
    sum(cart_price*sku_num) cart_amount
    from ${APP}.dwd_fact_cart_info
    where dt='$do_date'
    and user_id is not null
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
    ),
    tmp_order as
    (
    select
    user_id,
    count(*) order_count,
    sum(final_total_amount) order_amount
    from ${APP}.dwd_fact_order_info
    where dt='$do_date'
    group by user_id
    ) ,
    tmp_payment as
    (
    select
    user_id,
    count(*) payment_count,
    sum(payment_amount) payment_amount
    from ${APP}.dwd_fact_payment_info
    where dt='$do_date'
    group by user_id
    )
    
    
    insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
    select
    user_actions.user_id,
    sum(user_actions.login_count),
    sum(user_actions.cart_count),
    sum(user_actions.cart_amount),
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount)
    from
    (
    select
    user_id,
    login_count,
    0 cart_count,
    0 cart_amount,
    0 order_count,
    0 order_amount,
    0 payment_count,
    0 payment_amount
    from
    tmp_login
    union all
    select
    user_id,
    0 login_count,
    cart_count,
    cart_amount,
    0 order_count,
    0 order_amount,
    0 payment_count,
    0 payment_amount
    from
    tmp_cart
    union all
    select
    user_id,
    0 login_count,
    0 cart_count,
    0 cart_amount,
    order_count,
    order_amount,
    0 payment_count,
    0 payment_amount
    from tmp_order
    union all
    select
    user_id,
    0 login_count,
    0 cart_count,
    0 cart_amount,
    0 order_count,
    0 order_amount,
    payment_count,
    payment_amount
    from tmp_payment
    ) user_actions
    group by user_id;
    with
    tmp_order as
    (
    select
    sku_id,
    count(*) order_count,
    sum(sku_num) order_num,
    sum(total_amount) order_amount
    from ${APP}.dwd_fact_order_detail
    where dt='$do_date'
    group by sku_id
    ),
    tmp_payment as
    (
    select
    sku_id,
    count(*) payment_count,
    sum(sku_num) payment_num,
    sum(total_amount) payment_amount
    from ${APP}.dwd_fact_order_detail
    where dt='$do_date'
    and order_id in
    (
    select
    id
    from ${APP}.dwd_fact_order_info
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and date_format(payment_time,'yyyy-MM-dd')='$do_date'
    )
    group by sku_id
    ),
    tmp_refund as
    (
    select
    sku_id,
    count(*) refund_count,
    sum(refund_num) refund_num,
    sum(refund_amount) refund_amount
    from ${APP}.dwd_fact_order_refund_info
    where dt='$do_date'
    group by sku_id
    ),
    tmp_cart as
    (
    select
    sku_id,
    count(*) cart_count,
    sum(sku_num) cart_num
    from ${APP}.dwd_fact_cart_info
    where dt='$do_date'
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by sku_id
    ),
    tmp_favor as
    (
    select
    sku_id,
    count(*) favor_count
    from ${APP}.dwd_fact_favor_info
    where dt='$do_date'
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by sku_id
    ),
    tmp_appraise as
    (
    select
    sku_id,
    sum(if(appraise='1201',1,0)) appraise_good_count,
    sum(if(appraise='1202',1,0)) appraise_mid_count,
    sum(if(appraise='1203',1,0)) appraise_bad_count,
    sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_fact_comment_info
    where dt='$do_date'
    group by sku_id
    )
    
    
    insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
    select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_count),
    sum(refund_num),
    sum(refund_amount),
    sum(cart_count),
    sum(cart_num),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
    from
    (
    select
    sku_id,
    order_count,
    order_num,
    order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_order
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    payment_count,
    payment_num,
    payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_payment
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    refund_count,
    refund_num,
    refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_refund
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    cart_count,
    cart_num,
    0 favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_cart
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    favor_count,
    0 appraise_good_count,
    0 appraise_mid_count,
    0 appraise_bad_count,
    0 appraise_default_count
    from tmp_favor
    union all
    select
    sku_id,
    0 order_count,
    0 order_num,
    0 order_amount,
    0 payment_count,
    0 payment_num,
    0 payment_amount,
    0 refund_count,
    0 refund_num,
    0 refund_amount,
    0 cart_count,
    0 cart_num,
    0 favor_count,
    appraise_good_count,
    appraise_mid_count,
    appraise_bad_count,
    appraise_default_count
    from tmp_appraise
    )tmp
    group by sku_id;
    
    
    insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
    select
    op.user_id,
    op.sku_id,
    ui.gender,
    months_between('$do_date', ui.birthday)/12 age,
    ui.user_level,
    si.price,
    si.sku_name,
    si.tm_id,
    si.category3_id,
    si.category2_id,
    si.category1_id,
    si.category3_name,
    si.category2_name,
    si.category1_name,
    si.spu_id,
    op.sku_num,
    op.order_count,
    op.order_amount
    from
    (
    select
    user_id,
    sku_id,
    sum(sku_num) sku_num,
    count(*) order_count,
    sum(total_amount) order_amount
    from ${APP}.dwd_fact_order_detail
    where dt='$do_date'
    group by user_id, sku_id
    )op
    join
    (
    select
    *
    from ${APP}.dwd_dim_user_info_his
    where end_date='9999-99-99'
    )ui on op.user_id = ui.id
    join
    (
    select
    *
    from ${APP}.dwd_dim_sku_info
    where dt='$do_date'
    )si on op.sku_id = si.id;
    "
    $hive -e "$sql"

    2)增加脚本执行权限

    chmod 770 dwd_to_dws.sh

    3)执行脚本导入数据

    dwd_to_dws.sh 2020-03-11

    4)查看导入数据

    select * from dws_uv_detail_daycount where dt='2020-03-11';
    select * from dws_user_action_daycount where dt='2020-03-11';
    select * from dws_sku_action_daycount where dt='2020-03-11';
    select * from dws_sale_detail_daycount where dt='2020-03-11';
    作者:大码王

    -------------------------------------------

    个性签名:独学而无友,则孤陋而寡闻。做一个灵魂有趣的人!

    如果觉得这篇文章对你有小小的帮助的话,记得在右下角点个“推荐”哦,博主在此感谢!

    万水千山总是情,打赏一分行不行,所以如果你心情还比较高兴,也是可以扫码打赏博主,哈哈哈(っ•?ω•?)っ???!

  • 相关阅读:
    D
    洛谷P2002 消息扩散
    洛谷P5058 [ZJOI2004]嗅探器
    洛谷P2746 校园网Network of Schools
    洛谷P3388 【模板】割点(割顶)
    洛谷P1407 [国家集训队]稳定婚姻
    2018年12月18日
    洛谷P1547 Out of Hay
    洛谷P4018 Roy&October之取石子
    洛谷P1318 积水面积
  • 原文地址:https://www.cnblogs.com/huanghanyu/p/13738969.html
Copyright © 2011-2022 走看看