  • Data Warehouse: The User Behavior Wide Table in the DWS Layer

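    Why do we need a user behavior wide table? It aggregates each user's actions for a single day into one multi-column wide table, so that after joining it with user dimension data, the results can be analyzed statistically from different angles.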

    Data source: the relevant business data tables in the DWD layer.

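    Create the user behavior wide table:

    This wide table combines three kinds of behavior: placing orders, making payments, and posting comments.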

    drop table if exists dws_user_action;
    create external table dws_user_action 
    (   
        user_id          string      comment 'user id',
        order_count     bigint      comment 'number of orders placed',
        order_amount    decimal(16,2)  comment 'total order amount',
        payment_count   bigint      comment 'number of payments',
        payment_amount  decimal(16,2) comment 'total payment amount',
        comment_count   bigint      comment 'number of comments'
    ) COMMENT 'Daily user behavior wide table'
    PARTITIONED BY (`dt` string)
    stored as parquet
    location '/warehouse/gmall/dws/dws_user_action/'
    tblproperties ("parquet.compression"="snappy");

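    Data import script:

    The basic WITH ... AS syntax is shown below. It defines a temporary named result set that can be referenced multiple times in the statements that follow, which improves SQL readability. Note that multiple temporary tables are separated by commas, and that there is no punctuation between the last temporary table and the query that follows it.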

    WITH t1 AS (
            SELECT *
            FROM carinfo
        ), 
        t2 AS (
            SELECT *
            FROM car_blacklist
        )
    SELECT *
    FROM t1, t2

    #!/bin/bash
    
    # Define variables here so they are easy to change
    APP=gmall
    hive=/opt/module/hive/bin/hive
    
    # Use the date passed as the first argument if there is one; otherwise default to yesterday
    if [ -n "$1" ] ;then
        do_date=$1
    else
        do_date=$(date -d "-1 day" +%F)
    fi
    
    sql="
    
    with  
    tmp_order as
    (
        select 
            user_id, 
            sum(oi.total_amount) order_amount, 
            count(*)  order_count
        from ${APP}.dwd_order_info  oi
        where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
        group by user_id
    )  ,
    tmp_payment as
    (
        select 
            user_id, 
            sum(pi.total_amount) payment_amount, 
            count(*) payment_count 
        from ${APP}.dwd_payment_info pi
        where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
        group by user_id
    ),
    tmp_comment as
    (  
        select  
            user_id, 
            count(*) comment_count
        from ${APP}.dwd_comment_log c
        where date_format(c.dt,'yyyy-MM-dd')='$do_date'
        group by user_id 
    )
    
    insert overwrite table ${APP}.dws_user_action partition(dt='$do_date')
    select 
        user_actions.user_id, 
        sum(user_actions.order_count), 
        sum(user_actions.order_amount),
        sum(user_actions.payment_count), 
        sum(user_actions.payment_amount),
        sum(user_actions.comment_count) 
    from
    (
        select
            user_id,
            order_count,
            order_amount,
            0 payment_count,
            0 payment_amount,
            0 comment_count
        from tmp_order
    
        union all
        select
            user_id,
            0,
            0,
            payment_count,
            payment_amount,
            0
        from tmp_payment
    
        union all
        select
            user_id,
            0,
            0,
            0,
            0,
            comment_count 
        from tmp_comment
     ) user_actions
    group by user_id;
    "
    
    $hive -e "$sql"
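
    The script above splices its first argument directly into the SQL text, so a mistyped date would silently produce an empty or wrong partition. The following is a minimal sketch of the same date-defaulting logic with a format check added. The function name resolve_do_date is hypothetical and not part of the original script, and the sketch assumes GNU date, as the original does.

```shell
#!/bin/bash

# resolve_do_date (hypothetical helper): print the date to process.
# With no argument it prints yesterday's date, like the original script.
# With an argument it accepts only the yyyy-MM-dd shape and fails otherwise.
resolve_do_date() {
    if [ -z "$1" ]; then
        # GNU date syntax, same call as in the original script
        date -d "-1 day" +%F
        return 0
    fi
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
            # The shape is right; let date reject values it cannot parse
            if date -d "$1" +%F >/dev/null 2>&1; then
                echo "$1"
                return 0
            fi
            ;;
    esac
    echo "invalid date: $1 (expected yyyy-MM-dd)" >&2
    return 1
}
```

    In the script, the if/else block could then be replaced with a single line such as do_date=$(resolve_do_date "$1") || exit 1, so that the Hive job is never started with a malformed date.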
  • Original article: https://www.cnblogs.com/noyouth/p/13225215.html