zoukankan      html  css  js  c++  java
  • maxcompute mapjoin

    如果要join两张表,其中一张很大,另外一张很小,我们应该broadcast小表,把小表广播到各个节点。

    还有另外一种,使用Cartesian product笛卡尔积,也就是直接join 不on

    那么必须使用map join 

    举例,如果要计算会员数量及所占比例,需要除以会员总数,而这个总数我们就需要进行广播


    with
    customer as
    (select userid, customeruid from dwd_store_member_ticket_di where pt>=20210221 and customeruid is not null and customeruid!='0' group by userid,customeruid),
    customer_cnt as
    (select COUNT(userid) as cnt_total from customer)
    SELECT
    /*+ mapjoin(d) */
    c.*,round(c.cnt/d.cnt_total,4) as ratio FROM
    (select b.parentuserid,b.parent_account,b.parent_company,COUNT(a.userid) as cnt
    FROM
    customer a
    INNER JOIN
    (select userid, parentuserid, parent_account, parent_company from dim_usergroup_df where pt='20210221') b
    on a.userid=b.userid
    group by b.parentuserid,b.parent_account,b.parent_company
    having COUNT(a.userid)>300
    ) c
    left join
    customer_cnt d
    order by c.cnt desc
    ;
     
    只要在select之后加上/*+ mapjoin(d) */, 不用逗号
     
    left join 左边必须是大表,d是小表
  • 相关阅读:
    浅谈flume
    浅谈storm
    浅谈zookeeper
    IntelliJ IDEA 使用教程
    浅谈spark
    添加本地jar包到maven仓库
    eclipse通过maven进行打编译
    pom.xml中添加远程仓库
    maven编译错误maven-assembly-plugin:2.2-beta-5:assembly (default-cli) on project
    最长上升子序列
  • 原文地址:https://www.cnblogs.com/yjybupt/p/14430599.html
Copyright © 2011-2022 走看看