  • TPC-H Analysis

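    TPC-H table structure (row counts at scale factor 1)

    Smallest, fixed-size tables:

    - Region, Nation (FK to Region: the region a nation belongs to)

    10,000-row scale:

    - Supplier: Skey, name, address, Nkey (FK: the supplier's nation), phone, acctbal (account balance)

    100,000-row scale:

    - Customer (150,000): Ckey, name, address, Nkey (FK: the customer's nation), phone, acctbal, mktsegment (market segment)

    - Part (200,000): Pkey, name, mfgr (manufacturer), brand, type, size, container (packaging), retail price

    - PartSupp (part-supplier relationship table) (800,000): Pkey, Skey, availqty (available quantity), supplycost (supply cost)

    Million-row scale:

    - Orders (1,500,000): Okey, Ckey (FK: the customer who placed the order), order status, total price, order date, order priority, clerk, ship priority

    - LineItem (order line items; relates orders, parts, and suppliers) (about 6,000,000):

      Okey, Pkey, Skey, line number, quantity, extendedprice (extended price, unit price * quantity), discount, tax, return flag, line status,

      shipdate (ship date), commitdate (committed delivery date), receiptdate (actual receipt date), shipinstruct (shipping instructions), shipmode (shipping mode)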
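
    As a rough illustration of the sizes listed above, the sketch below estimates row counts at other scale factors. It assumes the non-fixed tables grow linearly with the scale factor (lineitem only approximately), and the names BASE_ROWS, FIXED_ROWS and estimated_rows are made up for this example.

```python
# Base cardinalities at scale factor 1, as listed in the notes above.
BASE_ROWS = {
    "supplier": 10_000,
    "customer": 150_000,
    "part": 200_000,
    "partsupp": 800_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,  # approximate
}
# Region and Nation do not grow with the scale factor.
FIXED_ROWS = {"region": 5, "nation": 25}


def estimated_rows(scale_factor):
    """Return an estimated row count per table for the given scale factor."""
    rows = {table: int(n * scale_factor) for table, n in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


sf10 = estimated_rows(10)
```

    At scale factor 10 this gives 15,000,000 orders while nation stays at 25 rows.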

    Pricing Summary Report Query (Q1)

    Tests group-by performance over lineitem.

    Group-by: CP1.3, small group-by keys

    Expressions:

    - CP4.1a, CP4.1d

    - CP4.2a, common subexpression elimination
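
    A minimal sketch of the two points above: the group key (returnflag, linestatus) has only a handful of distinct values, and the discounted price is a subexpression shared by two aggregates. The function name and the row layout are assumptions for illustration; the two aggregates follow the sum_disc_price and sum_charge columns of Q1.

```python
from collections import defaultdict


def pricing_summary(rows):
    """rows: iterable of (returnflag, linestatus, extendedprice, discount, tax)."""
    # The key domain is tiny, so the aggregation table stays very small.
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [sum_disc_price, sum_charge]
    for flag, status, price, discount, tax in rows:
        disc_price = price * (1 - discount)   # common subexpression, computed once
        acc = totals[(flag, status)]
        acc[0] += disc_price
        acc[1] += disc_price * (1 + tax)      # reuses disc_price
    return {key: tuple(acc) for key, acc in totals.items()}


summary = pricing_summary([
    ("A", "F", 100.0, 0.5, 0.5),
    ("A", "F", 200.0, 0.5, 0.0),
    ("N", "O", 10.0, 0.0, 0.0),
])
```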

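    Minimum Cost Supplier Query (Q2)

    Finds, within a given region, the suppliers that offer the lowest supply cost for certain parts.

    Join: CP2.2 (sparse foreign key); the filter on part is what makes the join sparse

    String expressions: CP4.3a, Rewrite LIKE(X%) into Range Query

    Subquery: CP5.2, Moving Predicates into a Subquery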

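    Shipping Priority Query (Q3)

    Orders that had not shipped as of a given date: the total revenue of their unshipped line items, sorted by revenue and order date, with a filter on the customer's market segment.

    Join: 3-table join, ((customer, orders), lineitem)

    Group-by:

    - CP1.2: consider SortAgg, and HashJoin with orders as the probe side

    - CP1.4

    Locality: CP3.2, a bound on shipdate implies a bound on orderdate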

    Order Priority Checking Query (Q4)

    For each order priority, the number of orders that the customer received late.

    Subquery

    Group-by: CP1.2, CP1.4

     

    Local Supplier Volume Query (Q5)

    A local supplier is one located in the same nation as the customer.

    For one year and one region, sums the revenue of line items served by local suppliers for each nation, sorted by revenue.

    Join:

    - 6-table join, CP2.3

    - CP2.4 (late projection)

    Forecasting Revenue Change Query (Q6)

    A pure filter over lineitem.

    Constant folding

    Arithmetic expressions: CP4.1c (compressed execution); CP4.2d, Evaluation Order in Conjunctions and Disjunctions
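
    To make CP4.2d concrete, here is a small sketch that counts predicate evaluations for a short-circuit conjunction under two orderings. The predicates and data are invented for illustration and are not the Q6 predicates; the point is only that putting the more selective predicate first reduces the work.

```python
def count_evaluations(rows, predicates):
    """Apply predicates as a short-circuit AND.

    Returns (matching_rows, number_of_predicate_calls).
    """
    calls = 0
    matches = []
    for row in rows:
        for pred in predicates:
            calls += 1
            if not pred(row):
                break              # remaining predicates are skipped for this row
        else:
            matches.append(row)    # every predicate passed
    return matches, calls


rows = list(range(1000))
wide = lambda x: x >= 10           # passes 99% of the rows
narrow = lambda x: x % 100 == 0    # passes 1% of the rows

result_a, calls_wide_first = count_evaluations(rows, [wide, narrow])
result_b, calls_narrow_first = count_evaluations(rows, [narrow, wide])
```

    Both orderings return the same rows, but the wide-first order needs 1990 predicate calls against 1010 for the narrow-first order.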

    Volume Shipping Query (Q7)

    Trade volume between two nations over two years.

    CP4.2b, join-dependent expression filter pushdown

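    National Market Share Query (Q8)

    A multi-table join (eight table references, with nation used twice) that picks, for each qualifying line item, the order year, the discounted amount, and the supplier's nation.

    Uncorrelated subquery

    Expressions: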

    select
        o_year,
        sum(case
            when nation = '[NATION]'
            then volume
            else 0
        end) / sum(volume) as mkt_share
    from (
        select
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1-l_discount) as volume,
            n2.n_name as nation
        from
            part,
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2,
            region
        where
            p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = '[REGION]'
            and s_nationkey = n2.n_nationkey
            and o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p_type = '[TYPE]'
        ) as all_nations
    group by
        o_year
    order by
        o_year;
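
    The outer query above is a conditional aggregation: one pass that keeps, per year, both the volume of the chosen nation and the total volume. A minimal sketch, assuming the inner all_nations rows are already available as (o_year, volume, nation) tuples and that every year has a non-zero total; the function name is made up for this example.

```python
from collections import defaultdict


def market_share(all_nations, nation):
    """all_nations: iterable of (o_year, volume, nation) rows."""
    sums = defaultdict(lambda: [0.0, 0.0])  # o_year -> [nation volume, total volume]
    for o_year, volume, row_nation in all_nations:
        acc = sums[o_year]
        if row_nation == nation:            # the CASE WHEN branch
            acc[0] += volume
        acc[1] += volume                    # sum(volume)
    # Iterate in year order, matching ORDER BY o_year.
    return {year: hit / total for year, (hit, total) in sorted(sums.items())}


shares = market_share(
    [(1995, 10.0, "BRAZIL"), (1995, 30.0, "PERU"), (1996, 5.0, "BRAZIL")],
    "BRAZIL",
)
```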

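    Product Type Profit Measure Query (Q9)

    Profit: the line item's extended price (after discount) minus its total supply cost.

    For line items of qualifying parts, groups profit by nation and year to get each nation's profit per year.

    Join: CP2.1 (large join); 6-table join, CP2.3

    String expressions: CP4.3a, Rewrite LIKE(X%) into Range Query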

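    Returned Item Reporting Query (Q10)

    Customers who returned items within a three-month period, sorted by total returned revenue in descending order.

    Group-by: CP1.4 (Dependent Group-By Keys). Although the query groups by many columns, they all depend on custkey, so the extra keys can be eliminated.

    Join: CP2.4 (late projection)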
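
    A minimal sketch of the dependent-key idea: aggregate on custkey alone, then attach the columns that custkey determines once per group. The function name and the reduced column set (name, phone) are assumptions for illustration; Q10 itself carries more customer columns.

```python
def returned_revenue(customers, returned_items):
    """customers: dict custkey -> (name, phone); returned_items: (custkey, revenue) pairs."""
    revenue = {}
    for custkey, amount in returned_items:
        # Only the determining key is hashed and compared.
        revenue[custkey] = revenue.get(custkey, 0.0) + amount
    # Dependent columns are fetched once per group, after aggregation.
    result = [(key, *customers[key], total) for key, total in revenue.items()]
    return sorted(result, key=lambda row: row[-1], reverse=True)


top = returned_revenue(
    {1: ("Alice", "111"), 2: ("Bob", "222")},
    [(1, 10.0), (2, 50.0), (1, 5.0)],
)
```

    Fetching the dependent columns only for the surviving groups is also the essence of late projection (CP2.4).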

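    Important Stock Identification Query (Q11)

    Finds parts whose stock value exceeds a given fraction of the total stock value of all parts in a nation, i.e. the significant stock.

    CP5.3: Overlap between Outer- and Subquery. Rewritten by decorrelation.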

    select
        ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
    from
        partsupp,
        supplier,
        nation
    where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = '[NATION]'
    group by
        ps_partkey having
            sum(ps_supplycost * ps_availqty) > (
                select
                    sum(ps_supplycost * ps_availqty) * [FRACTION]
                from
                    partsupp,
                    supplier,
                    nation
                where
                    ps_suppkey = s_suppkey
                    and s_nationkey = n_nationkey
                    and n_name = '[NATION]'
            )
    order by
        value desc;
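
    The outer query and the subquery above scan the same join and differ only in the grouping, so the per-part sums can be computed once and the threshold derived from them. A sketch under the assumption that the rows are already joined and filtered to the chosen nation; the function name is hypothetical.

```python
def important_stock(partsupp_rows, fraction):
    """partsupp_rows: (ps_partkey, ps_supplycost, ps_availqty) for one nation."""
    value = {}
    for partkey, supplycost, availqty in partsupp_rows:
        value[partkey] = value.get(partkey, 0) + supplycost * availqty
    # The subquery's grand total is just the sum of the per-part values.
    threshold = sum(value.values()) * fraction
    big = [(key, v) for key, v in value.items() if v > threshold]
    return sorted(big, key=lambda kv: kv[1], reverse=True)   # ORDER BY value DESC


stock = important_stock([(1, 10, 5), (1, 10, 5), (2, 1, 10), (3, 2, 20)], 0.25)
```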

    Shipping Modes and Order Priority Query (Q12)

    For line items received within one year and shipped on time, groups by ship mode and counts the orders by priority.

    Locality: CP3.2, a bound on receiptdate implies a bound on orderdate

    CP4.2c: Large IN Clauses.

     

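    Customer Distribution Query (Q13)

    The number of orders per customer comes from joining customer with orders and taking count(o_orderkey) grouped by custkey, with a string-matching filter on the orders.

    The result is then grouped and sorted by that order count, giving the number of customers for each count, i.e. the distribution of customers over order counts.

    Left outer join = left join

    Uncorrelated subquery

    String expressions: CP4.3a, Rewrite LIKE(X%) into Range Query; matching on the comment column can become the bottleneck

    Join: CP2.1 (group join); the group-by count on custkey is computed while customer and orders are joined on custkey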
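
    The group join described above can be sketched as a hash table that serves as both the join build side and the aggregation table. The function name and input layout are assumptions, and the comment filter on orders is left out for brevity.

```python
from collections import Counter


def customer_distribution(custkeys, orders):
    """custkeys: iterable of customer keys; orders: (o_orderkey, o_custkey) pairs."""
    # Build side doubles as the group table; customers without orders keep
    # a count of 0, which is the left outer join behaviour.
    per_customer = {custkey: 0 for custkey in custkeys}
    for _orderkey, custkey in orders:        # probe side updates counts in place
        if custkey in per_customer:
            per_customer[custkey] += 1
    dist = Counter(per_customer.values())    # order count -> number of customers
    # Sort by number of customers, then by order count, both descending.
    return sorted(dist.items(), key=lambda kv: (-kv[1], -kv[0]))


dist = customer_distribution([1, 2, 3, 4], [(10, 1), (11, 1), (12, 2), (13, 2)])
```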

    Promotion Effect Query (Q14)

    Global aggregation

    CP4.3a: Rewrite LIKE(X%) into Range Query

    select 
        100.00 * sum(case 
                   when p_type like 'PROMO%' 
                   then l_extendedprice*(1-l_discount) 
                   else 0 
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue 
    from 
        lineitem, 
        part 
    where 
        l_partkey = p_partkey
        and l_shipdate >= date '[DATE]'
        and l_shipdate < date '[DATE]' + interval '1' month;
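
    The predicate p_type like 'PROMO%' above is a prefix match, which CP4.3a turns into a range check. A minimal sketch, assuming plain code-point string ordering and a non-empty prefix whose last character is not the largest code point; prefix_range is a name invented for this example.

```python
def prefix_range(prefix):
    """Return (low, high) such that s.startswith(prefix) iff low <= s < high."""
    # Bump the last character to get the smallest string above every match.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


low, high = prefix_range("PROMO")
types = ["PROMO BRUSHED TIN", "STANDARD POLISHED BRASS", "PROMP", "PROMO", "PROM"]
by_like = [t for t in types if t.startswith("PROMO")]
by_range = [t for t in types if low <= t < high]
```

    With a range predicate the engine can use ordered structures or min/max metadata instead of matching every string.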

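    Top Supplier Query (Q15)

    Creates the view revenue: each supplier's total revenue over one quarter. The view itself is an optimization of the uncorrelated subquery.

    A join then finds every supplier whose total revenue equals the maximum.

    CP5.3: Overlap between Outer- and Subquery. The subquery that finds the maximum revenue only needs to run once.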

    create view revenue[STREAM_ID] (supplier_no, total_revenue) as
        select
            l_suppkey,
            sum(l_extendedprice * (1 - l_discount))
        from
            lineitem
        where
            l_shipdate >= date '[DATE]'
            and l_shipdate < date '[DATE]' + interval '3' month
        group by
            l_suppkey;
    select
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
    from
        supplier,
        revenue[STREAM_ID]
    where
        s_suppkey = supplier_no
        and total_revenue = (
            select
                max(total_revenue)
            from
                revenue[STREAM_ID]
        )
    order by
        s_suppkey;
    drop view revenue[STREAM_ID];

    Parts/Supplier Relationship Query (Q16)

    Expressions:

    CP4.2c: Large IN Clauses.

    CP4.3a: Rewrite LIKE(X%) into Range Query

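    Small-Quantity-Order Revenue Query (Q17)

    A small-quantity line item is one whose quantity is below 20% of the average quantity ordered for that part.

    For a given kind of part, computes the average yearly revenue of such small orders.

    Correlated subquery:

    - CP5.2: Moving Predicates into a Subquery

    - CP5.3: Overlap between Outer- and Subquery.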

    select
        sum(l_extendedprice) / 7.0 as avg_yearly
    from
        lineitem,
        part
    where
        p_partkey = l_partkey
        and p_brand = '[BRAND]'
        and p_container = '[CONTAINER]'
        and l_quantity < (
            select
                0.2 * avg(l_quantity)
            from
                lineitem
            where
                l_partkey = p_partkey
        );
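
    One way to read CP5.2 and CP5.3 for the query above: instead of re-running the average per outer row, compute the per-part threshold once, and only for parts that pass the brand and container filter. The sketch assumes that filter has already produced the set wanted_parts and that lineitems is a list (it is scanned twice); all names are illustrative.

```python
from collections import defaultdict


def avg_yearly_small_orders(lineitems, wanted_parts, years=7.0):
    """lineitems: list of (l_partkey, l_quantity, l_extendedprice)."""
    qty = defaultdict(lambda: [0.0, 0])       # partkey -> [sum(quantity), count]
    for partkey, quantity, _price in lineitems:
        if partkey in wanted_parts:           # part predicate moved into the subquery
            qty[partkey][0] += quantity
            qty[partkey][1] += 1
    # 0.2 * avg(l_quantity), computed once per part.
    limit = {key: 0.2 * total / count for key, (total, count) in qty.items()}
    small = sum(price for partkey, quantity, price in lineitems
                if partkey in limit and quantity < limit[partkey])
    return small / years


avg_yearly = avg_yearly_small_orders(
    [(1, 1, 70.0), (1, 49, 500.0), (2, 1, 999.0)], wanted_parts={1}
)
```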

    Large Volume Customer Query (Q18)

    A large order is a customer order whose total quantity exceeds a given threshold.

    Lists every customer's large orders, sorted by order total price.

    Subquery

    Group-by: CP1.2, CP1.4

    Join: CP2.1 (large join)

    Discounted Revenue Query (Q19)

    The filter conditions are fairly complex.

    CP4.2b, join-dependent expression filter pushdown

    CP4.2c: Large IN Clauses.

    CP4.2d: Evaluation Order in Conjunctions and Disjunctions.
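
    For the IN-clause choke point, a common approach is to build a hash set from the list once so that each row costs one lookup instead of a chain of comparisons. This is a generic sketch, not how any particular engine does it; filter_in and the sample values are illustrative.

```python
def filter_in(rows, column, allowed):
    """Keep rows whose row[column] is in `allowed`, using one hash lookup per row."""
    allowed_set = frozenset(allowed)   # built once, average O(1) membership test
    return [row for row in rows if row[column] in allowed_set]


containers = ["SM CASE", "SM BOX", "SM PACK", "SM PKG"]   # illustrative values
parts = [(1, "SM BOX"), (2, "LG CASE"), (3, "SM PKG")]
kept = filter_in(parts, 1, containers)
```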

     

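    Potential Part Promotion Query (Q20)

    Filters the suppliers of a given nation:

    the supplier carries parts with a certain property, here a name that starts with a given color,

    and the quantity currently available for that part exceeds half of what the supplier shipped of that part within one year.

    String expressions: CP4.3a, Rewrite LIKE(X%) into Range Query

    Correlated subquery:

    - CP5.2: Moving Predicates into a Subquery

    - CP5.3: Overlap between Outer- and Subquery. Rewritten by decorrelation.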

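    Suppliers Who Kept Orders Waiting Query (Q21)

    A supplier is late when lineitem.receiptdate > lineitem.commitdate.

    A supplier blocks an order when its line item in that order is late, the order also involves other suppliers, and none of those other suppliers is late.

    Finds all such late suppliers and counts the orders each one held up.

    Subquery: CP5.1, Flattening Subqueries. EXISTS is turned into an equi-join (semi-join) and NOT EXISTS into an anti-join.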
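
    The blocking rule above can be evaluated without per-row subqueries once the line items are grouped by order. A sketch that assumes lateness is already computed per line item and that leaves out the nation and order-status filters of the full query; the function name and tuple layout are made up.

```python
from collections import defaultdict


def blocking_suppliers(lineitems):
    """lineitems: (l_orderkey, l_suppkey, is_late) triples.

    Returns suppkey -> number of orders that supplier alone held up.
    """
    suppliers = defaultdict(set)   # orderkey -> all suppliers in the order
    late = defaultdict(set)        # orderkey -> suppliers with a late line item
    for orderkey, suppkey, is_late in lineitems:
        suppliers[orderkey].add(suppkey)
        if is_late:
            late[orderkey].add(suppkey)
    blocked = defaultdict(int)
    for orderkey, late_set in late.items():
        # EXISTS as semi-join: the order has another supplier.
        # NOT EXISTS as anti-join: no other supplier in the order is late.
        if len(late_set) == 1 and len(suppliers[orderkey]) > 1:
            (suppkey,) = late_set
            blocked[suppkey] += 1
    return dict(blocked)


waiting = blocking_suppliers([
    (1, 10, True), (1, 20, False),   # supplier 10 blocks order 1
    (2, 10, True), (2, 20, True),    # two late suppliers, nobody counted
    (3, 10, True),                   # single-supplier order, not counted
    (4, 30, True), (4, 10, False),   # supplier 30 blocks order 4
])
```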

    Global Sales Opportunity Query (Q22) 

    CP4.2c: Large IN Clauses.

  • Original post: https://www.cnblogs.com/fxjwind/p/15349034.html