TCPH的表结构
最小的且不变的,
- Region,Nation(FK,REGION记录属于哪个区域)
1万级别,
- Supplier(供应商),Skey,Name,地址,NKey(FK,记录所属国家key),电话,acctbal(Account Balance,账户余额)
10万级别,
- Customer(15万),Skey,Name,地址,NKey(FK,记录所属国家key),电话,acctbal,MKTSEGMENT(marktsegment,细分市场)
- Part(零件)(20万),Pkey,Name,MFGR(manufacturer,制造商),品牌,类型,大小,container(包装,容器),零售价
- PartSupp(零件供应商关系表)(80万),Pkey,Skey,AVAILQTY(供应量),SupplyCost(供应价格,成本)
100万级别,
- Orders(订单)(150万),Okey,Ckey(FK,记录订单来自客户的key),订单状态,订单总价格,订单优先级,收银员,发货优先级
- LineItem(订单流水明细,订单,零件,供应商的关系表)(600万)
Okey,PKey,Skey,流水号,数量,extendedprice(扩展成本,单位成本*数量),折扣,税,退回标记,流水状态,
shipdate(发货日期),commitdate(预计到货日期),receiptdate(实际收货日期),shipinstruct(运输策略),shipmode(运输途径)
Pricing Summary Report Query (Q1)
对于lineitem的groupby的能力
groupby, CP 1.3,small group by keys
表达式,
- CP 4.1a,d
- CP 4.2a,公共子表达式
Minimum Cost Supplier Query (Q2)
找出特定区域中,为某些part提供最低供应价的suppliers
Join,CP 2.2(sparse foreign key),这里对于part的筛选会导致sparse
字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query
子查询,CP5.2: Moving Predicates into a Subquery
Shipping Priority Query (Q3)
大于某日期未发货的订单,订单中未发货的的流水的总revenue,并按revenue,订单日期排序,并筛选customer的细分市场
Join,3表join,((customer,order),lineitem)
Groupby,
- CP1.2,考虑sortAgg,HashJoin,以order为probe side
- CP1.4
Locality,CP3.2,shipdate推导到orderdate
Order Priority Checking Query (Q4)
每个订单优先级,客户收货有延迟的订单数量
子查询,
Groupby,CP1.2,CP1.4
Local Supplier Volume Query (Q5)
Local Supplier,即supplier和customer属于同一个nation
一年内,某个region中,每个nation的满足local supplier的订单流水的revenue的和,并按revenue排序
Join,
- 6表连接,CP 2.3
- CP2.4(late projection)
Forecasting Revenue Change Query (Q6)
单纯的lineitem的过滤
常量折叠
算术表达式,CP4.1c,压缩执行,CP 4.2d: Evaluation Order in Conjunctions and Disjunctions.
Volume Shipping Query (Q7)
两年间,两国之间的交易量
CP4.2b,join相关表达式下推
National Market Share Query (Q8)
6表join,筛选出满足条件的流水的年份,折扣后金额,供应商名
非关联子查询,
表达式,
select o_year, sum(case when nation = '[NATION]' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1-l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = '[REGION]' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = '[TYPE]' ) as all_nations group by o_year order by o_year;
Product Type Profit Measure Query (Q9)
profit,收益,流水的extended的价格 - 供应总价格
对满足条件的part的订单流水的收益,按nation和年份进行groupby,得到某nation,某年的收益
Join,CP2.1(large join),6表join,CP2.3
字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query
Returned Item Reporting Query (Q10)
3个月内,有退货的客户,总退货款的倒序
GroupBy,CP 1.4(Dependent Group-By Keys),虽然groupby那么多字段,其实都是depend on custKey,所以可以消除
Join,CP2.4(late projection)
Important Stock Identification Query (Q11)
零件的存货价值,超过nation所有零件的存货价值的一定比例,大宗货品;
CP5.3: Overlap between Outer- and Subquery. 去关联重写
select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = '[NATION]' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * [FRACTION] from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = '[NATION]' ) order by value desc;
Shipping Modes and Order Priority Query (Q12)
一年内收货的,按时shipping的,按shipmode groupby,订单优先级的sum
Locality,CP3.2,receiptdate推导到orderdate
CP 4.2c: Large IN Clauses.
Customer Distribution Query (Q13)
客户订单数,customer和order join,并count groupby orderkey获取,并对订单有字符串匹配的筛选
根据客户订单数进行groupby和排序,统计不同订单数的客户数,获取客户在订单数上的分布
Left outer join = left join
非关联子查询,
字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query,匹配comment字段的性能会成为瓶颈
Join,CP2.1(Group Join),在customer和orders基于custKey join的同时,完成对于custKey的groupby count
Promotion Effect Query (Q14)
全局aggregation
CP 4.3a: Rewrite LIKE(X%) into Range Query
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice*(1-l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '1' month;
Top Supplier Query (Q15)
创建view revenue,一个季度,所有供应商,各自的总供应金额;view既是对于非关联子查询的优化
然后join,找出所有供应金额等于最大的供应金额的供应商
CP5.3: Overlap between Outer- and Subquery. 找出最大供应金额的子查询,做一遍即可
create view revenue[STREAM_ID] (supplier_no, total_revenue) as select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '[DATE]' and l_shipdate < date '[DATE]' + interval '3' month group by l_suppkey; select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue[STREAM_ID] where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue[STREAM_ID] ) order by s_suppkey; drop view revenue[STREAM_ID];
Parts/Supplier Relationship Query (Q16)
表达式,
CP 4.2c: Large IN Clauses.
CP 4.3a: Rewrite LIKE(X%) into Range Query
Small-Quantity-Order Revenue Query (Q17)
小订单流水,该流水的量低于该零件订单平均量的20%
某种零件,小订单年平均销售总额
关联子查询,
- CP5.2: Moving Predicates into a Subquery
- CP5.3: Overlap between Outer- and Subquery.
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = '[BRAND]' and p_container = '[CONTAINER]' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
Large Volume Customer Query (Q18)
大订单,即customer的某个order的总quantity大于某个阈值,
列出所有客户的大订单,并且按订单总价格排序
子查询,
GroupBy,CP 1.2,CP1.4
Join,CP2.1(large join)
Discounted Revenue Query (Q19)
过滤条件比较复杂,
CP4.2b,join相关表达式下推
CP4.2c: Large IN Clauses.
CP4.2d: Evaluation Order in Conjunctions and Disjunctions.
Potential Part Promotion Query (Q20)
对特定nation的供应商的筛选,
供应商有满足某种特性的part,这里是name中包含某种color
且当前part的带供货的数量,超出该供应商一年中对于该零件的供应量的一半
字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query
关联子查询,
- CP5.2: Moving Predicates into a Subquery
- CP5.3: Overlap between Outer- and Subquery. 去关联重写
Suppliers Who Kept Orders Waiting Query (Q21)
supplier超期,lineitem.receiptdate > lineitem.commitdate
supplier block order, 订单中该supplier的流水超期,并且该订单存在其他的supplier,并且其他的supplier没有超期
找出所有超期的supplier,并统计超期的订单数
子查询,CP5.1: Flattening Subqueries,exists转成equiv-join,not exists转成anti-join
Global Sales Opportunity Query (Q22)
CP 4.2c: Large IN Clauses.