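Question 1:
Given the following data:
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
C,2015-03,20
Field descriptions:
username: the person's name
date: the month (yyyy-MM)
cost_money: the amount spent
Requirements:
1. Compute each person's total spending per month.
(1)
SELECT username, `date`, SUM(cost_money) AS month_total
FROM tab1
GROUP BY username, `date`
ORDER BY username, `date`;
2. Compute each person's cumulative spending up to and including each month.
(1)
SELECT username, `date`, month_total,
       SUM(month_total) OVER (PARTITION BY username ORDER BY `date`
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (
    SELECT username, `date`, SUM(cost_money) AS month_total
    FROM tab1
    GROUP BY username, `date`
) tmp
ORDER BY username, `date`;
Question 2:
There are 500,000 JD.com shops. Every time a visitor views any product in any shop, one row is written to an access log. The log table is named Visit (the sample DDL below creates it as Second_Visit); the visitor's id is user_id and the visited shop's name is shop. Compute:
1) the UV (number of distinct visitors) of each shop;
2) for each shop, the top 3 visitors by visit count, outputting the shop name, visitor id and visit count.
create table Second_Visit (user_id string, shop string);
insert into table second_visit values ('1','a');
insert into table second_visit values ('1','b');
insert into table second_visit values ('2','a');
insert into table second_visit values ('3','c');
insert into table second_visit values ('1','a');
insert into table second_visit values ('1','a');
1)
-- UV counts distinct visitors; COUNT(1) would count visits (PV) instead.
SELECT shop, COUNT(DISTINCT user_id) AS uv
FROM Second_Visit
GROUP BY shop;
2)
SELECT shop, user_id, cu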
FROM (
    SELECT tmp.shop, tmp.user_id, tmp.cu,
           ROW_NUMBER() OVER (PARTITION BY tmp.shop ORDER BY tmp.cu DESC) AS rn
    FROM (
        SELECT user_id, shop, COUNT(1) AS cu
        FROM Second_Visit
        GROUP BY user_id, shop
    ) tmp
) tmp1
WHERE tmp1.rn < 4;
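The answers to Question 1 and Question 2 can be sanity-checked on the sample rows without a Hive cluster. The sketch below is a minimal one that assumes Python's built-in sqlite3 module, with SQLite 3.25 or newer (needed for window functions), as a stand-in for Hive. Hive-only details such as backtick quoting are dropped. The row ('A', '2015-02', 10) is hypothetical and is added only so the running total has a second month to accumulate.

```python
import sqlite3

# Assumption: SQLite (>= 3.25, for window functions) stands in for Hive.
conn = sqlite3.connect(":memory:")

# Question 1 sample data: (username, date, cost_money).
conn.execute("CREATE TABLE tab1 (username TEXT, date TEXT, cost_money INTEGER)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
     ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
     ("C", "2015-03", 20),
     ("A", "2015-02", 10)],  # hypothetical extra row, not in the original data
)

# Monthly total per user, then a running total over that user's months.
running = conn.execute("""
    SELECT username, date, month_total,
           SUM(month_total) OVER (PARTITION BY username ORDER BY date
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total
    FROM (SELECT username, date, SUM(cost_money) AS month_total
          FROM tab1 GROUP BY username, date) tmp
    ORDER BY username, date
""").fetchall()

# Question 2 sample data: (user_id, shop).
conn.execute("CREATE TABLE second_visit (user_id TEXT, shop TEXT)")
conn.executemany(
    "INSERT INTO second_visit VALUES (?, ?)",
    [("1", "a"), ("1", "b"), ("2", "a"), ("3", "c"), ("1", "a"), ("1", "a")],
)

# UV: number of distinct visitors per shop.
uv = conn.execute("""
    SELECT shop, COUNT(DISTINCT user_id) AS uv
    FROM second_visit GROUP BY shop ORDER BY shop
""").fetchall()

# Top 3 visitors per shop: count visits per (user, shop), number them within
# each shop by descending count, keep row numbers 1..3.
top3 = conn.execute("""
    SELECT shop, user_id, cu
    FROM (SELECT shop, user_id, cu,
                 ROW_NUMBER() OVER (PARTITION BY shop ORDER BY cu DESC) AS rn
          FROM (SELECT user_id, shop, COUNT(1) AS cu
                FROM second_visit GROUP BY user_id, shop) tmp) tmp1
    WHERE rn < 4
    ORDER BY shop, cu DESC
""").fetchall()

print(running)
print(uv)
print(top3)
```

The final ORDER BY in each query exists only to make the printed output deterministic. ROW_NUMBER() breaks ties arbitrarily. If every visitor tied at third place should be kept, RANK() is the usual swap, at the cost of possibly returning more than three rows per shop.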
Question 3:
There is a user file with 50 million rows (user_id, name, age) and a log of 200 million movie-viewing records (user_id, url). Rank the age bands by the number of movies watched.
1. Create the tables
create table forth_user(user_id string, name string, age int);
create table forth_log(user_id string, url string);
insert into table forth_user values('001','wt',10);
insert into table forth_user values('002','ls',18);
insert into table forth_user values('003','zz',30);
insert into table forth_user values('004','zz',50);
insert into table forth_log values('001','sdf');
insert into table forth_log values('001','wss');
insert into table forth_log values('002','sdf');
insert into table forth_log values('003','sdf');
insert into table forth_log values('004','sdf');
2. Analyze the requirement
First count how many movies each person watched (t1). Aggregating the 200-million-row log by user_id before the join shrinks it to one row per user_id.
Then join t1 with the user table to attach the age field (t2).
Divide ages into bands: 0-20, 21-40, 41-60, 61 and above.
Group by age band and sort by view count.
SELECT ageband, SUM(ct) AS view_count
FROM (
    SELECT CASE
             WHEN fu.age BETWEEN 0 AND 20 THEN '0-20'
             WHEN fu.age BETWEEN 21 AND 40 THEN '21-40'
             WHEN fu.age BETWEEN 41 AND 60 THEN '41-60'
             WHEN fu.age >= 61 THEN '61+'
           END AS ageband,
           t1.ct
    FROM (
        SELECT user_id, COUNT(1) AS ct
        FROM forth_log
        GROUP BY user_id
    ) t1
    JOIN forth_user fu ON t1.user_id = fu.user_id
) t2
GROUP BY ageband
ORDER BY view_count DESC;
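The age-band query for Question 3 can be checked on the sample rows the same way. This is a minimal sketch that again assumes SQLite, through Python's sqlite3 module, as a stand-in for Hive. The extra ageband key in ORDER BY is a tie-breaker added only so the printed order is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; the table and column names follow the DDL above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forth_user (user_id TEXT, name TEXT, age INTEGER)")
conn.execute("CREATE TABLE forth_log (user_id TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO forth_user VALUES (?, ?, ?)",
    [("001", "wt", 10), ("002", "ls", 18), ("003", "zz", 30), ("004", "zz", 50)],
)
conn.executemany(
    "INSERT INTO forth_log VALUES (?, ?)",
    [("001", "sdf"), ("001", "wss"), ("002", "sdf"), ("003", "sdf"), ("004", "sdf")],
)

# t1: views per user (pre-aggregated before the join);
# t2: each user's view count tagged with an age band;
# outer query: total views per band, largest first.
bands = conn.execute("""
    SELECT ageband, SUM(ct) AS view_count
    FROM (
        SELECT CASE
                 WHEN fu.age BETWEEN 0 AND 20 THEN '0-20'
                 WHEN fu.age BETWEEN 21 AND 40 THEN '21-40'
                 WHEN fu.age BETWEEN 41 AND 60 THEN '41-60'
                 WHEN fu.age >= 61 THEN '61+'
               END AS ageband,
               t1.ct
        FROM (SELECT user_id, COUNT(1) AS ct FROM forth_log GROUP BY user_id) t1
        JOIN forth_user fu ON t1.user_id = fu.user_id
    ) t2
    GROUP BY ageband
    ORDER BY view_count DESC, ageband
""").fetchall()

print(bands)
```

Users 001 and 002 (ages 10 and 18) fall in the 0-20 band and account for three of the five log rows, so that band ranks first.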