zoukankan      html  css  js  c++  java
  • 【HIVE高级笔试必备题型】(组内topN、相邻行的值比较问题)求语文大于数学_/_求文科大于理科成绩的学生

    Hive SQL练习之成绩分析

    数据:[id, 学号,班级,科目,成绩]
    1,1,1,yuwen,80
    2,1,1,shuxue,85
    3,2,1,yuwen,75
    4,2,1,shuxue,70
    5,3,1,yuwen,86
    6,3,1,shuxue,72
    7,4,2,yuwen,88
    8,4,2,shuxue,99
    9,5,2,yuwen,86
    10,5,2,shuxue,94
    11,6,2,yuwen,56
    12,6,2,shuxue,96
    

    题目:

    (1)求每个班级前三名的同学(组内topN问题);

    分析:按班级分组,同时取每个同学的平均成绩(或总分)还需要对学生分组。
    1)按班级+学生分组,取每个同学平均成绩

    select cid, sid, avg(score) avg_score from t_course group by cid, sid order by cid, avg_score desc;
    	+------+------+------------+
    	| cid  | sid  | avg_score  |
    	+------+------+------------+
    	| 1    | 1    | 82.5       |
    	| 1    | 3    | 79.0       |
    	| 1    | 2    | 72.5       |
    	| 2    | 4    | 93.5       |
    	| 2    | 5    | 90.0       |
    	| 2    | 6    | 76.0       |
    	+------+------+------------+
    
    	注意,这里因为数据量太少(每个班就三名同学),直接按上面排序就取出了前三名学生。
    	这个题实际考察的是多分组(班级+学生),组内TopN(成绩前三名),应该使用dense_rank函数。
    		- 请自行回顾row_number、rank、dense_rank三个函数的区别。
    	
    	例如取前两名,就得用上dense_rank
    	select * from (select cid, sid, avg(score) as avg_score, dense_rank() over (partition by cid order by avg(score) desc) as rank from t_course group by cid, sid  order by cid) t where rank<=2;
    		+--------+--------+--------------+---------+
    		| t.cid  | t.sid  | t.avg_score  | t.rank  |
    		+--------+--------+--------------+---------+
    		| 1      | 1      | 82.5         | 1       |
    		| 1      | 3      | 79.0         | 2       |
    		| 2      | 4      | 93.5         | 1       |
    		| 2      | 5      | 90.0         | 2       |
    		+--------+--------+--------------+---------+
    

    (2)求语文成绩比数学成绩高的同学,要求使用两种方式完成。

    提示:case when,collect_set,concat,concat_set

    本人自创方法(也是最简单的方法)

    0: jdbc:hive2://bigboss3:10000> select * from(select id,sid,cid,score,lead(score) over() as shuxue from t_course)t where id%2=1 and score>shuxue;
    +-------+--------+--------+----------+-----------+
    | t.id  | t.sid  | t.cid  | t.score  | t.shuxue  |
    +-------+--------+--------+----------+-----------+
    | 3     | 2      | 1      | 75       | 70        |
    | 5     | 3      | 1      | 86       | 72        |
    +-------+--------+--------+----------+-----------+
    
    

    解析:每个学生都是相邻行科目与成绩不同,使用lead(score) over()把成绩那一列整体向上提升一行,那么奇数行就有该同学语文和数学成绩,这样比起来不就很简单了吗,这个方法最简单了,也是我自己想的,得出答案也是正确的。

    方法二:使用collect_set,concat,concat_set,str_to_map

    collect_set函数将多行转换为一行,使用concat,concat_set,可以将字段拼接为map格式

    0: jdbc:hive2://bigboss3:10000> select * from (select sid,cid,str_to_map(concat_ws(",",collect_set(sc))) as scmap from (select sid,cid,concat(course,":",score) as sc from t_course) t group by sid,cid)s where s.scmap["yuwen"]>s.scmap["shuxue"];
    +--------+--------+-------------------------------+
    | s.sid  | s.cid  |            s.scmap            |
    +--------+--------+-------------------------------+
    | 3      | 1      | {"yuwen":"86","shuxue":"72"}  |
    | 2      | 1      | {"yuwen":"75","shuxue":"70"}  |
    +--------+--------+-------------------------------+
    2 rows selected (1.369 seconds)
    

    解析:

    1)使用concat方法,将学科和成绩进行拼接
    		select sid, concat(course, ":", score) from t_course;
    		+------+------------+
    		| sid  |    _c1     |
    		+------+------------+
    		| 1    | yuwen:80   |
    		| 1    | shuxue:85  |
    		| 2    | yuwen:75   |
    		| 2    | shuxue:70  |
    		
    		2)使用collect_set函数,每个同学的多行成绩合并到一行
    			注意:多行变一行是聚集操作,需要分组
    		select sid, collect_set(concat(course, ":", score)) from t_course group by sid;
    		+------+---------------------------+
    		| sid  |            _c1            |
    		+------+---------------------------+
    		| 4    | ["yuwen:88","shuxue:99"]  |
    		| 6    | ["yuwen:56","shuxue:96"]  |
    		| 2    | ["yuwen:75","shuxue:70"]  |
    		| 1    | ["yuwen:80","shuxue:85"]  |
    		| 3    | ["yuwen:86","shuxue:72"]  |
    		| 5    | ["yuwen:86","shuxue:94"]  |
    		+------+---------------------------+
    	
    		3)转换为map格式,方便获取成绩
    			注意:上面获取的["yuwen:88","shuxue:99"]是数组,无法直接转换为map,需要先使用concat_ws将数组转换为字符串
    			select sid, concat_ws(",", collect_set(concat(course, ":", score))) from t_course group by sid;
    			+------+---------------------+
    			| sid  |         _c1         |
    			+------+---------------------+
    			| 4    | yuwen:88,shuxue:99  |
    			| 6    | yuwen:56,shuxue:96  |
    			| 2    | yuwen:75,shuxue:70  |
    		
    			再将字符串转换为map
    			select sid, str_to_map(concat_ws(",", collect_set(concat(course, ":", score)))) as score_map from t_course group by sid;
    			+------+-------------------------------+
    			| sid  |           score_map           |
    			+------+-------------------------------+
    			| 4    | {"yuwen":"88","shuxue":"99"}  |
    			| 6    | {"yuwen":"56","shuxue":"96"}  |
    			| 2    | {"yuwen":"75","shuxue":"70"}  |
    			| 1    | {"yuwen":"80","shuxue":"85"}  |
    			| 3    | {"yuwen":"86","shuxue":"72"}  |
    			| 5    | {"yuwen":"86","shuxue":"94"}  |
    			+------+-------------------------------+
    		
    		4)通过map访问学科和成绩,并进行过滤
    		 select sid, score_map["yuwen"] as yuwen, score_map["shuxue"] as shuxue from (select sid, str_to_map(concat_ws(",", collect_set(concat(course, ":", score)))) as score_map from t_course group by sid) t where score_map["yuwen"]>score_map["shuxue"];
    		+------+--------+---------+
    		| sid  | yuwen  | shuxue  |
    		+------+--------+---------+
    		| 2    | 75     | 70      |
    		| 3    | 86     | 72      |
    		+------+--------+---------+
    

    方法三:case when

    这个方法比较巧妙,一般想不出来

    select sid, max(yuwen) as score_yw, max(shuxue) as score_sx from (select *, case course when "yuwen" then score else 0 end as yuwen, case course when "shuxue" then score else 0 end as shuxue from t_course) t group by sid  having score_yw>score_sx;
    		+------+-----------+-----------+
    		| sid  | score_yw  | score_sx  |
    		+------+-----------+-----------+
    		| 2    | 75        | 70        |
    		| 3    | 86        | 72        |
    		+------+-----------+-----------+
    

    解析

    思路:由于各科成绩在不同行,无法直接进行比较,可以使用case  when新增yuwen、shuxue两列,然后再将各科成绩转换为一行进行比较。
    		1)新增yuwen、shuxue两列
    		 select *, case course when "yuwen" then score else 0 end as yuwen, case course when "shuxue" then score else 0 end as shuxue from t_course;
    			+--------------+---------------+---------------+------------------+-----------------+--------+---------+
    			| t_course.id  | t_course.sid  | t_course.cid  | t_course.course  | t_course.score  | yuwen  | shuxue  |
    			+--------------+---------------+---------------+------------------+-----------------+--------+---------+
    			| 1            | 1             | 1             | yuwen            | 80              | 80     | 0       |
    			| 2            | 1             | 1             | shuxue           | 85              | 0      | 85      |
    			| 3            | 2             | 1             | yuwen            | 75              | 75     | 0       |
    			| 4            | 2             | 1             | shuxue           | 70              | 0      | 70      |
    			| 5            | 3             | 1             | yuwen            | 86              | 86     | 0       |
    			| 6            | 3             | 1             | shuxue           | 72              | 0      | 72      |
    			| 7            | 4             | 2             | yuwen            | 88              | 88     | 0       |
    			| 8            | 4             | 2             | shuxue           | 99              | 0      | 99      |
    			| 9            | 5             | 2             | yuwen            | 86              | 86     | 0       |
    			| 10           | 5             | 2             | shuxue           | 94              | 0      | 94      |
    			| 11           | 6             | 2             | yuwen            | 56              | 56     | 0       |
    			| 12           | 6             | 2             | shuxue           | 96              | 0      | 96      |
    			+--------------+---------------+---------------+------------------+-----------------+--------+---------+
    			
    		case course when "yuwen" then score else 0 end as yuwen解析:
    			对每一行,当course取值为"yuwen"时,则新增的yuwen字段值为score(语文成绩),否则(course值为shuxue),取值为0;
    		
    		2)对每一个学生做分组,目的是将每个学生的成绩显示到一行
    		select sid, max(yuwen) as score_yw, max(shuxue) as score_sx from (select *, case course when "yuwen" then score else 0 end as yuwen, case course when "shuxue" then score else 0 end as shuxue from t_course) t group by sid order by sid;
    		+------+-----------+-----------+
    		| sid  | score_yw  | score_sx  |
    		+------+-----------+-----------+
    		| 1    | 80        | 85        |
    		| 2    | 75        | 70        |
    		| 3    | 86        | 72        |
    		| 4    | 88        | 99        |
    		| 5    | 86        | 94        |
    		| 6    | 56        | 96        |
    		+------+-----------+-----------+
    			
    		3)查找语文成绩更高的学生,注意分组条件过滤使用having,而不是where
    		select sid, max(yuwen) as score_yw, max(shuxue) as score_sx from (select *, case course when "yuwen" then score else 0 end as yuwen, case course when "shuxue" then score else 0 end as shuxue from t_course) t group by sid  having score_yw>score_sx;
    		+------+-----------+-----------+
    		| sid  | score_yw  | score_sx  |
    		+------+-----------+-----------+
    		| 2    | 75        | 70        |
    		| 3    | 86        | 72        |
    		+------+-----------+-----------+
    
  • 相关阅读:
    mysql字节乱码问题之解决方法
    tracert nbtstat nslookup arp指令用法及示例
    Java ThreadLocal示例及使用方法总结
    C# Undo Redo
    自定义过滤器
    C#粘贴复制数据库中的内容
    分布式数据同步
    创建路径(c#)动态分段
    构建网络数据集(c#)
    创建一个栅格图层
  • 原文地址:https://www.cnblogs.com/BIG-BOSS-ZC/p/11807296.html
Copyright © 2011-2022 走看看