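Key points:
Using the substr and concat functions
Using row_number() over(distribute by year sort by temp desc)  # partition the rows by year, then sort by temperature in descending order within each year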
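As a quick illustration of the two string functions, the sketch below splits one sample value into its parts and joins them back together. The literal '2014010114' and the column aliases are chosen only for illustration; note that positions in Hive's substr start at 1, and omitting the length returns everything to the end of the string.

```sql
-- Split a record into year, month+day, and temperature, then rebuild it.
select substr('2014010114', 1, 4) as years,        -- '2014'
       substr('2014010114', 5, 4) as month_day,    -- '0101'
       substr('2014010114', 9)    as day_weather,  -- '14'
       concat(substr('2014010114', 1, 4),
              substr('2014010114', 5, 4),
              substr('2014010114', 9)) as rebuilt; -- '2014010114'
```

Because the three pieces cover the whole string without overlap, concat restores the original format exactly.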
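Requirement: there is a Hive table temp with a single column (temp_record string). Each row records the temperature on one day; for example, 2014010114 means that the temperature on January 1, 2014 was 14 degrees.
Task: use Hive to find, for each year, the record with the highest temperature.
Note: the data format must not change. For example, the highest-temperature record for 2015 must be returned as 2015010999.
The table contains the following data: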
2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023
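-- t1 splits each record into year, month+day, and temperature;
-- t2 numbers the rows within each year from hottest to coldest;
-- the outer query keeps row 1 per year and rebuilds the original format.
select concat(t2.years, t2.month_day, t2.day_weather) as temp_record
from (
  select t1.years, t1.month_day, t1.day_weather,
         -- cast to int so the sort is numeric rather than lexicographic
         row_number() over(distribute by t1.years sort by cast(t1.day_weather as int) desc) as rn
  from (
    select substr(temp_record, 1, 4) as years,
           substr(temp_record, 5, 4) as month_day,
           substr(temp_record, 9)    as day_weather
    from temp
  ) t1
) t2
where t2.rn = 1;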
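A possible simplification, offered only as a sketch: since the output must keep the original format, the record does not have to be split and rebuilt with concat. The window can partition and order by expressions computed directly from the column. This assumes the table and column are named temp and temp_record as stated in the requirement; the alias rn is an illustrative name.

```sql
-- Number the whole records within each year, hottest first, and keep the top one.
select t.temp_record
from (
  select temp_record,
         row_number() over(
           partition by substr(temp_record, 1, 4)             -- year
           order by cast(substr(temp_record, 9) as int) desc  -- temperature, numeric
         ) as rn
  from temp
) t
where t.rn = 1;
```

On the sample data this should return one row per year (row order is not guaranteed): 2001010529, 2007010999, 2008010337, 2010010317, 2012010732, 2013010929, 2014010317, 2015010999. If two days in the same year tie for the highest temperature, row_number() keeps only one of them, chosen arbitrarily; replacing it with rank() would keep every tied record.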