  • Hive: ranking analytic functions

    Basic ranking functions

    Syntax:

    rank()over([partition by col1] order by col2)
    dense_rank()over([partition by col1] order by col2)
    row_number()over([partition by col1] order by col2)

    The [partition by col1] clause is optional.

    Examples:

    select name,score,rank() over(partition by name order by score) tt from t;
    select name,score,dense_rank() over(partition by name order by score) tt from t;
    select name,score,row_number() over(partition by name order by score) tt from t;
    select name,score,rank() over(order by score) tt from t;
    Top three within each name:
     select name,score from (select name,score,dense_rank() over(partition by name order by score desc) tt from t) x where x.tt<=3;
    Which rank a score of 70 holds:
     select name,score,x.tt from (select name,score,rank() over(partition by name order by score desc) tt from t) x where x.name='语文' and x.score=70;
    Paginated query:
     select xx.* from (select t.*,row_number() over(order by score desc) rowno from t) xx where xx.rowno between 1 and 3;
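
To make the "top three" and "paginated query" patterns above concrete, here is a minimal plain-Python sketch of what those queries compute. It is not Hive code: the function names `top_n_per_group` and `page` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def top_n_per_group(rows, n):
    """Keep (name, score) rows whose dense rank by score (descending)
    within their name group is <= n."""
    groups = defaultdict(list)
    for name, score in rows:
        groups[name].append(score)
    kept = []
    for name, scores in groups.items():
        # The n highest distinct scores are exactly the scores with dense_rank <= n.
        allowed = set(sorted(set(scores), reverse=True)[:n])
        kept.extend((name, s) for s in sorted(scores, reverse=True) if s in allowed)
    return kept


def page(rows, first, last):
    """Rows whose row number (by score, descending) lies in [first, last]."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return ordered[first - 1:last]


# Hypothetical sample data: (name, score)
rows = [("math", 90), ("math", 90), ("math", 80),
        ("math", 70), ("math", 60), ("art", 75)]

print(top_n_per_group(rows, 3))
print(page(rows, 1, 3))
```

Because dense_rank gives tied rows the same rank, the "top three" filter can return more than three rows per group (four for "math" here). Note also that Python's sort is stable, whereas the order Hive assigns row_number to tied rows is not guaranteed.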
    Real-world example:
    insert overwrite table otheravgrank_amt
    select substr(bus_inst_no,0,5),xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts,count(bus_inst_no),
           dense_rank() over (order by cr_tx_amt desc) as cr_tx_amt_rank,
           dense_rank() over (order by f_fare desc) as f_fare_rank,
           dense_rank() over (order by counts desc) as counts_rank
    from branch_amt
    group by substr(bus_inst_no,0,5),xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts;
    insert overwrite table denserank_amt
    select bus_inst_no,xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts,count(bus_inst_no),
           dense_rank() over (partition by bus_inst_no order by cr_tx_amt desc) as cr_tx_amt_rank,
           dense_rank() over (partition by bus_inst_no order by f_fare desc) as f_fare_rank,
           dense_rank() over (partition by bus_inst_no order by counts desc) as counts_rank
    from otheravgrank_amt
    group by bus_inst_no,xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts;
    insert overwrite table denserank_amt select * from denserank_amt sort by bus_inst_no;
    

     

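    Background:

    Differences between rank, dense_rank and row_number

    row_number returns a unique number for every row; when rows hold identical values, the numbers simply keep increasing in the order the records appear in the result set.

    rank returns the position of a row within its partition; tied rows share a rank, and the ranks that follow are skipped, leaving gaps.

    dense_rank returns the position of a row within its partition; tied rows share a rank, and no gaps are left in the sequence.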

    For a more detailed introduction, see here.

    REGION_ID CUSTOMER_ID   TOTAL RANK DENSE_RANK ROW_NUMBER
    --------- ----------- ------- ---- ---------- ----------
            5           2 1224992   12         12         12
            9          23 1224992   12         12         13
            9          24 1224992   12         12         14
           10          30 1216858   15         13         15
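
The same pattern can be reproduced with a small plain-Python sketch. It is an illustration of the semantics described above, not Hive's implementation; the function name `rank_functions` is hypothetical, and the input reuses the TOTAL values from the sample output, ranked from 1 instead of 12.

```python
def rank_functions(values):
    """Return (value, rank, dense_rank, row_number) tuples,
    with values ordered descending."""
    ordered = sorted(values, reverse=True)
    results = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that compares unequal to every real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank jumps to the row position, so ties leave gaps
            dense += 1         # dense_rank advances by one per distinct value
            previous = value
        results.append((value, rank, dense, row_number))
    return results


for row in rank_functions([1224992, 1216858, 1224992, 1224992]):
    print(row)
# (1224992, 1, 1, 1)
# (1224992, 1, 1, 2)
# (1224992, 1, 1, 3)
# (1216858, 4, 2, 4)
```

The three tied rows share rank 1 and dense rank 1 but receive row numbers 1 to 3; the next row gets rank 4 (a gap of two) and dense rank 2 (no gap), mirroring the 12/12/12 followed by 15/13/15 in the sample output.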


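    Advanced ranking functions

    percent_rank: the percentage ranking function

    Formula: PERCENT_RANK() = (RANK() - 1) / (Total Rows - 1)

    Here RANK() is the rank of the current row based on the column(s) listed after ORDER BY, and Total Rows is the number of rows in the partition that contains the current row.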
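
As a worked instance of the formula, the following plain-Python sketch computes PERCENT_RANK over a single partition ordered ascending. The function name `percent_rank` and the sample scores are hypothetical; returning 0.0 for a one-row partition (where the denominator would be zero) follows the usual SQL convention and is an assumption here, not something stated in the original post.

```python
def percent_rank(values):
    """(value, PERCENT_RANK) pairs for one partition, ordered ascending."""
    ordered = sorted(values)
    total = len(ordered)
    result = []
    for value in ordered:
        # The position of the first occurrence of a value equals its RANK().
        rank = ordered.index(value) + 1
        # Assumption: a single-row partition yields 0.0 instead of dividing by zero.
        pct = (rank - 1) / (total - 1) if total > 1 else 0.0
        result.append((value, pct))
    return result


print(percent_rank([60, 70, 70, 90, 100]))
# [(60, 0.0), (70, 0.25), (70, 0.25), (90, 0.75), (100, 1.0)]
```

The two tied scores of 70 share RANK() = 2, so both get (2 - 1) / (5 - 1) = 0.25, and the score of 90 gets (4 - 1) / (5 - 1) = 0.75.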

    Analytic functions built into Hive 0.12.0 (usage modeled on Oracle), as registered in

    org.apache.hadoop.hive.ql.exec.FunctionRegistry:

        registerHiveUDAFsAsWindowFunctions();
        registerWindowFunction("row_number", new GenericUDAFRowNumber());  // implementation class for row_number
        registerWindowFunction("rank", new GenericUDAFRank());
        registerWindowFunction("dense_rank", new GenericUDAFDenseRank());
        registerWindowFunction("percent_rank", new GenericUDAFPercentRank());
        registerWindowFunction("cume_dist", new GenericUDAFCumeDist());
        registerWindowFunction("ntile", new GenericUDAFNTile());
        registerWindowFunction("first_value", new GenericUDAFFirstValue());
        registerWindowFunction("last_value", new GenericUDAFLastValue());
        registerWindowFunction(LEAD_FUNC_NAME, new GenericUDAFLead(), false);
        registerWindowFunction(LAG_FUNC_NAME, new GenericUDAFLag(), false);
    

    Example:

    SELECT DepartmentID, Surname,Salary, Sex,
        PERCENT_RANK( ) OVER ( PARTITION BY Sex
          ORDER BY Salary DESC ) AS PctRank
     FROM Employees
     WHERE State IN ( 'NY' );
    

      Because the input is partitioned by Sex, the PERCENT_RANK calculation is performed separately for male and female employees.
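
The effect of PARTITION BY in the query above can be sketched in plain Python: rows are grouped by the partition column first, and the ranking restarts inside each group. The function name `percent_rank_by_partition` and the employee rows are hypothetical illustration data, and the 0.0 result for a one-row partition is an assumed convention.

```python
from itertools import groupby


def percent_rank_by_partition(rows, partition_key, order_key):
    """Add a PctRank field to each row, computed per partition
    with order_key sorted descending."""
    out = []
    keyed = sorted(rows, key=lambda r: r[partition_key])
    for _, group in groupby(keyed, key=lambda r: r[partition_key]):
        part = sorted(group, key=lambda r: r[order_key], reverse=True)
        n = len(part)
        rank = 0
        prev = None
        for position, row in enumerate(part, start=1):
            if row[order_key] != prev:
                rank = position  # ties keep the rank of the first tied row
                prev = row[order_key]
            # Assumption: a one-row partition gets 0.0 instead of dividing by zero.
            pct = (rank - 1) / (n - 1) if n > 1 else 0.0
            out.append({**row, "PctRank": pct})
    return out


# Hypothetical employees
employees = [
    {"Surname": "A", "Sex": "F", "Salary": 5000},
    {"Surname": "B", "Sex": "F", "Salary": 4000},
    {"Surname": "C", "Sex": "F", "Salary": 4000},
    {"Surname": "D", "Sex": "M", "Salary": 6000},
    {"Surname": "E", "Sex": "M", "Salary": 3000},
]

for row in percent_rank_by_partition(employees, "Sex", "Salary"):
    print(row["Sex"], row["Surname"], row["Salary"], row["PctRank"])
```

Each partition gets its own 0.0 to 1.0 scale: the highest-paid employee of each sex has PctRank 0.0, regardless of how salaries compare across the two groups.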


  • Original post: https://www.cnblogs.com/kxdblog/p/4034242.html