zoukankan      html  css  js  c++  java
  • 使用clickhouse实现开窗函数 row/rank_number 和 lag lead

    ROW_NUMBER实现

    如何在ClickHouse中实现ROW_NUMBER OVER 和DENSE_RANK OVER等同效果的查询,它们在一些其他数据库中可用于RANK排序。

    CH中并没有直接提供对应的开窗函数,需要利用一些特殊函数变相实现,主要会用到下面几个数组函数,它们分别是:

    arrayEnumerate
    arrayEnumerateDense
    arrayEnumerateUniq

    这些函数均接受一个数组作为输入参数,并返回数组中元素出现的位置,例如:

    SELECT    
     arrayEnumerate([10, 20, 30, 10, 40]) AS row_number,     
     arrayEnumerateDense([10, 20, 30, 10, 40]) AS dense_rank,    
     arrayEnumerateUniq([10, 20, 30, 10, 40]) AS uniq_rank
     
    ┌─row_number──┬─dense_rank──┬─uniq_rank───┐
    │ [1,2,3,4,5][1,2,3,1,4][1,1,1,2,1] │
    └─────────────┴─────────────┴─────────────┘

    数据格式:

    我们的目标是实现开窗查询:

    ROW_NUMBER() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerate
    DENSE_RANK() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerateDense
    UNIQ_RANK() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerateUniq

    代码如下

    SELECT     
    customer_id ,     
    groupArray(loan_dt) AS loan_dt, 
    groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,  
    groupArray(due_days) AS due_days,    
    groupArray(loan_id) AS loan_id,    
    arrayEnumerate(loan_id) AS row_number,     
    arrayEnumerateDense(loan_id) AS dense_rank,     
    arrayEnumerateUniq(loan_id) AS uniq_rank
    FROM (    
        SELECT   *  FROM res_report.xxx_loan     ORDER BY loan_dt ,loan_id  
    )
    GROUP BY customer_id

     数组展开,利用ARRAY JOIN将数组展开,并按照customer_id 、loan_id  列排序:

    SELECT 
    customer_id
    ,loan_id
    ,loan_dt
    ,IF(end_date=toDate('2099-12-31'),null,end_date) as end_dt
    ,due_days
    ,row_number 
    ,dense_rank
    ,uniq_rank
    from 
    (
    SELECT     
    customer_id ,     
    groupArray(loan_dt) AS loan_dt, 
    groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,  
    groupArray(due_days) AS due_days,    
    groupArray(loan_id) AS loan_id,    
    arrayEnumerate(loan_id) AS row_number,     
    arrayEnumerateDense(loan_id) AS dense_rank,     
    arrayEnumerateUniq(loan_id) AS uniq_rank
    FROM (    
        SELECT   *  FROM res_report.xxx_loan     ORDER BY loan_dt ,loan_id  
    )
    GROUP BY customer_id
    )
    ARRAY JOIN  
        loan_dt,
        loan_id, 
        end_date,
        due_days,
        row_number,     
        dense_rank,     
        uniq_rank
    ORDER BY     
    customer_id ASC,     
    row_number ASC ,     
    dense_rank ASC

     技巧:因为end_date可能为空值,会导致array长度不一致。报错。需要用特数值填充然后最后再转换回来。

    lag/lead实现:

    neighbor(column, offset[, default_value])
     
    The result of the function depends on the affected data blocks and the order of data in the block.
    If you make a subquery with ORDER BY and call the function from outside the subquery, you can get the expected result.
     
    Parameters
     
    column — A column name or scalar expression.
    offset — The number of rows forwards or backwards from the current row of column. Int64.
    default_value — Optional. The value to be returned if offset goes beyond the scope of the block. Type of data blocks affected.
    Returned values
     
    Value for column in offset distance from current row if offset value is not outside block bounds.
    Default value for column if offset value is outside block bounds. If default_value is given, then it will be used.
    Type: type of data blocks affected or default value type.
     
     
    参考:
    https://clickhouse.tech/docs/en/sql-reference/functions/other-functions/

    代码如下

    SELECT 
    customer_id
    ,loan_id
    ,loan_dt
    ,IF(end_date=toDate('2099-12-31'),null,end_date) as end_dt
    ,due_days
    ,row_number 
    ,dense_rank
    ,uniq_rank
    ,if(neighbor(row_number , 1)<>1,neighbor(loan_dt , 1),null)  as lead_loan_dt
    ,if(row_number<>1,neighbor(end_dt, -1),null)  as lag_end_dt
    from 
    (
    SELECT     
    customer_id ,     
    groupArray(loan_dt) AS loan_dt, 
    groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,  
    groupArray(due_days) AS due_days,    
    groupArray(loan_id) AS loan_id,    
    groupArray(loan_tot) AS loan_tot, 
    arrayEnumerate(loan_id) AS row_number,     
    arrayEnumerateDense(loan_id) AS dense_rank,     
    arrayEnumerateUniq(loan_id) AS uniq_rank
    FROM (    
        SELECT   *  FROM res_report.xxx_loan     ORDER BY loan_dt ,loan_id  
    )
    GROUP BY customer_id
    )
    ARRAY JOIN  
        loan_dt,
        loan_id, 
        end_date,
        due_days,
        row_number,     
        dense_rank,     
        uniq_rank
    ORDER BY     
    customer_id ASC,     
    row_number ASC ,     
    dense_rank ASC

    发现有一点问题:就是最后一行lead的时候会出现异常值1970-01-01年值的问题。 

    SELECT 
    customer_id
    ,loan_id
    ,loan_dt
    ,IF(end_date=toDate('2099-12-31'),null,end_date) as end_dt
    ,due_days
    ,row_number as row_num
    ,dense_rank
    ,uniq_rank
    ,if(neighbor(row_num , 1)>1,neighbor(loan_dt , 1),null)  as lead_loan_dt
    ,if(row_num<>1,neighbor(end_dt, -1),null)  as lag_end_dt
    from 
    (
    SELECT     
    customer_id ,     
    groupArray(loan_dt) AS loan_dt, 
    groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,  
    groupArray(due_days) AS due_days,    
    groupArray(loan_id) AS loan_id,    
    arrayEnumerate(loan_id) AS row_number,     
    arrayEnumerateDense(loan_id) AS dense_rank,     
    arrayEnumerateUniq(loan_id) AS uniq_rank
    FROM (    
        SELECT   *  FROM res_report.ipeso_loan     ORDER BY loan_dt ,loan_id  
    )
    GROUP BY customer_id
    )
    ARRAY JOIN  
        loan_dt,
        loan_id, 
        end_date,
        due_days,
        row_number,     
        dense_rank,     
        uniq_rank
    ORDER BY     
    customer_id ASC,     
    row_number ASC ,     
    dense_rank ASC
  • 相关阅读:
    Cow Rectangles&Moovie Mooving
    Sound静音问题
    Spring MVC 流程图(转)
    centos6.5配置redis服务 很好用谢谢
    如何用70行Java代码实现深度神经网络算法(转)
    java中枚举(enum)小例子。之前学过枚举但是一直没用,这里有个枚举类帮你我理解下(很肤浅)
    幸福很简单(一直不知道怎么去阐述幸福,今天终于看到一个台词觉得这个阐述还行,作一个笔记)-----------穷人好像都是这么觉得的
    ExecutorService中submit和execute的区别(转)
    spring batch部分
    java 堆栈的区别(转百度)
  • 原文地址:https://www.cnblogs.com/wqbin/p/14189943.html
Copyright © 2011-2022 走看看