  • Analytic Functions in Oracle

    Contents

    Overview and Introduction
    How Analytic Functions Work
    The Syntax
    Examples
    Calculate a running Total
    Top-N Queries
        Example 1
        Example 2
    Windows
        Range Windows
        Compute average salary for defined range
        Row Windows
        Accessing Rows Around Your Current Row
    LAG
    LEAD
    Determine the First Value / Last Value of a Group
    Crosstab or Pivot Queries
    ROLLUP and RANK Examples
    CUBE
    Grouping Functions:
        Grouping_ID
        GROUP_ID
        Grouping SETS
    More Examples with EMP Table

    Overview

    Analytic Functions are designed to address such problems as "Calculate a running total", "Find percentages within a group", "Top-N queries", "Compute a moving average" and many more. 
    Most of these problems can be solved using standard PL/SQL, however the performance is often not what it should be. 
    Analytic Functions add extensions to the SQL language that not only make these operations easier to code; they make them faster than could be achieved with pure SQL or PL/SQL.

    How Analytic Functions Work

    Analytic functions compute an aggregate value based on a group of rows. They operate on subsets of rows, much like aggregate functions in GROUP BY queries, but they differ from plain aggregates (like SELECT AVG(sal) FROM emp) in that they do not reduce the number of rows returned by the query: a value is computed for every row.
    The group of rows is called a window and is defined by the analytic clause. For each row, a "sliding" window of rows is defined. The window determines the range of rows used to perform the calculations for the "current row". Window sizes can be based on either a physical number of rows or a logical interval such as time.
    Select MAX(sal) OVER ()
    The OVER() clause signals the start of an analytic function; it is what differentiates an analytic function from a regular Oracle aggregate function.

    Select MAX(sal) OVER (PARTITION BY field1)
    The partitioning clause sets up the group of rows that the analytic function is applied to.

    Select MAX(sal) OVER (PARTITION BY field1 ORDER BY field2)
    ORDER BY specifies the order of the rows within each partition. It is a required part of the analytic syntax for some analytic functions.
    Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

    Example:
    SELECT empno, deptno, sal,
           AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
    FROM   emp;
        EMPNO     DEPTNO        SAL AVG_DEPT_SAL
    --------- ---------- ---------- ------------
         7782         10       2450   2916.66667
         7839         10       5000   2916.66667
         7934         10       1300   2916.66667
         7566         20       2975         2175
         7902         20       3000         2175
         7876         20       1100         2175
         7369         20        800         2175
         7788         20       3000         2175
         7521         30       1250   1566.66667
         7844         30       1500   1566.66667
         7499         30       1600   1566.66667
         7900         30        950   1566.66667
         7698         30       2850   1566.66667
         7654         30       1250   1566.66667
    This time AVG is an analytic function, operating on the group of rows defined by the contents of the OVER clause. This group of rows is known as a window, which is why analytic functions are sometimes referred to as window[ing] functions. Notice how the AVG function is still reporting the departmental average, like it did in the GROUP BY query, but the result is present in each row, rather than reducing the total number of rows returned. This is because analytic functions are performed on a result set after all join, WHERE, GROUP BY and HAVING clauses are complete, but before the final ORDER BY operation is performed.
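
    The effect of PARTITION BY can be sketched outside the database. The following Python snippet (an illustrative helper, not Oracle code) keeps every input row and decorates it with its partition's average, just as AVG(sal) OVER (PARTITION BY deptno) does:

```python
from collections import defaultdict

def avg_over_partition(rows, part_key, val_key):
    """Keep every row; attach the average of the row's partition,
    mimicking AVG(val) OVER (PARTITION BY part)."""
    totals = defaultdict(lambda: [0.0, 0])   # partition -> [sum, count]
    for r in rows:
        t = totals[r[part_key]]
        t[0] += r[val_key]
        t[1] += 1
    return [dict(r, avg=totals[r[part_key]][0] / totals[r[part_key]][1])
            for r in rows]

# the three dept-10 rows from the output above
emp = [{"empno": 7782, "deptno": 10, "sal": 2450},
       {"empno": 7839, "deptno": 10, "sal": 5000},
       {"empno": 7934, "deptno": 10, "sal": 1300}]
print(avg_over_partition(emp, "deptno", "sal"))
```

    Three rows go in and three rows come out, each carrying the same departmental average (8750/3 = 2916.67), which is exactly the behaviour shown in the result set above.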

    The Syntax

    There are some variations in the syntax of the individual analytic functions, but the basic syntax for an analytic function is as follows.

    analytic_function([ arguments ]) OVER (analytic_clause)

    The analytic_clause breaks down into the following optional elements.

    [ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]

    The sub-elements of the analytic_clause each have their own syntax diagrams. Rather than repeat the syntax diagrams, the following sections describe what each section of the analytic_clause is used for.

    So here is the full syntax:

    Analytic-Function(<Argument>,<Argument>,...)
    OVER (
      <Query-Partition-Clause>
      <Order-By-Clause>
      <Windowing-Clause>
    )

    • Analytic-Functions
      Specify the name of an analytic function. Oracle provides many analytic functions, such as AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP and VARIANCE.

    • Arguments
      Analytic functions take 0 to 3 arguments.

    • Query-Partition-Clause 
      The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words "partition" and "group" are used synonymously here. The analytic functions are applied to each group independently, they are reset for each group. If the query_partition_clause is omitted, the whole result set is treated as a single partition. 
      Example: The following query uses an empty OVER clause, so the average presented is based on all the rows of the result set.

      SELECT empno, deptno, sal,
             AVG(sal) OVER () AS avg_sal
      FROM   emp;

           EMPNO     DEPTNO        SAL    AVG_SAL
      ---------- ---------- ---------- ----------
            7369         20        800 2073.21429
            7499         30       1600 2073.21429
            7521         30       1250 2073.21429
            7566         20       2975 2073.21429
            7654         30       1250 2073.21429
            7698         30       2850 2073.21429
            7782         10       2450 2073.21429
            7788         20       3000 2073.21429
            7839         10       5000 2073.21429
            7844         30       1500 2073.21429
            7876         20       1100 2073.21429
            7900         30        950 2073.21429
            7902         20       3000 2073.21429
            7934         10       1300 2073.21429

      If we change the OVER clause to include a query_partition_clause based on the department, the averages presented are specifically for the department the employee belongs to.

      SELECT empno, deptno, sal,
             AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
      FROM   emp;

          EMPNO     DEPTNO        SAL AVG_DEPT_SAL
      --------- ---------- ---------- ------------
           7782         10       2450   2916.66667
           7839         10       5000   2916.66667
           7934         10       1300   2916.66667
           7566         20       2975         2175
           7902         20       3000         2175
           7876         20       1100         2175
           7369         20        800         2175
           7788         20       3000         2175
           7521         30       1250   1566.66667
           7844         30       1500   1566.66667
           7499         30       1600   1566.66667
           7900         30        950   1566.66667
           7698         30       2850   1566.66667
           7654         30       1250   1566.66667

    • Order-By-Clause
      The order_by_clause is used to order rows, or siblings, within a partition. So if an analytic function is sensitive to the order of the siblings in a partition you should include an order_by_clause. The following query uses the FIRST_VALUE function to return the first salary reported in each department. Notice we have partitioned the result set by the department, but there is no order_by_clause.

      SELECT empno, deptno, sal, 
             FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno) AS first_sal_in_dept
      FROM   emp;

           EMPNO     DEPTNO        SAL FIRST_SAL_IN_DEPT
      ---------- ---------- ---------- -----------------
            7782         10       2450              2450
            7839         10       5000              2450
            7934         10       1300              2450
            7566         20       2975              2975
            7902         20       3000              2975
            7876         20       1100              2975
            7369         20        800              2975
            7788         20       3000              2975
            7521         30       1250              1250
            7844         30       1500              1250
            7499         30       1600              1250
            7900         30        950              1250
            7698         30       2850              1250
            7654         30       1250              1250

      Now compare the values of the FIRST_SAL_IN_DEPT column when we include an order_by_clause to order the siblings by ascending salary.

      SELECT empno, deptno, sal, 
             FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY sal ASC NULLS LAST) AS first_val_in_dept
      FROM   emp;

           EMPNO     DEPTNO        SAL FIRST_VAL_IN_DEPT
      ---------- ---------- ---------- -----------------
            7934         10       1300              1300
            7782         10       2450              1300
            7839         10       5000              1300
            7369         20        800               800
            7876         20       1100               800
            7566         20       2975               800
            7788         20       3000               800
            7902         20       3000               800
            7900         30        950               950
            7654         30       1250               950
            7521         30       1250               950
            7844         30       1500               950
            7499         30       1600               950
            7698         30       2850               950

      In this case the "ASC NULLS LAST" keywords are unnecessary as ASC is the default for an order_by_clause and NULLS LAST is the default for ASC orders. When ordering by DESC, the default is NULLS FIRST.

      It is important to understand how the order_by_clause affects display order. The order_by_clause is guaranteed to affect the order of the rows as they are processed by the analytic function, but it may not always affect the display order. As a result, you must always use a conventional ORDER BY clause in the query if display order is important. Do not rely on any implicit ordering done by the analytic function. Remember, the conventional ORDER BY clause is performed after the analytic processing, so it will always take precedence.

    •  Windowing-Clause
      The windowing_clause gives some analytic functions a further degree of control over the window within the current partition. The windowing_clause is an extension of the order_by_clause and, as such, can only be used if an order_by_clause is present. The windowing_clause has two basic forms.
      RANGE BETWEEN start_point AND end_point
      ROWS BETWEEN start_point AND end_point

      Possible values for "start_point" and "end_point" are:

      • UNBOUNDED PRECEDING : The window starts at the first row of the partition. Only available for start points.
      • UNBOUNDED FOLLOWING : The window ends at the last row of the partition. Only available for end points.
      • CURRENT ROW : The window starts or ends at the current row. Can be used as a start or end point.
      • value_expr PRECEDING : A physical or logical offset before the current row, using a constant or expression that evaluates to a positive numeric value. When used with RANGE, it can also be an interval literal if the order_by_clause uses a DATE column.
      • value_expr FOLLOWING : As above, but an offset after the current row.

      For analytic functions that support the windowing_clause, the default action is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". The following query is similar to one used previously to report the employee salary and average department salary, but now we have included an order_by_clause so we also get the default windowing_clause. Notice how the average salary is now calculated using only the employees from the same department up to and including the current row.

      SELECT empno, deptno, sal, 
             AVG(sal) OVER (PARTITION BY deptno ORDER BY sal) AS avg_dept_sal_sofar
      FROM   emp;

         EMPNO     DEPTNO        SAL AVG_DEPT_SAL_SOFAR
      -------- ---------- ---------- ------------------
          7934         10       1300               1300
          7782         10       2450               1875
          7839         10       5000         2916.66667
          7369         20        800                800
          7876         20       1100                950
          7566         20       2975               1625
          7788         20       3000               2175
          7902         20       3000               2175
          7900         30        950                950
          7654         30       1250               1150
          7521         30       1250               1150
          7844         30       1500             1237.5
          7499         30       1600               1310
          7698         30       2850         1566.66667
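
      Note in the output above that tied salaries report the same running average: rows 7654 and 7521 (both 1250) show 1150, and both 3000 rows in dept 20 show 2175. That is the RANGE semantics of the default window: it extends through all peers of the current ORDER BY value. A small Python sketch (illustrative only, operating on one already-ordered partition) reproduces the dept-30 column:

```python
def running_avg_range(values):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW over an ordered
    list: the window for each row extends through ALL rows tied with it."""
    values = sorted(values)
    out = []
    for i, v in enumerate(values):
        end = i
        while end + 1 < len(values) and values[end + 1] == v:
            end += 1                          # include the current row's peers
        window = values[:end + 1]
        out.append(sum(window) / len(window))
    return out

# dept-30 salaries from the result set above
print(running_avg_range([950, 1250, 1250, 1500, 1600, 2850]))
# [950.0, 1150.0, 1150.0, 1237.5, 1310.0, 1566.66...]
```

      A ROWS window would instead stop exactly at the current row, giving the two 1250 rows different averages.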

      The following query shows one method for accessing values from the rows before and after the current row using the windowing_clause. This can also be accomplished with LAG and LEAD.
      SELECT empno, deptno, sal, 
             FIRST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sal,
             LAST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sal
      FROM   emp;
          EMPNO     DEPTNO        SAL PREVIOUS_SAL   NEXT_SAL
      --------- ---------- ---------- ------------ ----------
           7369         20        800          800        950
           7900         30        950          800       1100
           7876         20       1100          950       1250
           7521         30       1250         1100       1250
           7654         30       1250         1250       1300
           7934         10       1300         1250       1500
           7844         30       1500         1300       1600
           7499         30       1600         1500       2450
           7782         10       2450         1600       2850
           7698         30       2850         2450       2975
           7566         20       2975         2850       3000
           7788         20       3000         2975       3000
           7902         20       3000         3000       5000
           7839         10       5000         3000       5000
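
      LAG and LEAD behave like the 1 PRECEDING / 1 FOLLOWING windows above, with one visible difference at the partition edges: FIRST_VALUE/LAST_VALUE over a one-row window fall back to the current row's value (800 on the first row and 5000 on the last row above), whereas LAG/LEAD return NULL there (or an optional default). A Python sketch of the LAG/LEAD behaviour (illustrative, not Oracle code):

```python
def lag_lead(values, offset=1):
    """LAG(v, n) / LEAD(v, n) OVER (ORDER BY v): reach a fixed number of
    rows back/forward; None stands in for Oracle's NULL at the edges."""
    values = sorted(values)
    n = len(values)
    lag  = [values[i - offset] if i - offset >= 0 else None for i in range(n)]
    lead = [values[i + offset] if i + offset < n else None for i in range(n)]
    return lag, lead

lag, lead = lag_lead([800, 950, 1100, 1250])
print(lag)    # [None, 800, 950, 1100]
print(lead)   # [950, 1100, 1250, None]
```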



    Analytic Function Examples
    Simple Example
    Select * from ( Select cust_name, sum(clm_amt) clm_amt
                      from customer
                      group by cust_name
                      order by clm_amt desc ) v
     where rownum < 11;

    Simple Example result
    CUST_NAME                      CLM_AMT
    ------------------------------ -----------
    XYZ                            100,000,000
    Lexus Corp                      80,000,000
    Yelp                            78,000,000
    ABC                             75,000,000
    Omega Int.                      74,000,000
    S Corp                          70,000,000
    First America                   60,000,000
    Acme                            25,000,000
    Sun Enterprise                  23,000,000
    Film studio                     17,000,000

    Analytic Version
    select cust_name, clm_amt
      from ( select cust_name, sum(clm_amt) clm_amt,
                    row_number() over (order by sum(clm_amt) desc) rn
               from customer
              group by cust_name )
     where rn <= 10;

    In the following example we'll show GROUPING SETS (Listing 1), GROUP BY ROLLUP (Listing 2), and GROUP BY CUBE (Listing 3) to see what we get with each. We'll use the standard SCOTT.EMP table to do this. The first query will show us the sum of salaries by DEPTNO and by JOB. We need to use the GROUPING_ID function to determine what aggregation each row represents. It might not be obvious why we would need this in general from the example, but consider what would happen if DEPTNO or JOB were NULLABLE. There would be no way to distinguish the detail row from the aggregated row.

    The GROUPING_ID function returns a 0 or 1 when given a single column. (In this case, it works just like the GROUPING function.) If the return value is 0, indicating a detail record value for that particular column, then the column in question was not aggregated over (was not collapsed). If the function returns 1, then the column in question was aggregated over—any aggregates in the SELECT list would have been computed over that entire column's set of values. GROUPING_ID differs from GROUPING in that you can send it a list of columns, and the GROUPING_ID function will treat the list as bits and return a decimal number. That means that a call to GROUPING_ID(a,b,c) might return any number between 0 and 7, because different 0/1 combinations are returned. Given that fact, we can use a CASE statement in the query to see if the row is a detail row for DEPTNO, for JOB, for neither, or for both.
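
    The bit arithmetic is simple enough to model directly. The sketch below (plain Python, illustrative only) treats each column's GROUPING() bit as one binary digit, with the leftmost column as the most significant bit:

```python
def grouping_id(*bits):
    """GROUPING_ID(c1, c2, ...): treat the per-column GROUPING() bits as a
    binary number, leftmost column = most significant bit."""
    n = 0
    for bit in bits:
        n = (n << 1) | bit
    return n

# (GROUPING(deptno), GROUPING(job)) -> GROUPING_ID(deptno, job)
assert grouping_id(0, 0) == 0        # detail row for both columns
assert grouping_id(0, 1) == 1        # aggregated over job
assert grouping_id(1, 0) == 2        # aggregated over deptno
assert grouping_id(1, 1) == 3        # aggregated over both
print(sorted(grouping_id(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
# [0, 1, 2, 3, 4, 5, 6, 7] -- three columns span 0..7
```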

    Using GROUPING SETS in Listing 1, we asked for GROUP BY only on DEPTNO and then only on JOB. So, that one query was like running the following query

    select deptno, null, sum(sal) 
    from emp group by deptno
    union all
    select null, job, sum(sal)
    from emp group by job;

    Code Listing 1: Using GROUPING_ID with GROUPING SETS

    select deptno, job, sum(sal),
    grouping_id(deptno) gid_d,
    grouping_id(job) gid_j,
    grouping_id(deptno,job) gid_dj,
    bin_to_num(grouping_id(deptno),grouping_id(job)) b2n,
    case when grouping_id(deptno,job) = 0
    then 'Dtl both'
    when grouping_id(deptno,job) = 1
    then 'Agg over job'
    when grouping_id(deptno,job) = 2
    then 'Agg over deptno'
    when grouping_id(deptno,job) = 3
    then 'Agg over both'
    end what
    from emp
    group by grouping sets( (deptno), (job) );

    DEPTNO  JOB         SUM(SAL)  GID_D  GID_J  GID_DJ  B2N  WHAT
    ------  ---------  ---------  -----  -----  ------  ---  ----------------
        10                  8750      0      1       1    1  Agg over job
        20                 10875      0      1       1    1  Agg over job
        30                  9400      0      1       1    1  Agg over job
            ANALYST         6000      1      0       2    2  Agg over deptno
            CLERK           4150      1      0       2    2  Agg over deptno
            MANAGER         8275      1      0       2    2  Agg over deptno
            PRESIDENT       5000      1      0       2    2  Agg over deptno
            SALESMAN        5600      1      0       2    2  Agg over deptno

    . . . but without having to make two passes on the EMP table, as would be the case with the UNION ALL.

    In looking at the columns involved in the query in Listing 1, we can see that the function GROUPING(column_name) shows us when a column is aggregated over or preserved as a detail record. When GROUPING(deptno) = 0, DEPTNO is preserved in the output. When it is 1, it is aggregated over. However, we have two columns in this set we are aggregating by, for a total of four possible 0/1 combinations. (In this query, only two are possible.) Using the GROUPING_ID function on this vector of columns, we can easily see what each row represents. I've also included the alternative, more verbose way to accomplish this—the BIN_TO_NUM() function, to which we can send a list of 0s and 1s and get back a decimal number as well. I'm pretty sure you'll agree that GROUPING_ID(c1,c2,c3) is easier than the corresponding BIN_TO_NUM call with three GROUPING calls.

    In Listing 2, we take a look at GROUP BY ROLLUP. A rollup by the two columns DEPTNO and JOB will produce

    1. Detail records by DEPTNO, JOB (sum of SAL for each DEPTNO/JOB combination).
    2. A summary record for each DEPTNO over JOB (like a subtotal).
    3. A summary record over DEPTNO and JOB—a single aggregate for the entire result. Listing 2 shows the query and the results.

    Code Listing 2: Using GROUPING_ID with GROUP BY ROLLUP

    select deptno, job, sum(sal),
    grouping_id(deptno) gid_d,
    grouping_id(job) gid_j,
    case when grouping_id(deptno,job) = 0
    then 'Dtl both'
    when grouping_id(deptno,job) = 1
    then 'Agg over job'
    when grouping_id(deptno,job) = 2
    then 'Agg over deptno'
    when grouping_id(deptno,job) = 3
    then 'Agg over both'
    end what
    from emp
    group by rollup( deptno, job );

    DEPTNO  JOB         SUM(SAL)  GID_D  GID_J  WHAT
    ------  ---------  ---------  -----  -----  --------------
        10  CLERK           1300      0      0  Dtl both
        10  MANAGER         2450      0      0  Dtl both
        10  PRESIDENT       5000      0      0  Dtl both
        10                  8750      0      1  Agg over job
        20  CLERK           1900      0      0  Dtl both
        20  ANALYST         6000      0      0  Dtl both
        20  MANAGER         2975      0      0  Dtl both
        20                 10875      0      1  Agg over job
        30  CLERK            950      0      0  Dtl both
        30  MANAGER         2850      0      0  Dtl both
        30  SALESMAN        5600      0      0  Dtl both
        30                  9400      0      1  Agg over job
                           29025      1      1  Agg over both

    A rollup is sort of like a running total report, and GROUPING_ID tells us when the rollups happened. So the data is sorted by DEPTNO, JOB, and we have subtotals by DEPTNO (aggregated over JOB) and by DEPTNO, JOB (aggregated over both) along with the details by DEPTNO/JOB.

    As you can see in Listing 2, the GROUPING_ID function was useful in telling us when we were dealing with a rolled-up record and the level of detail we could expect in that record.
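
    The three grouping levels of a two-column ROLLUP can be modelled procedurally. The sketch below (plain Python, illustrative only; each input row stands for one already-summed DEPTNO/JOB combination) emits detail rows, per-DEPTNO subtotals, and the grand total, mirroring the shape of Listing 2:

```python
from itertools import groupby

# each row stands for one already-summed (deptno, job, sum_sal) combination
emp = [(10, "CLERK", 1300), (10, "MANAGER", 2450), (10, "PRESIDENT", 5000),
       (20, "CLERK", 1900), (20, "ANALYST", 6000), (20, "MANAGER", 2975)]

def rollup_two(rows):
    """GROUP BY ROLLUP(deptno, job) produces three grouping levels:
    (deptno, job) detail, a (deptno) subtotal, and a () grand total."""
    out = list(rows)                                        # level 0: detail
    for deptno, grp in groupby(sorted(rows), key=lambda r: r[0]):
        out.append((deptno, None, sum(r[2] for r in grp)))  # level 1: per dept
    out.append((None, None, sum(r[2] for r in rows)))       # level 2: total
    return out

for row in rollup_two(emp):
    print(row)
```

    None plays the role of the blank JOB/DEPTNO columns in the rolled-up rows, which is why GROUPING_ID is needed in real SQL to tell a genuine NULL apart from an aggregation marker.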

    Last, we'll look at GROUP BY CUBE. CUBE is similar to ROLLUP, in that you get the same three record types as shown in Listing 2 but also get all possible aggregations. CUBE grouping by DEPTNO and JOB will give you records by all of the following:

    1. DEPTNO and JOB
    2. DEPTNO over JOB
    3. JOB over DEPTNO
    4. A single total aggregate

    You get every possible aggregate. Listing 3 shows the syntax and output and how to use the GROUPING_ID function to see what the level of detail is for each row. It is interesting to note that GROUP BY CUBE produces a superset of the rows we observed in the first query (in Listing 1). You could use GROUPING_ID with CUBE to generate the same result set as the original grouping sets query. That is, adding

    having grouping_id(deptno,job) = 2
        or grouping_id(deptno,job) = 1

    . . . to the GROUP BY CUBE query would cause it to be the logical equivalent of the GROUPING SETS query. But you shouldn't do that! If you need only some of the aggregates, use GROUPING SETS to get just the ones you need computed and avoid computing the others altogether. It would be fair to say that GROUPING_ID doesn't avoid multiple grouping functions but GROUPING SETS does. However, GROUPING_ID plays an important role in seeing what data is what.

    Code Listing 3: Using GROUPING_ID with GROUP BY CUBE

    select deptno, job, sum(sal),
    grouping_id(deptno) gid_d,
    grouping_id(job) gid_j,
    case when grouping_id(deptno,job) = 0
    then 'Dtl both'
    when grouping_id(deptno,job) = 1
    then 'Agg over job'
    when grouping_id(deptno,job) = 2
    then 'Agg over deptno'
    when grouping_id(deptno,job) = 3
    then 'Agg over both'
    end what
    from emp
    group by cube( deptno, job )
    order by grouping_id(deptno,job) ;


    DEPTNO  JOB         SUM(SAL)  GID_D  GID_J  WHAT
    ------  ---------  ---------  -----  -----  ---------------
        10  CLERK           1300      0      0  Dtl both
        10  MANAGER         2450      0      0  Dtl both
        10  PRESIDENT       5000      0      0  Dtl both
        20  CLERK           1900      0      0  Dtl both
        30  CLERK            950      0      0  Dtl both
        30  SALESMAN        5600      0      0  Dtl both
        30  MANAGER         2850      0      0  Dtl both
        20  MANAGER         2975      0      0  Dtl both
        20  ANALYST         6000      0      0  Dtl both
        10                  8750      0      1  Agg over job
        20                 10875      0      1  Agg over job
        30                  9400      0      1  Agg over job
            CLERK           4150      1      0  Agg over deptno
            ANALYST         6000      1      0  Agg over deptno
            MANAGER         8275      1      0  Agg over deptno
            PRESIDENT       5000      1      0  Agg over deptno
            SALESMAN        5600      1      0  Agg over deptno
                           29025      1      1  Agg over both

    Example: Calculate a running Total

    This example shows the cumulative salary within a department row by row, with each row including a summation of the prior rows' salaries.

    set autotrace traceonly
    break on deptno skip 1
    column ename format A6
    column deptno format 999
    column sal format 99999
    column seq format 999

    SELECT ename "Ename", deptno "Deptno", sal "Sal",
      SUM(sal) OVER (ORDER BY deptno, ename) "Running Total",
      SUM(SAL) OVER (PARTITION BY deptno
                     ORDER BY ename) "Dept Total",
      ROW_NUMBER() 
        OVER (PARTITION BY deptno ORDER BY ENAME) "Seq"
    FROM emp
    ORDER BY deptno, ename
    /

    Ename  Deptno    Sal Running Total Dept Total  Seq
    ------ ------ ------ ------------- ---------- ----
    CLARK      10   2450          2450       2450    1
    KING            5000          7450       7450    2
    MILLER          1300          8750       8750    3

    ADAMS      20   1100          9850       1100    1
    FORD            3000         12850       4100    2
    JONES           2975         15825       7075    3
    SCOTT           3000         18825      10075    4
    SMITH            800         19625      10875    5

    ALLEN      30   1600         21225       1600    1
    BLAKE           2850         24075       4450    2
    JAMES            950         25025       5400    3
    MARTIN          1250         26275       6650    4
    TURNER          1500         27775       8150    5
    WARD            1250         29025       9400    6

    Execution Plan
    ---------------------------------------------------
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 WINDOW (SORT)
    2 1 TABLE ACCESS (FULL) OF 'EMP'
    Statistics
    ---------------------------------------------------
    0 recursive calls
    0 db block gets
    3 consistent gets
    0 physical reads
    0 redo size
    1658 bytes sent via SQL*Net to client
    503 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    14 rows processed

    The example shows how to calculate a "Running Total" for the entire query. This is done over the entire ordered result set, via SUM(sal) OVER (ORDER BY deptno, ename).
    Further, we were able to compute a running total within each department, a total that is reset at the beginning of the next department. The PARTITION BY deptno in that SUM(sal) causes this: a partitioning clause was specified in the query in order to break the data up into groups.
    The ROW_NUMBER() function is used to sequentially number the rows returned in each group, according to our ordering criteria (a "Seq" column was added in order to display this position).
    The execution plan shows that the whole query is performed with only 3 consistent gets, something that standard SQL or procedural PL/SQL approaches could not match.
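
    The three analytic columns of this report can be reproduced with a single ordered pass over the data. The Python sketch below (illustrative only, on a subset of EMP) resets the department total and sequence number whenever the partition key changes:

```python
# a subset of EMP, as (ename, deptno, sal)
emp = [("CLARK", 10, 2450), ("KING", 10, 5000), ("MILLER", 10, 1300),
       ("ADAMS", 20, 1100), ("FORD", 20, 3000)]

def running_report(rows):
    """One ordered pass: overall running total, per-department running
    total, and per-department row number (the Seq column)."""
    rows = sorted(rows, key=lambda r: (r[1], r[0]))   # ORDER BY deptno, ename
    report, running = [], 0
    dept_total = seq = 0
    prev_dept = object()                              # sentinel: matches nothing
    for ename, deptno, sal in rows:
        if deptno != prev_dept:                       # PARTITION BY deptno
            dept_total = seq = 0                      # reset per department
            prev_dept = deptno
        running += sal
        dept_total += sal
        seq += 1
        report.append((ename, deptno, sal, running, dept_total, seq))
    return report

for line in running_report(emp):
    print(line)
```

    The unpartitioned running total keeps growing across departments, while the partitioned total and Seq start over at ADAMS, matching the report above.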

    Top-N Queries

    How can we get the Top-N records by some set of fields?
    Prior to having access to these analytic functions, questions of this nature were extremely difficult to answer. There are some problems with Top-N queries however, mostly in the way people phrase them. It is something to be careful about when designing reports. Consider this seemingly sensible request:
    I would like the top three paid sales reps by department

    Using the "traditional" approach you can perform:
    select * from
    (your_query)
     where rownum <= 10;

    The problem with this request is that it is ambiguous because of repeated values: there might be four people who all make the same salary, so what should we do then?
    Let's look at three examples, all using the well known table EMP.

    Example 1

    Let's look at what ROW_NUMBER can do. Here is an example query using ROW_NUMBER to assign an increasing number to each row in the EMP table after sorting by SAL DESC:

    select ename, sal,
    row_number() over (order by sal desc) rn
    from emp
    order by sal desc;

    ENAME    SAL  RN
    -----   ----  --
    KING    5000   1
    FORD    3000   2
    SCOTT   3000   3
    JONES   2975   4
    .
    .
    .
    JAMES    950  13
    SMITH    800  14

    I can apply a predicate to ROW_NUMBER after it is assigned. For example

    select * from (select ename, sal,
    row_number() over (order by sal desc) rn
    from emp)
    where rn <= 3
    order by sal desc;

    ENAME    SAL  RN
    -----   ----  --
    KING    5000   1
    SCOTT   3000   2
    FORD    3000   3

    So, that demonstrates how to perform a top-n query by using ROW_NUMBER and also points out a general issue with top-n queries. If you look at that result, you see two rows with the value 3000. What if, in the EMP table, three people, instead of just two, had a salary of 3000? The result obtained by the above query would be ambiguous—I would get three records, but the records I retrieved would be somewhat random. We will analyze that in Example 2:

    Another Example:
    Sort the sales people by salary from greatest to least. Give the first three rows. If there are fewer than three people in a department, this will return fewer than three records.

    set autotrace on explain
    break on deptno skip 1

    SELECT * FROM (SELECT deptno, ename, sal, ROW_NUMBER()
                     OVER (PARTITION BY deptno 
                              ORDER BY sal DESC)
                     Top3 FROM emp)
      WHERE Top3 <= 3
    ;

        DEPTNO ENAME             SAL       TOP3
    ---------- ---------- ---------- ----------
            10 KING             5000          1
               CLARK            2450          2
               MILLER           1300          3

            20 SCOTT            3000          1
               FORD             3000          2
               JONES            2975          3

            30 BLAKE            2850          1
               ALLEN            1600          2
               TURNER           1500          3

    This query works by sorting each partition (or group, which is the deptno) in descending order, based on the salary column, and then assigning a sequential row number to each row in the group as it is processed. A WHERE clause is then applied after this step to retain just the first three rows of each partition.

    Example 2

    Bearing this in mind, I can use other analytic functions to remove the ambiguity from Example 1. They will do so, but the analytic functions might return more than n rows. In my opinion, when the attribute I order by is not unique, I want my query to return all of the relevant records, not just the first n arbitrary ones. To that end, I can use the RANK and DENSE_RANK analytic functions. Let's take a look at what they do:

    select ename,sal,
    row_number() over (order by sal desc)rn,
    rank() over (order by sal desc)rnk,
    dense_rank() over (order by sal desc)drnk
    from emp
    order by sal desc;

    ENAME    SAL  RN  RNK  DRNK
    -----   ----  --  ---  ----
    KING    5000   1    1     1
    FORD    3000   2    2     2
    SCOTT   3000   3    2     2
    JONES   2975   4    4     3
    BLAKE   2850   5    5     4
    CLARK   2450   6    6     5
    .
    .
    .

    The main things to note here are the following:

    • ROW_NUMBER assigns contiguous, unique numbers from 1..N to a result set.
    • RANK does not assign unique numbers (FORD and SCOTT tied for second place), nor does it assign contiguous numbers. No record was assigned the value of 3, because two people tied for second place and no one came in third, according to RANK.
    • DENSE_RANK, like RANK, does not assign unique numbers, but it does assign contiguous numbers. Even though two records tied for second place, there is a third-place record.

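    The contrast between the three ranking functions can be reproduced with any engine that implements SQL window functions. Here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ required for window functions); the in-memory table and the hand-loaded subset of EMP rows are illustrative assumptions, not part of the original examples:

    ```python
    import sqlite3  # the bundled SQLite must be 3.25+ for window functions

    rows = [("KING", 5000), ("FORD", 3000), ("SCOTT", 3000),
            ("JONES", 2975), ("BLAKE", 2850), ("CLARK", 2450)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?)", rows)

    # RANK leaves a gap after the FORD/SCOTT tie; DENSE_RANK does not.
    ranks = {ename: (rnk, drnk) for ename, rnk, drnk in con.execute("""
        SELECT ename,
               RANK()       OVER (ORDER BY sal DESC) AS rnk,
               DENSE_RANK() OVER (ORDER BY sal DESC) AS drnk
          FROM emp""")}
    print(ranks)
    ```

    Only RANK and DENSE_RANK are compared here, because ROW_NUMBER's tie-breaking between FORD and SCOTT is nondeterministic when the ORDER BY key is not unique.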
    You can use RANK and DENSE_RANK in the same way you would use ROW_NUMBER to restrict the number of rows returned, but obviously you'll get subtly different results. For example

    select * from (select ename, sal,
                          dense_rank() over (order by sal desc) drnk
                     from emp)
     where drnk <= 3
     order by sal desc;

    ENAME    SAL DRNK
    ----- ------ ----
    KING    5000    1
    SCOTT   3000    2
    FORD    3000    2
    JONES   2975    3

    That query returns "the set of people who make the top three salaries," which is likely the desired result. Getting the first three records from EMP after sorting by SAL is rather arbitrary, because using exactly the same set of data, simply inserted in different orders, you could observe different result sets with ROW_NUMBER (because SAL is not unique). Using DENSE_RANK, however, I don't get precisely three records but, instead, a repeatable (deterministic) result set. And I suspect that I retrieve the set the end user really meant to retrieve—the set of people making the top three salaries.


    Another Example:
    Give me the set of sales people who make the top 3 salaries - that is, find the set of distinct salary amounts, sort them, take the largest three, and give me everyone who makes one of those values.

    SELECT * FROM (SELECT deptno, ename, sal,
                     DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal desc) 
                     TopN FROM emp)
       WHERE TopN <= 3
       ORDER BY deptno, sal DESC;


        DEPTNO ENAME             SAL       TOPN
    ---------- ---------- ---------- ----------
            10 KING             5000          1
               CLARK            2450          2
               MILLER           1300          3

            20 SCOTT            3000          1  <--- !
               FORD             3000          1  <--- !
               JONES            2975          2
               ADAMS            1100          3

            30 BLAKE            2850          1
               ALLEN            1600          2
               TURNER           1500          3

    Here the DENSE_RANK function was used to get the top three salaries per department. We assigned the dense rank to the salary column, sorted in descending order.
    The DENSE_RANK function computes the rank of a row in an ordered group of rows. The ranks are consecutive integers beginning with 1; the largest rank value is the number of unique values returned by the query. Rows with equal values for the ranking criteria receive the same rank, and rank values are not skipped in the event of ties.
    Hence, after the result set is built in the inline view, we can simply select all of the rows with a dense rank of three or less. This gives us everyone who makes the top three salaries by department number.

    Windows

    The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. The default window is an anchored window that simply starts at the first row of a group and continues to the current row.
    We can set up windows based on two criteria: RANGES of data values or ROWS offset from the current row. The existence of an ORDER BY in an analytic function adds a default window clause of RANGE UNBOUNDED PRECEDING, which says: take all rows in our partition that came before us, as specified by the ORDER BY clause.
    Let's look at an example with a sliding window within a group and compute the sum of the current row's SAL column plus the previous 2 rows in that group. If we need a report that shows the sum of the current employee's salary and the preceding two salaries within a department, it would look like this.

    break on deptno skip 1
    column ename format A6
    column deptno format 999
    column sal format 99999

    SELECT deptno "Deptno", ename "Ename", sal "Sal",
      SUM(SAL) OVER (PARTITION BY deptno
                     ORDER BY ename
                     ROWS 2 PRECEDING) "Sliding Total"
    FROM emp
    ORDER BY deptno, ename;

    Deptno Ename     Sal Sliding Total
    ------ ------ ------ -------------
        10 CLARK    2450          2450
           KING     5000          7450
           MILLER   1300          8750

        20 ADAMS    1100          1100
           FORD     3000          4100
           JONES    2975          7075  ^
           SCOTT    3000          8975  |
           SMITH     800          6775  -- Sliding Window


        30 ALLEN    1600          1600
           BLAKE    2850          4450
           JAMES     950          5400
           MARTIN   1250          5050 
           TURNER   1500          3700
           WARD     1250          4000

    The partition clause makes the SUM(sal) be computed within each department, independent of the other groups; the SUM(sal) is 'reset' as the department changes. The ORDER BY ENAME clause sorts the data within each department by ENAME; this allows the window clause ROWS 2 PRECEDING to access the 2 rows prior to the current row in a group in order to sum the salaries.
    For example, note that the SLIDING TOTAL value for SMITH is 6775, which is the sum of 800, 3000, and 2975. That is simply SMITH's salary plus the salaries from the preceding two rows in the window.
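    The same sliding ROWS window can be reproduced outside Oracle. A sketch with Python's sqlite3 (SQLite 3.25+); the hand-loaded department 10 and 20 rows are illustrative assumptions:

    ```python
    import sqlite3

    rows = [("ADAMS", 20, 1100), ("FORD", 20, 3000), ("JONES", 20, 2975),
            ("SCOTT", 20, 3000), ("SMITH", 20, 800),
            ("CLARK", 10, 2450), ("KING", 10, 5000), ("MILLER", 10, 1300)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

    # ROWS 2 PRECEDING: current row plus (up to) the two rows before it,
    # restarting in each deptno partition.
    sliding = dict(con.execute("""
        SELECT ename,
               SUM(sal) OVER (PARTITION BY deptno
                              ORDER BY ename
                              ROWS 2 PRECEDING) AS sliding_total
          FROM emp"""))
    print(sliding)
    ```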

    Range Windows

    Range windows collect rows together based on a logical offset from the current row's ORDER BY value rather than a physical row count. If I say 'range 5 preceding', for example, this will generate a sliding window containing the set of all preceding rows in the group whose values are within 5 units of the current row. These units may be either numeric or date offsets; it is not valid to use RANGE with datatypes other than numbers and dates.

    Example
    Count the employees who were hired within the 100 days preceding each employee's own hiredate. The range window goes back 100 days from the current row's hiredate and then counts the rows within this range. The solution is to use the following window specification:

    COUNT(*) OVER (ORDER BY hiredate ASC RANGE 100 PRECEDING)

    column ename heading "Name" format a8
    column hiredate heading "Hired" format a10
    column hiredate_pre heading "Hired-100" format a10
    column cnt heading "Cnt" format 99

    SELECT ename, hiredate, hiredate-100 hiredate_pre,
           COUNT(*)  OVER (ORDER BY hiredate ASC
                           RANGE 100 PRECEDING
    ) cnt
      FROM emp
     ORDER BY hiredate ASC;

    Name     Hired      Hired-100  Cnt
    -------- ---------- ---------- ---
    SMITH    17-DEC-80  08-SEP-80    1
    ALLEN    20-FEB-81  12-NOV-80    2
    WARD     22-FEB-81  14-NOV-80    3
    JONES    02-APR-81  23-DEC-80    3
    BLAKE    01-MAY-81  21-JAN-81    4
    CLARK    09-JUN-81  01-MAR-81    3
    TURNER   08-SEP-81  31-MAY-81    2
    MARTIN   28-SEP-81  20-JUN-81    2
    KING     17-NOV-81  09-AUG-81    3
    JAMES    03-DEC-81  25-AUG-81    5
    FORD     03-DEC-81  25-AUG-81    5
    MILLER   23-JAN-82  15-OCT-81    4
    SCOTT    09-DEC-82  31-AUG-82    1
    ADAMS    12-JAN-83  04-OCT-82    2

    We ordered the single partition by hiredate ASC. If we look, for example, at the row for CLARK, we can see that his hiredate was 09-JUN-81, and 100 days prior to that is 01-MAR-81. If we check who was hired between 01-MAR-81 and 09-JUN-81, we find JONES (hired: 02-APR-81) and BLAKE (hired: 01-MAY-81). That makes 3 rows including the current row, which is what we see in the "Cnt" column of CLARK's row.
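    SQLite (3.28+) supports RANGE frames with a numeric offset, so the 100-day window can be reproduced by ordering on the julian day number of the hiredate. A sketch with Python's sqlite3; the ISO-format hiredates hand-loaded below are the EMP values from the listing above:

    ```python
    import sqlite3

    rows = [("SMITH", "1980-12-17"), ("ALLEN", "1981-02-20"),
            ("WARD", "1981-02-22"), ("JONES", "1981-04-02"),
            ("BLAKE", "1981-05-01"), ("CLARK", "1981-06-09"),
            ("TURNER", "1981-09-08"), ("MARTIN", "1981-09-28"),
            ("KING", "1981-11-17"), ("JAMES", "1981-12-03"),
            ("FORD", "1981-12-03"), ("MILLER", "1982-01-23"),
            ("SCOTT", "1982-12-09"), ("ADAMS", "1983-01-12")]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, hiredate TEXT)")
    con.executemany("INSERT INTO emp VALUES (?, ?)", rows)

    # RANGE needs a numeric sort key in SQLite, so order by the julian day;
    # 100 PRECEDING then means "within 100 days before the current hiredate".
    cnt = dict(con.execute("""
        SELECT ename,
               COUNT(*) OVER (ORDER BY julianday(hiredate)
                              RANGE BETWEEN 100 PRECEDING
                                        AND CURRENT ROW) AS cnt
          FROM emp"""))
    print(cnt)
    ```

    The counts match the "Cnt" column above: CLARK's window covers JONES and BLAKE plus himself, and the JAMES/FORD tie makes both rows peers with a count of 5.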

    Compute average salary for defined range

    As an example, compute for each employee the average salary of the people hired in the 100 days preceding (and including) that employee's own hiredate. The query looks like this:

    column ename heading "Name" format a8
    column hiredate heading "Hired" format a10
    column hiredate_pre heading "Hired-100" format a10
    column avg_sal heading "Avg-100" format 999999

    SELECT ename, hiredate, sal,
           AVG(sal) OVER (ORDER BY hiredate ASC
                          RANGE 100 PRECEDING) avg_sal

      FROM emp
    ORDER BY hiredate ASC;

    Name     Hired             SAL Avg-100
    -------- ---------- ---------- -------
    SMITH    17-DEC-80         800     800
    ALLEN    20-FEB-81        1600    1200
    WARD     22-FEB-81        1250    1217
    JONES    02-APR-81        2975    1942
    BLAKE    01-MAY-81        2850    2169
    CLARK    09-JUN-81        2450    2758

    TURNER   08-SEP-81        1500    1975
    MARTIN   28-SEP-81        1250    1375
    KING     17-NOV-81        5000    2583
    JAMES    03-DEC-81         950    2340
    FORD     03-DEC-81        3000    2340
    MILLER   23-JAN-82        1300    2563
    SCOTT    09-DEC-82        3000    3000
    ADAMS    12-JAN-83        1100    2050

    Look at CLARK again, since we understand his range window within the group. We can see that the average salary of 2758 is equal to (2975+2850+2450)/3. This is the average of the salaries for CLARK and the rows preceding CLARK, those of JONES and BLAKE. The data must be sorted in ascending order.

    Row Windows

    Row windows are physical units: a fixed number of rows to include in the window. For example, you can calculate the average salary of a given record together with the (up to 5) employees hired before or after them as follows:

    set numformat 9999
    SELECT ename, hiredate, sal,
    AVG(sal)
      OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) AvgAsc,
    COUNT(*)
      OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) CntAsc,
    AVG(sal)
      OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AvgDes,
    COUNT(*)
      OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) CntDes
    FROM emp
    ORDER BY hiredate;

    ENAME      HIREDATE    SAL AVGASC CNTASC AVGDES CNTDES
    ---------- --------- ----- ------ ------ ------ ------
    SMITH      17-DEC-80   800    800      1   1988      6
    ALLEN      20-FEB-81  1600   1200      2   2104      6
    WARD       22-FEB-81  1250   1217      3   2046      6
    JONES      02-APR-81  2975   1656      4   2671      6
    BLAKE      01-MAY-81  2850   1895      5   2675      6
    CLARK      09-JUN-81  2450   1988      6   2358      6
    TURNER     08-SEP-81  1500   2104      6   2167      6
    MARTIN     28-SEP-81  1250   2046      6   2417      6
    KING       17-NOV-81  5000   2671      6   2392      6
    JAMES      03-DEC-81   950   2333      6   1588      4
    FORD       03-DEC-81  3000   2358      6   1870      5
    MILLER     23-JAN-82  1300   2167      6   1800      3
    SCOTT      09-DEC-82  3000   2417      6   2050      2
    ADAMS      12-JAN-83  1100   2392      6   1100      1

    The window consists of up to 6 rows: the current row and five rows "in front of" this row, where "in front of" is defined by the ORDER BY clause. With ROWS windows, we do not have the limitations of RANGE windows: the data may be of any type and the ORDER BY may include many columns. Notice that we selected a COUNT(*) as well; this is useful just to demonstrate how many rows went into making up a given average. We can see clearly that for ALLEN's record, the average salary computation for people hired before him used only 2 records, whereas the computation for people hired after him used 6.

    Accessing Rows Around Your Current Row

    Frequently you want to access data not only from the current row but also from rows "in front of" or "behind" it. For example, let's say you need a report that shows, by department, all of the employees; their hire date; how many days before the last hire was; and how many days after the next hire was.
    Using straight SQL this query would be difficult to write, and its performance would once again definitely be questionable. The approach I typically took in the past was either to "select a select" or to write a PL/SQL function that would take some data from the current row and "find" the previous and next rows' data. This worked, but introduced large overhead into both the development of the query and its run-time execution.
    Using analytic functions, this is easy and efficient to do.

    set echo on
    column deptno format 99 heading Dep
    column ename format a6 heading Ename
    column hiredate heading Hired
    column last_hire heading LastHired
    column days_last heading DaysLast
    column next_hire heading NextHire
    column days_next heading NextDays
    break on deptno skip 1

    SELECT deptno, ename, hiredate,
    LAG(hiredate,1,NULL)
      OVER (PARTITION BY deptno
            ORDER BY hiredate, ename) last_hire,
    hiredate - LAG(hiredate,1,NULL)
      OVER (PARTITION BY deptno
            ORDER BY hiredate, ename) days_last,
    LEAD(hiredate,1,NULL)
      OVER (PARTITION BY deptno
            ORDER BY hiredate, ename) next_hire,
    LEAD(hiredate,1,NULL)
      OVER (PARTITION BY deptno
            ORDER BY hiredate, ename) - hiredate days_next
    FROM emp
    ORDER BY deptno, hiredate;

    Dep Ename  Hired     LastHired DaysLast NextHire  NextDays
    --- ------ --------- --------- -------- --------- --------
     10 CLARK  09-JUN-81                    17-NOV-81      161
        KING   17-NOV-81 09-JUN-81      161 23-JAN-82       67
        MILLER 23-JAN-82 17-NOV-81       67

     20 SMITH  17-DEC-80                    02-APR-81      106
        JONES  02-APR-81 17-DEC-80      106 03-DEC-81      245
        FORD   03-DEC-81 02-APR-81      245 09-DEC-82      371
        SCOTT  09-DEC-82 03-DEC-81      371 12-JAN-83       34
        ADAMS  12-JAN-83 09-DEC-82       34

     30 ALLEN  20-FEB-81                    22-FEB-81        2
        WARD   22-FEB-81 20-FEB-81        2 01-MAY-81       68
        BLAKE  01-MAY-81 22-FEB-81       68 08-SEP-81      130
        TURNER 08-SEP-81 01-MAY-81      130 28-SEP-81       20
        MARTIN 28-SEP-81 08-SEP-81       20 03-DEC-81       66
        JAMES  03-DEC-81 28-SEP-81       66

    The LEAD and LAG routines could be considered a way to "index into your partitioned group". Using these functions you can access any individual row. Notice, for example, in the above printout that the record for KING includes the data from the prior row (LAST HIRE) and the next row (NEXT HIRE). We can easily access the fields of records preceding or following the current record in an ordered partition.

    LAG

    LAG ( value_expr [, offset] [, default] )
       OVER ( [query_partition_clause] order_by_clause )

    LAG provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the window; if you do not specify default, then its default value is null. The following example provides, for each clerk in the EMP table, the salary of the clerk hired just before:

    SELECT ename,hiredate,sal,
    LAG(sal, 1, 0) OVER (ORDER BY hiredate) AS PrevSal
    FROM emp
    WHERE job = 'CLERK';

    Ename  Hired       SAL PREVSAL
    ------ --------- ----- -------
    SMITH  17-DEC-80   800       0
    JAMES  03-DEC-81   950     800
    MILLER 23-JAN-82  1300     950
    ADAMS  12-JAN-83  1100    1300

    LEAD

    LEAD ( value_expr [, offset] [, default] ) 
       OVER ( [query_partition_clause] order_by_clause )

    LEAD provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position. If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the table. If you do not specify default, then its default value is null. The following example provides, for each employee in the EMP table, the hire date of the employee hired just after:

    SELECT ename, hiredate, 
        LEAD(hiredate, 1) OVER (ORDER BY hiredate) AS NextHired 
    FROM emp WHERE deptno = 30;

    Ename  Hired     NEXTHIRED
    ------ --------- ---------
    ALLEN  20-FEB-81 22-FEB-81
    WARD   22-FEB-81 01-MAY-81
    BLAKE  01-MAY-81 08-SEP-81
    TURNER 08-SEP-81 28-SEP-81
    MARTIN 28-SEP-81 03-DEC-81
    JAMES  03-DEC-81
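    LAG and LEAD behave the same way in other engines. A sketch with Python's sqlite3 (SQLite 3.25+) that computes both side by side over the four clerks; the in-memory table and hand-loaded rows are illustrative assumptions:

    ```python
    import sqlite3

    rows = [("SMITH", "1980-12-17", 800), ("JAMES", "1981-12-03", 950),
            ("MILLER", "1982-01-23", 1300), ("ADAMS", "1983-01-12", 1100)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, hiredate TEXT, sal INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

    # Third argument is the default returned at the edges of the window:
    # 0 instead of NULL for the first (LAG) and last (LEAD) rows.
    out = {ename: (prev, nxt) for ename, prev, nxt in con.execute("""
        SELECT ename,
               LAG(sal, 1, 0)  OVER (ORDER BY hiredate) AS prev_sal,
               LEAD(sal, 1, 0) OVER (ORDER BY hiredate) AS next_sal
          FROM emp""")}
    print(out)
    ```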

    Determine the First Value / Last Value of a Group

    The FIRST_VALUE and LAST_VALUE functions allow you to select the first and last rows from a group. These rows are especially valuable because they are often used as the baselines in calculations.
    Example
    The following example selects, for each employee in each department, the name of the employee with the lowest salary.

    break on deptno skip 1
    SELECT deptno, ename, sal, 
      FIRST_VALUE(ename)
      OVER (PARTITION BY deptno
            ORDER BY sal ASC) AS MIN_SAL_HAS
    FROM emp
    ORDER BY deptno, ename;

        DEPTNO ENAME             SAL MIN_SAL_HAS
    ---------- ---------- ---------- -----------
            10 CLARK            2450 MILLER
               KING             5000 MILLER
               MILLER           1300 MILLER

            20 ADAMS            1100 SMITH
               FORD             3000 SMITH
               JONES            2975 SMITH
               SCOTT            3000 SMITH
               SMITH             800 SMITH

            30 ALLEN            1600 JAMES
               BLAKE            2850 JAMES
               JAMES             950 JAMES
               MARTIN           1250 JAMES
               TURNER           1500 JAMES
               WARD             1250 JAMES

    The following example selects, for each employee in each department, the name of the employee with the highest salary.

    SELECT deptno, ename, sal, 
    FIRST_VALUE(ename)
    OVER (PARTITION BY deptno
    ORDER BY sal DESC) AS MAX_SAL_HAS
    FROM emp
    ORDER BY deptno, ename;

        DEPTNO ENAME             SAL MAX_SAL_HAS
    ---------- ---------- ---------- -----------
            10 CLARK            2450 KING
               KING             5000 KING
               MILLER           1300 KING

            20 ADAMS            1100 FORD
               FORD             3000 FORD
               JONES            2975 FORD
               SCOTT            3000 FORD
               SMITH             800 FORD

            30 ALLEN            1600 BLAKE
               BLAKE            2850 BLAKE
               JAMES             950 BLAKE
               MARTIN           1250 BLAKE
               TURNER           1500 BLAKE
               WARD             1250 BLAKE

    The following example selects, for each employee in department 30, the name of the employee with the lowest salary, using an inline view:

    SELECT deptno, ename, sal,
      FIRST_VALUE(ename)
      OVER (ORDER BY sal ASC) AS MIN_SAL_HAS
    FROM (SELECT * FROM emp WHERE deptno = 30)

        DEPTNO ENAME             SAL MIN_SAL_HAS
    ---------- ---------- ---------- -----------
            30 JAMES             950 JAMES
               MARTIN           1250 JAMES
               WARD             1250 JAMES
               TURNER           1500 JAMES
               ALLEN            1600 JAMES
               BLAKE            2850 JAMES
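    One caveat worth knowing: with the default frame, LAST_VALUE only sees rows up to the current row, which is why the examples above find the top earner with FIRST_VALUE and a reversed sort. To use LAST_VALUE directly, extend the frame to the end of the partition. A sketch with Python's sqlite3 (SQLite 3.25+) over the department 30 rows:

    ```python
    import sqlite3

    rows = [("JAMES", 950), ("MARTIN", 1250), ("WARD", 1250),
            ("TURNER", 1500), ("ALLEN", 1600), ("BLAKE", 2850)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?)", rows)

    # LAST_VALUE needs an explicit full-partition frame; otherwise the
    # default frame ends at the current row and it just returns that row.
    out = {ename: (lo, hi) for ename, lo, hi in con.execute("""
        SELECT ename,
               FIRST_VALUE(ename) OVER (ORDER BY sal ASC) AS min_sal_has,
               LAST_VALUE(ename)  OVER (ORDER BY sal ASC
                   ROWS BETWEEN UNBOUNDED PRECEDING
                            AND UNBOUNDED FOLLOWING)      AS max_sal_has
          FROM emp""")}
    print(out)
    ```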

    Crosstab or Pivot Queries

    A crosstab query, sometimes known as a pivot query, groups your data in a slightly different way from those we have seen hitherto. A crosstab query can be used to get a result with three rows (one for each project), each row having three columns (the first listing the project and then one column for each year) -- like this:

    Project        2001        2002
         ID         CHF         CHF
    -------------------------------
        100      123.00      234.50
        200      543.00      230.00
        300      238.00      120.50

    Example:
    Let's say you want to show the top 3 salary earners in each department as columns. The query needs to return exactly 1 row per department, and the row would have 4 columns: the DEPTNO, the name of the highest paid employee in the department, the name of the next highest paid, and so on. Using analytic functions this is almost easy; without analytic functions it was virtually impossible.

    SELECT deptno,
      MAX(DECODE(seq,1,ename,null)) first,
      MAX(DECODE(seq,2,ename,null)) second,
      MAX(DECODE(seq,3,ename,null)) third
    FROM (SELECT deptno, ename,
           row_number()
           OVER (PARTITION BY deptno
                 ORDER BY sal desc NULLS LAST) seq
           FROM emp)
    WHERE seq <= 3
    GROUP BY deptno;

        DEPTNO FIRST      SECOND     THIRD
    ---------- ---------- ---------- ----------
            10 KING       CLARK      MILLER
            20 SCOTT      FORD       JONES
            30 BLAKE      ALLEN      TURNER

    Note the inner query, that assigned a sequence (RowNr) to each employee by department number in order of salary.

    SELECT deptno, ename, sal,
     row_number()
     OVER (PARTITION BY deptno
           ORDER BY sal desc NULLS LAST) RowNr
    FROM emp;

        DEPTNO ENAME             SAL      ROWNR
    ---------- ---------- ---------- ----------
            10 KING             5000          1
            10 CLARK            2450          2
            10 MILLER           1300          3
            20 SCOTT            3000          1
            20 FORD             3000          2
            20 JONES            2975          3
            20 ADAMS            1100          4
            20 SMITH             800          5
            30 BLAKE            2850          1
            30 ALLEN            1600          2
            30 TURNER           1500          3
            30 WARD             1250          4
            30 MARTIN           1250          5
            30 JAMES             950          6

    The WHERE clause in the outer query keeps only rows with sequences 1, 2 or 3, and the DECODE assigns each name to the correct "column". The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand if you see the result set without the aggregate function MAX grouped by deptno.

    SELECT deptno,
      DECODE(seq,1,ename,null) first,
      DECODE(seq,2,ename,null) second,
      DECODE(seq,3,ename,null) third
    FROM (SELECT deptno, ename,
           row_number()
           OVER (PARTITION BY deptno
                 ORDER BY sal desc NULLS LAST) seq
           FROM emp)
    WHERE seq <= 3;

        DEPTNO FIRST      SECOND     THIRD
    ---------- ---------- ---------- ----------
            10 KING
            10            CLARK
            10                       MILLER
            20 SCOTT
            20            FORD
            20                       JONES
            30 BLAKE
            30            ALLEN
            30                       TURNER

    The MAX aggregate function will be applied by the GROUP BY column DEPTNO. In any given DEPTNO above only one row will have a non-null value for FIRST, the remaining rows in that group will always be NULL. The MAX function will pick out the non-null row and keep that for us. Hence, the group by and MAX will collapse our resultset, removing the NULL values from it and giving us what we want.
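    The same pivot works in any engine with window functions; SQLite has no DECODE, but a CASE expression is the portable equivalent. A sketch with Python's sqlite3 (SQLite 3.25+); the hand-loaded top-3 rows per department and the top1/top2/top3 aliases are illustrative assumptions:

    ```python
    import sqlite3

    rows = [("KING", 10, 5000), ("CLARK", 10, 2450), ("MILLER", 10, 1300),
            ("SCOTT", 20, 3000), ("FORD", 20, 3000), ("JONES", 20, 2975),
            ("BLAKE", 30, 2850), ("ALLEN", 30, 1600), ("TURNER", 30, 1500)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

    # CASE plays the role of DECODE; MAX collapses each department's
    # three sparse rows into one row with three name columns.
    pivot = con.execute("""
        SELECT deptno,
               MAX(CASE WHEN seq = 1 THEN ename END) AS top1,
               MAX(CASE WHEN seq = 2 THEN ename END) AS top2,
               MAX(CASE WHEN seq = 3 THEN ename END) AS top3
          FROM (SELECT deptno, ename,
                       ROW_NUMBER() OVER (PARTITION BY deptno
                                          ORDER BY sal DESC) AS seq
                  FROM emp)
         WHERE seq <= 3
         GROUP BY deptno
         ORDER BY deptno""").fetchall()
    print(pivot)
    ```

    Note that the SCOTT/FORD salary tie makes their relative order in department 20 nondeterministic, exactly as discussed in the Top-N section.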

    Another example:

    Return data from EMP table in Horizontal mode (Pivot Table)
    With the addition of analytic functions in Oracle8i and the SYS_CONNECT_BY_PATH() function in Oracle9i, this became rather easy to do in SQL.
    Take the following approach:
    1. Partition the data by DEPTNO and, for each DEPTNO, sort the data by ENAME, and assign a sequential number by using the ROW_NUMBER() analytic function.
    2. Use a CONNECT BY query, starting with ROW_NUMBER() equal to 1 and connecting that record to the row with the same DEPTNO value and ROW_NUMBER() equal to 2, and so on. So we eventually end up with a record that is the result of connecting 1 to 2 to 3 to 4, and so on, for each DEPTNO value.
    3. Select just the "longest connect by path" for each DEPTNO value - the longest connect by path for each DEPTNO value will have all of the ENAME values gathered together.

    The SYS_CONNECT_BY_PATH() function will return the list of concatenated ENAME values.
    The query looks like this:
    select deptno,
        max(sys_connect_by_path
           (ename, ' ' )) scbp
      from (select deptno, ename, 
                row_number() over 
               (partition by deptno 
                order by ename) rn
             from emp
              )
    start with rn = 1
    connect by prior rn = rn-1 
    and prior deptno = deptno
      group by deptno
      order by deptno
     /

       DEPTNO         SCBP
    ---------         ----------------------------------
           10         CLARK KING MILLER
           20         ADAMS FORD JONES SCOTT ...
           30         ALLEN BLAKE JAMES MARTIN ... 
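    Outside Oracle, the same "collapse a group into one string" result is usually produced by a string-aggregation function instead of the CONNECT BY trick. A sketch with Python's sqlite3 using GROUP_CONCAT; note that SQLite does not formally guarantee the aggregate's input order, so feeding it from an ordered subquery is a pragmatic workaround, and the hand-loaded rows are illustrative assumptions:

    ```python
    import sqlite3

    rows = [("CLARK", 10), ("KING", 10), ("MILLER", 10),
            ("ALLEN", 30), ("BLAKE", 30), ("JAMES", 30)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?)", rows)

    # One space-separated name list per department, like SYS_CONNECT_BY_PATH.
    scbp = dict(con.execute("""
        SELECT deptno, GROUP_CONCAT(ename, ' ')
          FROM (SELECT deptno, ename FROM emp ORDER BY deptno, ename)
         GROUP BY deptno"""))
    print(scbp)
    ```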


    ROLLUP and RANK Examples
    Looking for a quick, efficient way to summarize the data stored in your database? The SQL ROLLUP and CUBE commands offer a valuable tool for gaining some quick and dirty insight into your data. ROLLUP and CUBE are SQL extensions.
    See http://orafaq.com/node/56

    The ROLLUP operation works on a set of columns you want to group. It behaves like the GROUP BY operation but additionally aggregates a summary row for each group of columns supplied in its clause, from the most detailed level up to a grand total. It is used as an add-on to the GROUP BY clause and has the following basic syntax:
            GROUP BY ROLLUP ([columns of interest separated by commas])

    DDL to use with the examples:
    create table Employee(
      ID                 VARCHAR2(4 BYTE)   NOT NULL,
      First_Name         VARCHAR2(10 BYTE),
      Last_Name          VARCHAR2(10 BYTE),
      Start_Date         DATE,
      End_Date           DATE,
      Salary             Number(8,2),
      City               VARCHAR2(10 BYTE),
      Description        VARCHAR2(15 BYTE)
    )
    /

    -- prepare data
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values ('01','Jason', 'Martin', to_date('19960725','YYYYMMDD'), to_date('20060725','YYYYMMDD'), 1234.56, 'Toronto',  'Programmer');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('02','Alison', 'Mathews',to_date('19760321','YYYYMMDD'), to_date('19860221','YYYYMMDD'), 6661.78, 'Vancouver','Tester');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('03','James',  'Smith',  to_date('19781212','YYYYMMDD'), to_date('19900315','YYYYMMDD'), 6544.78, 'Vancouver','Tester');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('04','Celia',  'Rice',   to_date('19821024','YYYYMMDD'), to_date('19990421','YYYYMMDD'), 2344.78, 'Vancouver','Manager');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('05','Robert', 'Black',  to_date('19840115','YYYYMMDD'), to_date('19980808','YYYYMMDD'), 2334.78, 'Vancouver','Tester');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('06','Linda',  'Green',  to_date('19870730','YYYYMMDD'), to_date('19960104','YYYYMMDD'), 4322.78,'New York',  'Tester');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('07','David',  'Larry',  to_date('19901231','YYYYMMDD'), to_date('19980212','YYYYMMDD'), 7897.78,'New York',  'Manager');
    insert into Employee(ID,  First_Name, Last_Name, Start_Date,End_Date,Salary, City,Description) values('08','James',  'Cat',    to_date('19960917','YYYYMMDD'), to_date('20020415','YYYYMMDD'), 1232.78,'Vancouver', 'Tester');


    -- display data in the table
    select * from Employee;

    ID   FIRST_NAME LAST_NAME  START_DAT END_DATE      SALARY CITY       DESCRIPTION
    ---- ---------- ---------- --------- --------- ---------- ---------- ---------------
    01   Jason      Martin     25-JUL-96 25-JUL-06    1234.56 Toronto    Programmer
    02   Alison     Mathews    21-MAR-76 21-FEB-86    6661.78 Vancouver  Tester
    03   James      Smith      12-DEC-78 15-MAR-90    6544.78 Vancouver  Tester
    04   Celia      Rice       24-OCT-82 21-APR-99    2344.78 Vancouver  Manager
    05   Robert     Black      15-JAN-84 08-AUG-98    2334.78 Vancouver  Tester
    06   Linda      Green      30-JUL-87 04-JAN-96    4322.78 New York   Tester
    07   David      Larry      31-DEC-90 12-FEB-98    7897.78 New York   Manager
    08   James      Cat        17-SEP-96 15-APR-02    1232.78 Vancouver  Tester


    -- Rollup: give the sum on the aggregate; it is used as an add-on to the GROUP BY clause.
    SELECT count(*), city
    FROM employee
    GROUP BY ROLLUP(city);

      COUNT(*) CITY
    ---------- ----------
             2 New York
             1 Toronto
             5 Vancouver
             8
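    SQLite has no ROLLUP, but the extra summary row can be emulated with a UNION ALL of the grouped counts and a grand total. A sketch with Python's sqlite3 over the same eight city values (the in-memory table is an illustrative assumption):

    ```python
    import sqlite3

    rows = [("Toronto",), ("Vancouver",), ("Vancouver",), ("Vancouver",),
            ("Vancouver",), ("New York",), ("New York",), ("Vancouver",)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (city TEXT)")
    con.executemany("INSERT INTO employee VALUES (?)", rows)

    # Per-city counts plus a NULL-labelled grand total, like ROLLUP(city).
    out = dict(con.execute("""
        SELECT city, COUNT(*) FROM employee GROUP BY city
        UNION ALL
        SELECT NULL, COUNT(*) FROM employee"""))
    print(out)
    ```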


    With ROLLUP and ROW_NUMBER added
    SELECT ROW_NUMBER() OVER(ORDER BY city, description) rn,
      count(*), city, description
    FROM employee
    GROUP BY ROLLUP(city, description);

       RN   COUNT(*) CITY       DESCRIPTION
    ----- ---------- ---------- ---------------
        1          1 New York   Manager
        2          1 New York   Tester
        3          2 New York
        4          1 Toronto    Programmer
        5          1 Toronto
        6          1 Vancouver  Manager
        7          4 Vancouver  Tester
        8          5 Vancouver
        9          8
        
    --Passing a Single Column to ROLLUP
    --The ROLLUP clause extends GROUP BY to return a row containing a subtotal for each group along with a total for all groups.
    SELECT city, SUM(salary)
    FROM employee
    GROUP BY city;

    CITY       SUM(SALARY)
    ---------- -----------
    New York      12220.56
    Toronto        1234.56
    Vancouver      19118.9

    --The following query rewrites the previous example to use ROLLUP.

    SELECT city, SUM(salary)
    FROM employee
    GROUP BY ROLLUP(city);

    CITY       SUM(SALARY)
    ---------- -----------
    New York      12220.56
    Toronto        1234.56
    Vancouver      19118.9
                  32574.02

    Changing the Position of Columns Passed to ROLLUP
    SELECT city, description, SUM(salary)
    FROM employee
    GROUP BY ROLLUP(city, description);

    CITY       DESCRIPTION     SUM(SALARY)
    ---------- --------------- -----------
    Toronto    Programmer          1234.56
    Toronto                        1234.56
    New York   Tester              4322.78
    New York   Manager             7897.78
    New York                      12220.56
    Vancouver  Tester             16774.12
    Vancouver  Manager             2344.78
    Vancouver                      19118.9
                                  32574.02

    SELECT city, description, SUM(salary)
    FROM employee
    GROUP BY ROLLUP(description, city);

    CITY       DESCRIPTION     SUM(SALARY)
    ---------- --------------- -----------
    New York   Tester              4322.78
    Vancouver  Tester             16774.12
               Tester              21096.9
    New York   Manager             7897.78
    Vancouver  Manager             2344.78
               Manager            10242.56
    Toronto    Programmer          1234.56
               Programmer          1234.56
                                  32574.02


    Using AVG with ROLLUP
    SELECT city, description, AVG(salary)
    FROM employee
    GROUP BY ROLLUP(city, description);

    CITY       DESCRIPTION     AVG(SALARY)
    ---------- --------------- -----------
    Toronto    Programmer          1234.56
    Toronto                        1234.56
    New York   Tester              4322.78
    New York   Manager             7897.78
    New York                       6110.28
    Vancouver  Tester              4193.53
    Vancouver  Manager             2344.78
    Vancouver                      3823.78
                                 4071.7525


    ROLLUP and RANK() to get the sales rankings by product type ID. First, create and populate a test table:
    CREATE TABLE all_sales (
      year        INTEGER,
      month       INTEGER,
      prd_type_id INTEGER,
      emp_id      INTEGER,
      amount      NUMBER(8, 2)
    );

    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  1, 1, 21, 16034.84);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  2, 1, 21, 15644.65);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  3, 2, 21, 20167.83);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  4, 2, 21, 25056.45);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  5, 2, 21, NULL);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  6, 1, 21, 15564.66);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  7, 1, 21, 15644.65);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  8, 1, 21, 16434.82);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006,  9, 1, 21, 19654.57);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006, 10, 1, 21, 21764.19);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006, 11, 1, 21, 13026.73);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2006, 12, 2, 21, 10034.64);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  1, 2, 22, 16634.84);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  1, 2, 21, 26034.84);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  2, 1, 21, 12644.65);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  3, 1, 21, NULL);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  4, 1, 21, 25026.45);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  5, 1, 21, 17212.66);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  6, 1, 21, 15564.26);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  7, 2, 21, 62654.82);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  8, 2, 21, 26434.82);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005,  9, 2, 21, 15644.65);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005, 10, 2, 21, 21264.19);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005, 11, 1, 21, 13026.73);
    insert into all_sales (year, month, prd_type_id, emp_id, amount) values (2005, 12, 1, 21, 10032.64);


     select * from all_sales;

      YEAR      MONTH PRD_TYPE_ID     EMP_ID     AMOUNT
    ------ ---------- ----------- ---------- ----------
      2006          1           1         21   16034.84
      2006          2           1         21   15644.65
      2006          3           2         21   20167.83
      2006          4           2         21   25056.45
      2006          5           2         21
      2006          6           1         21   15564.66
      2006          7           1         21   15644.65
      2006          8           1         21   16434.82
      2006          9           1         21   19654.57
      2006         10           1         21   21764.19
      2006         11           1         21   13026.73
      2006         12           2         21   10034.64
      2005          1           2         22   16634.84
      2005          1           2         21   26034.84
      2005          2           1         21   12644.65
      2005          3           1         21
      2005          4           1         21   25026.45
      2005          5           1         21   17212.66
      2005          6           1         21   15564.26
      2005          7           2         21   62654.82
      2005          8           2         21   26434.82
      2005          9           2         21   15644.65
      2005         10           2         21   21264.19
      2005         11           1         21   13026.73
      2005         12           1         21   10032.64

    --ROLLUP and RANK() to get the sales rankings by product type ID

    SELECT
     prd_type_id, SUM(amount),
     RANK() OVER (ORDER BY SUM(amount) DESC) AS rank
    FROM all_sales
    GROUP BY ROLLUP(prd_type_id)
    ORDER BY prd_type_id;

    PRD_TYPE_ID SUM(AMOUNT)       RANK
    ----------- ----------- ----------
              1    227276.5          2
              2   223927.08          3
                  451203.58          1


    CUBE
    In addition to the subtotals generated by the ROLLUP extension, the CUBE extension will generate subtotals for all combinations of the dimensions specified. 
    If "n" is the number of columns listed in the CUBE, there will be 2^n possible grouping combinations.
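    Enumerating the grouping combinations CUBE generates is a simple power-set calculation; a quick Python sketch (the column names here are just labels):

```python
from itertools import combinations

def cube_groupings(cols):
    """Every subset of cols, i.e. the grouping combinations CUBE generates."""
    return [subset
            for r in range(len(cols) + 1)
            for subset in combinations(cols, r)]

groupings = cube_groupings(["fact_1_id", "fact_2_id"])
# 2^2 = 4 combinations: (), (fact_1_id,), (fact_2_id,), (fact_1_id, fact_2_id)
print(groupings)
print(len(cube_groupings(["a", "b", "c"])))  # 2^3 = 8
```

    The empty subset corresponds to the grand-total row.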

    Setting Up a Test Table
    DROP TABLE dimension_tab;
    CREATE TABLE dimension_tab (
      fact_1_id   NUMBER NOT NULL,
      fact_2_id   NUMBER NOT NULL,
      fact_3_id   NUMBER NOT NULL,
      fact_4_id   NUMBER NOT NULL,
      sales_value NUMBER(10,2) NOT NULL
    );

    INSERT INTO dimension_tab
    SELECT TRUNC(DBMS_RANDOM.value(low => 1, high => 3)) AS fact_1_id,
           TRUNC(DBMS_RANDOM.value(low => 1, high => 6)) AS fact_2_id,
           TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_3_id,
           TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_4_id,
           ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2) AS sales_value
    FROM   dual
    CONNECT BY level <= 1000;
    COMMIT;

    SELECT fact_1_id, fact_2_id,
           SUM(sales_value) AS sales_value
    FROM   dimension_tab
    GROUP BY CUBE (fact_1_id, fact_2_id)
    ORDER BY fact_1_id, fact_2_id;

    FACT_1_ID  FACT_2_ID SALES_VALUE
    --------- ---------- -----------
            1          1     5806.42
            1          2     4724.82
            1          3     4358.52
            1          4     5049.58
            1          5     4929.04
            1               24868.38
            2          1     5181.96
            2          2     5008.37
            2          3     4856.44
            2          4     4342.02
            2          5     4619.73
            2               24008.52
                       1    10988.38
                       2     9733.19
                       3     9214.96
                       4      9391.6
                       5     9548.77
                             48876.9
    As the number of dimensions increases, so does the number of subtotal combinations that must be calculated:
    SELECT fact_1_id, fact_2_id, fact_3_id,
           SUM(sales_value) AS sales_value
    FROM   dimension_tab
    GROUP BY CUBE (fact_1_id, fact_2_id, fact_3_id)
    ORDER BY fact_1_id, fact_2_id, fact_3_id;

    It is possible to do a partial cube to reduce the number of subtotals calculated. 
    SELECT fact_1_id, fact_2_id, fact_3_id,
           SUM(sales_value) AS sales_value
    FROM   dimension_tab
    GROUP BY fact_1_id, CUBE (fact_2_id, fact_3_id)
    ORDER BY fact_1_id, fact_2_id, fact_3_id;


    GROUPING Functions
    It can be quite easy to visually identify subtotals generated by rollups and cubes, but to do it programmatically you really need something more accurate than the presence of null values in the grouping columns. This is where the GROUPING function comes in. It accepts a single column as a parameter and returns "1" if the column contains a null value generated as part of a subtotal by a ROLLUP or CUBE operation, or "0" for any other value, including stored null values.

    The following query is a repeat of a previous cube, but the GROUPING function has been added for each of the dimensions in the cube.
    SELECT fact_1_id, fact_2_id,
           SUM(sales_value) AS sales_value,
           GROUPING(fact_1_id) AS f1g, 
           GROUPING(fact_2_id) AS f2g
    FROM   dimension_tab
    GROUP BY CUBE (fact_1_id, fact_2_id)
    ORDER BY fact_1_id, fact_2_id;

    FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
    --------- ---------- ----------- ---------- ----------
            1          1     5806.42          0          0
            1          2     4724.82          0          0
            1          3     4358.52          0          0
            1          4     5049.58          0          0
            1          5     4929.04          0          0
            1               24868.38          0          1
            2          1     5181.96          0          0
            2          2     5008.37          0          0
            2          3     4856.44          0          0
            2          4     4342.02          0          0
            2          5     4619.73          0          0
            2               24008.52          0          1
                       1    10988.38          1          0
                       2     9733.19          1          0
                       3     9214.96          1          0
                       4      9391.6          1          0
                       5     9548.77          1          0
                             48876.9          1          1

    From this we can see:
    F1G=0,F2G=0 : Represents a regular row, as we would expect from a plain GROUP BY operation.
    F1G=0,F2G=1 : Represents a row containing a subtotal for a distinct value of the FACT_1_ID column, as generated by ROLLUP and CUBE operations.
    F1G=1,F2G=0 : Represents a row containing a subtotal for a distinct value of the FACT_2_ID column, which we would only see in a CUBE operation.
    F1G=1,F2G=1 : Represents a row containing a grand total for the query, as generated by ROLLUP and CUBE operations.

    It would now be easy to write a program to accurately process the data.
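    The behaviour is easy to model: GROUPING returns 1 exactly when the column was removed by the ROLLUP/CUBE operation for that row, not when a stored value happens to be NULL. A Python sketch using a sentinel object to keep the two cases apart (the sentinel is an illustration device, not how Oracle implements it):

```python
# Marks a column rolled away by ROLLUP/CUBE for a given result row.
ROLLED_UP = object()

def grouping(value):
    """Mimic GROUPING(col): 1 for a subtotal-generated NULL, else 0."""
    return 1 if value is ROLLED_UP else 0

# A stored NULL (None) is NOT a subtotal marker:
print(grouping(None))       # 0
# A column removed by the CUBE for this row is:
print(grouping(ROLLED_UP))  # 1
```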

    The GROUPING values can be used for ordering or filtering results.

    SELECT fact_1_id, fact_2_id,
           SUM(sales_value) AS sales_value,
           GROUPING(fact_1_id) AS f1g, 
           GROUPING(fact_2_id) AS f2g
    FROM   dimension_tab
    GROUP BY CUBE (fact_1_id, fact_2_id)
    HAVING GROUPING(fact_1_id) = 1 OR GROUPING(fact_2_id) = 1
    ORDER BY GROUPING(fact_1_id), GROUPING(fact_2_id);

    FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
    --------- ---------- ----------- ---------- ----------
            1               24868.38          0          1
            2               24008.52          0          1
                       5     9548.77          1          0
                       3     9214.96          1          0
                       2     9733.19          1          0
                       1    10988.38          1          0
                       4      9391.6          1          0
                             48876.9          1          1

    GROUPING_ID

    The GROUPING_ID function provides an alternative, more compact way to identify subtotal rows. Passing the dimension columns as arguments, it returns a single number indicating the GROUP BY level: the GROUPING bit of each argument, read left to right, forms the binary value of the result.

    SELECT fact_1_id, fact_2_id,
           SUM(sales_value) AS sales_value,
           GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
    FROM   dimension_tab
    GROUP BY CUBE (fact_1_id, fact_2_id)
    ORDER BY fact_1_id, fact_2_id;

    FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
    --------- ---------- ----------- -----------
            1          1     5806.42           0
            1          2     4724.82           0
            1          3     4358.52           0
            1          4     5049.58           0
            1          5     4929.04           0
            1               24868.38           1
            2          1     5181.96           0
            2          2     5008.37           0
            2          3     4856.44           0
            2          4     4342.02           0
            2          5     4619.73           0
            2               24008.52           1
                       1    10988.38           2
                       2     9733.19           2
                       3     9214.96           2
                       4      9391.6           2
                       5     9548.77           2
                             48876.9           3
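    The GROUPING_ID values above are simply the individual GROUPING bits packed into one integer, with the first argument as the most significant bit; a minimal sketch:

```python
def grouping_id(*grouping_bits):
    """Pack GROUPING() bits into one integer; first column = most significant bit."""
    gid = 0
    for bit in grouping_bits:
        gid = (gid << 1) | bit
    return gid

# Matches the four levels in the CUBE(fact_1_id, fact_2_id) output above:
print(grouping_id(0, 0))  # 0: regular group row
print(grouping_id(0, 1))  # 1: subtotal per fact_1_id value
print(grouping_id(1, 0))  # 2: subtotal per fact_2_id value
print(grouping_id(1, 1))  # 3: grand total
```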


    GROUP_ID

    It's possible to write queries that return the duplicate subtotals, which can be a little confusing. The GROUP_ID function assigns the value "0" to the first set, and all subsequent sets get assigned a higher number. The following query forces duplicates to show the GROUP_ID function in action.

    SELECT fact_1_id, fact_2_id,
           SUM(sales_value) AS sales_value,
           GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
           GROUP_ID() AS group_id
    FROM   dimension_tab
    GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
    ORDER BY fact_1_id, fact_2_id;

     FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
    ---------- ---------- ----------- ----------- ----------
             1          1     5806.42           0          0
             1          2     4724.82           0          0
             1          3     4358.52           0          0
             1          4     5049.58           0          0
             1          5     4929.04           0          0
             1               24868.38           1          0
             1               24868.38           1          1
             2          1     5181.96           0          0
             2          2     5008.37           0          0
             2          3     4856.44           0          0
             2          4     4342.02           0          0
             2          5     4619.73           0          0
             2               24008.52           1          1
             2               24008.52           1          0
                        1    10988.38           2          0
                        2     9733.19           2          0
                        3     9214.96           2          0
                        4      9391.6           2          0
                        5     9548.77           2          0
                              48876.9           3          0

    If necessary, you could then filter out the duplicates using GROUP_ID in a HAVING clause.

    SELECT fact_1_id, fact_2_id,
           SUM(sales_value) AS sales_value,
           GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
           GROUP_ID() AS group_id
    FROM   dimension_tab
    GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
    HAVING GROUP_ID() = 0
    ORDER BY fact_1_id, fact_2_id;

    FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
    --------- ---------- ----------- ----------- ----------
            1          1     5806.42           0          0
            1          2     4724.82           0          0
            1          3     4358.52           0          0
            1          4     5049.58           0          0
            1          5     4929.04           0          0
            1               24868.38           1          0
            2          1     5181.96           0          0
            2          2     5008.37           0          0
            2          3     4856.44           0          0
            2          4     4342.02           0          0
            2          5     4619.73           0          0
            2               24008.52           1          0
                       1    10988.38           2          0
                       2     9733.19           2          0
                       3     9214.96           2          0
                       4      9391.6           2          0
                       5     9548.77           2          0
                             48876.9           3          0



    GROUPING SETS
    Calculating all possible subtotals in a cube, especially one with many dimensions, can be quite an intensive process. If you don't need all the subtotals, this can represent a considerable amount of wasted effort. The following cube with three dimensions gives 8 levels of subtotals (GROUPING_ID: 0-7):

    SELECT fact_1_id, fact_2_id, fact_3_id,
           SUM(sales_value) AS sales_value,
           GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
    FROM   dimension_tab
    GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
    ORDER BY fact_1_id, fact_2_id, fact_3_id;

    If we only need a few of these levels of subtotaling we can use the GROUPING SETS expression and specify exactly which ones we need, saving us from having to calculate the whole cube. In the following query we are only interested in subtotals for the "FACT_1_ID, FACT_2_ID" and "FACT_1_ID, FACT_3_ID" groups.

    SELECT fact_1_id, fact_2_id, fact_3_id,
           SUM(sales_value) AS sales_value,
           GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
    FROM   dimension_tab
    GROUP BY GROUPING SETS((fact_1_id, fact_2_id), (fact_1_id, fact_3_id))
    ORDER BY fact_1_id, fact_2_id, fact_3_id;

     FACT_1_ID  FACT_2_ID  FACT_3_ID SALES_VALUE GROUPING_ID
    ---------- ---------- ---------- ----------- -----------
             1          1                5806.42           1
             1          2                4724.82           1
             1          3                4358.52           1
             1          4                5049.58           1
             1          5                4929.04           1
             1                     1     2328.63           2
             1                     2     2562.87           2
             1                     3     2576.24           2
             1                     4     2489.73           2
             1                     5     2645.77           2
             1                     6     2795.96           2
             1                     7     2763.93           2
             1                     8     2448.43           2
             1                     9     2237.71           2
             1                    10     2019.11           2
             2          1                5181.96           1
             2          2                5008.37           1
             2          3                4856.44           1
             2          4                4342.02           1
             2          5                4619.73           1
             2                     1     2091.33           2
             2                     2     2299.23           2
             2                     3     2381.08           2
             2                     4     2884.19           2
             2                     5      2704.9           2
             2                     6     2364.08           2
             2                     7     2261.54           2
             2                     8      2582.8           2
             2                     9     2399.91           2
             2                    10     2039.46           2

    Notice how we have gone from returning 198 rows with 8 subtotal levels in the cube to just 30 rows with 2 subtotal levels.
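    The saving is easy to see procedurally: GROUPING SETS aggregates once per listed grouping instead of once per each of the 2^n cube combinations. A Python sketch over a few made-up (fact_1_id, fact_2_id, fact_3_id, sales_value) rows:

```python
from collections import defaultdict

# Illustrative rows only; not the random data generated above.
rows = [
    (1, 1, 1, 10.0), (1, 1, 2, 20.0),
    (1, 2, 1, 5.0),  (2, 1, 1, 7.5),
]

def grouping_sets(rows, sets):
    """SUM(sales_value) computed once per requested grouping set only.

    Each set is a tuple of column indexes to group by.
    """
    out = []
    for cols in sets:
        sums = defaultdict(float)
        for row in rows:
            key = tuple(row[i] for i in cols)
            sums[key] += row[3]
        out.append((cols, dict(sums)))
    return out

# Only (fact_1_id, fact_2_id) and (fact_1_id, fact_3_id): 2 sets, not 2^3 = 8.
result = grouping_sets(rows, [(0, 1), (0, 2)])
print(result)
```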


    More Examples with EMP Table
    @connect scott/tiger


    Analytics running total 

    set linesize 100
    set pagesize 80
    select * from emp order by DEPTNO;
         EMPNO ENAME      JOB              MGR HIREDATE                  SAL       COMM     DEPTNO
    ---------- ---------- --------- ---------- ------------------ ---------- ---------- ----------
          7782 CLARK      MANAGER         7839 09/JUN/81 00:00:00       2450                    10
          7839 KING       PRESIDENT            17/NOV/81 00:00:00       5000                    10
          7934 MILLER     CLERK           7782 23/JAN/82 00:00:00       1300                    10
          7566 JONES      MANAGER         7839 02/APR/81 00:00:00       2975                    20
          7902 FORD       ANALYST         7566 03/DEC/81 00:00:00       3000                    20
          7876 ADAMS      CLERK           7788 23/MAY/87 00:00:00       1100                    20
          7369 SMITH      CLERK           7902 17/DEC/80 00:00:00        800                    20
          7788 SCOTT      ANALYST         7566 19/APR/87 00:00:00       3000                    20
          7521 WARD       SALESMAN        7698 22/FEB/81 00:00:00       1250        500         30
          7844 TURNER     SALESMAN        7698 08/SEP/81 00:00:00       1500          0         30
          7499 ALLEN      SALESMAN        7698 20/FEB/81 00:00:00       1600        300         30
          7900 JAMES      CLERK           7698 03/DEC/81 00:00:00        950                    30
          7698 BLAKE      MANAGER         7839 01/MAY/81 00:00:00       2850                    30
          7654 MARTIN     SALESMAN        7698 28/SEP/81 00:00:00       1250       1400         30
          
    Getting a running total of SAL within each DEPTNO
    set echo on
    break on deptno skip 1
    Select deptno, ename, sal, 
           sum(sal) over (partition by deptno order by sal) running_total1,
           sum(sal) over (partition by deptno order by sal, rowid) running_total2
      from emp order by deptno, sal;

        DEPTNO ENAME             SAL RUNNING_TOTAL1 RUNNING_TOTAL2
    ---------- ---------- ---------- -------------- --------------
            10 MILLER           1300           1300           1300
               CLARK            2450           3750           3750
               KING             5000           8750           8750

            20 SMITH             800            800            800
               ADAMS            1100           1900           1900
               JONES            2975           4875           4875
               SCOTT            3000          10875           7875
               FORD             3000          10875          10875

            30 JAMES             950            950            950
               WARD             1250           3450           2200
               MARTIN           1250           3450           3450
               TURNER           1500           4950           4950
               ALLEN            1600           6550           6550
               BLAKE            2850           9400           9400
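    The difference between the two running totals comes from the default window: with only ORDER BY sal, the window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows tied on sal (SCOTT and FORD, WARD and MARTIN) are peers and share a total; adding rowid to the ORDER BY makes every row its own peer group. A Python sketch of the two behaviours for department 20 (salaries taken from the output above):

```python
# Sample (deptno, ename, sal) rows for department 20.
rows = [(20, "SMITH", 800), (20, "ADAMS", 1100), (20, "JONES", 2975),
        (20, "SCOTT", 3000), (20, "FORD", 3000)]

def running_totals(rows):
    """Per row, ordered by sal: (ename, RANGE-style total, ROWS-style total)."""
    ordered = sorted(rows, key=lambda r: r[2])
    sals = [r[2] for r in ordered]
    out, running = [], 0
    for i, sal in enumerate(sals):
        running += sal                               # ROWS: strictly row by row
        range_total = sum(s for s in sals if s <= sal)  # RANGE: peers included
        out.append((ordered[i][1], range_total, running))
    return out

for ename, range_total, rows_total in running_totals(rows):
    print(ename, range_total, rows_total)
```

    SCOTT comes out as (10875, 7875) and FORD as (10875, 10875), matching RUNNING_TOTAL1 and RUNNING_TOTAL2 above.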


    Analytics Percentages within a group 

    break on deptno skip 1
    select deptno, ename, sal, 
           to_char( round( ratio_to_report(sal) over (partition by deptno) *100, 2 ), '990.00' )||'%' rtr
      from emp;
        DEPTNO ENAME             SAL RTR
    ---------- ---------- ---------- --------
            10 CLARK            2450   28.00%
               KING             5000   57.14%
               MILLER           1300   14.86%

            20 JONES            2975   27.36%
               FORD             3000   27.59%
               ADAMS            1100   10.11%
               SMITH             800    7.36%
               SCOTT            3000   27.59%

            30 WARD             1250   13.30%
               TURNER           1500   15.96%
               ALLEN            1600   17.02%
               JAMES             950   10.11%
               BLAKE            2850   30.32%
               MARTIN           1250   13.30%
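    RATIO_TO_REPORT divides each row's value by the sum of that value over its partition; the equivalent arithmetic in Python (department 10 figures from above):

```python
# Sample (deptno, ename, sal) rows for department 10.
rows = [(10, "CLARK", 2450), (10, "KING", 5000), (10, "MILLER", 1300)]

def ratio_to_report(rows):
    """Each salary as a rounded percentage of its department's total."""
    totals = {}
    for deptno, _, sal in rows:
        totals[deptno] = totals.get(deptno, 0) + sal
    return [(ename, round(sal / totals[deptno] * 100, 2))
            for deptno, ename, sal in rows]

print(ratio_to_report(rows))
```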

    Analytics Top-N queries 
    set echo on
    break on deptno skip 1
    select deptno, ename, sal, rn
      from (Select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) rn
              from emp)
      where rn <= 3;
        DEPTNO ENAME             SAL         RN
    ---------- ---------- ---------- ----------
            10 KING             5000          1
               CLARK            2450          2
               MILLER           1300          3

            20 SCOTT            3000          1
               FORD             3000          2
               JONES            2975          3

            30 BLAKE            2850          1
               ALLEN            1600          2
               TURNER           1500          3
               
               
    select deptno, ename, sal, rank
      from ( Select deptno, ename, sal,  rank() over (partition by deptno order by sal desc) rank
               from emp )
      where rank <= 3;
        DEPTNO ENAME             SAL       RANK
    ---------- ---------- ---------- ----------
            10 KING             5000          1
               CLARK            2450          2
               MILLER           1300          3

            20 SCOTT            3000          1
               FORD             3000          1
               JONES            2975          3

            30 BLAKE            2850          1
               ALLEN            1600          2
               TURNER           1500          3


    select deptno, ename, sal, dr
      from ( Select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) dr
              from emp )
      where dr <= 3;
        DEPTNO ENAME             SAL         DR
    ---------- ---------- ---------- ----------
            10 KING             5000          1
               CLARK            2450          2
               MILLER           1300          3

            20 SCOTT            3000          1
               FORD             3000          1
               JONES            2975          2
               ADAMS            1100          3

            30 BLAKE            2850          1
               ALLEN            1600          2
               TURNER           1500          3
        

    Analytics Moving Averages 

    column last_5 format a30
    set echo on
    Select ename, sal, 
           round( avg(sal) over (order by sal rows 5 preceding) ) avg_sal,
           rtrim( lag(sal) over (order by sal) || ',' ||
                  lag(sal,2) over (order by sal) || ',' ||
                  lag(sal,3) over (order by sal) || ',' ||
                  lag(sal,4) over (order by sal) || ',' ||
                  lag(sal,5) over (order by sal),',') last_5
      from emp 
      order by sal;
    ENAME             SAL    AVG_SAL LAST_5
    ---------- ---------- ---------- ------------------------------
    SMITH             800        800
    JAMES             950        875 800
    ADAMS            1100        950 950,800
    WARD             1250       1025 1100,950,800
    MARTIN           1250       1070 1250,1100,950,800
    MILLER           1300       1108 1250,1250,1100,950,800
    TURNER           1500       1225 1300,1250,1250,1100,950
    ALLEN            1600       1333 1500,1300,1250,1250,1100
    CLARK            2450       1558 1600,1500,1300,1250,1250
    BLAKE            2850       1825 2450,1600,1500,1300,1250
    JONES            2975       2113 2850,2450,1600,1500,1300
    SCOTT            3000       2396 2975,2850,2450,1600,1500
    FORD             3000       2646 3000,2975,2850,2450,1600
    KING             5000       3213 3000,3000,2975,2850,2450


    select round( (3000+3000+2975+2850+2450+5000)/6 ) from dual;
    ROUND((3000+3000+2975+2850+2450+5000)/6)
    ----------------------------------------
                                        3213
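    The "rows 5 preceding" window and LAG are straightforward to model over an ordered list. One subtlety worth a sketch: Oracle's ROUND rounds halves away from zero, while Python's built-in round is half-to-even, which matters for exactly the 3212.5 case verified above:

```python
import math

def oracle_round(x):
    """Oracle-style ROUND: halves go away from zero (round(3212.5) in Python is 3212)."""
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

# Salaries in ascending order, taken from the EMP output above.
sals = [800, 950, 1100, 1250, 1250, 1300, 1500, 1600,
        2450, 2850, 2975, 3000, 3000, 5000]

def moving_avg(values, preceding=5):
    """AVG(...) OVER (ORDER BY ... ROWS n PRECEDING): current row plus up to n before it."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        out.append(oracle_round(sum(window) / len(window)))
    return out

def lag(values, offset=1):
    """LAG(v, offset): the value offset rows back, None (NULL) at the start."""
    return [values[i - offset] if i >= offset else None
            for i in range(len(values))]

print(moving_avg(sals)[-1])  # 3213, KING's AVG_SAL in the output above
```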


    Analytics Ranking Queries 

    set echo on
    break on deptno skip 1
    Select deptno, ename, sal, 
           rank() over ( partition by deptno order by sal desc ) r,
           dense_rank() over ( partition by deptno order by sal desc ) dr,
           row_number() over ( partition by deptno order by sal desc ) rn
      from emp 
      order by deptno, sal desc;
        DEPTNO ENAME             SAL          R         DR         RN
    ---------- ---------- ---------- ---------- ---------- ----------
            10 KING             5000          1          1          1
               CLARK            2450          2          2          2
               MILLER           1300          3          3          3

            20 SCOTT            3000          1          1          1
               FORD             3000          1          1          2
               JONES            2975          3          2          3
               ADAMS            1100          4          3          4
               SMITH             800          5          4          5

            30 BLAKE            2850          1          1          1
               ALLEN            1600          2          2          2
               TURNER           1500          3          3          3
               MARTIN           1250          4          4          4
               WARD             1250          4          4          5
               JAMES             950          6          5          6
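    The three functions differ only in how they treat ties, which a short sketch makes explicit (one partition shown: department 20's salaries from above, where SCOTT and FORD tie at 3000):

```python
# Department 20 salaries, already sorted descending.
sals = [3000, 3000, 2975, 1100, 800]

def rankings(values):
    """Return (rank, dense_rank, row_number) for each value in sorted order."""
    out, dense = [], 0
    for i, v in enumerate(values):
        if i == 0 or v != values[i - 1]:
            rank = i + 1          # RANK: position restarts after a tie, leaving gaps
            dense += 1            # DENSE_RANK: increments by 1, no gaps
        out.append((rank, dense, i + 1))  # ROW_NUMBER: always unique
    return out

print(rankings(sals))
```

    This reproduces the R, DR and RN columns for DEPTNO 20 above: ranks (1, 1, 3, 4, 5), dense ranks (1, 1, 2, 3, 4), row numbers (1, 2, 3, 4, 5).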

    Source: http://pafumi.net/Analytic_Functions.html

  • Original post: https://www.cnblogs.com/lhdz_bj/p/9777354.html