Oracle 10g 增加了一个非常有用的 group 函数 ----- COLLECT, 并且在11g中得到了进一步加强。
这个函数可以用来实现“String Aggregation” 作用, 即把同一个group中的多行数据转成一行(以collection形式出现)。记得曾在园子中回答过一个SQL问题,见这里, 当时用到的是sys_connect_by_path,但是这种方法性能很不好,在下文中会提到。如果用COLLECT来做,性能会得到很大提升!
首先来看下Oracle 10g中的COLLECT函数,
1. The Collect Function in 10g. (参见这里)
(1) 首先来看看一个例子。
[默认schema scott是被锁住的,需要首先解锁!]
SQL> alter user scott account unlock
SQL> conn scott/scott
Connected.
SQL> select deptno,
2 COLLECT(ename) as emps
3 from emp
4 group by
5 deptno;
DEPTNO EMPS
---------- -------------------------------------------------------------------------------------
10 SYSTP1Nm4rC2OTHGbil8G7f3Adg==('CLARK', 'KING', 'MILLER')
20 SYSTP1Nm4rC2OTHGbil8G7f3Adg==('SMITH', 'FORD', 'ADAMS', 'SCOTT', 'JONES')
30 SYSTP1Nm4rC2OTHGbil8G7f3Adg==('ALLEN', 'BLAKE', 'MARTIN', 'TURNER', 'JAMES', 'WARD')
SQL>
注意到返回的结果透漏出Oracle创建了一个类型 “SYSTP1Nm4rC2OTHGbil8G7f3Adg==”.
在10g R1, 和 10g R2中这个类型存在的schema是不同的,如下所示,
-- Oracle 10.1
SQL> SELECT owner 2 , typecode 3 FROM all_types 4 WHERE type_name = 'SYSTP1Nm4rC2OTHGbil8G7f3Adg=='; OWNER TYPECODE ------------------------------ ------------------------------ SYS COLLECTION 1 row selected.
-- Oracle 10.2
SQL> select owner,
2 typecode
3 from all_types
4 where type_name = 'SYSTP1Nm4rC2OTHGbil8G7f3Adg==';
OWNER TYPECODE
-------------------- ------------------------------
SCOTT COLLECTION
正是因为10.1中这个类型是在SYS下面的,因此我们是没法直接用这个类型的,但是在10.2中,我们是可以直接用系统产生的这个类型的,如下,
SQL> select *
2 from
3 TABLE("SYSTP1Nm4rC2OTHGbil8G7f3Adg=="('A', 'B', 'C'));
COLUMN_VAL
----------
A
B
C
SQL>
(2) 可否用我们自己定义的类型呢?
答案是肯定的!
SQL> create or replace type varchar2_ntt as table of varchar2(4000);
2 /
Type created.
SQL> select deptno,
2 CAST(COLLECT(ename) AS varchar2_ntt) as emps
3 from emp
4 group by deptno;
DEPTNO EMPS
---------- -----------------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('SMITH', 'FORD', 'ADAMS', 'SCOTT', 'JONES')
30 VARCHAR2_NTT('ALLEN', 'BLAKE', 'MARTIN', 'TURNER', 'JAMES', 'WARD')
得到的结果里显示的是我们自己定义的类型—VARCHAR2_NTT (nested table type)
但是要注意的是,如果需要处理的数据类型是NUMBER的,要求自定义的类型和待处理的数据类型的精度必须一样。 如下所示:
SQL> create or replace type number_ntt as table of number;
2 /
Type created.
SQL> select deptno,
2 CAST(COLLECT(sal) as number_ntt) as sals
3 from emp
4 group by deptno;
CAST(COLLECT(sal) as number_ntt) as sals
*
CAST(COLLECT(sal) as number_ntt) as sals
*
ERROR at line 2:
ORA-22814: attribute or element value is larger than specified in type
SQL> desc emp;
Name
Null? Type
--------------------------------------------------------------------------------
EMPNO
NOT NULL NUMBER(4)
ENAME
VARCHAR2(10)
JOB
VARCHAR2(9)
MGR
NUMBER(4)
HIREDATE
DATE
SAL
NUMBER(7,2)
COMM
NUMBER(7,2)
DEPTNO
NUMBER(2)
可以看到SAL的类型是NUMBER(7, 2) 而不是NUMBER, 所以出现问题。解决这个问题有两种方法,要么重新定义类型,要么把SAL转成NUMBER类型。
SQL> select deptno,
2 CAST(COLLECT(CAST(sal as NUMBER)) as number_ntt) as sals
3 from emp
4 group by
5 deptno;
DEPTNO SALS
---------- ----------------------------------------------------------------------
10 NUMBER_NTT(2450, 5000, 1300)
20 NUMBER_NTT(800, 3000, 1100, 3000, 2975)
30 NUMBER_NTT(1600, 2850, 1250, 1500, 950, 1250)
SQL>
SQL> create type number_7_2_ntt as table of number(7, 2);
2 /
Type created.
SQL> select deptno,
2 CAST(COLLECT(sal) as number_7_2_ntt) as sals
3 from emp
4 group by deptno;
DEPTNO SALS
---------- -------------------------------------------------------------
10 NUMBER_7_2_NTT(2450, 5000, 1300)
20 NUMBER_7_2_NTT(800, 3000, 1100, 3000, 2975)
30 NUMBER_7_2_NTT(1600, 2850, 1250, 1500, 950, 1250)
SQL>
(3) String Aggregation using COLLECT
开头提到,可以用COLLECT来实现string aggregation, 那么如何做呢? 既然可以通过COLLECT函数将同一个GROUP的数据放在了一个自定义的集合中,剩下的只是我们对该集合做操作,把其中的数据取出来拼成字符串即可。
首先,需要定义一个FUNCTION – TO_STRING() 来处理自定义集合类型中的数据,
SQL>
1 create or replace function to_string(nt_in in varchar2_ntt,
2 delimiter_in in varchar2 default ',')
3 return varchar2 is
4 v_idx pls_integer;
5 v_str varchar2(32767);
6 v_dlm varchar2(10);
7 begin
8 v_idx := nt_in.FIRST;
9 while v_idx is not null
10 loop
11 v_str := v_str || v_dlm || nt_in(v_idx);
12 v_dlm := delimiter_in;
13 v_idx := nt_in.NEXT(v_idx);
14 end loop;
15 return v_str;
16* end to_string;
SQL> /
Function created.
接下来就可以在SQL语句中直接用了,
SQL> select deptno,
2 to_string(CAST(collect(ename) as varchar2_ntt)) as emps
3 from emp
4 group by deptno;
DEPTNO EMPS
---------- ---------------------------------------------------------------
10 CLARK,KING,MILLER
20 SMITH,FORD,ADAMS,SCOTT,JONES
30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD
(4) 几种String Aggregation方法性能比较
除了刚才提到的用COLLECT来做String Aggregation,还有个比较有名的方法就是Tom的STRAGG函数,如下, 这个其实就是个自定义的聚集函数,实现了Oracle Data Cartridge Interface中的几个 routines, 具体做法可以参见另一篇blog,
Tom Kyte’s STRAGG
scott@ORA9I.WORLD> create or replace type string_agg_type as object
2 (
3 total varchar2(4000),
4
5 static function
6 ODCIAggregateInitialize(sctx IN OUT string_agg_type )
7 return number,
8
9 member function
10 ODCIAggregateIterate(self IN OUT string_agg_type ,
11 value IN varchar2 )
12 return number,
13
14 member function
15 ODCIAggregateTerminate(self IN string_agg_type,
16 returnValue OUT varchar2,
17 flags IN number)
18 return number,
19
20 member function
21 ODCIAggregateMerge(self IN OUT string_agg_type,
22 ctx2 IN string_agg_type)
23 return number
24 );
25 /
Type created.
scott@ORA9I.WORLD>
scott@ORA9I.WORLD> create or replace type body string_agg_type
2 is
3
4 static function ODCIAggregateInitialize(sctx IN OUT string_agg_type)
5 return number
6 is
7 begin
8 sctx := string_agg_type( null );
9 return ODCIConst.Success;
10 end;
11
12 member function ODCIAggregateIterate(self IN OUT string_agg_type,
13 value IN varchar2 )
14 return number
15 is
16 begin
17 self.total := self.total || ',' || value;
18 return ODCIConst.Success;
19 end;
20
21 member function ODCIAggregateTerminate(self IN string_agg_type,
22 returnValue OUT varchar2,
23 flags IN number)
24 return number
25 is
26 begin
27 returnValue := ltrim(self.total,',');
28 return ODCIConst.Success;
29 end;
30
31 member function ODCIAggregateMerge(self IN OUT string_agg_type,
32 ctx2 IN string_agg_type)
33 return number
34 is
35 begin
36 self.total := self.total || ctx2.total;
37 return ODCIConst.Success;
38 end;
39
40
41 end;
42 /
Type body created.
scott@ORA9I.WORLD>
scott@ORA9I.WORLD> CREATE or replace
2 FUNCTION stragg(input varchar2 )
3 RETURN varchar2
4 PARALLEL_ENABLE AGGREGATE USING string_agg_type;
5 /
Function created.
另外还有一种方法就是用oracle 10g增加的sys_connect_by_path来做,但是性能不好。
首先来准备测试数据,创建一个数据量是3倍的DBA_OBJECTS的表T。
SQL> drop table t;
Table dropped.
SQL> create table t
2 as select MOD(rownum, 100) as id,
3 CAST('A' as VARCHAR2(1)) as val
4 from dba_objects,
5 TABLE(varchar2_ntt('A','B', 'C'));
Table created.
SQL> select count(*) from t;
COUNT(*)
----------
188148
SQL> exec dbms_stats.gather_table_stats(user, 'T');
PL/SQL procedure successfully completed.
SQL>
设置autotrace来获取性能信息,
SQL> set autotrace traceonly statistics
SQL> set timing on
-- 1. STRAGG
SQL> select id,
2 STRAGG(val) as vals
3 from t
4 group by id;
100 rows selected.
Elapsed: 00:00:05.10
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
393 consistent gets
0 physical reads
0 redo size
13379 bytes sent via SQL*Net to client
462 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
100 rows processed
SQL>
时间大约是5秒钟
-- 2. COLLECT
SQL> select id,
2 to_string(cast(collect(val) as varchar2_ntt)) as vals
3 from t
4 group by id;
100 rows selected.
Elapsed: 00:00:01.31
Statistics
----------------------------------------------------------
436 recursive calls
110 db block gets
521 consistent gets
1 physical reads
20392 redo size
13379 bytes sent via SQL*Net to client
462 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
17 sorts (memory)
0 sorts (disk)
100 rows processed
时间大约是1秒半,比STRAGG要好很多(虽然consistent gets, recursive calls多很多)。
之所以COLLECT比STRAGG性能好很多,主要在于前者的context switch的次数很好,只是在调用to_string的时候才发生(100次), 而STRAGG对每条记录都要发生一次context switch (188148次)。
-- 3. SYS_CONNECT_BY_PATH
SQL> select id,
2 max(vals) as vals
3 from
4 (
5 select id,
6 ltrim(sys_connect_by_path(val, ','), ',') as vals
7 from
8 ( select id,
9 val,
10 row_number() over (partition by id order by val) rn
11 from t
12 )
13 start with rn = 1
14 connect by prior id = id and prior rn + 1 = rn
15 )
16 group by id
17 ;
100 rows selected.
Elapsed: 00:24:08.54
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
547953 consistent gets
0 physical reads
0 redo size
62507 bytes sent via SQL*Net to client
462 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
3767 sorts (memory)
0 sorts (disk)
100 rows processed
SQL>
时间超过24分钟!!! 而且consistent gets 非常之多。
所以,还是用COLLECT来做String Aggregation性能最好!
2. COLLECT enhancements in 11g R2. (参见这里)
(1) Order collection elemetns
虽然SYS_CONNECT_BY_PATH性能很差,但是有一点还是好的嘛~ 那就是可以进行排序,最后拼出来的字符串可以是经过排序的。那么COLLECT支持排序吗?
答案是肯定的! 注意这里面讲的是Oracle 11.2对COLLECT的增强,但是其实关于这个排序10g也是支持的,但是其他的比如Distinct等只有在Oracle 11.2之后才有。
SQL> select * from v$version where rownum = 1;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
Elapsed: 00:00:00.00
SQL> conn scott/scott
Connected.
SQL> select deptno,
2 CAST(COLLECT(ename ORDER BY ename)
3 AS varchar2_ntt) as ordered_emps
4 from emp
5 group by deptno;
DEPTNO ORDERED_EMPS
---------- -----------------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('ADAMS', 'FORD', 'JONES', 'SCOTT', 'SMITH')
30 VARCHAR2_NTT('ALLEN', 'BLAKE', 'JAMES', 'MARTIN', 'TURNER', 'WARD')
Elapsed: 00:00:00.01
SQL>
当然也可以根据其他标准来排序,
SQL> select deptno,
2 CAST(COLLECT(ename ORDER BY hiredate)
3 AS varchar2_ntt) as ordered_emps
4 from emp
5 group by deptno;
DEPTNO ORDERED_EMPS
---------- -----------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('SMITH', 'JONES', 'FORD', 'SCOTT', 'ADAMS')
30 VARCHAR2_NTT('ALLEN', 'WARD', 'BLAKE', 'TURNER', 'MARTIN', 'JAMES')
Elapsed: 00:00:00.02
(2) Distinct Collection elements
注意DISTICNT/UNIQUE只在Oracle11.2之后才起作用,
SQL> select * from v$version where rownum = 1;
BANNER
-----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
SQL> select deptno,
2 CAST(COLLECT(distinct job)
3 as varchar2_ntt) as distinct_jobs
4 from emp
5 group by
6 deptno;
DEPTNO DISTINCT_JOBS
---------- --------------------------------------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'PRESIDENT', 'CLERK')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'MANAGER', 'SALESMAN', 'SALESMAN', 'CLERK', 'SALESMAN')
Elapsed: 00:00:00.03
SQL>
SQL> select deptno,
2 CAST(COLLECT(UNIQUE job) as varchar2_ntt) as distinct_jobs
3 from emp
4 group by deptno;
DEPTNO DISTINCT_JOBS
---------- --------------------------------------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'PRESIDENT', 'CLERK')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'MANAGER', 'SALESMAN', 'SALESMAN', 'CLERK', 'SALESMAN')
Elapsed: 00:00:00.03
可以看出虽然用DISTINCT/UNIQUE,oracle并没有报错,但是结果显示并没有显示出distinct的结果, 因为当前的Oracle版本不是11.2。
那么对于11.2之前的版本,有没有什么workaround呢? 答案是肯定的! 可以用MULTISET函数和操作符。注意,这只对nested tables type (不是VARRAY) 起作用。
SQL> SELECT deptno,
2 SET(
3 CAST(
4 COLLECT(job)
5 AS varchar2_ntt)) as distinct_jobs
6 from emp
7 GROUP BY deptno;
DEPTNO DISTINCT_JOBS
---------- -------------------------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'PRESIDENT', 'CLERK')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'MANAGER', 'CLERK')
SQL>
这里面用了SET函数,它用来祛除集合中的重复数据。注意SET必须放在CAST外面,因为SET不能作用于系统创建的类型。