zoukankan      html  css  js  c++  java
  • 在Oracle中如何利用Rowid查找和删除表中的重复记录

    平时工作中可能会遇到当试图对库表中的某一列或几列创建唯一索引时,系统提示 ORA-01452 :不能创建唯一索引,发现重复记录。

    下面总结一下几种查找和删除重复记录的方法(以表CZ为例):
    表CZ的结构如下:
    SQL> desc cz
    Name Null? Type
    ----------------------------------------- -------- ------------------

    C1 NUMBER(10)
    C10 NUMBER(5)
    C20 VARCHAR2(3)

    删除重复记录的方法原理:
    (1).在Oracle中,每一条记录都有一个rowid,rowid在整个数据库中是唯一的,rowid确定了每条记录是在Oracle中的哪一个数据文件、块、行上。

    (2).在重复的记录中,可能所有列的内容都相同,但rowid不会相同,所以只要确定出重复记录中那些具有最大rowid的就可以了,其余全部删除。

    重复记录判断的标准是:
    C1,C10和C20这三列的值都相同才算是重复记录。

    经查看表CZ总共有16条记录:
    SQL>set pagesize 100
    SQL>select * from cz;

    C1 C10 C20
    ---------- ---------- ---
    1 2 dsf
    1 2 dsf
    1 2 dsf
    1 2 dsf
    2 3 che
    1 2 dsf
    1 2 dsf
    1 2 dsf
    1 2 dsf
    2 3 che
    2 3 che
    2 3 che
    2 3 che
    3 4 dff
    3 4 dff
    3 4 dff
    4 5 err
    5 3 dar
    6 1 wee
    7 2 zxc

    20 rows selected.

    1.查找重复记录的几种方法:
    (1).SQL>select * from cz group by c1,c10,c20 having count(*) >1;
    C1 C10 C20
    ---------- ---------- ---
    1 2 dsf
    2 3 che
    3 4 dff

    (2).SQL>select distinct * from cz;

    C1 C10 C20
    ---------- ---------- ---
    1 2 dsf
    2 3 che
    3 4 dff

    (3).SQL>select * from cz a where rowid=(select max(rowid) from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);
    C1 C10 C20
    ---------- ---------- ---
    1 2 dsf
    2 3 che
    3 4 dff

    2.删除重复记录的几种方法:
    (1).适用于有大量重复记录的情况(在C1,C10和C20列上建有索引的时候,用以下语句效率会很高):
    SQL>delete cz where (c1,c10,c20) in (select c1,c10,c20 from cz group by c1,c10,c20 having count(*)>1) and rowid not in
    (select min(rowid) from cz group by c1,c10,c20 having count(*)>1);

    SQL>delete cz where rowid not in(select min(rowid) from cz group by c1,c10,c20);

    (2).适用于有少量重复记录的情况(注意,对于有大量重复记录的情况,用以下语句效率会很低):
    SQL>delete from cz a where a.rowid!=(select max(rowid) from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);

    SQL>delete from cz a where a.rowid<(select max(rowid) from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);

    SQL>delete from cz a where rowid <(select max(rowid) from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);

    (3).适用于有少量重复记录的情况(临时表法):
    SQL>create table test as select distinct * from cz; (建一个临时表test用来存放重复的记录)

    SQL>truncate table cz; (清空cz表的数据,但保留cz表的结构)

    SQL>insert into cz select * from test; (再将临时表test里的内容反插回来)

    (4).适用于有大量重复记录的情况(Exception into 子句法):
    采用alter table 命令中的 Exception into 子句也可以确定出库表中重复的记录。这种方法稍微麻烦一些,为了使用“excepeion into ”子句,必须首先创建 EXCEPTIONS 表。创建该表的 SQL 脚本文件为 utlexcpt.sql 。对于win2000系统和 UNIX 系统, Oracle 存放该文件的位置稍有不同,在win2000系统下,该脚本文件存放在$ORACLE_HOMEOra90rdbmsadmin 目录下;而对于 UNIX 系统,该脚本文件存放在$ORACLE_HOME/rdbms/admin 目录下。

    具体步骤如下:
    SQL>@?/rdbms/admin/utlexcpt.sql

    Table created.

    SQL>desc exceptions
    Name Null? Type
    ----------------------------------------- -------- --------------

    ROW_ID ROWID
    OWNER VARCHAR2(30)
    TABLE_NAME VARCHAR2(30)
    CONSTRAINT VARCHAR2(30)

    SQL>alter table cz add constraint cz_unique unique(c1,c10,c20) exceptions into exceptions;
    *
    ERROR at line 1:
    ORA-02299: cannot validate (TEST.CZ_UNIQUE) - duplicate keys found

    SQL>create table dups as select * from cz where rowid in (select row_id from exceptions);

    Table created.

    SQL>select * from dups;

    C1 C10 C20
    ---------- ---------- ---
    1 2 dsf
    1 2 dsf
    1 2 dsf
    1 2 dsf
    2 3 che
    1 2 dsf
    1 2 dsf
    1 2 dsf
    1 2 dsf
    2 3 che
    2 3 che
    2 3 che
    2 3 che
    3 4 dff
    3 4 dff
    3 4 dff

    16 rows selected.

    SQL>select row_id from exceptions;

    ROW_ID
    ------------------
    AAAHD/AAIAAAADSAAA
    AAAHD/AAIAAAADSAAB
    AAAHD/AAIAAAADSAAC
    AAAHD/AAIAAAADSAAF
    AAAHD/AAIAAAADSAAH
    AAAHD/AAIAAAADSAAI
    AAAHD/AAIAAAADSAAG
    AAAHD/AAIAAAADSAAD
    AAAHD/AAIAAAADSAAE
    AAAHD/AAIAAAADSAAJ
    AAAHD/AAIAAAADSAAK
    AAAHD/AAIAAAADSAAL
    AAAHD/AAIAAAADSAAM
    AAAHD/AAIAAAADSAAN
    AAAHD/AAIAAAADSAAO
    AAAHD/AAIAAAADSAAP

    16 rows selected.

    SQL>delete from cz where rowid in ( select row_id from exceptions);

    16 rows deleted.

    SQL>insert into cz select distinct * from dups;

    3 rows created.

    SQL>select *from cz;

    C1 C10 C20
    ---------- ---------- ---
    1 2 dsf
    2 3 che
    3 4 dff
    4 5 err
    5 3 dar
    6 1 wee
    7 2 zxc

    7 rows selected.

    从结果里可以看到重复记录已经删除。

  • 相关阅读:
    真正的e时代
    在线手册
    UVA 10616 Divisible Group Sums
    UVA 10721 Bar Codes
    UVA 10205 Stack 'em Up
    UVA 10247 Complete Tree Labeling
    UVA 10081 Tight Words
    UVA 11125 Arrange Some Marbles
    UVA 10128 Queue
    UVA 10912 Simple Minded Hashing
  • 原文地址:https://www.cnblogs.com/adodo1/p/4328142.html
Copyright © 2011-2022 走看看