zoukankan      html  css  js  c++  java
  • ORACLE 删除重复的数据

     
     
     
     
    查询及删除重复记录的SQL语句
     
    1、查找表中多余的重复记录,重复记录是根据单个字段(Id)来判断
     
    select Id from 表 group byId having count(Id) > 1  --(查找表中那个字段是重复的)
     
    select * from 表 where Id in (select Id from 表 group byId having count(Id) > 1)   --(根据查出来的重复字段数据,查询相关的所有记录)
     
    2、删除表中多余的重复记录,重复记录是根据单个字段(Id)来判断,只留有rowid最小的记录
     
    DELETE from 表 WHERE (id) IN ( SELECT id FROM 表 GROUP BY id HAVING COUNT(id) > 1) AND ROWID NOT IN (SELECT MIN(ROWID) FROM 表 GROUP BY id HAVING COUNT(*) > 1);
     
    根据重复数据的字段判断,删除掉多余的数据,只保留ROWID (行数)最小的记录
     
     
     
    3、查找表中多余的重复记录(多个字段)
     
    select * from 表 a where (a.Id,a.seq) in(select Id,seq from 表 group by Id,seq having count(*) > 1)
     
    4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录
     
    delete from 表 a where (a.Id,a.seq) in (select Id,seq from 表 group by Id,seq having count(*) > 1) and rowid not in (select min(rowid) from 表 group by Id,seq having count(*)>1)
     
    5、查找表中多余的重复记录(多个字段),不包含rowid最小的记录
     
    select * from 表 a where (a.Id,a.seq) in (select Id,seq from 表 group by Id,seq having count(*) > 1) and rowid not in (select min(rowid) from 表 group by Id,seq having count(*)>1)
     
     

    一:重复数据根据单个字段进行判断

    1、首先,查询表中多余的数据,由关键字段(name)来查询。

    select * from OA_ADDRESS_BOOK where name in (select name from OA_ADDRESS_BOOK group by name having count(name)>1)

    2、删除表中重复数据,重复数据是根据单个字段(Name)来判断,只留有rowid最小的记录

    delete from OA_ADDRESS_BOOK where (Name) in 

    (select Name from OA_ADDRESS_BOOK group by Name having count(Name) >1) 

    and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name having count(Name)>1)

    二:重复数据根据多个字段进行判断

    1、首先,查询表中重复数据,由关键字段(Name,UNIT_ID)来查询。

    select * from OA_ADDRESS_BOOK book1 where (book1.name,book1.unit_id) in 
    (select book2.name,book2.unit_id from OA_ADDRESS_BOOK book2 group by  book2.name,book2.unit_id  having count(*)>1)

    2、删除表中重复数据,重复数据是根据多个字段(Name,UNIT_ID)来判断,只留有rowid最小的记录

    delete from OA_ADDRESS_BOOK a where (a.Name,a.UNIT_ID) in 
    (select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1) 
    and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)

    3、查询表中重复数据,重复数据是根据多个字段(Name,UNIT_ID)来判断,不包含rowid最小的记录
     
    select name,unit_id from OA_ADDRESS_BOOK a where (a.Name,a.UNIT_ID) in 
    (select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1) 
    and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
     
     

    1. 问题描述

    BBSCOMMENT表为BBSDETAIL的从表,记录商户评价信息。因为数据倒腾来倒腾去的,有很多重复数据。表结构如下:

    COMMENT_ID NOT NULL NUMBER  --主键
    DETAIL_ID NOT NULL NUMBER  --外键,引用BBSDETAIL表
    COMMENT_BODY NOT NULL VARCHAR2(500)  --评价内容

    --其它字段忽略

    其中主键是没有重复的,重复的是DETAIL_ID+COMMENT_BODY+......等信息,就是某些商家的评价信息有重复。

    2. 解决步骤

    2.1 查找表中多余的重复记录

    复制代码
    复制代码
    --查询出所有有重复的数据
    select DETAIL_ID,COMMENT_BODY,count(*)
    from BBSCOMMENT
    group by DETAIL_ID,COMMENT_BODY
    having count(*)>1
    order by DETAIL_ID, COMMENT_BODY; --1955条
    复制代码
    复制代码

    2.2 显示了所有的非冗余的数据

    --这一条命令显示了所有的非冗余的数据
    select min(COMMENT_ID) as COMMENT_ID,DETAIL_ID,COMMENT_BODY
    from BBSCOMMENT
    group by DETAIL_ID,COMMENT_BODY;   --21453条,之所以此值不等于表总记录数-1955,是因为1955条记录中,有的重复了不止一次。

    2.3 如果记录数量少(千级别),可以把上面的语句做成子查询然后直接删除

    复制代码
    复制代码
    --如果表数据量不是很大(1千条以内),可以把上面的语句做成子查询然后直接删除
    delete from BBSCOMMENT where COMMENT_ID not in(
        select min(COMMENT_ID)
        from BBSCOMMENT
        group by DETAIL_ID,COMMENT_BODY
    );          --782秒,在我这里,2万条记录,重复记录2千多(太慢了!!)
    复制代码
    复制代码

    2.4 另一种删除方法

    复制代码
    复制代码
    --这条语句也能够实现上述功能,但不好测试了,数据已经被我删除了
    --删除条件一:有重复数据的记录;条件二:保留最小rowid的记录。
    delete from BBSCOMMENT a
    where
        (a.DETAIL_ID,a.COMMENT_BODY) in(select DETAIL_ID,COMMENT_BODY from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*) > 1)
        and rowid not in (select min(rowid) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*)>1);
    复制代码
    复制代码

    2.5 大数据量还是用PL/SQL方便快捷

    复制代码
    复制代码
    declare
    --定义存储结构
    type bbscomment_type is record
    (
        comment_id BBSCOMMENT.COMMENT_ID%type,
        detail_id BBSCOMMENT.DETAIL_ID%type,
        comment_body BBSCOMMENT.COMMENT_BODY%type
    );
    bbscomment_record bbscomment_type;
    
    --可供比较的变量
    v_comment_id BBSCOMMENT.COMMENT_ID%type;
    v_detail_id BBSCOMMENT.DETAIL_ID%type;
    v_comment_body BBSCOMMENT.COMMENT_BODY%type;
    
    --其它变量
    v_batch_size integer := 5000;
    v_counter integer := 0;
    
    cursor cur_dupl is
        --取出所有有重复的记录
        select COMMENT_ID, DETAIL_ID, COMMENT_BODY
        from BBSCOMMENT
        where(DETAIL_ID, COMMENT_BODY) in (
            --这些记录有重复
            select DETAIL_ID, COMMENT_BODY
            from BBSCOMMENT
            group by DETAIL_ID, COMMENT_BODY
            having count(*) > 1)
        order by DETAIL_ID, COMMENT_BODY;
    begin
        for bbscomment_record in cur_dupl loop
            if v_detail_id is null or (bbscomment_record.detail_id != v_detail_id or nvl(bbscomment_record.comment_body, ' ') != nvl(v_comment_body, ' ')) then
                --首次进入、换记录了,都重新赋值
                v_detail_id := bbscomment_record.detail_id;
                v_comment_body := bbscomment_record.comment_body;
            else
                --其它记录删除
                delete from BBSCOMMENT where COMMENT_ID = bbscomment_record.comment_id;
                v_counter := v_counter + 1;
    
                if mod(v_counter, v_batch_size) = 0 then
                    --每多少条提交一次
                    commit;
                end if;
            end if;
        end loop;
    
        if v_counter > 0 then
            --最后一次提交
            commit;
        end if;
    
        dbms_output.put_line(to_char(v_counter)||'条记录被删除!');
    exception
        when others then
            dbms_output.put_line('sqlerrm-->' ||sqlerrm);
            rollback;
    end;
  • 相关阅读:
    5.19 省选模拟赛 T1 小B的棋盘 双指针 性质
    5.15 省选模拟赛 容斥 生成函数 dp
    5.15 省选模拟赛 T1 点分治 FFT
    5.15 牛客挑战赛40 B 小V的序列 关于随机均摊分析 二进制
    luogu P4929 【模板】舞蹈链 DLX
    CF 878E Numbers on the blackboard 并查集 离线 贪心
    5.10 省选模拟赛 拍卖 博弈 dp
    5.12 省选模拟赛 T2 贪心 dp 搜索 差分
    5.10 省选模拟赛 tree 树形dp 逆元
    luogu P6088 [JSOI2015]字符串树 可持久化trie 线段树合并 树链剖分 trie树
  • 原文地址:https://www.cnblogs.com/JIKes/p/11583687.html
Copyright © 2011-2022 走看看