  • Deleting duplicate data in MySQL

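    I have recently been working on a multithreaded crawler. The queue contained duplicate entries, and although the program checks whether a record exists before inserting it, several threads running concurrently still wrote some duplicate rows to the database.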

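    The bug in the program has been fixed, but crawling everything again would cost a lot of time and effort, so I chose to delete the duplicate rows and keep a single valid row for each record.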

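    The approach is to group the rows by the combination of fields that uniquely identifies a record, then keep only one valid row per group.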
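
    The grouping idea can be sketched on a toy table. This is a minimal illustration, not the original program: it uses Python's built-in sqlite3 module with an in-memory database in place of MySQL, and the sample rows and the helper names `make_sample_db` and `find_duplicate_groups` are made up for the example. Only the table and column names (ZYZBBData, id, code, year, report_type) come from the queries in this post.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table with made-up rows, some of them duplicated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ZYZBBData ("
        "id INTEGER PRIMARY KEY, code TEXT, year INTEGER, report_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO ZYZBBData (id, code, year, report_type) VALUES (?, ?, ?, ?)",
        [
            (1, "000001", 2020, "annual"),
            (2, "000001", 2020, "annual"),  # duplicate of id 1
            (3, "000001", 2021, "annual"),  # unique
            (4, "000002", 2020, "annual"),
            (5, "000002", 2020, "annual"),  # duplicate of id 4
            (6, "000002", 2020, "annual"),  # duplicate of id 4
        ],
    )
    conn.commit()
    return conn


def find_duplicate_groups(conn):
    """Return (code, year, report_type, row_count, keep_id) per duplicated key."""
    return conn.execute(
        """
        SELECT code, year, report_type, COUNT(*) AS row_count, MIN(id) AS keep_id
        FROM ZYZBBData
        GROUP BY code, year, report_type
        HAVING COUNT(*) > 1
        ORDER BY keep_id
        """
    ).fetchall()


if __name__ == "__main__":
    for group in find_duplicate_groups(make_sample_db()):
        print(group)
```

    Each returned tuple names one duplicated key, how many rows share it, and the smallest id in the group, which is the row the later steps keep.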

    1. Query the duplicate rows

    -- Every row whose (code, year, report_type) key occurs more than once.
    SELECT *
    FROM ZYZBBData
    WHERE (code, year, report_type) IN (
        SELECT code, year, report_type
        FROM (SELECT code, year, report_type
              FROM ZYZBBData
              GROUP BY code, year, report_type
              HAVING COUNT(*) > 1) a
    );

    2. Keep the row with the smallest id in each group, and select the rows that will be deleted

    -- Rows that belong to a duplicated group, except the one with the smallest id.
    SELECT *
    FROM ZYZBBData
    WHERE (code, year, report_type) IN (
        SELECT code, year, report_type
        FROM (SELECT code, year, report_type
              FROM ZYZBBData
              GROUP BY code, year, report_type
              HAVING COUNT(*) > 1) a
    )
      AND id NOT IN (
        SELECT id
        FROM (SELECT MIN(id) AS id
              FROM ZYZBBData
              GROUP BY code, year, report_type
              HAVING COUNT(*) > 1) b
    );

    3. Delete the duplicate rows

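    -- Same filter as step 2, now used to delete. The inner queries are wrapped
    -- in derived tables (a, b) because MySQL does not allow a DELETE to select
    -- from its own target table directly in a subquery (error 1093).
    DELETE
    FROM ZYZBBData
    WHERE (code, year, report_type) IN (
        SELECT code, year, report_type
        FROM (SELECT code, year, report_type
              FROM ZYZBBData
              GROUP BY code, year, report_type
              HAVING COUNT(*) > 1) a
    )
      AND id NOT IN (
        SELECT id
        FROM (SELECT MIN(id) AS id
              FROM ZYZBBData
              GROUP BY code, year, report_type
              HAVING COUNT(*) > 1) b
    );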

    After the delete, the data is back to normal.
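
    One way to confirm that result is to count the duplicated groups after the delete. The sketch below is an illustration under stated assumptions, not the original procedure: it runs against an in-memory SQLite database through Python's sqlite3 module instead of MySQL, the sample rows and the names `make_sample_db`, `dedupe_and_check`, and `uq_code_year_report_type` are hypothetical, and the DELETE is a shorter equivalent form that SQLite accepts directly (in MySQL the subquery would still need the derived-table wrapper used in step 3). The unique index at the end is an optional extra that the original post does not mention; it can only be created once no duplicates remain, and afterwards the database itself rejects repeated keys.

```python
import sqlite3

# Keep the smallest id per (code, year, report_type); delete every other row.
DELETE_DUPLICATES = """
DELETE FROM ZYZBBData
WHERE id NOT IN (SELECT MIN(id)
                 FROM ZYZBBData
                 GROUP BY code, year, report_type)
"""

# Number of keys that still occur more than once (0 means fully deduplicated).
COUNT_DUPLICATE_GROUPS = """
SELECT COUNT(*)
FROM (SELECT 1
      FROM ZYZBBData
      GROUP BY code, year, report_type
      HAVING COUNT(*) > 1) AS dup
"""


def make_sample_db():
    """Build an in-memory table with made-up rows, some of them duplicated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ZYZBBData ("
        "id INTEGER PRIMARY KEY, code TEXT, year INTEGER, report_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO ZYZBBData (id, code, year, report_type) VALUES (?, ?, ?, ?)",
        [
            (1, "000001", 2020, "annual"),
            (2, "000001", 2020, "annual"),
            (3, "000001", 2021, "annual"),
            (4, "000002", 2020, "annual"),
            (5, "000002", 2020, "annual"),
            (6, "000002", 2020, "annual"),
        ],
    )
    conn.commit()
    return conn


def dedupe_and_check(conn):
    """Delete duplicates, then return (rows_deleted, duplicate_groups_left)."""
    deleted = conn.execute(DELETE_DUPLICATES).rowcount
    groups_left = conn.execute(COUNT_DUPLICATE_GROUPS).fetchone()[0]
    # Optional guard (an assumption, not part of the original post): this
    # statement fails if any duplicate key is still present.
    conn.execute(
        "CREATE UNIQUE INDEX uq_code_year_report_type "
        "ON ZYZBBData (code, year, report_type)"
    )
    conn.commit()
    return deleted, groups_left


if __name__ == "__main__":
    print(dedupe_and_check(make_sample_db()))
```

    On a real table it is safer to run the SELECT from step 2 first and compare its row count with the number of rows the DELETE reports.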

  • Original article: https://www.cnblogs.com/HTLucky/p/15516231.html