  • Deleting duplicate data in MySQL

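    I recently wrote a multithreaded crawler. Its work queue contained duplicate items, and although the program only inserted a record after checking that it did not already exist, several threads running concurrently could pass that check at the same time, so some duplicate rows ended up in the database.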
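    The race described here is the usual check-then-insert problem: two threads both see the row as missing, and both insert it. A minimal sketch of closing that gap at the database level follows. It assumes that (code, year, report_type), the columns used in the queries of this post, uniquely identify a record. The index name uk_code_year_type and the inserted values are hypothetical, and this is not necessarily how the crawler itself was fixed.

```sql
-- Assumption: (code, year, report_type) uniquely identifies a record.
-- This statement fails while duplicate rows still exist,
-- so it can only be run after the duplicates have been deleted.
ALTER TABLE ZYZBBData
  ADD UNIQUE INDEX uk_code_year_type (code, year, report_type);

-- With the unique index in place, a second insert of the same key is
-- skipped with a warning instead of creating a duplicate row.
-- Placeholder values; a real insert would also list the table's other columns.
INSERT IGNORE INTO ZYZBBData (code, year, report_type)
VALUES ('600000', 2020, 'annual');
```

    With a constraint like this the database rejects duplicates no matter how many threads insert at once, so the application-level existence check is no longer the only safeguard.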

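    The bug in the program has since been fixed, but crawling everything again would cost too much time and effort, so I chose to delete the duplicates and keep exactly one valid row per record.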

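    The approach is to group the rows by the combination of fields that uniquely identifies a record, then keep only one row from each group.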
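    As a sketch of this idea, the grouping on its own already shows which keys are duplicated, how many copies each has, and which row would survive. It assumes that id is a unique, increasing key, as the later steps of this post do. The column aliases copies and keep_id are names chosen for illustration.

```sql
-- One output row per duplicated key.
SELECT code,
       year,
       report_type,
       COUNT(*) AS copies,   -- how many rows share this key
       MIN(id)  AS keep_id   -- the row that would be kept
FROM ZYZBBData
GROUP BY code, year, report_type
HAVING COUNT(*) > 1;
```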

    1. Query the duplicate rows

    -- List every row whose (code, year, report_type) combination occurs more than once.
    SELECT *
    FROM ZYZBBData
    WHERE (code, year, report_type) IN (
        SELECT code, year, report_type
        FROM (
            SELECT code, year, report_type
            FROM ZYZBBData
            GROUP BY code, year, report_type
            HAVING COUNT(*) > 1
        ) a
    );

    2. Keep only the row with the smallest id in each group, and select the rows that will be deleted

    -- Rows that belong to a duplicated key ...
    SELECT *
    FROM ZYZBBData
    WHERE (code, year, report_type) IN (
        SELECT code, year, report_type
        FROM (
            SELECT code, year, report_type
            FROM ZYZBBData
            GROUP BY code, year, report_type
            HAVING COUNT(*) > 1
        ) a
    )
    -- ... except the row with the smallest id in each group, which is kept.
    AND id NOT IN (
        SELECT id
        FROM (
            SELECT MIN(id) AS id
            FROM ZYZBBData
            GROUP BY code, year, report_type
            HAVING COUNT(*) > 1
        ) b
    );

    3. Delete the duplicate rows

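    -- Same filter as step 2, now used to delete.
    -- MySQL does not allow a DELETE to read its own target table in a direct
    -- subquery (error 1093), so each subquery is wrapped in a derived table (a, b).
    DELETE
    FROM ZYZBBData
    WHERE (code, year, report_type) IN (
        SELECT code, year, report_type
        FROM (
            SELECT code, year, report_type
            FROM ZYZBBData
            GROUP BY code, year, report_type
            HAVING COUNT(*) > 1
        ) a
    )
    AND id NOT IN (
        SELECT id
        FROM (
            SELECT MIN(id) AS id
            FROM ZYZBBData
            GROUP BY code, year, report_type
            HAVING COUNT(*) > 1
        ) b
    );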
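    For comparison, the same cleanup can be written as a self-join using MySQL's multi-table DELETE syntax. This is an alternative technique, not the statement used in this post, and the aliases d (row to delete) and k (row to keep) are chosen for illustration. It assumes id is unique and that the three key columns contain no NULLs.

```sql
-- Delete every row d for which another row k has the same key and a smaller id.
-- The row with the smallest id in each group has no such k, so it survives.
DELETE d
FROM ZYZBBData AS d
JOIN ZYZBBData AS k
  ON  k.code        = d.code
  AND k.year        = d.year
  AND k.report_type = d.report_type
  AND k.id          < d.id;
```

    The join form needs no derived-table wrappers. On a large table an index covering (code, year, report_type) may help either variant.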

    After the delete, the data is back to normal.
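    One way to confirm the result is to compare the total row count with the number of distinct keys. This is a sketch under the assumption that none of the three key columns contains NULL, because MySQL's multi-column COUNT(DISTINCT ...) skips combinations that include NULL. The aliases total_rows and distinct_keys are illustrative.

```sql
-- When no duplicates remain, both counts are equal.
SELECT COUNT(*)                                 AS total_rows,
       COUNT(DISTINCT code, year, report_type)  AS distinct_keys
FROM ZYZBBData;
```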

  • Original post: https://www.cnblogs.com/HTLucky/p/15516231.html