zoukankan      html  css  js  c++  java
  • Mysql删除重复的数据 hello*

    最近在做一个多线程的爬虫程序,由于队列中有重复的数据,尽管程序中有判断不存在则插入,但由于多个线程并发,导致数据库中存在部分重复的数据。

     程序中的bug已经修复,但重新爬一遍耗时耗力,于是就选择删除重复的数据,只保留一条有效数据

    解决的思路就是根据确定其数据唯一的聚合字段进行分组,然后只保留一条有效数据

    1.查询重复数据

    select * FROM ZYZBBData
    WHERE (code,year,report_type) IN (SELECT
                              code,
                              year,
                              report_type
                            FROM (SELECT
                                    code,
                                    year,
                                    report_type
                                  FROM ZYZBBData
                                  GROUP BY code,year,report_type
                                  HAVING COUNT( * ) > 1) a)

     2.只保留Id最小的1条数据,过滤出要被删除的数据

    select * FROM ZYZBBData
    WHERE (code,year,report_type) IN (SELECT
                              code,
                              year,
                              report_type
                            FROM (SELECT
                                    code,
                                    year,
                                    report_type
                                  FROM ZYZBBData
                                  GROUP BY code,year,report_type
                                  HAVING COUNT( * ) > 1) a)
        AND id NOT IN(SELECT
                        id
                      FROM (SELECT
                              MIN(id) AS id
                            FROM ZYZBBData
                            GROUP BY code,year,report_type
                            HAVING COUNT( * ) > 1) b)

    3.删除重复的数据

    DELETE
    FROM ZYZBBData
    WHERE (code,year,report_type) IN (SELECT
                              code,
                              year,
                              report_type
                            FROM (SELECT
                                    code,
                                    year,
                                    report_type
                                  FROM ZYZBBData
                                  GROUP BY code,year,report_type
                                  HAVING COUNT( * ) > 1) a)
        AND id NOT IN(SELECT
                       id
                      FROM (SELECT
                              MIN(id) AS id
                            FROM ZYZBBData
                            GROUP BY code,year,report_type
                            HAVING COUNT( * ) > 1) b)

     数据正常

  • 相关阅读:
    静态链接与动态链接的区别(转载)
    VC++界面编程之--阴影窗口的实现详解
    win7 vmware 无法使用usb
    如何划分子网,确定子网和掩码的位数?
    华为三层交换机-路由-硬件防火墙的配置
    jquery json ajax -2
    jquery json ajax
    wes开发笔记
    伪类和伪元素
    优化sql语句
  • 原文地址:https://www.cnblogs.com/HTLucky/p/15516231.html
Copyright © 2011-2022 走看看