  • Removing duplicate data in MySQL, with related optimizations (repost)

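    MySQL does not allow a statement to modify a table while a subquery in the same statement reads from that table. The subquery results therefore have to be staged in a temporary table first.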
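
    To make the restriction concrete, here is a minimal sketch of the kind of statement that gets rejected. It assumes the user_info table used in the rest of this post, with an id and a uid column; the error text in the comment is what MySQL typically reports for this case.

    ```sql
    -- Attempt to delete duplicates in one statement: the subquery reads from
    -- user_info, the same table the DELETE is modifying.
    DELETE FROM user_info
    WHERE id NOT IN (SELECT MIN(id) FROM user_info GROUP BY uid);
    -- MySQL rejects this with:
    -- ERROR 1093 (HY000): You can't specify target table 'user_info' for update in FROM clause
    ```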

    1. Duplicates on a single field

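    Create the temporary tables, where uid is the field to deduplicate on:

    create table tmp_uid as (select uid from user_info group by uid having count(uid) > 1)
    
    create table tmp_id as (select min(id) as id from user_info group by uid having count(uid) > 1)

    When the data volume is large, be sure to index both temporary tables (the index names here are arbitrary):

    alter table tmp_uid add index idx_uid (uid)
    
    alter table tmp_id add index idx_id (id)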

    Delete the surplus duplicates, keeping the row with the smallest id in each group of duplicates:

    delete from user_info
    where id not in (select id from tmp_id)
    and uid in (select uid from tmp_uid)
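
    The whole of section 1 can be tried end to end on a small table. The following is a sketch under assumptions: the two-column user_info definition and the sample rows are hypothetical, and a real table will have more columns.

    ```sql
    -- Hypothetical sample table and data, for illustration only.
    CREATE TABLE user_info (
      id  INT PRIMARY KEY,
      uid INT NOT NULL
    );
    INSERT INTO user_info (id, uid) VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30);

    -- Stage the duplicated uid values and the smallest id of each duplicated uid.
    CREATE TABLE tmp_uid AS (SELECT uid FROM user_info GROUP BY uid HAVING COUNT(uid) > 1);
    CREATE TABLE tmp_id AS (SELECT MIN(id) AS id FROM user_info GROUP BY uid HAVING COUNT(uid) > 1);

    -- Remove every duplicated row except the one with the smallest id.
    DELETE FROM user_info
    WHERE id NOT IN (SELECT id FROM tmp_id)
      AND uid IN (SELECT uid FROM tmp_uid);

    -- Check: this should return no rows once deduplication has worked.
    SELECT uid, COUNT(*) AS n FROM user_info GROUP BY uid HAVING COUNT(*) > 1;

    -- Clean up the staging tables.
    DROP TABLE tmp_uid, tmp_id;
    ```

    With this sample data, rows 2 and 5 are deleted and rows 1, 3 and 4 remain. Row 3 is untouched because its uid was never duplicated.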

    2. Duplicates on multiple fields

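    In the case above, the duplicated uid values indirectly produced duplicate records in the relationship table as well, so the deduplication continues there.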

    2.1 General method

    This is essentially the same as above.

    Create the temporary tables:

    create table tmp_relation as (select source,target from relationship group by source,target having count(*)>1)
    
    create table tmp_relationship_id as (select min(id) as id from relationship group by source,target having count(*)>1)

    Create the index (the index name is arbitrary):

    alter table tmp_relationship_id add index idx_id (id)

    Delete:

    delete from relationship
    where id not in (select id from tmp_relationship_id)
    and (source,target) in (select source,target from tmp_relation)
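
    If the slow part is the (source, target) lookup, one thing that may be worth trying is a composite index on the staging table. This is an untested assumption, not something from the original post: MySQL does accept multi-column indexes, but whether the optimizer actually uses one for a row-constructor IN subquery depends on the server version, so it is worth checking the plan with EXPLAIN first. The index name idx_source_target is made up.

    ```sql
    -- Composite index on the staging table (index name is arbitrary).
    ALTER TABLE tmp_relation ADD INDEX idx_source_target (source, target);

    -- Inspect the plan before running the real DELETE.
    -- EXPLAIN on a DELETE requires MySQL 5.6.3 or later.
    EXPLAIN
    DELETE FROM relationship
    WHERE id NOT IN (SELECT id FROM tmp_relationship_id)
      AND (source, target) IN (SELECT source, target FROM tmp_relation);
    ```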

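    2.2 Fast method

    In practice, the multi-field deletion above turned out to be extremely slow on large data sets, intolerably so, because I could not build an index covering the multiple fields. After waiting a long time with no response, I decided to try a different route.

    The number of repetitions of any one record is probably low, typically 2 or 3, and fairly concentrated. So one can simply delete the row with the largest id in each group of duplicates, and repeat until no duplicates remain. The row left in each group is then the one with the smallest id among the original duplicates.

    The rough procedure is: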

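    (1) Select the record with the largest id in each group of duplicates

    create table tmp_relation_id2 as (select max(id) as id from relationship group by source,target having count(*)>1)

    (2) Create the index (this has to be repeated on every pass, because step (4) drops the table)

    alter table tmp_relation_id2 add index idx_id (id)

    (3) Delete the record with the largest id in each group of duplicates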

    delete from relationship where id in (select id from tmp_relation_id2)

    (4) Drop the temporary table

    drop table tmp_relation_id2

    Repeat steps (1), (2), (3) and (4) until the temporary table comes out empty. This is quite efficient when the number of repetitions per record is low.
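
    The loop above can be sketched as a stored procedure, so that it does not have to be driven by hand. This is only one possible way to automate it and is not part of the original post. The procedure name dedupe_relationship and the index name idx_id are made up, and DELIMITER is a directive of the mysql command-line client rather than SQL. The loop stops on the first pass that deletes nothing, which is equivalent to the temporary table coming out empty.

    ```sql
    DELIMITER //

    CREATE PROCEDURE dedupe_relationship()
    BEGIN
      -- Start above zero so the loop body runs at least once.
      DECLARE deleted_rows INT DEFAULT 1;

      WHILE deleted_rows > 0 DO
        -- (4) from the previous pass: drop the staging table if it exists.
        DROP TABLE IF EXISTS tmp_relation_id2;

        -- (1) Largest id in each group of duplicated (source, target) pairs.
        CREATE TABLE tmp_relation_id2 AS
          SELECT MAX(id) AS id
          FROM relationship
          GROUP BY source, target
          HAVING COUNT(*) > 1;

        -- (2) Index the staged ids; needed on every pass because the table is recreated.
        ALTER TABLE tmp_relation_id2 ADD INDEX idx_id (id);

        -- (3) Delete one surplus row per duplicated group.
        DELETE FROM relationship
        WHERE id IN (SELECT id FROM tmp_relation_id2);

        -- ROW_COUNT() reports how many rows the DELETE just removed.
        SET deleted_rows = ROW_COUNT();
      END WHILE;

      DROP TABLE IF EXISTS tmp_relation_id2;
    END //

    DELIMITER ;

    CALL dedupe_relationship();
    ```

    For a group of three duplicates with ids 3, 7 and 9, the first pass deletes 9, the second deletes 7, and the third finds no duplicated groups and ends the loop, leaving id 3. Note that the DDL statements inside the loop cause implicit commits, so the procedure cannot be rolled back as a single transaction.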

    This article is reposted from http://www.cnblogs.com/rainduck/archive/2013/05/15/3079868.html

  • Original address: https://www.cnblogs.com/ym1992it/p/4068852.html