zoukankan      html  css  js  c++  java
  • 删除一个表中的重复数据同时保留第一次插入那一条以及sql优化

           业务:一个表中有很多数据(id为自增主键),在这些数据中有个别数据出现了重复的数据。

           目标:需要把这些重复数据删除同时保留第一次插入的那一条数据,还要保持其它的数据不受影响。

           解题过程:

           第一步:查出所有要保留的下来的数据的id(save_id)

    SELECT id as save_id
      FROM yujing.alarm_event_info_snapshot aeis
     where aeis.event_id in
           (SELECT ae.id
              FROM yujing.alarm_event ae
             where ae.event_uuid like 'yuanwtj_%')
     group by (aeis.event_id)
    

    优化后:

     

    SELECT aeis.id as save_id
      FROM yujing.alarm_event ae
     right join yujing.alarm_event_info_snapshot aeis
        on aeis.event_id = ae.id
     where ae.event_uuid like 'yuanwtj_%'
     group by (aeis.event_id)

      

           第二步:获取所有相关数据的id(all_id)

     

    SELECT aeis.id as all_id
      FROM yujing.alarm_event_info_snapshot aeis
     where aeis.event_id in
           (SELECT ae.id
              FROM yujing.alarm_event ae
             where ae.event_uuid like 'yuanwtj_%')
     order by aeis.event_id

    优化后:       

     

    SELECT aeis.id as all_id
      FROM yujing.alarm_event ae
     right join yujing.alarm_event_info_snapshot aeis
        on aeis.event_id = ae.id
     where ae.event_uuid like 'yuanwtj_%'
    


             第三步:获取要删除的数据的 id(del_id)

    select ad.all_id as del_id
      from (SELECT aeis.id as all_id
              FROM yujing.alarm_event_info_snapshot aeis
             where aeis.event_id in
                   (SELECT ae.id
                      FROM yujing.alarm_event ae
                     where ae.event_uuid like 'yuanwtj_%')) as ad
     where ad.all_id not in (SELECT id as save_id
                               FROM yujing.alarm_event_info_snapshot aeis
                              where aeis.event_id in
                                    (SELECT ae.id
                                       FROM yujing.alarm_event ae
                                      where ae.event_uuid like 'yuanwtj_%')
                              group by (aeis.event_id))
    

    优化后:

    select ad.all_id as del_id
      from (SELECT aeis.id as all_id
              FROM yujing.alarm_event ae
             right join yujing.alarm_event_info_snapshot aeis
                on aeis.event_id = ae.id
             where ae.event_uuid like 'yuanwtj_%') as ad
      left join (SELECT aeis.id as save_id
                   FROM yujing.alarm_event ae
                  right join yujing.alarm_event_info_snapshot aeis
                     on aeis.event_id = ae.id
                  where ae.event_uuid like 'yuanwtj_%'
                  group by (aeis.event_id)) as sd
        on ad.all_id = sd.save_id
     where sd.save_id is null
    

           

           第四步:根据id删除所有节点,注意mysql中如果有大量数据时需要批量删除,我最后使用了ETL工具进行的批量删除

           总结:在mysql数据库中,sql语句中最好不要在in或not in关键字的查询里动态获取匹配的值,数据量大的情况下使用它们效率很低,可以使用左右连接来代替in操作,这样效率会提高很多倍,大数据量下尤为明显。


  • 相关阅读:
    Python: Best Way to Exchange Keys with Values in a Dictionary?
    install pymssql on centos
    centos 5.5 deploy full procedure
    centos 5.5 deploy full procedure
    change defaut python to 2.6.5 on centos
    Python: Best Way to Exchange Keys with Values in a Dictionary?
    install eventlet ,redis,dreque on centos
    install freetds on centos
    use alias
    install eventlet ,redis,dreque on centos
  • 原文地址:https://www.cnblogs.com/pangblog/p/3268549.html
Copyright © 2011-2022 走看看