  • MySQL Data Deduplication and Filling from an External Table

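    Data in a database often needs to be deduplicated, and sometimes a column has to be filled with data from another (external) table. This document records how to do both.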
    date:2016/8/17
    author:wangxl

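    1 Requirement

    Remove duplicate rows from user_info1 (two rows are duplicates when they have the same name) and fill in its age column.

    2 Table data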

    user_info1:
    +----+----------+------+------+
    | id | name     | sex  | age  |
    +----+----------+------+------+
    |  1 | xiaolong | 1    | NULL |
    |  2 | xiaoyun  | 1    | NULL |
    |  3 | xiaoqin  | 2    | NULL |
    |  4 | xiaolong | 1    | NULL |
    |  5 | xiaodong | 1    | NULL |
    |  6 | xiaokai  | 1    | NULL |
    |  7 | xiaohong | 2    | NULL |
    |  8 | xiaolong | 1    | NULL |
    |  9 | xiaohong | 2    | NULL |
    | 10 | xiaofen  | 2    | NULL |
    +----+----------+------+------+
    
    user_info2:
    +----------+------+
    | name     | age  |
    +----------+------+
    | xiaolong |   26 |
    | xiaoyun  |   28 |
    | xiaoqin  |   27 |
    | xiaodong |   27 |
    | xiaokai  |   27 |
    | xiaohong |   24 |
    | xiaofen  |   22 |
    +----------+------+
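    To reproduce the walkthrough, the two tables can be created and filled as below. This is a minimal sketch: the post only shows the data, so the column types (INT, VARCHAR(20), CHAR(1)) and the AUTO_INCREMENT primary key on id are assumptions.

```sql
-- Column types are assumed; the original post does not show the DDL.
CREATE TABLE user_info1 (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  sex  CHAR(1),
  age  INT NULL                -- left NULL here, filled in section 3.2
);

CREATE TABLE user_info2 (
  name VARCHAR(20) NOT NULL,
  age  INT
);

-- ids are given explicitly so the rows match the listing above
INSERT INTO user_info1 (id, name, sex) VALUES
  (1, 'xiaolong', '1'), (2, 'xiaoyun', '1'),  (3, 'xiaoqin', '2'),
  (4, 'xiaolong', '1'), (5, 'xiaodong', '1'), (6, 'xiaokai', '1'),
  (7, 'xiaohong', '2'), (8, 'xiaolong', '1'), (9, 'xiaohong', '2'),
  (10, 'xiaofen', '2');

INSERT INTO user_info2 (name, age) VALUES
  ('xiaolong', 26), ('xiaoyun', 28), ('xiaoqin', 27), ('xiaodong', 27),
  ('xiaokai', 27), ('xiaohong', 24), ('xiaofen', 22);
```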
    

    3 Walkthrough

    3.1 Deduplication

    (1) Find the rows whose name occurs more than once
    	select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1);
    (2) Find the rows to delete: duplicates are judged by a single column (name), and only the row with the smallest id is kept
    	select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);
    (3) Delete the redundant duplicate rows
    	delete from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);
    
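    	This fails with: ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
    	MySQL does not let a DELETE or UPDATE select from the table it is modifying inside a subquery of the same statement.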
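    Step (3) itself can also be rescued with the derived-table trick that step (6) ends up using: each subquery on the target table is wrapped in a derived table, which MySQL materializes before the DELETE runs. A sketch, run against a hypothetical scratch table demo_user (same layout as user_info1) so the real data is not touched:

```sql
-- Hypothetical scratch table with the same layout as user_info1
DROP TABLE IF EXISTS demo_user;
CREATE TABLE demo_user (id INT PRIMARY KEY, name VARCHAR(20), sex CHAR(1), age INT);
INSERT INTO demo_user (id, name, sex) VALUES
  (1, 'xiaolong', '1'), (2, 'xiaoyun', '1'), (4, 'xiaolong', '1'),
  (7, 'xiaohong', '2'), (8, 'xiaolong', '1'), (9, 'xiaohong', '2');

-- Step (3) with both subqueries wrapped in derived tables (aliases d and m).
-- The derived tables use GROUP BY, so MySQL materializes them instead of
-- merging them into the DELETE, which avoids ERROR 1093.
DELETE FROM demo_user
WHERE name IN (SELECT name FROM
                 (SELECT name FROM demo_user GROUP BY name HAVING COUNT(name) > 1) d)
  AND id NOT IN (SELECT minid FROM
                 (SELECT MIN(id) AS minid FROM demo_user
                  GROUP BY name HAVING COUNT(name) > 1) m);

SELECT * FROM demo_user ORDER BY id;  -- ids 1, 2 and 7 remain
```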
    

    A different approach: within each name group, find the rows whose id is not the group's minimum and delete them:

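    (4) Find the minimum id of each group
    	select min(id) from user_info1 group by name;
    (5) Find the rows whose id is not the minimum of their group
    	select * from user_info1 where id not in (select min(id) from user_info1 group by name);
    (6) Delete the rows whose id is not the minimum of their group
    	delete from user_info1 where id not in (select min(id) from user_info1 group by name);
    	ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
    	Corrected: wrap the subquery in a derived table (alias a). MySQL materializes the derived table first, so the DELETE no longer reads user_info1 in its own subquery:
    	delete from user_info1 where id not in (select minid from (select min(id) as minid from user_info1 group by name) a);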
    
    Result:
    +----+----------+------+------+
    | id | name     | sex  | age  |
    +----+----------+------+------+
    | 1  | xiaolong | 1    | NULL |
    | 2  | xiaoyun  | 1    | NULL |
    | 3  | xiaoqin  | 2    | NULL |
    | 5  | xiaodong | 1    | NULL |
    | 6  | xiaokai  | 1    | NULL |
    | 7  | xiaohong | 2    | NULL |
    | 10 | xiaofen  | 2    | NULL |
    +----+----------+------+------+
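    Another commonly used MySQL idiom is a multi-table DELETE with a self-join: a row is deleted when another row has the same name and a smaller id, which leaves exactly the smallest id per name. Because the target table appears in the join list rather than in a subquery, ERROR 1093 does not arise. A sketch on a hypothetical scratch table demo_user2; the DELETE ... JOIN syntax is MySQL-specific.

```sql
-- Hypothetical scratch table with the same layout as user_info1
DROP TABLE IF EXISTS demo_user2;
CREATE TABLE demo_user2 (id INT PRIMARY KEY, name VARCHAR(20), sex CHAR(1), age INT);
INSERT INTO demo_user2 (id, name, sex) VALUES
  (1, 'xiaolong', '1'), (3, 'xiaoqin', '2'), (4, 'xiaolong', '1'),
  (7, 'xiaohong', '2'), (8, 'xiaolong', '1'), (9, 'xiaohong', '2');

-- t1 is the row to delete, t2 is any row with the same name and a smaller id.
-- Only t1 is named before FROM, so only rows matched as t1 are removed.
DELETE t1
FROM demo_user2 t1
JOIN demo_user2 t2
  ON t1.name = t2.name
 AND t1.id > t2.id;

-- Check: no name should appear more than once (expect an empty result)
SELECT name, COUNT(*) FROM demo_user2 GROUP BY name HAVING COUNT(*) > 1;
```

    On a large table, an index on name should keep the self-join from scanning the whole table for every row.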
    

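    What if the table has no primary key?

    (7) Create a table test with the columns name, sex and age.
    (8) insert into test select name, any_value(sex), any_value(age) from user_info1 group by name;
    	GROUP BY name already keeps one row per name, so DISTINCT is not needed. ANY_VALUE() (MySQL 5.7 and later) avoids ERROR 1055 under the ONLY_FULL_GROUP_BY SQL mode, which is on by default since MySQL 5.7.5; on MySQL 5.6 the bare columns sex, age are accepted.
    
    I have not yet found a one-statement solution.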
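    One workaround that keeps the original table (instead of copying into test) is to add a temporary AUTO_INCREMENT key, deduplicate with the same statement as step (6), then drop the key again. It is three statements rather than one. A sketch assuming a hypothetical key-less table demo_nopk; the column name tmp_id is also made up.

```sql
-- Hypothetical table without a primary key
DROP TABLE IF EXISTS demo_nopk;
CREATE TABLE demo_nopk (name VARCHAR(20), sex CHAR(1), age INT);
INSERT INTO demo_nopk VALUES
  ('xiaolong', '1', NULL), ('xiaoyun', '1', NULL),
  ('xiaolong', '1', NULL), ('xiaohong', '2', NULL), ('xiaohong', '2', NULL);

-- 1. Add a temporary surrogate key; MySQL numbers the existing rows itself
ALTER TABLE demo_nopk ADD COLUMN tmp_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- 2. Keep the row with the smallest tmp_id per name (derived-table trick from step (6))
DELETE FROM demo_nopk
WHERE tmp_id NOT IN (
  SELECT minid FROM (SELECT MIN(tmp_id) AS minid FROM demo_nopk GROUP BY name) a
);

-- 3. Remove the temporary key again
ALTER TABLE demo_nopk DROP COLUMN tmp_id;
```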
    

    3.2 Filling from an external table

    update user_info1 t set age=(select age from user_info2 where name=t.name);
    Result:
    +----+----------+------+------+
    | id | name     | sex  | age  |
    +----+----------+------+------+
    | 1  | xiaolong | 1    |   26 |
    | 2  | xiaoyun  | 1    |   28 |
    | 3  | xiaoqin  | 2    |   27 |
    | 5  | xiaodong | 1    |   27 |
    | 6  | xiaokai  | 1    |   27 |
    | 7  | xiaohong | 2    |   24 |
    | 10 | xiaofen  | 2    |   22 |
    +----+----------+------+------+
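    Two caveats about the correlated-subquery UPDATE above: a user_info1 row whose name has no match in user_info2 is set to NULL, and if user_info2 ever held the same name twice the statement would fail with ERROR 1242 (Subquery returns more than 1 row). MySQL's multi-table UPDATE with an inner join only touches matched rows; it does not detect duplicate names in the source, so the source should still have one row per name. A sketch on hypothetical scratch tables demo_target and demo_source:

```sql
-- Hypothetical scratch tables mirroring user_info1 / user_info2
DROP TABLE IF EXISTS demo_target, demo_source;
CREATE TABLE demo_target (id INT PRIMARY KEY, name VARCHAR(20), age INT);
CREATE TABLE demo_source (name VARCHAR(20), age INT);
INSERT INTO demo_target (id, name, age) VALUES
  (1, 'xiaolong', NULL), (2, 'xiaoyun', NULL), (3, 'xiaoming', 30);
INSERT INTO demo_source VALUES ('xiaolong', 26), ('xiaoyun', 28);

-- Inner join: only rows with a matching name are updated.
-- 'xiaoming' has no match and keeps age 30 (the subquery form would set it to NULL).
UPDATE demo_target t
JOIN demo_source s ON s.name = t.name
SET t.age = s.age;

SELECT * FROM demo_target ORDER BY id;
```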
  • Original article: https://www.cnblogs.com/riordon/p/5782286.html