zoukankan      html  css  js  c++  java
  • MYSQL数据去重与外表填充

    经常要对数据库中的数据进行去重,有时还需要使用外部表填冲数据,本文档记录数据去重与外表填充数据。
    date:2016/8/17
    author:wangxl

    1 需求

    对user_info1表去重,并添加age项。

    2 表数据

    user_info1:
    +----+----------+------+------+
    | id | name     | sex  | age  |
    +----+----------+------+------+
    |  1 | xiaolong | 1    | NULL |
    |  2 | xiaoyun  | 1    | NULL |
    |  3 | xiaoqin  | 2    | NULL |
    |  4 | xiaolong | 1    | NULL |
    |  5 | xiaodong | 1    | NULL |
    |  6 | xiaokai  | 1    | NULL |
    |  7 | xiaohong | 2    | NULL |
    |  8 | xiaolong | 1    | NULL |
    |  9 | xiaohong | 2    | NULL |
    | 10 | xiaofen  | 2    | NULL |
    +----+----------+------+------+
    
    user_info2:
    +----------+------+
    | name     | age  |
    +----------+------+
    | xiaolong |   26 |
    | xiaoyun  |   28 |
    | xiaoqin  |   27 |
    | xiaodong |   27 |
    | xiaokai  |   27 |
    | xiaohong |   24 |
    | xiaofen  |   22 |
    +----------+------+
    

    3 实战

    3.1 去重

    (1) 找出有重复字段
    	select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1);
    (2) 找出要删除的记录,重复记录是根据单个字段(name)来判断,只留有id最小的记录
    	select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);
    (3) 删除表中多余的重复记录
    	delete from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);
    
    	报错:ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
    

    更换思路:找出每组中非最小id并删除,如下:

    (4) 找出每组最小id
    	select min(id) from user_info1 group by name
    (5) 找出每组非最小id
    	select * from user_info1 where id not in (select min(id) from user_info1 group by name);
    (6) 删除每组中非最小id所在行
    	delete from user_info1 where id not in (select id from select min(id) from user_info1 group by name);
    	ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
    	更正:
    	delete from user_info1 where id not in (select minid from (select min(id) as minid from user_info1 group by name) a);、
    
    结果展示:
    +----+----------+------+------+
    | id | name     | sex  | age  |
    +----+----------+------+------+
    | 1  | xiaolong | 1    | NULL |
    | 2  | xiaoyun  | 1    | NULL |
    | 3  | xiaoqin  | 2    | NULL |
    | 5  | xiaodong | 1    | NULL |
    | 6  | xiaokai  | 1    | NULL |
    | 7  | xiaohong | 2    | NULL |
    | 10 | xiaofen  | 2    | NULL |
    +----+----------+------+------+
    

    对于没有primary key的话,怎么去重呢?

    (7) 创建表test
    (8) insert into test select distinct(name),sex,age from user_info1 group by name; 
    
    暂时没想出一句话解决方案.
    

    3.2 外表插入

    update user_info1 t set age=(select age from user_info2 where name=t.name);
    结果如下:
    +----+----------+------+------+
    | id | name     | sex  | age  |
    +----+----------+------+------+
    | 1  | xiaolong | 1    |   26 |
    | 2  | xiaoyun  | 1    |   28 |
    | 3  | xiaoqin  | 2    |   27 |
    | 5  | xiaodong | 1    |   27 |
    | 6  | xiaokai  | 1    |   27 |
    | 7  | xiaohong | 2    |   24 |
    | 10 | xiaofen  | 2    |   22 |
    +----+----------+------+------+
  • 相关阅读:
    famous summary stack trace from Oracle Version 8.1.7.4.0 Bug Note
    Mysql:命令选项、配置选项、(全局、会话)系统变量、状态变量:状态变量
    Mysql:命令选项、配置选项、(全局、会话)系统变量、状态变量:(仅会话)系统变量
    Mysql:命令选项、配置选项、(全局、会话)系统变量、状态变量:命令选项
    Mysql:命令选项、配置选项、(全局、会话)系统变量、状态变量:系统变量
    Mysql:命令选项、配置选项、(全局、会话)系统变量、状态变量:(动态)系统变量
    正式入住CSDN了
    关于ansi控制码
    源码网站
    Imail2006 联系人???问题解决
  • 原文地址:https://www.cnblogs.com/riordon/p/5782286.html
Copyright © 2011-2022 走看看