zoukankan      html  css  js  c++  java
  • 再战mysql 数据去重

    年初时,写过一篇去重的,在小表中还能用用,在大表中真的是效率低下,现在给了一次优化
    https://www.cnblogs.com/jarjune/p/8328013.html

    继上一篇文章

    方法三:

    DELIMITER //
    
    DROP PROCEDURE IF EXISTS delete_rows_2;
    
    CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
    BEGIN
    
    DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);
    
    SET DELETE_TABLE_ROWS_SQL = CONCAT('
    		DELETE 
    		FROM 
    			', TABLENAME ,' 
    		WHERE 
    			(', FIELDNAMES ,') IN ( 
    				SELECT ', FIELDNAMES ,' 
    				FROM (
    					SELECT 
    						', FIELDNAMES ,' 
    					FROM 
    						', TABLENAME ,' 
    					GROUP BY 
    						', FIELDNAMES ,' 
    					HAVING 
    						COUNT(1) > 1 
    				) t1
    			) 
    		AND ', AUTOFIELD ,' NOT IN ( 
    			SELECT ', AUTOFIELD ,' 
    			FROM (
    				SELECT 
    					MAX(', AUTOFIELD ,') ', AUTOFIELD ,' 
    				FROM 
    					', TABLENAME ,'
    				GROUP BY 
    					', FIELDNAMES ,' 
    				HAVING 
    					COUNT(1) > 1 
    				) t2
    			)
    ');
    
    SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;
    
    PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
    EXECUTE DELETE_TABLE_ROWS_SQL_PRE;
    
    END//
    
    DELIMITER ;
    
    CALL delete_rows_1('表名', '字段1,字段2,字段3...', '主键(唯一)字段');
    

    之后发现删除的效率还是挺低,又优化成

    方法三(优化):

    DELIMITER //
    
    DROP PROCEDURE IF EXISTS delete_rows_2;
    
    CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
    BEGIN
    
    DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);
    
    SET DELETE_TABLE_ROWS_SQL = CONCAT('
    		DELETE 
    		FROM 
    			', TABLENAME ,' 
    		WHERE 
    			', AUTOFIELD ,' IN ( 
    				SELECT 
    					', AUTOFIELD ,' 
    				FROM
    					(
    					SELECT 
    						', AUTOFIELD ,' 
    					FROM 
    						', TABLENAME ,' 
    					WHERE 
    						(', FIELDNAMES ,') IN ( 
    							SELECT 
    								', FIELDNAMES ,' 
    							FROM 
    								', TABLENAME ,' 
    							GROUP BY 
    								', FIELDNAMES ,' 
    							HAVING 
    								COUNT(1) > 1 
    						) 
    					AND ', AUTOFIELD ,' NOT IN ( 
    						SELECT 
    							MAX(', AUTOFIELD ,') 
    						FROM 
    							', TABLENAME ,'
    						GROUP BY 
    							', FIELDNAMES ,' 
    						HAVING 
    							COUNT(1) > 1 
    					) 
    				) t2 
    			) 
    	');
    
    SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;
    
    PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
    EXECUTE DELETE_TABLE_ROWS_SQL_PRE;
    
    END//
    
    DELIMITER ;
    
    CALL delete_rows_2('表名', '字段1,字段2,字段3...', '主键字段');
    

    由于上述都要group by 两次,又换了一种思路

    方法四

    DELETE t1
    FROM
    	l_weijij_47 t1,
    	(
    		SELECT
    			f01,
    			f02,
    			f03,
    			MAX(seq_value) seq_value
    		FROM
    			l_weijij_47
    		GROUP BY
    			f01,
    			f02,
    			f03
    		HAVING
    			COUNT(1) > 1
    		ORDER BY NULL
    	) t2
    where
    	t1.f01 = t2.f01
    AND t1.f02 = t2.f02
    AND t1.f03 = t2.f03
    and t1.seq_value < t2.seq_value
    

    注:group by默认会进行排序,所以要加上order by NULL就避免了排序
    group by a,b,c的时候,a,b,c一定要加索引才快

    综上,方法四是目前在用的去重。

  • 相关阅读:
    数据结构与算法基础 模块七
    操作系统
    数据结构与算法基础 模块六
    数据库技术
    数据库技术
    数据库 SQL语句
    数据结构与算法基础 模块五
    同源策略和解决
    初识单例模式
    Django—内置用户权限管理
  • 原文地址:https://www.cnblogs.com/jarjune/p/9951573.html
Copyright © 2011-2022 走看看