The window functions introduced in MySQL 8 make what used to be a cumbersome deduplication task very simple.
truncate t_target;
insert into t_target
select item_id, created_time, modified_time, item_name, other
from (select *, row_number() over(partition by created_time,item_name) as rn
from t_source) t where rn=1;
This statement takes only 12 seconds to run, and it is clear and easy to read. Its query plan is as follows:
mysql> explain select item_id, created_time, modified_time, item_name, other
-> from (select *, row_number() over(partition by created_time,item_name) as rn
-> from t_source) t where rn=1;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | t_source | NULL | ALL | NULL | NULL | NULL | NULL | 997281 | 100.00 | Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+--------+----------+----------------+
2 rows in set, 2 warnings (0.00 sec)
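The query performs one full table scan of t_source and uses a filesort to sort the rows by the partition columns created_time and item_name. The outer query then keeps one row from each partition. Because any one of the rows that share the same created_time and item_name may be kept, the over clause does not need an order by.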
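If a particular row must survive instead of an arbitrary one, adding an order by inside over() is one way to make the choice deterministic. The following is a minimal sketch, not the statement used above: the table t_demo, its column types, and the sample rows are hypothetical and chosen only for illustration.

```sql
-- Hypothetical demo table; the column types are assumptions,
-- not the actual definition of t_source.
create table t_demo (
  item_id int primary key,
  created_time datetime,
  modified_time datetime,
  item_name varchar(20),
  other varchar(20)
);

-- Rows 1 and 2 are duplicates on (created_time, item_name).
insert into t_demo values
  (1, '2017-01-01 00:00:00', '2017-01-01 00:00:00', 'a', 'x'),
  (2, '2017-01-01 00:00:00', '2017-01-02 00:00:00', 'a', 'y'),
  (3, '2017-01-01 00:00:00', '2017-01-01 00:00:00', 'b', 'z');

-- order by inside over() fixes which row gets rn = 1:
-- here, the row with the smallest item_id in each partition.
select item_id, created_time, modified_time, item_name, other
from (select *,
             row_number() over (partition by created_time, item_name
                                order by item_id) as rn
      from t_demo) t
where rn = 1;
```

With this sample data the query keeps the rows with item_id 1 and 3. The extra sort key is only needed when it matters which duplicate is kept.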
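Judging by the execution plan, the window function statement looks worse than the variable-based deduplication with the nested query eliminated, yet in actual execution it is the fastest of the methods.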
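Since the estimated plan and the measured run time disagree here, it can help to look at actual timings per plan step. One option, assuming MySQL 8.0.18 or later is available, is explain analyze, sketched below against the same query. It was not part of the test described above, and no output is shown because none was recorded.

```sql
-- Assumes MySQL 8.0.18 or later.
-- explain analyze really executes the query, so it takes about as long
-- as the select itself, then reports estimated and actual cost per step.
explain analyze
select item_id, created_time, modified_time, item_name, other
from (select *, row_number() over(partition by created_time,item_name) as rn
from t_source) t where rn=1;
```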