  • MySQL: EXISTS vs. IN

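    A colleague reported today that a DELETE statement had been running for an hour without finishing and had to be forcibly interrupted. When you first see EXISTS in such a statement, keep the following in mind: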

    In Oracle, EXISTS is more efficient than IN, but in MySQL that is not necessarily the case. In MySQL, the rule for choosing between EXISTS and IN is:

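    When the subquery's table is large, EXISTS can reduce the total number of loop iterations and thus improve speed;
    When the outer query's table is large, IN can reduce the number of passes over the outer table and thus improve speed.
    Fundamentally, EXISTS uses the outer query as the driving table, whereas IN uses the subquery as the driving table (the driving table determines which result set is iterated in the outer loop of the nested-loop join and compared against the other).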
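    The driving-table idea can be illustrated with a toy nested-loop model. This is a simplified sketch, not MySQL's executor: index lookups are modelled with Python sets, and the table contents are hypothetical. It counts how many times the outer loop runs when the outer table drives (EXISTS) versus when the subquery result drives (IN).

```python
# Toy model of the two nested-loop shapes (not MySQL's actual executor).
# Index lookups are modelled with Python sets, so each probe is O(1).

def exists_driven_by_outer(outer_keys, sub_keys):
    """EXISTS form: loop over every outer row and probe an index on the subquery table."""
    sub_index = set(sub_keys)          # stands in for idx_prod_inst_id
    matched, loops = [], 0
    for k in outer_keys:               # the outer table drives the loop
        loops += 1
        if k in sub_index:             # one index probe per outer row
            matched.append(k)
    return matched, loops

def in_driven_by_subquery(outer_keys, sub_keys):
    """IN form (semi-join): loop over distinct subquery keys and probe the outer primary key."""
    outer_pk = set(outer_keys)         # stands in for PRIMARY(prod_inst_id)
    matched, loops = [], 0
    for k in sorted(set(sub_keys)):    # the subquery result drives the loop
        loops += 1
        if k in outer_pk:              # primary-key lookup, at most one row
            matched.append(k)
    return matched, loops

if __name__ == "__main__":
    outer = list(range(1, 100_001))            # hypothetical large outer table
    sub = [5, 17, 17, 42, 99_999, 200_000]     # hypothetical small subquery table
    m1, n1 = exists_driven_by_outer(outer, sub)
    m2, n2 = in_driven_by_subquery(outer, sub)
    print(sorted(m1) == sorted(m2), n1, n2)    # prints: True 100000 5
```

    Both shapes find the same rows; only the number of outer-loop iterations differs, which is why the choice of driving table matters when one side is much larger.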

    3.1.1 SQL

    DELETE t FROM o.`AI_AD_U_L` t WHERE EXISTS (SELECT 1 FROM o.`AI_AD_U_L_TEMP` AS a WHERE a.`ca_id` = t.`ca_id`);
    

    3.1.2 Analysis

    1. Check the indexes on the tables

      mysql> show index from AI_AD_U_L;
      +-----------+------------+---------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | Table     | Non_unique | Key_name                        | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
      +-----------+------------+---------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | AI_AD_U_L |          0 | PRIMARY                         |            1 | prod_inst_id | A         |    21162012 |     NULL | NULL   |      | BTREE      |         |               |
      | AI_AD_U_L |          1 | ai_sync_prod_level_cust_addr_id |            1 | cust_addr_id | A         |     8266746 |     NULL | NULL   | YES  | BTREE      |         |               |
      | AI_AD_U_L |          1 | ai_sync_prod_level_mac          |            1 | mac          | A         |    12227460 |     NULL | NULL   | YES  | BTREE      |         |               |
      +-----------+------------+---------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      3 rows in set (0.00 sec)
      mysql> show index from AI_AD_U_L_TEMP;
      +----------------+------------+-------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | Table          | Non_unique | Key_name          | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
      +----------------+------------+-------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      | AI_AD_U_L_TEMP |          1 | idx_cust_addr_id2 |            1 | cust_addr_id | A         |        2366 |     NULL | NULL   | YES  | BTREE      |         |               |
      | AI_AD_U_L_TEMP |          1 | idx_prod_inst_id  |            1 | prod_inst_id | A         |        3791 |     NULL | NULL   |      | BTREE      |         |               |
      +----------------+------------+-------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
      2 rows in set (0.00 sec)
      

      The relevant columns are already indexed here; if such an index were missing, it would need to be created first.
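      A "check, then create if missing" step can be sketched with Python's built-in sqlite3 module as a stand-in for MySQL. In MySQL itself the check is the SHOW INDEX output above and the fix is a plain CREATE INDEX; the helper names below (indexed_columns, ensure_index) are hypothetical, and SQLite's PRAGMA introspection is an assumption-level analogue of SHOW INDEX, not the same command.

```python
import sqlite3

def indexed_columns(conn, table):
    """Return the columns that lead some index on `table` (SQLite analogue of SHOW INDEX).

    Note: a SQLite INTEGER PRIMARY KEY is the rowid and is not listed by index_list.
    Identifiers are interpolated directly, so only pass trusted table names.
    """
    cols = set()
    for row in conn.execute(f"PRAGMA index_list({table})"):
        index_name = row[1]
        # index_info rows are (seqno, cid, name); seqno 0 is the leading column
        for seqno, _cid, name in conn.execute(f"PRAGMA index_info({index_name})"):
            if seqno == 0:
                cols.add(name)
    return cols

def ensure_index(conn, table, column, index_name):
    """Create an index on `column` only if no existing index already leads with it."""
    if column not in indexed_columns(conn, table):
        conn.execute(f"CREATE INDEX {index_name} ON {table} ({column})")
        return True
    return False

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AI_AD_U_L_TEMP (cust_addr_id INTEGER, prod_inst_id INTEGER NOT NULL)")
    conn.execute("CREATE INDEX idx_cust_addr_id2 ON AI_AD_U_L_TEMP (cust_addr_id)")
    print(ensure_index(conn, "AI_AD_U_L_TEMP", "prod_inst_id", "idx_prod_inst_id"))  # True: created
    print(ensure_index(conn, "AI_AD_U_L_TEMP", "prod_inst_id", "idx_prod_inst_id"))  # False: exists
```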

    2. Check the execution plan

      mysql> explain DELETE   t FROM   o.`AI_AD_U_L` t WHERE EXISTS   (SELECT     1   FROM     o.`AI_AD_U_L_TEMP` AS a   WHERE a.prod_inst_id = t.prod_inst_id);
      +----+--------------------+-------+------------+------+------------------+------------------+---------+-----------------------+----------+----------+-------------+
      | id | select_type        | table | partitions | type | possible_keys    | key              | key_len | ref                   | rows     | filtered | Extra       |
      +----+--------------------+-------+------------+------+------------------+------------------+---------+-----------------------+----------+----------+-------------+
      |  1 | DELETE             | t     | NULL       | ALL  | NULL             | NULL             | NULL    | NULL                  | 21162122 |   100.00 | Using where |
      |  2 | DEPENDENT SUBQUERY | a     | NULL       | ref  | idx_prod_inst_id | idx_prod_inst_id | 8       | o.t.prod_inst_id      |        1 |   100.00 | Using index |
      +----+--------------------+-------+------------+------+------------------+------------------+---------+-----------------------+----------+----------+-------------+
      2 rows in set, 1 warning (0.01 sec)
      

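      The execution plan reveals two problems:

      1. The outer table is large: the plan estimates 21,162,122 rows, and all of them are scanned (type ALL). The subquery, by contrast, reads only about one row through its index each time it runs, but as a DEPENDENT SUBQUERY it runs once for every outer row.
      2. The subquery uses an index, but the outer table does not.

      Together, these two points show that the outer query is acting as the driving table.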
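      To put the first point in numbers, here is a back-of-the-envelope estimate that uses only the figures from the plan above. The `rows` column is an optimizer estimate, and counting one read per outer row plus one per index row returned is a simplifying assumption.

```python
# Figures taken from the EXPLAIN output above; "rows" values are optimizer estimates.
outer_rows = 21_162_122   # t: type=ALL, full scan of AI_AD_U_L
rows_per_probe = 1        # a: type=ref on idx_prod_inst_id, about 1 row per execution

# EXISTS form: the outer table drives, so the dependent subquery runs once per outer row.
subquery_executions = outer_rows
# Simplifying assumption: one read per outer row plus the index rows each probe returns.
estimated_reads = outer_rows + subquery_executions * rows_per_probe

print(f"{subquery_executions:,} subquery executions, ~{estimated_reads:,} reads")
# prints: 21,162,122 subquery executions, ~42,324,244 reads
```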

    3. Check the row count of the subquery's table

      mysql> select count(*) from AI_AD_U_L_TEMP;
      +----------+
      | count(*) |
      +----------+
      |     3791 |
      +----------+
      1 row in set (0.00 sec)
      

      The subquery's table is small, so the subquery should be the driving table; in other words, EXISTS should be replaced with IN.

    4. Rewrite the statement, replacing EXISTS with IN, and check the execution plan again

      mysql> explain DELETE   t FROM   o.`AI_AD_U_L` t WHERE t.prod_inst_id in  (SELECT prod_inst_id FROM     o.`AI_AD_U_L_TEMP` AS a   );
      +----+-------------+-------+------------+--------+------------------+------------------+---------+-----------------------+------+----------+------------------------+
      | id | select_type | table | partitions | type   | possible_keys    | key              | key_len | ref                   | rows | filtered | Extra                  |
      +----+-------------+-------+------------+--------+------------------+------------------+---------+-----------------------+------+----------+------------------------+
      |  1 | SIMPLE      | a     | NULL       | index  | idx_prod_inst_id | idx_prod_inst_id | 8       | NULL                  | 3791 |   100.00 | Using index; LooseScan |
      |  1 | DELETE      | t     | NULL       | eq_ref | PRIMARY          | PRIMARY          | 8       | o.a.prod_inst_id      |    1 |   100.00 | NULL                   |
      +----+-------------+-------+------------+--------+------------------+------------------+---------+-----------------------+------+----------+------------------------+
      2 rows in set (0.00 sec)
      

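      The plan shows that both tables now use an index: the subquery table is read through idx_prod_inst_id, and the outer table is accessed by primary key (eq_ref). The estimated number of accesses to the outer table drops to the subquery table's row count (3,791), which drastically reduces the number of loops over the outer table.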
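      "Using index; LooseScan" in the Extra column means MySQL executed the IN as a semi-join using the LooseScan strategy: it reads the subquery table's index in order, uses only the first entry of each group of duplicate keys, and looks up each distinct key in the outer table. The sketch below is a toy model of that idea, not MySQL's implementation; the sorted list standing in for the index and the dict standing in for the primary key are assumptions.

```python
from bisect import bisect_right

def loose_scan_semijoin(sub_index_sorted, outer_by_pk):
    """Toy model of a LooseScan semi-join (not MySQL's implementation).

    sub_index_sorted: keys of the subquery table's index, in index order (duplicates allowed)
    outer_by_pk: dict standing in for the outer table's primary key (key -> row)
    Returns (matching outer rows, number of distinct index keys visited).
    """
    matches, visited, i = [], 0, 0
    n = len(sub_index_sorted)
    while i < n:
        key = sub_index_sorted[i]          # first entry of this group of duplicates
        visited += 1
        row = outer_by_pk.get(key)         # eq_ref-style lookup: at most one row per key
        if row is not None:
            matches.append(row)
        # "Loose" step: skip the remaining duplicates of this key
        i = bisect_right(sub_index_sorted, key, i)
    return matches, visited

if __name__ == "__main__":
    temp_index = [3, 3, 3, 8, 11, 11]                 # hypothetical index contents (sorted)
    outer = {k: f"row{k}" for k in range(1, 10)}      # hypothetical outer table keyed by PK
    print(loose_scan_semijoin(temp_index, outer))      # prints: (['row3', 'row8'], 3)
```

      With the figures from the plan above, this shape means at most 3,791 index entries read from AI_AD_U_L_TEMP and at most 3,791 primary-key lookups on AI_AD_U_L, instead of a full scan of about 21 million rows.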

    5. Run the statement

      mysql> DELETE   t FROM   o.`AI_AD_U_L` t WHERE t.prod_inst_id in  (SELECT prod_inst_id FROM     o.`AI_AD_U_L_TEMP` AS a   );
      Query OK, 3525 rows affected (0.44 sec)
      

      The improvement is dramatic: a statement that could not finish in an hour now completes in 0.44 seconds.
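      Before running such a rewrite on production data, one way to gain confidence is to check on a small copy that both forms delete exactly the same rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table contents are hypothetical, and because SQLite lacks MySQL's multi-table `DELETE t FROM ... t` syntax, both statements are written as single-table DELETEs with the same conditions.

```python
import sqlite3

def build_db():
    """Create a hypothetical miniature copy of both tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE AI_AD_U_L (prod_inst_id INTEGER PRIMARY KEY);
        CREATE TABLE AI_AD_U_L_TEMP (prod_inst_id INTEGER NOT NULL);
        CREATE INDEX idx_prod_inst_id ON AI_AD_U_L_TEMP (prod_inst_id);
    """)
    conn.executemany("INSERT INTO AI_AD_U_L VALUES (?)", [(i,) for i in range(1, 21)])
    conn.executemany("INSERT INTO AI_AD_U_L_TEMP VALUES (?)", [(4,), (4,), (9,), (15,), (99,)])
    return conn

EXISTS_SQL = """DELETE FROM AI_AD_U_L WHERE EXISTS
    (SELECT 1 FROM AI_AD_U_L_TEMP AS a WHERE a.prod_inst_id = AI_AD_U_L.prod_inst_id)"""
IN_SQL = """DELETE FROM AI_AD_U_L WHERE prod_inst_id IN
    (SELECT prod_inst_id FROM AI_AD_U_L_TEMP)"""

def run(sql):
    """Run one DELETE form on a fresh copy; return (rows deleted, surviving keys)."""
    conn = build_db()
    deleted = conn.execute(sql).rowcount
    left = [r[0] for r in conn.execute("SELECT prod_inst_id FROM AI_AD_U_L ORDER BY 1")]
    conn.close()
    return deleted, left

if __name__ == "__main__":
    print(run(EXISTS_SQL) == run(IN_SQL), run(IN_SQL)[0])   # prints: True 3
```

      The rewrite is safe here because a positive IN and a correlated EXISTS select the same rows; the same does not automatically hold for NOT IN versus NOT EXISTS when the subquery column can contain NULL.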

  • Original article: https://www.cnblogs.com/halberd-lee/p/10643431.html