zoukankan      html  css  js  c++  java
  • sql优化-派生表与inner-join

    首先来说明一下派生表

    外部的表查询的结果集是从子查询中生成的.如下形式:

    select ... from (select ....) dt

    如上形式中括号中的查询的结果作为外面select语句的查询源,派生表必须指定别名,因此后面的dt必须指定。派生表和临时表差不多,但是在select语句中派生表比临时表要容易,因为派生表不用创建。

    一个有关派生表优化的实例。

    开发同事发来一个sql优化,涉及到4张表,表中的数据都不是很大,但是查询起来真的很慢。服务器性能又差,查询总是超时。

    四张表的表结构如下:

           Table: t_info_setting
    Create Table: CREATE TABLE `t_info_setting` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `parent_key` varchar(32) NOT NULL,
      `column_name` varchar(32) NOT NULL,
      `column_key` varchar(32) NOT NULL,
      `storage_way` tinyint(4) DEFAULT '0',
      `check_way` tinyint(4) DEFAULT '0',
      `remark` varchar(500) DEFAULT '',
      `operator` varchar(128) DEFAULT '',
      `status` int(11) DEFAULT '1',
      `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `column_key` (`column_key`)
    ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8
    t_info_setting
           Table: t_articles_status
    Create Table: CREATE TABLE `t_articles_status` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `linkId` varchar(36) NOT NULL,
      `column_key` varchar(32) NOT NULL,
      `status` int(11) DEFAULT '50000',
      `operator_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `article_status` (`linkId`,`column_key`)
    ) ENGINE=InnoDB AUTO_INCREMENT=22232 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    t_articles_status
           Table: t_article_operations
    Create Table: CREATE TABLE `t_article_operations` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `linkId` varchar(36) NOT NULL,
      `column_key` varchar(32) NOT NULL DEFAULT '',
      `type` varchar(16) DEFAULT '',
      `operator` varchar(128) DEFAULT '',
      `operator_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `article_operation` (`linkId`,`column_key`),
      KEY `operator_time` (`operator_time`)
    ) ENGINE=InnoDB AUTO_INCREMENT=23316 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    t_article_operations
           Table: t_articles
    Create Table: CREATE TABLE `t_articles` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `linkId` varchar(36) DEFAULT NULL,
      `source` int(11) DEFAULT '0',
      `title` varchar(150) NOT NULL,
      `author` varchar(150) NOT NULL,
      `tags` varchar(200) DEFAULT NULL,
      `abstract` varchar(512) DEFAULT NULL,
      `content` mediumtext,
      `thumbnail` varchar(256) DEFAULT NULL,
      `sourceId` varchar(24) DEFAULT '',
      `accessoryUrl` text,
      `relatedStock` text,
      `contentUrl` text,
      `secuInfo` text,
      `market` varchar(10) DEFAULT 'hk',
      `code` varchar(10) DEFAULT '',
      `updator` varchar(64) DEFAULT '',
      `createTime` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '建立时间',
      `updateTime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
      PRIMARY KEY (`id`),
      UNIQUE KEY `linkId` (`linkId`)
    ) ENGINE=InnoDB AUTO_INCREMENT=15282 DEFAULT CHARSET=utf8
    t_articles

    上面四张表,由上面的自增字段的值可以知道表的数据并不是很大,最大的表也就2万多行,表中的索引情况已经一目了然。开发同学给出的sql语句如下:

    (
      SELECT
        'daily' AS category,
        e.linkId,
        e.title,
        e.updateTime
      FROM
        (
          SELECT DISTINCT
            b.column_key,
            b. STATUS,
            b.linkId
          FROM
            t_info_setting a
          inner JOIN t_articles_status b ON a.column_key = b.column_key
          inner JOIN t_article_operations c ON b.column_key = c.column_key
          WHERE
            a.parent_key = 'daily'
          AND a. STATUS = 1
          AND b. STATUS = 80000
          ORDER BY
            c.operator_time DESC
          LIMIT 1
        ) AS d
      inner JOIN t_articles e ON d.linkId = e.linkId
    )
    UNION ALL
      (
        SELECT
          'ipo' AS category,
          e.linkId,
          e.title,
          e.updateTime
        FROM
          (
            SELECT DISTINCT
              b.column_key,
              b. STATUS,
              b.linkId
            FROM
              t_info_setting a
            inner JOIN t_articles_status b ON a.column_key = b.column_key
            inner JOIN t_article_operations c ON b.column_key = c.column_key
            WHERE
              a.parent_key = 'ipo'
            AND a. STATUS = 1
            AND b. STATUS = 80000
            ORDER BY
              c.operator_time DESC
            LIMIT 1
          ) AS d
        inner JOIN t_articles e ON d.linkId = e.linkId
      )
    UNION ALL
      (
        SELECT
          'research' AS category,
          e.linkId,
          e.title,
          e.updateTime
        FROM
          (
            SELECT DISTINCT
              b.column_key,
              b. STATUS,
              b.linkId
            FROM
              t_info_setting a
            inner JOIN t_articles_status b ON a.column_key = b.column_key
            inner JOIN t_article_operations c ON b.column_key = c.column_key
            WHERE
              a.parent_key = 'research'
            AND a. STATUS = 1
            AND b. STATUS = 80000
            ORDER BY
              c.operator_time DESC
            LIMIT 1
          ) AS d
        inner JOIN t_articles e ON d.linkId = e.linkId
      )
    UNION ALL
      (
        SELECT
          'news' AS category,
          e.linkId,
          e.title,
          e.updateTime
        FROM
          (
            SELECT DISTINCT
              b.column_key,
              b. STATUS,
              b.linkId
            FROM
              t_info_setting a
            inner JOIN t_articles_status b ON a.column_key = b.column_key
            inner JOIN t_article_operations c ON b.column_key = c.column_key
            WHERE
              a.parent_key = 'news'
            AND a. STATUS = 1
            AND b. STATUS = 80000
            ORDER BY
              c.operator_time DESC
            LIMIT 1
          ) AS d
        inner JOIN t_articles e ON d.linkId = e.linkId
      )
    开发给的sql

    原sql很长大概有107行,但是分析这条sql发现了使用了三个union联合查询,然后每条联合的sql语句基本是一模一样的,只是改变了a.parent_key = 'research'这个条件。这说明我们只需要分析其中的一条sql即可。

            SELECT
                'research' AS category,
                e.linkId,
                e.title,
                e.updateTime
            FROM
                (                             -- 这里使用了派生表
                    SELECT DISTINCT           --a
                        b.column_key,
                        b. STATUS,
                        b.linkId
                    FROM
                        t_info_setting a
                    inner JOIN t_articles_status b ON a.column_key = b.column_key
                    inner JOIN t_article_operations c ON b.column_key = c.column_key           -- c
                    WHERE
                        a.parent_key = 'research'
                    AND a. STATUS = 1
                    AND b. STATUS = 80000
                    ORDER BY
                        c.operator_time DESC
                    LIMIT 1
                ) AS d
            inner JOIN t_articles e ON d.linkId = e.linkId           -- b

    首先:这条sql语句中使用了派生表,分析里面的子查询,最后有一个limit 1也就是只查出一条数据,并且是按照operator_time 进行排序,那么distinct的去重复就是不需要的。再看子查询中查询出了三个字段,但是在b处和e表进行联合查询的时候只使用了linkId 这一个字段,因此子查询中多余的两个字段需要去掉。

    在表t_article_operations上有一个符合索引,我们知道mysql在使用复合索引时,采用最左原则,因此在c处的联合查询我们需要加上linkId ,根据上面分析,改写sql如下:

    select
        'research' as category,
        e.linkId,
        e.title,
        e.updateTime
    from (
        select b.linkId -- 去除不必要的列、distinct操作
        from t_info_setting a
        inner join t_articles_status b
            on a.column_key=b.column_key
        inner join t_article_operations c
            on b.linkId=c.linkId and b.column_key=c.column_key -- 关联条件应包含linkId
        where
            a.parent_key='research'
            and a.status=1
            and b.status=80000
        order by c.operator_time desc
        limit 1
    ) d
    inner join t_articles e
        on d.linkId=e.linkId;

    然后查看下改写前后两个sql的执行计划。

    改写后的执行计划:

    +----+-------------+------------+--------+-------------------+----------------+---------+-----------------------------------------------------------+-------+-------------+
    | id | select_type | table      | type   | possible_keys     | key            | key_len | ref                                                       | rows  | Extra       |
    +----+-------------+------------+--------+-------------------+----------------+---------+-----------------------------------------------------------+-------+-------------+
    |  1 | PRIMARY     | <derived2> | system | NULL              | NULL           | NULL    | NULL                                                      |     1 | NULL        |
    |  1 | PRIMARY     | e          | const  | linkId            | linkId         | 111     | const                                                     |     1 | NULL        |
    |  2 | DERIVED     | c          | index  | article_operation | operator_time  | 5       | NULL                                                      | 14711 | NULL        |
    |  2 | DERIVED     | a          | ref    | column_key        | column_key     | 98      | wlb_live_contents.c.column_key                            |     1 | Using where |
    |  2 | DERIVED     | b          | ref    | article_status    | article_status | 208     | wlb_live_contents.c.linkId,wlb_live_contents.c.column_key |     1 | Using where |
    +----+-------------+------------+--------+-------------------+----------------+---------+-----------------------------------------------------------+-------+-------------+

    改写之后的单个sql很快就有了结果,大概0.12秒就有了结束,而原来的sql会超时结束的。

    在原sql语句中使用了union,因为我们最后的结果并不要求去重复,只是四个结果集的联合,因此这里我们可以使用union all代替上面的union。

    改写后的执行计划DERIVED表示使用了派生表,我们看到在e表与派生表进行inner查询的使用了索引。

    分析:

    之前看到一种说法是,在数据表和派生表联合进行查询时,不能使用索引,但是上面的的执行计划说明使用了索引(e表和派生表联合查询,e表使用了索引)。【究竟要怎么用还需进一步研究】

    改写sql

    上面使用了派生表,其实数据量比较大时,派生表的效率并不是很高的,上面的查询我们试着用4张表的联合查询来改写。

    改写之后的sql如下:

    SELECT
        'research' AS category,
        e.linkId,
        e.title,
        e.updateTime
    FROM
        t_info_setting a
    INNER JOIN t_articles_status b ON a.column_key = b.column_key
    INNER JOIN t_article_operations c ON b.linkId = c.linkId
    AND b.column_key = c.column_key
    INNER JOIN t_articles e ON c.linkId = e.linkId
    WHERE
        a.parent_key = 'research'
    AND a. STATUS = 1
    AND b. STATUS = 80000
    ORDER BY
        c.operator_time DESC
    LIMIT 1

    查看执行计划:

    +----+-------------+-------+-------+-------------------+----------------+---------+-----------------------------------------------------------+------+-------------+
    | id | select_type | table | type  | possible_keys     | key            | key_len | ref                                                       | rows | Extra       |
    +----+-------------+-------+-------+-------------------+----------------+---------+-----------------------------------------------------------+------+-------------+
    |  1 | SIMPLE      | c     | index | article_operation | operator_time  | 5       | NULL                                                      |    1 | NULL        |
    |  1 | SIMPLE      | a     | ref   | column_key        | column_key     | 98      | wlb_live_contents.c.column_key                            |    1 | Using where |
    |  1 | SIMPLE      | b     | ref   | article_status    | article_status | 208     | wlb_live_contents.c.linkId,wlb_live_contents.c.column_key |    1 | Using where |
    |  1 | SIMPLE      | e     | ref   | linkId            | linkId         | 111     | wlb_live_contents.c.linkId                                |    1 | NULL        |
    +----+-------------+-------+-------+-------------------+----------------+---------+-----------------------------------------------------------+------+-------------+

    根据执行计划,这个inner join的执行计划是要比上面的使用派生表的执行计划要高一些。

    说明:

    1:在使用联合查询的时候,可以考虑联合查询的键上创建索引,效率可能会高点。

    2:可以考虑在order by的键上创建索引。

    3:根据数据可以知道,t_article_operations本质上是一个流水表,记录日志类信息,不应出现在日常查询中。解决此种查询的办法:operator_time保存在t_articles_status中,查询彻底移除t_article_operations,或临时方法:t_article_operations只保留短期数据,历史记录定期迁移至其他表。

  • 相关阅读:
    学习淘宝指数有感
    STL学习小结
    Java里泛型有什么作用
    android 内存泄漏分析技巧
    学道1.3
    严苛模式(StrictMode)
    年龄大了还能够学习编程吗
    ORACLE EXP命令
    数学之路-python计算实战(13)-机器视觉-图像增强
    《C语言编写 学生成绩管理系统》
  • 原文地址:https://www.cnblogs.com/wxzhe/p/11495349.html
Copyright © 2011-2022 走看看