zoukankan      html  css  js  c++  java
  • sql优化-派生表与inner-join

    首先来说明一下派生表

    外部的表查询的结果集是从子查询中生成的.如下形式:

    select ... from (select ....) dt

    如上形式中括号中的查询的结果作为外面select语句的查询源,派生表必须指定别名,因此后面的dt必须指定。派生表和临时表差不多,但是在select语句中派生表比临时表要容易,因为派生表不用创建。

    一个有关派生表优化的实例。

    开发同事发来一个sql优化,涉及到4张表,表中的数据都不是很大,但是查询起来真的很慢。服务器性能又差,查询总是超时。

    四张表的表结构如下:

           Table: t_info_setting
    Create Table: CREATE TABLE `t_info_setting` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `parent_key` varchar(32) NOT NULL,
      `column_name` varchar(32) NOT NULL,
      `column_key` varchar(32) NOT NULL,
      `storage_way` tinyint(4) DEFAULT '0',
      `check_way` tinyint(4) DEFAULT '0',
      `remark` varchar(500) DEFAULT '',
      `operator` varchar(128) DEFAULT '',
      `status` int(11) DEFAULT '1',
      `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `column_key` (`column_key`)
    ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8
    t_info_setting
           Table: t_articles_status
    Create Table: CREATE TABLE `t_articles_status` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `linkId` varchar(36) NOT NULL,
      `column_key` varchar(32) NOT NULL,
      `status` int(11) DEFAULT '50000',
      `operator_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `article_status` (`linkId`,`column_key`)
    ) ENGINE=InnoDB AUTO_INCREMENT=22232 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    t_articles_status
           Table: t_article_operations
    Create Table: CREATE TABLE `t_article_operations` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `linkId` varchar(36) NOT NULL,
      `column_key` varchar(32) NOT NULL DEFAULT '',
      `type` varchar(16) DEFAULT '',
      `operator` varchar(128) DEFAULT '',
      `operator_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `article_operation` (`linkId`,`column_key`),
      KEY `operator_time` (`operator_time`)
    ) ENGINE=InnoDB AUTO_INCREMENT=23316 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    t_article_operations
           Table: t_articles
    Create Table: CREATE TABLE `t_articles` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `linkId` varchar(36) DEFAULT NULL,
      `source` int(11) DEFAULT '0',
      `title` varchar(150) NOT NULL,
      `author` varchar(150) NOT NULL,
      `tags` varchar(200) DEFAULT NULL,
      `abstract` varchar(512) DEFAULT NULL,
      `content` mediumtext,
      `thumbnail` varchar(256) DEFAULT NULL,
      `sourceId` varchar(24) DEFAULT '',
      `accessoryUrl` text,
      `relatedStock` text,
      `contentUrl` text,
      `secuInfo` text,
      `market` varchar(10) DEFAULT 'hk',
      `code` varchar(10) DEFAULT '',
      `updator` varchar(64) DEFAULT '',
      `createTime` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '建立时间',
      `updateTime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
      PRIMARY KEY (`id`),
      UNIQUE KEY `linkId` (`linkId`)
    ) ENGINE=InnoDB AUTO_INCREMENT=15282 DEFAULT CHARSET=utf8
    t_articles

    上面四张表,由上面的自增字段的值可以知道表的数据并不是很大,最大的表也就2万多行,表中的索引情况已经一目了然。开发同学给出的sql语句如下:

    (
      SELECT
        'daily' AS category,
        e.linkId,
        e.title,
        e.updateTime
      FROM
        (
          SELECT DISTINCT
            b.column_key,
            b. STATUS,
            b.linkId
          FROM
            t_info_setting a
          inner JOIN t_articles_status b ON a.column_key = b.column_key
          inner JOIN t_article_operations c ON b.column_key = c.column_key
          WHERE
            a.parent_key = 'daily'
          AND a. STATUS = 1
          AND b. STATUS = 80000
          ORDER BY
            c.operator_time DESC
          LIMIT 1
        ) AS d
      inner JOIN t_articles e ON d.linkId = e.linkId
    )
    UNION ALL
      (
        SELECT
          'ipo' AS category,
          e.linkId,
          e.title,
          e.updateTime
        FROM
          (
            SELECT DISTINCT
              b.column_key,
              b. STATUS,
              b.linkId
            FROM
              t_info_setting a
            inner JOIN t_articles_status b ON a.column_key = b.column_key
            inner JOIN t_article_operations c ON b.column_key = c.column_key
            WHERE
              a.parent_key = 'ipo'
            AND a. STATUS = 1
            AND b. STATUS = 80000
            ORDER BY
              c.operator_time DESC
            LIMIT 1
          ) AS d
        inner JOIN t_articles e ON d.linkId = e.linkId
      )
    UNION ALL
      (
        SELECT
          'research' AS category,
          e.linkId,
          e.title,
          e.updateTime
        FROM
          (
            SELECT DISTINCT
              b.column_key,
              b. STATUS,
              b.linkId
            FROM
              t_info_setting a
            inner JOIN t_articles_status b ON a.column_key = b.column_key
            inner JOIN t_article_operations c ON b.column_key = c.column_key
            WHERE
              a.parent_key = 'research'
            AND a. STATUS = 1
            AND b. STATUS = 80000
            ORDER BY
              c.operator_time DESC
            LIMIT 1
          ) AS d
        inner JOIN t_articles e ON d.linkId = e.linkId
      )
    UNION ALL
      (
        SELECT
          'news' AS category,
          e.linkId,
          e.title,
          e.updateTime
        FROM
          (
            SELECT DISTINCT
              b.column_key,
              b. STATUS,
              b.linkId
            FROM
              t_info_setting a
            inner JOIN t_articles_status b ON a.column_key = b.column_key
            inner JOIN t_article_operations c ON b.column_key = c.column_key
            WHERE
              a.parent_key = 'news'
            AND a. STATUS = 1
            AND b. STATUS = 80000
            ORDER BY
              c.operator_time DESC
            LIMIT 1
          ) AS d
        inner JOIN t_articles e ON d.linkId = e.linkId
      )
    开发给的sql

    原sql很长大概有107行,但是分析这条sql发现了使用了三个union联合查询,然后每条联合的sql语句基本是一模一样的,只是改变了a.parent_key = 'research'这个条件。这说明我们只需要分析其中的一条sql即可。

            SELECT
                'research' AS category,
                e.linkId,
                e.title,
                e.updateTime
            FROM
                (                             -- 这里使用了派生表
                    SELECT DISTINCT           --a
                        b.column_key,
                        b. STATUS,
                        b.linkId
                    FROM
                        t_info_setting a
                    inner JOIN t_articles_status b ON a.column_key = b.column_key
                    inner JOIN t_article_operations c ON b.column_key = c.column_key           -- c
                    WHERE
                        a.parent_key = 'research'
                    AND a. STATUS = 1
                    AND b. STATUS = 80000
                    ORDER BY
                        c.operator_time DESC
                    LIMIT 1
                ) AS d
            inner JOIN t_articles e ON d.linkId = e.linkId           -- b

    首先:这条sql语句中使用了派生表,分析里面的子查询,最后有一个limit 1也就是只查出一条数据,并且是按照operator_time 进行排序,那么distinct的去重复就是不需要的。再看子查询中查询出了三个字段,但是在b处和e表进行联合查询的时候只使用了linkId 这一个字段,因此子查询中多余的两个字段需要去掉。

    在表t_article_operations上有一个符合索引,我们知道mysql在使用复合索引时,采用最左原则,因此在c处的联合查询我们需要加上linkId ,根据上面分析,改写sql如下:

    select
        'research' as category,
        e.linkId,
        e.title,
        e.updateTime
    from (
        select b.linkId -- 去除不必要的列、distinct操作
        from t_info_setting a
        inner join t_articles_status b
            on a.column_key=b.column_key
        inner join t_article_operations c
            on b.linkId=c.linkId and b.column_key=c.column_key -- 关联条件应包含linkId
        where
            a.parent_key='research'
            and a.status=1
            and b.status=80000
        order by c.operator_time desc
        limit 1
    ) d
    inner join t_articles e
        on d.linkId=e.linkId;

    然后查看下改写前后两个sql的执行计划。

    改写后的执行计划:

    +----+-------------+------------+--------+-------------------+----------------+---------+-----------------------------------------------------------+-------+-------------+
    | id | select_type | table      | type   | possible_keys     | key            | key_len | ref                                                       | rows  | Extra       |
    +----+-------------+------------+--------+-------------------+----------------+---------+-----------------------------------------------------------+-------+-------------+
    |  1 | PRIMARY     | <derived2> | system | NULL              | NULL           | NULL    | NULL                                                      |     1 | NULL        |
    |  1 | PRIMARY     | e          | const  | linkId            | linkId         | 111     | const                                                     |     1 | NULL        |
    |  2 | DERIVED     | c          | index  | article_operation | operator_time  | 5       | NULL                                                      | 14711 | NULL        |
    |  2 | DERIVED     | a          | ref    | column_key        | column_key     | 98      | wlb_live_contents.c.column_key                            |     1 | Using where |
    |  2 | DERIVED     | b          | ref    | article_status    | article_status | 208     | wlb_live_contents.c.linkId,wlb_live_contents.c.column_key |     1 | Using where |
    +----+-------------+------------+--------+-------------------+----------------+---------+-----------------------------------------------------------+-------+-------------+

    改写之后的单个sql很快就有了结果,大概0.12秒就有了结束,而原来的sql会超时结束的。

    在原sql语句中使用了union,因为我们最后的结果并不要求去重复,只是四个结果集的联合,因此这里我们可以使用union all代替上面的union。

    改写后的执行计划DERIVED表示使用了派生表,我们看到在e表与派生表进行inner查询的使用了索引。

    分析:

    之前看到一种说法是,在数据表和派生表联合进行查询时,不能使用索引,但是上面的的执行计划说明使用了索引(e表和派生表联合查询,e表使用了索引)。【究竟要怎么用还需进一步研究】

    改写sql

    上面使用了派生表,其实数据量比较大时,派生表的效率并不是很高的,上面的查询我们试着用4张表的联合查询来改写。

    改写之后的sql如下:

    SELECT
        'research' AS category,
        e.linkId,
        e.title,
        e.updateTime
    FROM
        t_info_setting a
    INNER JOIN t_articles_status b ON a.column_key = b.column_key
    INNER JOIN t_article_operations c ON b.linkId = c.linkId
    AND b.column_key = c.column_key
    INNER JOIN t_articles e ON c.linkId = e.linkId
    WHERE
        a.parent_key = 'research'
    AND a. STATUS = 1
    AND b. STATUS = 80000
    ORDER BY
        c.operator_time DESC
    LIMIT 1

    查看执行计划:

    +----+-------------+-------+-------+-------------------+----------------+---------+-----------------------------------------------------------+------+-------------+
    | id | select_type | table | type  | possible_keys     | key            | key_len | ref                                                       | rows | Extra       |
    +----+-------------+-------+-------+-------------------+----------------+---------+-----------------------------------------------------------+------+-------------+
    |  1 | SIMPLE      | c     | index | article_operation | operator_time  | 5       | NULL                                                      |    1 | NULL        |
    |  1 | SIMPLE      | a     | ref   | column_key        | column_key     | 98      | wlb_live_contents.c.column_key                            |    1 | Using where |
    |  1 | SIMPLE      | b     | ref   | article_status    | article_status | 208     | wlb_live_contents.c.linkId,wlb_live_contents.c.column_key |    1 | Using where |
    |  1 | SIMPLE      | e     | ref   | linkId            | linkId         | 111     | wlb_live_contents.c.linkId                                |    1 | NULL        |
    +----+-------------+-------+-------+-------------------+----------------+---------+-----------------------------------------------------------+------+-------------+

    根据执行计划,这个inner join的执行计划是要比上面的使用派生表的执行计划要高一些。

    说明:

    1:在使用联合查询的时候,可以考虑联合查询的键上创建索引,效率可能会高点。

    2:可以考虑在order by的键上创建索引。

    3:根据数据可以知道,t_article_operations本质上是一个流水表,记录日志类信息,不应出现在日常查询中。解决此种查询的办法:operator_time保存在t_articles_status中,查询彻底移除t_article_operations,或临时方法:t_article_operations只保留短期数据,历史记录定期迁移至其他表。

  • 相关阅读:
    C#利用反射动态调用类及方法
    系统程序监控软件
    SQL server 2008 安装和远程访问的问题
    sql server 创建临时表
    IIS 时间问题
    windows 2008 安装 sql server 2008
    sql server xml nodes 的使用
    Window 7sp1 安装vs2010 sp1 打开xaml文件崩溃
    CSS资源网址
    Could not load type 'System.ServiceModel.Activation.HttpModule' from assembly 'System.ServiceModel, Version=3.0.0.0
  • 原文地址:https://www.cnblogs.com/wxzhe/p/11495349.html
Copyright © 2011-2022 走看看