zoukankan      html  css  js  c++  java
  • Mysql Order BY

    order by

    默认情况下,MySQL对GROUP BY col1,col2,...查询进行排序,就好像您在查询中还包含了ORDER BY col1,col2,...一样。如果您包含一个包含相同列列表的显式ORDER BY子句,则MySQL会对其进行优化,而不会造成任何速度损失,尽管排序仍然会发生。如果查询包含GROUP BY,但您希望避免对结果进行排序的开销,则可以通过指定ORDER BY NULL来抑制排序。例如:

    INSERT INTO foo
    SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
    

    优化器可能仍选择使用排序来实现分组操作。 ORDER BY NULL禁止对结果排序,而不是通过分组操作确定结果的先前排序。

    Mysql 在执行排序操作的时候有以下三种方案:

    • sort By index
    • file sort
    • priority queue

    sort By Index

    因为B+tree 的特性,在叶子节点会存储索引 对应的列数据 和 对应的主键列数据,以双向列表的形式顺序存储。所以在某些情况下,MySQL可以使用索引来满足ORDER BY子句,并避免执行文件排序操作时涉及的额外排序。

    mysql> desc orderIndex;
    +-------+------+------+-----+---------+----------------+
    | Field | Type | Null | Key | Default | Extra          |
    +-------+------+------+-----+---------+----------------+
    | id    | int  | NO   | PRI | NULL    | auto_increment |
    | a     | int  | YES  | MUL | NULL    |                |
    | b     | int  | YES  | MUL | NULL    |                |
    | c     | int  | YES  |     | NULL    |                |
    +-------+------+------+-----+---------+----------------+
    4 rows in set (0.00 sec)
    
    mysql> show index from orderIndex;
    +------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
    | Table      | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
    +------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
    | orderIndex |          0 | PRIMARY  |            1 | id          | A         |        4878 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
    | orderIndex |          1 | b        |            1 | b           | A         |        4878 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
    | orderIndex |          1 | a_2      |            1 | a           | A         |        5000 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
    | orderIndex |          1 | a_2      |            2 | b           | A         |        5000 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
    +------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
    4 rows in set (0.00 sec)
    

    如下例所示:

    mysql> explain select b from orderIndex order by a limit 10;
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
    | id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
    |  1 | SIMPLE      | orderIndex | NULL       | index | NULL          | a_2  | 10      | NULL |   10 |   100.00 | Using index |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
    1 row in set, 1 warning (0.00 sec)
    
    

    extra中仅仅显示为 Using Index 表明此查询无需去主键索引查询表数据,仅仅使用二级索引即可满足查询语句(即索引覆盖),也无需执行额外排序,下例也是同理:

    mysql> explain select b from orderIndex order by a DESC limit 10;
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------+
    | id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                            |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------+
    |  1 | SIMPLE      | orderIndex | NULL       | index | NULL          | a_2  | 10      | NULL |   10 |   100.00 | Backward index scan; Using index |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------+
    1 row in set, 1 warning (0.00 sec)
    

    有些查询语句及时无法达到索引覆盖的效果,但仍然可以依赖 index 避免执行额外的 排序 操作。即使ORDER BY与索引不完全匹配,也可以使用索引,只要索引的所有未使用部分和所有额外的ORDER BY列在WHERE子句中都是常量即可。如下例所示:

    mysql> explain select * from orderIndex where a=100 order by b;
    +----+-------------+------------+------------+------+---------------+------+---------+-------+------+----------+-------+
    | id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
    +----+-------------+------------+------------+------+---------------+------+---------+-------+------+----------+-------+
    |  1 | SIMPLE      | orderIndex | NULL       | ref  | a_2           | a_2  | 5       | const |    1 |   100.00 | NULL  |
    +----+-------------+------------+------------+------+---------------+------+---------+-------+------+----------+-------+
    1 row in set, 1 warning (0.00 sec)
    
    explain select * from orderIndex where a=100 order by b DESC;
    +----+-------------+------------+------------+------+---------------+------+---------+-------+------+----------+---------------------+
    | id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra               |
    +----+-------------+------------+------------+------+---------------+------+---------+-------+------+----------+---------------------+
    |  1 | SIMPLE      | orderIndex | NULL       | ref  | a_2           | a_2  | 5       | const |    1 |   100.00 | Backward index scan |
    +----+-------------+------------+------------+------+---------------+------+---------+-------+------+----------+---------------------+
    1 row in set, 1 warning (0.00 sec)
    
    

    如果索引不包含查询访问的所有列,则仅当索引访问比其他访问方法便宜时才使用索引。

    # 索引包含所有查询列,没有使用额外排序
    mysql> explain select a,b from orderIndex order by a, b;
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
    | id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
    |  1 | SIMPLE      | orderIndex | NULL       | index | NULL          | a_2  | 10      | NULL | 5000 |   100.00 | Using index |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
    1 row in set, 1 warning (0.00 sec)
    
    # 索引不包含所有查询列,使用了额外排序
    mysql> explain select * from orderIndex order by a, b;
    +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+----------------+
    | id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
    +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+----------------+
    |  1 | SIMPLE      | orderIndex | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 5000 |   100.00 | Using filesort |
    +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+----------------+
    1 row in set, 1 warning (0.00 sec)
    

    但有时一些查询语句单单依赖index是无法满足 排序语句的, 必须要进行额外排序:

    # 满足索引覆盖条件
    mysql> explain select a,b from orderIndex order by b;
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------+
    | id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------+
    |  1 | SIMPLE      | orderIndex | NULL       | index | NULL          | a_2  | 10      | NULL | 5000 |   100.00 | Using index; Using filesort |
    +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------+
    1 row in set, 1 warning (0.00 sec)
    

    注意,以上示例有在不同情况下执行结果可能会发生变动,比如 引擎认为全表扫描然后排序的开销 比依赖索引然后回表查询的 开销更低的情况下 则会使用额外的排序操作。

    file sort

    如上文所述,当order BY index 无法满足的时候,必须要通过额外的排序手段才能完成排序任务。mysql使用 file sort方法来实现额外排序,需要我们注意的是 file sort 并不意味一定要创建临时文件,mysql优化器会为file sort的执行开辟一个块内存缓存,它的预置大小是由 sort_buffer_size参数控制的,开发者可以根据需求和场景去调整,各个会话可以根据需要更改此变量的会话值,以避免过多的内存使用,或根据需要分配更多的内存。当数据缓存超出了sort_buffer上限时就会转存储到多个临时文件中,这和mysql一贯的设计思想是一致的:如果内存足够就放在内存,如果内存不够再开辟磁盘文件存储,尽量减少IO开销

    现在就会分为两种情况:

    1. 排序数据小于 sort_buffer_size
    2. 排序数据大于 sort_buffer_size

    这两种不同情况下采用的排序策略是不一样的,第一种情况,直接在内存中进行排序,此时使用的排序算法要根据具体的语句去分析;第二种情况下数据会被划分为多个文件,在每个文件内部保证数据的有序(写文件前进行排序),然后再用归并排序对各个文件数据进行总排序;下面做一个简单的验证:

    mysql> set optimizer_trace='enabled=on';
    mysql> select count(*) from orderIndex;
    +----------+
    | count(*) |
    +----------+
    |     5000 |
    +----------+
    1 row in set (0.02 sec)
    
    mysql> show variables like 'sort_buffer_size';
    +------------------+--------+
    | Variable_name    | Value  |
    +------------------+--------+
    | sort_buffer_size | 262144 |
    +------------------+--------+
    1 row in set (0.00 sec)
    
    

    filesort_priority_queue

    此时在表orderIndex中包含5000条数据,sort_buffer_size = 262144,下面根据c列进行全表排序,因为c没有索引,所以肯定会触发file sort

    mysql> select * from orderIndex order by c;
    ...
    5000 rows in set (0.01 sec)
    
    mysql> select * from information_schema.OPTIMIZER_TRACEG;
    {
    ...
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "sorting_table": "orderIndex",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`orderIndex`.`c`"
              }
            ],
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            },
            "filesort_execution": [
            ],
            "filesort_summary": {
              "memory_available": 262144,
              "key_size": 9,
              "row_size": 26,
              "max_rows_per_buffer": 7710,
              "num_rows_estimate": 11234,
              "num_rows_found": 5000,
              "num_initial_chunks_spilled_to_disk": 0,
              "peak_memory_used": 221184,
              "sort_algorithm": "std::stable_sort",
              "unpacked_addon_fields": "skip_heuristic",
              "sort_mode": "<fixed_sort_key, additional_fields>"
            }
          }
        ]
      }
    ...
    }  
    

    我们通过optimizer_trace可以看出row_size=26,整个表的数据无法填满sort_buffer,可以直接在内存之中使用快速排序算法进行排序。而filesort_priority_queue_optimization项显示并未开启优先队列排序,原因是没有使用Limit,我们现在再加上Limit进行查询:

    mysql> select * from orderIndex order by c limit 5000;
    ... ...
    5000 rows in set (0.00 sec)
    
    mysql> select * from information_schema.OPTIMIZER_TRACEG;
    {
      ... ...
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "sorting_table": "orderIndex",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`orderIndex`.`c`"
              }
            ],
            "filesort_priority_queue_optimization": {
              "limit": 5000,
              "chosen": true
            },
            "filesort_execution": [
            ],
            "filesort_summary": {
              "memory_available": 262144,
              "key_size": 9,
              "row_size": 26,
              "max_rows_per_buffer": 5001,
              "num_rows_estimate": 11234,
              "num_rows_found": 5000,
              "num_initial_chunks_spilled_to_disk": 0,
              "peak_memory_used": 170034,
              "sort_algorithm": "std::stable_sort",
              "unpacked_addon_fields": "using_priority_queue",
              "sort_mode": "<fixed_sort_key, additional_fields>"
            }
          }
        ]
      }
      ... ...
    }
    

    可以看出现在是开启了filesort_priority_queue_optimization,也就是说只有使用Limit时会触发优先队列排序优化,优先队列排序是使用堆排序算法实现的,它执行的流程如下:

    1. 扫描表,将选择的每一列中的选择列表列按顺序插入队列。如果队列已满,请按排序顺序移出最后一行。
    2. 返回队列的前N行。 (如果指定了offset,请跳过前offset行,然后返回后N行。)

    如果是DESC排序则使用大堆法,否则使用小堆法。

    Optimization Using filesort

    现在调整sort_buffer_size来模拟排序数据量大于sort_buffer_size的情况:

    mysql> set sort_buffer_size=10240;
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    mysql> show variables like 'sort_buffer_size';
    +------------------+-------+
    | Variable_name    | Value |
    +------------------+-------+
    | sort_buffer_size | 32768 |
    +------------------+-------+
    1 row in set (0.00 sec)
    

    我们设置了 10240,但小于sort_buffer_size的最小值 32768,所以默认设置为了最小值。

    mysql> select * from orderIndex order by c;
    ... ...
    5000 rows in set (0.00 sec)
    
    mysql> select * from information_schema.OPTIMIZER_TRACEG;
    {
    ...
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "sorting_table": "orderIndex",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`orderIndex`.`c`"
              }
            ],
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            },
            "filesort_execution": [
            ],
            "filesort_summary": {
              "memory_available": 32768,
              "key_size": 9,
              "row_size": 26,
              "max_rows_per_buffer": 1260,
              "num_rows_estimate": 11234,
              "num_rows_found": 5000,
              "num_initial_chunks_spilled_to_disk": 6,
              "peak_memory_used": 33256,
              "sort_algorithm": "std::stable_sort",
              "unpacked_addon_fields": "skip_heuristic",
              "sort_mode": "<fixed_sort_key, additional_fields>"
            }
          }
        ]
      }
    ...
    }
    

    我们此时可以清晰的看到num_initial_chunks_spilled_to_disk = 6(num_initial_chunks_spilled_to_disk表示在执行合并之前块的数量),说明是使用了临时文件进行内存外排序的,它的执行流程如下:

    1. 读取与WHERE子句匹配的行。
    2. 对于每一行,在排序缓冲区中存储一个元组,该元组由排序键值和查询引用的列组成。
    3. 当排序缓冲区已满时,按内存中的排序键值对元组进行排序,并将其写入临时文件。
    4. 对临时文件进行归并排序后,按排序顺序检索行,但直接从排序后的元组中读取查询所需的列,而不是第二次访问该表。

    此时我们需要额外说明的是,上面的流程是优化后的算法,是在5.6 version后面引进来的,原始的算法流程如下:

    1. 根据键或通过表扫描读取所有行。跳过与WHERE子句不匹配的行。
    2. 对于每一行,在排序缓冲区中存储一个由一对值(排序键值和行ID)组成的元组。
    3. 如果所有对都适合排序缓冲区,则不会创建临时文件。否则,当排序缓冲区已满时,请在内存中对其进行快速排序,然后将其写入临时文件。保存一个指向已排序块的指针。
    4. 重复上述步骤,直到已读取所有行。
    5. 将多达MERGEBUFF(7)个区合并到另一个临时文件中的一个块。重复直到第一个文件中的所有块都在第二个文件中。
    6. 重复以下步骤,直到剩余的块少于MERGEBUFF2(15)。
    7. 在最后一个多重合并中,仅行ID(值对的最后一部分)被写入结果文件。
    8. 使用结果文件中的行ID按排序顺序读取行。要对此进行优化,请读取大块的行ID,对它们进行排序,然后使用它们按排序顺序将行读取到行缓冲区中。行缓冲区大小是read_rnd_buffer_size系统变量值。此步骤的代码在sql / records.cc源文件中。

    max_length_for_sort_data系统变量会决定是使用 原始的算法还是优化后的算法,我们现在模拟一下原始的算法:

    mysql> show variables like 'max_length_for_sort_data';
    +--------------------------+-------+
    | Variable_name            | Value |
    +--------------------------+-------+
    | max_length_for_sort_data | 4096  |
    +--------------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> set max_length_for_sort_data=16;
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    mysql> show variables like 'max_length_for_sort_data';
    +--------------------------+-------+
    | Variable_name            | Value |
    +--------------------------+-------+
    | max_length_for_sort_data | 16    |
    +--------------------------+-------+
    1 row in set (0.00 sec)
    

    max_length_for_sort_data 设置为 16,row_size=26 >max_length_for_sort_data,此时会触发原始算法流程,但遗憾的是无法展示,因为在 8.0.20版本后此变量已经被舍弃了:

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 8.0.20    |
    +-----------+
    1 row in set (0.00 sec)
    

    此时再使用Limit [M], N进行查询,如果M + N 过大也会出现大于 sort_buffer_size的情况,需要进行内存外排序:

    mysql> select * from orderIndex order by c limit 2000, 20;
    
    mysql> select * from information_schema.OPTIMIZER_TRACEG;
    {
    ... ...
        "join_execution": {
        "select#": 1,
        "steps": [
          {
            "sorting_table": "orderIndex",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`orderIndex`.`c`"
              }
            ],
            "filesort_priority_queue_optimization": {
              "limit": 2020
            },
            "filesort_execution": [
            ],
            "filesort_summary": {
              "memory_available": 32768,
              "key_size": 9,
              "row_size": 26,
              "max_rows_per_buffer": 1260,
              "num_rows_estimate": 11234,
              "num_rows_found": 5000,
              "num_initial_chunks_spilled_to_disk": 6,
              "peak_memory_used": 33256,
              "sort_algorithm": "std::stable_sort",
              "unpacked_addon_fields": "skip_heuristic",
              "sort_mode": "<fixed_sort_key, additional_fields>"
            }
          }
        ]
        }
    ... ...
    }
    

    其执行流程大致如下(存疑):

    1. 遍历整张表,重复这些步骤:(1)选择行,直到填充了排序缓冲区,(2)将缓冲区中的前N行(如果指定了M,则为M + N行)写入合并文件。
    2. 对合并文件进行排序,并返回前N行。 (如果指定了M,请跳过前M行,然后返回后N行。)

    参考网址

  • 相关阅读:
    BZOJ1217: [HNOI2003]消防局的设立
    BZOJ3925: [Zjoi2015]地震后的幻想乡
    BZOJ2328: [HNOI2011]赛车游戏
    BZOJ1011: [HNOI2008]遥远的行星
    BZOJ2007: [Noi2010]海拔
    BZOJ1833: [ZJOI2010]count 数字计数
    BZOJ1026: [SCOI2009]windy数
    BZOJ1196: [HNOI2006]公路修建问题
    BZOJ1076: [SCOI2008]奖励关
    BZOJ2752: [HAOI2012]高速公路(road)
  • 原文地址:https://www.cnblogs.com/cnblogs-wangzhipeng/p/13323675.html
Copyright © 2011-2022 走看看