zoukankan      html  css  js  c++  java
  • Mysql:Optimization and Indexes:优化-索引

    8.3 Optimization and Indexes

    The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.

    Although it can be tempting to create an indexes for every possible column used in a query, unnecessary indexes waste space and waste time for MySQL to determine which indexes to use. Indexes also add to the cost of inserts, updates, and deletes because each index must be updated. You must find the right balance to achieve fast queries using the optimal set of indexes.

    8.3.1 How MySQL Uses Indexes

    Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.

    Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions: Indexes on spatial data types use R-trees; MEMORY tables also support hash indexes; InnoDB uses inverted lists for FULLTEXT indexes.

    In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described in Section 8.3.8, “Comparison of B-Tree and Hash Indexes”.

    MySQL uses indexes for these operations:

    • To find the rows matching a WHERE clause quickly.

    • To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index).

    • If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see Section 8.3.5, “Multiple-Column Indexes”.

    • To retrieve rows from other tables when performing joins. MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. For example, VARCHAR(10) and CHAR(10) are the same size, but VARCHAR(10) and CHAR(15) are not.

      For comparisons between nonbinary string columns, both columns should use the same character set. For example, comparing a utf8 column with a latin1 column precludes use of an index.

      Comparison of dissimilar columns (comparing a string column to a temporal or numeric column, for example) may prevent use of indexes if values cannot be compared directly without conversion. For a given value such as 1 in the numeric column, it might compare equal to any number of values in the string column such as '1', ' 1', '00001', or '01.e1'. This rules out use of any indexes for the string column.

    • To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant. If all expressions are replaced with constants, the query returns at once. For example:

      SELECT MIN(key_part2),MAX(key_part2)
        FROM tbl_name WHERE key_part1=10;
      
    • To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable index (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 8.2.1.14, “ORDER BY Optimization”, and Section 8.2.1.15, “GROUP BY Optimization”.

    • In some cases, a query can be optimized to retrieve values without consulting the data rows. (An index that provides all the necessary results for a query is called a covering index.) If a query uses from a table only columns that are included in some index, the selected values can be retrieved from the index tree for greater speed:

      SELECT key_part3 FROM tbl_name
        WHERE key_part1=1
      

    Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query. See Section 8.2.1.20, “Avoiding Full Table Scans” for details.

    8.3.2 Primary Key Optimization

    The primary key for a table represents the column or set of columns that you use in your most vital queries. It has an associated index, for fast query performance. Query performance benefits from the NOT NULL optimization, because it cannot include any NULL values. With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.

    If your table is big and important, but does not have an obvious column or set of columns to use as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to corresponding rows in other tables when you join tables using foreign keys.

    8.3.3 Foreign Key Optimization

    If a table has many columns, and you query many different combinations of columns, it might be efficient to split the less-frequently used data into separate tables with a few columns each, and relate them back to the main table by duplicating the numeric ID column from the main table. That way, each small table can have a primary key for fast lookups of its data, and you can query just the set of columns that you need using a join operation. Depending on how the data is distributed, the queries might perform less I/O and take up less cache memory because the relevant columns are packed together on disk. (To maximize performance, queries try to read as few data blocks as possible from disk; tables with only a few columns can fit more rows in each data block.)

    8.3.4 Column Indexes

    The most common type of index involves a single column, storing copies of the values from that column in a data structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a specific value, a set of values, or a range of values, corresponding to operators such as =, >, , BETWEEN, IN, and so on, in a WHERE clause.

    The maximum number of indexes per table and the maximum index length is defined per storage engine. See Chapter 14, The InnoDB Storage Engine, and Chapter 15, Alternative Storage Engines. All storage engines support at least 16 indexes per table and a total index length of at least 256 bytes. Most storage engines have higher limits.

    For additional information about column indexes, see Section 13.1.14, “CREATE INDEX Statement”.

    Index Prefixes

    With col_name(N) syntax in an index specification for a string column, you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:

    CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));

    Prefixes can be up to 1000 bytes long (767 bytes for InnoDB tables, unless you have innodb_large_prefix set).

    Note

    Prefix limits are measured in bytes, whereas the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB). Take this into account when specifying a prefix length for a nonbinary string column that uses a multibyte character set.

    If a search term exceeds the index prefix length, the index is used to exclude non-matching rows, and the remaining rows are examined for possible matches.

    For additional information about index prefixes, see Section 13.1.14, “CREATE INDEX Statement”.

    FULLTEXT Indexes

    FULLTEXT indexes are used for full-text searches. Only the InnoDB and MyISAM storage engines support FULLTEXT indexes and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column and column prefix indexing is not supported. For details, see Section 12.9, “Full-Text Search Functions”.

    Optimizations are applied to certain kinds of FULLTEXT queries against single InnoDB tables. Queries with these characteristics are particularly efficient:

    • FULLTEXT queries that only return the document ID, or the document ID and the search rank.

    • FULLTEXT queries that sort the matching rows in descending order of score and apply a LIMIT clause to take the top N matching rows. For this optimization to apply, there must be no WHERE clauses and only a single ORDER BY clause in descending order.

    • FULLTEXT queries that retrieve only the COUNT(*) value of rows matching a search term, with no additional WHERE clauses. Code the WHERE clause as WHERE MATCH(text) AGAINST ('other_text'), without any > 0 comparison operator.

    For queries that contain full-text expressions, MySQL evaluates those expressions during the optimization phase of query execution. The optimizer does not just look at full-text expressions and make estimates, it actually evaluates them in the process of developing an execution plan.

    An implication of this behavior is that EXPLAIN for full-text queries is typically slower than for non-full-text queries for which no expression evaluation occurs during the optimization phase.

    EXPLAIN for full-text queries may show Select tables optimized away in the Extra column due to matching occurring during optimization; in this case, no table access need occur during later execution.

    Spatial Indexes

    You can create indexes on spatial data types. MyISAM and InnoDB support R-tree indexes on spatial types. Other storage engines use B-trees for indexing spatial types (except for ARCHIVE, which does not support spatial type indexing).

    Indexes in the MEMORY Storage Engine

    The MEMORY storage engine uses HASH indexes by default, but also supports BTREE indexes.

    8.3.5 Multiple-Column Indexes

    MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column (see Section 8.3.4, “Column Indexes”).

    MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

    A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.

    Note

    As an alternative to a composite index, you can introduce a column that is hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a wide” index on many columns. In MySQL, it is very easy to use this extra column:

    SELECT * FROM tbl_name
      WHERE hash_col=MD5(CONCAT(val1,val2))
      AND col1=val1 AND col2=val2;
    

    Suppose that a table has the following specification:

    CREATE TABLE test (
        id         INT NOT NULL,
        last_name  CHAR(30) NOT NULL,
        first_name CHAR(30) NOT NULL,
        PRIMARY KEY (id),
        INDEX name (last_name,first_name)
    );

    The name index is an index over the last_name and first_name columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name and first_name values. It can also be used for queries that specify just a last_name value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the name index is used for lookups in the following queries:

    SELECT * FROM test WHERE last_name='Jones';
    
    SELECT * FROM test
      WHERE last_name='Jones' AND first_name='John';
    
    SELECT * FROM test
      WHERE last_name='Jones'
      AND (first_name='John' OR first_name='Jon');
    
    SELECT * FROM test
      WHERE last_name='Jones'
      AND first_name >='M' AND first_name < 'N';

    However, the name index is not used for lookups in the following queries:

    SELECT * FROM test WHERE first_name='John';
    
    SELECT * FROM test
      WHERE last_name='Jones' OR first_name='John';

    Suppose that you issue the following SELECT statement:

    SELECT * FROM tbl_name
      WHERE col1=val1 AND col2=val2;
    

    If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer attempts to use the Index Merge optimization (see Section 8.2.1.3, “Index Merge Optimization”), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.

    If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

    MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

    SELECT * FROM tbl_name WHERE col1=val1;
    SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
    
    SELECT * FROM tbl_name WHERE col2=val2;
    SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
    

    If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but do not use an index to perform lookups because (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

    8.3.6 Verifying Index Usage

    Always check whether all your queries really use the indexes that you have created in the tables. Use the EXPLAIN statement, as described in Section 8.8.1, “Optimizing Queries with EXPLAIN”.

    8.3.7 InnoDB and MyISAM Index Statistics Collection

    Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.

    MySQL uses the average value group size in the following ways:

    • To estimate how many rows must be read for each ref access

    • To estimate how many rows a partial join will produce; that is, the number of rows that an operation of this form will produce:

      (...) JOIN tbl_name ON tbl_name.key = expr
      

    As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.

    The average value group size is related to table cardinality, which is the number of value groups. The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table.

    For a join based on the <=> comparison operator, NULL is not treated differently from any other value: NULL <=> NULL, just as N <=> N for any other N.

    However, for a join based on the = operator, NULL is different from non-NULL values: expr1 = expr2 is not true when expr1 or expr2 (or both) are NULL. This affects ref accesses for comparisons of the form tbl_name.key = expr: MySQL will not access the table if the current value of expr is NULL, because the comparison cannot be true.

    For = comparisons, it does not matter how many NULL values are in the table. For optimization purposes, the relevant value is the average size of the non-NULL value groups. However, MySQL does not currently enable that average size to be collected or used.

    For InnoDB and MyISAM tables, you have some control over collection of table statistics by means of the innodb_stats_method and myisam_stats_method system variables, respectively. These variables have three possible values, which differ as follows:

    • When the variable is set to nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

      If the NULL value group size is much higher than the average non-NULL value group size, this method skews the average value group size upward. This makes index appear to the optimizer to be less useful than it really is for joins that look for non-NULL values. Consequently, the nulls_equal method may cause the optimizer not to use the index for ref accesses when it should.

    • When the variable is set to nulls_unequal, NULL values are not considered the same. Instead, each NULL value forms a separate value group of size 1.

      If you have many NULL values, this method skews the average value group size downward. If the average non-NULL value group size is large, counting NULL values each as a group of size 1 causes the optimizer to overestimate the value of the index for joins that look for non-NULL values. Consequently, the nulls_unequal method may cause the optimizer to use this index for ref lookups when other methods may be better.

    • When the variable is set to nulls_ignored, NULL values are ignored.

    If you tend to use many joins that use <=> rather than =, NULL values are not special in comparisons and one NULL is equal to another. In this case, nulls_equal is the appropriate statistics method.

    The innodb_stats_method system variable has a global value; the myisam_stats_method system variable has both global and session values. Setting the global value affects statistics collection for tables from the corresponding storage engine. Setting the session value affects statistics collection only for the current client connection. This means that you can force a table's statistics to be regenerated with a given method without affecting other clients by setting the session value of myisam_stats_method.

    To regenerate MyISAM table statistics, you can use any of the following methods:

    Some caveats regarding the use of innodb_stats_method and myisam_stats_method:

    • You can force table statistics to be collected explicitly, as just described. However, MySQL may also collect statistics automatically. For example, if during the course of executing statements for a table, some of those statements modify the table, MySQL may collect statistics. (This may occur for bulk inserts or deletes, or some ALTER TABLE statements, for example.) If this happens, the statistics are collected using whatever value innodb_stats_method or myisam_stats_method has at the time. Thus, if you collect statistics using one method, but the system variable is set to the other method when a table's statistics are collected automatically later, the other method will be used.

    • There is no way to tell which method was used to generate statistics for a given table.

    • These variables apply only to InnoDB and MyISAM tables. Other storage engines have only one method for collecting table statistics. Usually it is closer to the nulls_equal method.

    8.3.8 Comparison of B-Tree and Hash Indexes

    Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.

    B-Tree Index Characteristics

    A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character. For example, the following SELECT statements use indexes:

    SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
    SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
    

    In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.

    The following SELECT statements do not use indexes:

    SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
    SELECT * FROM tbl_name WHERE key_col LIKE other_col;
    

    In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

    If you use ... LIKE '%string%' and string is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.

    A search using col_name IS NULL employs indexes if col_name is indexed.

    Any index that does not span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.

    The following WHERE clauses use indexes:

    ... WHERE index_part1=1 AND index_part2=2 AND other_column=3
    
        /* index = 1 OR index = 2 */
    ... WHERE index=1 OR A=10 AND index=2
    
        /* optimized like "index_part1='hello'" */
    ... WHERE index_part1='hello' AND index_part3=5
    
        /* Can use index on index1 but not on index2 or index3 */
    ... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
    

    These WHERE clauses do not use indexes:

        /* index_part1 is not used */
    ... WHERE index_part2=1 AND index_part3=2
    
        /*  Index is not used in both parts of the WHERE clause  */
    ... WHERE index=1 OR A=10
    
        /* No index spans all rows  */
    ... WHERE index_part1=1 OR index_part2=10
    

    Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.

    Hash Index Characteristics

    Hash indexes have somewhat different characteristics from those just discussed:

    • They are used only for equality comparisons that use the = or <=> operators (but are very fast). They are not used for comparison operators such as < that find a range of values. Systems that rely on this type of single-value lookup are known as key-value stores”; to use MySQL for such applications, use hash indexes wherever possible.

    • The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)

    • MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM or InnoDB table to a hash-indexed MEMORY table.

    • Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)

    8.3.9 Use of Index Extensions

    InnoDB automatically extends each secondary index by appending the primary key columns to it. Consider this table definition:

    CREATE TABLE t1 (
      i1 INT NOT NULL DEFAULT 0,
      i2 INT NOT NULL DEFAULT 0,
      d DATE DEFAULT NULL,
      PRIMARY KEY (i1, i2),
      INDEX k_d (d)
    ) ENGINE = InnoDB;

    This table defines the primary key on columns (i1, i2). It also defines a secondary index k_d on column (d), but internally InnoDB extends this index and treats it as columns (d, i1, i2).

    The optimizer takes into account the primary key columns of the extended secondary index when determining how and whether to use that index. This can result in more efficient query execution plans and better performance.

    The optimizer can use extended secondary indexes for ref, range, and index_merge index access, for Loose Index Scan access, for join and sorting optimization, and for MIN()/MAX() optimization.

    The following example shows how execution plans are affected by whether the optimizer uses extended secondary indexes. Suppose that t1 is populated with these rows:

    INSERT INTO t1 VALUES
    (1, 1, '1998-01-01'), (1, 2, '1999-01-01'),
    (1, 3, '2000-01-01'), (1, 4, '2001-01-01'),
    (1, 5, '2002-01-01'), (2, 1, '1998-01-01'),
    (2, 2, '1999-01-01'), (2, 3, '2000-01-01'),
    (2, 4, '2001-01-01'), (2, 5, '2002-01-01'),
    (3, 1, '1998-01-01'), (3, 2, '1999-01-01'),
    (3, 3, '2000-01-01'), (3, 4, '2001-01-01'),
    (3, 5, '2002-01-01'), (4, 1, '1998-01-01'),
    (4, 2, '1999-01-01'), (4, 3, '2000-01-01'),
    (4, 4, '2001-01-01'), (4, 5, '2002-01-01'),
    (5, 1, '1998-01-01'), (5, 2, '1999-01-01'),
    (5, 3, '2000-01-01'), (5, 4, '2001-01-01'),
    (5, 5, '2002-01-01');

    Now consider this query:

    EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'

    The execution plan depends on whether the extended index is used.

    When the optimizer does not consider index extensions, it treats the index k_d as only (d). EXPLAIN for the query produces this result:

    mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: ref
    possible_keys: PRIMARY,k_d
              key: k_d
          key_len: 4
              ref: const
             rows: 5
            Extra: Using where; Using index
    

    When the optimizer takes index extensions into account, it treats k_d as (d, i1, i2). In this case, it can use the leftmost index prefix (d, i1) to produce a better execution plan:

    mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
             type: ref
    possible_keys: PRIMARY,k_d
              key: k_d
          key_len: 8
              ref: const,const
             rows: 1
            Extra: Using index
    

    In both cases, key indicates that the optimizer will use secondary index k_d but the EXPLAIN output shows these improvements from using the extended index:

    • key_len goes from 4 bytes to 8 bytes, indicating that key lookups use columns d and i1, not just d.

    • The ref value changes from const to const,const because the key lookup uses two key parts, not one.

    • The rows count decreases from 5 to 1, indicating that InnoDB should need to examine fewer rows to produce the result.

    • The Extra value changes from Using where; Using index to Using index. This means that rows can be read using only the index, without consulting columns in the data row.

    Differences in optimizer behavior for use of extended indexes can also be seen with SHOW STATUS:

    FLUSH TABLE t1;
    FLUSH STATUS;
    SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
    SHOW STATUS LIKE 'handler_read%'

    The preceding statements include FLUSH TABLES and FLUSH STATUS to flush the table cache and clear the status counters.

    Without index extensions, SHOW STATUS produces this result:

    +-----------------------+-------+
    | Variable_name         | Value |
    +-----------------------+-------+
    | Handler_read_first    | 0     |
    | Handler_read_key      | 1     |
    | Handler_read_last     | 0     |
    | Handler_read_next     | 5     |
    | Handler_read_prev     | 0     |
    | Handler_read_rnd      | 0     |
    | Handler_read_rnd_next | 0     |
    +-----------------------+-------+

    With index extensions, SHOW STATUS produces this result. The Handler_read_next value decreases from 5 to 1, indicating more efficient use of the index:

    +-----------------------+-------+
    | Variable_name         | Value |
    +-----------------------+-------+
    | Handler_read_first    | 0     |
    | Handler_read_key      | 1     |
    | Handler_read_last     | 0     |
    | Handler_read_next     | 1     |
    | Handler_read_prev     | 0     |
    | Handler_read_rnd      | 0     |
    | Handler_read_rnd_next | 0     |
    +-----------------------+-------+

    The use_index_extensions flag of the optimizer_switch system variable permits control over whether the optimizer takes the primary key columns into account when determining how to use an InnoDB table's secondary indexes. By default, use_index_extensions is enabled. To check whether disabling use of index extensions will improve performance, use this statement:

    SET optimizer_switch = 'use_index_extensions=off';

    Use of index extensions by the optimizer is subject to the usual limits on the number of key parts in an index (16) and the maximum key length (3072 bytes).

    8.3.10 Optimizer Use of Generated Column Indexes

    MySQL supports indexes on generated columns. For example:

    CREATE TABLE t1 (f1 INT, gc INT AS (f1 + 1) STORED, INDEX (gc));

    The generated column, gc, is defined as the expression f1 + 1. The column is also indexed and the optimizer can take that index into account during execution plan construction. In the following query, the WHERE clause refers to gc and the optimizer considers whether the index on that column yields a more efficient plan:

    SELECT * FROM t1 WHERE gc > 9;

    The optimizer can use indexes on generated columns to generate execution plans, even in the absence of direct references in queries to those columns by name. This occurs if the WHERE, ORDER BY, or GROUP BY clause refers to an expression that matches the definition of some indexed generated column. The following query does not refer directly to gc but does use an expression that matches the definition of gc:

    SELECT * FROM t1 WHERE f1 + 1 > 9;

    The optimizer recognizes that the expression f1 + 1 matches the definition of gc and that gc is indexed, so it considers that index during execution plan construction. You can see this using EXPLAIN:

    mysql> EXPLAIN SELECT * FROM t1 WHERE f1 + 1 > 9G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t1
       partitions: NULL
             type: range
    possible_keys: gc
              key: gc
          key_len: 5
              ref: NULL
             rows: 1
         filtered: 100.00
            Extra: Using index condition
    

    In effect, the optimizer has replaced the expression f1 + 1 with the name of the generated column that matches the expression. That is also apparent in the rewritten query available in the extended EXPLAIN information displayed by SHOW WARNINGS:

    mysql> SHOW WARNINGSG
    *************************** 1. row ***************************
      Level: Note
       Code: 1003
    Message: /* select#1 */ select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`gc`
             AS `gc` from `test`.`t1` where (`test`.`t1`.`gc` > 9)
    

    The following restrictions and conditions apply to the optimizer's use of generated column indexes:

    • For a query expression to match a generated column definition, the expression must be identical and it must have the same result type. For example, if the generated column expression is f1 + 1, the optimizer will not recognize a match if the query uses 1 + f1, or if f1 + 1 (an integer expression) is compared with a string.

    • The optimization applies to these operators: =, <, <=, >, >=, BETWEEN, and IN().

      For operators other than BETWEEN and IN(), either operand can be replaced by a matching generated column. For BETWEEN and IN(), only the first argument can be replaced by a matching generated column, and the other arguments must have the same result type. BETWEEN and IN() are not yet supported for comparisons involving JSON values.

    • The generated column must be defined as an expression that contains at least a function call or one of the operators mentioned in the preceding item. The expression cannot consist of a simple reference to another column. For example, gc INT AS (f1) STORED consists only of a column reference, so indexes on gc are not considered.

    • For comparisons of strings to indexed generated columns that compute a value from a JSON function that returns a quoted string, JSON_UNQUOTE() is needed in the column definition to remove the extra quotes from the function value. (For direct comparison of a string to the function result, the JSON comparator handles quote removal, but this does not occur for index lookups.) For example, instead of writing a column definition like this:

      doc_name TEXT AS (JSON_EXTRACT(jdoc, '$.name')) STORED

      Write it like this:

      doc_name TEXT AS (JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name'))) STORED

      With the latter definition, the optimizer can detect a match for both of these comparisons:

      ... WHERE JSON_EXTRACT(jdoc, '$.name') = 'some_string' ...
      ... WHERE JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name')) = 'some_string' ...
      

      Without JSON_UNQUOTE() in the column definition, the optimizer detects a match only for the first of those comparisons.

    • If the optimizer fails to choose the desired index, an index hint can be used to force the optimizer to make a different choice.

    8.3.11 Indexed Lookups from TIMESTAMP Columns

    Temporal values are stored in TIMESTAMP columns as UTC values, and values inserted into and retrieved from TIMESTAMP columns are converted between the session time zone and UTC. (This is the same type of conversion performed by the CONVERT_TZ() function. If the session time zone is UTC, there is effectively no time zone conversion.)

    Due to conventions for local time zone changes such as Daylight Saving Time (DST), conversions between UTC and non-UTC time zones are not one-to-one in both directions. UTC values that are distinct may not be distinct in another time zone. The following example shows distinct UTC values that become identical in a non-UTC time zone:

    mysql> CREATE TABLE tstable (ts TIMESTAMP);
    mysql> SET time_zone = 'UTC'; -- insert UTC values
    mysql> INSERT INTO tstable VALUES
           ('2018-10-28 00:30:00'),
           ('2018-10-28 01:30:00');
    mysql> SELECT ts FROM tstable;
    +---------------------+
    | ts                  |
    +---------------------+
    | 2018-10-28 00:30:00 |
    | 2018-10-28 01:30:00 |
    +---------------------+
    mysql> SET time_zone = 'MET'; -- retrieve non-UTC values
    mysql> SELECT ts FROM tstable;
    +---------------------+
    | ts                  |
    +---------------------+
    | 2018-10-28 02:30:00 |
    | 2018-10-28 02:30:00 |
    +---------------------+
    
    Note

    To use named time zones such as 'MET' or 'Europe/Amsterdam', the time zone tables must be properly set up. For instructions, see Section 5.1.12, “MySQL Server Time Zone Support”.

    You can see that the two distinct UTC values are the same when converted to the 'MET' time zone. This phenomenon can lead to different results for a given TIMESTAMP column query, depending on whether the optimizer uses an index to execute the query.

    Suppose that a query selects values from the table shown earlier using a WHERE clause to search the ts column for a single specific value such as a user-provided timestamp literal:

    SELECT ts FROM tstable
    WHERE ts = 'literal';
    

    Suppose further that the query executes under these conditions:

    • The session time zone is not UTC and has a DST shift. For example:

      SET time_zone = 'MET';
    • Unique UTC values stored in the TIMESTAMP column are not unique in the session time zone due to DST shifts. (The example shown earlier illustrates how this can occur.)

    • The query specifies a search value that is within the hour of entry into DST in the session time zone.

    Under those conditions, the comparison in the WHERE clause occurs in different ways for nonindexed and indexed lookups and leads to different results:

    • If there is no index or the optimizer cannot use it, comparisons occur in the session time zone. The optimizer performs a table scan in which it retrieves each ts column value, converts it from UTC to the session time zone, and compares it to the search value (also interpreted in the session time zone):

      mysql> SELECT ts FROM tstable
             WHERE ts = '2018-10-28 02:30:00';
      +---------------------+
      | ts                  |
      +---------------------+
      | 2018-10-28 02:30:00 |
      | 2018-10-28 02:30:00 |
      +---------------------+
      

      Because the stored ts values are converted to the session time zone, it is possible for the query to return two timestamp values that are distinct as UTC values but equal in the session time zone: One value that occurs before the DST shift when clocks are changed, and one value that was occurs after the DST shift.

    • If there is a usable index, comparisons occur in UTC. The optimizer performs an index scan, first converting the search value from the session time zone to UTC, then comparing the result to the UTC index entries:

      mysql> ALTER TABLE tstable ADD INDEX (ts);
      mysql> SELECT ts FROM tstable
             WHERE ts = '2018-10-28 02:30:00';
      +---------------------+
      | ts                  |
      +---------------------+
      | 2018-10-28 02:30:00 |
      +---------------------+
      

      In this case, the (converted) search value is matched only to index entries, and because the index entries for the distinct stored UTC values are also distinct, the search value can match only one of them.

    Due to different optimizer operation for nonindexed and indexed lookups, the query produces different results in each case. The result from the nonindexed lookup returns all values that match in the session time zone. The indexed lookup cannot do so:

    • It is performed within the storage engine, which knows only about UTC values.

    • For the two distinct session time zone values that map to the same UTC value, the indexed lookup matches only the corresponding UTC index entry and returns only a single row.

    In the preceding discussion, the data set stored in tstable happens to consist of distinct UTC values. In such cases, all index-using queries of the form shown match at most one index entry.

    If the index is not UNIQUE, it is possible for the table (and the index) to store multiple instances of a given UTC value. For example, the ts column might contain multiple instances of the UTC value '2018-10-28 00:30:00'. In this case, the index-using query would return each of them (converted to the MET value '2018-10-28 02:30:00' in the result set). It remains true that index-using queries match the converted search value to a single value in the UTC index entries, rather than matching multiple UTC values that convert to the search value in the session time zone.

    If it is important to return all ts values that match in the session time zone, the workaround is to suppress use of the index with an IGNORE INDEX hint:

    mysql> SELECT ts FROM tstable
           IGNORE INDEX (ts)
           WHERE ts = '2018-10-28 02:30:00';
    +---------------------+
    | ts                  |
    +---------------------+
    | 2018-10-28 02:30:00 |
    | 2018-10-28 02:30:00 |
    +---------------------+
    

    The same lack of one-to-one mapping for time zone conversions in both directions occurs in other contexts as well, such as conversions performed with the FROM_UNIXTIME() and UNIX_TIMESTAMP() functions. See Section 12.6, “Date and Time Functions”.

  • 相关阅读:
    将h.264视频流封装成flv格式文件(一.flv格式)(转载)<a href="http://blog.csdn.net/yeyumin89/article/details/7932368"></a>
    01背包,求最大值
    JNI vc6.0 dll
    jni java c++ 参数传递问题解决
    (转载)将h.264视频流封装成flv格式文件(二.开始动手)http://blog.csdn.net/yeyumin89/article/details/7932431
    达到最高效益的调度
    按照RFC3984协议实现H264视频流媒体
    语音识别 转载 待测试
    活动选择问题 动态规划算法(最大子集合或最大收益)
    整齐打印
  • 原文地址:https://www.cnblogs.com/jinzhenshui/p/12615208.html
Copyright © 2011-2022 走看看