  • Notes on <High Performance MySQL> Ch4: Query Performance Optimization

     

    Slow Query Basics: Optimize Data Access

    Analyze a poorly performing query in two steps:

    -          Find out whether your application is retrieving more data than you need. That usually means it’s accessing too many rows, but it might also be accessing too many columns.

    -          Find out whether the MySQL Server is analyzing more rows than it needs.

    Are You Asking the Database for Data You Don’t Need?

    Here are some typical mistakes:

    -          Fetching more rows than needed

    -          Fetching all columns from a multitable join

    -          Fetching all columns
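
    As a hedged illustration of the multitable-join mistake, the sketch below assumes the sakila sample database (an assumption; the notes only mention a film table later on). The first query returns every column of all three tables; the second returns only the actor columns.

```sql
-- Wasteful: returns every column of actor, film_actor, and film.
SELECT *
FROM sakila.actor
    INNER JOIN sakila.film_actor USING (actor_id)
    INNER JOIN sakila.film USING (film_id)
WHERE sakila.film.title = 'ACADEMY DINOSAUR';

-- Leaner: return only the columns of the table the application needs.
SELECT sakila.actor.*
FROM sakila.actor
    INNER JOIN sakila.film_actor USING (actor_id)
    INNER JOIN sakila.film USING (film_id)
WHERE sakila.film.title = 'ACADEMY DINOSAUR';
```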

    Is MySQL Examining Too Much Data?

    In MySQL, the simplest query cost metrics are:

    -          Execution time

    -          Number of rows examined

    -          Number of rows returned

    All these metrics are logged in the slow query log, so looking at the slow query log is one of the best ways to find queries that examine too much data.

    -          Execution time

    -          Rows examined and rows returned

    -          Rows examined and access types

    The access method(s) appear in the type column in EXPLAIN’s output. The access types range from a full table scan to index scans, range scans, unique index lookups, and constants. Each of these is faster than the one before it, because it requires reading less data.

    In general, MySQL can apply a WHERE clause in three ways, from best to worst:

    • Apply the conditions to the index lookup operation to eliminate nonmatching rows. This happens at the storage engine layer.
    • Use a covering index (“Using index” in the Extra column) to avoid row accesses, and filter out nonmatching rows after retrieving each result from the index. This happens at the server layer, but it doesn’t require reading rows from the table.
    • Retrieve rows from the table, then filter out nonmatching rows (“Using where” in the Extra column). This happens at the server layer and requires the server to read rows from the table before it can filter them.
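
    A minimal sketch of how these cases can be inspected with EXPLAIN, again assuming the sakila sample schema (where film_actor has an index on film_id and film.description is not indexed). The plans the comments anticipate are expectations, not guarantees; the optimizer may choose differently on your data.

```sql
-- Condition on an indexed column: expected to be applied in the index lookup.
EXPLAIN SELECT * FROM sakila.film_actor WHERE film_id = 1;

-- Only indexed columns are selected: a candidate for "Using index" in Extra.
EXPLAIN SELECT actor_id FROM sakila.film_actor WHERE film_id = 1;

-- No usable index on description: rows are read from the table and then
-- filtered, so Extra is expected to show "Using where".
EXPLAIN SELECT film_id FROM sakila.film WHERE description LIKE '%drama%';
```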

    Ways to Restructure Queries

    Complex Queries Versus Many Queries

    MySQL was designed to handle connecting and disconnecting very efficiently and to respond to small and simple queries very quickly.

    Chopping Up a Query

    Join Decomposition

    Many high-performance web sites use join decomposition. You can decompose a join by running multiple single queries instead of a multitable join, and then performing the join in the application.

    -          Caching can be more efficient. Many applications cache “objects” that map directly to tables.

    -          For MyISAM tables, performing one query per table uses table locks more efficiently: the queries will lock the tables individually and relatively briefly, instead of locking them all for a longer time.

    -          Doing joins in the application makes it easier to scale the database by placing tables on different servers.

    -          The queries themselves can be more efficient.

    -          You can reduce redundant row accesses.

    -          To some extent, you can view this technique as manually implementing a hash join instead of the nested loops algorithm MySQL uses to execute a join.
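
    Join decomposition can be sketched as follows. The tag, tag_post, and post tables and the literal ids are hypothetical placeholders; in practice the application takes the ids returned by one query and feeds them into the next.

```sql
-- Original multitable join.
SELECT *
FROM tag
    JOIN tag_post ON tag_post.tag_id = tag.id
    JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = 'mysql';

-- Decomposed version: three single-table queries joined in the application.
SELECT * FROM tag WHERE tag = 'mysql';
-- 1234 stands in for the tag id returned by the previous query.
SELECT * FROM tag_post WHERE tag_id = 1234;
-- The IN() list stands in for the post ids returned by the previous query.
SELECT * FROM post WHERE post.id IN (123, 456, 567, 9098, 8904);
```

    Each single-table result can then be cached independently, which is the first benefit in the list above.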

    Summary: When Application Joins May Be More Efficient

    • You cache and reuse a lot of data from earlier queries
    • You use multiple MyISAM tables
    • You distribute data across multiple servers
    • You replace joins with IN() lists on large tables
    • A join refers to the same table multiple times

     

    Query Execution Basics

    The MySQL Client/Server Protocol

    The client sends a query to the server as a single packet of data. This is why the max_allowed_packet configuration variable is important if you have large queries.

    Query states

    Each MySQL connection, or thread, has a state that shows what it is doing at any given time. There are several ways to view these states, but the easiest is the SHOW FULL PROCESSLIST command (the values below appear in its Command and State columns):

    -          Sleep: The thread is waiting for a new query from the client

    -          Query: The thread is either executing the query or sending the result back to the client

    -          Locked: The thread is waiting for a table lock to be granted at the server level.

    -          Analyzing and statistics: The thread is checking storage engine statistics and optimizing the query.

    -          Copying to tmp table [on disk]

    -          Sorting result

    -          Sending data: This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client.
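
    A short sketch of how these states can be viewed. The second statement assumes a server version that exposes information_schema.PROCESSLIST, which makes it possible to filter out idle connections.

```sql
-- Show every connection with the full text of its current statement.
SHOW FULL PROCESSLIST;

-- Same information as a table, skipping connections that are merely idle.
SELECT id, user, command, state, time, info
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep';
```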

    The Query Cache

    The query cache lookup is a case-sensitive hash lookup on the query text.

    If MySQL does find a match in the query cache, it must check privileges before returning the cached query. This is possible without parsing the query, because MySQL stores table information with the cached query.

    The Query Optimization Process

    The parser and the preprocessor

    The query optimizer

    MySQL uses a cost-based optimizer. The unit of cost is a single random 4KB data page read. You can see how expensive the optimizer estimated a query to be by running the query and then inspecting the Last_query_cost session status variable.
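
    For example, a minimal sketch assuming the sakila sample database is installed (the reported value depends on your data and server version, so none is shown here):

```sql
-- Run the query whose estimated cost you want to see...
SELECT COUNT(*) FROM sakila.film_actor;

-- ...then read the optimizer's estimate for that last statement.
SHOW SESSION STATUS LIKE 'Last_query_cost';
```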

    The optimizer does not include the effects of any type of caching in its estimates – it assumes every read will result in a disk I/O operation.

    There are two basic types of optimizations, which we call static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. They are independent of values, such as the value of a constant in a WHERE clause, so they can be performed once and will always be valid, even when the query is reexecuted with different values. In contrast, dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many rows are in an index.

    IN() list comparisons

    In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
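
    The two logically equivalent forms look like this (a sketch against the sakila film table, which is an assumption):

```sql
-- IN() list: the values are sorted and searched with a binary search.
SELECT film_id, title
FROM sakila.film
WHERE film_id IN (1, 5, 9, 13);

-- Equivalent chain of OR clauses.
SELECT film_id, title
FROM sakila.film
WHERE film_id = 1 OR film_id = 5 OR film_id = 9 OR film_id = 13;
```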

    Table and index statistics

    MySQL’s join execution strategy

    MySQL considers every query a join – not just every query that matches rows from two tables, but every query, period (including subqueries, and even a SELECT against a single table).

    MySQL treats every join as a nested-loop join.

    MySQL executes UNION queries with temporary tables, and it rewrites all RIGHT OUTER JOIN queries to equivalent LEFT OUTER JOIN queries.

    MySQL doesn’t support FULL OUTER JOIN.
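
    As a sketch of the RIGHT-to-LEFT rewrite, the two queries below are logically equivalent; the second shows the shape the server reduces the first to. The sakila tables are an assumption.

```sql
-- Written as a RIGHT OUTER JOIN: keep every film, with or without actors.
SELECT f.film_id, fa.actor_id
FROM sakila.film_actor AS fa
    RIGHT OUTER JOIN sakila.film AS f ON f.film_id = fa.film_id;

-- Equivalent LEFT OUTER JOIN with the table order swapped.
SELECT f.film_id, fa.actor_id
FROM sakila.film AS f
    LEFT OUTER JOIN sakila.film_actor AS fa ON f.film_id = fa.film_id;
```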

    The execution plan

    If you execute EXPLAIN EXTENDED on a query, followed by SHOW WARNINGS, you’ll see the reconstructed query.
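
    A minimal sketch, assuming the sakila sample database. The EXTENDED keyword belongs to the MySQL versions the book covers; it was deprecated in MySQL 5.7 and removed in 8.0, where a plain EXPLAIN followed by SHOW WARNINGS serves the same purpose.

```sql
-- Ask the optimizer for its plan (drop EXTENDED on MySQL 8.0 and later).
EXPLAIN EXTENDED
SELECT film.title
FROM sakila.film
WHERE film_id IN (SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);

-- The note-level message contains the query as the optimizer rewrote it.
SHOW WARNINGS;
```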

    The join optimizer

    The most important part of the MySQL query optimizer is the join optimizer, which decides the best order of execution for multitable queries.

    STRAIGHT_JOIN
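
    To compare the join order the optimizer picks with the order written in the query, one can EXPLAIN the same join with and without STRAIGHT_JOIN. This is a sketch against the sakila sample tables (an assumption); which order is cheaper depends on the data.

```sql
-- Let the join optimizer choose the table order.
EXPLAIN SELECT film.film_id, film.title, actor.actor_id, actor.last_name
FROM sakila.film
    INNER JOIN sakila.film_actor USING (film_id)
    INNER JOIN sakila.actor USING (actor_id);

-- Force the tables to be joined in the order they are listed.
EXPLAIN SELECT STRAIGHT_JOIN film.film_id, film.title, actor.actor_id, actor.last_name
FROM sakila.film
    INNER JOIN sakila.film_actor USING (film_id)
    INNER JOIN sakila.actor USING (actor_id);
```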

    Sort optimizer

    MySQL can sort in memory or on disk, but it always calls this process a filesort, even if it doesn’t actually use a file.

    There are two filesort algorithms:

    -          Two passes (old)

    Reads row pointers and ORDER BY columns, sorts them, and then scans the sorted list and rereads the rows for output.

    -          One pass (new)

    Reads all the columns needed for the query, sorts them by the ORDER BY columns, and then scans the sorted list and outputs the specified columns.

    MySQL allocates a fixed-size record for each tuple it will sort. These records are large enough to hold the largest possible tuple, including the full length of each VARCHAR column. Also, if you’re using UTF-8, MySQL allocates 3 bytes for each character.

    The Query Execution Engine

    Returning Results to the Client

    Limitations of the MySQL Query Optimizer

    Correlated Subqueries

    When a correlated subquery is good

    UNION limitations

    MySQL sometimes can’t “push down” conditions from the outside of a UNION to the inside, where they could be used to limit results or enable additional optimizations.
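
    A hedged sketch of pushing a LIMIT down by hand, assuming the sakila actor and customer tables. In the first form every row of both tables goes into the temporary table before the outer LIMIT is applied; in the second, each branch contributes at most 20 rows.

```sql
-- LIMIT only on the outside: both tables are copied in full first.
(SELECT first_name, last_name FROM sakila.actor ORDER BY last_name)
UNION ALL
(SELECT first_name, last_name FROM sakila.customer ORDER BY last_name)
LIMIT 20;

-- LIMIT repeated inside each branch: far fewer rows reach the temporary table.
(SELECT first_name, last_name FROM sakila.actor ORDER BY last_name LIMIT 20)
UNION ALL
(SELECT first_name, last_name FROM sakila.customer ORDER BY last_name LIMIT 20)
LIMIT 20;
```

    If the final order matters, an ORDER BY is still needed on the outer query, because rows are not guaranteed to come out of the temporary table in any particular order.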

    Index merge optimizations

    Index merge algorithms let MySQL use more than one index per table in a query.

    There are 3 variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two.
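
    An OR condition over two separately indexed columns is a typical candidate for the union variant. The sketch below assumes the sakila film_actor table; whether the plan actually reports index_merge depends on the server version and statistics.

```sql
-- actor_id and film_id are covered by different indexes in sakila.film_actor,
-- so the optimizer may combine both (type: index_merge).
EXPLAIN SELECT film_id, actor_id
FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;
```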

    Equality propagation

    Parallel execution

    MySQL can’t execute a single query in parallel on many CPUs.

    Hash joins

    MySQL couldn’t do true hash joins when the book was written (they arrived later, in MySQL 8.0.18).

    Loose index scans

    MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index. MySQL’s index scans generally require a defined start point and a defined end point in the index, even if only a few noncontiguous rows in the middle are really desired for the query. MySQL will scan the entire range of rows within these end points.

    Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as queries that find maximum and minimum values in a grouped query.

    MIN() and MAX()
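
    The notes keep only the heading here. As a hedged sketch of the kind of rewrite this limitation calls for, assume the sakila actor table, where first_name is not indexed: the MIN() query scans the whole table, although a scan in primary-key order could stop at the first matching row.

```sql
-- No index on first_name: the server examines every row to find the minimum.
SELECT MIN(actor_id)
FROM sakila.actor
WHERE first_name = 'PENELOPE';

-- Manual rewrite: walk the primary key in order and stop at the first match.
SELECT actor_id
FROM sakila.actor USE INDEX (PRIMARY)
WHERE first_name = 'PENELOPE'
LIMIT 1;
```

    The rewrite obscures the intent of the query, so it is best kept for cases where the full scan is a measured problem.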


    SELECT and UPDATE on the same table

    MySQL doesn’t let you SELECT from a table while simultaneously running an UPDATE on it.
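
    A sketch of the restriction and a common workaround. The table tbl with columns type and cnt is hypothetical. Wrapping the SELECT in a derived table makes the server materialize it as a temporary table, so the UPDATE no longer reads the table it is modifying.

```sql
-- Rejected with error 1093: the subquery reads the table being updated.
UPDATE tbl AS outer_tbl
SET cnt = (
    SELECT COUNT(*) FROM tbl AS inner_tbl
    WHERE inner_tbl.type = outer_tbl.type
);

-- Workaround: join against a derived table, which is materialized first.
UPDATE tbl
    INNER JOIN (
        SELECT type, COUNT(*) AS cnt
        FROM tbl
        GROUP BY type
    ) AS der USING (type)
SET tbl.cnt = der.cnt;
```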

    Optimizing Specific Types of Queries

    Optimizing COUNT() Queries

    COUNT() counts values and rows. A value is a non-NULL expression (NULL is the absence of a value).

    When you want to know the number of rows in the result, you should always use COUNT(*). This communicates your intention clearly and avoids poor performance.
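
    The difference between counting rows and counting values can be seen in one query. The sketch assumes the sakila film table, whose original_language_id column is nullable.

```sql
SELECT
    COUNT(*) AS total_rows,                       -- counts rows
    COUNT(original_language_id) AS non_null_vals  -- counts non-NULL values only
FROM sakila.film;
```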


    Optimizing JOIN Queries

    -          Make sure there are indexes on the columns in the ON or USING clauses. In general, you need to add indexes only on the second table in the join order, unless they’re needed for some other reason.

    -          Try to ensure that any GROUP BY or ORDER BY expression refers only to columns from a single table, so MySQL can try to use an index for that operation.


    Optimizing Subqueries

    You should usually prefer a join where possible.
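
    For instance, an IN() subquery can often be written as a join. The sketch assumes the sakila schema, where (actor_id, film_id) is the primary key of film_actor, so the join does not produce duplicate films for a single actor.

```sql
-- Subquery form.
SELECT *
FROM sakila.film
WHERE film_id IN (
    SELECT film_id FROM sakila.film_actor WHERE actor_id = 1
);

-- Join form returning the same films.
SELECT film.*
FROM sakila.film
    INNER JOIN sakila.film_actor USING (film_id)
WHERE film_actor.actor_id = 1;
```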

    Optimizing GROUP BY and DISTINCT

    MySQL has two kinds of GROUP BY strategies when it can’t use an index: it can use a temporary table or a filesort to perform the grouping. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.

    MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the result in the desired direction by the clause’s columns.
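
    A minimal sketch of suppressing the automatic sort, assuming the sakila film_actor table. This applies to the MySQL versions the book covers; MySQL 8.0 no longer sorts GROUP BY results implicitly and no longer accepts ASC or DESC after GROUP BY.

```sql
-- Grouped result with the implicit sort explicitly disabled.
SELECT actor_id, COUNT(*) AS films
FROM sakila.film_actor
GROUP BY actor_id
ORDER BY NULL;
```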


    Optimizing LIMIT and OFFSET

    One simple technique to improve efficiency is to do the offset on a covering index, rather than the full rows. You can then join the result to the full row and retrieve the additional columns you need. This can be much more efficient.

    SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5;

    ==> (if the table is very large, this query is better written as follows:)

    SELECT film.film_id, film.description FROM film
        INNER JOIN (SELECT film_id FROM film ORDER BY title LIMIT 50, 5) AS lim USING(film_id);

    If you really need to optimize pagination systems, you should probably use precomputed summaries.

    Optimizing SQL_CALC_FOUND_ROWS

    Optimizing UNION

    MySQL always executes UNION queries by creating a temporary table, filling it with the UNION results, and then reading them out again, even when that isn’t really necessary. You might have to help the optimizer by manually “pushing down” WHERE, LIMIT, ORDER BY, and other conditions.

    Query Optimizer Hints

    -          HIGH_PRIORITY and LOW_PRIORITY

    These hints are effective on storage engines with table-level locking, but you should never need them on InnoDB or other engines with fine-grained locking and concurrency control. Be careful when using them on MyISAM, because they can disable concurrent inserts and greatly reduce performance.

    -          DELAYED

    -          STRAIGHT_JOIN

    -          SQL_SMALL_RESULT and SQL_BIG_RESULT

    SQL_SMALL_RESULT tells the optimizer that the result set will be small and can be put into indexed temporary tables to avoid sorting for the grouping, whereas SQL_BIG_RESULT indicates that the result will be large and that it will be better to use temporary tables on disk with sorting.

    -          SQL_BUFFER_RESULT

    This hint tells the optimizer to put the results into a temporary table and release table locks as soon as possible.

    -          SQL_CACHE and SQL_NO_CACHE

    -          SQL_CALC_FOUND_ROWS

    This hint tells MySQL to calculate the full result set when there’s a LIMIT clause, even though it returns only the LIMIT rows. You can retrieve the total number of rows it found via FOUND_ROWS().

    -          FOR UPDATE and LOCK IN SHARE MODE

    When using these hints with InnoDB, be aware that they may disable some optimizations, such as covering indexes.

    InnoDB can’t lock rows exclusively without accessing the primary key, which is where the row version information is stored.

    -          USE INDEX, IGNORE INDEX, and FORCE INDEX
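
    A few of the hints above in use, as a sketch against the sakila sample database. The index name idx_actor_last_name comes from the standard sakila schema and is an assumption about your installation. SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated in recent MySQL 8.0 releases.

```sql
-- SQL_CALC_FOUND_ROWS: fetch one page, then ask how many rows matched in total.
SELECT SQL_CALC_FOUND_ROWS film_id, title
FROM sakila.film
ORDER BY title
LIMIT 50, 5;
SELECT FOUND_ROWS();

-- FORCE INDEX: tell the optimizer a table scan is very expensive
-- compared with using the named index.
SELECT actor_id, last_name
FROM sakila.actor FORCE INDEX (idx_actor_last_name)
WHERE last_name = 'WOOD';

-- SQL_BIG_RESULT: suggest sorting rather than an indexed temporary table
-- for the grouping.
SELECT SQL_BIG_RESULT actor_id, COUNT(*) AS films
FROM sakila.film_actor
GROUP BY actor_id;

-- FOR UPDATE: lock the selected row until the transaction ends.
START TRANSACTION;
SELECT actor_id FROM sakila.actor WHERE actor_id = 1 FOR UPDATE;
COMMIT;
```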

    User-Defined Variables

    The WHERE clause is evaluated before the SELECT list, which is why two rows are returned.

    The query returns every row in the table, because the ORDER BY added a filesort and the WHERE is evaluated before the filesort.

    (The @rownum := @rownum + 1 in the SELECT list is evaluated last, so the values in the cnt column are in order.)
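
    The queries these remarks refer to did not survive in the notes. A hedged reconstruction, assuming the sakila actor table, might look like the following. Note that the evaluation order of user variables within a statement is not guaranteed, and newer MySQL releases deprecate assigning them inside expressions.

```sql
-- Assignment in the SELECT list, read in the WHERE clause: the WHERE sees the
-- value from before the current row's increment, so one extra row slips through.
SET @rownum := 0;
SELECT actor_id, @rownum := @rownum + 1 AS cnt
FROM sakila.actor
WHERE @rownum <= 1;

-- Adding ORDER BY on an unindexed column introduces a filesort; the WHERE is
-- evaluated before the sort, while @rownum is still 0 for every row.
SET @rownum := 0;
SELECT actor_id, @rownum := @rownum + 1 AS cnt
FROM sakila.actor
WHERE @rownum <= 1
ORDER BY first_name;
```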

    The solution to this problem is to assign and read in the same stage of query execution.
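
    One way to do that, sketched under the same assumptions (sakila actor table, user-variable behaviour of the MySQL versions the book covers), is to move the assignment into the WHERE clause so that the variable is written and tested in the same place:

```sql
SET @rownum := 0;
SELECT actor_id, @rownum AS rownum
FROM sakila.actor
WHERE (@rownum := @rownum + 1) <= 1;

-- Inspect the variable afterwards to see how many rows the WHERE was evaluated for.
SELECT @rownum;
```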

    Note that the @rownum is 7 after the query!

    (Although only one row is returned, @rownum := @rownum + 1 is evaluated for every row in the table, because MySQL doesn’t know when it can stop evaluating it.)

    Note that the cnt column is 7 this time because we added an ORDER BY clause: the SELECT list is evaluated after the sort, so @rownum already holds its final value.

  • Original article: https://www.cnblogs.com/fangwenyu/p/2581312.html