Notes on <High Performance MySQL> Ch4: Query Performance Optimization

Slow Query Basics: Optimize Data Access

Analyze a poorly performing query in two steps:

-          Find out whether your application is retrieving more data than you need. That usually means it’s accessing too many rows, but it might also be accessing too many columns.

-          Find out whether the MySQL Server is analyzing more rows than it needs.

Are You Asking the Database for Data You Don’t Need?

Here are some typical mistakes:

-          Fetching more rows than needed

-          Fetching all columns from a multitable join

-          Fetching all columns
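
A minimal sketch of the last two mistakes, assuming the sakila sample database that the book uses for its examples (the film title is only an illustrative value):

```sql
-- Fetches every column of all three joined tables.
SELECT *
FROM sakila.actor
  INNER JOIN sakila.film_actor USING (actor_id)
  INNER JOIN sakila.film USING (film_id)
WHERE sakila.film.title = 'ACADEMY DINOSAUR';

-- Fetches only the actor columns the application actually needs.
SELECT sakila.actor.*
FROM sakila.actor
  INNER JOIN sakila.film_actor USING (actor_id)
  INNER JOIN sakila.film USING (film_id)
WHERE sakila.film.title = 'ACADEMY DINOSAUR';
```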

Is MySQL Examining Too Much Data?

In MySQL, the simplest query cost metrics are:

-          Execution time

-          Number of rows examined

-          Number of rows returned

All these metrics are logged in the slow query log, so looking at the slow query log is one of the best ways to find queries that examine too much data.

-          Execution time

-          Rows examined and rows returned

-          Rows examined and access types

The access method(s) appear in the type column in EXPLAIN’s output. The access types range from a full table scan to index scans, range scans, unique index lookups, and constants. Each of these is faster than the one before it, because it requires reading less data.

In general, MySQL can apply a WHERE clause in three ways, from best to worst:
- Apply the conditions to the index lookup operation to eliminate nonmatching rows. This happens at the storage engine layer.
- Use a covering index (“Using index” in the Extra column) to avoid row accesses, and filter out nonmatching rows after retrieving each result from the index. This happens at the server layer, but it doesn’t require reading rows from the table.
- Retrieve rows from the table, then filter out nonmatching rows (“Using where” in the Extra column). This happens at the server layer and requires the server to read rows from the table before it can filter them.
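
The three cases can be explored with EXPLAIN. The queries below are a sketch against the sakila sample database (an assumption); the actual type and Extra values depend on the MySQL version, the indexes present, and the data.

```sql
-- Condition applied in the index lookup itself (an index on film_id is assumed).
EXPLAIN SELECT * FROM sakila.film_actor WHERE film_id = 1;

-- Only indexed columns are needed, so Extra may show "Using index".
EXPLAIN SELECT actor_id FROM sakila.film_actor WHERE actor_id > 100;

-- No index helps with this condition: rows are read from the table and
-- then filtered, so Extra may show "Using where".
EXPLAIN SELECT * FROM sakila.film WHERE description LIKE '%drama%';
```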
Ways to Restructure Queries

Complex Queries Versus Many Queries

MySQL was designed to handle connecting and disconnecting very efficiently and to respond to small and simple queries very quickly.

Chopping Up a Query
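
As a sketch of the idea, a large purge can be split into small batches. The messages table and its created column are hypothetical names used only for illustration.

```sql
-- One large statement: it may hold locks and fill logs for a long time.
DELETE FROM messages
WHERE created < DATE_SUB(NOW(), INTERVAL 3 MONTH);

-- Chopped up: repeat this statement (from application code or a scheduled
-- job) until it reports zero affected rows.
DELETE FROM messages
WHERE created < DATE_SUB(NOW(), INTERVAL 3 MONTH)
LIMIT 10000;
```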

Join Decomposition

Many high-performance web sites use join decomposition. You can decompose a join by running multiple single queries instead of a multitable join, and then performing the join in the application.

-          Caching can be more efficient. Many applications cache “objects” that map directly to tables.

-          For MyISAM tables, performing one query per table uses table locks more efficiently: the queries will lock the tables individually and relatively briefly, instead of locking them all for a longer time.

-          Doing joins in the application makes it easier to scale the database by placing tables on different servers.

-          The queries themselves can be more efficient.

-          You can reduce redundant row accesses.

-          To some extent, you can view this technique as manually implementing a hash join instead of the nested loops algorithm MySQL uses to execute a join.
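
A sketch of join decomposition. The tag, tag_post, and post tables and the literal ids are hypothetical; in practice the application reads the ids from one result set and passes them to the next query.

```sql
-- Original multitable join.
SELECT *
FROM tag
  JOIN tag_post ON tag_post.tag_id = tag.id
  JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = 'mysql';

-- Decomposed: three single-table queries, joined in the application.
SELECT * FROM tag WHERE tag = 'mysql';
SELECT * FROM tag_post WHERE tag_id = 1234;
SELECT * FROM post WHERE post.id IN (123, 456, 567, 9098, 8904);
```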

Summary: When Application Joins May Be More Efficient
- You cache and reuse a lot of data from earlier queries
- You use multiple MyISAM tables
- You distribute data across multiple servers
- You replace joins with IN() lists on large tables
- A join refers to the same table multiple times
Query Execution Basics

The MySQL Client/Server Protocol

The client sends a query to the server as a single packet of data. This is why the max_allowed_packet configuration variable is important if you have large queries.
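
The packet limit is controlled by the max_allowed_packet server variable. A quick way to inspect and raise it is sketched below; the 64 MB value is only an example, and changing the global value requires the appropriate privilege.

```sql
-- Show the current limit in bytes.
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise the limit to 64 MB (64 * 1024 * 1024 bytes) for new connections.
SET GLOBAL max_allowed_packet = 67108864;
```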

Query states

Each MySQL connection, or thread, has a state that shows what it is doing at any given time. There are several ways to view these states, but the easiest is to use the SHOW FULL PROCESSLIST command (Sleep and Query appear in the Command column; the more detailed states appear in the State column).

-          Sleep: The thread is waiting for a new query from the client

-          Query: The thread is either executing the query or sending the result back to the client

-          Locked: The thread is waiting for a table lock to be granted at the server level.

-          Analyzing and statistics: The thread is checking storage engine statistics and optimizing the query.

-          Copying to tmp table [on disk]

-          Sorting result

-          Sending data: This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client.
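
A sketch of two ways to look at these states. The second query assumes the information_schema.PROCESSLIST table is available on your server version.

```sql
-- One row per connection, with the full query text.
SHOW FULL PROCESSLIST;

-- The same data as a table, here limited to connections that are doing work.
SELECT id, user, command, state, time, info
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep'
ORDER BY time DESC;
```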

The Query Cache

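The query cache lookup is a case-sensitive hash lookup, so a query that differs from a cached one even by a single byte won’t match.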

If MySQL does find a match in the query cache, it must check privileges before returning the cached query. This is possible without parsing the query, because MySQL stores table information with the cached query.

The Query Optimization Process

The parser and the preprocessor

The query optimizer

MySQL uses a cost-based optimizer. The unit of cost is a single random 4K data page read. You can see how expensive the optimizer estimated a query to be by running the query, then inspecting the Last_query_cost session status variable.
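
For example (a sketch against the sakila sample database; the reported number depends on your data and statistics):

```sql
-- Run the query first...
SELECT COUNT(*) FROM sakila.film_actor;

-- ...then ask for the optimizer's cost estimate of that last query.
SHOW STATUS LIKE 'Last_query_cost';
```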

The optimizer does not include the effects of any type of caching in its estimates – it assumes every read will result in a disk I/O operation.

There are two basic types of optimizations, which we call static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. Static optimizations are independent of values, such as the value of a constant in a WHERE clause. They can be performed once and will always be valid, even when the query is reexecuted with different values. In contrast, dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many rows are in an index.

IN() list comparisons

In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
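
The two logically equivalent forms look like this (a sketch against the sakila sample database; the ids are arbitrary):

```sql
-- IN() list: per the notes above, the values are sorted and binary-searched.
SELECT * FROM sakila.film WHERE film_id IN (1, 5, 9, 13, 21);

-- Equivalent chain of OR clauses: each comparison is evaluated in turn.
SELECT * FROM sakila.film
WHERE film_id = 1 OR film_id = 5 OR film_id = 9 OR film_id = 13 OR film_id = 21;
```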

Table and index statistics

MySQL’s join execution strategy

MySQL considers every query a join – not just every query that matches rows from two tables, but every query, period (including subqueries, and even a SELECT against a single table).

MySQL treats every join as a nested-loop join.

MySQL executes UNION queries with temporary tables, and it rewrites all RIGHT OUTER JOIN queries to equivalent LEFT OUTER JOIN queries.

MySQL doesn’t support FULL OUTER JOIN.
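
A common workaround, not taken from these notes, is to emulate a full outer join with a UNION of a left join and a right join. The tables a and b and their columns are hypothetical, and a.id is assumed to be a non-NULL key.

```sql
-- All rows of a, with matching rows of b where they exist...
SELECT a.id AS a_id, b.id AS b_id
FROM a
  LEFT OUTER JOIN b ON a.id = b.a_id
UNION ALL
-- ...plus the rows of b that have no match in a.
SELECT a.id AS a_id, b.id AS b_id
FROM a
  RIGHT OUTER JOIN b ON a.id = b.a_id
WHERE a.id IS NULL;
```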

The execution plan

If you execute EXPLAIN EXTENDED on a query, followed by SHOW WARNINGS, you’ll see the reconstructed query.
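
A sketch against the sakila sample database. The EXTENDED keyword applies to the MySQL 5.x versions the book covers; it was later deprecated and removed in MySQL 8.0, where plain EXPLAIN followed by SHOW WARNINGS should give the same information.

```sql
EXPLAIN EXTENDED
SELECT * FROM sakila.film
WHERE film_id IN (SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);

-- The Message column of the warning holds the query as the optimizer rewrote it.
SHOW WARNINGS;
```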

The join optimizer

The most important part of the MySQL query optimizer is the join optimizer, which decides the best order of the execution for multitable queries.

STRAIGHT_JOIN
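
A sketch of overriding the join optimizer, assuming the sakila sample database. With STRAIGHT_JOIN the tables are joined in the order they are listed, which is useful mainly when you have verified that the optimizer's own order is worse.

```sql
-- Join order is forced to film, then film_actor, then actor.
SELECT STRAIGHT_JOIN film.film_id, film.title, actor.first_name, actor.last_name
FROM sakila.film
  INNER JOIN sakila.film_actor USING (film_id)
  INNER JOIN sakila.actor USING (actor_id);
```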

Sort optimizer

MySQL can sort in memory or on disk, but it always calls this process a filesort, even if it doesn’t actually use a file.

There are two filesort algorithms:

-          Two passes (old)

Reads row pointers and ORDER BY columns, sorts them, and then scans the sorted list and rereads the rows for output.

-          One pass (new)

Reads all the columns needed for the query, sorts them by the ORDER BY columns, and then scans the sorted list and outputs the specified columns.

MySQL allocates a fixed-size record for each tuple it will sort. These records are large enough to hold the largest possible tuple, including the full length of each VARCHAR column. Also, if you’re using UTF-8, MySQL allocates 3 bytes for each character.

The Query Execution Engine

Returning Results to the Client

Limitations of the MySQL Query Optimizer

Correlated Subqueries

When a correlated subquery is good

UNION limitations

MySQL sometimes can’t “push down” conditions from the outside of a UNION to the inside, where they could be used to limit results or enable additional optimizations.
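
A sketch of pushing a LIMIT down by hand, assuming the sakila sample database:

```sql
-- The LIMIT applies only after both full result sets are in the temporary table.
(SELECT first_name, last_name FROM sakila.actor ORDER BY last_name)
UNION ALL
(SELECT first_name, last_name FROM sakila.customer ORDER BY last_name)
LIMIT 20;

-- Pushed down: each branch contributes at most 20 rows to the temporary table.
-- The outer ORDER BY is still needed for a well-defined final order.
(SELECT first_name, last_name FROM sakila.actor ORDER BY last_name LIMIT 20)
UNION ALL
(SELECT first_name, last_name FROM sakila.customer ORDER BY last_name LIMIT 20)
ORDER BY last_name
LIMIT 20;
```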

Index merge optimizations

Index merge algorithms let MySQL use more than one index per table in a query.

There are 3 variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two.
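
A query shape that can trigger the union variation, sketched against the sakila sample database (film_actor is assumed to have separate indexes starting with actor_id and film_id). Whether the optimizer actually chooses an index merge depends on the version and statistics.

```sql
-- Extra may show something like "Using union(...)" when index merge is used.
EXPLAIN SELECT film_id, actor_id
FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;
```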

Equality propagation

Parallel execution

MySQL can’t execute a single query in parallel on many CPUs.

Hash joins

At the time of writing, MySQL can’t do true hash joins.

Loose index scans

MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index. MySQL’s index scans generally require a defined start point and a defined end point in the index, even if only a few noncontiguous rows in the middle are really desired for the query. MySQL will scan the entire range of rows within these end points.

Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as queries that find maximum and minimum values in a grouped query.
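
A grouped query of that kind, sketched against the sakila sample database (an index on (actor_id, film_id) is assumed):

```sql
-- When a loose index scan is used, Extra shows "Using index for group-by".
EXPLAIN SELECT actor_id, MAX(film_id)
FROM sakila.film_actor
GROUP BY actor_id;
```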

MIN() and MAX()

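MySQL doesn’t optimize certain MIN() and MAX() queries very well; rewriting them to read the first row in index order with LIMIT 1 can help.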
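
A sketch of such a rewrite, assuming the sakila sample database with no index on first_name. The rewritten form relies on the primary key being scanned in ascending order, so treat it as a workaround rather than a general rule.

```sql
-- Without an index on first_name, this can scan the whole table.
SELECT MIN(actor_id) FROM sakila.actor WHERE first_name = 'PENELOPE';

-- Rewritten: walk the primary key in order and stop at the first match.
SELECT actor_id
FROM sakila.actor USE INDEX (PRIMARY)
WHERE first_name = 'PENELOPE'
LIMIT 1;
```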

SELECT and UPDATE on the same table

MySQL doesn’t let you SELECT from a table while simultaneously running an UPDATE on it.
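
A sketch of the restriction and a common workaround using a derived table. The table tbl and its type and cnt columns are hypothetical.

```sql
-- Rejected: the subquery reads the table that is being updated.
UPDATE tbl AS outer_tbl
SET cnt = (
  SELECT COUNT(*) FROM tbl AS inner_tbl
  WHERE inner_tbl.type = outer_tbl.type
);

-- Workaround: the derived table is materialized as a temporary table first,
-- so the UPDATE no longer reads and writes the same table at once.
UPDATE tbl
  INNER JOIN (
    SELECT type, COUNT(*) AS cnt
    FROM tbl
    GROUP BY type
  ) AS der USING (type)
SET tbl.cnt = der.cnt;
```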

Optimizing Specific Types of Queries

Optimizing COUNT() Queries

COUNT() counts values and rows. A value is a non-NULL expression (NULL is the absence of a value).

When you want to know the number of rows in the result, you should always use COUNT(*). This communicates your intention clearly and avoids poor performance.
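
The difference between counting rows and counting values, sketched against the sakila sample database (address2 is assumed to be a nullable column):

```sql
SELECT COUNT(*)        AS total_rows,        -- counts rows
       COUNT(address2) AS non_null_address2  -- counts non-NULL values only
FROM sakila.address;
```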


Optimizing JOIN Queries

-          Make sure there are indexes on the columns in the ON or USING clauses. In general, you need to add indexes only on the second table in the join order, unless they’re needed for some other reason.

-          Try to ensure that any GROUP BY or ORDER BY expression refers only to columns from a single table, so MySQL can try to use an index for that operation.


Optimizing Subqueries

You should usually prefer a join where possible.
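
A sketch of rewriting an IN() subquery as a join, assuming the sakila sample database. The two forms return the same rows here because (actor_id, film_id) is unique in film_actor, so the join cannot duplicate films.

```sql
-- Subquery form.
SELECT * FROM sakila.film
WHERE film_id IN (
  SELECT film_id FROM sakila.film_actor WHERE actor_id = 1
);

-- Join form.
SELECT film.*
FROM sakila.film
  INNER JOIN sakila.film_actor USING (film_id)
WHERE actor_id = 1;
```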

Optimizing GROUP BY and DISTINCT

MySQL has two kinds of GROUP BY strategies when it can’t use an index: it can use a temporary table or a filesort to perform the grouping. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the result in the desired direction by the clause’s columns.
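
A sketch of skipping the automatic sort, assuming the sakila sample database. The implicit ordering described above applies to the MySQL 5.x versions the book covers.

```sql
-- Grouped counts with no ordering requested, so no sort is needed for output.
SELECT actor_id, COUNT(*) AS films
FROM sakila.film_actor
GROUP BY actor_id
ORDER BY NULL;
```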

Optimizing LIMIT and OFFSET

One simple technique to improve efficiency is to do the offset on a covering index, rather than the full rows. You can then join the result to the full row and retrieve the additional columns you need. This can be much more efficient.

SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5;

==> (if the table is very large, this query is better written as follows:)

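-- The derived table pages through the index only; the join then fetches
-- the full rows for just the five matching film_id values.
SELECT film.film_id, film.description
FROM film
INNER JOIN (SELECT film_id FROM film ORDER BY title LIMIT 50, 5) AS lim USING (film_id);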

If you really need to optimize pagination systems, you should probably use precomputed summaries.

Optimizing SQL_CALC_FOUND_ROWS
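
Typical usage, sketched against the sakila sample database. This reflects the MySQL versions the book covers; later versions deprecate the hint in favor of a separate COUNT(*) query.

```sql
-- Returns 10 rows, but the server still computes the full result set.
SELECT SQL_CALC_FOUND_ROWS film_id, title
FROM sakila.film
ORDER BY title
LIMIT 10;

-- Number of rows the previous query would have returned without the LIMIT.
SELECT FOUND_ROWS();
```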

Optimizing UNION

MySQL always executes UNION queries by creating a temporary table and filling it with the UNION results. You might have to help the optimizer by manually “pushing down” WHERE, LIMIT, ORDER BY, and other conditions. MySQL always places results into a temporary table and then reads them out again, even when it’s not really necessary.

Query Optimizer Hints

-          HIGH_PRIORITY and LOW_PRIORITY

These hints are effective on storage engines with table-level locking, but you should never need them on InnoDB or other engines with fine-grained locking and concurrency control. Be careful when using them on MyISAM, because they can disable concurrent inserts and greatly reduce performance.

-          DELAYED

-          STRAIGHT_JOIN

-          SQL_SMALL_RESULT and SQL_BIG_RESULT

SQL_SMALL_RESULT tells the optimizer that the result set will be small and can be put into indexed temporary tables to avoid sorting for the grouping, whereas SQL_BIG_RESULT indicates that the result will be large and that it will be better to use temporary tables on disk with sorting.

-          SQL_BUFFER_RESULT

This hint tells the optimizer to put the results into a temporary table and release table locks as soon as possible.

-          SQL_CACHE and SQL_NO_CACHE

-          SQL_CALC_FOUND_ROWS

This hint tells MySQL to calculate a full result set when there’s a LIMIT clause, even though it returns only LIMIT rows. You can retrieve the total number of rows it found via FOUND_ROWS().

-          FOR UPDATE and LOCK IN SHARE MODE

When using these hints with InnoDB, be aware that they may disable some optimizations, such as covering indexes.

InnoDB can’t lock rows exclusively without accessing the primary key, which is where the row version information is stored.

-          USE INDEX, IGNORE INDEX, and FORCE INDEX
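
A few of the hints above in use, sketched against the sakila sample database (idx_fk_film_id is assumed to be the name of the film_id index on film_actor):

```sql
-- Tell the optimizer the grouped result will be large.
SELECT SQL_BIG_RESULT actor_id, COUNT(*) AS films
FROM sakila.film_actor
GROUP BY actor_id;

-- Force a specific index for this table reference.
SELECT film_id, actor_id
FROM sakila.film_actor FORCE INDEX (idx_fk_film_id)
WHERE film_id = 1;

-- Lock the matching row until the transaction ends (InnoDB).
START TRANSACTION;
SELECT * FROM sakila.actor WHERE actor_id = 1 FOR UPDATE;
COMMIT;
```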

User-Defined Variables

The WHERE clause is evaluated before the SELECT list, which is why two rows are returned.

The query returns every row in the table, because the ORDER BY added a filesort and the WHERE is evaluated before the filesort.

(The @rownum := @rownum + 1 in the SELECT clause is executed last, so the values in the cnt column are in order.)

The solution to this problem is to assign and read in the same stage of query execution.

Note that @rownum is 7 after the query!

(Although only one row is returned, @rownum := @rownum + 1 is executed for every row in the table, because MySQL doesn’t know when to stop evaluating it.)

Note that the cnt column is 7 this time because we added an ORDER BY clause: the SELECT list is evaluated after the sort, so @rownum holds its final value.
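
A sketch of the problem and the fix, assuming the sakila sample database rather than the table used in these notes. Evaluation order for user-variable assignments is not guaranteed across versions, so treat the described behavior as what the book reports, not as a contract.

```sql
SET @rownum := 0;

-- Assigned in the SELECT list but read in the WHERE: the two happen at
-- different stages, so more rows come back than the condition suggests.
SELECT actor_id, @rownum := @rownum + 1 AS cnt
FROM sakila.actor
WHERE @rownum <= 1;

SET @rownum := 0;

-- Assigned and read in the same stage (the WHERE clause).
SELECT actor_id, @rownum AS rownum
FROM sakila.actor
WHERE (@rownum := @rownum + 1) <= 1;
```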
--------------------------------------
Regards,
FangwenYu

Original article: https://www.cnblogs.com/fangwenyu/p/2581312.html
