关系型数据库为了满足ACID的特性,需要使用事务来对其进行保证,其中的D(持久性)需要调用fsync()函数将数据持久化到磁盘,就是俗称的“刷盘,这里只讨论MySQL最常用的存储引擎InnoDB以及MySQL5.6以及之后的版本。但是fsync()这种操作是比较昂贵的,一秒钟能进行几百次就不错了,为了提高数据库性能,就要尽量减少fsync()这种操作。MySQL是通过组提交(Group Commit)来减少写入日志时频繁fsync()的,当事务提交时,将其写入日志redo log,但不马上fsync()刷盘,而是等待多个事务写入日志dedo log之后,再进行一次fsync()把几个事务修改的数据一起刷到磁盘,这样多个fsync()压缩成了一个fsync(),可以大大减少fsync()这种性能消耗比较大的数据库随机访问。
我们知道,MySQL事务使用了两阶段提交(2PC)来保证事务的完整性。两阶段提交可以参考https://en.wikipedia.org/wiki/Two-phase_commit_protocol
两阶段提交流程
这里以innodb_flush_logs_at_trx_commit和sync_binlog均为默认值1的情况下说下两阶段的过程:
第一阶段——准备阶段会将事务的redo log从Server读入,置入InnoDB引擎为redo、undo日志开辟的内存缓冲空间,叫log buffer。并且fsync()写入磁盘上的文件log file(性能较好的顺序写入)
第二阶段——bin log协调阶段会将Binary Log从Server读入的事务,用write()写入文件系统缓存,然后调用fsync()将文件系统缓存的二进制日志刷到磁盘。
以往的MySQL版本中,两阶段至少需要进行3次fsync(),(分别是redo,bin log,以及事务所修改数据的刷盘,其中事务所修改数据的刷盘是随机写入,相对顺序写入很慢,这里简称第3次fsynch())从这两个阶段的描述可以看到,只进行了前2次fsync(),都是针对日志文件的数据库顺序写入,相对随机写入来说基本可以忽略性能影响,而比较昂贵的随机写入,我们通过控制innodb_flush_logs_at_trx_commit和sync_binlog两个参数,可以将几次事务的数据落盘(也就是第3次fsync())都放在一次磁盘IO中进行。
上面提到的两个参数innodb_flush_logs_at_trx_commit和sync_binlog,其默认都是1,用于控制事务提交后写入日志r(redo或者bin log)和刷盘的时机和频率,可以通过SHOW GLOBAL VARIABLES LIKE '%logs%'进行查看,MySQL的官方文档推荐,总和考虑一致性(C)和持久性(D),最好将两个参数设为1(也就是默认值),如下
For durability and consistency in a replication setup that uses InnoDB
with transactions:
-
If binary logging is enabled, set
sync_binlog=1
. -
Always set
innodb_flush_log_at_trx_commit=1
.
innodb_flush_logs_at_trx_commit的值可以取0,1,2。查阅官方文档:
The default setting of 1 is required for full ACID compliance. Logs are written and flushed to disk at each transaction commit.
值为1可以完全满足ACID标准。每提交一个事务,记录事务的redo log会马上被写入日志缓存log buffer,其内容也会马上被刷到磁盘上的日志文件log file
With a setting of 0, logs are written and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.
值为0,记录事务的redo log会被写入日志缓存log buffe,其内容也会被刷到磁盘上的log file,但不是马上,而是以每秒一次的频率进行。发生crash时,未刷盘到log file的事务可能会丢失,这个很好理解,为刷盘就是存在于log buffer,也就是内存中,crash重启自然就没了。
With a setting of 2, logs are written after each transaction commit and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.
值为2,记录事务的redo log会在事务提交后被写入日志缓存log buffer,而刷盘到log file则是每秒一次。发生crash时,未刷盘到log file的事务可能会丢失。
sync_binlog可以取0,1,n。同样查看官方文档:
sync_binlog=0: Disables synchronization of the binary log to disk by the MySQL server. Instead, the MySQL server relies on the operating system to flush the binary log to disk from time to time as it does for any other file. This setting provides the best performance, but in the event of a power failure or operating system crash, it is possible that the server has committed transactions that have not been synchronized to the binary log.
设为0时,由操作系统决定什么时候将bin log刷盘,这种模式性能最好,但是出现断电或者系统crash时,最有可能出现Server提交了还没来得及刷盘到bin log的log file上日志
sync_binlog=1: Enables synchronization of the binary log to disk before transactions are committed. This is the safest setting but can have a negative impact on performance due to the increased number of disk writes. In the event of a power failure or operating system crash, transactions that are missing from the binary log are only in a prepared state. This permits the automatic recovery routine to roll back the transactions, which guarantees that no transaction is lost from the binary log.
设为1时,每提交一个事务就进行一次bin log刷盘,性能会打折扣
sync_binlog=N, where N is a value other than 0 or 1: The binary log is synchronized to disk after Nbinary log commit groups have been collected. In the event of a power failure or operating system crash, it is possible that the server has committed transactions that have not been flushed to the binary log. This setting can have a negative impact on performance due to the increased number of disk writes. A higher value improves performance, but with an increased risk of data loss.
日志设为n时,提交n个事务才会进行一次bin log刷盘,使用者自行平衡性能需要和ACID的需要。