zoukankan      html  css  js  c++  java
  • 关于innodb_flush_log_at_trx_commit、innodb_flush_method、innodb_log_block_size和fsync()、O_DIRECT、iops、云盘的关系与总结

     想着整理关于innodb_flush_log_at_trx_commit、innodb_flush_method、innodb_log_block_size和fsync()、O_DIRECT、iops的关系,纯属这两天处理http://www.cnblogs.com/zhjh256/p/6519032.html帖子的问题,顺便整理下,以备下次不用各处重新汇总。

    innodb_flush_log_at_trx_commit用于控制commit的时候log_buffer同步到磁盘的行为,跟oracle中commit_logging、commit_wait参数的作用大同小异吧。不管如何最后都会触发写入disk的行为。这就涉及到innodb_log_block_size和innodb_flush_method了,innodb_log_block_size用于设置innodb log的逻辑大小,默认512,SSD或者ext4下可以选择4096(主要是ext4 fs bs改成4096的原因),具体差别到底多大,没特别仔细测试,反正往上各种答案都有,不过大体差别在15%以内。innodb_flush_method用于InnoDB 控制刷新数据文件和日志文件的行为,根据是否SSD、RAID、SAN或者云盘,不同的配置是会影响IO吞吐量的。取值如下:

    • fdatasync(默认,oracle mysql默认为fsync: use fsync() to flush both the data and log files.
    • O_SYNC: use O_SYNC to open and flush the log files; use fsync() to flush the data files.
    • O_DIRECT: use O_DIRECT to open the data files and fsync() system call to flush both the data and log files.
    • O_DIRECT_NO_FSYNC: use O_DIRECT to open the data files but don’t use fsync() system call to flush both the data and log files. This option isn’t suitable for XFS file system.
    • ALL_O_DIRECT: use O_DIRECT to open both data and log files, and use fsync() to flush the data files but not the log files. This option is recommended when InnoDB log files are big (more than 8GB), otherwise there might be even a performance degradation. Note: When using this option on ext4 filesystem variable innodb_log_block_size should be set to 4096 (default log-block-size in ext4) in order to avoid the unaligned AIO/DIO warnings.

    How each setting affects performance depends on hardware configuration and workload. Benchmark your particular configuration to decide which setting to use, or whether to keep the default setting. Examine the Innodb_data_fsyncs status variable to see the overall number of fsync() calls for each setting. The mix of read and write operations in your workload can affect how a setting performs. For example, on a system with a hardware RAID controller and battery-backed write cache, O_DIRECT can help to avoid double buffering between the InnoDB buffer pool and the operating system file system cache. On some systems where InnoDB data and log files are located on a SAN, the default value or O_DSYNC might be faster for a read-heavy workload with mostly SELECT statements. Always test this parameter with hardware and workload that reflect your production environment. For general I/O tuning advice, see Section 8.5.8, “Optimizing InnoDB Disk I/O”.

    再看下fsync()/fdatasync(),它是linux提供的系统调用。

    fsync() transfers ("flushes") all modified in-core data of (i.e.,
           modified buffer cache pages for) the file referred to by the file
           descriptor fd to the disk device (or other permanent storage device)
           so that all changed information can be retrieved even after the
           system crashed or was rebooted.  This includes writing through or
           flushing a disk cache if present.  The call blocks until the device
           reports that the transfer has completed.  It also flushes metadata
           information associated with the file。
    没什么特别需要解释的,主要是刷新更改数据的缓存到磁盘或者磁盘驱动器自带的缓存。

    最后的重点是O_DIRECT和O_SYNC选项。这些选项主要是系统调用open()(http://man7.org/linux/man-pages/man2/open.2.html)时使用的选项。
    O_DIRECT:读写文件时尽量最小化的使用缓存,绕过操作系统的缓存,常用于具有自己缓存的应用比如数据库,所以特别适用于具有自身缓存的存储。
    O_SYNC:write()+fsync()
    所以,应该来说对于SAN/云盘来讲,可能还是ALL_O_DIRECT会合适些。

    iops,就非DSS系统来说,主要还是关注4k/8/16k为主的顺序写为主,随机io意义不大,因为绝大部分数据都cache在sga/buffer pool中,所以主要还是redo的写入占据绝大部分,checkpoint占据小部分random io。该值要依赖于innodb_flush_log_at_trx_commit、innodb_flush_method、innodb_log_block_size(因为mysql的innodb_log和binlog是分开的,所以还涉及到sync_binlog的值),不同的组合会有不同的tps,因此最差情况下可能是低于dd/iometer/orion测试出来iops的1/2。
    除此之外,还有一个无法控制的因素,即使innodb_flush_log_at_trx_commit=1,innodb_flush_method=O_DIRECT,也不意味着一次commit就是一次fsync,因为os可以决定多个同时commit合并到一个fsync,所以这一点而言,只能是极端情况下每commit/fsync,当然按照这么估计也是合理的。
    具体多少范围合理,一定程度上还是依赖于经验。
    最后,就要说到在云盘了,云盘技术上来说就是SAN的机制,所以对于写入很频繁、但是有没有采用SSD盘的数据库就很尴尬了,因为系统盘只有40G,剩余可用估计也就20G左右,此时如果innodb_flush_log_at_trx_commit=1,如何同时保证不丢失数据、满足性能要求,然后就呵呵考验DBA和架构师的水平了。

    PS: 网上找了张对比图,目前常用的linux iops测试工具有如下:

    从上面可以发现,其中少了fio,主要是fio会损坏文件系统,所以一般不推荐使用。

    iometer使用可参考:http://www.linuxidc.com/Linux/2010-09/28918.htm
  • 相关阅读:
    2013-05-25 14:04 zend studio10正式版如何汉化?
    网页前端优化之滚动延时加载图片
    appium框架之bootstrap
    软件测试面试题(一)
    Java中的经典算法之冒泡排序(Bubble Sort)
    Git知识总览(一) 从 git clone 和 git status 谈起
    Webdriver获取多个元素
    idea注册码到期,破解idea
    安装appium桌面版和命令行版
    Python+Appium学习篇之WebView处理
  • 原文地址:https://www.cnblogs.com/zhjh256/p/6522433.html
Copyright © 2011-2022 走看看