blktrace is a tracing tool for the block layer, which is one stage of the I/O path.
Within the block layer, the life cycle of an I/O is roughly:
● I/O enters block layer – it can be:
– Remapped onto another device (MD, DM)
– Split into 2 separate I/Os (alignment, size, ...)
– Added to the request queue
– Merged with a previous entry on the queue
All I/Os end up on a request queue at some point
● At some later time, the I/O is issued to a device driver, and submitted to a device
● Later, the I/O is completed by the device, and its driver
The trace points reported by blkparse, shown schematically:
Q------->G------------>I--------->M------------------->D----------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|--------------------scheduler time--------------------|---driver,adapter,storage time-|
|----------------------- await time in iostat output ----------------------------------|
Where:
Q2Q: time between requests sent to the block layer
Q2G: time from a block I/O being queued to the time it gets a request allocated for it
G2I: time from a request being allocated to the time it is inserted into the device's queue
Q2M: time from a block I/O being queued to the time it gets merged with an existing request
I2D: time from a request being inserted into the device's queue to the time it is actually issued to the device
M2D: time from a block I/O being merged with an existing request until the request is issued to the device
D2C: service time of the request by the device
Q2C: total time spent in the block layer for a request
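The definitions above are all differences between two event timestamps of the same I/O. As a minimal sketch (the function name and the timestamps are hypothetical and not taken from any real trace), the intervals can be derived like this:

```python
# Sketch: derive the X2Y intervals from per-event timestamps of ONE I/O.
# The event letters follow the diagram above; the values are made up.

PAIRS = [("Q", "G"), ("G", "I"), ("Q", "M"), ("I", "D"),
         ("M", "D"), ("D", "C"), ("Q", "C")]


def intervals(ts):
    """Return every X2Y interval whose two events are both present in ts."""
    out = {}
    for start, end in PAIRS:
        if start in ts and end in ts:
            out[f"{start}2{end}"] = ts[end] - ts[start]
    return out


# Hypothetical timestamps in seconds for an I/O that was not merged (no M event).
example = {"Q": 0.000, "G": 0.001, "I": 0.003, "D": 0.010, "C": 0.050}

for name, value in intervals(example).items():
    print(f"{name}: {value:.3f} s")
```

Q2Q is left out on purpose, because it relates two different requests rather than two events of the same one.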
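The following example briefly walks through a general way of analyzing I/O with the blktrace toolchain:
1. Use blktrace to capture the I/O information of a device: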
blktrace -w 120 -d /dev/nvme0n1
This creates a set of binary files in the current directory, named device.blktrace.cpu (one file per CPU).
2. Use blkparse to read the binary files generated by blktrace:
blkparse -i nvme0n1 -d blkparse.out
This command prints the parsed trace to the screen and also writes the binary data to the file blkparse.out.
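If the text that blkparse prints to the screen is saved, it can be post-processed with a small script. The sketch below assumes the default blkparse line layout (device, CPU, sequence number, timestamp, PID, action, RWBS, then "sector + blocks"); the sample line is invented for illustration, and lines that do not fit this layout (for example the summary at the end) are skipped.

```python
import re

# Assumed default blkparse layout:
#   maj,min cpu seq time pid action rwbs sector + nblocks [...]
LINE_RE = re.compile(
    r"^\s*(?P<major>\d+),(?P<minor>\d+)\s+"
    r"(?P<cpu>\d+)\s+(?P<seq>\d+)\s+"
    r"(?P<time>\d+\.\d+)\s+(?P<pid>\d+)\s+"
    r"(?P<action>\S+)\s+(?P<rwbs>\S+)\s+"
    r"(?P<sector>\d+) \+ (?P<nblocks>\d+)"
)


def parse_line(line):
    """Parse one per-event line; return None if the line has another shape."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "device": f"{d['major']},{d['minor']}",
        "cpu": int(d["cpu"]),
        "seq": int(d["seq"]),
        "time": float(d["time"]),
        "pid": int(d["pid"]),
        "action": d["action"],
        "rwbs": d["rwbs"],
        "sector": int(d["sector"]),
        "nblocks": int(d["nblocks"]),
    }


# Invented sample line, not real trace output.
sample = "259,6    3        1     0.000000000  4321  Q  WS 2048 + 8 [fio]"
print(parse_line(sample))
```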
3. Use btt to view and analyze various I/O-related data
3.1 Use btt to view the overall I/O picture:
btt -i blkparse.out
Explanation of several of the X2Y items in the btt output:
Q2I – time it takes to process an I/O prior to it being inserted or merged onto a request queue – Includes split, and remap time
I2D – time the I/O is “idle” on the request queue
D2C – time the I/O is “active” in the driver and on the device
Q2I + I2D + D2C = Q2C
Q2C: Total processing time of the I/O
In this example, the device service time D2C accounts for 91.95% of the total processing time Q2C.
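A percentage like this follows directly from Q2I + I2D + D2C = Q2C. The sketch below only illustrates the arithmetic: the function name is hypothetical and the input times are made up, so the printed shares are not the figures from the trace discussed here.

```python
def phase_shares(q2i, i2d, d2c):
    """Return each phase's share of Q2C in percent, using Q2I + I2D + D2C = Q2C."""
    q2c = q2i + i2d + d2c
    if q2c <= 0:
        raise ValueError("total Q2C time must be positive")
    return {
        "Q2I": 100.0 * q2i / q2c,
        "I2D": 100.0 * i2d / q2c,
        "D2C": 100.0 * d2c / q2c,
    }


# Made-up average times in seconds, for illustration only.
shares = phase_shares(q2i=0.00002, i2d=0.00008, d2c=0.0009)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of Q2C")
```

A large D2C share suggests that most of the latency comes from the driver and the device, while a large I2D share points at time spent waiting on the request queue.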
3.3 Use btt to view the latency of each request in detail:
btt -i blkparse.out -q q2c.lat
It generates the following files:
-rw-r--r-- 1 root root 876 Jun 13 18:14 sys_mbps_fp.dat
-rw-r--r-- 1 root root 451 Jun 13 18:14 sys_iops_fp.dat
-rw-r--r-- 1 root root 429815 Jun 13 18:14 q2c.lat_259,6_q2c.dat
-rw-r--r-- 1 root root 876 Jun 13 18:14 259,6_mbps_fp.dat
-rw-r--r-- 1 root root 451 Jun 13 18:14 259,6_iops_fp.dat
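sys_mbps_fp.dat holds the throughput of all devices in this run, sys_iops_fp.dat holds the IOPS of all devices, and q2c.lat_259,6_q2c.dat holds the Q2C latency of each request.
In the latency file, the first column is the time (in seconds) and the second column is the Q2C processing time of each request.
You can also use -l to get the D2C latency in the same way.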
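Since the latency file is just two whitespace-separated columns, it is easy to summarize. The following is a minimal sketch, assuming every data line has at least two numeric fields; the helper names and the nearest-rank percentile method are choices made for this illustration, not something btt prescribes.

```python
import math


def parse_latencies(lines):
    """Collect the second column (latency) from 'time latency' lines."""
    latencies = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        latencies.append(float(fields[1]))
    return latencies


def summarize(latencies):
    """Return count, mean, nearest-rank p50/p99 and max of the latencies."""
    if not latencies:
        raise ValueError("no latency samples")
    ordered = sorted(latencies)

    def percentile(p):
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]

    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
    }


# Made-up sample lines standing in for the contents of the .dat file.
demo = ["0.10 0.5", "0.20 1.5", "0.30 1.0", "0.40 2.0"]
print(summarize(parse_latencies(demo)))
```

For a real run, pass an open file instead of the demo list, for example `with open("q2c.lat_259,6_q2c.dat") as f: print(summarize(parse_latencies(f)))`.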
3.4 Use btt to view the I/O pattern
btt -i blkparse.out -B offset
It generates three files:
-rw-r--r-- 1 root root 819132 Jun 13 18:21 offset_259,6_w.dat
-rw-r--r-- 1 root root 108 Jun 13 18:21 offset_259,6_r.dat
-rw-r--r-- 1 root root 819240 Jun 13 18:21 offset_259,6_c.dat
prefix_device_r.dat
All read block numbers are output, first column is time (seconds), second is the block number, and the third column is the ending block number.
prefix_device_w.dat
All write block numbers are output, first column is time (seconds), second is the block number, and the third column is the ending block number.
prefix_device_c.dat
All block numbers (read and write) are output, first column is time (seconds), second is the block number, and the third column is the ending block number.
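One simple use of these three-column files is to estimate how sequential the workload is. The sketch below is only an illustration under stated assumptions: block numbers are assumed to be written as integers, and because the text does not say whether the ending block is inclusive or exclusive, the hypothetical `gap` parameter lets you choose (0 if the next sequential I/O starts at the previous ending block, 1 if it starts one block after it).

```python
def parse_blocks(lines):
    """Parse 'time start_block end_block' lines into tuples."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        records.append((float(fields[0]), int(fields[1]), int(fields[2])))
    return records


def sequential_fraction(records, gap=0):
    """Fraction of I/Os that start right where the previous I/O ended."""
    if len(records) < 2:
        return 0.0
    hits = sum(
        1
        for prev, cur in zip(records, records[1:])
        if cur[1] == prev[2] + gap
    )
    return hits / (len(records) - 1)


# Made-up sample lines: two sequential runs separated by one jump.
demo = ["0.0 0 8", "0.1 8 16", "0.2 100 108", "0.3 108 116"]
print(sequential_fraction(parse_blocks(demo)))
```

A value close to 1 suggests a mostly sequential pattern and a value close to 0 a mostly random one. Plotting the second column against the first gives a more detailed picture.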
4. Advanced features
The -f option of blkparse extracts specific fields from the trace data for output.
For example:
blkparse -i nvme0n1.blktrace.* -f "%5T.%9t, %p, %C, %a, %d, %N " -a complete -o output.txt
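It writes the timestamp (%T.%t), process ID (%p), process name (%C), action (%a), RWBS field (%d), and size in bytes (%N) to output.txt. Note that, according to the blkparse man page, %d is the RWBS (read/write/sync) field rather than an LBA, and %N is a byte count; the starting sector and the number of blocks are %S and %n.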
For the other format specifiers, see man blkparse.
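Output produced with a custom format string like the one above is comma-separated and therefore easy to aggregate. The sketch below assumes six fields per line in the order time, PID, process name, action, RWBS, bytes (following the man page meaning of %d and %N); the helper names and sample lines are invented for illustration, and any line that does not fit, such as the summary blkparse may append, is ignored.

```python
from collections import defaultdict


def parse_record(line):
    """Parse 'time, pid, comm, action, rwbs, bytes'; return None otherwise."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) < 6:
        return None
    try:
        return {
            "time": float(parts[0]),
            "pid": int(parts[1]),
            # A process name could itself contain commas, so rejoin the middle.
            "comm": ",".join(parts[2:-3]),
            "action": parts[-3],
            "rwbs": parts[-2],
            "bytes": int(parts[-1]),
        }
    except ValueError:
        return None


def bytes_per_process(lines):
    """Sum the byte counts of all parsed records, grouped by process name."""
    totals = defaultdict(int)
    for line in lines:
        rec = parse_record(line)
        if rec is not None:
            totals[rec["comm"]] += rec["bytes"]
    return dict(totals)


# Invented sample lines, not real trace output.
demo = [
    "    0.000100000, 4321, fio, C, WS, 4096 ",
    "    0.000200000, 4321, fio, C, WS, 8192 ",
    "CPU0 (nvme0n1):",
]
print(bytes_per_process(demo))
```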
For more usage, see man blktrace and man blkparse.
It is worth mentioning that blktrace has very little impact on application performance. In the author's words: "Seeing less than 2% hits to application performance in relatively stressful I/O situations."