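Series: Linux memory management
Keywords: slub_debug, kmemleak, kasan, oob, Redzone, Padding.
Common memory access errors in Linux are:
- Out-of-bounds access
- Use after free (accessing memory that has already been freed)
- Double free
- Memory leak
- Stack overflow
Each tool has a different focus. This chapter covers three of them: slub_debug, kmemleak and kasan.
kmemleak focuses on finding memory leaks.
slub_debug and kasan overlap to some extent. Some slub_debug problems only surface when slabinfo is used to trigger a check, whereas kasan is faster and reports every problem on its own. Its drawback is that it needs a recent GCC (gcc 4.9.2 or gcc 5.0).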
1. Preparing the test environment
Update the kernel to v4.4, then build it:
git clone https://github.com/arnoldlu/linux.git -b running_kernel_4.4
export ARCH=arm64
export CROSS_COMPILE=aarch64-linux-gnu-
make defconfig
make Image -j4 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
2. slub_debug
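Keywords: Red Zone, Padding, Object Layout.
The Linux kernel relies heavily on the slab/slub allocator for small allocations, and slub_debug adds lightweight memory checking to it.
The errors it targets are:
- Use after free
- Out-of-bounds access
- Double free
Two articles on slub_debug: 《图解slub》 and 《SLUB DEBUG原理》.
2.1 Building a kernel with slub_debug support
First enable General setup -> Enable SLUB debugging support, then select Kernel hacking -> Memory Debugging -> SLUB debugging on by default.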
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
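2.3 Test environment: slabinfo, slub.ko
slub.ko simulates invalid memory accesses. Some errors are reported immediately; others only show up after running slabinfo -v.
In the tools/vm directory, run the following command to build the slabinfo executable. Put it into the _install directory so that it is packed into the zImage.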
make slabinfo CFLAGS=-static ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
Place the resulting slabinfo binary in sbin.
The three test modules are here: https://github.com/arnoldlu/linux/tree/running_kernel_4.4/test_code/slub_debug
Run make.sh in the test_code/slub_debug directory and copy slub.ko/slub2.ko/slub3.ko into data.
2.4 Running the tests
Start QEMU:
qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -smp 2 -m 2048 -kernel arch/arm64/boot/Image --append "rdinit=/linuxrc console=ttyAMA0 loglevel=8 slub_debug=UFPZ" -nographic
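F: run sanity checks when an object is freed.
Z: enable red zoning (Red Zone).
P: enable poisoning (Poison).
U: record the user of each slab object; when enabled, reports include the stack backtraces of the allocation and of the free.
With SLAB_STORE_USER enabled through slub_debug, the backtrace of the offending code is clearly visible.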
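To make the meaning of the slub_debug=UFPZ string concrete, here is a minimal user-space sketch that turns such an option string into a bit mask. The DBG_* names and the function parse_slub_debug() are hypothetical, invented for this example; the kernel has its own SLAB_* constants and accepts more letters than the four described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; not the kernel's SLAB_* values. */
enum {
    DBG_FREE_CHECKS = 1 << 0,   /* F: checks on free */
    DBG_RED_ZONE    = 1 << 1,   /* Z: red zoning */
    DBG_POISON      = 1 << 2,   /* P: poisoning */
    DBG_STORE_USER  = 1 << 3    /* U: record alloc/free backtraces */
};

/*
 * Translate an option string such as "UFPZ" into a bit mask.
 * Returns -1 if the string contains a letter this sketch does not know.
 */
int parse_slub_debug(const char *opts)
{
    int flags = 0;

    assert(opts != NULL);
    for (; *opts != '\0'; opts++) {
        switch (*opts) {
        case 'F': case 'f': flags |= DBG_FREE_CHECKS; break;
        case 'Z': case 'z': flags |= DBG_RED_ZONE;    break;
        case 'P': case 'p': flags |= DBG_POISON;      break;
        case 'U': case 'u': flags |= DBG_STORE_USER;  break;
        default:
            return -1;
        }
    }
    return flags;
}
```

Calling parse_slub_debug("UFPZ") sets all four bits, which corresponds to the command line used in the QEMU invocation above.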
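2.5 Test results
Out-of-bounds accesses show up as Redzone overwritten or Object padding overwritten.
A double free is reported as Object already free, and an access to freed memory as Poison overwritten.
2.5.1 Redzone overwritten
Run insmod data/slub.ko, then use slabinfo -v to see the result.
static char *buf;

static void create_slub_error(void)
{
    buf = kmalloc(32, GFP_KERNEL);
    if (buf) {
        /*
         * Only 32 bytes were requested, but the object comes from a
         * 64-byte cache, so 80 bytes are written to run past its end.
         * All 80 bytes starting at buf are still written successfully.
         */
        memset(buf, 0x55, 80);
    }
}
Although kmalloc requested a 32-byte buffer, the kernel served it from kmalloc-64. A memset of, say, 36 bytes therefore stays inside the object and is not reported; the write has to be longer than 64 bytes, which is why the test writes 80.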
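The rounding from the requested size to the cache size can be sketched as follows. This is a simplified model, not the kernel's lookup: it assumes power-of-two size classes only and takes the smallest cache size as a parameter. The values 64 (32-bit ARM) and 128 (arm64) are read off the cache names in the logs of this article; kmalloc_bucket() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a request up to the next power-of-two size class, never going
 * below min_size (which must itself be a power of two).
 * min_size models the smallest kmalloc cache on the platform.
 */
size_t kmalloc_bucket(size_t request, size_t min_size)
{
    size_t bucket = min_size;

    assert(request > 0 && min_size > 0);
    while (bucket < request)
        bucket <<= 1;
    return bucket;
}
```

With min_size 64, a 32-byte and a 36-byte request both map to 64, while an 80-byte write needs 128 and therefore overruns a 64-byte object.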
A slub debug report consists of four main parts:
=============================================================================
BUG kmalloc-64 (Tainted: G O ): Redzone overwritten--------1. Problem description: slab name (kmalloc-64) and error type (Redzone overwritten).
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0xeddb3640-0xeddb3643. First byte 0x55 instead of 0xcc--------1.1 Start and end address of the corruption, 4 bytes in total.
INFO: Allocated in 0x55555555 age=1766 cpu=0 pid=771--------1.2 Allocation backtrace of the object (the first entries read 0x55555555 because the 80-byte memset also ran over the start of the tracking data).
0x55555555
0xbf002014
do_one_initcall+0x90/0x1d8
do_init_module+0x60/0x38c
load_module+0x1bac/0x1e94
SyS_init_module+0x14c/0x15c
ret_fast_syscall+0x0/0x3c
INFO: Freed in do_one_initcall+0x78/0x1d8 age=1766 cpu=0 pid=771--------1.3 Free backtrace of the object
do_one_initcall+0x78/0x1d8
do_init_module+0x60/0x38c
load_module+0x1bac/0x1e94
SyS_init_module+0x14c/0x15c
ret_fast_syscall+0x0/0x3c
INFO: Slab 0xefdb5660 objects=16 used=14 fp=0xeddb3700 flags=0x0081--------1.4 Address of the slab and related information.
INFO: Object 0xeddb3600 @offset=1536 fp=0x55555555--------1.5 Start address of the affected object and related information.
Bytes b4 eddb35f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ--------2. Contents of the affected object. 2.1 The bytes immediately before the object.
Object eddb3600: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 UUUUUUUUUUUUUUUU--------2.2 Object contents, all 0x55.
Object eddb3610: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 UUUUUUUUUUUUUUUU
Object eddb3620: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 UUUUUUUUUUUUUUUU
Object eddb3630: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 UUUUUUUUUUUUUUUU
Redzone eddb3640: 55 55 55 55 UUUU--------2.3 Redzone contents; this is where the problem is.
Padding eddb36e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ--------2.4 Padding contents, added to keep objects aligned.
Padding eddb36f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 2 PID: 773 Comm: slabinfo Tainted: G B O 4.4.0+ #93--------3. Stack trace of the code that detected the problem; here it was found by slabinfo.
Hardware name: ARM-Versatile Express
[<c0016588>] (unwind_backtrace) from [<c0013070>] (show_stack+0x10/0x14)
[<c0013070>] (show_stack) from [<c0244130>] (dump_stack+0x78/0x88)
[<c0244130>] (dump_stack) from [<c00e1874>] (check_bytes_and_report+0xd0/0x10c)
[<c00e1874>] (check_bytes_and_report) from [<c00e1a14>] (check_object+0x164/0x234)
[<c00e1a14>] (check_object) from [<c00e29bc>] (validate_slab_slab+0x198/0x1bc)
[<c00e29bc>] (validate_slab_slab) from [<c00e578c>] (validate_store+0xac/0x190)
[<c00e578c>] (validate_store) from [<c0146780>] (kernfs_fop_write+0xb8/0x1b4)
[<c0146780>] (kernfs_fop_write) from [<c00ebfc4>] (__vfs_write+0x1c/0xd8)
[<c00ebfc4>] (__vfs_write) from [<c00ec808>] (vfs_write+0x90/0x170)
[<c00ec808>] (vfs_write) from [<c00ed008>] (SyS_write+0x3c/0x90)
[<c00ed008>] (SyS_write) from [<c000f3c0>] (ret_fast_syscall+0x0/0x3c)
FIX kmalloc-64: Restoring 0xeddb3640-0xeddb3643=0xcc--------4. How the problem was repaired: the 4 bytes are restored to 0xcc.
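The check behind this report can be illustrated with a small user-space model: scan the red zone for bytes that differ from the expected fill value, remember the corrupted range, then restore it. The value 0xcc is taken from the report above; check_and_restore_redzone() is a hypothetical name, and the kernel's own check is more involved than this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RED_ACTIVE 0xcc /* red zone fill value of an allocated object */

/*
 * Scan a red zone for bytes that differ from RED_ACTIVE.
 * On corruption, store the first and last bad offsets, restore that
 * range to RED_ACTIVE and return 1. Return 0 if the zone is intact.
 */
int check_and_restore_redzone(unsigned char *zone, size_t len,
                              size_t *first_bad, size_t *last_bad)
{
    size_t i;
    int bad = 0;

    assert(zone != NULL && first_bad != NULL && last_bad != NULL);
    for (i = 0; i < len; i++) {
        if (zone[i] != RED_ACTIVE) {
            if (!bad)
                *first_bad = i;
            *last_bad = i;
            bad = 1;
        }
    }
    if (bad)
        memset(zone + *first_bad, RED_ACTIVE, *last_bad - *first_bad + 1);
    return bad;
}
```

For a 64-byte object followed by a 4-byte red zone, an 80-byte memset corrupts offsets 0 to 3 of the zone, which matches the range 0xeddb3640-0xeddb3643 in the report.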
2.5.2 Object padding overwritten
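void create_slub_error(void)
{
    buf = kmalloc(32, GFP_KERNEL);
    if (buf) {
        buf[-1] = 0x55; /* out-of-bounds write to the left of the object */
        kfree(buf);
    }
}
Run insmod data/slub4.ko; the result is shown below.
Unlike the previous case, this write overruns to the left of the object and lands in the Padding area. Judging by the addresses in the report, the corrupted byte is the last padding byte of the neighbouring object in the slab, which would explain why the report describes an object allocated in call_usermodehelper_setup rather than buf itself.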
al: slub error test init
=============================================================================
BUG kmalloc-128 (Tainted: G O ): Object padding overwritten--------The write landed in the Padding area
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0xffff80007767e9ff-0xffff80007767e9ff. First byte 0x55 instead of 0x5a
INFO: Allocated in call_usermodehelper_setup+0x44/0xb8 age=1 cpu=1 pid=789
alloc_debug_processing+0x17c/0x188
___slab_alloc.constprop.30+0x3f8/0x440
__slab_alloc.isra.27.constprop.29+0x24/0x38
kmem_cache_alloc+0x1ec/0x260
call_usermodehelper_setup+0x44/0xb8
kobject_uevent_env+0x494/0x500
kobject_uevent+0x10/0x18
load_module+0x18cc/0x1d78
SyS_init_module+0x150/0x178
el0_svc_naked+0x24/0x28
INFO: Slab 0xffff7bffc2dd9f80 objects=16 used=9 fp=0xffff80007767ea00 flags=0x4081
INFO: Object 0xffff80007767e800 @offset=2048 fp=0xffff80007767ea00
Bytes b4 ffff80007767e7f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object ffff80007767e800: 00 01 00 00 00 00 00 00 08 e8 67 77 00 80 ff ff ..........gw....
Object ffff80007767e810: 08 e8 67 77 00 80 ff ff f8 83 0c 00 00 80 ff ff ..gw............
Object ffff80007767e820: 00 00 00 00 00 00 00 00 00 6e aa 00 00 80 ff ff .........n......
Object ffff80007767e830: 00 23 67 78 00 80 ff ff 18 23 67 78 00 80 ff ff .#gx.....#gx....
Object ffff80007767e840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff80007767e850: b8 8e 32 00 00 80 ff ff 00 23 67 78 00 80 ff ff ..2......#gx....
Object ffff80007767e860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff80007767e870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Redzone ffff80007767e880: cc cc cc cc cc cc cc cc ........
Padding ffff80007767e9c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff80007767e9d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff80007767e9e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff80007767e9f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 55 ZZZZZZZZZZZZZZZU
CPU: 0 PID: 790 Comm: mdev Tainted: G B O 4.4.0+ #116
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff800000089738>] dump_backtrace+0x0/0x108
[<ffff800000089854>] show_stack+0x14/0x20
[<ffff8000003253c4>] dump_stack+0x94/0xd0
[<ffff800000196460>] print_trailer+0x128/0x1b8
[<ffff800000196848>] check_bytes_and_report+0xd8/0x118
[<ffff800000196928>] check_object+0xa0/0x240
[<ffff8000001987e0>] free_debug_processing+0x128/0x380
[<ffff80000019a1cc>] __slab_free+0x344/0x4a0
[<ffff80000019ab94>] kfree+0x1ec/0x220
[<ffff8000000c8278>] umh_complete+0x58/0x68
[<ffff8000000c83d8>] call_usermodehelper_exec_async+0x150/0x170
[<ffff800000085c50>] ret_from_fork+0x10/0x40
FIX kmalloc-128: Restoring 0xffff80007767e9ff-0xffff80007767e9ff=0x5a--------The fix restores the affected byte to 0x5a.
2.5.3 Object already free
void create_slub_error(void)
{
    buf = kmalloc(32, GFP_KERNEL);
    if (buf) {
        memset(buf, 0x55, 32);
        kfree(buf);
        printk("al: Object already freed");
        kfree(buf); /* second free of the same object */
    }
}
The free path in the kernel is:
kfree
->slab_free
->__slab_free
->kmem_cache_debug
->free_debug_processing
->on_freelist
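The last step of this chain decides whether the object being freed is already on the free list. A toy user-space model of that idea is sketched below. The struct layout and the names toy_on_freelist() and toy_free() are made up for illustration and are far simpler than the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define NOBJ 4

/* Toy slab: every slot can be linked into a singly linked free list. */
struct toy_obj {
    struct toy_obj *next_free;
};

struct toy_slab {
    struct toy_obj objs[NOBJ];
    struct toy_obj *freelist;   /* first free object, NULL if none */
};

/* Return 1 if obj is already linked into the slab's free list. */
int toy_on_freelist(const struct toy_slab *s, const struct toy_obj *obj)
{
    const struct toy_obj *p;

    for (p = s->freelist; p != NULL; p = p->next_free)
        if (p == obj)
            return 1;
    return 0;
}

/*
 * Free with a debug check. A second free of the same object is refused
 * (return -1) and the free list is left untouched.
 */
int toy_free(struct toy_slab *s, struct toy_obj *obj)
{
    assert(s != NULL && obj != NULL);
    if (toy_on_freelist(s, obj))
        return -1;              /* "Object already free" */
    obj->next_free = s->freelist;
    s->freelist = obj;
    return 0;
}
```

Refusing the second free mirrors the final FIX line of the report in this section, which states that the object was not freed.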
Run insmod data/slub2.ko; the result is shown below.
A double free means that the same object is freed more than once in a row.
al: slub error test init
al: Object already freed
=============================================================================
BUG kmalloc-128 (Tainted: G B O ): Object already free--------On this 64-bit system the 32-byte kmalloc is served from kmalloc-128; the error type is Object already free, i.e. a double free.
-----------------------------------------------------------------------------
INFO: Allocated in create_slub_error+0x20/0x80 [slub2] age=0 cpu=1 pid=791--------Backtrace of the allocation
alloc_debug_processing+0x17c/0x188
___slab_alloc.constprop.30+0x3f8/0x440
__slab_alloc.isra.27.constprop.29+0x24/0x38
kmem_cache_alloc+0x1ec/0x260
create_slub_error+0x20/0x80 [slub2]
my_test_init+0x14/0x28 [slub2]
do_one_initcall+0x90/0x1a0
do_init_module+0x60/0x1cc
load_module+0x18dc/0x1d78
SyS_init_module+0x150/0x178
el0_svc_naked+0x24/0x28
INFO: Freed in create_slub_error+0x50/0x80 [slub2] age=0 cpu=1 pid=791--------Backtrace of the free
free_debug_processing+0x17c/0x380
__slab_free+0x344/0x4a0
kfree+0x1ec/0x220
create_slub_error+0x50/0x80 [slub2]
my_test_init+0x14/0x28 [slub2]
do_one_initcall+0x90/0x1a0
do_init_module+0x60/0x1cc
load_module+0x18dc/0x1d78
SyS_init_module+0x150/0x178
el0_svc_naked+0x24/0x28
INFO: Slab 0xffff7bffc2dda800 objects=16 used=7 fp=0xffff8000776a0800 flags=0x4081
INFO: Object 0xffff8000776a0800 @offset=2048 fp=0xffff8000776a0a00
Bytes b4 ffff8000776a07f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object ffff8000776a0800: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk--------Dump of the object contents, 128 bytes in total.
Object ffff8000776a0810: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8000776a0820: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8000776a0830: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8000776a0840: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8000776a0850: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8000776a0860: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8000776a0870: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone ffff8000776a0880: bb bb bb bb bb bb bb bb ........
Padding ffff8000776a09c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff8000776a09d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff8000776a09e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff8000776a09f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
CPU: 1 PID: 791 Comm: insmod Tainted: G B O 4.4.0+ #116--------The problem is detected during insmod itself, so the process that hits the check is insmod.
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff800000089738>] dump_backtrace+0x0/0x108
[<ffff800000089854>] show_stack+0x14/0x20
[<ffff8000003253c4>] dump_stack+0x94/0xd0
[<ffff800000196460>] print_trailer+0x128/0x1b8
[<ffff800000198954>] free_debug_processing+0x29c/0x380
[<ffff80000019a1cc>] __slab_free+0x344/0x4a0
[<ffff80000019ab94>] kfree+0x1ec/0x220
[<ffff7ffffc008060>] create_slub_error+0x60/0x80 [slub2]
[<ffff7ffffc00a014>] my_test_init+0x14/0x28 [slub2]
[<ffff800000082930>] do_one_initcall+0x90/0x1a0
[<ffff80000014647c>] do_init_module+0x60/0x1cc
[<ffff800000120704>] load_module+0x18dc/0x1d78
[<ffff800000120cf0>] SyS_init_module+0x150/0x178
[<ffff800000085cb0>] el0_svc_naked+0x24/0x28
FIX kmalloc-128: Object at 0xffff8000776a0800 not freed--------Outcome: the second free is rejected and the object is not freed again.
2.5.4 Poison overwritten
The test code for accessing freed memory is:
static void create_slub_error(void)
{
    buf = kmalloc(32, GFP_KERNEL);  /* at this point buf is filled with 0x6b */
    if (buf) {
        kfree(buf);
        printk("al: Access after free");
        memset(buf, 0x55, 32);      /* buf is already freed, yet the memset still takes effect and writes 0x55 */
    }
}
Run insmod data/slub3.ko, then use slabinfo -v to see the result.
=============================================================================
BUG kmalloc-128 (Tainted: G B O ): Poison overwritten--------The slab is kmalloc-128 and the error type is Poison overwritten, i.e. a write to freed memory.
-----------------------------------------------------------------------------
INFO: 0xffff800077692800-0xffff80007769281f. First byte 0x55 instead of 0x6b
INFO: Allocated in create_slub_error+0x28/0xf0 [slub3] age=1089 cpu=1 pid=793--------Backtrace of the allocation
alloc_debug_processing+0x17c/0x188
___slab_alloc.constprop.30+0x3f8/0x440
__slab_alloc.isra.27.constprop.29+0x24/0x38
kmem_cache_alloc+0x1ec/0x260
create_slub_error+0x28/0xf0 [slub3]
0xffff7ffffc00e014
do_one_initcall+0x90/0x1a0
do_init_module+0x60/0x1cc
load_module+0x18dc/0x1d78
SyS_init_module+0x150/0x178
el0_svc_naked+0x24/0x28
INFO: Freed in create_slub_error+0x80/0xf0 [slub3] age=1089 cpu=1 pid=793--------Backtrace of the free
free_debug_processing+0x17c/0x380
__slab_free+0x344/0x4a0
kfree+0x1ec/0x220
create_slub_error+0x80/0xf0 [slub3]
0xffff7ffffc00e014
do_one_initcall+0x90/0x1a0
do_init_module+0x60/0x1cc
load_module+0x18dc/0x1d78
SyS_init_module+0x150/0x178
el0_svc_naked+0x24/0x28
INFO: Slab 0xffff7bffc2dda480 objects=16 used=16 fp=0x (null) flags=0x4080
INFO: Object 0xffff800077692800 @offset=2048 fp=0xffff800077692400
Bytes b4 ffff8000776927f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object ffff800077692800: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 UUUUUUUUUUUUUUUU--------The first 32 bytes were still modified successfully.
Object ffff800077692810: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 UUUUUUUUUUUUUUUU
Object ffff800077692820: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff800077692830: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff800077692840: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff800077692850: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff800077692860: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff800077692870: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone ffff800077692880: bb bb bb bb bb bb bb bb ........
Padding ffff8000776929c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff8000776929d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff8000776929e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffff8000776929f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
CPU: 0 PID: 795 Comm: slabinfo Tainted: G B O 4.4.0+ #116
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff800000089738>] dump_backtrace+0x0/0x108
[<ffff800000089854>] show_stack+0x14/0x20
[<ffff8000003253c4>] dump_stack+0x94/0xd0
[<ffff800000196460>] print_trailer+0x128/0x1b8
[<ffff800000196848>] check_bytes_and_report+0xd8/0x118
[<ffff800000196a54>] check_object+0x1cc/0x240
[<ffff800000197920>] alloc_debug_processing+0x108/0x188
[<ffff800000199670>] ___slab_alloc.constprop.30+0x3f8/0x440
[<ffff8000001996dc>] __slab_alloc.isra.27.constprop.29+0x24/0x38
[<ffff8000001998dc>] kmem_cache_alloc+0x1ec/0x260
[<ffff8000001d42fc>] seq_open+0x34/0x90
[<ffff80000022059c>] kernfs_fop_open+0x194/0x370
[<ffff8000001afb04>] do_dentry_open+0x214/0x318
[<ffff8000001b0dc8>] vfs_open+0x58/0x68
[<ffff8000001bf338>] path_openat+0x460/0xdf0
[<ffff8000001c0ff0>] do_filp_open+0x60/0xe0
[<ffff8000001b117c>] do_sys_open+0x12c/0x218
[<ffff8000001fd53c>] compat_SyS_open+0x1c/0x28
[<ffff800000085cb0>] el0_svc_naked+0x24/0x28
FIX kmalloc-128: Restoring 0xffff800077692800-0xffff80007769281f=0x6b
FIX kmalloc-128: Marking all objects used
SLUB: kmalloc-128 210 slabs counted but counter=211
slabinfo (795) used greatest stack depth: 12976 bytes left
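The dump above shows the poison pattern of a freed object: every byte is 0x6b except the last one, which is 0xa5. A minimal user-space sketch of writing and verifying that pattern follows. The two byte values come from the dump (they match the kernel's POISON_FREE and POISON_END definitions); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b    /* fill value of a freed object */
#define POISON_END  0xa5    /* marker in the last byte of the object */

/* Write the poison pattern over a freed object of `size` bytes. */
void poison_object(unsigned char *obj, size_t size)
{
    assert(obj != NULL && size > 0);
    memset(obj, POISON_FREE, size - 1);
    obj[size - 1] = POISON_END;
}

/*
 * Return the offset of the first byte that no longer holds its poison
 * value, or -1 if the pattern is intact.
 */
long first_poison_violation(const unsigned char *obj, size_t size)
{
    size_t i;

    assert(obj != NULL && size > 0);
    for (i = 0; i + 1 < size; i++)
        if (obj[i] != POISON_FREE)
            return (long)i;
    return obj[size - 1] == POISON_END ? -1 : (long)(size - 1);
}
```

In the report above the check ran on the next allocation from the cache, which is why the write is only noticed later, in the context of slabinfo, and not at the memset itself.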
3. kmemleak
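kmemleak is a memory leak detector provided by the kernel. It starts a kernel thread that scans memory and prints the number of newly found unreferenced objects.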
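The scanning idea can be sketched in user space: keep a table of tracked allocations, walk a region of memory word by word, and mark every allocation that some word points into. Whatever stays unmarked is a suspected leak. This is a deliberately simplified, single-pass model with hypothetical names (struct tracked, scan_for_leaks); it is not kmemleak's implementation, which among other things also follows pointers stored inside the tracked objects themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tracked {
    uintptr_t start;    /* address returned by the allocator */
    size_t size;        /* size of the allocation in bytes */
    int referenced;     /* set when the scan finds a pointer into it */
};

/*
 * Scan `nwords` pointer-sized words of root memory. Every word whose
 * value falls inside a tracked block marks that block as referenced.
 * Returns the number of blocks left unreferenced (suspected leaks).
 */
size_t scan_for_leaks(struct tracked *objs, size_t nobjs,
                      const uintptr_t *roots, size_t nwords)
{
    size_t i, j, leaks = 0;

    assert(objs != NULL && roots != NULL);
    for (j = 0; j < nobjs; j++)
        objs[j].referenced = 0;
    for (i = 0; i < nwords; i++)
        for (j = 0; j < nobjs; j++)
            if (roots[i] >= objs[j].start &&
                roots[i] - objs[j].start < objs[j].size)
                objs[j].referenced = 1;
    for (j = 0; j < nobjs; j++)
        if (!objs[j].referenced)
            leaks++;
    return leaks;
}
```

Because any word that merely looks like a pointer keeps a block alive, this kind of scan reports suspected leaks rather than proven ones.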
3.1 Kernel options for kmemleak
To use kmemleak, enable the following kernel options.
Kernel hacking->Memory Debugging->Kernel memory leak detector:
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=400
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
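CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y--------Alternatively, disable this option; kmemleak=on is then not needed on the command line.
3.2 Setting up the test environment
With CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, kmemleak=on must also be added to the kernel command line.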
qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -smp 2 -m 2048 -kernel arch/arm64/boot/Image --append "rdinit=/linuxrc console=ttyAMA0 loglevel=8 kmemleak=on" -nographic
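The test code is:
static char *buf;

void create_kmemleak(void)
{
    buf = kmalloc(120, GFP_KERNEL);
    buf = vmalloc(4096);    /* overwrites buf, so the kmalloc'ed object is no longer referenced */
}
3.3 Running the test
Before testing, write scan to the kmemleak node to trigger a scan.
Then read the same kmemleak node to get the results.
- Trigger a kmemleak scan: echo scan > /sys/kernel/debug/kmemleak
- Load the faulty module: insmod data/kmemleak.ko
- Wait for the leak to be reported: kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
- View the kmemleak results: cat /sys/kernel/debug/kmemleak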
3.4 Analyzing the results
For each suspected leak, kmemleak prints the address and size of the object, the process that allocated it, a hex dump of its contents, and the backtrace of the allocation.
unreferenced object 0xede22dc0 (size 128):--------First suspected leak, 128 bytes
  comm "insmod", pid 765, jiffies 4294941257 (age 104.920s)--------Process information
  hex dump (first 32 bytes):--------Hex dump
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  backtrace:--------Backtrace
    [<bf002014>] 0xbf002014
    [<c000973c>] do_one_initcall+0x90/0x1d8
    [<c00a71f4>] do_init_module+0x60/0x38c
    [<c0086898>] load_module+0x1bac/0x1e94
    [<c0086ccc>] SyS_init_module+0x14c/0x15c
    [<c000f3c0>] ret_fast_syscall+0x0/0x3c
    [<ffffffff>] 0xffffffff
unreferenced object 0xf12ba000 (size 4096):
  comm "insmod", pid 765, jiffies 4294941257 (age 104.920s)
  hex dump (first 32 bytes):
    d8 21 00 00 02 18 00 00 e4 21 00 00 02 18 00 00 .!.......!......
    46 22 00 00 02 18 00 00 52 22 00 00 02 18 00 00 F"......R"......
  backtrace:
    [<c00d77c8>] vmalloc+0x2c/0x34
    [<bf002014>] 0xbf002014
    [<c000973c>] do_one_initcall+0x90/0x1d8
    [<c00a71f4>] do_init_module+0x60/0x38c
    [<c0086898>] load_module+0x1bac/0x1e94
    [<c0086ccc>] SyS_init_module+0x14c/0x15c
    [<c000f3c0>] ret_fast_syscall+0x0/0x3c
    [<ffffffff>] 0xffffffff
4. kasan
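Related reading: 《Kasan - Linux 内核的内存检测工具》 and 《KASAN实现原理》.
kasan does not yet support 32-bit ARM; it supports ARM64 and x86 (64-bit).
kasan is a dynamic memory error detector. It can catch out-of-bounds accesses, use after free, double free, and out-of-bounds accesses on the stack.
4.1 Enabling kasan
To use kasan, CONFIG_KASAN must be enabled.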
Kernel hacking->Memory debugging->KASan: runtime memory debugger
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_KASAN=y
# CONFIG_KASAN_OUTLINE is not set
CONFIG_KASAN_INLINE=y
CONFIG_TEST_KASAN=m
4.2 Code analysis
kasan_report
->kasan_report_error
->print_error_description
->print_address_description
->print_shadow_for_address
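4.3 Test cases and analysis
kasan ships with a test program, test_kasan.c. Build it as a module and load it into the kernel to simulate many kinds of memory errors.
kasan can detect out-of-bounds accesses, use after free and double free; double-free detection relies on slub_debug.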
insmod data/kasan.ko
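Out-of-bounds accesses are further divided into slab, stack and global-variable cases. Accesses to freed memory are reported as use-after-free, and double frees are caught through slub_debug.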
4.3.1 slab-out-of-bounds
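static noinline void __init kmalloc_oob_right(void)
{
    char *ptr;
    size_t size = 123;

    pr_info("out-of-bounds to right\n");
    ptr = kmalloc(size, GFP_KERNEL);
    if (!ptr) {
        pr_err("Allocation failed\n");
        return;
    }

    ptr[size] = 'x';
    kfree(ptr);
}
This error type covers out-of-bounds accesses to slab objects: to the left, to the right, and after the object has been grown or shrunk. Besides array assignment, it also covers memset, accesses through pointers and so on.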
al: kasan error test init
kasan test: kmalloc_oob_right out-of-bounds to right
==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa4/0xe0 [kasan] at addr ffff800066539c7b--------The error type is slab-out-of-bounds, raised in kmalloc_oob_right.
Write of size 1 by task insmod/788
=============================================================================
BUG kmalloc-128 (Tainted: G O ): kasan: bad access detected--------Invalid access to a slab object
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in kmalloc_oob_right+0x54/0xe0 [kasan] age=0 cpu=1 pid=788--------Allocation backtrace of the object, pointing at kmalloc_oob_right
alloc_debug_processing+0x17c/0x188
___slab_alloc.constprop.30+0x3f8/0x440
__slab_alloc.isra.27.constprop.29+0x24/0x38
kmem_cache_alloc+0x220/0x280
kmalloc_oob_right+0x54/0xe0 [kasan]
kmalloc_tests_init+0x18/0x70 [kasan]
do_one_initcall+0x11c/0x310
do_init_module+0x1cc/0x588
load_module+0x48cc/0x5dc0
SyS_init_module+0x1a8/0x1e0
el0_svc_naked+0x24/0x28
INFO: Freed in do_one_initcall+0x10c/0x310 age=0 cpu=1 pid=788
free_debug_processing+0x17c/0x368
__slab_free+0x344/0x4a0
kfree+0x21c/0x250
do_one_initcall+0x10c/0x310
do_init_module+0x1cc/0x588
load_module+0x48cc/0x5dc0
SyS_init_module+0x1a8/0x1e0
el0_svc_naked+0x24/0x28
INFO: Slab 0xffff7bffc2994e00 objects=16 used=2 fp=0xffff800066539e00 flags=0x4080
INFO: Object 0xffff800066539c00 @offset=7168 fp=0xffff800066538200
Bytes b4 ffff800066539bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................--------Memory dump
Object ffff800066539c00: 00 82 53 66 00 80 ff ff 74 65 73 74 73 5f 69 6e ..Sf....tests_in
Object ffff800066539c10: 69 74 20 5b 6b 61 73 61 6e 5d 00 00 00 00 00 00 it [kasan]......
Object ffff800066539c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 1 PID: 788 Comm: insmod Tainted: G B O 4.4.0+ #108--------Backtrace of the code that printed this report
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff80000008e938>] dump_backtrace+0x0/0x270
[<ffff80000008ebbc>] show_stack+0x14/0x20
[<ffff800000735bb0>] dump_stack+0x100/0x188
[<ffff800000318f60>] print_trailer+0xf8/0x160
[<ffff80000031ea8c>] object_err+0x3c/0x50
[<ffff8000003209a0>] kasan_report_error+0x240/0x558
[<ffff800000320e90>] __asan_report_store1_noabort+0x48/0x50
[<ffff7ffffc008324>] kmalloc_oob_right+0xa4/0xe0 [kasan]
[<ffff7ffffc009070>] kmalloc_tests_init+0x18/0x70 [kasan]
[<ffff80000008309c>] do_one_initcall+0x11c/0x310
[<ffff8000002648c4>] do_init_module+0x1cc/0x588
[<ffff800000206724>] load_module+0x48cc/0x5dc0
[<ffff800000207dc0>] SyS_init_module+0x1a8/0x1e0
[<ffff800000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
ffff800066539b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff800066539b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff800066539c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
                                                                ^
ffff800066539c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff800066539d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
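The "Memory state around the buggy address" rows are shadow bytes, each describing 8 bytes of real memory. For the 123-byte object above, fifteen shadow bytes are 00 (all 8 bytes valid) and the sixteenth is 03 (only the first 3 bytes valid), since 123 = 15 * 8 + 3; fc marks the surrounding red zone. The sketch below reproduces that encoding and the check for a 1-byte access. It is a user-space model with hypothetical function names, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE 8  /* one shadow byte describes 8 bytes of memory */

/*
 * Fill `nshadow` shadow bytes for an object of `size` bytes:
 * 0 = all 8 bytes valid, 1..7 = only the first N bytes valid,
 * 0xfc = red zone (the value seen in the report above).
 */
void shadow_fill(int8_t *shadow, size_t nshadow, size_t size)
{
    size_t i;

    assert(shadow != NULL);
    for (i = 0; i < nshadow; i++) {
        if ((i + 1) * SHADOW_SCALE <= size)
            shadow[i] = 0;
        else if (i * SHADOW_SCALE < size)
            shadow[i] = (int8_t)(size - i * SHADOW_SCALE);
        else
            shadow[i] = (int8_t)0xfc;
    }
}

/* Return 1 if a 1-byte access at `offset` from the object start is invalid. */
int byte_is_poisoned(const int8_t *shadow, size_t offset)
{
    int8_t v = shadow[offset / SHADOW_SCALE];

    if (v == 0)
        return 0;
    /* Negative: whole granule poisoned. Positive: first v bytes are valid. */
    return (int8_t)(offset % SHADOW_SCALE) >= v;
}
```

The write ptr[size] with size = 123 falls at offset 123, whose shadow byte is 03 and whose position inside the granule is 3, so the access is flagged.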
4.3.2 use-after-free
use-after-free means using memory after it has been freed.
static noinline void __init kmalloc_uaf(void)
{
    char *ptr;
    size_t size = 10;

    pr_info("use-after-free\n");
    ptr = kmalloc(size, GFP_KERNEL);
    if (!ptr) {
        pr_err("Allocation failed\n");
        return;
    }

    kfree(ptr);
    *(ptr + 8) = 'x';
}
The test result is:
kasan test: kmalloc_uaf use-after-free
==================================================================
BUG: KASAN: use-after-free in kmalloc_uaf+0xac/0xe0 [kasan] at addr ffff800066539e08
Write of size 1 by task insmod/788
=============================================================================
BUG kmalloc-128 (Tainted: G B O ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in kmalloc_uaf+0x54/0xe0 [kasan] age=0 cpu=1 pid=788
alloc_debug_processing+0x17c/0x188
___slab_alloc.constprop.30+0x3f8/0x440
__slab_alloc.isra.27.constprop.29+0x24/0x38
kmem_cache_alloc+0x220/0x280
kmalloc_uaf+0x54/0xe0 [kasan]
kmalloc_tests_init+0x48/0x70 [kasan]
do_one_initcall+0x11c/0x310
do_init_module+0x1cc/0x588
load_module+0x48cc/0x5dc0
SyS_init_module+0x1a8/0x1e0
el0_svc_naked+0x24/0x28
INFO: Freed in kmalloc_uaf+0x84/0xe0 [kasan] age=0 cpu=1 pid=788
free_debug_processing+0x17c/0x368
__slab_free+0x344/0x4a0
kfree+0x21c/0x250
kmalloc_uaf+0x84/0xe0 [kasan]
kmalloc_tests_init+0x48/0x70 [kasan]
do_one_initcall+0x11c/0x310
do_init_module+0x1cc/0x588
load_module+0x48cc/0x5dc0
SyS_init_module+0x1a8/0x1e0
el0_svc_naked+0x24/0x28
INFO: Slab 0xffff7bffc2994e00 objects=16 used=1 fp=0xffff800066539e00 flags=0x4080
INFO: Object 0xffff800066539e00 @offset=7680 fp=0xffff800066539800
Bytes b4 ffff800066539df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539e00: 00 98 53 66 00 80 ff ff 00 00 00 00 00 00 00 00 ..Sf............
Object ffff800066539e10: 00 9e 53 66 00 80 ff ff d0 51 12 00 00 80 ff ff ..Sf.....Q......
Object ffff800066539e20: 00 00 00 00 00 00 00 00 e0 14 6d 01 00 80 ff ff ..........m.....
Object ffff800066539e30: 00 69 a3 66 00 80 ff ff 18 69 a3 66 00 80 ff ff .i.f.....i.f....
Object ffff800066539e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539e50: 30 da 73 00 00 80 ff ff 00 69 a3 66 00 80 ff ff 0.s......i.f....
Object ffff800066539e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffff800066539e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffff800066539ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 1 PID: 788 Comm: insmod Tainted: G B O 4.4.0+ #108
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff80000008e938>] dump_backtrace+0x0/0x270
[<ffff80000008ebbc>] show_stack+0x14/0x20
[<ffff800000735bb0>] dump_stack+0x100/0x188
[<ffff800000318f60>] print_trailer+0xf8/0x160
[<ffff80000031ea8c>] object_err+0x3c/0x50
[<ffff8000003209a0>] kasan_report_error+0x240/0x558
[<ffff800000320e90>] __asan_report_store1_noabort+0x48/0x50
[<ffff7ffffc00874c>] kmalloc_uaf+0xac/0xe0 [kasan]
[<ffff7ffffc0090a0>] kmalloc_tests_init+0x48/0x70 [kasan]
[<ffff80000008309c>] do_one_initcall+0x11c/0x310
[<ffff8000002648c4>] do_init_module+0x1cc/0x588
[<ffff800000206724>] load_module+0x48cc/0x5dc0
[<ffff800000207dc0>] SyS_init_module+0x1a8/0x1e0
[<ffff800000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
ffff800066539d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff800066539d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff800066539e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
ffff800066539e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff800066539f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
4.3.3 stack-out-of-bounds
A stack out-of-bounds access is an overrun of an array that lives in a function's stack frame. It is common in real projects and hard to track down.
static noinline void __init kasan_stack_oob(void)
{
    char stack_array[10];
    volatile int i = 0;
    char *p = &stack_array[ARRAY_SIZE(stack_array) + i];

    pr_info("out-of-bounds on stack\n");
    *(volatile char *)p;
}
kasan test: kasan_stack_oob out-of-bounds on stack
==================================================================
BUG: KASAN: stack-out-of-bounds in kasan_stack_oob+0xa8/0xf0 [kasan] at addr ffff800066acb95a
Read of size 1 by task insmod/788
page:ffff7bffc29ab2c0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 1 PID: 788 Comm: insmod Tainted: G B O 4.4.0+ #108
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff80000008e938>] dump_backtrace+0x0/0x270
[<ffff80000008ebbc>] show_stack+0x14/0x20
[<ffff800000735bb0>] dump_stack+0x100/0x188
[<ffff800000320c90>] kasan_report_error+0x530/0x558
[<ffff800000320d00>] __asan_report_load1_noabort+0x48/0x50
[<ffff7ffffc0080a8>] kasan_stack_oob+0xa8/0xf0 [kasan]
[<ffff7ffffc0090b0>] kmalloc_tests_init+0x58/0x70 [kasan]
[<ffff80000008309c>] do_one_initcall+0x11c/0x310
[<ffff8000002648c4>] do_init_module+0x1cc/0x588
[<ffff800000206724>] load_module+0x48cc/0x5dc0
[<ffff800000207dc0>] SyS_init_module+0x1a8/0x1e0
[<ffff800000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
ffff800066acb800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff800066acb880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
>ffff800066acb900: f1 f1 04 f4 f4 f4 f2 f2 f2 f2 00 02 f4 f4 f3 f3
                                                    ^
ffff800066acb980: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
ffff800066acba00: f1 f1 00 00 00 00 00 00 00 00 f3 f3 f3 f3 00 00
==================================================================
4.3.4 global-out-of-bounds
static char global_array[10];

static noinline void __init kasan_global_oob(void)
{
    volatile int i = 3;
    char *p = &global_array[ARRAY_SIZE(global_array) + i];

    pr_info("out-of-bounds global variable\n");
    *(volatile char *)p;
}
The test result is:
kasan test: kasan_global_oob out-of-bounds global variable
==================================================================
BUG: KASAN: global-out-of-bounds in kasan_global_oob+0x9c/0xe8 [kasan] at addr ffff7ffffc001c8d
Read of size 1 by task insmod/788
Address belongs to variable global_array+0xd/0xffffffffffffe3f8 [kasan]
CPU: 1 PID: 788 Comm: insmod Tainted: G B O 4.4.0+ #108
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff80000008e938>] dump_backtrace+0x0/0x270
[<ffff80000008ebbc>] show_stack+0x14/0x20
[<ffff800000735bb0>] dump_stack+0x100/0x188
[<ffff800000320c90>] kasan_report_error+0x530/0x558
[<ffff800000320d00>] __asan_report_load1_noabort+0x48/0x50
[<ffff7ffffc00818c>] kasan_global_oob+0x9c/0xe8 [kasan]
[<ffff7ffffc0090b4>] kmalloc_tests_init+0x5c/0x70 [kasan]
[<ffff80000008309c>] do_one_initcall+0x11c/0x310
[<ffff8000002648c4>] do_init_module+0x1cc/0x588
[<ffff800000206724>] load_module+0x48cc/0x5dc0
[<ffff800000207dc0>] SyS_init_module+0x1a8/0x1e0
[<ffff800000086cb0>] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
ffff7ffffc001b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff7ffffc001c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff7ffffc001c80: 00 02 fa fa fa fa fa fa 00 00 00 00 00 00 00 00
                      ^
ffff7ffffc001d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff7ffffc001d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
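5. Summary
kmemleak's speciality is detecting memory leaks, which earns it a place of its own. Its scope is narrow, though: it deals with memory leaks only.
On platforms other than ARM64/x86, slub_debug is the only option for analyzing these memory problems. kasan is more efficient, but it requires a newer kernel and a newer GCC.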