zoukankan      html  css  js  c++  java
  • linux 内核假死循环导致的问题

    [445713.507663] Pid: 32739, comm: 117-IFileSender Tainted: G    B      ENX 3.0.101-0.47.90-default #1 ZTE Grantley/S1008
    [445713.507671] RIP: 0010:[<ffffffff810fb2cb>]  [<ffffffff810fb2cb>] find_get_pages_contig+0x5b/0x110
    [445713.507678] RSP: 0018:ffff883edfc2fab8  EFLAGS: 00000282
    [445713.507679] RAX: ffff881f68ef68d0 RBX: 0000000000000001 RCX: 0000000000004000
    [445713.507681] RDX: 0000000000000000 RSI: ffffea000a9164e8 RDI: ffffea000a9164e0
    [445713.507682] RBP: 0000000000000000 R08: 0000000000000006 R09: ffff883edfc2fa78
    [445713.507683] R10: 0000000000000005 R11: 000000000000003f R12: ffffffff8146ffee
    [445713.507684] R13: 000000000006cc46 R14: 00000000000007fe R15: ffffe8e001baa620
    [445713.507686] FS:  00007f06f82a9700(0000) GS:ffff88207fda0000(0000) knlGS:0000000000000000
    [445713.507688] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [445713.507689] CR2: 00007fd39233d810 CR3: 0000003ec0289000 CR4: 00000000001407e0
    [445713.507690] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [445713.507691] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [445713.507693] Process 117-IFileSender (pid: 32739, threadinfo ffff883edfc2e000, task ffff883edfc2c480)
    [445713.507694] Stack:
    [445713.507700]  0000001400000000 000000059715f240 ffff881f6a1bd2e8 0000000000003ffb
    [445713.507704]  ffff881dab7a5180 0000000000000000 0000000000000005 0000000000004c04
    [445713.507706]  ffff883edfc2fdf8 ffffffff811898f1 ffff881d69cf4c40 ffff881dab7a5180
    [445713.507709] Call Trace:
    [445713.507718]  [<ffffffff811898f1>] __generic_file_splice_read+0x111/0x4c0
    [445713.507722]  [<ffffffff81189ceb>] generic_file_splice_read+0x4b/0x90
    [445713.507767]  [<ffffffffa05fec5a>] xfs_file_splice_read+0x14a/0x200 [xfs]
    [445713.507832]  [<ffffffff81188c6c>] splice_direct_to_actor+0xcc/0x1d0
    [445713.507837]  [<ffffffffa070a9ab>] do_sendfile+0x1db/0x270 [witdriver]
    [445713.507844]  [<ffffffffa0716b3a>] my_sendfile+0x2da/0x530 [witdriver]
    [445713.507863]  [<ffffffff8146f5f2>] system_call_fastpath+0x16/0x1b
    [445713.507870]  [<00007f07326683c9>] 0x7f07326683c8

    获取页面超时,反汇编响应的代码,确定了是在进行page计数增加。

            if (!page_cache_get_speculative(page))
                goto repeat;

    根据代码逻辑,page的计数不对,本来page的计数为0是一个瞬间状态,结果却一直在循环。

    查看crash之前的打印,找到对应的日志为:

    kernel: [111747.127259] BUG: Bad page state in process 93-IFileSender  pfn:2375c8e
    kernel: [111747.127259] BUG: Bad page state in process 93-IFileSender  pfn:2375c8e
    kernel: [111747.127263] page:ffffea007c1c3f10 count:0 mapcount:0 mapping:ffff882b63e766e0 index:0x3ffc
    kernel: [111747.127263] page:ffffea007c1c3f10 count:0 mapcount:0 mapping:ffff882b63e766e0 index:0x3ffc
    kernel: [111747.127265] page flags: 0x60000000080010(uptodate|mappedtodisk)
    kernel: [111747.127265] page flags: 0x60000000080010(uptodate|mappedtodisk)
    kernel: [111747.127269] Pid: 26575, comm: 93-IFileSender Tainted: G           ENX 3.0.101-0.47.90-default #1
    kernel: [111747.127269] Pid: 26575, comm: 93-IFileSender Tainted: G           ENX 3.0.101-0.47.90-default #1
    kernel: [111747.127271] Call Trace:
    kernel: [111747.127271] Call Trace:
    kernel: [111747.127284]  [<ffffffff81004b95>] dump_trace+0x75/0x300
    kernel: [111747.127284]  [<ffffffff81004b95>] dump_trace+0x75/0x300
    kernel: [111747.127290]  [<ffffffff81464663>] dump_stack+0x69/0x6f
    kernel: [111747.127290]  [<ffffffff81464663>] dump_stack+0x69/0x6f
    kernel: [111747.127296]  [<ffffffff81100da1>] bad_page+0xb1/0x120
    kernel: [111747.127296]  [<ffffffff81100da1>] bad_page+0xb1/0x120
    kernel: [111747.127300]  [<ffffffff81101306>] free_pages_prepare+0xe6/0x110
    kernel: [111747.127300]  [<ffffffff81101306>] free_pages_prepare+0xe6/0x110
    kernel: [111747.127304]  [<ffffffff81104d69>] free_hot_cold_page+0x49/0x1f0
    kernel: [111747.127304]  [<ffffffff81104d69>] free_hot_cold_page+0x49/0x1f0
    kernel: [111747.127311]  [<ffffffffa0621224>] wit_free_shinfo_pages+0x34/0x50 [witdriver]
    kernel: [111747.127311]  [<ffffffffa0621224>] wit_free_shinfo_pages+0x34/0x50 [witdriver]
    kernel: [111747.127331]  [<ffffffffa0631306>] my_writev+0x2b6/0x6e0 [witdriver]
    kernel: [111747.127331]  [<ffffffffa0631306>] my_writev+0x2b6/0x6e0 [witdriver]
    kernel: [111747.127341]  [<ffffffff8146f5f2>] system_call_fastpath+0x16/0x1b
    kernel: [111747.127341]  [<ffffffff8146f5f2>] system_call_fastpath+0x16/0x1b
    kernel: [111747.127361]  [<00007f2018ce33c9>] 0x7f2018ce33c8
    kernel: [111747.127361]  [<00007f2018ce33c9>] 0x7f2018ce33c8

    从这个打印,知道了我们当时流程释放页的时候,page的计数为0了,此时的mapping指针还在,说明了不该释放的页给释放了。

    这种计数问题本来很头疼,偶然间看到page的index 为:

    kernel: [111747.127263] page:ffffea007c1c3f10 count:0 mapcount:0 mapping:ffff882b63e766e0 index:0x3ffc------------这个index

    查看了其他类似的打印发现这个index都靠近0x4000,而这个0x4000是我们一种业务文件的固定大小。

    再查找对应的业务日志,发现当时sendfile调用失败,然后使用writev方式发送文件,而writev的时候,又出现了调用方传送的参数不对的情况,导致我们内核模块在释放内存的时候,  

    将发送失败那几页page释放多执行了一次。从而使得page计数被减成0了。

    void put_page(struct page *page)
    {
        if (unlikely(PageCompound(page)))
            put_compound_page(page);
        else if (put_page_testzero(page))//原子计数减,然后判断是否为0,为0则继续释放
            __put_single_page(page);
    }

    这个page因为还在被inode的radix树管理,所以那边读文件找page的时候,发现计数为0,则循环等待,否则加1,这样的话,导致了一个死循环,因为这个计数本来不应该长时间为0,是被

    我们释放计数错了才导致为0的。

    void free_hot_cold_page(struct page *page, int cold)
    {
        struct zone *zone = page_zone(page);
        struct per_cpu_pages *pcp;
        unsigned long flags;
        int migratetype;
        int wasMlocked = __TestClearPageMlocked(page);
    
    #ifdef CONFIG_XEN
        WARN_ON(PageForeign(page) && wasMlocked);
    #endif
        if (!free_pages_prepare(page, 0))
            return;

    最终由于mapping指针不为NULL而在bad_page 的流程中打印警告。

    不过这种page会不会used after free呢?不会的,因为page的计数虽然为0,但是并没有放到管理单元中去,假设以后再也不访问这个page,则只会泄露。

    这种循环,会因为定时器中断而中断,最终系统hung住:

     kernel: [217186.608729]  [<ffffffff8102735a>] arch_trigger_all_cpu_backtrace+0x5a/0xa0
     kernel: [217186.608729]  [<ffffffff8102735a>] arch_trigger_all_cpu_backtrace+0x5a/0xa0---------------等回调,
     kernel: [217186.608734]  [<ffffffff810d0021>] check_cpu_stall+0xe1/0x140
     kernel: [217186.608734]  [<ffffffff810d0021>] check_cpu_stall+0xe1/0x140
     kernel: [217186.608738]  [<ffffffff810d00a9>] __rcu_pending+0x29/0x180
     kernel: [217186.608738]  [<ffffffff810d00a9>] __rcu_pending+0x29/0x180
     kernel: [217186.608742]  [<ffffffff810d025f>] rcu_check_callbacks+0x5f/0x110
     kernel: [217186.608742]  [<ffffffff810d025f>] rcu_check_callbacks+0x5f/0x110
     kernel: [217186.608748]  [<ffffffff81071dff>] update_process_times+0x3f/0x80
     kernel: [217186.608748]  [<ffffffff81071dff>] update_process_times+0x3f/0x80--------------------------定时中断,更新进程时间等
     kernel: [217186.608752]  [<ffffffff8109615b>] tick_sched_timer+0x5b/0xc0
     kernel: [217186.608752]  [<ffffffff8109615b>] tick_sched_timer+0x5b/0xc0
     kernel: [217186.608757]  [<ffffffff81088a4e>] __run_hrtimer+0xbe/0x1a0
     kernel: [217186.608757]  [<ffffffff81088a4e>] __run_hrtimer+0xbe/0x1a0
     kernel: [217186.608762]  [<ffffffff81088d6d>] hrtimer_interrupt+0xed/0x2a0
     kernel: [217186.608762]  [<ffffffff81088d6d>] hrtimer_interrupt+0xed/0x2a0
     kernel: [217186.608767]  [<ffffffff81026da3>] smp_apic_timer_interrupt+0x63/0xa0
     kernel: [217186.608767]  [<ffffffff81026da3>] smp_apic_timer_interrupt+0x63/0xa0
     kernel: [217186.608771]  [<ffffffff8146fff3>] apic_timer_interrupt+0x13/0x20
     kernel: [217186.608771]  [<ffffffff8146fff3>] apic_timer_interrupt+0x13/0x20--------------被中断
     kernel: [217186.608777]  [<ffffffff810fb2c3>] find_get_pages_contig+0x53/0x110
     kernel: [217186.608777]  [<ffffffff810fb2c3>] find_get_pages_contig+0x53/0x110
     kernel: [217186.608782]  [<ffffffff811898f1>] __generic_file_splice_read+0x111/0x4c0
     kernel: [217186.608782]  [<ffffffff811898f1>] __generic_file_splice_read+0x111/0x4c0
     kernel: [217186.608786]  [<ffffffff81189ceb>] generic_file_splice_read+0x4b/0x90
     kernel: [217186.608786]  [<ffffffff81189ceb>] generic_file_splice_read+0x4b/0x90
     kernel: [217186.608805]  [<ffffffffa0be6c5a>] xfs_file_splice_read+0x14a/0x200 [xfs]
     kernel: [217186.608805]  [<ffffffffa0be6c5a>] xfs_file_splice_read+0x14a/0x200 [xfs]
     kernel: [217186.608868]  [<ffffffff81188c6c>] splice_direct_to_actor+0xcc/0x1d0
     kernel: [217186.608868]  [<ffffffff81188c6c>] splice_direct_to_actor+0xcc/0x1d0
     kernel: [217186.608874]  [<ffffffffa0620a8b>] do_sendfile+0x1db/0x270 [witdriver]
     kernel: [217186.608874]  [<ffffffffa0620a8b>] do_sendfile+0x1db/0x270 [witdriver]
     kernel: [217186.608881]  [<ffffffffa0630df0>] my_sendfile+0x2e0/0x540 [witdriver]
     kernel: [217186.608881]  [<ffffffffa0630df0>] my_sendfile+0x2e0/0x540 [witdriver]
     kernel: [217186.608892]  [<ffffffff8146f5f2>] system_call_fastpath+0x16/0x1b
     kernel: [217186.608892]  [<ffffffff8146f5f2>] system_call_fastpath+0x16/0x1b
     kernel: [217186.608898]  [<00007f2018ce33c9>] 0x7f2018ce33c8
     kernel: [217186.608898]  [<00007f2018ce33c9>] 0x7f2018ce33c8
    水平有限,如果有错误,请帮忙提醒我。如果您觉得本文对您有帮助,可以点击下面的 推荐 支持一下我。版权所有,需要转发请带上本文源地址,博客一直在更新,欢迎 关注 。
  • 相关阅读:
    创建索引锁的表
    安装STS报错(三)
    安装STS报错(二)
    安装STS报错(一)
    Linux Top使用说明
    平实给力的写作指导入门手册——leo鉴书57
    Flex中配置FusionCharts
    Java Web项目中缺少Java EE 6 Libraries怎么添加
    Java Web项目结构
    MyEclipse中修改项目运行地址栏中项目名称
  • 原文地址:https://www.cnblogs.com/10087622blog/p/10011384.html
Copyright © 2011-2022 走看看