  • [Repost] Problem analysis: how to use crash to analyze a vmcore

    How to use crash to analyze a vmcore: basic approach, case 1

    Viewing the kernel log with dmesg

      [2493382.671020] systemd-shutdown[1]: Sending SIGKILL to PID 28975 (docker-containe).
      [2493382.671078] systemd-shutdown[1]: Sending SIGKILL to PID 29015 (systemd).
      [2493420.208723] EXT4-fs (nvme0n1p1): sb orphan head is 140906170
      [2493420.209198] sb_info orphan list:
      [2493420.209663] inode nvme0n1p1:140906170 at ffff88490edabfb8: mode 100666, nlink 0, next 149423507
      [2493420.210129] inode nvme0n1p1:149423507 at ffff8801b99391a8: mode 100666, nlink 0, next 17567381
      [2493420.210583] inode nvme0n1p1:17567381 at ffff8806d4a26998: mode 100744, nlink 0, next 17570510
      [2493420.211050] inode nvme0n1p1:17570510 at ffff886387f82ef8: mode 100644, nlink 0, next 17570503
      [2493420.211508] inode nvme0n1p1:17570503 at ffff886a1f15bfb8: mode 100644, nlink 0, next 241700498
      [2493420.211966] inode nvme0n1p1:241700498 at ffff8877481800e8: mode 100644, nlink 0, next 243138756
      [2493420.212431] inode nvme0n1p1:243138756 at ffff88761ad10518: mode 100644, nlink 0, next 241565954
      [2493420.212900] inode nvme0n1p1:241565954 at ffff8870d64bbfb8: mode 100755, nlink 0, next 241566333
      [2493420.213366] inode nvme0n1p1:241566333 at ffff88721ae74c48: mode 100644, nlink 0, next 241050093
      [2493420.213833] inode nvme0n1p1:241050093 at ffff887704958948: mode 100755, nlink 0, next 241567324
      [2493420.214545] ------------[ cut here ]------------
      [2493420.219336] kernel BUG at fs/ext4/super.c:879! <<<====== this names the source file and line of the BUG
      [2493420.223948] invalid opcode: 0000 [#1] SMP
      [2493420.228133] Modules linked in: kpatch_D751550(OE) kpatch_D631237(OE) unix_diag(E) af_packet_diag(E) netlink_diag(E) dccp_diag(E) dccp(E) tcp_diag(E) udp_diag(E) inet_diag(E) [last unloaded: aisqos_hotfixes]
      [2493420.246846] CPU: 58 PID: 1 Comm: systemd-shutdow Tainted: G W OE K 4.9.79-009.ali3000.alios7.x86_64 #1
      [2493420.257009] Hardware name: Inventec AliServer Thor01-2U /TB800G4-G1 , BIOS A1.20 03/06/2018
      [2493420.267339] task: ffff887e45918000 task.stack: ffffc90000014000
      [2493420.273425] RIP: 0010:[<ffffffffa031a8df>] [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4] <<<======= this gives the address of the faulting instruction
      [2493420.282593] RSP: 0018:ffffc90000017de8 EFLAGS: 00010206
      [2493420.288079] RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
      [2493420.295384] RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
      [2493420.302682] RBP: ffffc90000017e18 R08: 00000000000081a4 R09: 0000000000000000
      [2493420.309988] R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
      [2493420.317293] R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
      [2493420.324598] FS: 00007f3241ccf840(0000) GS:ffff887e78480000(0000) knlGS:0000000000000000
      [2493420.332850] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [2493420.338767] CR2: 00007f5e1372fbd0 CR3: 00000004daa52000 CR4: 00000000007606f0
      [2493420.346065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [2493420.353361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [2493420.360660] PKRU: 55555554
      [2493420.363536] Stack:
      [2493420.365721] 9cbae75a00000000 ffff887e43298800 ffffffffa034a5e0 ffff887e3818c7b8
      [2493420.373365] 0000000000000000 ffff887e45918bb0 ffffc90000017e38 ffffffff81244aaf
      [2493420.380991] 0000000000000083 ffff887e357b8680 ffffc90000017e58 ffffffff81244e37
      [2493420.388617] Call Trace:
      [2493420.391239] [<ffffffff81244aaf>] generic_shutdown_super+0x6f/0x100
      [2493420.397676] [<ffffffff81244e37>] kill_block_super+0x27/0x70
      [2493420.403508] [<ffffffff81244f73>] deactivate_locked_super+0x43/0x70
      [2493420.409945] [<ffffffff8124547a>] deactivate_super+0x5a/0x60
      [2493420.415770] [<ffffffff81264b2f>] cleanup_mnt+0x3f/0x90
      [2493420.421169] [<ffffffff81264bc2>] __cleanup_mnt+0x12/0x20
      [2493420.426733] [<ffffffff810a7b50>] task_work_run+0x80/0xa0
      [2493420.432306] [<ffffffff810032ba>] exit_to_usermode_loop+0xaa/0xb0
      [2493420.438572] [<ffffffff81003baa>] syscall_return_slowpath+0xaa/0xb0
      [2493420.445011] [<ffffffff8171a783>] entry_SYSCALL_64_fastpath+0xc3/0xc5
      [2493420.451623] Code: 60 04 00 00 48 8b 80 e0 00 00 <0f> 0b 49 c7 c7 88 ff 34 a0 49 8b
      [2493420.459829] RIP [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
      [2493420.466633] RSP <ffffc90000017de8>
      crash>

    The dmesg log gives two ways to locate the code that hit the BUG:

      1. [2493420.219336] kernel BUG at fs/ext4/super.c:879!

      2. [2493420.273425] RIP: 0010:[<ffffffffa031a8df>] [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
      Here 0x36f is the offset of the faulting instruction from the entry of ext4_put_super, and 0x3c0 is the total size of that function in bytes.

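    Using method 2 to find the exact crash location, first convert the offset to decimal, which is the form crash prints (ext4_put_super+879). This 879 is a byte offset; that it equals the source line number 879 is a coincidence:

      (gdb) p 0x36f
      $11 = 879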
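
    The symbol+offset/size notation from the RIP line can also be split mechanically. The following is a minimal Python sketch; parse_symbol_ref is a hypothetical helper name, not part of crash or gdb, and it assumes the lowercase-hex "symbol+0xOFF/0xSIZE" form seen in this log.

```python
import re

def parse_symbol_ref(ref):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size).

    offset: bytes from the function entry to the instruction.
    size:   total length of the function in bytes.
    """
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", ref)
    if m is None:
        raise ValueError(f"unrecognized symbol reference: {ref!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

symbol, offset, size = parse_symbol_ref("ext4_put_super+0x36f/0x3c0")
print(symbol, offset, size)  # ext4_put_super 879 960
```

    The decimal offset 879 is the same number gdb printed for p 0x36f, and 960 bytes is the length of the whole function.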

    Disassemble the function and look for that offset:

    crash> dis -l ext4_put_super
     

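    Viewing source code in crash

    crash can list source code itself, provided the debug symbols of the relevant module have been loaded, for example:

    Load the ext4 module:

      crash> mod -s ext4
      crash> mod <<---- lists all modules

    Line 879: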

      crash> l *ext4_put_super+0x36f
      0xffffffffa031a8df is in ext4_put_super (fs/ext4/super.c:879).
      874      * isn't empty. The on-disk one can be non-empty if we've
      875      * detected an error and taken the fs readonly, but the
      876      * in-memory list had better be clean by this point. */
      877     if (!list_empty(&sbi->s_orphan))
      878             dump_orphan_list(sb, sbi);
      879     J_ASSERT(list_empty(&sbi->s_orphan));
      880
      881     sync_blockdev(sb->s_bdev);
      882     invalidate_bdev(sb->s_bdev);
      883     if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {

    Only once the exact code is located can we analyze further why it crashed: for example, what were the values of this function's arguments (possibly some struct)?

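    Printing the stack with bt

    In the bt output, [exception RIP: ext4_put_super+879] again places the fault inside ext4_put_super. Note that +879 is the byte offset 0x36f printed in decimal, not a source line number; the source line found earlier just happens to be 879 as well.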

      crash> bt
      PID: 1 TASK: ffff887e45918000 CPU: 58 COMMAND: "systemd-shutdow"
      #0 [ffffc90000017a58] machine_kexec at ffffffff810603e8
      #1 [ffffc90000017ab8] __crash_kexec at ffffffff811211cd
      #2 [ffffc90000017b80] __crash_kexec at ffffffff811212a5
      #3 [ffffc90000017b98] crash_kexec at ffffffff811212eb
      #4 [ffffc90000017bb8] oops_end at ffffffff81030905
      #5 [ffffc90000017be0] die at ffffffff81030ddb
      #6 [ffffc90000017c10] do_trap at ffffffff8102df02
      #7 [ffffc90000017c60] do_error_trap at ffffffff8102e2d9
      #8 [ffffc90000017d20] do_invalid_op at ffffffff8102e830
      #9 [ffffc90000017d30] invalid_op at ffffffff8171b63e
      [exception RIP: ext4_put_super+879]
      RIP: ffffffffa031a8df RSP: ffffc90000017de8 RFLAGS: 00010206
      RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
      RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
      RBP: ffffc90000017e18 R8: 00000000000081a4 R9: 0000000000000000
      R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
      R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
      ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
      #10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
      #11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
      #12 [ffffc90000017e40] kill_block_super at ffffffff81244e37
      #13 [ffffc90000017e60] deactivate_locked_super at ffffffff81244f73
      #14 [ffffc90000017e80] deactivate_super at ffffffff8124547a
      #15 [ffffc90000017e98] cleanup_mnt at ffffffff81264b2f
      #16 [ffffc90000017eb0] __cleanup_mnt at ffffffff81264bc2
      #17 [ffffc90000017ec0] task_work_run at ffffffff810a7b50
      #18 [ffffc90000017f00] exit_to_usermode_loop at ffffffff810032ba
      #19 [ffffc90000017f30] syscall_return_slowpath at ffffffff81003baa
      #20 [ffffc90000017f50] entry_SYSCALL_64_fastpath at ffffffff8171a783
      RIP: 00007f3241195c47 RSP: 00007fffb3db5438 RFLAGS: 00000246
      RAX: 0000000000000000 RBX: 0000560b87fbd920 RCX: 00007f3241195c47
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b87fbdd10
      RBP: 0000560b87fbda00 R8: 0000000000000000 R9: 00007f32410e416d
      R10: 0000000000000021 R11: 0000000000000246 R12: 0000560b87fbdd10
      R13: 00007fffb3db5538 R14: 00007fffb3db5523 R15: 0000000000000000
      ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
      crash>
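
    As a cross-check, the start address of the function can be recovered from the exception RIP and the decimal offset that crash prints. This is a minimal Python sketch; function_start is a hypothetical helper name, and the two constants are copied from the bt output above.

```python
# Values copied from the "[exception RIP: ext4_put_super+879]" section of bt.
EXCEPTION_RIP = 0xffffffffa031a8df
OFFSET_DECIMAL = 879  # crash prints the offset in decimal; dmesg showed 0x36f

def function_start(rip, offset):
    """Return the entry address of the function containing rip."""
    return rip - offset

start = function_start(EXCEPTION_RIP, OFFSET_DECIMAL)
print(hex(start))  # 0xffffffffa031a570
```

    The result should match the first instruction address that dis -l ext4_put_super reports.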

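    Disassembling the caller and the callee

    Once the failing line of code is known, the next step is to examine the arguments and structs that were passed in.

    First, look at the prototype of ext4_put_super: static void ext4_put_super(struct super_block *sb). It takes a single argument, a pointer to struct super_block. So what is the value of the sb pointer? It must have been passed in by the calling function, generic_shutdown_super.

    The key question is therefore: when generic_shutdown_super calls ext4_put_super (the call that returns to ffffffff81244aaf), what pointer value does it pass?

    First, disassemble generic_shutdown_super and find the address ffffffff81244aaf: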

      crash> dis -l generic_shutdown_super
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 436
      0xffffffff81244aa0 <generic_shutdown_super+96>: mov 0x30(%r12),%rax
      0xffffffff81244aa5 <generic_shutdown_super+101>: test %rax,%rax
      0xffffffff81244aa8 <generic_shutdown_super+104>: je 0xffffffff81244aaf <generic_shutdown_super+111>
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 437
      0xffffffff81244aaa <generic_shutdown_super+106>: mov %rbx,%rdi <=== rdi (the first argument) is copied from rbx, so both hold the same value
      0xffffffff81244aad <generic_shutdown_super+109>: callq *%rax <=== the next function is called here
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/include/linux/compiler.h: 243
      0xffffffff81244aaf <generic_shutdown_super+111>: mov 0x608(%rbx),%rax
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 439
      0xffffffff81244ab6 <generic_shutdown_super+118>: lea 0x608(%rbx),%rdx
      0xffffffff81244abd <generic_shutdown_super+125>: cmp %rax,%rdx
      0xffffffff81244ac0 <generic_shutdown_super+128>: jne 0xffffffff81244b1f <generic_shutdown_super+223>
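
    The address shown in the backtrace is a return address, so the call itself is the instruction immediately before it. A minimal Python sketch of that lookup follows; parse_dis and instruction_before are hypothetical helper names, and the input is a few lines copied from the disassembly above with the annotations removed.

```python
DISASSEMBLY = """\
0xffffffff81244aa8 <generic_shutdown_super+104>: je 0xffffffff81244aaf <generic_shutdown_super+111>
0xffffffff81244aaa <generic_shutdown_super+106>: mov %rbx,%rdi
0xffffffff81244aad <generic_shutdown_super+109>: callq *%rax
0xffffffff81244aaf <generic_shutdown_super+111>: mov 0x608(%rbx),%rax
"""

def parse_dis(text):
    """Turn 'ADDR <sym+off>: insn' lines into (address, instruction) pairs."""
    rows = []
    for line in text.splitlines():
        addr, _, rest = line.partition(" ")
        insn = rest.split(">: ", 1)[1]
        rows.append((int(addr, 16), insn))
    return rows

def instruction_before(rows, return_address):
    """Find the instruction that directly precedes return_address."""
    for (addr, insn), (next_addr, _) in zip(rows, rows[1:]):
        if next_addr == return_address:
            return addr, insn
    raise LookupError(hex(return_address))

call_addr, call_insn = instruction_before(parse_dis(DISASSEMBLY), 0xffffffff81244aaf)
print(hex(call_addr), call_insn)  # 0xffffffff81244aad callq *%rax
```

    The instruction just before that call, mov %rbx,%rdi, is what places the argument into rdi.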

    Next, disassemble ext4_put_super; its prologue pushes several register values onto the stack:

      crash> dis -l ext4_put_super
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 824
      0xffffffffa031a570 <ext4_put_super>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
      0xffffffffa031a575 <ext4_put_super+5>: push %rbp
      0xffffffffa031a576 <ext4_put_super+6>: mov %rsp,%rbp
      0xffffffffa031a579 <ext4_put_super+9>: push %r15 <=== 1st register pushed
      0xffffffffa031a57b <ext4_put_super+11>: push %r14 <=== 2nd register pushed
      0xffffffffa031a57d <ext4_put_super+13>: push %r13 <=== 3rd register pushed
      0xffffffffa031a57f <ext4_put_super+15>: push %r12 <=== 4th register pushed
      0xffffffffa031a581 <ext4_put_super+17>: mov %rdi,%r13
      0xffffffffa031a584 <ext4_put_super+20>: push %rbx <=== 5th register pushed (rbx was already set in the caller, where it held the same value as rdi, so the saved rbx is the pointer passed as the first argument of ext4_put_super)
      0xffffffffa031a585 <ext4_put_super+21>: sub $0x8,%rsp
      0xffffffffa031a589 <ext4_put_super+25>: mov 0x460(%rdi),%rbx
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 826
      0xffffffffa031a590 <ext4_put_super+32>: mov 0xe0(%rbx),%r14
      /usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 830
      0xffffffffa031a597 <ext4_put_super+39>: callq 0xffffffffa03133f0 <ext4_unregister_li_request>

    Then dump the raw stack with bt -f and match the words in frame #10 against those pushes:

      crash> bt -f
      #10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
      ffffc90000017de8: 9cbae75a00000000(slot reserved by sub $0x8,%rsp) ffff887e43298800(value of the 5th register, rbx)
      ffffc90000017df8: ffffffffa034a5e0(value of the 4th register, r12) ffff887e3818c7b8(value of the 3rd register, r13)
      ffffc90000017e08: 0000000000000000(value of the 2nd register, r14) ffff887e45918bb0(value of the 1st register, r15)
      ffffc90000017e18: ffffc90000017e38 ffffffff81244aaf(these two are the saved rbp and the return address into generic_shutdown_super)
      #11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
      ffffc90000017e28: 0000000000000083 ffff887e357b8680
      ffffc90000017e38: ffffc90000017e58 ffffffff81244e37
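
    The matching of stack words to pushed registers can be reproduced by walking down from the return-address slot, 8 bytes per push. This is a minimal Python sketch under the assumption that the prologue is exactly the push sequence shown in the disassembly; saved_register_slots is a hypothetical helper name, and the dump values are copied from frame #10 of bt -f.

```python
# Stack words of frame #10, copied from the bt -f output (address -> value).
STACK_DUMP = {
    0xffffc90000017de8: 0x9cbae75a00000000,
    0xffffc90000017df0: 0xffff887e43298800,
    0xffffc90000017df8: 0xffffffffa034a5e0,
    0xffffc90000017e00: 0xffff887e3818c7b8,
    0xffffc90000017e08: 0x0000000000000000,
    0xffffc90000017e10: 0xffff887e45918bb0,
    0xffffc90000017e18: 0xffffc90000017e38,
    0xffffc90000017e20: 0xffffffff81244aaf,  # return address
}
RETURN_ADDRESS_SLOT = 0xffffc90000017e20
# Order of the push instructions in the ext4_put_super prologue.
PUSH_ORDER = ["rbp", "r15", "r14", "r13", "r12", "rbx"]

def saved_register_slots(return_slot, push_order, word=8):
    """Each push moves the stack pointer down by one word below the previous slot."""
    slots = {}
    addr = return_slot
    for reg in push_order:
        addr -= word
        slots[reg] = addr
    return slots

slots = saved_register_slots(RETURN_ADDRESS_SLOT, PUSH_ORDER)
saved = {reg: STACK_DUMP[addr] for reg, addr in slots.items()}
print(hex(saved["rbx"]))  # 0xffff887e43298800
```

    One more word below the saved rbx (the sub $0x8,%rsp slot) lands on ffffc90000017de8, which is the RSP value reported at the time of the exception.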

    Finally, interpret that address as a struct super_block:

      crash> struct super_block ffff887e43298800
      struct super_block {
        s_list = {
          next = 0xffffffff81cb3db0 <super_blocks>, <======= this points at the global super_blocks list, which confirms that ffff887e43298800 really is a struct super_block
          prev = 0xffff887e43968800
        },
        s_dev = 271581185,
        s_blocksize_bits = 12 '\f',
        s_blocksize = 4096,
        s_maxbytes = 17592186040320,
        s_type = 0xffffffffa03589c0 <ext4_fs_type>,
        s_op = 0xffffffffa034a5e0 <ext4_sops>,
        dq_op = 0xffffffffa034a720 <ext4_quota_operations>,
        s_qcop = 0xffffffff81843f60 <dquot_quotactl_sysfile_ops>,
        s_export_op = 0xffffffffa034a580 <ext4_export_ops>,
        s_flags = 805371904,
        s_iflags = 1,
        s_magic = 61267,
        s_root = 0x0,
        s_umount = {
          count = {
            counter = -4294967295
          },
          wait_list = {
            next = 0xffff887e43298878,
            prev = 0xffff887e43298878
          },
          wait_lock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
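
    A few of the decimal fields in this dump are easier to sanity-check once decoded. The following is a minimal Python sketch; decode_dev is a hypothetical helper name, and it assumes the kernel's internal dev_t layout (major number in the bits above the low 20 minor bits) and the ext4 superblock magic 0xEF53.

```python
MINORBITS = 20              # assumed: kernel-internal dev_t keeps the minor in the low 20 bits
EXT4_SUPER_MAGIC = 0xEF53   # ext2/ext3/ext4 superblock magic

def decode_dev(dev):
    """Split a kernel-internal dev_t value into (major, minor)."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)

# Values copied from the struct super_block dump above.
s_dev = 271581185
s_blocksize_bits = 12
s_blocksize = 4096
s_magic = 61267

print(decode_dev(s_dev))                       # (259, 1)
print(hex(s_magic))                            # 0xef53
print((1 << s_blocksize_bits) == s_blocksize)  # True
```

    Minor number 1 fits the nvme0n1p1 partition named in the dmesg log, and the magic and block size are what an ext4 superblock would be expected to carry, which further supports the conclusion that the address really is the sb argument.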

    References

    https://blog.csdn.net/u013982161/article/details/51347944

    Reposted from: https://www.cnblogs.com/muahao/p/9925629.html

  • Original article: https://www.cnblogs.com/coreLeo/p/11759414.html