  • A crash caused by a hardware error

    [683650.031028] BUG: unable to handle kernel paging request at 000000000001b790-----------------------------invalid address
    [683650.031060] IP: [<ffffffff94b13520>] native_queued_spin_lock_slowpath+0x110/0x200
    [683650.031089] PGD 8000005f702d4067 PUD 510c27a067 PMD 0
    [683650.031109] Oops: 0002 [#1] SMP
    [683650.031567] CPU: 40 PID: 474165 Comm: dfget Kdump: loaded Tainted: G ------------ T 3.10.0-957.27.2.el7.x86_64 #1
    [683650.031599] Hardware name: Dell Inc. PowerEdge R640/0RJCR7, BIOS 1.6.13 12/17/2018
    [683650.031621] task: ffff8d37d86ac100 ti: ffff8d4adbbdc000 task.ti: ffff8d4adbbdc000
    [683650.031643] RIP: 0010:[<ffffffff94b13520>] [<ffffffff94b13520>] native_queued_spin_lock_slowpath+0x110/0x200
    [683650.031673] RSP: 0018:ffff8d4adbbdfea0 EFLAGS: 00010006
    [683650.031689] RAX: 000000000000175f RBX: ffff8d37d86ac100 RCX: 0000000001410000
    [683650.031709] RDX: 000000000001b790 RSI: 00000000bafb5140 RDI: ffff8d88afab1048----------this is sighand_struct.siglock
    [683650.031729] RBP: ffff8d4adbbdfea0 R08: ffff8d58bdf1b780 R09: 0000000000000000
    [683650.031749] R10: 0000000000000008 R11: 0000000000000206 R12: ffff8d4adbbdfef0
    [683650.031769] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
    [683650.031789] FS: 00007fb5b67fc700(0000) GS:ffff8d58bdf00000(0000) knlGS:0000000000000000
    [683650.031812] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [683650.031828] CR2: 000000000001b790 CR3: 0000005f757a8000 CR4: 00000000007607e0
    [683650.031849] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [683650.031869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [683650.031889] PKRU: 55555554
    [683650.031898] Call Trace:
    [683650.031912] [<ffffffff9515e2cb>] queued_spin_lock_slowpath+0xb/0xf
    [683650.031935] [<ffffffff9516c6a8>] _raw_spin_lock_irq+0x28/0x30
    [683650.031955] [<ffffffff94ab0a97>] __set_current_blocked+0x37/0x70
    [683650.031974] [<ffffffff94ab0c67>] sigprocmask+0x77/0xb0
    [683650.032768] [<ffffffff94ab0d32>] SyS_rt_sigprocmask+0x92/0x100
    [683650.033480] [<ffffffff95176ddb>] system_call_fastpath+0x22/0x27
    [683650.034184] Code: 87 47 02 c1 e0 10 45 31 c9 85 c0 74 44 48 89 c2 c1 e8 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 a0 bf 74 95 <4c> 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40
    [683650.035684] RIP [<ffffffff94b13520>] native_queued_spin_lock_slowpath+0x110/0x200
    [683650.036404] RSP <ffff8d4adbbdfea0>
    [683650.037151] CR2: 000000000001b790

    Going by the kernel code, the value of the corresponding sighand_struct.siglock is wrong, and that is what triggers the fault.

    crash> sighand_struct.siglock 0xffff8d88afab0840 -x
    siglock = {
    {
    rlock = {
    raw_lock = {
    val = {
    counter = 0x1415140
    }
    }
    }
    }
    }

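    According to the qspinlock layout, the low 8 bits (the locked byte) should be 1 while the lock is held; if someone is waiting for the lock, the pending bit should be 1; beyond that, the tail records the number of the waiting CPU. Here the low byte is 0x40, which is not a valid locked value, and after years of debugging crashes it is obvious that this value is abnormal.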
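
    The field check described above can be sketched in a few lines of Python. This is only an illustration: the bit positions are an assumption taken from the upstream qspinlock layout for NR_CPUS < 16K (locked byte in bits 0-7, pending in bit 8, tail index in bits 16-17, tail CPU + 1 in bits 18-31), and the RHEL 3.10 backport may place the tail differently. The helper names `decode_qspinlock` and `looks_sane` are hypothetical.

```python
# Assumed layout: upstream qspinlock for NR_CPUS < 16K.
# The RHEL 3.10 backport may encode the tail differently.
LOCKED_MASK = 0xFF      # bits 0-7: locked byte
PENDING_BIT = 8         # bit 8: pending
TAIL_IDX_SHIFT = 16     # bits 16-17: tail index
TAIL_CPU_SHIFT = 18     # bits 18-31: tail CPU number + 1


def decode_qspinlock(val):
    """Split a 32-bit qspinlock word into its fields (hypothetical helper)."""
    return {
        "locked": val & LOCKED_MASK,
        "pending": (val >> PENDING_BIT) & 1,
        "unused": (val >> 9) & 0x7F,          # bits 9-15 should stay 0
        "tail_idx": (val >> TAIL_IDX_SHIFT) & 0x3,
        "tail_cpu": (val >> TAIL_CPU_SHIFT) - 1,  # -1 means "no waiter queued"
    }


def looks_sane(val):
    """A native (non-paravirt) lock word has locked byte 0 or 1 and no stray bits."""
    fields = decode_qspinlock(val)
    return fields["locked"] in (0, 1) and fields["unused"] == 0


fields = decode_qspinlock(0x1415140)  # counter value from the crash output
print({name: hex(value) for name, value in fields.items()})
print("sane:", looks_sane(0x1415140))
```

    Whatever the exact tail encoding, the locked byte of 0x1415140 comes out as 0x40, which is neither 0 nor 1, so the word fails the check.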

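    Since sighand is shared by all threads of the process, its siglock is a common lock, so I printed the sighand pointer of the other threads alongside that of the crashing task (ffff8d37d86ac100):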

    crash> task_struct.sighand ffff8d88983da080
    sighand = 0xffff8d88afeb0840-------------------------e is 1110
    crash> task_struct.sighand ffff8d88983d8000
    sighand = 0xffff8d88afeb0840
    crash> task_struct.sighand ffff8d37d86ac100
    sighand = 0xffff8d88afab0840------------------------a is 1010

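    Look closely and you can see a bit flip: the crashing task's sighand pointer reads ...afab0840 where the other threads have ...afeb0840, so the siglock address derived from it is wrong as well. Waiting for a lock in this state is like waiting for a boat at a bus stop; perhaps only in Wuhan would one occasionally turn up.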
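
    The comparison can be made mechanical by XOR-ing the two pointers. The sketch below uses only values from the dump above; the variable names are illustrative.

```python
# sighand pointers printed by crash above
good = 0xffff8d88afeb0840   # other threads of the process
bad = 0xffff8d88afab0840    # crashing task ffff8d37d86ac100
rdi = 0xffff8d88afab1048    # lock address passed to the spinlock slow path

diff = good ^ bad
flipped = [bit for bit in range(64) if (diff >> bit) & 1]

print(hex(diff), flipped)   # which bits differ between the two pointers
print(hex(rdi - bad))       # offset of the lock inside the bad sighand
```

    The two pointers differ in exactly one bit (bit 22, XOR result 0x400000), and RDI sits 0x808 bytes above the corrupted pointer, which fits RDI being the siglock member of that bogus sighand_struct.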

    In addition, neither IPMI nor MCE logged any anomaly for this bit flip.

  • Original article: https://www.cnblogs.com/10087622blog/p/12149572.html