  • Hung detection mechanisms in Linux

    The following messages showed up in dmesg:

    [424948.577401] ixgbe 0000:86:00.0 eth4: Fake Tx hang detected with timeout of 5 seconds
    [424949.535143] ixgbe 0000:86:00.1 eth5: Fake Tx hang detected with timeout of 5 seconds
    [424955.536045] ixgbe 0000:af:00.0 eth6: Fake Tx hang detected with timeout of 10 seconds
    [424955.567988] ixgbe 0000:af:00.1 eth7: Fake Tx hang detected with timeout of 10 seconds
    [424957.579250] ixgbe 0000:18:00.1 eth1: Fake Tx hang detected with timeout of 10 seconds
    [424957.579285] ixgbe 0000:3b:00.1 eth3: Fake Tx hang detected with timeout of 10 seconds
    [424958.568923] ixgbe 0000:86:00.0 eth4: Fake Tx hang detected with timeout of 10 seconds
    [424959.526676] ixgbe 0000:86:00.1 eth5: Fake Tx hang detected with timeout of 10 seconds
    [424975.489166] ixgbe 0000:af:00.0 eth6: Fake Tx hang detected with timeout of 20 seconds
    [424975.553019] ixgbe 0000:af:00.1 eth7: Fake Tx hang detected with timeout of 20 seconds
    [424977.532376] ixgbe 0000:18:00.1 eth1: Fake Tx hang detected with timeout of 20 seconds
    [424977.532409] ixgbe 0000:3b:00.1 eth3: Fake Tx hang detected with timeout of 20 seconds

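    The function that handles the timeout (the listing below is the fm10k driver's version, which prints the same message as the ixgbe log above):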

    static void fm10k_tx_timeout(struct net_device *netdev)
    {
        struct fm10k_intfc *interface = netdev_priv(netdev);
        bool real_tx_hang = false;
        int i;
    
    #define TX_TIMEO_LIMIT 16000
        for (i = 0; i < interface->num_tx_queues; i++) {
            struct fm10k_ring *tx_ring = interface->tx_ring[i];
    
            if (check_for_tx_hang(tx_ring) && fm10k_check_tx_hang(tx_ring))
                real_tx_hang = true;
        }
    
        if (real_tx_hang) {
            fm10k_tx_timeout_reset(interface);
        } else {
            netif_info(interface, drv, netdev,
                   "Fake Tx hang detected with timeout of %d seconds\n",
                   netdev->watchdog_timeo / HZ);
    
            /* fake Tx hang - increase the kernel timeout */
            if (netdev->watchdog_timeo < TX_TIMEO_LIMIT)
                netdev->watchdog_timeo *= 2; /* doubled on every fake hang until it exceeds the limit (16 s at HZ=1000); in the log above it goes 5 -> 10 -> 20 */
        }
    }
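
The 5 -> 10 -> 20 progression in the log can be reproduced with a small user-space sketch. This is only a model of the doubling rule, not driver code: `HZ = 1000` is an assumption, and `next_watchdog_timeo` and `print_fake_hang_sequence` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

#define HZ 1000               /* assumption: 1000 jiffies per second */
#define TX_TIMEO_LIMIT 16000  /* same constant as in the driver, in jiffies */

/* Returns the watchdog timeout (in jiffies) to use after one fake hang. */
int next_watchdog_timeo(int timeo)
{
    assert(timeo > 0);
    if (timeo < TX_TIMEO_LIMIT)
        timeo *= 2;
    return timeo;
}

/* Prints the timeout reported by `count` consecutive fake hangs. */
void print_fake_hang_sequence(int timeo, int count)
{
    for (int i = 0; i < count; i++) {
        printf("Fake Tx hang detected with timeout of %d seconds\n", timeo / HZ);
        timeo = next_watchdog_timeo(timeo);
    }
}
```

Starting from a 5-second timeout, `print_fake_hang_sequence(5 * HZ, 4)` reports 5, 10, 20 and 20 seconds: once the value is past the limit it stops growing.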

    The key function for deciding whether the NIC is hung is fm10k_tx_timeout. If the condition check_for_tx_hang(tx_ring) && fm10k_check_tx_hang(tx_ring) holds for any queue, it is a real hang; otherwise it is a fake hang.

    check_for_tx_hang(tx_ring) is normally true, since the flag is generally set at probe time. The code of fm10k_check_tx_hang is as follows:

    bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring)
    {
        u32 tx_done = fm10k_get_tx_completed(tx_ring);
        u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
        u32 tx_pending = fm10k_get_tx_pending(tx_ring, true);
    
        clear_check_for_tx_hang(tx_ring);
    
        /* Check for a hung queue, but be thorough. This verifies
         * that a transmit has been completed since the previous
         * check AND there is at least one packet pending. By
         * requiring this to fail twice we avoid races with
         * clearing the ARMED bit and conditions where we
         * run the check_tx_hang logic with a transmit completion
         * pending but without time to complete it yet.
         */
        if (!tx_pending || (tx_done_old != tx_done)) { /* nothing is pending, or completions have advanced since the last check */
            /* update completed stats and continue */
            tx_ring->tx_stats.tx_done_old = tx_done;
            /* reset the countdown */
            clear_bit(__FM10K_HANG_CHECK_ARMED, &tx_ring->state);
    
            return false;
        }
    
        /* make sure it is true for two checks in a row */
        return test_and_set_bit(__FM10K_HANG_CHECK_ARMED, &tx_ring->state); /* true only if the ARMED bit was already set by the previous check */
    }
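
The "two checks in a row" rule is easier to see as a tiny state machine. The following is a user-space sketch under my own assumptions: `struct ring_model` and `model_check_tx_hang` are hypothetical names, and a plain `bool` stands in for the atomic `__FM10K_HANG_CHECK_ARMED` bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-ring state the driver consults. */
struct ring_model {
    uint32_t tx_done_old; /* completions seen at the previous check */
    bool armed;           /* models __FM10K_HANG_CHECK_ARMED */
};

/* Returns true only when the queue looked stalled on two consecutive checks. */
bool model_check_tx_hang(struct ring_model *r, uint32_t tx_done, uint32_t tx_pending)
{
    assert(r != NULL);

    /* Progress was made, or there is nothing to send: disarm and start over. */
    if (!tx_pending || r->tx_done_old != tx_done) {
        r->tx_done_old = tx_done;
        r->armed = false;
        return false;
    }

    /* Stalled: report a hang only if the previous check was stalled too. */
    bool was_armed = r->armed;
    r->armed = true;
    return was_armed;
}
```

With packets pending and no new completions, the first call only arms the check and the second one reports the hang; any progress in between disarms it again.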

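    NIC hang messages are usually accompanied by CPU soft lockups. If a CPU is soft-locked and the Tx queue is bound to that CPU, the queue ends up with no pending packets, which produces the hang report seen here. If no binding is in place, the Tx queue may be used by several CPUs; if a hang still shows up, check whether the queue's lock has been taken and never released.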

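    A summary so far:

    The kernel detects hangs for different objects and at different levels.

    1. The NIC hang discussed here targets a specific device, at the level of a NIC queue. It works by checking whether pending Tx packets have gone unprocessed past a timeout. It relies on the NIC device itself working normally.

    2. There is also a mechanism for detecting a hung task: the khungtaskd kernel thread in hung_task.c. It measures how long a task has stayed in the uninterruptible (D) state and, if that exceeds a threshold, treats the task as hung. It does this by walking the task list and checking whether each such task's context-switch count has changed. This is at the level of a single task, and it relies on scheduling.

    3. Soft lockup detection checks whether scheduling on a given CPU is working. It is done by the watchdog kernel thread, which is a real-time task: if it fails to get scheduled between two consecutive checks, scheduling has a problem. "Consecutive" refers to the wakeups triggered from the hrtimer's hard interrupt. The object is a single CPU (down to the hyperthread level). It relies on hard interrupts; a soft lockup is reported when preemption stays disabled for too long without the CPU being yielded.

    4. Hard lockup detection relies on NMI. It reuses the hrtimer from item 3: every time that hrtimer fires, it increments the current CPU's hrtimer_interrupts counter. If the counter has not advanced between two consecutive NMI callbacks, the CPU is considered hard-locked; that is, a hard lockup is reported when interrupts stay disabled for too long.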

    In more detail:

    [root@centos7 WakeTest]# ps -ef |grep -i khungtaskd |grep -v grep
    root        93     2  0 Sep04 ?       00:00:00 [khungtaskd]

    This thread checks whether tasks in D state have gone unscheduled for too long. It is named khungtaskd; take care not to confuse it with watchdog:

    static int __init hung_task_init(void)
    {
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        watchdog_task = kthread_run(watchdog, NULL, "khungtaskd"); /* the thread function is called watchdog, but the thread itself is named khungtaskd */
    
        return 0;
    }
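
The per-task check that khungtaskd performs, as summarized earlier, can be sketched in user space. This is a simplified model and not the code in hung_task.c: `struct task_model` and `model_check_hung_task` are hypothetical names, and the sketch assumes it is called once per timeout period for each task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few task fields this check cares about. */
struct task_model {
    bool uninterruptible;            /* task is currently in D state */
    unsigned long switch_count;      /* context switches so far */
    unsigned long last_switch_count; /* value recorded by the previous scan */
};

/* One scan of one task. Returns true if the task looks hung. */
bool model_check_hung_task(struct task_model *t)
{
    assert(t != NULL);

    /* Only tasks in D state are candidates. */
    if (!t->uninterruptible)
        return false;

    /* The task was scheduled since the last scan: remember the new count. */
    if (t->switch_count != t->last_switch_count) {
        t->last_switch_count = t->switch_count;
        return false;
    }

    /* Still in D state with no context switch for a whole period. */
    return true;
}
```

The first scan of a task only records its switch count; the task is reported on a later scan if it is still in D state and the count has not moved.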

    The other kind of kernel thread is the one named watchdog:

    ps |grep -i watchdog
          6      2   0  ffff880c11980080  IN   0.0       0      0  [watchdog/0]
         10      2   1  ffff880c11a2b580  IN   0.0       0      0  [watchdog/1]
         14      2   2  ffff880c11a56a80  IN   0.0       0      0  [watchdog/2]
         18      2   3  ffff880c11a62080  IN   0.0       0      0  [watchdog/3]
         22      2   4  ffff880c11a9f580  IN   0.0       0      0  [watchdog/4]
         26      2   5  ffff880c11aa8a80  IN   0.0       0      0  [watchdog/5]
         30      2   6  ffff880c11ab4080  IN   0.0       0      0  [watchdog/6]
         34      2   7  ffff880c11acd580  IN   0.0       0      0  [watchdog/7]
         38      2   8  ffff880c11ad6a80  IN   0.0       0      0  [watchdog/8]
         42      2   9  ffff880c11b04080  IN   0.0       0      0  [watchdog/9]
         46      2  10  ffff880c11b45580  IN   0.0       0      0  [watchdog/10]
         50      2  11  ffff880c11b4ea80  IN   0.0       0      0  [watchdog/11]
         54      2  12  ffff880c11b5e080  IN   0.0       0      0  [watchdog/12]
         58      2  13  ffff880c11b77580  IN   0.0       0      0  [watchdog/13]
         62      2  14  ffff880c11b80a80  IN   0.0       0      0  [watchdog/14]
         66      2  15  ffff880c11baa080  IN   0.0       0      0  [watchdog/15]

    These are created in watchdog.c, one per CPU:

    static struct smp_hotplug_thread watchdog_threads = {
        .store            = &softlockup_watchdog,
        .thread_should_run    = watchdog_should_run,
        .thread_fn        = watchdog,
        .thread_comm        = "watchdog/%u",
        .setup            = watchdog_enable,
        .cleanup        = watchdog_cleanup,
        .park            = watchdog_disable,
        .unpark            = watchdog_enable,
    };

    Some of the enable functions and callbacks:

    /*
     * common function for watchdog, nmi_watchdog and soft_watchdog parameter
     *
     * caller             | table->data points to | 'which' contains the flag(s)
     * -------------------|-----------------------|-----------------------------
     * proc_watchdog      | watchdog_user_enabled | NMI_WATCHDOG_ENABLED or'ed
     *                    |                       | with SOFT_WATCHDOG_ENABLED
     * -------------------|-----------------------|-----------------------------
     * proc_nmi_watchdog  | nmi_watchdog_enabled  | NMI_WATCHDOG_ENABLED
     * -------------------|-----------------------|-----------------------------
     * proc_soft_watchdog | soft_watchdog_enabled | SOFT_WATCHDOG_ENABLED
     */

    To turn these kernel threads off, use:

    [root@centos7 WakeTest]# echo 0 > /proc/sys/kernel/watchdog
    [root@centos7 WakeTest]# ps -ef |grep -w watchdog |grep -v grep
    [root@centos7 WakeTest]#
    [root@centos7 WakeTest]#
    [root@centos7 WakeTest]# echo 1 > /proc/sys/kernel/watchdog
    [root@centos7 WakeTest]# ps -ef |grep -w watchdog |grep -v grep
    root     13496     2  0 11:34 ?        00:00:00 [watchdog/0]
    root     13497     2  0 11:34 ?        00:00:00 [watchdog/1]
    root     13498     2  0 11:34 ?        00:00:00 [watchdog/2]
    root     13499     2  0 11:34 ?        00:00:00 [watchdog/3]
    root     13500     2  0 11:34 ?        00:00:00 [watchdog/4]
    root     13501     2  0 11:34 ?        00:00:00 [watchdog/5]
    root     13502     2  0 11:34 ?        00:00:00 [watchdog/6]
    root     13503     2  0 11:34 ?        00:00:00 [watchdog/7]
    root     13504     2  0 11:34 ?        00:00:00 [watchdog/8]
    root     13505     2  0 11:34 ?        00:00:00 [watchdog/9]
    root     13506     2  0 11:34 ?        00:00:00 [watchdog/10]
    root     13507     2  0 11:34 ?        00:00:00 [watchdog/11]
    root     13508     2  0 11:34 ?        00:00:00 [watchdog/12]
    root     13509     2  0 11:34 ?        00:00:00 [watchdog/13]
    root     13510     2  0 11:34 ?        00:00:00 [watchdog/14]
    root     13511     2  0 11:34 ?        00:00:00 [watchdog/15]

    They are all real-time tasks:

    top - 11:07:00 up 20:49, 10 users,  load average: 41.97, 45.49, 48.37
    Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  7.1 us, 14.7 sy,  0.0 ni, 54.7 id,  4.2 wa,  2.5 hi, 16.8 si,  0.0 st, 57.3 id_exact,  2.9 hi_exact, 20.0 irq_exact
    KiB Mem : 36231846+total, 50661748 free, 11323638+used, 19842035+buff/cache
    KiB Swap:        0 total,        0 free,        0 used. 16986171+avail Mem
    
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
       403 root      rt   0       0      0      0 S   0.0  0.0   0:00.10 watchdog/3

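    How watchdog detection works:

    The watchdog function updates a per-CPU timestamp, watchdog_touch_ts, from the current time (at one-second resolution). A separate hrtimer compares the current time with watchdog_touch_ts; if the difference exceeds a threshold derived from watchdog_thresh, the CPU is considered abnormal. The same hrtimer is also responsible for waking up the watchdog thread.

    Inside the hrtimer callback, is_softlockup decides whether a soft lockup has occurred. Once woken, the watchdog thread should get scheduled and update the timestamp. If the timestamp was not updated, the thread did not get to run. Since watchdog is a real-time thread bound to its CPU, a real-time thread failing to run means that CPU is soft-locked.

    static int is_softlockup(unsigned long touch_ts) /* touch_ts is the time last written by the watchdog thread */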
    {
        unsigned long now = get_timestamp();
    
        if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
            /* Warn about unreasonable delays. */
            if (time_after(now, touch_ts + get_softlockup_thresh()))
                return now - touch_ts;
        }
        return 0;
    }
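
To make the comparison concrete, here is a user-space sketch of the same test. The `model_*` names are hypothetical, times are plain seconds, and the factor of two in the threshold is my understanding of what `get_softlockup_thresh()` returns, so treat it as an assumption.

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is later than b", in the spirit of the kernel's time_after(). */
int model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Soft-lockup threshold in seconds; assumed to be twice watchdog_thresh. */
unsigned long model_softlockup_thresh(unsigned long watchdog_thresh)
{
    assert(watchdog_thresh <= ULONG_MAX / 2);
    return watchdog_thresh * 2;
}

/* Returns how many seconds the CPU has been stuck, or 0 if it is not locked up. */
unsigned long model_is_softlockup(unsigned long now, unsigned long touch_ts,
                                  unsigned long watchdog_thresh)
{
    /* A zero threshold means the detector is switched off. */
    if (watchdog_thresh == 0)
        return 0;

    if (model_time_after(now, touch_ts + model_softlockup_thresh(watchdog_thresh)))
        return now - touch_ts;

    return 0;
}
```

With `watchdog_thresh = 10`, a timestamp last touched at second 5 is reported as a 25-second lockup when checked at second 30, but not yet at second 25, because the comparison is strict.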

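    As you can see, this mechanism clearly depends on hard interrupts arriving. If a CPU keeps hard interrupts disabled for a long time, the hrtimer never fires and there is no guarantee that watchdog runs, so that case needs its own check. This is where hard lockup detection comes on stage.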

    static bool is_hardlockup(void)
    {
        unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
    
        if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
            return true;
    
        __this_cpu_write(hrtimer_interrupts_saved, hrint);
        return false;
    }
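
The interplay between the hrtimer counter and the NMI check can be modeled in a few lines. This is a user-space sketch, not kernel code: `struct cpu_model`, `model_hrtimer_tick` and `model_is_hardlockup` are hypothetical names, and ordinary struct fields replace the per-CPU variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-CPU counters. */
struct cpu_model {
    unsigned long hrtimer_interrupts;       /* bumped by the hrtimer callback */
    unsigned long hrtimer_interrupts_saved; /* value seen by the previous NMI check */
};

/* Models the hrtimer firing: only possible while interrupts are enabled. */
void model_hrtimer_tick(struct cpu_model *c)
{
    assert(c != NULL);
    c->hrtimer_interrupts++;
}

/* Models the NMI callback: true if no hrtimer fired since the last NMI. */
bool model_is_hardlockup(struct cpu_model *c)
{
    assert(c != NULL);

    if (c->hrtimer_interrupts_saved == c->hrtimer_interrupts)
        return true;

    c->hrtimer_interrupts_saved = c->hrtimer_interrupts;
    return false;
}
```

As long as a tick lands between two NMI checks, the second check returns false; two NMI checks with no tick in between make the second one report a hard lockup.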
  • Original article: https://www.cnblogs.com/10087622blog/p/9558024.html