  • Linux hang detection mechanisms

    The following messages showed up in dmesg:

    [424948.577401] ixgbe 0000:86:00.0 eth4: Fake Tx hang detected with timeout of 5 seconds
    [424949.535143] ixgbe 0000:86:00.1 eth5: Fake Tx hang detected with timeout of 5 seconds
    [424955.536045] ixgbe 0000:af:00.0 eth6: Fake Tx hang detected with timeout of 10 seconds
    [424955.567988] ixgbe 0000:af:00.1 eth7: Fake Tx hang detected with timeout of 10 seconds
    [424957.579250] ixgbe 0000:18:00.1 eth1: Fake Tx hang detected with timeout of 10 seconds
    [424957.579285] ixgbe 0000:3b:00.1 eth3: Fake Tx hang detected with timeout of 10 seconds
    [424958.568923] ixgbe 0000:86:00.0 eth4: Fake Tx hang detected with timeout of 10 seconds
    [424959.526676] ixgbe 0000:86:00.1 eth5: Fake Tx hang detected with timeout of 10 seconds
    [424975.489166] ixgbe 0000:af:00.0 eth6: Fake Tx hang detected with timeout of 20 seconds
    [424975.553019] ixgbe 0000:af:00.1 eth7: Fake Tx hang detected with timeout of 20 seconds
    [424977.532376] ixgbe 0000:18:00.1 eth1: Fake Tx hang detected with timeout of 20 seconds
    [424977.532409] ixgbe 0000:3b:00.1 eth3: Fake Tx hang detected with timeout of 20 seconds

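    The function that handles the Tx timeout (the listing is from the fm10k driver; the log lines above come from ixgbe, which prints the same message):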

    static void fm10k_tx_timeout(struct net_device *netdev)
    {
        struct fm10k_intfc *interface = netdev_priv(netdev);
        bool real_tx_hang = false;
        int i;
    
    #define TX_TIMEO_LIMIT 16000
        for (i = 0; i < interface->num_tx_queues; i++) {
            struct fm10k_ring *tx_ring = interface->tx_ring[i];
    
            if (check_for_tx_hang(tx_ring) && fm10k_check_tx_hang(tx_ring))
                real_tx_hang = true;
        }
    
        if (real_tx_hang) {
            fm10k_tx_timeout_reset(interface);
        } else {
            netif_info(interface, drv, netdev,
                   "Fake Tx hang detected with timeout of %d seconds\n",
                   netdev->watchdog_timeo / HZ);
    
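            /* fake Tx hang - increase the kernel timeout: it doubles on
             * each fake hang until it reaches TX_TIMEO_LIMIT (16 s if HZ
             * is 1000), which is why the log above steps through 5, 10
             * and 20 seconds
             */
            if (netdev->watchdog_timeo < TX_TIMEO_LIMIT)
                netdev->watchdog_timeo *= 2;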
        }
    }
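
    The doubling in the fake-hang branch can be modelled in user space. This is only a sketch: struct fake_netdev and fake_tx_hang are hypothetical names that do not exist in the kernel, and HZ is assumed to be 1000 so that the timeout in ticks divides cleanly into seconds.

```c
#include <assert.h>
#include <stdio.h>

#define HZ 1000               /* assumption: 1000 ticks per second */
#define TX_TIMEO_LIMIT 16000  /* same constant as in the listing, in ticks */

/* Hypothetical stand-in for the single net_device field used here. */
struct fake_netdev {
    int watchdog_timeo;       /* Tx watchdog timeout, in ticks */
};

/* Models the fake-hang branch: report the current timeout in seconds,
 * then double it unless it has already reached the limit.
 * Returns the number of seconds that was reported. */
static int fake_tx_hang(struct fake_netdev *dev)
{
    int reported;

    assert(dev != NULL);
    reported = dev->watchdog_timeo / HZ;
    printf("Fake Tx hang detected with timeout of %d seconds\n", reported);

    if (dev->watchdog_timeo < TX_TIMEO_LIMIT)
        dev->watchdog_timeo *= 2;

    return reported;
}
```

    Starting from a 5-second timeout, successive calls report 5, 10 and 20 seconds and then stay at 20, because 20000 ticks is no longer below the limit. That matches the progression in the dmesg output.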

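    fm10k_tx_timeout is the function that decides whether the NIC is really hung. If check_for_tx_hang(tx_ring) && fm10k_check_tx_hang(tx_ring) holds for at least one Tx queue, the timeout is treated as a real hang and the interface is reset; otherwise it is a fake hang.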

    check_for_tx_hang(tx_ring) is normally true, because the flag is usually set as early as probe time. The code of fm10k_check_tx_hang is:

    bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring)
    {
        u32 tx_done = fm10k_get_tx_completed(tx_ring);
        u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
        u32 tx_pending = fm10k_get_tx_pending(tx_ring, true);
    
        clear_check_for_tx_hang(tx_ring);
    
        /* Check for a hung queue, but be thorough. This verifies
         * that a transmit has been completed since the previous
         * check AND there is at least one packet pending. By
         * requiring this to fail twice we avoid races with
         * clearing the ARMED bit and conditions where we
         * run the check_tx_hang logic with a transmit completion
         * pending but without time to complete it yet.
         */
        if (!tx_pending || (tx_done_old != tx_done)) { /* nothing pending, or completions advanced since the last check */
            /* update completed stats and continue */
            tx_ring->tx_stats.tx_done_old = tx_done;
            /* reset the countdown */
            clear_bit(__FM10K_HANG_CHECK_ARMED, &tx_ring->state);
    
            return false;
        }
    
        /* make sure it is true for two checks in a row: this returns the
         * previous value of the bit, so only the second consecutive
         * stalled check returns true
         */
        return test_and_set_bit(__FM10K_HANG_CHECK_ARMED, &tx_ring->state);
    }

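    NIC hang messages are usually accompanied by CPU soft lockup messages. If a CPU is soft-locked and the Tx queues are bound to CPUs, the queue that belongs to that CPU has no pending packets, so fm10k_check_tx_hang returns false and the timeout is reported as a fake hang. If the queues are not bound, one Tx queue may be used by several CPUs; if a hang still shows up in that setup, check whether the lock of that Tx queue was taken and never released.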

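    An interim summary:

    The kernel detects hangs on different objects and at different levels.

    1. The NIC hang discussed here targets one device, at the level of a NIC queue. It checks whether pending Tx packets have gone unprocessed past a timeout, and it relies on the NIC itself working normally.

    2. Hung task detection targets a single task. The khungtaskd kernel thread in hung_task.c walks the task list and looks at tasks in the uninterruptible state; if such a task's context-switch count has not changed for longer than a threshold, the task is considered hung. This mechanism relies on scheduling.

    3. Soft lockup detection checks whether scheduling still works on a given CPU (down to the hyper-thread level). It is done by the per-CPU watchdog kernel thread, which is a real-time task: if it did not get to run between two checks, scheduling on that CPU has a problem. The checks are driven by an hrtimer whose hard interrupt wakes the thread, so the mechanism relies on hard interrupts. A soft lockup is reported when code keeps preemption disabled for too long without yielding the CPU.

    4. Hard lockup detection relies on NMI and reuses the hrtimer from item 3. Every time that hrtimer fires, it increments hrtimer_interrupts for the current CPU. If the NMI callback sees the same count on two consecutive runs, the CPU is considered hard-locked, which means interrupts were disabled for too long.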

    The details follow:

    [root@centos7 WakeTest]# ps -ef |grep -i khungtaskd |grep -v grep
    root        93     2  0 Sep04 ?       00:00:00 [khungtaskd]

    khungtaskd checks whether tasks in the D (uninterruptible) state have gone too long without being scheduled.

    The thread is named khungtaskd; do not confuse it with watchdog:

    static int __init hung_task_init(void)
    {
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        /* the thread function is called watchdog, but the thread itself is named khungtaskd */
        watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
    
        return 0;
    }

    The other kernel threads, which are named watchdog:

    ps |grep -i watchdog
          6      2   0  ffff880c11980080  IN   0.0       0      0  [watchdog/0]
         10      2   1  ffff880c11a2b580  IN   0.0       0      0  [watchdog/1]
         14      2   2  ffff880c11a56a80  IN   0.0       0      0  [watchdog/2]
         18      2   3  ffff880c11a62080  IN   0.0       0      0  [watchdog/3]
         22      2   4  ffff880c11a9f580  IN   0.0       0      0  [watchdog/4]
         26      2   5  ffff880c11aa8a80  IN   0.0       0      0  [watchdog/5]
         30      2   6  ffff880c11ab4080  IN   0.0       0      0  [watchdog/6]
         34      2   7  ffff880c11acd580  IN   0.0       0      0  [watchdog/7]
         38      2   8  ffff880c11ad6a80  IN   0.0       0      0  [watchdog/8]
         42      2   9  ffff880c11b04080  IN   0.0       0      0  [watchdog/9]
         46      2  10  ffff880c11b45580  IN   0.0       0      0  [watchdog/10]
         50      2  11  ffff880c11b4ea80  IN   0.0       0      0  [watchdog/11]
         54      2  12  ffff880c11b5e080  IN   0.0       0      0  [watchdog/12]
         58      2  13  ffff880c11b77580  IN   0.0       0      0  [watchdog/13]
         62      2  14  ffff880c11b80a80  IN   0.0       0      0  [watchdog/14]
         66      2  15  ffff880c11baa080  IN   0.0       0      0  [watchdog/15]

    These threads are created in watchdog.c, one per CPU:

    static struct smp_hotplug_thread watchdog_threads = {
        .store            = &softlockup_watchdog,
        .thread_should_run    = watchdog_should_run,
        .thread_fn        = watchdog,
        .thread_comm        = "watchdog/%u",
        .setup            = watchdog_enable,
        .cleanup        = watchdog_cleanup,
        .park            = watchdog_disable,
        .unpark            = watchdog_enable,
    };

    The handlers that enable them, and what each one controls:

    /*
     * common function for watchdog, nmi_watchdog and soft_watchdog parameter
     *
     * caller             | table->data points to | 'which' contains the flag(s)
     * -------------------|-----------------------|-----------------------------
     * proc_watchdog      | watchdog_user_enabled | NMI_WATCHDOG_ENABLED or'ed
     *                    |                       | with SOFT_WATCHDOG_ENABLED
     * -------------------|-----------------------|-----------------------------
     * proc_nmi_watchdog  | nmi_watchdog_enabled  | NMI_WATCHDOG_ENABLED
     * -------------------|-----------------------|-----------------------------
     * proc_soft_watchdog | soft_watchdog_enabled | SOFT_WATCHDOG_ENABLED
     */

    To stop these kernel threads (and to start them again):

    [root@centos7 WakeTest]# echo 0 > /proc/sys/kernel/watchdog
    [root@centos7 WakeTest]# ps -ef |grep -w watchdog |grep -v grep
    [root@centos7 WakeTest]#
    [root@centos7 WakeTest]#
    [root@centos7 WakeTest]# echo 1 > /proc/sys/kernel/watchdog
    [root@centos7 WakeTest]# ps -ef |grep -w watchdog |grep -v grep
    root     13496     2  0 11:34 ?        00:00:00 [watchdog/0]
    root     13497     2  0 11:34 ?        00:00:00 [watchdog/1]
    root     13498     2  0 11:34 ?        00:00:00 [watchdog/2]
    root     13499     2  0 11:34 ?        00:00:00 [watchdog/3]
    root     13500     2  0 11:34 ?        00:00:00 [watchdog/4]
    root     13501     2  0 11:34 ?        00:00:00 [watchdog/5]
    root     13502     2  0 11:34 ?        00:00:00 [watchdog/6]
    root     13503     2  0 11:34 ?        00:00:00 [watchdog/7]
    root     13504     2  0 11:34 ?        00:00:00 [watchdog/8]
    root     13505     2  0 11:34 ?        00:00:00 [watchdog/9]
    root     13506     2  0 11:34 ?        00:00:00 [watchdog/10]
    root     13507     2  0 11:34 ?        00:00:00 [watchdog/11]
    root     13508     2  0 11:34 ?        00:00:00 [watchdog/12]
    root     13509     2  0 11:34 ?        00:00:00 [watchdog/13]
    root     13510     2  0 11:34 ?        00:00:00 [watchdog/14]
    root     13511     2  0 11:34 ?        00:00:00 [watchdog/15]

    They are all real-time tasks:

    top - 11:07:00 up 20:49, 10 users,  load average: 41.97, 45.49, 48.37
    Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  7.1 us, 14.7 sy,  0.0 ni, 54.7 id,  4.2 wa,  2.5 hi, 16.8 si,  0.0 st, 57.3 id_exact,  2.9 hi_exact, 20.0 irq_exact
    KiB Mem : 36231846+total, 50661748 free, 11323638+used, 19842035+buff/cache
    KiB Swap:        0 total,        0 free,        0 used. 16986171+avail Mem
    
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
       403 root      rt   0       0      0      0 S   0.0  0.0   0:00.10 watchdog/3

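    How the watchdog detection works:

    The watchdog thread function stores the current timestamp, at second granularity, in the per-CPU variable watchdog_touch_ts. A separate hrtimer compares the current time against watchdog_touch_ts and treats the CPU as abnormal if the difference exceeds the soft lockup threshold. The same hrtimer is also responsible for waking up the watchdog thread.

    The hrtimer callback uses is_softlockup to decide whether the CPU is soft-locked. Once woken, the watchdog thread should get scheduled and refresh the timestamp. If the timestamp was not refreshed, the thread did not get to run. Because watchdog is a real-time thread bound to one CPU, a real-time thread that cannot be scheduled means that CPU is soft-locked.

    /* touch_ts is the timestamp last written by the watchdog thread */
    static int is_softlockup(unsigned long touch_ts)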
    {
        unsigned long now = get_timestamp();
    
        if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
            /* Warn about unreasonable delays. */
            if (time_after(now, touch_ts + get_softlockup_thresh()))
                return now - touch_ts;
        }
        return 0;
    }
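
    The comparison in is_softlockup can be tried out with plain numbers. The sketch below makes two assumptions that are not visible in the listing: the soft lockup threshold is taken to be twice watchdog_thresh, and a plain greater-than comparison is used in place of time_after, so timestamp wraparound is not handled. fake_softlockup_thresh and fake_is_softlockup are hypothetical names.

```c
#include <assert.h>

/* Assumption: the soft lockup threshold is twice watchdog_thresh. */
static unsigned long fake_softlockup_thresh(unsigned long watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/* now and touch_ts are timestamps in seconds. Returns the stall
 * duration in seconds when the threshold is exceeded, otherwise 0.
 * A watchdog_thresh of 0 disables the check. */
static unsigned long fake_is_softlockup(unsigned long now,
                                        unsigned long touch_ts,
                                        unsigned long watchdog_thresh)
{
    assert(now >= touch_ts);  /* this sketch ignores wraparound */

    if (watchdog_thresh &&
        now > touch_ts + fake_softlockup_thresh(watchdog_thresh))
        return now - touch_ts;

    return 0;
}
```

    With watchdog_thresh set to 10, a timestamp that is exactly 20 seconds old is still accepted, and one that is 21 seconds old is reported as a 21-second stall.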

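    As you can see, this mechanism clearly depends on hard interrupts arriving. If a CPU keeps hard interrupts disabled for a long time, the watchdog thread cannot be guaranteed to run, so that case needs its own check. This is where hard lockup detection comes in.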

    static bool is_hardlockup(void)
    {
        unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
    
        if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
            return true;
    
        __this_cpu_write(hrtimer_interrupts_saved, hrint);
        return false;
    }
  • Original article: https://www.cnblogs.com/10087622blog/p/9558024.html