  • linux: no number of running tasks can get past a lock

    A sequel to 《linux 3.10 一次softlock排查》 (a softlockup investigation on Linux 3.10). Look at just how many tasks are in the running state:

    crash> mach
              MACHINE TYPE: x86_64
               MEMORY SIZE: 90 GB
                      CPUS: 24
                HYPERVISOR: KVM
           PROCESSOR SPEED: 2394 Mhz
                        HZ: 1000
                 PAGE SIZE: 4096
       KERNEL VIRTUAL BASE: ffff880000000000
       KERNEL VMALLOC BASE: ffffc90000000000
       KERNEL VMEMMAP BASE: ffffea0000000000
          KERNEL START MAP: ffffffff80000000
       KERNEL MODULES BASE: ffffffffc0000000
         KERNEL STACK SIZE: 16384
            IRQ STACK SIZE: 16384
                IRQ STACKS:
                     CPU 0: ffff880b52a00000
                     CPU 1: ffff880b52a40000
                     CPU 2: ffff880b52a80000
                     CPU 3: ffff880b52ac0000
                     CPU 4: ffff880b52b00000
                     CPU 5: ffff880b52b40000
                     CPU 6: ffff880b52b80000
                     CPU 7: ffff880b52bc0000
                     CPU 8: ffff880b52c00000
                     CPU 9: ffff880b52c40000
                    CPU 10: ffff880b52c80000
                    CPU 11: ffff880b52cc0000
                    CPU 12: ffff881692200000
                    CPU 13: ffff881692240000
                    CPU 14: ffff881692280000
                    CPU 15: ffff8816922c0000
                    CPU 16: ffff881692300000
                    CPU 17: ffff881692340000
                    CPU 18: ffff881692380000
                    CPU 19: ffff8816923c0000
                    CPU 20: ffff881692400000
                    CPU 21: ffff881692440000
                    CPU 22: ffff881692480000
                    CPU 23: ffff8816924c0000
    DOUBLEFAULT STACK SIZE: 4096

    Now, how many running processes are there?

    crash> ps | grep -i RU
    >     0      0   0  ffffffff81a09480  RU   0.0       0      0  [swapper/0]
          0      0   1  ffff88016e798fd0  RU   0.0       0      0  [swapper/1]
          0      0   2  ffff88016e799fa0  RU   0.0       0      0  [swapper/2]
          0      0   3  ffff88016e79af70  RU   0.0       0      0  [swapper/3]
          0      0   4  ffff88016e79bf40  RU   0.0       0      0  [swapper/4]
          0      0   5  ffff88016e79cf10  RU   0.0       0      0  [swapper/5]
          0      0   6  ffff88016e79dee0  RU   0.0       0      0  [swapper/6]
          0      0   7  ffff88016e79eeb0  RU   0.0       0      0  [swapper/7]
    >     0      0   8  ffff88016e008000  RU   0.0       0      0  [swapper/8]
          0      0   9  ffff88016e008fd0  RU   0.0       0      0  [swapper/9]
          0      0  10  ffff88016e009fa0  RU   0.0       0      0  [swapper/10]
    >     0      0  11  ffff88016e00af70  RU   0.0       0      0  [swapper/11]
    >     0      0  12  ffff880beeaceeb0  RU   0.0       0      0  [swapper/12]
          0      0  13  ffff880beeacdee0  RU   0.0       0      0  [swapper/13]
          0      0  14  ffff880beeaccf10  RU   0.0       0      0  [swapper/14]
    >     0      0  15  ffff880beeacbf40  RU   0.0       0      0  [swapper/15]
          0      0  16  ffff880beeacaf70  RU   0.0       0      0  [swapper/16]
          0      0  17  ffff880beeac9fa0  RU   0.0       0      0  [swapper/17]
          0      0  18  ffff880beeb10000  RU   0.0       0      0  [swapper/18]
          0      0  19  ffff880beeb10fd0  RU   0.0       0      0  [swapper/19]
          0      0  20  ffff880beeb11fa0  RU   0.0       0      0  [swapper/20]
          0      0  21  ffff880beeb12f70  RU   0.0       0      0  [swapper/21]
          0      0  22  ffff880beeb13f40  RU   0.0       0      0  [swapper/22]
          0      0  23  ffff880beeb14f10  RU   0.0       0      0  [swapper/23]
          1      0  21  ffff880beeac8000  RU   0.0  201332  14472  systemd
         10      2   4  ffff88016e726eb0  RU   0.0       0      0  [rcu_sched]
         12      2   1  ffff88016e00eeb0  RU   0.0       0      0  [watchdog/1]
         14      2   1  ffff88016e00cf10  RU   0.0       0      0  [ksoftirqd/1]
         17      2   2  ffff88016e040fd0  RU   0.0       0      0  [watchdog/2]
         22      2   3  ffff88016e045ee0  RU   0.0       0      0  [watchdog/3]
         24      2   3  ffff88016e098000  RU   0.0       0      0  [ksoftirqd/3]
         27      2   4  ffff88016e09af70  RU   0.0       0      0  [watchdog/4]
         42      2   7  ffff88016e111fa0  RU   0.0       0      0  [watchdog/7]
         52      2   9  ffff88016e163f40  RU   0.0       0      0  [watchdog/9]
         57      2  10  ffff88016e198fd0  RU   0.0       0      0  [watchdog/10]
         73      2  13  ffff880beeb3bf40  RU   0.0       0      0  [watchdog/13]
         83      2  15  ffff880beeb7dee0  RU   0.0       0      0  [watchdog/15]
         88      2  16  ffff880beebaaf70  RU   0.0       0      0  [watchdog/16]
         93      2  17  ffff880beebf0000  RU   0.0       0      0  [watchdog/17]
         98      2  18  ffff880beebf4f10  RU   0.0       0      0  [watchdog/18]
        103      2  19  ffff880bee021fa0  RU   0.0       0      0  [watchdog/19]
        108      2  20  ffff880bee026eb0  RU   0.0       0      0  [watchdog/20]
        110      2  20  ffff880bee050fd0  RU   0.0       0      0  [ksoftirqd/20]
        113      2  21  ffff880bee053f40  RU   0.0       0      0  [watchdog/21]
        118      2  22  ffff880bee078fd0  RU   0.0       0      0  [watchdog/22]
        409  22481  10  ffff880b06e15ee0  RU   0.5 13597708 511816  zte_uep1
        410  22481  21  ffff88077fe08000  RU   0.5 13597708 511816  zte_uep1
        413  22481   1  ffff8814af3fbf40  RU   0.5 13597708 511816  zte_uep1
        425  17770  10  ffff8807967d6eb0  RU   0.5 2267176 439500  java
        636      1  15  ffff880b4f5b5ee0  RU   0.0   45452   1740  systemd-journal
        865  22481  23  ffff8809a4aacf10  RU   0.5 13597708 511816  zte_uep1
        924  22481  17  ffff880036a6cf10  RU   0.5 13597708 511816  zte_uep1
        932      2  13  ffff880036022f70  IN   0.0       0      0  [xfs_mru_cache]
    ...... (truncated; far too many to list)

    Count them:

    crash> ps |grep -w RU |wc -l
    1654

    Anyone who regularly runs top knows that the number of running processes is normally no greater than the number of CPU cores, so why does this machine have so many in the running state? To rule out an artifact of the crash tool itself, let's double-check from the runqueue side:

    crash> p runqueues
    PER-CPU DATA TYPE:
      struct rq runqueues;
    PER-CPU ADDRESSES:
      [0]: ffff880b52a17c00
      [1]: ffff880b52a57c00
      [2]: ffff880b52a97c00
      [3]: ffff880b52ad7c00
      [4]: ffff880b52b17c00
      [5]: ffff880b52b57c00
      [6]: ffff880b52b97c00
      [7]: ffff880b52bd7c00
      [8]: ffff880b52c17c00
      [9]: ffff880b52c57c00
      [10]: ffff880b52c97c00
      [11]: ffff880b52cd7c00
      [12]: ffff881692217c00
      [13]: ffff881692257c00
      [14]: ffff881692297c00
      [15]: ffff8816922d7c00
      [16]: ffff881692317c00
      [17]: ffff881692357c00
      [18]: ffff881692397c00
      [19]: ffff8816923d7c00
      [20]: ffff881692417c00
      [21]: ffff881692457c00
      [22]: ffff881692497c00
      [23]: ffff8816924d7c00
    crash> struct rq.nr_running ffff880b52a17c00
      nr_running = 7
    crash> struct rq.nr_running ffff880b52a57c00
      nr_running = 152

    Query nr_running on each per-CPU runqueue in turn, save the output to 1.txt, and tally it with a one-liner:

    grep 'nr_running =' 1.txt | awk '{print $3; sum += $3} END {print sum}'
    7
    152
    113
    138
    65
    4
    4
    38
    4
    148
    43
    1
    1
    141
    4
    43
    107
    102
    91
    110
    108
    105
    98
    3
    1630    <------ total

    Why doesn't this total match the number of RU-state tasks that ps reported? At first I suspected the per-CPU counters were just a transient snapshot, but the difference between the two totals is 1654 - 1630 = 24, which exactly matches the number of CPU cores.
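The arithmetic can be checked directly (a trivial sketch; all three numbers are taken from the crash output above):

```python
ps_running = 1654      # tasks that crash `ps` shows in the RU state
sum_nr_running = 1630  # sum of rq.nr_running over all 24 runqueues
cpus = 24

# One idle task (swapper/N) per CPU is RU as far as ps is concerned,
# but is excluded from nr_running, which accounts for the whole gap.
gap = ps_running - sum_nr_running
print(gap, gap == cpus)
```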

    crash> ps | grep -w 'RU' | grep 'swapper'
    >     0      0   0  ffffffff81a09480  RU   0.0       0      0  [swapper/0]
          0      0   1  ffff88016e798fd0  RU   0.0       0      0  [swapper/1]
          0      0   2  ffff88016e799fa0  RU   0.0       0      0  [swapper/2]
          0      0   3  ffff88016e79af70  RU   0.0       0      0  [swapper/3]
          0      0   4  ffff88016e79bf40  RU   0.0       0      0  [swapper/4]
          0      0   5  ffff88016e79cf10  RU   0.0       0      0  [swapper/5]
          0      0   6  ffff88016e79dee0  RU   0.0       0      0  [swapper/6]
          0      0   7  ffff88016e79eeb0  RU   0.0       0      0  [swapper/7]
    >     0      0   8  ffff88016e008000  RU   0.0       0      0  [swapper/8]
          0      0   9  ffff88016e008fd0  RU   0.0       0      0  [swapper/9]
          0      0  10  ffff88016e009fa0  RU   0.0       0      0  [swapper/10]
    >     0      0  11  ffff88016e00af70  RU   0.0       0      0  [swapper/11]
    >     0      0  12  ffff880beeaceeb0  RU   0.0       0      0  [swapper/12]
          0      0  13  ffff880beeacdee0  RU   0.0       0      0  [swapper/13]
          0      0  14  ffff880beeaccf10  RU   0.0       0      0  [swapper/14]
    >     0      0  15  ffff880beeacbf40  RU   0.0       0      0  [swapper/15]
          0      0  16  ffff880beeacaf70  RU   0.0       0      0  [swapper/16]
          0      0  17  ffff880beeac9fa0  RU   0.0       0      0  [swapper/17]
          0      0  18  ffff880beeb10000  RU   0.0       0      0  [swapper/18]
          0      0  19  ffff880beeb10fd0  RU   0.0       0      0  [swapper/19]
          0      0  20  ffff880beeb11fa0  RU   0.0       0      0  [swapper/20]
          0      0  21  ffff880beeb12f70  RU   0.0       0      0  [swapper/21]
          0      0  22  ffff880beeb13f40  RU   0.0       0      0  [swapper/22]
          0      0  23  ffff880beeb14f10  RU   0.0       0      0  [swapper/23]

    So that is it: the idle tasks are in the running state by default, but they are not counted in nr_running:

    crash> ps |grep -w 'RU' |grep -w '12'
    >     0      0  12  ffff880beeaceeb0  RU   0.0       0      0  [swapper/12]
         12      2   1  ffff88016e00eeb0  RU   0.0       0      0  [watchdog/1]   <---- not a CPU-12 task; `grep -w 12` also matched its PID
      22237      1  12  ffff881254b5eeb0  RU   1.0 14321692 912420  1_scheduler
    crash> struct rq.nr_running ffff881692217c00
      nr_running = 1
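The accounting rule behind this can be sketched in a few lines (a toy model, not kernel code: in the real scheduler the idle task is simply never enqueued on a runqueue, so rq->nr_running never includes it):

```python
class RunQueue:
    """Toy per-CPU runqueue mirroring the nr_running accounting."""
    def __init__(self):
        self.nr_running = 0

    def enqueue(self, task_name):
        # The idle task (swapper/N) is never enqueued in the real
        # kernel, so it never bumps nr_running even though its state
        # is TASK_RUNNING.
        if task_name.startswith("swapper"):
            return
        self.nr_running += 1

rq = RunQueue()
for t in ["swapper/12", "1_scheduler"]:
    rq.enqueue(t)
print(rq.nr_running)  # 1, matching rq.nr_running for CPU 12 above
```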

     So why are there so many running tasks? Inspecting the tasks in the running state, their stacks all sit in __schedule; in other words, the ready queues are simply very long.

     Pick a few and look at their stacks:

    crash> bt 30507
    PID: 30507  TASK: ffff8809d42daf70  CPU: 3   COMMAND: "java"
     #0 [ffff88099eb93d58] __schedule at ffffffff816b6165
     #1 [ffff88099eb93dc0] schedule at ffffffff816b6ae9
     #2 [ffff88099eb93dd0] schedule_hrtimeout_range_clock at ffffffff816b5852
     #3 [ffff88099eb93e68] schedule_hrtimeout_range at ffffffff816b5903
     #4 [ffff88099eb93e78] ep_poll at ffffffff812526de
     #5 [ffff88099eb93f30] sys_epoll_wait at ffffffff81253b9d
     #6 [ffff88099eb93f80] system_call_fastpath at ffffffff816c2789
        RIP: 00007ffa43a652c3  RSP: 00007ffa2595b818  RFLAGS: 00010202
        RAX: 00000000000000e8  RBX: ffffffff816c2789  RCX: 00000000e14ef930
        RDX: 0000000000002000  RSI: 00000000047a5000  RDI: 0000000000000067
        RBP: 00007ffa2595b630   R8: 0000000000000000   R9: 0000000000000003
        R10: 0000000000007530  R11: 0000000000000293  R12: 0000016547341bca
        R13: 0000000000000067  R14: 0000000000007530  R15: 00000000047a5000
        ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
    crash> bt 31606
    PID: 31606  TASK: ffff8809cbe03f40  CPU: 9   COMMAND: "java"
     #0 [ffff880a4eedbd58] __schedule at ffffffff816b6165
     #1 [ffff880a4eedbdc0] schedule at ffffffff816b6ae9
     #2 [ffff880a4eedbdd0] schedule_hrtimeout_range_clock at ffffffff816b5852
     #3 [ffff880a4eedbe68] schedule_hrtimeout_range at ffffffff816b5903
     #4 [ffff880a4eedbe78] ep_poll at ffffffff812526de
     #5 [ffff880a4eedbf30] sys_epoll_wait at ffffffff81253b9d
     #6 [ffff880a4eedbf80] system_call_fastpath at ffffffff816c2789
        RIP: 00007f7bb9bb72c3  RSP: 00007f7b8f060510  RFLAGS: 00000246
        RAX: 00000000000000e8  RBX: ffffffff816c2789  RCX: 00000000ffffffff
        RDX: 0000000000002000  RSI: 0000000005be3000  RDI: 0000000000000151
        RBP: 00007f7b8f060310   R8: 0000000000000000   R9: 0000000000000003
        R10: 00000000000003e8  R11: 0000000000000293  R12: 0000016547349167
        R13: 0000000000000151  R14: 00000000000003e8  R15: 0000000005be3000
        ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
    crash> task_struct.state ffff8809cbe03f40
      state = 0

    The macros behind the state values:

    #define TASK_RUNNING            0
    #define TASK_INTERRUPTIBLE      1
    #define TASK_UNINTERRUPTIBLE    2
    #define __TASK_STOPPED          4
    #define __TASK_TRACED           8
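Those macros make the state = 0 read above easy to decode (a small helper written for illustration; the names and values mirror the 3.10 include/linux/sched.h definitions listed here):

```python
# Non-zero states are bits in a mask; 0 is the special "runnable" value.
TASK_STATES = {
    1: "TASK_INTERRUPTIBLE",
    2: "TASK_UNINTERRUPTIBLE",
    4: "__TASK_STOPPED",
    8: "__TASK_TRACED",
}

def decode_state(state):
    """Turn a task_struct.state value into readable flag names."""
    if state == 0:
        return "TASK_RUNNING"
    return " | ".join(name for bit, name in TASK_STATES.items()
                      if state & bit)

print(decode_state(0))  # -> TASK_RUNNING: the java task above is runnable
```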

    Now look at the stacks of the tasks that actually hold the CPUs: almost all of them are waiting on locks. Normally each task would finish quickly and go back onto a wait queue, but because some task ahead of them keeps the CPU and will not let go, the tasks that were woken up (their resource became available) and placed on the ready queue never get scheduled. That is the nature of a non-preemptive kernel: even a high-priority task must wait for a low-priority one to voluntarily give up the CPU. You may already be on the ready queue, your state may already say running, but you still have to wait.
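The pile-up described above can be caricatured with a toy cooperative scheduler (pure illustration, nothing kernel-specific; the PID names are borrowed from the ps output earlier):

```python
from collections import deque

ready = deque()           # tasks woken up and already marked "running"
current = "lock_holder"   # hogs the CPU and never yields

# Wakeups keep landing on the ready queue...
for pid in ("java-30507", "java-31606", "zte_uep1-409"):
    ready.append(pid)

def schedule():
    """A context switch happens only when `current` voluntarily yields."""
    global current
    ready.append(current)
    current = ready.popleft()

# Until lock_holder calls schedule(), nothing on the ready queue runs,
# regardless of priority -- so the runnable count just keeps growing.
print(len(ready), current)  # -> 3 lock_holder
```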

    My understanding is limited, so please point out any mistakes. If this post helped you, you can click 推荐 (Recommend) below to support me. All rights reserved; please include the link to this post when reposting. The blog is updated regularly; feel free to follow it.
  • Original article: https://www.cnblogs.com/10087622blog/p/9582606.html