zoukankan      html  css  js  c++  java
  • 一次docker中的nginx进程响应慢问题定位记录

    有个ft测试的环境,其中nginx使用docker发布的。测试用例是curl的时候,没有获得nginx的响应。

    docker ps
    CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS                                                                                         NAMES
    e39be26bc976        cdn-release-docker.artnj.zte.com.cn/img-for-ottci/python:latest        "/bin/bash -c /home/s"   12 minutes ago      Up 12 minutes       22/tcp, 9220/tcp                                                                              RF_THUMBNAIL
    a7f8ff2c2fac        ft_thumbnail:V5.01.01.01                                               "/usr/sbin/init"         12 minutes ago      Up 12 minutes                                                                                                     FT_thumbnail
    4881975f8e10        cdn-release-docker.artnj.zte.com.cn/img-for-ottci/python:latest        "/bin/bash -c /home/s"   31 minutes ago      Up 31 minutes       22/tcp, 9220/tcp                                                                              RF_filepackager
    bfdc3beaf7e4        ft_filepackager:V5.01.01.01                                            "/usr/sbin/init bash"    31 minutes ago      Up 31 minutes                                
    

     进入容器,看一下具体信息:

    docker exec -it bfdc3beaf7e4 bash
    

      敲一下top命令,可以发现多个nginx的work进程的cpu很高。

    bash-4.2# top
    TERM environment variable not set.
    bash-4.2#
    bash-4.2# echo TERM
    TERM
    bash-4.2# echo $TERM
    dumb
    bash-4.2# export M
    bash-4.2# export TERM=dumb
    bash-4.2# top
    top - 09:28:13 up 81 days,  7:01,  0 users,  load average: 17.46, 44.24, 34.20
    Tasks:  49 total,   2 running,  47 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  2.2 us,  4.4 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 26321361+total, 48650644 free, 24579464 used, 18998350+buff/cache
    KiB Swap:        0 total,        0 free,        0 used. 23644137+avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     1435 root      20   0 9855.0m  44528   3236 R  93.3  0.0  24:27.53 nginx

    bash-4.2# strace -p 1435
    bash: strace: command not found

      

    由于并没有将strace命令一起打包到这个镜像中,所以需要在容器外和容器内的进程对应起来。然后查看对应的进程信息:

    [root@localhost ~]# docker top bfdc3beaf7e4 |grep -i nginx
    root                26807               22109               0                   17:33               ?                   00:00:00            wget -T 10 -t 1 -O /home/zxcdn/agent/plugin/stat/../../crontab/nginxstatus.tmp http://127.0.0.1:6620/NGINXSTATUS
    root                30753               16121               0                   16:53               ?                   00:00:00            nginx: master process /home/zxcdn/ottcache/nginx/sbin/nginx -p /home/zxcdn/ottcache/nginx
    root                30754               30753               72                  16:53               ?                   00:29:28            nginx: worker process
    

      有很多nginx的work进程,而且进程名相同,只截取部分。nginx开启的work进程是64个,所以有一大片。那怎么对应呢?有个小技巧,就是在容器里运行ps -ef |grep -i nginx |grep -i worker,然后对应的顺序,在容器外和docker top |grep -i nginx |grep -i worker是一致的。

    在容器外找到对应的cpu高的进程滞后,先看一下这个进程是用户态cpu高还是内核态cpu高,可以使用

    pidstat -p 30754 1 来确定,然后本文刚好是内核态cpu高,则可以使用strace 来看为什么该进程cpu高了。本文是因为对应的lua代码的覆盖率的频率设置太低,而且lua脚本有个地方触发了循环,导致了频繁操作read和write。

    open("/home/devops/luacov/luacov.stats.out", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 18
    fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5178ddc000
    write(18, "398:/home/zxcdn/ottcac 0 0 0 0 0"..., 4096) = 4096
    write(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
    write(18, "0 0 0 0 0 0 0 0 0 0 0 0 3 3 0 0 "..., 4096) = 4096
    write(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
    write(18, "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 4096) = 4096
    write(18, "0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 "..., 4096) = 4096
    write(18, "28 0 0 0 0 0 114 114 114 0 114 0"..., 4096) = 4096
    write(18, "0 0 0 0 0 0 0 0 0 0 0 0 889 0 0 "..., 4096) = 4096
    write(18, " 0 0 0 0 0 0 0 0 0 0 480 0 0 480"..., 4096) = 4096
    write(18, " 0 0 0 0 354 354 0 0 0 0 0 0 0 0"..., 4096) = 4096
    write(18, "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 204) = 204
    close(18)                               = 0
    munmap(0x7f5178ddc000, 4096)            = 0
    close(20)                               = 0
    close(22)                               = 0
    open("/home/devops/luacov/luacov.stats.out", O_RDONLY) = 18
    fstat(18, {st_mode=S_IFREG|0644, st_size=41164, ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5178ddc000
    read(18, "398:/home/zxcdn/ottcac 0 0 0 0 0"..., 4096) = 4096
    read(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
    read(18, "0 0 0 0 0 0 0 0 0 0 0 0 3 3 0 0 "..., 4096) = 4096
    read(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
    read(18, "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 4096) = 4096
    read(18, "0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 "..., 4096) = 4096
    read(18, "28 0 0 0 0 0 114 114 114 0 114 0"..., 4096) = 4096
    read(18, "0 0 0 0 0 0 0 0 0 0 0 0 889 0 0 "..., 4096) = 4096
    read(18, " 0 0 0 0 0 0 0 0 0 0 480 0 0 480"..., 4096) = 4096
    read(18, " 0 0 0 0 354 354 0 0 0 0 0 0 0 0"..., 4096) = 4096
    read(18, "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 4096) = 204
    

      

    水平有限,如果有错误,请帮忙提醒我。如果您觉得本文对您有帮助,可以点击下面的 推荐 支持一下我。版权所有,需要转发请带上本文源地址,博客一直在更新,欢迎 关注 。
  • 相关阅读:
    Golang的演化历程
    优秀的计算机编程类博客和文章
    NET Portability Analyzer
    NET SqlClient
    Razor模板引擎
    js资源
    依赖注入和控制器
    Vue.js 2.0 和 React、Augular
    过滤器
    Prism vs MvvmCross
  • 原文地址:https://www.cnblogs.com/10087622blog/p/9361862.html
Copyright © 2011-2022 走看看