  • Web server hang

    The web server hung. I logged in to the back end to investigate, and ps aux itself hung as soon as I ran it.

    After logging in again over ssh, I ran strace ps and got the following result:

    Debugging with gdb hung as well!

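    Listing all processes with top -b showed that the earlier ps process was in D state (uninterruptible sleep), and that some of the web server's threads/processes were in D state too.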
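One way to list the D-state tasks without running ps is sketched below in Python. It reads only /proc/<pid>/task/<tid>/stat and deliberately never opens /proc/<pid>/cmdline, the file whose read handler (proc_pid_cmdline_read) this post identifies as the place where ps went to sleep. The helper names parse_stat and find_blocked are illustrative and not from the original post. That the stat files stay readable during such a hang is an assumption, suggested only by the fact that top -b still worked here.

```python
import os


def parse_stat(text):
    """Parse the contents of a /proc/<pid>/stat file into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root="/proc"):
    """Return a list of (tid, comm) for every thread currently in state D."""
    blocked = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    task_id, comm, state = parse_stat(f.read())
            except (OSError, ValueError):
                continue
            if state == "D":
                blocked.append((task_id, comm))
    return blocked
```

Calling find_blocked() on the affected machine and printing the result gives the thread IDs to look up in dmesg. Scanning the task/ subdirectories matters because only some threads of the web server were blocked, not necessarily the main thread.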

    dmesg showed the following:

    [20761.085669] INFO: task apache2:7135 blocked for more than 120 seconds.
    [20761.085675]       Tainted: G        W  O    #4
    [20761.085677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [20761.085679] apache2         D ffffffc000086ef8     0  7135   4035 0x00000000
    [20761.085683] Call trace:
    [20761.085736] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
    [20761.085767] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
    [20761.085769] [<ffffffc0009ff5fc>] schedule+0x38/0x90
    [20761.085781] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
    [20761.085783] [<ffffffc000a00928>] down_write+0x5c/0x74
    [20761.085795] [<ffffffc0002188d4>] SyS_mprotect+0xb0/0x204
    [20761.085797] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
    [20761.085799] INFO: task apache2:7138 blocked for more than 120 seconds.
    [20761.085800]       Tainted: G        W  O    YUN #4
    [20761.085801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [20761.085802] apache2         D ffffffc000086ef8     0  7138   4035 0x00000000
    [20761.085805] Call trace:
    [20761.085807] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
    [20761.085809] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
    [20761.085811] [<ffffffc0009ff5fc>] schedule+0x38/0x90
    [20761.085813] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
    [20761.085815] [<ffffffc000a00928>] down_write+0x5c/0x74
    [20761.085818] [<ffffffc000249344>] split_huge_page_to_list+0x64/0x7e4
    [20761.085819] [<ffffffc00024a570>] __split_huge_page_pmd+0x120/0x354
    [20761.085821] [<ffffffc00020dc88>] unmap_single_vma+0x178/0x644
    [20761.085823] [<ffffffc00020ec48>] zap_page_range+0xa8/0x114
    [20761.085825] [<ffffffc000221150>] SyS_madvise+0x2f4/0x520
    [20761.085827] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
    [20761.085828] INFO: task apache2:7158 blocked for more than 120 seconds.
    [20761.085830]       Tainted: G        W  O    server.YUN #4
    [20761.085831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [20761.085832] apache2         D ffffffc000086ef8     0  7158   4035 0x00000008
    [20761.085834] Call trace:
    [20761.085836] [<ffffffc000086ef8>] __switch_to+0xa0/0xb8
    [20761.085838] [<ffffffc0009ff10c>] __schedule+0x24c/0x704
    [20761.085840] [<ffffffc0009ff5fc>] schedule+0x38/0x90
    [20761.085842] [<ffffffc000a0260c>] rwsem_down_write_failed+0x1d8/0x310
    [20761.085843] [<ffffffc000a00928>] down_write+0x5c/0x74
    [20761.085845] [<ffffffc000249344>] split_huge_page_to_list+0x64/0x7e4
    [20761.085847] [<ffffffc00024a570>] __split_huge_page_pmd+0x120/0x354
    [20761.085849] [<ffffffc00020dc88>] unmap_single_vma+0x178/0x644
    [20761.085850] [<ffffffc00020ec48>] zap_page_range+0xa8/0x114
    [20761.085852] [<ffffffc000221150>] SyS_madvise+0x2f4/0x520
    [20761.085854] [<ffffffc000085c74>] el0_svc_naked+0x24/0x28
    [20761.085856] INFO: task ps:17403 blocked for more than 120 seconds.
    [20761.085857]       Tainted: G        W  O     #4
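With many hung-task reports in the log, it can help to condense them. The Python sketch below parses reports in the format shown above and counts, per task, the first non-scheduler frame (what the task is waiting in) and the system call it came from. The names parse_hung_tasks and summarize are made up for this example, and the regular expressions assume the exact line layout of this log; other kernel versions print frames and syscall names differently.

```python
import re
from collections import Counter

HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames that only show the task was scheduled out, not why.
SCHED_FRAMES = {"__switch_to", "__schedule", "schedule"}


def parse_hung_tasks(log_text):
    """Split dmesg text into one dict per hung-task report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            current = {"comm": m.group(1), "pid": int(m.group(2)), "frames": []}
            reports.append(current)
            continue
        m = FRAME_RE.search(line)
        if m and current is not None:
            current["frames"].append(m.group(1))  # innermost frame first
    return reports


def summarize(reports):
    """Count reports by (first non-scheduler frame, syscall frame)."""
    counts = Counter()
    for report in reports:
        waits = [f for f in report["frames"] if f not in SCHED_FRAMES]
        wait_fn = waits[0] if waits else None
        syscall = next(
            (f for f in report["frames"] if f.lower().startswith("sys_")), None
        )
        counts[(wait_fn, syscall)] += 1
    return counts
```

Applied to the log above, every apache2 report ends up in rwsem_down_write_failed, but down_write is reached directly from SyS_mprotect in one case and from split_huge_page_to_list in the others, so the threads are not necessarily all waiting on the same semaphore.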

    Reading the kernel code, the direct cause is that proc_pid_cmdline_read in fs/proc/base.c goes to sleep at the following code because it fails to acquire the semaphore:

    down_read(&mm->mmap_sem);
    arg_start = mm->arg_start;
    arg_end = mm->arg_end;
    env_start = mm->env_start;
    env_end = mm->env_end;
    up_read(&mm->mmap_sem);

    void __sched down_read(struct rw_semaphore *sem)
    {
        might_sleep();
        rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);

        LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
    }

    /*
     * lock for reading
     */
    static inline void __down_read(struct rw_semaphore *sem)
    {
        if (unlikely(atomic_long_inc_return_acquire((atomic_long_t *)&sem->count) <= 0))
            rwsem_down_read_failed(sem);
    }
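The `<= 0` test in __down_read is easier to follow with concrete numbers. The Python sketch below is a simplified model of the reader fast path only. The bias constants are an assumption: they are the values I believe the generic 64-bit rwsem implementation of that kernel generation uses, and they are not quoted in the original post. The function name reader_fast_path is made up for this example.

```python
# Assumed constants of the generic 64-bit rwsem implementation (not from the post).
RWSEM_ACTIVE_MASK = 0xFFFFFFFF
RWSEM_ACTIVE_BIAS = 1
RWSEM_WAITING_BIAS = -RWSEM_ACTIVE_MASK - 1
RWSEM_ACTIVE_WRITE_BIAS = RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS


def reader_fast_path(count):
    """Model __down_read: increment the count; the lock is taken on the
    fast path only if the result is positive. Returns (new_count, acquired)."""
    new_count = count + RWSEM_ACTIVE_BIAS
    return new_count, new_count > 0


# Unlocked semaphore: the reader gets the lock immediately.
print(reader_fast_path(0))
# One reader already active: a second reader also succeeds.
print(reader_fast_path(1))
# A writer holds the lock: the reader falls into rwsem_down_read_failed.
print(reader_fast_path(RWSEM_ACTIVE_WRITE_BIAS))
# A reader holds the lock but a waiter is queued: a new reader still blocks.
print(reader_fast_path(RWSEM_WAITING_BIAS + 1))
```

The last case is the relevant one under this model: a new reader such as ps is sent to the slow path as soon as anyone is queued on the semaphore, even if the current holder is only a reader. That would be consistent with ps blocking behind a queued writer, for example the SyS_mprotect caller in the log above, which takes mmap_sem for writing.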

    So which process is holding this semaphore without releasing it?

    How can this be checked? ------> The first step is to capture the kernel stacks.
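Two common ways to capture kernel stacks are sketched below in Python, as an illustration rather than the method used in the original post. Writing w to /proc/sysrq-trigger asks the kernel to log the stacks of all blocked (D-state) tasks to dmesg, and /proc/<tid>/stack shows the kernel stack of a single task. Both normally require root, and the stack file exists only if the kernel was built with stack trace support. The function names and the proc_root/trigger parameters are hypothetical helpers for this example. Whether reading the stack file can itself block during such a hang is not something the post establishes.

```python
import os


def read_kernel_stack(tid, proc_root="/proc"):
    """Return the kernel stack text of one task, or None if it is unavailable
    (task gone, no permission, or kernel built without stack trace support)."""
    try:
        with open(os.path.join(proc_root, str(tid), "stack")) as f:
            return f.read()
    except OSError:
        return None


def dump_blocked_tasks(trigger="/proc/sysrq-trigger"):
    """Ask the kernel to print the stacks of all blocked tasks to the kernel log."""
    with open(trigger, "w") as f:
        f.write("w")
```

After dump_blocked_tasks() the output appears in dmesg in the same format as the hung-task reports above, but for every D-state task at once, which makes it easier to spot a task that holds the semaphore while waiting on something else.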

    A Google search also shows that the kernel already has a patch that changes this code; see the kernel patch.

  • Original article: https://www.cnblogs.com/codestack/p/15155899.html