  • Troubleshooting Approach for Linux System Anomalies


    16 Troubleshooting approach for system anomalies
    16.1 View user information
    16.1.1 View currently logged-in users

    # w

     04:39:39 up  1:30,  1 user,  load average: 0.01, 0.01, 0.00

    USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT

    root     pts/0    192.168.215.1    04:27    0.00s  0.16s  0.02s w
    16.1.2 View recently logged-in users

    # last

    ***************

    root     pts/2        hadoop2          Sun Oct 16 15:52 - 15:52  (00:00)    

    root     pts/1        192.168.215.1    Sun Oct 16 15:39 - down   (00:23)    

    hadoop  pts/0        :0.0             Sun Oct 16 00:33 - down   (15:30)    

    hadoop  tty1         :0               Sun Oct 16 00:31 - down   (15:31)    

    reboot   system boot  2.6.32-573.el6.x Sun Oct 16 08:16 - 16:03  (07:47)
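
    On a busy host the raw list is long, so a per-user tally can make unusual accounts stand out. The function below is a minimal sketch, not part of the original procedure: count_logins is a hypothetical helper name, and it assumes the user name is the first whitespace-separated field of each line.

```shell
# count_logins (hypothetical helper): read `last`-style lines on stdin
# and print "<count> <user>", most frequent first.
count_logins() {
    awk '
        NF == 0        { next }   # skip blank lines
        $1 == "reboot" { next }   # skip the pseudo-user recorded at boot
        $1 == "wtmp"   { next }   # skip the trailing "wtmp begins ..." line
        { count[$1]++ }
        END { for (u in count) print count[u], u }
    ' | sort -k1,1nr -k2,2
}
```

    Typical use would be: last | count_logins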
    16.2 View previously executed commands

    # history

    ***************

      683  last

      684  clear

      685  last

      686  clear

      687  history
    16.3 View currently running processes

    # pstree -a

    init

      ├─NetworkManager --pid-file=/var/run/NetworkManager/NetworkManager.pid

      ├─abrtd

      ├─acpid

      ├─atd

      ├─auditd

      │   └─{auditd}

      ├─bonobo-activati --ac-activate --ior-output-fd=12

    *******************

    # ps  aux

    USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

    root          1  0.0  0.0  19352  1544 ?        Ss   03:09   0:02 /sbin/init

    root          2  0.0  0.0      0     0 ?        S    03:09   0:00 [kthreadd]

    root          3  0.0  0.0      0     0 ?        S    03:09   0:00 [migration/0]

    root          4  0.0  0.0      0     0 ?        S    03:09   0:00 [ksoftirqd/0]

    root          5  0.0  0.0      0     0 ?        S    03:09   0:00 [stopper/0]
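
    When the process list is long, it can help to look only at the heaviest memory consumers. The following is a sketch under the assumption that the input is in ps aux layout, where %MEM is the fourth column; top_mem is a hypothetical helper name.

```shell
# top_mem (hypothetical helper): read `ps aux` output on stdin and print
# the N rows with the highest %MEM (column 4). N defaults to 5.
top_mem() {
    awk 'NR > 1' |            # drop the header row
        sort -k4,4nr |        # sort by %MEM, largest first
        head -n "${1:-5}"
}
```

    Typical use would be: ps aux | top_mem 10. On systems with procps, ps aux --sort=-%mem should give a similar ordering directly.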
    16.4 View network service processes
    16.4.1 View listening TCP ports

    # netstat -ntl

    Active Internet connections (only servers)

    Proto Recv-Q Send-Q Local Address               Foreign Address             State      

    tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      

    tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      

    tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      

    tcp        0      0 127.0.0.1:6010              0.0.0.0:*                   LISTEN      

    tcp        0      0 :::2181                     :::*                        LISTEN      

    tcp        0      0 :::37129                    :::*                        LISTEN      
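
    To compare the listening ports against an expected baseline, the port numbers can be pulled out of this output. This is a rough sketch that assumes the net-tools column layout shown above (local address in column 4, state in column 6); listening_ports is a hypothetical helper name.

```shell
# listening_ports (hypothetical helper): read `netstat -ntl` output on
# stdin and print the distinct listening TCP port numbers in ascending order.
listening_ports() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
        n = split($4, part, ":")   # the port is the last colon-separated piece,
        print part[n]              # which also works for IPv6 forms like :::2181
    }' | sort -n -u
}
```

    Typical use would be: netstat -ntl | listening_ports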
    16.4.2 View listening UDP ports

    # netstat  -nulp

    Active Internet connections (only servers)

    Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   

    udp        0      0 0.0.0.0:631                 0.0.0.0:*                               2089/cupsd
    16.4.3 View listening UNIX domain sockets

    #  netstat -nxlp

    Active UNIX domain sockets (only servers)

    Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path

    unix  2      [ ACC ]     STREAM     LISTENING     13954  2136/hald           @/var/run/hald/dbus-WAkpL6y5o7

    unix  2      [ ACC ]     STREAM     LISTENING     16245  2614/gnome-session  @/tmp/.ICE-unix/2614

    unix  2      [ ACC ]     STREAM     LISTENING     15966  2524/Xorg           @/tmp/.X11-unix/X0

    unix  2      [ ACC ]     STREAM     LISTENING     13947  2136/hald           @/var/run/hald/dbus-QUMwKtSaJ5

    unix  2      [ ACC ]     STREAM     LISTENING     13818  2089/cupsd          /var/run/cups/cups.sock

    *********************
    16.5 View CPU and memory
    16.5.1 View free memory and swap usage

    # free -m

                 total       used       free     shared    buffers     cached

    Mem:          1862        475       1386          1         27        202

    -/+ buffers/cache:        245       1616

    Swap:         2047          0       2047

    # free -g

                  total        used        free      shared  buff/cache   available

    Mem:             15           7           1           0           6           6

    Swap:             1           0           1
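
    The two layouts of free output above account for cache differently: the older one reports application usage on the "-/+ buffers/cache" line, while the newer one has an "available" column. The sketch below estimates the percentage of memory really in use for either layout. mem_used_pct is a hypothetical helper name, and the function assumes English column headers (for example by running free under LC_ALL=C).

```shell
# mem_used_pct (hypothetical helper): read `free` output on stdin and print
# the share of RAM in use, as a whole percentage, excluding reclaimable cache.
mem_used_pct() {
    awk '
        NR == 1 && /available/ { modern = 1 }   # newer layout has "available"
        $1 == "Mem:" {
            total = $2
            used = modern ? $2 - $7 : $3        # newer: total - available
        }
        $1 == "-/+" { used = $3 }               # older: "-/+ buffers/cache:" row
        END { if (total > 0) printf "%d\n", used * 100 / total }
    '
}
```

    Typical use would be: LC_ALL=C free -m | mem_used_pct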
    16.6 View uptime and load details

    # uptime

    04:59:59 up  1:50,  1 user,  load average: 0.00, 0.00, 0.00

    Current time: 04:59:59

    Time the system has been running: 1:50

    Users currently logged in: 1 user

    Load average: 0.00, 0.00, 0.00 (system load over the last 1, 5, and 15 minutes)
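
    A load average is easier to judge relative to the number of CPUs: a load of 2.00 is heavy on one CPU but light on eight. The following is a small sketch of that normalization; load_per_cpu is a hypothetical helper name, and the usage line assumes /proc/loadavg and nproc are available.

```shell
# load_per_cpu (hypothetical helper): divide a load average by the CPU count.
# Arguments: $1 = load average, $2 = number of CPUs.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus < 1) exit 1          # refuse a zero or negative CPU count
        printf "%.2f\n", load / cpus
    }'
}
```

    Typical use would be: load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". Values that stay well above 1.00 suggest more runnable work than the CPUs can serve.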
    16.7 Dynamically view memory, CPU, and other runtime information

    # top

    top - 12:26:46 up 16:21,  1 user,  load average: 0.00, 0.00, 0.00

    Tasks:  82 total,   1 running,  81 sleeping,   0 stopped,   0 zombie

    Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.7%id,  0.1%wa,  0.0%hi,  0.1%si,  0.0%st

    Mem:   1895288k total,   665188k used,  1230100k free,    20628k buffers

    Swap:  2097144k total,        0k used,  2097144k free,    80392k cached

       PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                         

      2269 root      20   0 15056 1080  832 R  2.0  0.1   0:00.01 top                                                                                                                              

         1 root      20   0 19356 1536 1228 S  0.0  0.1   0:01.81 init                                                                                                                             

         2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd                                                                                                                         

         3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0                                                                                                                      

         4 root      20   0     0    0    0 S  0.0  0.0   0:01.13 ksoftirqd/0                                                                                                                      

         5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0                                                                                                                      

         6 root      RT   0     0    0    0 S  0.0  0.0   0:00.14 watchdog/0                                                                                                                       

         7 root      20   0     0    0    0 S  0.0  0.0   0:41.30 events/0                                                                                                                         

         8 root      20   0     0    0    0 S  0.0  0.0   0:00.00 cgroup                                                                                                                           

         9 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper

    ***********************
    16.8 Hardware information
    16.8.1 List all PCI buses in the system and the devices attached to them

    # lspci

    00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)

    00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01)

    00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)

    00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)

    00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
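    16.8.2 View hardware information

    ethtool eth0 reports link settings for a network interface; the DMI table listing below comes from dmidecode.

    # dmidecode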

    *******************

    Handle 0x0229, DMI type 33, 31 bytes

    64-bit Memory Error Information

    Type: OK

    Granularity: Unknown

    Operation: Unknown

    Vendor Syndrome: Unknown

    Memory Array Address: Unknown

    Device Address: Unknown

    Resolution: Unknown

    Handle 0x022A, DMI type 126, 4 bytes

    Inactive

    Handle 0x022B, DMI type 127, 4 bytes

    End Of Table
    16.9 I/O performance
    16.9.1 View disk I/O statistics

    # iostat

    Linux 2.6.32-573.el6.x86_64 (hadoop1) 10/21/2016 _x86_64_(1 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle

               0.17    0.00    0.56    2.15    0.00   97.11

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

    sda               1.49        75.27        10.68     645224      91568
    16.9.2 Dynamically view server status values

    # vmstat 2 10

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----

     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

     0  0      0 1322196  30688 298892    0    0    37     5   39   57  0  1 97  2  0

     0  0      0 1322140  30688 298920    0    0     0     0   57   84  1  1 99  0  0

    *********************
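
    The wa column (time spent waiting for I/O) is usually the first thing to check in this output. As a rough sketch, the samples can be averaged as follows. avg_iowait is a hypothetical helper name; it finds the column by its header rather than by position, and it includes the first row, which vmstat reports as an average since boot.

```shell
# avg_iowait (hypothetical helper): read `vmstat` output on stdin and print
# the mean of the "wa" column across all sample rows, to one decimal place.
avg_iowait() {
    awk '
        $1 == "r" {                               # column-name header row
            for (i = 1; i <= NF; i++) if ($i == "wa") col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ { sum += $col; n++ }   # numeric sample rows
        END { if (n) printf "%.1f\n", sum / n }
    '
}
```

    Typical use would be: vmstat 2 10 | avg_iowait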
    16.9.3 Monitor the system in real time

    # mpstat 2 10

    Linux 2.6.32-573.el6.x86_64 (hadoop1) 10/21/2016 _x86_64_(1 CPU)

    05:37:26 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle

    05:37:28 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

    05:37:30 AM  all    0.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00   99.50

    05:37:32 AM  all    0.00    0.00    0.00    0.00    0.00    0.50    0.00    0.00   99.50

    *********************
    16.9.4 Dynamically display the processes currently performing I/O

    # yum -y install dstat

    # dstat --top-io --top-bio

    ----most-expensive---- ----most-expensive----

         i/o process      |  block i/o process   

    bash         53k  316B|init         19k  198B

    sshd: root@ 301B  340B|tpvmlpd2      0  4096B

    sshd: root@ 136B  180B|jbd2/sda2-8   0    56k
    16.10 File systems and attached disks
    16.10.1 View currently mounted devices

    # mount

    /dev/sda2 on / type ext4 (rw)

    proc on /proc type proc (rw)

    sysfs on /sys type sysfs (rw)

    devpts on /dev/pts type devpts (rw,gid=5,mode=620)

    tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")

    /dev/sda1 on /boot type ext4 (rw)

    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

    vmware-vmblock on /var/run/vmblock-fuse type fuse.vmware-vmblock (rw,nosuid,nodev,default_permissions,allow_other)
    16.10.2 Check whether there are dedicated file systems

    View the following file

    # cat /etc/fstab

    #

    # /etc/fstab

    # Created by anaconda on Sun Oct 16 07:55:57 2016

    #

    # Accessible filesystems, by reference, are maintained under '/dev/disk'

    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info

    #

    UUID=b89c0aae-3284-4835-9b1b-04986146cd96 /                       ext4    defaults        1 1

    UUID=a1313d92-6873-402d-95a6-add6cd1321c6 /boot                   ext4    defaults        1 2

    UUID=6a5cde98-2fc5-4d8f-976c-92acb39ab2a9 swap                    swap    defaults        0 0

    tmpfs                   /dev/shm                tmpfs   defaults        0 0

    devpts                  /dev/pts                devpts  gid=5,mode=620  0 0

    sysfs                   /sys                    sysfs   defaults        0 0

    proc                    /proc                   proc    defaults        0 0
    16.10.3 View LVM volume group information

    # vgs
    16.10.4 View physical volume information

    # pvs
    16.11 View remaining disk space

    # df -h

    Filesystem      Size  Used Avail Use% Mounted on

    /dev/sda2        18G  6.2G   11G  38% /

    tmpfs           932M   72K  932M   1% /dev/shm

    /dev/sda1       283M   41M  228M  16% /boot
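
    During an incident the usual question is simply which file systems are close to full. The sketch below filters df output by a usage threshold. fs_over is a hypothetical helper name, and the function assumes the six-column layout shown above with no spaces in mount points (df -P keeps each entry on one line).

```shell
# fs_over (hypothetical helper): read `df` output on stdin and print
# "<mount point> <use%>" for every file system at or above the limit.
# Argument: $1 = usage limit in percent.
fs_over() {
    awk -v limit="$1" 'NR > 1 && NF >= 6 {
        pct = $5
        sub(/%$/, "", pct)                 # strip the trailing percent sign
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}
```

    Typical use would be: df -P | fs_over 80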
    16.12 List files currently open on the system

    # lsof +D /    # caution: recursively scans the whole file system, which can overload a busy box

    ***************

    lsof      3907      root  mem    REG                8,2     22536     265965 /lib64/libdl-2.12.so

    lsof      3907      root  mem    REG                8,2   1926480     265960 /lib64/libc-2.12.so

    lsof      3907      root  mem    REG                8,2    124624     265966 /lib64/libselinux.so.1

    lsof      3907      root  mem    REG                8,2  99158576     394281 /usr/lib/locale/locale-archive
    16.12 Kernel and network
    16.12.1 Display the kernel parameters under /proc/sys

    # sysctl -a

    **************

    net.ipv6.nf_conntrack_frag6_high_thresh = 4194304

    net.ipv6.ip6frag_secret_interval = 600

    net.ipv6.mld_max_msf = 64

    net.nf_conntrack_max = 65536

    net.unix.max_dgram_qlen = 10

    abi.vsyscall32 = 1

    crypto.fips_enabled = 0
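    16.12.2 Display per-device interrupt details

    Columns: IRQ number, number of interrupts handled on each CPU, programmable interrupt controller, device name (the dev_name argument of request_irq)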

    # cat /proc/interrupts

                CPU0       

       0:        261   IO-APIC-edge      timer

       1:          8   IO-APIC-edge      i8042

       4:       4838   IO-APIC-edge    

       8:          1   IO-APIC-edge      rtc0

       9:          0   IO-APIC-fasteoi   acpi
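
    On multi-CPU machines each IRQ has one counter per CPU, so a per-IRQ total makes the busiest interrupt sources easier to spot. This is a minimal sketch with a hypothetical helper name, irq_totals. It assumes the first line is the CPU header and that the per-CPU counters follow the IRQ label directly.

```shell
# irq_totals (hypothetical helper): read /proc/interrupts-style text on stdin
# and print "<irq> <total across CPUs>", largest total first.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }       # header row: CPU0 CPU1 ...
        NF < 2  { next }                  # skip blank or malformed rows
        {
            irq = $1
            sub(/:$/, "", irq)            # "0:" -> "0"
            total = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            print irq, total
        }
    ' | sort -k2,2nr
}
```

    Typical use would be: irq_totals < /proc/interrupts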

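    View the connection tracking table

    # cat /proc/net/ip_conntrack    # may take some time on busy servers

    Write the remark as a shell comment (#). A C-style /* ... */ remark is glob-expanded by the shell, so cat is also handed directory names and prints errors such as:

    **************

    cat: sys/: Is a directory

    cat: tmp/: Is a directory

    cat: usr/: Is a directory

    cat: var/: Is a directory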
    16.13 View network socket connections

    # netstat

    ************

    unix  3      [ ]         STREAM     CONNECTED     13648  

    unix  3      [ ]         STREAM     CONNECTED     13647  

    unix  3      [ ]         DGRAM                    10073  

    unix  3      [ ]         DGRAM                    10072  
    16.14 Get socket statistics

    # ss -s

    Total: 602 (kernel 610)

    TCP:   15 (estab 4, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 8

    Transport Total     IP        IPv6

    *  610       -         -        

    RAW  0         0         0        

    UDP  1         1         0        

    TCP  15        5         10       

    INET  16        6         10       

    FRAG  0         0         0  
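
    The summary gives totals only. A breakdown of TCP sockets by state (for example a pile-up of TIME-WAIT or SYN-RECV) can be derived from the per-socket listing. The sketch below assumes the layout of ss -tan, where the first column is the state; tcp_states is a hypothetical helper name.

```shell
# tcp_states (hypothetical helper): read `ss -tan` output on stdin and print
# "<count> <state>", most common state first.
tcp_states() {
    awk 'NR > 1 && NF { count[$1]++ }          # skip header and blank lines
         END { for (s in count) print count[s], s }' |
        sort -k1,1nr -k2,2
}
```

    Typical use would be: ss -tan | tcp_states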
    16.15 View log messages and kernel information
    16.15.1 Display the Linux kernel ring buffer

    # dmesg | tail    # or pipe to less / grep / more

    *************

    eth0: no IPv6 routers present

    lp: driver loaded but no devices found

    ppdev: user-space parallel port driver

    hrtimer: interrupt took 2588670 ns
    16.15.2 View the system error log

    # less /var/log/messages

    Oct 16 08:16:22 localhost kernel: imklog 5.8.10, log source = /proc/kmsg started.

    Oct 16 08:16:22 localhost rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1604" x-info="http://www.rsyslog.com"] start

    Oct 16 08:16:22 localhost kernel: Initializing cgroup subsys cpuset

    Oct 16 08:16:22 localhost kernel: Initializing cgroup subsys cpu

    *************
    16.15.3 Security information, system logins, and network connections

    # less /var/log/secure

    Oct 16 08:17:06 localhost sshd[8287]: Server listening on 0.0.0.0 port 22.

    Oct 16 08:17:06 localhost sshd[8287]: Server listening on :: port 22.

    Oct 16 00:22:58 localhost polkitd(authority=local): Registered Authentication Agent for session /org/freedesktop/ConsoleKit/Session1 (system bus name :1.25 [/usr/libexec/polkit-gnome-authentication-agent-1], object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)

    ********************
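
    When looking for break-in attempts, failed SSH logins grouped by source address are often more telling than the raw log. The sketch below assumes sshd's usual "Failed password for ... from <address> port ..." message wording; failed_ssh_by_ip is a hypothetical helper name.

```shell
# failed_ssh_by_ip (hypothetical helper): read auth log lines on stdin and
# print "<count> <source address>" for failed password attempts.
failed_ssh_by_ip() {
    awk '/Failed password/ {
        for (i = 1; i < NF; i++)
            if ($i == "from") { count[$(i + 1)]++; break }   # address follows "from"
    }
    END { for (ip in count) print count[ip], ip }' |
        sort -k1,1nr -k2,2
}
```

    Typical use would be: failed_ssh_by_ip < /var/log/secure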
    16.16 View scheduled tasks
    16.16.1 View how often scheduled tasks run

    # ls /etc/cron*    # then cat the individual files to inspect them

    /etc/cron.daily:

    cups  logrotate  makewhatis.cron  mlocate.cron  prelink  readahead.cron  tmpwatch

    /etc/cron.hourly:

    0anacron

    /etc/cron.monthly:

    readahead-monthly.cron

    /etc/cron.weekly:
    16.16.2 Check whether any user has hidden cron jobs

    # for user in $(cut -f1 -d: /etc/passwd); do crontab -l -u "$user"; done

    no crontab for root

    no crontab for bin

    no crontab for daemon

  • Original article: https://www.cnblogs.com/lcword/p/14361431.html