  • Shell in Practice: Monitoring a Linux Host

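    1. System Monitoring Overview

    The script collects the following metrics: memory usage, CPU usage, currently logged-in users, disk mounts and disk space usage, average inbound and outbound network traffic per second, and disk I/O (the average rate at which data is read from disk into memory and written from memory to disk, per second).

    2. How the Monitoring Works

    2.1 CPU Usage

    How it works:

    CPU statistics are recorded in /proc/stat. The aggregate "cpu" line holds cumulative time counters since boot; in order they are user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice. Sampling the line twice and dividing the increase in busy time by the increase in total time gives the CPU usage over that interval. For details, see this blog post: https://blog.csdn.net/ustclu/article/details/1721673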

    stephen@stephen-K55VD:~/shell$ cat  /proc/stat
    cpu  348229 906 98356 7304276 81726 0 2821 0 0 0
    cpu0 95033 273 22980 1803962 33023 0 1721 0 0 0
    cpu1 79735 255 24756 1836717 17035 0 454 0 0 0
    cpu2 84045 211 25742 1831963 16753 0 582 0 0 0
    cpu3 89415 166 24876 1831633 14913 0 62 0 0 0
    intr 10306028 7 28486 0 0 0 0 0 0 1 825 0 0 50130 0 0 0 76 284421 0 213811 0 0 0 29 795993 19 0 81 766580 15 648 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    ctxt 51268973
    btime 1554444493
    processes 14526
    procs_running 1
    procs_blocked 0
    softirq 9059312 7 2712077 5 5478 204089 0 1245879 2780432 0 2111345

    Implementation:

    # Sample the total and busy CPU time from the aggregate "cpu" line
    cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
    cpuUsedStart=$(awk '/^cpu /{printf "%.0f\n", $2+$3+$4+$7+$8}' /proc/stat)
    # Sample again after 30 s and take the differences
    sleep 30
    cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
    cpuUsedEnd=$(awk '/^cpu /{printf "%.0f\n", $2+$3+$4+$7+$8}' /proc/stat)
    usedCPU=$((cpuUsedEnd - cpuUsedStart))
    totalCPU=$((cpuTotalEnd - cpuTotalStart))
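
    To make the arithmetic concrete, here is a minimal sketch that computes the usage percentage from two snapshots of the aggregate cpu line. The function name cpu_usage_percent and its two-argument interface are assumptions made for illustration and are not part of the original script. Like the code above, it counts user, nice, system, irq and softirq as busy time.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): CPU usage in percent
# between two snapshots of the aggregate "cpu" line from /proc/stat.
# $1 = first snapshot line, $2 = second snapshot line.
cpu_usage_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i   # sum of all time counters
            used = $2 + $3 + $4 + $7 + $8           # user+nice+system+irq+softirq
            t[NR] = total; u[NR] = used
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.00"; exit }     # no time elapsed between samples
            printf "%.2f\n", (u[2] - u[1]) * 100 / dt
        }'
}

# Usage on a Linux host: take two snapshots a few seconds apart.
# first=$(grep '^cpu ' /proc/stat); sleep 5; second=$(grep '^cpu ' /proc/stat)
# cpu_usage_percent "$first" "$second"
```

    Passing the snapshots as arguments keeps the calculation separate from the sampling, so the interval can be chosen by the caller.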

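    2.2 Memory Usage

    How it works:

    Memory statistics are recorded in /proc/meminfo. MemTotal is the total amount of memory in kB and MemFree is the amount of free memory. Memory usage = (total memory - free memory) / total memory. Because MemFree does not count memory held by buffers and the page cache, which the kernel can reclaim, this formula tends to report a high figure.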

    stephen@stephen-K55VD:~/shell$ cat /proc/meminfo
    MemTotal:        3922884 kB
    MemFree:          139108 kB
    MemAvailable:     317700 kB
    Buffers:           31792 kB
    Cached:           538160 kB
    SwapCached:        10012 kB
    Active:          2615652 kB

    Implementation:

    # Get the memory usage
    function memUsage(){
        logInfo "Begin to get mem usage of Host [${ip}]"
        # Total memory (kB)
        totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
        # Free memory (kB)
        freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
        usedMem=$((totalMem - freeMem))
        #echo $(usagePercent ${usedMem} ${totalMem})
        #echo $(kbToGb ${totalMem})
        logInfo "Host [${ip}] total mem is :  $(kbToGb ${totalMem}) GB"
        # Compute the memory usage and write it to the log
        logInfo "Host [${ip}] mem usage is :  $(usagePercent ${usedMem} ${totalMem}) %"
        logInfo "End to get mem usage of Host [${ip}]"
    }
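
    As a possible alternative, the sketch below bases the calculation on MemAvailable, the kernel's own estimate of memory that can be handed to new workloads without swapping. This is not what the original script does. The function name mem_usage_percent and its optional file argument are assumptions for illustration, and the sketch assumes a kernel that reports MemAvailable (3.14 or later).

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): memory usage in percent,
# computed from MemAvailable instead of MemFree.
# $1 = path to a meminfo-style file (defaults to /proc/meminfo).
mem_usage_percent() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total <= 0) { print "0.00"; exit }   # file missing or malformed
            printf "%.2f\n", (total - avail) * 100 / total
        }' "${1:-/proc/meminfo}"
}

# Usage on a Linux host:
# mem_usage_percent
```

    With the sample values shown earlier (MemTotal 3922884 kB, MemAvailable 317700 kB) this yields 91.90, while the MemFree-based formula yields 96.45 for the same snapshot.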

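    2.3 Network Traffic

    How it works:

    On Linux, network traffic counters are recorded in /proc/net/dev. The rate is computed from the number of bytes received and transmitted over a time interval. Column 1 is the interface name, column 2 is the number of bytes received, and column 10 is the number of bytes transmitted.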

    stephen@stephen-K55VD:~/shell/sysMonitor$ cat /proc/net/dev
    Inter-|   Receive                                                |  Transmit
     face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    wlp3s0: 19595253   41163    0    0    0     0          0         0 34741446   49185    0    0    0     0       0          0
    enp4s0f2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
    docker0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        lo:  907275    5032    0    0    0     0          0         0   907275    5032    0    0    0     0       0          0

    Implementation:

    # ethName is the name of the network interface; match it exactly in column 1
    receiveByteStart=$(awk -v dev="${ethName}:" '$1 == dev {print $2}' /proc/net/dev)
    sendByteStart=$(awk -v dev="${ethName}:" '$1 == dev {print $10}' /proc/net/dev)
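
    The full calculation needs a second sample and a division by the interval. The following is a minimal sketch of both steps. The helper names net_bytes and rate_kb_per_s are assumptions for illustration and do not appear in the original script. Splitting each line on the colon also copes with output where the interface name and the first counter are not separated by a space.

```shell
#!/bin/bash
# Hypothetical helpers (not in the original script).

# Print "rx_bytes tx_bytes" for one interface.
# $1 = interface name, $2 = path to a /proc/net/dev-style file (optional).
net_bytes() {
    awk -v dev="$1" '
        {
            sub(/^[ \t]+/, "")          # drop leading padding
            split($0, parts, ":")       # parts[1] = name, parts[2] = counters
            if (parts[1] == dev) {
                split(parts[2], c, " ")
                print c[1], c[9]        # receive bytes, transmit bytes
            }
        }' "${2:-/proc/net/dev}"
}

# Average rate in KB/s between two byte counters.
# $1 = start bytes, $2 = end bytes, $3 = interval in seconds.
rate_kb_per_s() {
    awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (e - s) / t / 1024 }'
}

# Usage on a Linux host (the interface name is only an example):
# read -r rx1 tx1 < <(net_bytes wlp3s0); sleep 10; read -r rx2 tx2 < <(net_bytes wlp3s0)
# rate_kb_per_s "$rx1" "$rx2" 10
```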

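    2.4 Disk I/O

    How it works:

    Disk I/O counters are recorded in /proc/vmstat. pgpgin is the amount of data read in from disk and pgpgout is the amount written out to disk, both cumulative since boot and expressed in KiB. Sample the counter over an interval and divide the difference by the elapsed time to get the rate.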

    Implementation:

    # disk IO in
    function diskIOIn(){
        # Sample the inbound disk I/O counter (KiB)
        inIoStart=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
        sleep 30
        inIoEnd=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
        # KiB over 30 s converted to MB/s; integer division truncates
        inIo=$(((inIoEnd-inIoStart)/(30*1024)))
        logInfo "Host [${ip}] in IO is :  ${inIo} MB / s"
    }
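
    Because shell arithmetic is integer-only, any rate below 1 MB/s is logged as 0. One way around this, sketched here under the assumption that two decimal places are enough, is to do the division in awk. The function name io_mb_per_s is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): convert a pgpgin/pgpgout
# delta (in KiB) into MB/s with two decimals.
# $1 = start counter, $2 = end counter, $3 = interval in seconds.
io_mb_per_s() {
    awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (e - s) / 1024 / t }'
}

io_mb_per_s 1000 16360 30    # 15360 KiB in 30 s, prints 0.50
```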

    3. Script Code

    • hostLists: the list of IP addresses of the hosts to monitor.
    • sysMonitor.sh*: the script that collects each of the metrics.
      #!/bin/bash
      # Monitor system information on Linux hosts
      # Load the helper functions
      source utils

      # Get the CPU usage
      function cpuUsage()
      {
          # Number of physical CPUs
          phyCPUNums=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
          # Number of logical CPUs
          lgCPUNums=$(grep -c "^processor" /proc/cpuinfo)
          # Cores per physical CPU
          cores=$(awk '/^cpu cores/{print $4; exit}' /proc/cpuinfo)
          logInfo "Host [${ip}] physical CPU nums is :  ${phyCPUNums}"
          logInfo "Host [${ip}] logic CPU nums is :  ${lgCPUNums}"
          logInfo "Host [${ip}] core nums is :  ${cores}"
          # CPU usage: sample the total and busy CPU time
          cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
          cpuUsedStart=$(awk '/^cpu /{printf "%.0f\n", $2+$3+$4+$7+$8}' /proc/stat)
          # Sample again after 30 s and take the differences
          sleep 30
          cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{printf "%.0f\n", total}' /proc/stat)
          cpuUsedEnd=$(awk '/^cpu /{printf "%.0f\n", $2+$3+$4+$7+$8}' /proc/stat)
          usedCPU=$((cpuUsedEnd - cpuUsedStart))
          totalCPU=$((cpuTotalEnd - cpuTotalStart))
          logInfo "Host [${ip}] CPU usage is :  $(usagePercent ${usedCPU} ${totalCPU}) %"
      }

      # Get the memory usage
      function memUsage(){
          logInfo "Begin to get mem usage of Host [${ip}]"
          # Total memory (kB)
          totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
          # Free memory (kB)
          freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
          usedMem=$((totalMem - freeMem))
          #echo $(usagePercent ${usedMem} ${totalMem})
          #echo $(kbToGb ${totalMem})
          logInfo "Host [${ip}] total mem is :  $(kbToGb ${totalMem}) GB"
          # Compute the memory usage and write it to the log
          logInfo "Host [${ip}] mem usage is :  $(usagePercent ${usedMem} ${totalMem}) %"
          logInfo "End to get mem usage of Host [${ip}]"
      }

      # Average network traffic per second for one interface
      function netData(){
          logInfo "Begin to get  net data of Host [${ip}]"
          ethName=$1
          receiveByteStart=$(awk -v dev="${ethName}:" '$1 == dev {print $2}' /proc/net/dev)
          sendByteStart=$(awk -v dev="${ethName}:" '$1 == dev {print $10}' /proc/net/dev)
          sleep 10
          receiveByteEnd=$(awk -v dev="${ethName}:" '$1 == dev {print $2}' /proc/net/dev)
          sendByteEnd=$(awk -v dev="${ethName}:" '$1 == dev {print $10}' /proc/net/dev)
          # Bytes over 10 s, converted to KB/s
          inDataRate=$(echo "scale=2;(${receiveByteEnd}-${receiveByteStart})/(10*1024)" | bc)
          outDataRate=$(echo "scale=2;(${sendByteEnd}-${sendByteStart})/(10*1024)" | bc)
          logInfo "Host [${ip}] in data is :  ${inDataRate} KB / s"
          logInfo "Host [${ip}] out data is :  ${outDataRate} KB / s"
          logInfo "End to get  net data of Host [${ip}]"
      }

      # Disk space usage
      function diskUsage(){
          logInfo "Begin to get disk usage of Host [${ip}]"
          noTimeLogInfo "$(df -h)"
          logInfo "End to get disk usage of Host [${ip}]"
      }

      # disk IO in
      function diskIOIn(){
          # Sample the inbound disk I/O counter (KiB)
          inIoStart=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
          sleep 30
          inIoEnd=$(awk '/^pgpgin /{print $2}' /proc/vmstat)
          # KiB over 30 s converted to MB/s; integer division truncates
          inIo=$(((inIoEnd-inIoStart)/(30*1024)))
          logInfo "Host [${ip}] in IO is :  ${inIo} MB / s"
      }

      # disk IO out
      function diskIOout(){
          # Sample the outbound disk I/O counter (KiB)
          outIoStart=$(awk '/^pgpgout /{print $2}' /proc/vmstat)
          sleep 60
          outIoEnd=$(awk '/^pgpgout /{print $2}' /proc/vmstat)
          outIo=$(((outIoEnd-outIoStart)/(60*1024)))
          logInfo "Host [${ip}] out IO is :  ${outIo} MB / s"
      }

      # Currently logged-in users
      function onlineUser(){
          # Skip the uptime line and the column header printed by w
          user=$(w | awk 'NR>2 {print $1 "\t\t" $4}')
          userCount=$(w | awk 'NR>2' | wc -l)
          #loginAt=`w |awk  'NR>2'|awk '{print $4 }'`
          logInfo "There are [${userCount}] users online now."
          noTimeLogInfo "UserName        loginAt"
          noTimeLogInfo "${user}"
      }

      # Check whether each host is reachable, then collect the metrics.
      # Note: the collectors read the local /proc files, so the figures
      # describe the machine on which this script runs.
      function isAlive(){
          for ip in $(cat hostLists)
          do
              if ping -c 3 "${ip}" >/dev/null 2>&1; then
                  logInfo "${ip} is reachable"
                  # Logged-in users
                  onlineUser
                  # CPU information
                  cpuUsage
                  # Memory information
                  memUsage
                  # Disk I/O
                  diskIOIn
                  diskIOout
                  # Disk space usage
                  diskUsage
                  # Average inbound and outbound traffic per second
                  netData wlp3s0
              else
                  logInfo "ERROR ${ip} is unreachable,try login in see more details.."
              fi
          done
      }

      while true
      do
          isAlive
          sleep 60
      done
    • utils: helper functions for logging and unit conversion.
      #!/bin/bash
      # Logging helpers
      curr_path=$(pwd)
      log_file=${curr_path}/system_status.log

      function logInfo()
      {
          local curr_time
          curr_time=$(date "+%Y-%m-%d %H:%M:%S")
          # Check whether the log file exists
          if [ -e "${log_file}" ]
          then
              # If the file is not writable, grant permission with chmod
              if [ ! -w "${log_file}" ]
              then
                  chmod 770 "${log_file}"
              fi
          else
              # Create the log file if it does not exist
              touch "${log_file}"
          fi
          # Write the log entry
          local info=$1
          echo "${curr_time}  $(whoami) [Info] ${info}" >> "${log_file}"
      }

      # Write a message without the timestamp prefix
      function noTimeLogInfo(){
          msg=$1
          echo "${msg}" >> "${log_file}"
      }

      # Convert kB to GB with 3 decimal places (expr only handles integers, so bc is used)
      function kbToGb(){
          kbVal=$1
          gbVal=$(echo "scale=3;${kbVal}/1024/1024" | bc)
          echo "${gbVal}"
      }

      # Usage expressed as a percentage
      # $1 is the amount used, $2 is the total
      function usagePercent(){
          used=$1
          total=$2
          usedPercent=$(echo "scale=2;${used}*100/${total}" | bc)
          echo "${usedPercent}"
      }
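
    Two quirks of the bc-based usagePercent are worth knowing: bc omits the leading zero for results below 1 (it prints .25 rather than 0.25), and it reports an error when the total is 0. The sketch below shows one possible variant that uses awk instead. The name usage_percent_safe is hypothetical, and returning 0.00 for a zero total is an assumption about the desired behaviour.

```shell
#!/bin/bash
# Hypothetical variant of usagePercent (not in the original utils file).
# $1 = amount used, $2 = total amount.
usage_percent_safe() {
    awk -v used="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "0.00"; exit }   # avoid dividing by zero
        printf "%.2f\n", used * 100 / total
    }'
}

usage_percent_safe 1 400    # prints 0.25
usage_percent_safe 5 0      # prints 0.00
```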

    Script layout:

    -rw-r--r-- 1 stephen stephen   30 4月   5 18:33 hostLists
    -rwxrwxr-x 1 stephen stephen 4164 4月   5 18:50 sysMonitor.sh*
    -rw-r--r-- 1 stephen stephen  951 4月   5 15:23 utils

    4. Results

    The monitoring data is written to the log file system_status.log. A sample run produces the following:

    2019-04-05 19:44:42  stephen [Info] 192.168.1.109 is reachable
    2019-04-05 19:44:42  stephen [Info] There are [2] users online now.
    UserName        loginAt
    USER        LOGIN@
    stephen        14:09
    2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] physical CPU nums is :  1
    2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] logic CPU nums is :  4
    2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] core nums is :  2
    2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] CPU usage is :  10.12 %
    2019-04-05 19:45:12  stephen [Info] Begin to get mem usage of Host [192.168.1.109]
    2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] total mem is :  3.741 GB
    2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] mem usage is :  95.83 %
    2019-04-05 19:45:12  stephen [Info] End to get mem usage of Host [192.168.1.109]
    2019-04-05 19:45:42  stephen [Info] Host [192.168.1.109] in IO is :  0 MB / s
    2019-04-05 19:46:42  stephen [Info] Host [192.168.1.109] out IO is :  0 MB / s
    2019-04-05 19:46:42  stephen [Info] Begin to get disk usage of Host [192.168.1.109]
    文件系统        容量  已用  可用 已用% 挂载点
    udev            1.9G     0  1.9G    0% /dev
    tmpfs           384M  2.0M  382M    1% /run
    /dev/sda10       42G   20G   20G   51% /
    tmpfs           1.9G   20M  1.9G    2% /dev/shm
    tmpfs           5.0M  4.0K  5.0M    1% /run/lock
    tmpfs           1.9G     0  1.9G    0% /sys/fs/cgroup
    /dev/loop0      3.8M  3.8M     0  100% /snap/notepad-plus-plus/202
    /dev/loop2       54M   54M     0  100% /snap/core18/782
    /dev/loop4      441M  441M     0  100% /snap/wine-platform/111
    /dev/loop5      441M  441M     0  100% /snap/wine-platform/105
    /dev/loop7      3.8M  3.8M     0  100% /snap/notepad-plus-plus/199
    /dev/loop3       90M   90M     0  100% /snap/core/6673
    /dev/loop1      274M  274M     0  100% /snap/wps-office-multilang/1
    /dev/loop6       91M   91M     0  100% /snap/core/6405
    /dev/loop8       92M   92M     0  100% /snap/core/6531
    /dev/loop9       36M   36M     0  100% /snap/gtk-common-themes/1198
    /dev/loop10     3.8M  3.8M     0  100% /snap/notepad-plus-plus/195
    /dev/loop11     441M  441M     0  100% /snap/wine-platform/103
    tmpfs           384M   16K  384M    1% /run/user/125
    tmpfs           384M   52K  384M    1% /run/user/1000
    2019-04-05 19:46:42  stephen [Info] End to get disk usage of Host [192.168.1.109]
    2019-04-05 19:46:42  stephen [Info] Begin to get  net data of Host [192.168.1.109]
    2019-04-05 19:46:52  stephen [Info] Host [192.168.1.109] in data is :  42.90 kb / s
    2019-04-05 19:46:52  stephen [Info] Host [192.168.1.109] out data is :  7.00 kb / s
    2019-04-05 19:46:52  stephen [Info] End to get  net data of Host [192.168.1.109]
    2019-04-05 19:47:04  stephen [Info] ERROR 255.255.255.254 is unreachable,try login in see more details..

  • Original article: https://www.cnblogs.com/webDepOfQWS/p/10659653.html