  • Oracle RAC: crs_stat run by the grid user takes ~30 seconds — troubleshooting

    Symptoms:

    Environment: two-node Oracle RAC 11.2.0.4 on physical servers.

    Symptom 1:

    On this Oracle RAC 11.2.0.4 cluster, some commands run as the grid user return abnormally slowly, taking nearly 30 seconds:

    [grid@rac1 ~]$ time crs_stat -t -v
    Name           Type           R/RA   F/FT   Target    State     Host        
    ----------------------------------------------------------------------
    ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1        
    ora.OCR.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1        
    ora.*.db     ora....se.type 0/2    0/1    OFFLINE   OFFLINE               
    ora.*.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
    ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1        
    ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE               
    ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1        
    ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1        
    ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
    ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
    ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
    ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
    ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1        
    ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
    ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
    ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
    ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
    ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2        
    ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1        
    ora.*.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
    
    real	0m27.927s
    user	0m17.025s
    sys	0m10.712s
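
    Nearly 28 seconds of wall-clock time for a simple status listing. As an aside, crs_stat is deprecated in 11.2; the supported equivalent is shown below, and since every CRS client utility writes into the same client log directory (the root cause found later), it would presumably be just as slow here — an assumption, not verified:

    [grid@rac1 ~]$ time crsctl stat res -t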

    Symptom 2: Zabbix raises alerts, occasionally reporting agent.ping timeouts, and the Zabbix server log fills with errors:

    14975:20210730:051954.737 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2-" failed: first network error, wait for 15 seconds
     14980:20210730:052049.401 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14978:20210730:052250.841 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
     14979:20210730:052300.585 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:052312.489 resuming Zabbix agent checks on host "ORAC-NODE1": connection restored
     14980:20210730:052414.317 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14976:20210730:052557.756 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:052647.517 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14976:20210730:052750.795 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
     14977:20210730:052800.474 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:052812.589 resuming Zabbix agent checks on host "ORAC-NODE1": connection restored
     14980:20210730:052911.698 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14978:20210730:052918.528 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
     14979:20210730:052956.146 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:053014.555 resuming Zabbix agent checks on host "ORAC-NODE1": connection restored
     14980:20210730:053015.559 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14977:20210730:053156.451 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:053246.504 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14978:20210730:053348.958 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
     14980:20210730:053412.577 resuming Zabbix agent checks on host "RAC-NODE1": connection restored
     14975:20210730:053557.316 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2-" failed: first network error, wait for 15 seconds
     14980:20210730:053649.374 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14962:20210730:054521.587 item "ORAC-NODE1:oracle.status_offline.process" became not supported: Timeout while executing a shell script.
     14975:20210730:055156.431 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:055248.282 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14963:20210730:055347.878 item "ORAC-NODE1:oracle.status_offline.process" became supported
     14975:20210730:055358.595 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
     14980:20210730:055446.042 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
     14965:20210730:055518.943 item "ORAC-NODE1:oracle.status_online.process" became not supported: Timeout while executing a shell script.
     14964:20210730:055519.942 item "ORAC-NODE1:oracle.status_offline.process" became not supported: Timeout while executing a shell script.

     

     Sometimes the monitoring graphs also show gaps.

    Analysis:

    1. Review the Oracle alert logs

    [grid@rac2 trace]$ pwd

    /u01/app/grid/diag/asm/+asm/+ASM2/trace

    [grid@rac2 trace]$ tail -n 100 alert_+ASM2.log

    [oracle@rac2 trace]$ pwd

    /u01/app/oracle/diag/rdbms/*/*2/trace

    [oracle@rac2 trace]$ tail -n 100 alert_*.log

    No relevant errors were found in either alert log.
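
    A quicker sweep than eyeballing the tails is to grep both alert logs for error prefixes (a minimal sketch; the wildcarded paths follow the masked ones above):

    # scan the ASM and database alert logs for ORA-/TNS- errors
    grep -E 'ORA-|TNS-' /u01/app/grid/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log | tail -n 20
    grep -E 'ORA-|TNS-' /u01/app/oracle/diag/rdbms/*/*2/trace/alert_*.log | tail -n 20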

    2. Check server performance

    [root@rac2 ~]# iostat -x 1
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               5.76    0.00    0.53    1.22    0.00   92.49
    
    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    sda               0.00    19.00    0.00    6.00     0.00   200.00    33.33     0.00    0.00    0.00    0.00   0.00   0.00
    up-0              0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-4            237.00     0.00  562.00    1.00 203376.00     5.00   361.25     0.31    0.55    0.55    1.00   0.31  17.70
    up-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    sdb               0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.00    0.00    0.00   0.00   0.00
    sdc               0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
    sdd               0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
    sde               0.00     0.00 1035.00    3.00 385232.00    39.00   371.17     0.55    0.58    0.57    1.67   0.31  32.40
    sdf               0.00     0.00 1082.00    3.00 396736.00    39.00   365.69     0.59    0.60    0.60    1.00   0.32  34.30
    sdg               0.00     0.00 1049.00    1.00 384224.00    32.00   365.96     0.55    0.56    0.56    1.00   0.32  33.30
    sdh               0.00     0.00    0.00    3.00     0.00    39.00    13.00     0.00    2.00    0.00    2.00   0.33   0.10
    sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-12           233.00     0.00  490.00    3.00 184336.00    39.00   373.99     0.27    0.56    0.55    1.67   0.32  16.00
    up-14           237.00     0.00  563.00    0.00 203744.00     0.00   361.89     0.31    0.55    0.55    0.00   0.33  18.70
    up-16             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-19             0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
    up-21           244.00     0.00  545.00    0.00 200896.00     0.00   368.62     0.33    0.60    0.60    0.00   0.34  18.40
    up-23           225.00     0.00  486.00    1.00 180480.00    32.00   370.66     0.29    0.59    0.59    1.00   0.35  17.20
    up-25             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-27             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    up-29             0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
    up-31           242.00     0.00  520.00    2.00 193360.00    34.00   370.49     0.34    0.65    0.65    1.00   0.37  19.30
    up-33             0.00     0.00    0.00    3.00     0.00    39.00    13.00     0.01    2.00    0.00    2.00   2.00   0.60
    up-35             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    dm-0              0.00     0.00    0.00   25.00     0.00   200.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
    dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    [root@rac2 ~]# pidstat -d 1
    Linux 2.6.32-642.el6.x86_64 (rac2)     07/30/2021     _x86_64_    (32 CPU)
    
    08:45:30 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
    08:45:31 PM      7829 137292.31      0.00      0.00  oracle
    08:45:31 PM     11890      2.88      0.00      0.00  ocssd.bin
    08:45:31 PM     12525     30.77      0.00      0.00  oracle
    08:45:31 PM     16723     46.15      0.00      0.00  oracle
    
    08:45:31 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
    08:45:32 PM      7829 122400.00      0.00      0.00  oracle
    08:45:32 PM     11890    836.00      4.00      0.00  ocssd.bin
    08:45:32 PM     12089      0.00     32.00      0.00  ologgerd
    08:45:32 PM     12385      0.00      4.00      0.00  orarootagent.bi
    08:45:32 PM     12525     48.00      0.00      0.00  oracle
    08:45:32 PM     16723     32.00      0.00      0.00  oracle
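
    pidstat shows PID 7829 reading well over 100 MB/s, but nothing obviously pathological for a busy RAC node. To see what that heavy reader actually is (a sketch; 7829 is the PID from the capture above and will differ per run):

    # map the heavy-I/O PID to its owner, age, and full command line
    ps -o pid,user,etime,args -p 7829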

    3. Generate and review an Oracle AWR report

    [oracle@rac1 admin]$ ll /u01/app/oracle/product/11.2.0/db_1/rdbms/admin/awrgrpti.sql

    -rw-r--r-- 1 oracle oinstall 6444 Jul 24  2011 /u01/app/oracle/product/11.2.0/db_1/rdbms/admin/awrgrpti.sql
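
    The listing only confirms the RAC-wide AWR script is present. To actually produce the report, run it from SQL*Plus as SYSDBA (a sketch; awrgrpti.sql then prompts interactively for the report format, DBID, instance list, and snapshot range):

    [oracle@rac1 admin]$ sqlplus / as sysdba
    SQL> @?/rdbms/admin/awrgrpti.sql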

    4. CRS-4700: The Cluster Time Synchronization Service is in Observer mode

    Remediation:

    [grid@rac1 ~]$ cat /opt/synctime.sh 
    #!/bin/bash
    ntpdate ***
    hwclock -w
    [grid@rac1 ~]$ cluvfy comp clocksync -verbose
    
    Verifying Clock Synchronization across the cluster nodes 
    
    Checking if Clusterware is installed on all nodes...
    Check of Clusterware install passed
    
    Checking if CTSS Resource is running on all nodes...
    Check: CTSS Resource running on all nodes
      Node Name                             Status                  
      ------------------------------------  ------------------------
      rac1                                  passed                  
    Result: CTSS resource check passed
    
    
    Querying CTSS for time offset on all nodes...
    Result: Query of CTSS for time offset passed
    
    Check CTSS state started...
    Check: CTSS state
      Node Name                             State                   
      ------------------------------------  ------------------------
      rac1                                  Observer                
    CTSS is in Observer state. Switching over to clock synchronization checks using NTP
    
    
    Starting Clock synchronization checks using Network Time Protocol(NTP)...
    
    NTP Configuration file check started...
    The NTP configuration file "/etc/ntp.conf" is available on all nodes
    NTP Configuration file check passed
    
    Checking daemon liveness...
    
    Check: Liveness for "ntpd"
      Node Name                             Running?                
      ------------------------------------  ------------------------
      rac1                                  no                      
    Result: Liveness check failed for "ntpd"
    PRVF-5494 : The NTP Daemon or Service was not alive on all nodes
    PRVF-5415 : Check to see if NTP daemon or service is running failed
    Result: Clock synchronization check using Network Time Protocol(NTP) failed
    
    
    PRVF-9652 : Cluster Time Synchronization Services check failed
    
    Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes. 
    [grid@rac1 ~]$ srvctl status listener
    Listener LISTENER is enabled
    Listener LISTENER is running on node(s): rac2,rac1
    [grid@rac1 ~]$ ssh rac2 date;date
    Fri Jul 30 21:56:39 * 2021
    Fri Jul 30 21:56:39 * 2021
    [grid@rac1 ~]$  crsctl check ctss
    CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
    [grid@rac2 ~]$ crsctl check ctss
    CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
    [root@rac1 ~]# mv /etc/ntp.conf /etc/ntp.conf.bak
    [grid@rac1 ~]$ crsctl check ctss
    CRS-4701: The Cluster Time Synchronization Service is in Active mode.
    CRS-4702: Offset (in msec): 0
    ## On node 2:
    [root@rac2 ~]# mv /etc/ntp.conf /etc/ntp.conf.bk
    [grid@rac2 ~]$ crsctl check ctss
    CRS-4701: The Cluster Time Synchronization Service is in Active mode.
    CRS-4702: Offset (in msec): 0
    [grid@rac2 ~]$ cluvfy comp clocksync -verbose
    
    Verifying Clock Synchronization across the cluster nodes 
    
    Checking if Clusterware is installed on all nodes...
    Check of Clusterware install passed
    
    Checking if CTSS Resource is running on all nodes...
    Check: CTSS Resource running on all nodes
      Node Name                             Status                  
      ------------------------------------  ------------------------
      rac2                                  passed                  
    Result: CTSS resource check passed
    
    Querying CTSS for time offset on all nodes...
    Result: Query of CTSS for time offset passed
    
    Check CTSS state started...
    Check: CTSS state
      Node Name                             State                   
      ------------------------------------  ------------------------
      rac2                                  Active                  
    CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
    Reference Time Offset Limit: 1000.0 msecs
    Check: Reference Time Offset
      Node Name     Time Offset               Status                  
      ------------  ------------------------  ------------------------
      rac2          0.0                       passed                  
    
    Time offset is within the specified limits on the following set of nodes: "[rac2]"
    Result: Check of clock time offsets passed
    
    Oracle Cluster Time Synchronization Services check passed
    
    Verification of Clock Synchronization across the cluster nodes was successful.
    [grid@rac1 ~]$ time srvctl status asm -a    ## execution time is still unchanged
    ASM is running on rac2,rac1
    ASM is enabled.
    
    real    0m32.048s
    user    0m20.051s
    sys     0m11.758s
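
    With /etc/ntp.conf moved aside on both nodes, CTSS switched from Observer to Active on its own. A quick loop to confirm the mode cluster-wide (a sketch assuming passwordless ssh between the nodes, as already used above):

    for h in rac1 rac2; do
      echo "== $h =="
      ssh "$h" /u01/app/11.2.0/grid/bin/crsctl check ctss
    done

    Note that fixing CTSS did not fix the slowness, so the investigation continued.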

    5. Check the listener (network) log

    [oracle@rac2 ~]$ tail -n 100 /u01/app/grid/diag/tnslsnr/rac2/listener/alert/log.xml
     host_addr='***'>
     <txt>31-JUL-2021 10:16:55 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=root))(service_name=***)) * (ADDRESS=(PROTOCOL=tcp)(HOST=***)(PORT=49407)) * establish * *** * 0
     </txt>
    </msg>
    <msg time='2021-07-31T10:17:15.788+08:00' org_id='oracle' comp_id='tnslsnr'
     type='UNKNOWN' level='16' host_id='rac2'
     host_addr='***'>
     <txt>31-JUL-2021 10:17:15 * service_update * **** * 0
     </txt>

    6. Trace the command execution

    [grid@rac1 ~]$ strace crs_stat -t -v
    
    getcwd("/home/grid", 4096)              = 11
    chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
    getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
    chdir("/home/grid")                     = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    access("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", F_OK) = 0
    statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
    open("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", O_RDONLY) = 3
    close(3)                                = 0
    getcwd("/home/grid", 4096)              = 11
    chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
    getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
    chdir("/home/grid")                     = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    access("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", F_OK) = 0
    statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
    open("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", O_RDONLY) = 3
    close(3)                                = 0
    getcwd("/home/grid", 4096)              = 11
    chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
    getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
    chdir("/home/grid")                     = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    access("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", F_OK) = 0
    statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
    open("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", O_RDONLY) = 3
    close(3)                                = 0
    getcwd("/home/grid", 4096)              = 11
    chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
    getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
    chdir("/home/grid")                     = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    access("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", F_OK) = 0
    statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
    open("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", O_RDONLY) = 3
    close(3)                                = 0
    getcwd("/home/grid", 4096)              = 11
    chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
    getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
    chdir("/home/grid")                     = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    access("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", F_OK) = 0
    statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
    open("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", O_RDONLY) = 3
    close(3)                                = 0
    getcwd("/home/grid", 4096)              = 11
    chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
    getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
    chdir("/home/grid")                     = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
    access("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", F_OK) = 0
    statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
    open("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", O_RDONLY) = 3
    close(3)                                = 0
    getcwd("/home/grid", 4096)              = 11
    ^Cchdir("/u01/app
    [grid@rac1 ~]$ tail -n 100 /u01/app/11.2.0/grid/log/rac1/client/clsc105036.log
    Oracle Database 11g Clusterware Release 11.2.0.4.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
    2021-04-28 08:01:57.908: [ CRSCOMM][3588262272] NAME: `UI_DATA` length=7
    2021-04-28 08:01:57.908: [ CRSCOMM][3588262272] Successfully read response

    The strace output above makes the problem obvious.

    Resolution:

    The command issues an enormous number of calls of the form access("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", F_OK) = 0, one stat/access/open cycle per existing file: it appears to probe clsc<N>.log names sequentially until it finds a free one for its own client log. With hundreds of thousands of files already in the directory, every CRS client invocation pays that cost.
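
    To quantify the overhead instead of reading the raw trace, strace can summarize syscall counts (a sketch; the counts should be dominated by the stat/access/open storm on the client log directory):

    # write a per-syscall count/time summary to a file, discarding the command's normal output
    strace -c -f -o /tmp/crs_stat_summary.txt crs_stat -t -v > /dev/null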

    [grid@rac1 client]$ ll |wc -l
    576583
    [grid@rac1 client]$ du -sh
    2.3G    .
    [grid@rac1 client]$ ll /u01/app/11.2.0/grid/log/rac1/client/clsc105037.log
    -rw-r--r-- 1 zabbix zabbix 262 Apr 28 08:02 /u01/app/11.2.0/grid/log/rac1/client/clsc105037.log
    [grid@rac1 client]$ ll clsc*.log|wc -l
    576561
    [root@rac1 client]# find -type f -mtime -1|wc -l
    2328
    [root@rac1 client]# ll clsc575437.log
    -rw-r--r-- 1 zabbix zabbix 262 Aug  1 10:16 clsc575437.log
    [root@rac1 ~]# df -i
    Filesystem             Inodes  IUsed    IFree IUse% Mounted on
    /dev/mapper/vgnode110102723-lv_root
                          3276800 793009  2483791   25% /
    tmpfs                 1000000   1024   998976    1% /dev/shm
    /dev/sda1              128016     43   127973    1% /boot
    /dev/mapper/vg_node110102723-lv_home
                         13926400     95 13926305    1% /home
    [root@rac1 client]# find -amin -20
    ./clsc576616.log
    ./clsc576613.log
    ./clsc576615.log
    ./clsc576610.log
    ./clsc576614.log
    ./clsc576609.log
    ./clsc576611.log
    ./clsc576612.log
    [root@rac1 client]# ll -h clsc576612.log
    -rw-r--r-- 1 zabbix zabbix 262 Aug  1 22:31 clsc576612.log
    [root@rac1 client]# ll clsc5766*.log |wc -l
    34
    You have mail in /var/spool/mail/root

    There are a huge number of clsc*.log files, all owned by user and group zabbix, so the suspicion is that a Zabbix monitoring item is creating them — and the creation rate matches the monitoring interval, one file per minute.
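
    The one-file-per-minute rate can be confirmed directly by bucketing file modification times by minute (a sketch using GNU find, run inside the client directory; find avoids the "argument list too long" failure that a clsc*.log glob can hit at these file counts):

    # count clsc*.log files per modification minute; a steady 1 per bucket matches the Zabbix interval
    find . -name 'clsc*.log' -printf '%TY-%Tm-%Td %TH:%TM\n' | sort | uniq -c | tail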

    On another, healthy RAC cluster, this directory does not accumulate files like this.

    [root@rac1 client]# pwd
    /u01/app/11.2.0/grid/log/rac1/client
    [root@rac1 client]# rm -f clsc5*.log
    [root@rac1 client]# ll |wc -l
    [grid@rac1 ~]$ time crs_stat -t -v
    Name           Type           R/RA   F/FT   Target    State     Host        
    ----------------------------------------------------------------------
    ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1        
    ora.OCR.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1        
    ora.***.db     ora....se.type 0/2    0/1    OFFLINE   OFFLINE               
    ora.***.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
    ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1        
    ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE               
    ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1        
    ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1        
    ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1        
    ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
    ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
    ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
    ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
    ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1        
    ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
    ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
    ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
    ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
    ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2        
    ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1        
    ora.***.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
    
    real    0m0.049s
    user    0m0.014s
    sys    0m0.008s

    For now, the problem is resolved:

    Deleted the offending files: clsc*.log

    Re-running the command now takes roughly real 0m0.049s.

    [grid@rac1 ~]$ strace -tt -T -v -o /tmp/strace_crs_20210801.log crs_stat -t -v

    After re-enabling Zabbix monitoring, files are still created at one per minute, so the root cause has not been eliminated — only the symptom. Until a root fix is found, a scheduled job that finds and deletes these files will serve as a workaround, as sketched below.
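
    A minimal cron sketch for that workaround (assumptions: root's crontab, GNU find, and that nothing else legitimately writes zabbix-owned clsc*.log files here; adjust the path and retention to taste):

    # root crontab entry: nightly at 02:00, purge zabbix-owned clsc*.log files
    # older than one day from this node's Grid client log directory
    0 2 * * * find /u01/app/11.2.0/grid/log/rac1/client -name 'clsc*.log' -user zabbix -mtime +1 -delete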

    Open questions:

    1. Why does the Zabbix Oracle monitoring on this particular RAC cluster generate so many files? (Monitored items include port 1521, ora_pmon, asm.process, session_counts, etc.)

    2. A review of the Zabbix configuration found nothing Oracle-specific — so what is different here?

    3. Are any Oracle environment variables or parameters set unusually?
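
    A starting point for question 1 (a hedged sketch; the config paths are common Zabbix agent defaults and may differ on these hosts) is to list the agent's UserParameter entries and see which ones shell out to CRS utilities:

    grep -Rh '^UserParameter' /etc/zabbix/zabbix_agentd.conf /etc/zabbix/zabbix_agentd.d/ 2>/dev/null \
      | grep -Ei 'crs|srvctl|crsctl|sqlplus|ora'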

  • Original post: https://www.cnblogs.com/yhq1314/p/15093474.html