zoukankan      html  css  js  c++  java
  • 记一次 JVM 周期性 FGC 排查

    问题

    观察生产环境 JMX 监控, 发现应用在内存压力不大的情况下每小时会发生一次 full gc, 每次耗时约 300 毫秒. 这显然不是正常情况, 需要进行排查.

    排查过程

    首先查看 GC 日志, 发现 FGC 是由 System.gc() 触发的, 日志内容大致如下(日志内容来源于周期性Full GC的异常排查):

    [GC (System.gc())
    Desired survivor size 5767168 bytes, new threshold 1 (max 15)
     [PSYoungGen: 518231K->2816K(693760K)] 1188617K->673417K(2092032K), 0.0274540 secs] [Times: user=0.06 sys=0.00, real=0.03 secs] 
    Heap after GC invocations=194 (full 1):
     PSYoungGen      total 693760K, used 2816K [0x00000007d5500000, 0x0000000800000000, 0x0000000800000000)
      eden space 688128K, 0% used [0x00000007d5500000,0x00000007d5500000,0x00000007ff500000)
      from space 5632K, 50% used [0x00000007ff500000,0x00000007ff7c0010,0x00000007ffa80000)
      to   space 5632K, 0% used [0x00000007ffa80000,0x00000007ffa80000,0x0000000800000000)
     ParOldGen       total 1398272K, used 670601K [0x000000077ff80000, 0x00000007d5500000, 0x00000007d5500000)
      object space 1398272K, 47% used [0x000000077ff80000,0x00000007a8e627e8,0x00000007d5500000)
     PSPermGen       total 262144K, used 125738K [0x000000076ff80000, 0x000000077ff80000, 0x000000077ff80000)
      object space 262144K, 47% used [0x000000076ff80000,0x0000000777a4a938,0x000000077ff80000)
    }
    

    对项目代码进行搜索后, 排除了代码中调用 System.gc() 的可能. 结合每小时触发一次 FGC 的周期性, 怀疑是有线程在循环调用, 使用了 jstack 找出了两个可疑线程:

    "RMI RenewClean-[xxx.xxx.xxx.xxx:35539]" #323 daemon prio=5 os_prio=0 tid=0x00007f82a4022800 nid=0x144 in Object.wait() [0x00007f81d7ebf000]                                                                           
    java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                                                                                                       
    at java.lang.Object.wait(Native Method)                                                                                                                                                                    
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)                                                                                                                                            
    - locked <0x00000007035c6ba0> (a java.lang.ref.ReferenceQueue$Lock)                                                                                                                                        
    at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:563)
    at java.lang.Thread.run(Thread.java:748)                                                     
    
    
    "GC Daemon" #244 daemon prio=2 os_prio=0 tid=0x00007f8405f69800 nid=0xf5 in Object.wait() [0x00007f81ddb0d000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)                                                                     
    at java.lang.Object.wait(Native Method)                                                                                                                                                                    
    at sun.misc.GC$Daemon.run(GC.java:117)
    - locked <0x00000007035bddc8> (a sun.misc.GC$LatencyL
    

    检查 sun.misc.GC 和 sun.rmi.transport.DGCClient 两个类:

    DGCClient:

    private static final long gcInterval = (Long)AccessController.doPrivileged(new GetLongAction("sun.rmi.dgc.client.gcInterval", 3600000L));
    
    private static class EndpointEntry {
        public static DGCClient.EndpointEntry lookup(Endpoint var0) {
            synchronized(endpointTable) {
                DGCClient.EndpointEntry var2 = (DGCClient.EndpointEntry)endpointTable.get(var0);
                if (var2 == null) {
                    var2 = new DGCClient.EndpointEntry(var0);
                    endpointTable.put(var0, var2);
                    if (gcLatencyRequest == null) {
                        gcLatencyRequest = GC.requestLatency(DGCClient.gcInterval);
                    }
                }
    
                return var2;
            }
        }
    }
    

    GC:

    private static Thread daemon = null;
    
    public static GC.LatencyRequest requestLatency(long var0) {
        return new GC.LatencyRequest(var0);
    } 
    
    
    

    由上面的代码可知开启 RMI 会通过 DGCClient 调用 GC#requestLatency(long) 方法, 创建一个守护线程, 周期性执行 System.gc(). 这个周期为 DGCClient.gcInterval, 默认为一小时.

    另外, 项目还依赖了 cxf, cxf 默认会每 10 小时执行一次 System.gc():

    if (!skipHack("org.apache.cxf.JDKBugHacks.gcRequestLatency")) {
        Class<?> clazz = Class.forName("sun.misc.GC");
        Method method = clazz.getDeclaredMethod("currentLatencyTarget");
        Long l = (Long)method.invoke(null);
        if (l != null && l.longValue() == 0) {
            //something already set it, move on
            method = clazz.getDeclaredMethod("requestLatency",
                new Class[] {Long.TYPE});
            method.invoke(null, Long.valueOf(36000000));
        }
    }
    

    解决方法

    1. 将 sun.rmi.dgc.server.gcInterval 和 sun.rmi.dgc.client.gcInterval 参数设置为 9223372036854775807L (对这个值做了特殊处理 GC 守护线程会马上结束), 还需要一个参数 org.apache.cxf.JDKBugHacks.gcRequestLatency 设置为 false
    -Dsun.rmi.dgc.server.gcInterval=9223372036854775807L -Dsun.rmi.dgc.client.gcInterval=9223372036854775807L -Dorg.apache.cxf.JDKBugHacks.gcRequestLatency=false
    
    1. 将 sun.rmi.dgc.server.gcInterval 和 sun.rmi.dgc.client.gcInterval 设置为 9223372036854775806L, 此时守护线程会一直阻塞, 不会再触发 GC.
    2. 使用 CMS 回收器, 并开启并行回收, 以减少停顿时间
    -XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
    

    参考

    gc-tuning-in-practice
    周期性Full GC的异常排查

  • 相关阅读:
    centos7.6 安装与配置 MongoDB yum方式
    MongoDB 介绍
    centos 关闭selinux
    前端 HTML标签属性
    前端 HTML 标签嵌套规则
    前端 HTML 标签分类
    前端 HTML body标签相关内容 常用标签 表单标签 form里面的 input标签介绍
    前端 HTML body标签相关内容 常用标签 表单标签 form 表单控件分类
    前端 HTML form表单标签 select标签 option 下拉框
    POJ 1426
  • 原文地址:https://www.cnblogs.com/FJH1994/p/12317544.html
Copyright © 2011-2022 走看看