zoukankan      html  css  js  c++  java
  • 【Netty】ByteBuf.release() was not called before it's garbage-collected

    1 现象

    • 线上应用运行一段时间就发生应用重启,临时调整内存大小,降低重启频率,给定位问题和修复缺陷腾点时间,对业务使用降低影响(重启存在短时不可用状态,秒级别);
    • 线上使用过程中发现文件无法上传;文件下载没问题;业务增删改查使用正常;
    • 日志报错:
      • 提示无法申请直接内存,已超出最大可申请的直接内存;
      • 提示存在垃圾回收前ByteBuf未释放;
    2021-02-18 16:17:58.399	2021-02-18 16:17:58.399 ERROR 1 --- [oServerWorker-2] c.c.c.g.p.i.ClientToProxyConnection : (AWAITING_INITIAL) [id: 0xedf191be, L:/172.30.173.45:8080 - R:/172.30.146.1:35754]: Caught an exception on ClientToProxyConnection
    2021-02-18 16:17:58.399	io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2781194413, max: 2793406464)
    2021-02-18 16:17:58.399	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:725) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
    
    2021-02-18 16:33:59.636	2021-02-18 16:33:59.636 ERROR 1 --- [oServerWorker-0] i.n.u.ResourceLeakDetector : LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
    2021-02-18 16:33:59.636	Recent access records:
    2021-02-18 16:33:59.636	#1:
    2021-02-18 16:33:59.636	io.netty.handler.codec.http.HttpObjectAggregator$AggregatedFullHttpMessage.release(HttpObjectAggregator.java:379)
    
    2021-02-18 17:13:36.840 ERROR 55896 --- [oServerWorker-3] i.n.u.ResourceLeakDetector               : LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
    Recent access records:
    Created at:
         io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:349)
         io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
         io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
         io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1835)
         io.netty.buffer.CompositeByteBuf.copy(CompositeByteBuf.java:1487)
         io.netty.buffer.AbstractByteBuf.copy(AbstractByteBuf.java:1209)
         io.netty.buffer.WrappedCompositeByteBuf.copy(WrappedCompositeByteBuf.java:493)
    	 io.netty.buffer.AdvancedLeakAwareCompositeByteBuf.copy(AdvancedLeakAwareCompositeByteBuf.java:681)
    io.netty.handler.codec.http.HttpObjectAggregator$AggregatedFullHttpRequest.copy(HttpObjectAggregator.java:405)
    	org.littleshoot.proxy.impl.ClientToProxyConnection.copy(ClientToProxyConnection.java:848)
    org.littleshoot.proxy.impl.ClientToProxyConnection.doReadHTTPInitial(ClientToProxyConnection.java:163)
    	org.littleshoot.proxy.impl.ClientToProxyConnection.readHTTPInitial(ClientToProxyConnection.java:140)
    	org.littleshoot.proxy.impl.ClientToProxyConnection.readHTTPInitial(ClientToProxyConnection.java:56)
    	org.littleshoot.proxy.impl.ProxyConnection.readHTTP(ProxyConnection.java:116)
    	org.littleshoot.proxy.impl.ProxyConnection.read(ProxyConnection.java:101)
    	org.littleshoot.proxy.impl.ProxyConnection.channelRead0(ProxyConnection.java:477)
    io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    

    2 分析

    • Netty依赖于ByteBuf管理直接内存,但ByteBuf的引用由JVM管理,当ByteBuf引用置为空,但实际申请的直接内存未释放就会导致直接内存溢出;
    • 从提示中定位到org.littleshoot.proxy.impl.ClientToProxyConnection.copy方法;查询Github LittleProxy的issue,也有小伙伴遇到同样的问题,参考他的实现,不返回copy,而是直接返回original试试看。

    3 改造

    (1)添加直接内存检测日志

    int maxMemoryInKb = (int) (PlatformDependent.maxDirectMemory() / 1024);
    int used = (int) (PlatformDependent.usedDirectMemory() / 1024);
    log.info("netty_direct_memory:used({}k) max({}k)", memoryInKb, used, maxMemoryInKb);
    

    (2)ClientToProxyConnection的copy方法修复

    private HttpRequest copy(HttpRequest original) {
        if (original instanceof FullHttpRequest) {
            return original; // 注释掉 ((FullHttpRequest) original).copy();
        } else {
            HttpRequest request = new DefaultHttpRequest(original.protocolVersion(), original.method(), original.uri());
            request.headers().set(original.headers());
            return request;
        }
    }
    

    4 本地测试

    (1)本地打包应用启动

    java -jar -verbose:gc -Xloggc:/tmp/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Djava.io.tmpdir=/tmp -XX:OnOutOfMemoryError=$JAVA_HOME/bin/killjava.sh  -XX:+ExitOnOutOfMemoryError -XX:+UseG1GC -Xss228K -Xmx317161K -Xms317161K -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=64M -Dio.netty.leakDetectionLevel=paranoid -Dio.netty.maxDirectMemory=67108864 gateway-app.jar > log.out
    
    • GC日志打印相关参数:-verbose:gc -Xloggc:/tmp/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Djava.io.tmpdir=/tmp
    • OOM退出参数:-XX:OnOutOfMemoryError=$JAVA_HOME/bin/killjava.sh -XX:+ExitOnOutOfMemoryError
    • Netty相关参数:
      • -Dio.netty.leakDetectionLevel=paranoid 溢出检测级别
        • disabled:禁用;
        • 默认为simple:取样1%的ByteBuf是否发生泄漏,不打印详细泄露日志;
        • advanced:取样1%的ByteBuf是否发生泄漏,打印详细泄露日志;
        • paranoid:检测所有的ByteBuf是否发生泄漏,打印详细泄露日志;(正式上线需去掉,以免影响性能。)
      • -Dio.netty.maxDirectMemory=67108864 Netty最大直接内存设置
        • 参数需实际情况设置,关键指标为并发数,Netty性能再高也存在内存正常占用的情况,需要使用完后‘释放’,所以当并发50与并发100配置的大小差距很大;
        • 默认不配置时,取系统中最大可用的直接内存;
    • 分代GC相关参数:-XX:+UseG1GC -Xss228K -Xmx317161K -Xms317161K -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=64M
    • 日志输出:gateway-app.jar > log.out

    注:对溢出检测感兴趣可以看看ResourceLeakDetector、SimpleLeakAwareByteBuf、AbstractByteBufAllocator等几个类。

    (2)压测

    • 使用Jmeter工具模拟文件上传,并发数缓慢提升,避免直接撑爆最大直接内存,持续压测30分钟;
    • 使用jvisualvm或jconsole观察JVM内存状态,只会发生10多次NewGC;
    • 查看log.out日志中,搜索'ERROR'级别日志;(原先版本基本高负载运行5~10分钟便可以看到内存溢出的提示日志)
    • 查看log.out日志中,查看直接内存增长情况;

    5 测试环境

    (1)提交代码自动化部署至测试环境

    (2)压测

    • 使用Jmeter工具模拟文件上传,并发数缓慢提升,避免直接撑爆最大直接内存,持续压测30分钟;
    • 查看log.out日志,搜索'ERROR'级别日志;(原先版本基本高负载运行5~10分钟便可以看到内存溢出的提示日志);
    • 查看log.out日志中,查看直接内存增长情况;
    • jstat -gcutil pid interval:如:jstat -gcutil jvmPid 1000(1号进程每一秒输出jvm内存状态,百分比);
    • jstat -gccapacity pid interval:如:jstat -gcutil jvmPid 1000(1号进程每一秒输出jvm内存状态,字节数);
    • top 查看进程内存、CPU占用情况;

    6 生产上线

    • 观察每天的请求量,请求量与原先差不多,内存短暂飙升后趋于稳定;
    • 观察直接内存使用情况:基本稳定使用163840k/2727936k = 6%;
    • 观察异常日志情况:暂无发现异常;
  • 相关阅读:
    flex + bison multiple parsers
    Educational Codeforces Round 95 (Rated for Div. 2)
    python学习笔记 day20 序列化模块(二)
    python学习笔记 day20 常用模块(六)
    python 学习笔记 常用模块(五)
    python学习笔记 day19 常用模块(四)
    python学习笔记 day19 常用模块(三)
    python学习笔记 day19 常用模块(二)
    python学习笔记 day19 作业讲解-使用正则表达式实现计算器
    python学习笔记 day19 常用模块
  • 原文地址:https://www.cnblogs.com/linzhanfly/p/14458404.html
Copyright © 2011-2022 走看看