  • Load-Test Problem Analysis and Resolution Retrospective

    I.
    Symptom: during load testing, API responses were slow and some requests timed out; one of the containers then restarted on its own, and a dump directory was generated after the restart.
    Analysis: the guess was that an OOM had occurred and that the container was configured to dump automatically on OOM. The dump artifacts obtained were a thread dump, gcutil output, and netstat output.
    Resolution:
    (1)
    Starting from the jstack log and searching for the keywords deadlock and blocked in turn, the following thread stacks in BLOCKED state turned up:

    "asyncExecutor-1093" #9490 prio=5 os_prio=0 tid=0x00007f12b400f800 nid=0x25c7 waiting for monitor entry [0x00007f0d1aa03000]
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(RollingFileManager.java:217)
    	- waiting to lock <0x000000065804dcc8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    	at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.append(RollingRandomAccessFileAppender.java:207)
    	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
    	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
    	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
    	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
    	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
    	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
    	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
    	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
    	at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
    	at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
    	at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2091)
    	at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2011)
    	at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1884)
    	at com.lala.sgs.lalalog.log4j2.Log4j2Logger.error(Log4j2Logger.java:520)
    
    "http-nio-8888-exec-4" #363 daemon prio=5 os_prio=0 tid=0x00007f0f48003000 nid=0x222 waiting for monitor entry [0x00007f11abeb8000]
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at org.apache.logging.log4j.core.layout.TextEncoderHelper.copyDataToDestination(TextEncoderHelper.java:61)
    	- waiting to lock <0x000000065804dcc8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    	at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeTextWithCopy(TextEncoderHelper.java:57)
    	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encodeWithThreadLocals(StringBuilderEncoder.java:70)
    	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:63)
    	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
    	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:219)
    	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:57)
    

    The blocked threads stopped appearing once the disruptor framework was introduced and asyncLogger was configured. (Reference: https://www.cnblogs.com/waterystone/p/11170540.html)
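    For reference, a minimal log4j2 configuration sketch of this kind of fix (logger names and file paths here are illustrative, not taken from the actual service). Async loggers require the com.lmax:disruptor dependency on the classpath; you can enable them globally with -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, or selectively per logger as below:

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="warn">
      <Appenders>
        <!-- Same appender type that appears in the stack traces above -->
        <RollingRandomAccessFile name="AppLog" fileName="logs/app.log"
                                 filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.gz">
          <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level [%t] %logger{36} - %msg%n"/>
          <Policies>
            <SizeBasedTriggeringPolicy size="100 MB"/>
          </Policies>
        </RollingRandomAccessFile>
      </Appenders>
      <Loggers>
        <!-- Mixed async: events from this logger go through a Disruptor ring buffer,
             so application threads no longer block on the appender's monitor -->
        <AsyncLogger name="com.lala" level="info" includeLocation="false" additivity="false">
          <AppenderRef ref="AppLog"/>
        </AsyncLogger>
        <Root level="info">
          <AppenderRef ref="AppLog"/>
        </Root>
      </Loggers>
    </Configuration>
    ```

    includeLocation="false" matters for throughput: computing caller location defeats much of the async gain.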
    (2) With the logging bottleneck fixed, the load test was re-run, and similar performance problems surfaced in the Kafka send path and the Redis connection pool.

    "http-nio-8888-exec-295" #1570 daemon prio=5 os_prio=0 tid=0x00007f0fa40f6000 nid=0x6d7 waiting for monitor entry [0x00007f0d839e1000]
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at kafka.producer.Producer.send(Producer.scala:72)
    	- waiting to lock <0x0000000656965bf8> (a java.lang.Object)
    	at kafka.javaapi.producer.Producer.send(Producer.scala:33)
    	at com.lala.kafka.api.produce.KafkaProducer.send(KafkaProducer.java:65)
    	at com.lala.kafka.api.produce.KafkaProducer.sendBytes(KafkaProducer.java:54)
    	at com.lala.kafka.api.produce.ProducerPool.sendBytes(ProducerPool.java:49)
    	at com.lala.kafka.api.produce.BaseProducer.sendString(BaseProducer.java:14)
    

    This was resolved by increasing poolSize, the number of producers sending to Kafka.
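    The internal com.lala.kafka.api classes are not shown in the post, but the principle behind poolSize is visible in the stack trace: the old kafka.producer.Producer#send is synchronized, so a single instance serializes every caller on one monitor. A minimal stdlib-only sketch of the idea (the ProducerPool below is hypothetical and merely stands in for the internal class):

    ```java
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.function.Consumer;

    // Sketch of the poolSize fix: keep several producer instances and pick one
    // round-robin, so contention is spread across poolSize monitors instead of
    // all callers blocking on a single synchronized send(). Consumer<byte[]>
    // stands in for a real producer so the sketch stays self-contained.
    class ProducerPool {
        private final Consumer<byte[]>[] producers;
        private final AtomicInteger next = new AtomicInteger();

        @SafeVarargs
        ProducerPool(Consumer<byte[]>... producers) {
            this.producers = producers;
        }

        void sendBytes(byte[] payload) {
            // floorMod keeps the index valid even after the counter overflows int
            int i = Math.floorMod(next.getAndIncrement(), producers.length);
            producers[i].accept(payload);
        }
    }
    ```

    Each slot still serializes its own callers, so poolSize should be sized from the measured send rate and per-send latency.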

    "asyncExecutor-253" #1470 prio=5 os_prio=0 tid=0x00007f151000b000 nid=0x672 waiting on condition [0x00007f0d857da000]
       java.lang.Thread.State: WAITING (parking)
    	at sun.misc.Unsafe.park(Native Method)
    	- parking to wait for  <0x0000000659daa5c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    	at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:524)
    	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:438)
    	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361)
    	at redis.clients.util.Pool.getResource(Pool.java:49)
    	at redis.clients.jedis.JedisSentinelPool.getResource(JedisSentinelPool.java:209)
    	at redis.clients.jedis.JedisSentinelPool.getResource(JedisSentinelPool.java:17)
    	at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:194)
    	at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:348)
    	at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:129)
    	at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:92)
    	at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:79)
    	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:194)
    	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:169)
    	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:157)
    

    The Redis client turned out to be running on the default pool configuration; the defaults can be looked up in GenericObjectPoolConfig (maxTotal=8, maxIdle=8, minIdle=0).
    This was resolved by increasing maxTotal, maxIdle and minIdle.
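    The post does not give the actual values used; a hedged example of the change as a Spring XML bean definition (numbers are illustrative and should be sized from the measured concurrency). Note also that the commons-pool2 default maxWait is -1, i.e. borrowObject blocks indefinitely, which matches the untimed takeFirst parking in the stack above:

    ```xml
    <!-- JedisPoolConfig extends GenericObjectPoolConfig, whose defaults
         (maxTotal=8, maxIdle=8, minIdle=0) are far too small under load -->
    <bean id="jedisPoolConfig" class="redis.clients.jedis.JedisPoolConfig">
        <property name="maxTotal" value="200"/>       <!-- hard cap on connections -->
        <property name="maxIdle" value="50"/>         <!-- idle connections kept warm -->
        <property name="minIdle" value="10"/>
        <property name="maxWaitMillis" value="2000"/> <!-- fail fast instead of parking forever -->
    </bean>
    ```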
    (3) Load testing is still in progress.
    (4) Checking GC with gcutil showed that young collections (YGC) were very frequent:

      S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
     12.42   0.00  96.21  50.59  90.24  86.46   1770  201.800     8    0.950  202.750
    

    The JVM options were:

    -server -Xmx6g -Xms6g -Xmn256m -XX:PermSize=128m -Xss256k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
    

    The young generation was configured far too small (-Xmn256m out of a 6 GB heap), so it was enlarged. (-XX:PermSize was also dropped; the M column in the jstat output above is Metaspace, so this is JDK 8, where PermSize is ignored.) The new options:

    -server -Xmx6g -Xms6g -Xmn3072m -Xss512k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintClassHistogram -Xloggc:../logs/gc.log
    

    After re-running the load test, GC behavior clearly improved:

      S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
     15.07   0.00  77.15  32.25  90.37  86.32    116   24.551     8    6.216   30.767
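    The improvement is easier to see as per-collection averages; a quick arithmetic check on the numbers copied from the two jstat tables:

    ```java
    class GcPauseCheck {
        // Average GC pause in milliseconds, from jstat's cumulative time (s) and count columns.
        static double avgPauseMs(double totalSeconds, int collections) {
            return totalSeconds / collections * 1000.0;
        }

        public static void main(String[] args) {
            // Before tuning:              YGC=1770, YGCT=201.800 s
            // After -Xmn 256m -> 3072m:   YGC=116,  YGCT=24.551 s
            System.out.printf("avg young-GC pause: before %.0f ms, after %.0f ms%n",
                    avgPauseMs(201.800, 1770), avgPauseMs(24.551, 116));
        }
    }
    ```

    Each young collection now takes longer (roughly 212 ms vs 114 ms, as expected with a young generation twelve times larger), but there are about fifteen times fewer of them, cutting cumulative young-GC time from about 202 s to about 25 s.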
    

    How to obtain the dumps (reposted from elsewhere):
    (1) Getting a thread dump

    Using jstack
    jstack ships with the JDK and dumps the thread stacks of the JVM with the given process ID (PID).
    
    # Print the thread stacks to stdout
    jstack PID
    
    # Print the thread stacks to stdout, including lock information
    jstack -l PID
    
    # Force-print the stacks: add -F when plain `jstack PID` does not respond (the JVM may be hung)
    jstack -F PID
    Using jcmd
    jcmd also ships with the JDK and sends commands to a running JVM; depending on the command it can fully or partially replace jstack, jmap, and so on. The Thread.print command prints the JVM's thread stacks.
    
    # Equivalent to jstack PID
    jcmd PID Thread.print
    
    # Equivalent to jstack -l PID
    jcmd PID Thread.print -l
    Using kill -3
    kill sends a signal (SIGNAL) to the given process; by default it sends TERM, so `kill PID`, `kill -15 PID` and `kill -TERM PID` are equivalent. The JVM handles the QUIT signal (value 3) by printing the current thread stacks plus a heap-usage summary, which jstack does not include; jstack, on the other hand, can print lock details with -l.
    
    kill -3 PID
    # or
    kill -QUIT PID
    

    (2) Getting a heap dump

    # Dump the JVM heap to the given file; with many objects this can take a while. The format sub-option only supports b (binary)
    jmap -dump:format=b,file=FILE_WITH_PATH PID
    
    # If the JVM process does not respond, try adding -F
    jmap -F -dump:format=b,file=FILE_WITH_PATH PID
    
    # To dump only live objects, add the live sub-option (live is not supported together with -F)
    jmap -dump:live,format=b,file=FILE_WITH_PATH PID
    

    (3) Getting a heap summary

    # -heap prints heap information for the given JVM process: the heap parameters, plus the sizes and utilization of the young and old generations
    jmap -heap PID
    
    # As above, if the JVM process does not respond, try adding -F
    jmap -F -heap PID
    

    (4) gcutil and gccapacity

    # utilization percentages and cumulative GC counts/times (append `1000 10` to sample every second, 10 times)
    jstat -gcutil PID
    # sizes and capacities of each generation
    jstat -gccapacity PID
    
  • Original post: https://www.cnblogs.com/frankwin608/p/13420221.html