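Over the past couple of days, while stress-testing a project, we found that whenever the number of concurrent requests exceeded 250, connection errors appeared after two consecutive test rounds, and they became more frequent the more rounds we ran. The exception log is as follows: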
- Caused by: com.caucho.hessian.client.HessianConnectionException: 500: java.io.IOException: Error writing to server
- at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:142)
- at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
- at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:170)
- at $Proxy168.sendOpenAcctInfo(Unknown Source)
- at sun.reflect.GeneratedMethodAccessor750.invoke(Unknown Source)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.springframework.remoting.caucho.HessianClientInterceptor.invoke(HessianClientInterceptor.java:219)
- at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
- at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
- at $Proxy169.sendOpenAcctInfo(Unknown Source)
- at com.shine.web.bean.OpenAcctBeanImpl.sendOpenAcctInfo(OpenAcctBeanImpl.java:62)
- ... 32 more
- Caused by: java.io.IOException: Error writing to server
- at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
- at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
- at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
- at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
- at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1345)
- at java.security.AccessController.doPrivileged(Native Method)
- at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1339)
- at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:993)
- at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:122)
- ... 43 more
- Caused by: java.io.IOException: Error writing to server
- at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:453)
- at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:465)
- at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1047)
- at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
- at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:109)
- ... 43 more
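At first I searched the web for "Error writing to server", but almost none of the results were relevant.
Then I searched for compatibility problems between Hessian and Spring and found reports that Spring 2.5.6 is incompatible with Hessian 4.0.7. Downgrading Hessian to 3.1.3 improved things, but after 10 test rounds the exception came back.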
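Memory monitoring let us rule out JVM memory first. The JVM was configured with -Xms1024m -Xmx1024m, and monitoring showed that actual usage stayed below half of that.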
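A check like the one described can also be done from inside the JVM. The following is a minimal sketch using the standard java.lang.management API; the class name HeapCheck and the helper usedFraction are hypothetical names chosen for illustration, and the original test may well have used an external monitoring tool instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of the original project.
final class HeapCheck {

    /** Returns used heap as a fraction of the given limit (0.0 to 1.0). */
    static double usedFraction(long usedBytes, long limitBytes) {
        if (limitBytes <= 0) {
            throw new IllegalArgumentException("limitBytes must be positive");
        }
        return (double) usedBytes / limitBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined, so fall back to committed.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        double fraction = usedFraction(heap.getUsed(), limit);
        System.out.printf("heap used: %d MB of %d MB (%.1f%%)%n",
                heap.getUsed() >> 20, limit >> 20, fraction * 100);
    }
}
```

A single reading only shows the current state; to reproduce the observation above, the value would need to be sampled repeatedly while the load test runs.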
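Next we used netstat -na to watch port usage on the operating system. Port usage peaked at fewer than 500, so this cause was ruled out as well. (The test server's registry had already been modified to reduce the TIME_WAIT duration to 30 seconds, so port exhaustion is basically not an issue.)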
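To make the port check repeatable, the netstat output can be summarized by connection state. The sketch below assumes the text of netstat -na has already been captured into a string, and that on TCP lines the last whitespace-separated column is the state (true for plain -na output on Windows and Linux, but not when extra columns such as a PID are requested). The class name NetstatStates is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original project.
final class NetstatStates {

    /** Counts TCP connections per state in captured `netstat -na` output. */
    static Map<String, Integer> countStates(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Skip headers, UDP lines, and anything else that is not a TCP entry.
            if (!trimmed.toLowerCase().startsWith("tcp")) {
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 4) {
                continue; // too few columns to contain a state
            }
            // Assumption: the state is the last column.
            counts.merge(cols[cols.length - 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "  Proto  Local Address          Foreign Address        State",
                "  TCP    10.0.0.5:8080          10.0.0.9:51234         ESTABLISHED",
                "  TCP    10.0.0.5:8080          10.0.0.9:51235         TIME_WAIT",
                "  TCP    10.0.0.5:8080          10.0.0.9:51236         TIME_WAIT",
                "  UDP    0.0.0.0:123            *:*");
        System.out.println(countStates(sample));
    }
}
```

The addresses in the sample are made up. Summing the counts gives the total number of TCP entries, which is the figure compared against the 500 peak mentioned above.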
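CPU monitoring confirmed that even at peak concurrency CPU usage stayed below 50%.
All signs indicated that none of these was causing the dropped connections. So where was the bottleneck?
So we turned our attention to the JBoss configuration.
First we checked the database connection pool: the maximum number of connections was set to 50, and since the first few rounds all ran normally, the database connections should be sufficient.
Then we checked the JBoss thread pool configuration and found the following defaults: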
- <mbean code="org.jboss.util.threadpool.BasicThreadPool"
- name="jboss.system:service=ThreadPool">
- <attribute name="Name">JBoss System Threads</attribute>
- <attribute name="ThreadGroupName">System Threads</attribute>
- <!-- How long a thread will live without any tasks in MS -->
- <attribute name="KeepAliveTime">60000</attribute>
- <!-- The max number of threads in the pool -->
- <attribute name="MaximumPoolSize">10</attribute>
- <!-- The max number of tasks before the queue is full -->
- <attribute name="MaximumQueueSize">1000</attribute>
- <!-- The behavior of the pool when a task is added and the queue is full.
- abort - a RuntimeException is thrown
- run - the calling thread executes the task
- wait - the calling thread blocks until the queue has room
- discard - the task is silently discarded without being run
- discardOldest - check to see if a task is about to complete and enque
- the new task if possible, else run the task in the calling thread
- -->
- <attribute name="BlockingMode">run</attribute>
- </mbean>
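To get a feel for how MaximumPoolSize, MaximumQueueSize and BlockingMode interact, here is a rough analogy built on java.util.concurrent.ThreadPoolExecutor. It is only a sketch: JBoss's BasicThreadPool is a separate implementation, and mapping BlockingMode "run" onto CallerRunsPolicy is an assumption based on the comment in the config ("the calling thread executes the task"). The class name CallerRunsSketch is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only: this is NOT JBoss's BasicThreadPool.
final class CallerRunsSketch {

    /** Submits taskCount blocking tasks and returns how many ran on the submitting thread. */
    static int countCallerRuns(int poolSize, int queueSize, int taskCount) {
        final Thread caller = Thread.currentThread();
        final CountDownLatch release = new CountDownLatch(1);
        final AtomicInteger ranOnCaller = new AtomicInteger();

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize,                          // fixed number of worker threads
                60_000, TimeUnit.MILLISECONDS,               // same value as KeepAliveTime above
                new ArrayBlockingQueue<Runnable>(queueSize), // bounded queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // overflow runs in the caller

        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                if (Thread.currentThread() == caller) {
                    // Pool and queue were both full, so the submitter ran the task itself.
                    ranOnCaller.incrementAndGet();
                    return;
                }
                try {
                    release.await(); // keep the worker busy until all tasks are submitted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ranOnCaller.get();
    }

    public static void main(String[] args) {
        // 10 threads and a queue of 1000 can hold 1010 tasks; the remaining 5 overflow.
        System.out.println(countCallerRuns(10, 1000, 1015));
    }
}
```

With 10 threads and a queue of 1000, the first 1010 tasks are absorbed and the remaining 5 run on the submitting thread, so the call should return 5. The point of the sketch is that in this mode overflow work is not dropped but shifted onto whichever thread submitted it; whether that is what happens inside JBoss under load is not something this example establishes.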
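I looked up documentation on these settings. For high-concurrency workloads, the recommendation is to set MaximumPoolSize to 125% of the number of concurrent requests.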
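As a quick worked example of that rule of thumb, the arithmetic can be written down as a tiny helper. The class and method names are hypothetical, and rounding up to the next whole thread is an assumption; the recommendation itself does not say how to round.

```java
// Hypothetical helper illustrating the 125% rule of thumb.
final class PoolSizing {

    /** Returns 125% of the expected concurrency, rounded up to a whole thread. */
    static int suggestedMaxPoolSize(int concurrentRequests) {
        if (concurrentRequests < 0) {
            throw new IllegalArgumentException("concurrentRequests must not be negative");
        }
        // ceil(n * 5 / 4) in integer arithmetic; long avoids overflow for large n.
        return (int) ((concurrentRequests * 5L + 3) / 4);
    }

    public static void main(String[] args) {
        for (int n : new int[] {250, 300, 500}) {
            System.out.println(n + " concurrent -> MaximumPoolSize " + suggestedMaxPoolSize(n));
        }
    }
}
```

For 250, 300 and 500 concurrent requests this gives 313, 375 and 625 threads respectively.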
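Since our test load is not sustained concurrency, we first tried a thread pool size of 200 and found that the test ran normally at 300 concurrent requests. We then raised the concurrency to 500 and tested continuously for 6 hours without seeing a single exception.
What I am still curious about is why the test never fails at 250 concurrent requests but fails frequently as soon as it goes above 250. What exactly is the relationship between this threshold and the MaximumPoolSize parameter?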
http://blog.csdn.net/nicholas_lin/article/details/20639481
http://wenku.baidu.com/link?url=eUQiTt73bQN_XBHVNpAhDnSMYfLdfqQXK1AF5Pp2dhTgBrO4nHaws7rEm8WZY5WVIiOEUaX5UQuuQTNCM9DrsNMjetboto1NnikLSEtzH6S