zoukankan      html  css  js  c++  java
  • 一次SocketException:Connection reset 异常排查

    问题描述

    上一期的需求上线之后,线上多了一个异常:Connection reset。如下:

    [2017-03-22 00:45:00 ERROR] [creativeAuditTaskScheduler_Worker-9] (cn.com.ServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset
    org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset
    	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:558)
    	at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:511)
    	at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:248)
    	at cn.com.ServiceImpl.java:136)
    	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:98)
    	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:262)
    	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:95)
    	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
    	at com.sun.proxy.$Proxy28.getAuditResult(Unknown Source)
    	at cn.com.QueryAuditResultTask.queryAuditResult(QueryAuditResultTask.java:117)
    	at cn.com.QueryAuditResultTask.execute(QueryAuditResultTask.java:88)
    	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
    Caused by: java.net.SocketException: Connection reset
    	at java.net.SocketInputStream.read(SocketInputStream.java:196)
    	at java.net.SocketInputStream.read(SocketInputStream.java:122)
    	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    	at sun.security.ssl.InputRecord.read(InputRecord.java:480)
    	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
    	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
    	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    	at org.apache.http.impl.conn.LoggingInputStream.read(LoggingInputStream.java:87)
    	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
    	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
    	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
    	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
    	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
    	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
    	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
    	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
    	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
    	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    	at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
    	at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    	at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:52)
    	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:542)
    	... 19 more
    

    这里使用Spring RestTemplate调外部接口查询结果。Spring RestTemplate 配置如下:

        <bean id="ky.pollingConnectionManager" class="org.apache.http.impl.conn.PoolingHttpClientConnectionManager">
            <!--整个连接池的并发-->
            <property name="maxTotal" value="1000" />
            <!--每个路由的并发-->
            <property name="defaultMaxPerRoute" value="32" />
        </bean>
    
        <bean id="ky.httpClientBuilder" class="org.apache.http.impl.client.HttpClientBuilder" factory-method="create">
            <property name="connectionManager" ref="ky.pollingConnectionManager" />
            <!--开启重试-->
            <property name="retryHandler">
            <!--新加的异常处理,只处理ConnectTimeoutException和UnknownHostException异常-->
            <!--上一版本使用的是org.apache.http.impl.client.DefaultHttpRequestRetryHandler-->
            <!--默认处理InterruptedIOException、UnknownHostException、ConnectException、SSLException 4种异常-->
                <bean class="cn.com.autodx.common.http.CustomRequestRetryHandler">
                    <constructor-arg value="3"/>
                    <constructor-arg value="true"/>
                    <constructor-arg value="200"/>
                </bean>
            </property>
            <!--开启keep-Alive-->
            <property name="keepAliveStrategy">
                <bean class="org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy" />
            </property>
            <property name="defaultHeaders">
                <list>
                    <bean class="org.apache.http.message.BasicHeader">
                        <constructor-arg value="User-Agent"/>
                        <constructor-arg value="Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36"/>
                    </bean>
                    <bean class="org.apache.http.message.BasicHeader">
                        <constructor-arg value="Accept-Encoding"/>
                        <constructor-arg value="gzip,deflate"/>
                    </bean>
                    <bean class="org.apache.http.message.BasicHeader">
                        <constructor-arg value="Accept-Language"/>
                        <constructor-arg value="zh-CN"/>
                    </bean>
                    <bean class="org.apache.http.message.BasicHeader">
                        <constructor-arg value="Connection"/>
                        <constructor-arg value="keep-alive"/>
                    </bean>
                </list>
            </property>
        </bean>
    
        <bean id="ky.httpClient" factory-bean="ky.httpClientBuilder" factory-method="build" />
    
        <bean id="ky.clientHttpRequestFactory" class="org.springframework.http.client.HttpComponentsClientHttpRequestFactory">
            <constructor-arg ref="ky.httpClient"/>
            <!--连接超时时间,毫秒-->
            <property name="connectTimeout" value="2000"/>
            <!--读写超时时间,毫秒-->
            <property name="readTimeout" value="5000"/>
        </bean>
    
        <bean id="restTemplate" class="org.springframework.web.client.RestTemplate">
            <constructor-arg ref="ky.clientHttpRequestFactory"/>
            <property name="errorHandler">
                <bean class="org.springframework.web.client.DefaultResponseErrorHandler"/>
            </property>
            <property name="messageConverters">
                <list>
                    <bean class="org.springframework.http.converter.FormHttpMessageConverter"/>
                    <bean class="cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter"/>
                    <bean class="org.springframework.http.converter.StringHttpMessageConverter">
                        <property name="supportedMediaTypes">
                            <list>
                                <value>text/html;charset=UTF-8</value>
                                <value>application/json</value>
                            </list>
                        </property>
                    </bean>
                </list>
            </property>
        </bean>
    

    本次需求,并没有修改逻辑,为什么会出现这种情况呢?只是网络关系,还是跟代码有关呢。我有几个疑问:

    1. 什么情况下会产生Connection reset?
    2. 长连接中,向server发请求,是先发送数据的,如果连接断开,应该是写数据异常,为什么是读数据异常呢?请求是否发送成功?发送之前有校验连接是否可用吗?
    3. http连接池defaultMaxPerRoute什么意思?每个并发都建立一个长连接吗?
    4. Connection reset之后,如何重新建立连接,继而继续进行业务交互?
    5. RestTemplate中配置了重试,为什么没有重新发起连接?

    我们便来解答上面的问题。

    问题1,什么情况下会产生Connection reset?

    网上搜一下,很多这样的打包附送的答案,如下:

    第1个异常是java.net.BindException:Address already in use: JVM_Bind。该异常发生在服务器端进行new ServerSocket(port)(port是一个0,65536的整型值)操作时。异常的原因是以为与port一样的一个端口已经被启动,并进行监听。此时用netstat –an命令,可以看到一个Listending状态的端口。只需要找一个没有被占用的端口就能解决这个问题。

    第2个异常是java.net.ConnectException: Connection refused: connect。该异常发生在客户端进行new Socket(ip, port)操作时,该异常发生的原因是或者具有ip地址的机器不能找到(也就是说从当前机器不存在到指定ip路由),或者是该ip存在,但找不到指定的端口进行监听。出现该问题,首先检查客户端的ip和port是否写错了,如果正确则从客户端ping一下服务器看是否能ping通,如果能ping通(服务服务器端把ping禁掉则需要另外的办法),则看在服务器端的监听指定端口的程序是否启动,这个肯定能解决这个问题。

    第3个异常是java.net.SocketException: Socket is closed,该异常在客户端和服务器均可能发生。异常的原因是己方主动关闭了连接后(调用了Socket的close方法)再对网络连接进行读写操作。

    第4个异常是java.net.SocketException: (Connection reset或者Connect reset by peer:Socket write error)。该异常在客户端和服务器端均有可能发生,引起该异常的原因有两个,第一个就是如果一端的Socket被关闭(或主动关闭或者因为异常退出而引起的关闭),另一端仍发送数据,发送的第一个数据包引发该异常(Connect reset by peer)。另一个是一端退出,但退出时并未关闭该连接,另一端如果在从连接中读数据则抛出该异常(Connection reset)。简单的说就是在连接断开后的读和写操作引起的。

    第5个异常是java.net.SocketException: Broken pipe。该异常在客户端和服务器均有可能发生。在第4个异常的第一种情况中(也就是抛出SocketExcepton:Connect reset by peer:Socket write error后),如果再继续写数据则抛出该异常。前两个异常的解决方法是首先确保程序退出前关闭所有的网络连接,其次是要检测对方的关闭连接操作,发现对方关闭连接后自己也要关闭该连接。

    这里我们关心的是第四个异常,即server已经关闭了连接,client仍然在从连接中读数据。

    细节剖析

    正常流程(成功日志)剖析

    接下来,先逐步debug,分析调用成功的日志。

    org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)

    (org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domainName/creative/getAuditInfo"
    (org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json]
    (org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default
    (org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context
    

    org.apache.http.impl.execchain.MainClientExec.execute

    1. 获取ConnectionRequest connManager.requestConnection(route, userToken);
      调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
    (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
    
    1. 根据ConnectionRequest,获取HttpClientConnection [调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection]
      获取连接后,校验连接不可用,则关闭连接connection。后面判断connection没有打开,就会重新建立连接
    (org.apache.http.wire:?) - http-outgoing-798 << "end of stream" 
     //isStale返回true     
    (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-798: Close connection    
    (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 799][route: {s}->https://domainName:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]
    

    3.如果managedConn没有打开, 则建立路由 establishRoute

    (org.apache.http.impl.execchain.MainClientExec:?) - Opening connection {s}->https://domainName:443
    

    调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect
    调用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect 遍历地址集,新建socket,并与connection绑定,成功即返回。 以下是建立连接的过程

    (org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connecting to domainName/0.0.0.0:443
    

    调用 sock = sf.connectSocket(connectTimeout, sock, host, remoteAddress, localAddress, context); //建立socket
    conn.bind(sock); //绑定socket

    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Connecting socket to domainName/0.0.0.0:443 with timeout 2000
    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled protocols: [TLSv1]
    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled cipher suites:[]  //session.getCipherSuite()
    

    调用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket sslsock.startHandshake();

    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Starting handshake
    

    调用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket.verifyHostname(sslsock, host.getHostName()); session可用,打印信息

    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Secure session established
    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  negotiated protocol: TLSv1  
    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  peer principal: CN=*.home.com.cn, OU=IT, O="Beijing Autohome Information Technology Co., Ltd.", L=Beijing, ST=Beijing, C=CN
    (org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  issuer principal: CN=Symantec Class 3 Secure Server CA - G4, OU=Symantec Trust Network, O=Symantec Corporation, C=US
    

    调用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect

    (org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connection established localhost:4357<->0.0.0.0:443
    
    1. 设置socket超时时间 managedConn.setSocketTimeout(timeout);
    (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-799: set socket timeout to 5000
    
    1. 开始处理请求 根据request头部参数,作相应操作
    (org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
    (org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
    (org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
    
    1. 处理请求 requestExecutor.execute(request, managedConn, context);
      调用:org.apache.http.protocol.HttpRequestExecutor.execute
      主要功能:
      HttpResponse response = doSendRequest(request, conn, context);
      if (response == null) {
          response = doReceiveResponse(request, conn, context);
      }
    

    6.1 doSendRequest
    6.1.1 发送请求头部
    调用org.apache.http.impl.DefaultBHttpClientConnection.sendRequestHeader (确保socket有效,然后向缓存写请求头部,写完后打印以下日志)

    (org.apache.http.headers:?) - http-outgoing-799 >> GET /creative/getAuditInfo HTTP/1.1
    (org.apache.http.headers:?) - http-outgoing-799 >> Accept: application/json, application/*+json, text/html, application/json
    (org.apache.http.headers:?) - http-outgoing-799 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
    (org.apache.http.headers:?) - http-outgoing-799 >> Accept-Encoding: gzip,deflate
    (org.apache.http.headers:?) - http-outgoing-799 >> Accept-Language: zh-CN
    (org.apache.http.headers:?) - http-outgoing-799 >> Connection: keep-alive
    (org.apache.http.headers:?) - http-outgoing-799 >> Host: domainName
    

    6.1.2 发送请求头部之后,如果请求带有entity,则继续发送entity,即conn.sendRequestEntity。这里对HTTP 1.0协议做了兼容判断

    6.1.3 通过connection将所有缓存数据发送到服务端,并记录日志如下: (headers和wire的日志稍有不同。)

    (org.apache.http.wire:?) - http-outgoing-799 >> "GET /creative/getAuditInfo HTTP/1.1[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "Accept: application/json, application/*+json, text/html, application/json[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Encoding: gzip,deflate[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Language: zh-CN[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "Connection: keep-alive[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "Host: domainName[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 >> "[
    ][
    ]"
    

    6.2 doReceiveResponse 如果response为空,则接收服务端的响应

     response = conn.receiveResponseHeader();
     if (canResponseHaveBody(request, response)) {
         conn.receiveResponseEntity(response);
     }
    

    6.2.1 接收响应头 receiveResponseHeader
    --- 读入数据,头部和数据体,第一次读取内容【这里数据包读了两次】 << 表示接收数据,从输入流读入

    (org.apache.http.wire:?) - http-outgoing-799 << "HTTP/1.1 200 OK[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Server: 10.29[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Date: Wed, 22 Mar 2017 01:50:00 GMT[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Content-Type: application/json;charset=UTF-8[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Transfer-Encoding: chunked[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Connection: keep-alive[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Pragma: no-cache[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Cache-Control: no-cache, no-store, max-age=0[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Expires: Thu, 01 Jan 1970 00:00:00 GMT[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "Content-Language: zh-CN[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "6f7[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "{"data":{......... data1
    

    --- 接收完之后,打印日志

    (org.apache.http.headers:?) - http-outgoing-799 << HTTP/1.1 200 OK
    (org.apache.http.headers:?) - http-outgoing-799 << Server: 10.29
    (org.apache.http.headers:?) - http-outgoing-799 << Date: Wed, 22 Mar 2017 01:50:00 GMT
    (org.apache.http.headers:?) - http-outgoing-799 << Content-Type: application/json;charset=UTF-8
    (org.apache.http.headers:?) - http-outgoing-799 << Transfer-Encoding: chunked
    (org.apache.http.headers:?) - http-outgoing-799 << Connection: keep-alive
    (org.apache.http.headers:?) - http-outgoing-799 << Pragma: no-cache
    (org.apache.http.headers:?) - http-outgoing-799 << Cache-Control: no-cache, no-store, max-age=0
    (org.apache.http.headers:?) - http-outgoing-799 << Expires: Thu, 01 Jan 1970 00:00:00 GMT
    (org.apache.http.headers:?) - http-outgoing-799 << Content-Language: zh-CN
    

    6.2.2 如果响应含有消息体,则接收消息体

    1. 检查reuse策略和keepAlive策略,设置connection属性。 这里永久保留
      (org.apache.http.impl.execchain.MainClientExec:?) - Connection can be kept alive indefinitely

    org.springframework.web.client.RestTemplate

    1. 处理请求response = request.execute(); 日志如上
      2.如果响应没有出错,则打印以下日志
    (org.springframework.web.client.RestTemplate:?) - GET request for "https://domainName/creative/getAuditInfo?" resulted in 200 (OK)
    
    1. 读取扩展字段 extractData

    3.1使用HttpMessageConverter读取并解析数据
    调用org.springframework.http.converter.GenericHttpMessageConverter.read方法,从输入流中读取数据

    (org.springframework.web.client.RestTemplate:?) - Reading [java.lang.String] as "application/json;charset=UTF-8" using [cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter@56a00a64]
    

    3.2如果上一次的数据没有读全,这里会接着从输入流读取数据

    (org.apache.http.wire:?) - http-outgoing-799 << "......  data2"
    (org.apache.http.wire:?) - http-outgoing-799 << "[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "0[
    ][
    ]"
    (org.apache.http.wire:?) - http-outgoing-799 << "[
    ][
    ]"
    

    3.3 数据读取完毕,就释放连接
    调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection

    (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection [id: 799][route: {s}->https://domainName:443] can be kept alive indefinitely
    (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 799][route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
    

    RestTemplate调用完毕

    (cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:?) - getAuditResult .....
    

    总结一下流程:

    • 创建连接请求
    • 根据连接请求的参数,从连接池中获取一个连接
    • 如果连接没有打开,则创建一个底层的socket连接。
    • 设置socket超时时间
    • 发送请求头部(如果请求中带有entity,则发送)
    • 接收响应(先接收头部,如果有主体,则接收)
    • 读取扩展数据(使用HttpMessageConverter读取并解析数据,读取完成后,关闭输入流及释放连接池中的连接)
    • 调用完毕,返回数据

    本次异常日志剖析

    接下来,查看失败日志:
    org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)

     (org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domain/creative/getAuditInfo"
     (org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json]
     (org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default
     (org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context
    

    org.apache.http.impl.execchain.MainClientExec.execute

    1. 获取ConnectionRequest connManager.requestConnection(route, userToken);
      调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
     (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domain:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
    

    2.根据ConnectionRequest,获取HttpClientConnection
    调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection
    其间获取entry时,校验connection().isStale()。

        public boolean isStale() {
            if (!isOpen()) {
                return true;
            }
            try {
            	//如果测试结果返回-1说明不可用
                final int bytesRead = fillInputBuffer(1);
                return bytesRead < 0;
            } catch (final SocketTimeoutException ex) {
                //注意这里SocketTimeoutException时,认为是可用的
                return false;
            } catch (final IOException ex) {
                //有I/O异常,不可用
                return true;
            }
        }
    

    这里读超时,返回连接可用。于是后面,就没有关闭连接。也就没有重新建立新连接。具体参见这篇文章

     (org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Read timed out"
     (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]
    

    设置socket超时时间 managedConn.setSocketTimeout(timeout);

     (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: set socket timeout to 5000
    

    处理请求,并发送请求数据。处理过程同成功日志中的流程。

     (org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
     (org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
     (org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
     (org.apache.http.headers:?) - http-outgoing-766 >> GET /creative/getAuditInfo HTTP/1.1
     (org.apache.http.headers:?) - http-outgoing-766 >> Accept: application/json, application/*+json, text/html, application/json
     (org.apache.http.headers:?) - http-outgoing-766 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
     (org.apache.http.headers:?) - http-outgoing-766 >> Accept-Encoding: gzip,deflate
     (org.apache.http.headers:?) - http-outgoing-766 >> Accept-Language: zh-CN
     (org.apache.http.headers:?) - http-outgoing-766 >> Connection: keep-alive
     (org.apache.http.headers:?) - http-outgoing-766 >> Host: domain
     (org.apache.http.wire:?) - http-outgoing-766 >> "GET /creative/getAuditInfo HTTP/1.1[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "Accept: application/json, application/*+json, text/html, application/json[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Encoding: gzip,deflate[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Language: zh-CN[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "Connection: keep-alive[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "Host: domain[
    ][
    ]"
     (org.apache.http.wire:?) - http-outgoing-766 >> "[
    ][
    ]"
    

    接收相应数据时,出现I/O异常,关闭连接,并向上抛出异常。

     (org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Connection reset"
     (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Close connection
     (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Shutdown connection
     (org.apache.http.impl.execchain.MainClientExec:?) - Connection discarded
     (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 0 of 32; total allocated: 0 of 1000]
     (cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://domain/creative/getAuditInfo":Connection reset; nested exception is java.net.SocketException: Connection reset
    

    对比失败与成功的日志,发现问题在于获取connection的时候,校验connection是否可用的操作上。此时服务器因为不可知的原因断开了连接(服务端不可以向客户端发数据),这里应该是没有按照正常流程进行四次挥手,所以客户端还保持着连接(可以向服务端发数据,但收不到数据)。测试连接时,客户端读超时(必然的),但此时认为连接可用,实际上不可用(不知道这里是不是认为给的1ms探测时间太短了,允许读超时?),然后就没有重新建立连接。将错误操作延迟到读取请求这一步。

    现在可以回答前面的问题

    • 长连接中,向server发请求,是先发送数据的,如果连接断开,应该是写数据异常,为什么是读数据异常呢?请求是否发送成功?发送之前有校验连接是否可用吗?
      本次异常发生在发送完请求,读取response的时候,所以是read异常。既然服务端连接断掉,请求应该是没有发送成功。发送之前有检查连接是否可用,然而检查认为连接可用。

    • http连接池defaultMaxPerRoute什么意思?每个并发都建立一个长连接吗?
      为每一个路由建立一个连接池,连接数最大为defaultMaxPerRoute。每一个连接都是一个socket连接。如果配置为长连接,则是长连接。

    • Connection reset之后,如何重新建立连接,继而继续进行业务交互?
      上面的分析,connection reset之后,把有问题的连接关闭掉了,所以,后面不会再使用这个连接,只要重试,一般是可以成功的。

    • RestTemplate中配置了重试,为什么没有重新发起连接?
      在org.apache.http.impl.execchain.RetryExec的execute(调用MainClientExec的execute,但是被RestTemplate的doExecute调用)中,有如下重试处理:

        public CloseableHttpResponse execute(final HttpRoute route,
                final HttpRequestWrapper request, final HttpClientContext context,
                final HttpExecutionAware execAware) throws IOException, HttpException {
            ......
    
            //不停重试,停下的判断在retryHandler.retryRequest
            for (int execCount = 1;; execCount++) {
                try {
                    //执行操作
                    return this.requestExecutor.execute(route, request, context, execAware);
                } catch (final IOException ex) {
                    ......
    
                    //判断是否符合重试的条件   
                    if (retryHandler.retryRequest(ex, execCount, context)) {
                        //重试的话就打印日志
                        if (this.log.isInfoEnabled()) {
                            this.log.info("I/O exception ("+ ex.getClass().getName() +
                                    ") caught when processing request to "
                                    + route + ": " + ex.getMessage());
                        }
                        //是否可以重试
                        if (!RequestEntityProxy.isRepeatable(request)) {
                            this.log.debug("Cannot retry non-repeatable request");
                            throw new NonRepeatableRequestException("Cannot retry request " +
                                    "with a non-repeatable request entity", ex);
                        }
                        request.setHeaders(origheaders);
                    } else {
                        //不符合重试条件,就抛出异常
                        if (ex instanceof NoHttpResponseException) {
                            final NoHttpResponseException updatedex = new NoHttpResponseException(
                                    route.getTargetHost().toHostString() + " failed to respond");
                            updatedex.setStackTrace(ex.getStackTrace());
                            throw updatedex;
                        } else {
                            throw ex;
                        }
                    }
                }
            }
        }
    

    上一个版本中,没有connection reset的Error日志,这一个版本就有了。如果说新版本上线之后网络才有问题,可能性小。所以,以前和现在都会偶尔有网络问题产生。
    如果之前版本有网络问题,为什么没有Error日志呢?是否有重试呢?是否通过重试补救了呢?
    查看一下以前的info日志,发现的确有重试,也的确补救成功(没有类似的Connection reset报出)。

    //3.11
    (org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset
    
    //3.12
    (org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset
    

    然后查看新版本上线之后的日志,没有发现重试日志,说明SocketException能被DefaultHttpRequestRetryHandler处理,而不能被CustomRequestRetryHandler处理。在对比两者的异常处理类型,发现DefaultHttpRequestRetryHandler处理的ConnectException extends SocketException,所以能处理。因为,我们只要在CustomRequestRetryHandler中,加入SocketException或者ConnectException,就可以通过重试解决网络问题。

    至此,我们搞清楚了这个Connection reset的问题,并且找到了解决方案,开心ing~~

    补充与参考

    几个重要的类说明:

    • MainClientExec
      执行流程核心类,execute方法。这里有源码分析,还不错。

    • PoolingHttpClientConnectionManager
      用来管理connection,方法如:requestConnection(从连接池中lease一个连接,连接的状态可能是关闭的),connect(真正建立底层socket连接),releaseConnection(释放连接池中的连接)等。这里有一篇分析。

    • 如何检查httpConnection连接是否有效

    方法一,org.apache.http.HttpConnection的isStale()方法
        public boolean isStale() {
            //没有打开,即socket为空,则不可用
            if (!isOpen()) {
                return true;
            }
            try {
            //socket链路有了,测试链路是否可用
            //这里的测试方法是查看很短的时间内(这里是1ms),是否可以从输入流中读到数据
            //如果测试结果返回-1说明不可用
                final int bytesRead = fillInputBuffer(1);
                return bytesRead < 0;
            } catch (final SocketTimeoutException ex) {
                //注意这里SocketTimeoutException时,认为是可用的
                return false;
            } catch (final IOException ex) {
                //有I/O异常,不可用
                return true;
            }
        }
    
    方法二、org.apache.http.HttpClientConnection的isResponseAvailable方法

    有几种实现方法,但大都被弃用。查看org.apache.http.impl.DefaultBHttpClientConnection的实现。

        @Override
        public boolean isResponseAvailable(final int timeout) throws IOException {
            ensureOpen();
            try {
                return awaitInput(timeout);
            } catch (final SocketTimeoutException ex) {
               //这里与isStale不同,SocketTimeoutException时认为不可用
                return false;
            }
        }
    
        protected boolean awaitInput(final int timeout) throws IOException {
            if (this.inbuffer.hasBufferedData()) {
                return true;
            }
            //BHttpConnectionBase 中的fillInputBuffer方法,与isStale的测试方法一致
            fillInputBuffer(timeout);
            return this.inbuffer.hasBufferedData();
        }
    

    比较两种方法:isResponseAvailable暂时没有找到使用的地方;isStale在MainClientExec.execute中有直接使用。使用的话需要配置staleConnectionCheckEnabled,即每次请求都要去检测(最高耗时30ms)一次,看起来有一点悲观锁的意思,对性能影响比较大。所以4.4版本开始,默认值为false,且功能被标识为过时。这种方法不推荐。

    if (config.isStaleConnectionCheckEnabled()) {
        // validate connection
        if (managedConn.isOpen()) {
            this.log.debug("Stale connection check");
            if (managedConn.isStale()) {
                this.log.debug("Stale connection detected");
                managedConn.close();
            }
        }
    }
    
    方法三:官方推荐的org.apache.http.impl.conn.PoolingHttpClientConnectionManager#getValidateAfterInactivity()

    对非活动的永久连接,每个validateAfterInactivity毫秒(默认2s)做一次链路检查,尽量确保在使用的时候是可用的。为什么说尽量呢?本文就是一个例子,2s检查没有问题,但在使用之前的2s内网络出了问题,这就没有办法了。

    validateAfterInactivity的使用,在org.apache.http.pool.AbstractConnPool中,getPoolEntryBlocking方法。

    if (entry.isExpired(System.currentTimeMillis())) {
        entry.close();
    } else if (this.validateAfterInactivity > 0) {
        if (entry.getUpdated() + this.validateAfterInactivity <= System.currentTimeMillis()) {
            if (!validate(entry)) {
                entry.close();
            }
        }
    }
    

    详细解释在这里

  • 相关阅读:
    前端获取后台传输过来是数据 {张三:12} 解析为[object object],获取其中内容
    Idea 配置Jrebel热部署
    软件清单
    js实现敲回车键登录
    myql忽略大小写问题解决
    磁盘分区(2):格式化与挂载文件系统
    磁盘分区(1):fdisk和parted
    初识Docker:BusyBox容器后台运行失败
    安装Docker:解决container-selinux >= 2.9问题
    重启nginx:端口被占用问题
  • 原文地址:https://www.cnblogs.com/shoren/p/httpclient-connectionreset.html
Copyright © 2011-2022 走看看