zoukankan      html  css  js  c++  java
  • RabbitMQ 队列处理导致连接池耗尽 Tomcat 假死问题排查处理

    背景:
    监听器针对RabbitMQ队列做业务数据处理
    系统问题表现:
    业务系统无法正常使用,所有请求均不予相应,报404异常
    控制台问题表现:
    接收队列数据的logger日志打印,但是相关sql不打印(之前sql打印)
    报错异常:
    dbcp连接池(开始使用)

    [WARN ] 19:01:05.762 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - SQL Error: 1040, SQLState: 08004
    [ERROR] 19:01:05.762 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - Data source rejected establishment of connection,  message from server: "Too many connections"
    Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Cannot open connection
    Caused by: org.hibernate.exception.JDBCConnectionException: Cannot open connection
    Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection,  message from server: "Too many connections"
    
    

    druid连接池(为方便监控,后来修改采用,后续的测试分析均采用druid连接池后所得)

    [WARN ] 17:35:23.659 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - SQL Error: 0, SQLState: null
    [ERROR] 17:35:23.659 [SimpleAsyncTaskExecutor-1] o.h.util.JDBCExceptionReporter - wait millis 10000, active 152, maxActive 500, creating 0
    [ERROR] 17:35:31.106 [QuartzScheduler_scheduler-LAPTOP-B40EFQQR1565686921066_ClusterManager] o.s.s.quartz.LocalDataSourceJobStore - ClusterManager: Error managing cluster: Failed to obtain DB connection from data source 'springNonTxDataSource.scheduler': com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
    org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'springNonTxDataSource.scheduler': com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
    Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
    com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection,  message from server: "Too many connections"
    [WARN ] 17:35:43.742 [SimpleAsyncTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer - Execution of Rabbit message listener failed, and no ErrorHandler has been set.
    org.springframework.amqp.rabbit.listener.ListenerExecutionFailedException: Listener threw exception
    Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.GenericJDBCException: Cannot open connection
    Caused by: org.hibernate.exception.GenericJDBCException: Cannot open connection
    Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10000, active 152, maxActive 500, creating 0
    
    

    监听器核心代码:

    protected void doBusiness(String msg, String queueName) {
    	List<WeldingDto> list;
    	list = new LinkedList<>();
    	//校验数据合法性
    	super.validatePattern(msg, list, WeldingDto.class);
    	if (!list.isEmpty()) {
    		//保存wip表相关信息
    		WeldingDto weldingDto = list.get(0);
    		EquipDto equipDto = this.wipTempService.getEquipDto(weldingDto.getEquipCode(), null);
    		List<WorkOrder> workingOrderList = this.workOrderDao.getWorkingOrderListByEquip(equipDto.getId());
    		//注释部分业务代码,不影响问题分析
    	}
    }
    
    

    问题原因分析:
    虽然druid"活跃连接数"小于"最大连接数",但是"活跃连接数"已达到mysql数据库配置的最大连接数(SHOW STATUS LIKE 'Max_used_connections';);
    逻辑连接打开次数 > 逻辑连接关闭次数 说明:连接泄露;
    最终导致池中连接数被耗尽,池中连接数变为0;
    佐证问题测试发现:
    将队列清空,然后每丢一条数据,连接泄露数+N
    排查方案:
    采用druid连接池的超时回收机制,配置如下:

    <!-- 超过时间限制是否回收 -->  
    <property name="removeAbandoned" value="true" />  
    <!-- 超时时间,单位为秒 -->  
    <property name="removeAbandonedTimeout" value="20" />  
    <!-- 关闭abanded连接时输出错误日志 -->  
    <property name="logAbandoned" value="true" />
    

    添加以上配置后,可以在druid监视器的"数据源datasource"的"活跃连接堆栈查看ActiveConnection StackTrace"
    看到异常日志,异常日志可以看出连接占用未关闭的代码位置。
    注:该配置影响系统性能,建议排查问题时打开,系统运行时关闭。

    异常日志如下:

    java.lang.Thread.getStackTrace(Thread.java:1559)
    com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1209)
    com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1135)
    com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1125)
    com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:104)
    org.springframework.orm.hibernate3.LocalDataSourceConnectionProvider.getConnection(LocalDataSourceConnectionProvider.java:83)
    org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446)
    org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167)
    org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:161)
    org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1616)
    org.hibernate.loader.Loader.doQuery(Loader.java:717)
    org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:270)
    org.hibernate.loader.Loader.doList(Loader.java:2449)
    org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2192)
    org.hibernate.loader.Loader.list(Loader.java:2187)
    org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:452)
    org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:363)
    org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:196)
    org.hibernate.impl.SessionImpl.list(SessionImpl.java:1258)
    org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)
    com.wisdytech.linkmes.plan.dao.impl.WorkOrderDaoImpl.getWorkingOrderListByEquip(WorkOrderDaoImpl.java:331)
    com.wisdytech.linkmes.msghandler.listeners.business.WeldingQueueListener.doBusiness(WeldingQueueListener.java:109)
    com.wisdytech.linkmes.msghandler.listeners.CommonQueueListener.onMessage(CommonQueueListener.java:70)
    org.springframework.amqp.rabbit.listener.adapter.MessageListenerAdapter.onMessage(MessageListenerAdapter.java:334)
    org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:546)
    org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:472)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$001(SimpleMessageListenerContainer.java:61)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$1.invokeListener(SimpleMessageListenerContainer.java:111)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.invokeListener(SimpleMessageListenerContainer.java:627)
    org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:454)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:480)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:464)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$300(SimpleMessageListenerContainer.java:61)
    org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:558)
    java.lang.Thread.run(Thread.java:748)
    
    

    找到产生问题的指定代码位置后,分析代码发现getSession()创建后没有对该创建session关闭。
    解决方案1:手动关闭

    Session session = this.getSession();
    org.hibernate.Query query = session.createQuery(hql.toString());
    List<XXX> list = query.list();
    session.close();
    

    解决方案2:使用回调机制,通过spring关闭

    this.getHibernateTemplate().executeFind(new HibernateCallback<Object>() {
    	@Override
    	public Object doInHibernate(Session session) throws HibernateException {
    		org.hibernate.Query query = session.createQuery(hql.toString());//输入参数类型final
    		List<XXX> list1 = query.list();
    		return list1;
    	}
    });
    

    //下面两个方法自带回调

    this.getHibernateTemplate().find(hql.toString());
    this.getHibernateTemplate().get(XXX.class,"id");
    
    getSession() 需手动关闭
    getHibernateTemplate() 经过spring封装添加相应的声明式事务管理 spring管理

    注:建议使用getHibernateTemplate(),针对getHibernateTemplate()未提供的方法,可以使用HibernateCallback回调方法管理数据库,代码如上。
    解决方案3:加入事务
    本次问题是在监听器中调用dao层方法,dao层没有加入事务;可以在监听器最外层套一层有事务的service方法或者给dao层添加事务。

  • 相关阅读:
    kubernetes之常见故障排除(一)
    kubernetes集群管理命令(三)
    kubernetes集群管理命令(二)
    kubernetes集群管理常用命令一
    kubernetes集群管理之通过jq来截取属性
    kubernetes管理之使用yq工具截取属性
    kubectl技巧之通过jsonpath截取属性
    kubectl技巧之通过go-template截取属性
    kubernetes容器编排之定义环境变量以及通过downwardapi把pod信息作为环境变量传入容器内
    kubectl技巧之查看资源列表,资源版本和资源schema配置
  • 原文地址:https://www.cnblogs.com/chonghaojie/p/11348109.html
Copyright © 2011-2022 走看看