  • [Redis connection timeout] Troubleshooting a RedisConnectionFailureException in production

    Project architecture:

      Some of the components in use:

      SpringCloudAlibaba(Nacos+Gateway+OpenFeign)+SpringBoot2.x+Redis

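    Background:

      As the user base grew recently, the user service occasionally hit Redis connection timeouts during peak hours.

      For example, reading a phone verification code from Redis, storing the token in Redis after a successful login, and every other code path that touches Redis could throw RedisConnectionFailureException.

      Exception log: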

    2021-03-02 17:24:42.595 ERROR [d03f845825644cee8753539f24d840ad] [http-nio-7122-exec-32] c.l.c.b.e.GlobalExceptionHandler -java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
            at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:65)
            at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:42)
            at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
            at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
            at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:135)
            at org.springframework.data.redis.connection.jedis.JedisStringCommands.convertJedisAccessException(JedisStringCommands.java:751)
            at org.springframework.data.redis.connection.jedis.JedisStringCommands.get(JedisStringCommands.java:67)
            at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:260)
            at org.springframework.data.redis.connection.DefaultStringRedisConnection.get(DefaultStringRedisConnection.java:398)
            at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
            at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60)
            at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228)
            at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188)
            at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
            at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
            at com.xxxx.xxx.xxx.utils.RedisUtil.get(RedisUtil.java:242)

      Redis-related Maven dependencies:

      <!-- redis -->
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-redis</artifactId>
                <exclusions>
                    <exclusion>
                        <groupId>io.lettuce</groupId>
                        <artifactId>lettuce-core</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
    
            <dependency>
                <groupId>redis.clients</groupId>
                <artifactId>jedis</artifactId>
            </dependency>

      Redis configuration (single node, no distributed deployment):

    spring: 
        redis:
          pool:
          maxActive: 300
          maxIdle: 100
          maxWait: 1000
          host: xxxxxxxxx
          port: 6379
          password:
          timeout: 2000
          database: 5

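    Troubleshooting:

      The possible causes I considered were:

      Cause 1. The code contains a query such as keys *. Redis executes commands on a single thread, so with a large data set one long-running command makes other client requests time out. Queries like keys * are very expensive for Redis.

      Cause 2. The timeout set in the Redis configuration is too short: the previous request has not finished, the next one cannot be executed, and it eventually times out and fails.

      Cause 3. The Redis connection pool is configured with too few connections. Prometheus monitoring showed a peak request volume of 180 for the user service, so the question was whether a pool that is too small prevented requests from getting a Redis connection and made them fail.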

      

      Cause 1:

        I went through the project code and found no keys * style queries, so this possibility was ruled out.
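        To make the reasoning behind cause 1 concrete, here is a minimal sketch that uses only the JDK and no Redis at all. A single-threaded executor stands in for the thread on which Redis executes commands, and a sleeping task stands in for an expensive command. The class name, method name, and durations are hypothetical and chosen purely for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SingleThreadBlockingSketch {

    /**
     * Queues one slow command followed by one fast command on a single worker
     * thread and reports whether the fast command misses the client-side timeout.
     */
    static boolean fastCommandTimesOut(long slowCommandMillis, long clientTimeoutMillis) {
        // One worker thread stands in for the thread on which commands are executed.
        ExecutorService commandThread = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for an expensive command such as "keys *" on a large data set.
            commandThread.submit(() -> {
                Thread.sleep(slowCommandMillis);
                return "slow command done";
            });
            // A cheap command queued behind it, e.g. reading a verification code.
            Future<String> fast = commandThread.submit(() -> "value");
            try {
                fast.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
                return false; // answered in time
            } catch (TimeoutException e) {
                return true;  // the caller gave up while the slow command was still running
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            commandThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Slow command of 500 ms, caller waits at most 50 ms.
        System.out.println(fastCommandTimesOut(500, 50));
        // No slow command in front, caller waits up to 2000 ms.
        System.out.println(fastCommandTimesOut(0, 2000));
    }
}
```

        The fast command itself costs nothing; it only times out because it is queued behind the slow one, which is why a single keys * call can affect unrelated requests.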

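      Cause 2:

        While RedisConnectionFailureException was occurring, I confirmed that the peak number of connections on the Redis server was 15 and that the timeout in the configuration file was 2000 ms. Because the check for cause 1 had already shown there were no very slow queries,

        this possibility was ruled out as well.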
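        For reference, the "Read timed out" text in the log is the message of a plain java.net.SocketTimeoutException, raised when a socket read waits longer than its read timeout. The sketch below reproduces that with the JDK alone: a local server socket that never replies, and a client with a short read timeout. It does not involve Jedis; my understanding is that the configured timeout ends up as this kind of socket read timeout, but treat that mapping as an assumption. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {

    /** Connects to a local server that never writes and reports whether the read timed out. */
    static boolean readTimesOut(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the server is never asked to accept or reply.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis); // read timeout on the client socket
            try {
                client.getInputStream().read();   // blocks because no data ever arrives
                return false;
            } catch (SocketTimeoutException e) {
                return true;                      // same exception type as in the log above
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTimesOut(200));
    }
}
```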

      

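      With causes 1 and 2 ruled out, that left cause 3, the number of connections.

      The configuration showed a maximum of 300 connections, well above the peak of 180, so the configured values looked fine.

      I therefore tested the configuration in the development environment. The project uses the Jedis connection pool rather than Lettuce. (Note: with Spring Boot 2.x, spring-boot-starter-data-redis uses Lettuce by default. To use the Jedis pool you have to exclude the default Lettuce dependency and add Jedis, as in the Maven dependencies above.)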
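      To picture why the pool size matters for cause 3, the following sketch models a connection pool as a java.util.concurrent.Semaphore. It is not the Jedis or commons-pool2 implementation. It also assumes, for illustration only, that the peak of 180 means 180 callers holding a connection at the same instant, which the monitoring figure does not actually tell us. All names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class PoolExhaustionSketch {

    /**
     * Counts how many of the simultaneous callers fail to obtain a connection
     * when nobody returns one in the meantime.
     */
    static int rejectedBorrowers(int maxTotal, int concurrentCallers, long maxWaitMillis) {
        // Each permit stands for one pooled connection.
        Semaphore connections = new Semaphore(maxTotal);
        int rejected = 0;
        try {
            for (int i = 0; i < concurrentCallers; i++) {
                // Wait up to maxWaitMillis for a free connection, then give up.
                if (!connections.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                    rejected++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(rejectedBorrowers(8, 180, 1));   // 172
        System.out.println(rejectedBorrowers(300, 180, 1)); // 0
    }
}
```

      With a limit of 300 every caller gets a connection. With a limit of 8, all but the first 8 callers are left waiting, so the limit that is actually in effect matters far more than the value written in the configuration file.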

      Tracing further into the source code, I found that the class holding the connection-count settings is:

    package org.apache.commons.pool2.impl;
    
    public class GenericObjectPoolConfig<T> extends BaseObjectPoolConfig<T> {
        public static final int DEFAULT_MAX_TOTAL = 8;
        public static final int DEFAULT_MAX_IDLE = 8;
        public static final int DEFAULT_MIN_IDLE = 0;
        private int maxTotal = 8;
        private int maxIdle = 8;
        private int minIdle = 0;
    ...
    
    }

      This configuration class is loaded when the connection pool is initialized at application startup:

        

    package org.springframework.data.redis.connection.jedis;
    
    import java.time.Duration;
    import java.util.Optional;
    
    import javax.net.ssl.HostnameVerifier;
    import javax.net.ssl.SSLParameters;
    import javax.net.ssl.SSLSocketFactory;
    
    import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
    import org.springframework.lang.Nullable;
    
    /**
     * Default implementation of {@literal JedisClientConfiguration}.
     *
     * @author Mark Paluch
     * @author Christoph Strobl
     * @since 2.0
     */
    class DefaultJedisClientConfiguration implements JedisClientConfiguration {
    
        private final boolean useSsl;
        private final Optional<SSLSocketFactory> sslSocketFactory;
        private final Optional<SSLParameters> sslParameters;
        private final Optional<HostnameVerifier> hostnameVerifier;
        private final boolean usePooling;
        private final Optional<GenericObjectPoolConfig> poolConfig;
        private final Optional<String> clientName;
        private final Duration readTimeout;
        private final Duration connectTimeout;
    
        DefaultJedisClientConfiguration(boolean useSsl, @Nullable SSLSocketFactory sslSocketFactory,
                @Nullable SSLParameters sslParameters, @Nullable HostnameVerifier hostnameVerifier, boolean usePooling,
                @Nullable GenericObjectPoolConfig poolConfig, @Nullable String clientName, Duration readTimeout,
                Duration connectTimeout) {
    
            this.useSsl = useSsl;
            this.sslSocketFactory = Optional.ofNullable(sslSocketFactory);
            this.sslParameters = Optional.ofNullable(sslParameters);
            this.hostnameVerifier = Optional.ofNullable(hostnameVerifier);
            this.usePooling = usePooling; 
            this.poolConfig = Optional.ofNullable(poolConfig);
            this.clientName = Optional.ofNullable(clientName);
            this.readTimeout = readTimeout;
            this.connectTimeout = connectTimeout;
        }
    
        // ... (remaining accessor methods omitted in this excerpt)
    }

      Debugging showed that the pool was still using the default connection limits after loading:

        public static final int DEFAULT_MAX_TOTAL = 8;
        public static final int DEFAULT_MAX_IDLE = 8;
        public static final int DEFAULT_MIN_IDLE = 0;
        private int maxTotal = 8;
        private int maxIdle = 8;
        private int minIdle = 0;
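      The effect seen in the debugger can be sketched with a few lines of plain Java. This is a hypothetical stand-in, not Spring Boot's actual binder: it reads pool settings only from keys under a given prefix and otherwise falls back to the commons-pool2 defaults of 8. The flattened key names (kebab-case, with the spring.redis.pool. and spring.redis.jedis.pool. prefixes) are assumptions made for the example.

```java
import java.util.Map;

class PoolBindingSketch {

    // Same default values as the GenericObjectPoolConfig excerpt above.
    static final int DEFAULT_MAX_TOTAL = 8;
    static final int DEFAULT_MAX_IDLE = 8;

    /** Returns {maxTotal, maxIdle}, reading only keys that start with the given prefix. */
    static int[] bindPool(Map<String, String> props, String prefix) {
        int maxTotal = intOrDefault(props, prefix + "max-active", DEFAULT_MAX_TOTAL);
        int maxIdle = intOrDefault(props, prefix + "max-idle", DEFAULT_MAX_IDLE);
        return new int[] {maxTotal, maxIdle};
    }

    private static int intOrDefault(Map<String, String> props, String key, int fallback) {
        String raw = props.get(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String boundPrefix = "spring.redis.jedis.pool."; // the only prefix this sketch reads

        // Keys under a prefix the binder does not read: the values are silently ignored.
        Map<String, String> oldStyle = Map.of(
                "spring.redis.pool.max-active", "300",
                "spring.redis.pool.max-idle", "100");
        int[] ignored = bindPool(oldStyle, boundPrefix);
        System.out.println(ignored[0] + " / " + ignored[1]); // 8 / 8

        // Keys under the prefix the binder reads: the values take effect.
        Map<String, String> newStyle = Map.of(
                "spring.redis.jedis.pool.max-active", "300",
                "spring.redis.jedis.pool.max-idle", "100");
        int[] applied = bindPool(newStyle, boundPrefix);
        System.out.println(applied[0] + " / " + applied[1]); // 300 / 100
    }
}
```

      Nothing fails when a key sits under the wrong prefix; the pool simply keeps its defaults, which is why the problem only showed up under load.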

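    This is most likely the root cause: the maximum connection count set in the configuration file never took effect. In Spring Boot 2.x the Jedis pool settings are read from spring.redis.jedis.pool.*, so the following block no longer has any effect:
     redis:
          pool:
          maxActive: 300
          maxIdle: 100
          maxWait: 1000
     It needs to be changed to:
      redis:
          jedis:
            pool:
              maxActive: 300
              maxIdle: 100
              max-wait: 1000ms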
    
    

      After the change and a restart, the settings took effect and the pool values matched the configuration.




  • Original article: https://www.cnblogs.com/july-sunny/p/14472257.html