zoukankan      html  css  js  c++  java
  • 曹工改bug--这次,我遇到了一个难缠的栈溢出bug,还是日志相关的,真的难

    前言

    前几天,在linux上部署一个war包应用时,tomcat直接起不来,查看tomcat的日志,catalina.out里没啥特别的,但是查看localhost日志,发现栈溢出了。

    [root@localhost logs]# vim localhost.2019-12-26.log 
    
    26-Dec-2019 16:27:31.811 INFO [localhost-startStop-1] org.apache.catalina.core.ApplicationContext.log No Spring WebApplicationInitializer types detected on classpath
    26-Dec-2019 16:27:31.855 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.springframework.web.context.ContextLoaderListener]
     java.lang.StackOverflowError
            at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:58)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
            at org.apache.log4j.Category.<init>(Category.java:57)
            at org.apache.log4j.Logger.<init>(Logger.java:37)
            at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
            at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
            at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
            at org.apache.log4j.Category.<init>(Category.java:57)
            at org.apache.log4j.Logger.<init>(Logger.java:37)
            at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
            at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
            at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
            at org.apache.log4j.Category.<init>(Category.java:57)
            at org.apache.log4j.Logger.<init>(Logger.java:37)
            at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
            at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
            at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
            at org.apache.log4j.Category.<init>(Category.java:57)
            at org.apache.log4j.Logger.<init>(Logger.java:37)
            at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43)
            at org.apache.log4j.LogManager.getLogger(LogManager.java:45)
            at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
            at org.apache.log4j.Category.<init>(Category.java:57)
            at org.apache.log4j.Logger.<init>(Logger.java:37)
    

    当时脑子昏得很,所幸搜索引擎上找到了解决办法,

    https://www.jb51.net/article/143486.htm

    后边呢,忙着改bug,没时间细想,但总感觉还是有点糊里糊涂的。今天彻底排查了一下,清晰多了。

    slf4j与其他日志的关系

    slf4j,通俗来说,只是一个api类型的jar包,没有定义实现;具体的日志实现框架有哪些呢,主要有log4j、logback,大家可以看看下面这个图(懒得自己画了,从前面那个链接里拿的):

    这种接口和实现的关系,在软件设计里,就是为了对上层提供统一的编程入口,比如,我们写日志,只需要使用slf4j里的接口和类,而不用直接使用log4j/logback等,方便替换;这个类似于java的spi机制,比如,java官方定义jdbc接口,各厂商实现jdbc接口,提供出自己的驱动包,比如mysql-driver、oracle-driver等。

    slf4j如何寻找自身的实现

    在spi里,java.util.ServiceLoader通过寻找当前线程类加载器路径下的META-INF/services/接口的全名来寻找实现类;

    在slf4j里,则是通过如下机制,大家可以查看下面的源码:

    https://gitee.com/ckl111/all-simple-demo-in-work/tree/master/slf4j-log4j-stackoverflow-demo/slf4j-no-binder

    这个工程里,我们只添加了如下的maven依赖:

        <dependencies>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
                <version>1.7.25</version>
            </dependency>
    
        </dependencies>
    

    测试代码也很简单:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class Test {
    
        public static void main(String[] args) {
            Logger logger = LoggerFactory.getLogger(Test.class);
            logger.info("hahha");
        }
    }
    

    output如下:

    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    

    只要关注第一行即可,load class org.slf4j.impl.StaticLoggerBinder失败了。这个类,就是slf4j-api和slf4j-api的实现之间的粘合剂。

    画个图:

    我图里说了,两个包内,都是有org.slf4j.impl.StaticLoggerBinder这个类的,我不骗大家,大家可以看下面这个module的源码:

    https://gitee.com/ckl111/all-simple-demo-in-work/tree/master/slf4j-log4j-stackoverflow-demo/slf4j-multi-binder

    pom依赖主要有:

    <dependencies>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
                <version>1.7.24</version>
            </dependency>
            <dependency>
                <groupId>ch.qos.logback</groupId>
                <artifactId>logback-classic</artifactId>
                <version>1.2.3</version>
                <scope>compile</scope>
            </dependency>
    
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
                <version>1.7.25</version>
            </dependency>
    
        </dependencies>
    

    这里面有slf4j的多个实现,logback和slf4j-log4j。同样的测试类,此时会输出:

    SLF4J: Class path contains multiple SLF4J bindings.
    //这里在logback-classic-1.2.3.jar包里找到了包名和类名全匹配的类
    SLF4J: Found binding in [jar:file:/D:/soft/Repo_backup/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    //在slf4j-log4j12也找到了
    SLF4J: Found binding in [jar:file:/D:/soft/Repo_backup/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
    

    所以,slf4j-api的做法其实很原始,你晓得噻,classpath下通常有很多jar包,这里,slf4j-api就是去classpath下找全类名匹配org.slf4j.impl.StaticLoggerBinder的class文件,找到几个算几个,一个没有就报错,多了就警告,然后随便选一个(其实是看哪个class在前面)。

    核心代码:

    slf4j-api包内:org.slf4j.LoggerFactory#bind
    private final static void bind() {
            try {
            	//通过classLoader.getResources(org/slf4j/impl/StaticLoggerBinder.class)查找class文件
                Set<URL> staticLoggerBinderPathSet = findPossibleStaticLoggerBinderPathSet();
                //如果有多个,警告一下
                reportMultipleBindingAmbiguity(staticLoggerBinderPathSet);
    			...
                // StaticLoggerBinder就是前面说了半天的那个类,在slf4j-api里是不存在的,如果整个classpath下都没有,但这里又去调用其静态方法,会直接抛异常NoClassDefFoundError,被catch住;
                // 如果存在呢,就会使用classloader加载StaticLoggerBinder,但这个顺序没法保证,假设有logback-classic和slf4j-log4j两个实现,就看类加载器先加载哪个了
                StaticLoggerBinder.getSingleton();
                INITIALIZATION_STATE = SUCCESSFUL_INITIALIZATION;
                reportActualBinding(staticLoggerBinderPathSet);
                fixSubstituteLoggers();
                replayEvents();
                // release all resources in SUBST_FACTORY
                SUBST_FACTORY.clear();
            } catch (NoClassDefFoundError ncde) {
                String msg = ncde.getMessage();
                if (messageContainsOrgSlf4jImplStaticLoggerBinder(msg)) {
                    INITIALIZATION_STATE = NOP_FALLBACK_INITIALIZATION;
                    Util.report("Failed to load class "org.slf4j.impl.StaticLoggerBinder".");
                    Util.report("Defaulting to no-operation (NOP) logger implementation");
                    Util.report("See " + NO_STATICLOGGERBINDER_URL + " for further details.");
                } else {
                    failedBinding(ncde);
                    throw ncde;
                }
            }
        }
    

    大家可以看我的注释,

    StaticLoggerBinder就是前面说了半天的那个类,在slf4j-api里是不存在的,如果整个classpath下都没有,但这里又去调用其静态方法,会直接抛异常NoClassDefFoundError,被catch住;

    如果存在呢,就会使用classloader加载StaticLoggerBinder,但这个顺序没法保证,假设有logback-classic和slf4j-log4j两个实现,就看类加载器先加载哪个了

    我这里说了一点,存在多个实现的时候,先加载哪个,后加载哪个,全看classloader。

    大家可以实际测一下,一般来说,在windows下和linux下,会有不同的表现的,坑吧,谁让你进了这充满bug的行业呢,很多很奇怪的问题,都是classloader在不同OS下,获取到的jar包顺序不同导致的:

    public class Test {
    
        public static void main(String[] args) throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
            Logger logger = LoggerFactory.getLogger(Test.class);
            logger.info("hahha");
          	//打印jar包的加载顺序
            ClassLoader loader = Test.class.getClassLoader();
            Method getURLs = loader.getClass().getMethod("getURLs");
            getURLs.setAccessible(true);
            URL[] o = (URL[]) getURLs.invoke(loader);
            for (URL url : o) {
                System.out.println(url);
            }
        }
    }
    

    问题真相揭秘

    回到问题,我仔细研究了一晚上,在本地复现了问题,可参考module测试代码:

    https://gitee.com/ckl111/all-simple-demo-in-work/tree/master/slf4j-log4j-stackoverflow-demo/slf4j-log4j-cycle-reference-exclude-log4j-in-pom

    这个module里,pom依赖如下:

    堆栈,发现大概是这样的,图小可以单独tab页查看:

    我这里也有对堆栈的文字解释:

    at org.apache.log4j.Category.(Category.java:57) log4j-over-slf4j
    at org.apache.log4j.Logger.(Logger.java:37) log4j-over-slf4j包,已经死循环了
    at org.apache.log4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:43) log4j-over-slf4j包
    at org.apache.log4j.LogManager.getLogger(LogManager.java:45) log4j-over-slf4j,这个LogManager是log4j包里的,因为log4j-over-slf4j是一个log4j的冒充者,所以它也有这个类
    at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66) slf4j-log4j12,实现了slf4j,进行了static bind的
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358) slf4j-api
    at org.apache.log4j.Category.(Category.java:57) log4j-over-slf4j
    at org.apache.log4j.Logger.(Logger.java:37) log4j-over-slf4j

    关于log4j-over-slf4j,可进一步阅读:

    https://blog.csdn.net/john1337/article/details/76152906

    揭秘过程揭秘

    真相可能足够简单,但是在找真相的过程反而更难一些,因为这个包,其实在windows下跑是没问题的,在linux有问题,魔幻?

    并不魔幻。因为这个war包里,本来是log4j依赖的:

        <dependency>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </dependency>
    
    

    相当于:log4j-over-slf4j 和 log4j 共存,我们说了,log4j-over-slf4j里,提供了log4j的类,包名和类名都一样,谁知道先加载哪一个呢?谁知道,windows下先加载哪个,linux下先加载哪个呢?这个就是要靠运气的时候了,所以是偶现。

    不信你把我们上面测试的module里,照下面这样操作,windows下,立马就好了

    之前不记得采用这种方式来验证,为此,还专门定义了一个自定义classloader,先加载slf4j-log4j jar包,再代理给parent,可参考:

    https://gitee.com/ckl111/all-simple-demo-in-work/tree/master/slf4j-log4j-stackoverflow-demo/slf4j-log4j-cycle-reference-custom-classloader

    总结

    这个问题还是比较有意思的,我也还没有完全弄懂,但大概了解了。

    关于classloader在windows和linux下的不同表现的问题,可参考:

    https://stackoverflow.com/questions/17324975/library-load-order-different-on-two-machines

    https://stackoverflow.com/questions/2179858/how-to-find-which-jars-and-in-what-order-are-loaded-by-a-classloader

    博主之前也写了一篇, 关于这个的:

    了不得,我可能发现了Jar 包冲突的秘密

    关于日志,博主之前还写了一篇:

    面试题:应用中很多jar包,比如spring、mybatis、redis等等,各自用的日志系统各异,怎么用slf4j统一输出?(上)

    本次博客的相关代码:

    https://gitee.com/ckl111/all-simple-demo-in-work/tree/master/slf4j-log4j-stackoverflow-demo

    大家有问题请留言,大家觉得有帮助,请点赞哦,这个也是对我的鼓励。

    谢谢大家!

  • 相关阅读:
    Ubuntu20.04更换软件源
    使用kubeadm安装k8s1.19版本之系统基础环境配置&k8s集群初始化(二)
    k8s如何删除处于terminating状态的ns资源
    k8s如何强制删除pod&pv&pvc和ns&namespace方法
    C语言中的有符号数和无符号整形数转换
    互联网-架构演进
    结合redis使token失效
    有一种爱叫做放手
    js 读取上传的json文件内容
    使用spark-md5获取文件的MD5值
  • 原文地址:https://www.cnblogs.com/grey-wolf/p/12104977.html
Copyright © 2011-2022 走看看