  • Nutch's Logging System  Category: H3_NUTCH  2015-02-17 20:14


    I. How Nutch implements logging

    1. Nutch uses slf4j as its logging facade and log4j as the concrete implementation. For background on both, see:

    http://blog.csdn.net/jediael_lu/article/details/43854571

    http://blog.csdn.net/jediael_lu/article/details/43865571


    2. In Java class files, log messages are written as follows:

    (1) Obtain a Logger object

      public static final Logger LOG = LoggerFactory.getLogger(InjectorJob.class);
    

    (2) Use the Logger to write output

        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        long start = System.currentTimeMillis();
        LOG.info("InjectorJob: starting at " + sdf.format(start));
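
    One detail in the snippet above: start is a long, yet it is passed directly to sdf.format(...). This works because java.text.Format.format(Object) accepts a Number and treats it as epoch milliseconds. The following is a minimal, stdlib-only sketch of that behaviour. The class name StartStamp, the helper message, and the explicit time zone parameter are illustrative assumptions and are not part of Nutch.

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper, not Nutch code: builds the same kind of
// "starting at" message that the snippet above passes to LOG.info.
final class StartStamp {

    static String message(String job, long millis, TimeZone tz) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Fix the time zone so the result does not depend on the machine.
        sdf.setTimeZone(tz);
        // The long is boxed to Long; format(Object) reads it as epoch millis.
        return job + ": starting at " + sdf.format(millis);
    }

    public static void main(String[] args) {
        // prints "InjectorJob: starting at 1970-01-01 00:00:00"
        System.out.println(message("InjectorJob", 0L, TimeZone.getTimeZone("UTC")));
    }
}
```

    In the original snippet no time zone is set, so the timestamp is rendered in the JVM's default zone.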

    3. Define the logging properties in log4j.properties

    # Define some default values that can be overridden by system properties
    hadoop.log.dir=.
    hadoop.log.file=hadoop.log
    
    # RootLogger - DailyRollingFileAppender
    log4j.rootLogger=INFO,DRFA
    
    # Logging Threshold
    log4j.threshold=ALL
    
    #special logging requirements for some commandline tools
    log4j.logger.org.apache.nutch.crawl.Crawl=INFO,cmdstdout
    log4j.logger.org.apache.nutch.crawl.InjectorJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.host.HostInjectorJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.crawl.GeneratorJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.crawl.DbUpdaterJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.host.HostDbUpdateJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.fetcher.FetcherJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.indexer.IndexingJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.indexer.DeleteDuplicates=INFO,cmdstdout
    log4j.logger.org.apache.nutch.indexer.CleaningJob=INFO,cmdstdout
    log4j.logger.org.apache.nutch.crawl.WebTableReader=INFO,cmdstdout
    log4j.logger.org.apache.nutch.host.HostDbReader=INFO,cmdstdout
    log4j.logger.org.apache.nutch.parse.ParserChecker=INFO,cmdstdout
    log4j.logger.org.apache.nutch.indexer.IndexingFiltersChecker=INFO,cmdstdout
    log4j.logger.org.apache.nutch.plugin.PluginRepository=WARN
    log4j.logger.org.apache.nutch.api.NutchServer=INFO,cmdstdout
    
    log4j.logger.org.apache.nutch=INFO
    log4j.logger.org.apache.hadoop=WARN
    log4j.logger.org.apache.zookeeper=WARN
    log4j.logger.org.apache.gora=WARN
    
    #
    # Daily Rolling File Appender
    #
    
    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    # Rollover at midnight
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    
    # 30-day backup
    #log4j.appender.DRFA.MaxBackupIndex=30
    log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    
    # Pattern format: Date LogLevel LoggerName LogMessage
    log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
    # Debugging Pattern format: Date LogLevel LoggerName (FileName:MethodName:LineNo) LogMessage
    #log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    
    #
    # stdout
    # Add *stdout* to rootlogger above if you want to use this 
    #
    
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    #
    # plain layout used for commandline tools to output to console
    #
    log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
    log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.cmdstdout.layout.ConversionPattern=%m%n
    
    #
    # Rolling File Appender
    #
    
    #log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    #log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    # Logfile size and 30-day backups
    #log4j.appender.RFA.MaxFileSize=1MB
    #log4j.appender.RFA.MaxBackupIndex=30
    
    #log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    #log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
    #log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
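
    The first lines of the file say that hadoop.log.dir and hadoop.log.file are defaults that system properties can override. In log4j 1.x, a ${key} reference is looked up as a JVM system property first and in the properties file second. The sketch below imitates that lookup order with the standard library only, so it can be run without log4j. The class LogFileLocation and its methods are hypothetical names chosen for illustration, and the code is a simplified model rather than log4j's own implementation.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of how the DRFA file path is resolved.
final class LogFileLocation {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Expand ${key} references: a JVM system property wins, otherwise
    // the value from the properties file is used (empty if absent).
    static String expand(String value, Properties fileProps) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String replacement = System.getProperty(key);
            if (replacement == null) {
                replacement = fileProps.getProperty(key, "");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The two defaults defined at the top of the file above.
    static Properties articleDefaults() {
        Properties p = new Properties();
        p.setProperty("hadoop.log.dir", ".");
        p.setProperty("hadoop.log.file", "hadoop.log");
        return p;
    }

    public static void main(String[] args) {
        String file = "${hadoop.log.dir}/${hadoop.log.file}";
        System.out.println(expand(file, articleDefaults()));
    }
}
```

    Run without extra flags, the sketch resolves the path to ./hadoop.log. Started with -Dhadoop.log.dir=/tmp/logs (an example path), it resolves to /tmp/logs/hadoop.log.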

    II. Analysis of Nutch logging

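    1. Nutch's log output uses two appenders: cmdstdout and DRFA.

    The former writes log messages to standard output; the latter writes them to a log file that is rolled over once a day.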


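    2. The root logger, and therefore the project-wide default, is set to INFO,DRFA.

    The Nutch command-line tool classes (InjectorJob, GeneratorJob, FetcherJob, and so on) are redefined as INFO,cmdstdout. Because log4j appenders are additive by default, their messages go to the console and also to the DRFA file. The remaining org.apache.nutch loggers are set to INFO with no appender of their own, so they write only to DRFA.

    hadoop, gora, and zookeeper are redefined as WARN and inherit the DRFA appender from the root logger. With the default values in the file, the log file is ./hadoop.log.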
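
    The inheritance rules behind this analysis can be sketched in a few lines of plain Java. In log4j 1.x a logger takes its level from the nearest configured ancestor, and, with additivity left at its default of true, it writes to its own appenders plus those of every ancestor up to the root. The class LoggerResolution below is a hypothetical, simplified model of those two rules, not log4j's API. It loads only a handful of the logger lines from the file, and the class names ParseUtil and JobClient are used merely as examples of loggers that have no line of their own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of log4j 1.x level and appender inheritance.
final class LoggerResolution {

    private final Map<String, String> levels = new HashMap<>();
    private final Map<String, List<String>> appenders = new HashMap<>();

    // Register one "log4j.logger.<name>=<level>[,<appender>...]" line.
    // The empty name stands for the root logger.
    LoggerResolution configure(String name, String level, String... appenderNames) {
        levels.put(name, level);
        appenders.put(name, Arrays.asList(appenderNames));
        return this;
    }

    private static String parent(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(0, dot);
    }

    // Level of the nearest configured ancestor (or of the logger itself).
    String effectiveLevel(String name) {
        for (String n = name; ; n = parent(n)) {
            String level = levels.get(n);
            if (level != null || n.isEmpty()) {
                return level;
            }
        }
    }

    // Own appenders first, then those of each ancestor up to the root.
    List<String> effectiveAppenders(String name) {
        List<String> out = new ArrayList<>();
        for (String n = name; ; n = parent(n)) {
            out.addAll(appenders.getOrDefault(n, Collections.<String>emptyList()));
            if (n.isEmpty()) {
                return out;
            }
        }
    }

    // A subset of the logger lines from the log4j.properties above.
    static LoggerResolution fromArticle() {
        return new LoggerResolution()
            .configure("", "INFO", "DRFA")
            .configure("org.apache.nutch.crawl.InjectorJob", "INFO", "cmdstdout")
            .configure("org.apache.nutch.plugin.PluginRepository", "WARN")
            .configure("org.apache.nutch", "INFO")
            .configure("org.apache.hadoop", "WARN");
    }

    public static void main(String[] args) {
        LoggerResolution r = fromArticle();
        for (String name : Arrays.asList(
                "org.apache.nutch.crawl.InjectorJob",
                "org.apache.nutch.parse.ParseUtil",
                "org.apache.hadoop.mapred.JobClient")) {
            System.out.println(name + " -> " + r.effectiveLevel(name)
                + " " + r.effectiveAppenders(name));
        }
    }
}
```

    Under this model InjectorJob resolves to INFO with cmdstdout and DRFA, an ordinary Nutch class to INFO with DRFA only, and a Hadoop class to WARN with DRFA only.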


    Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.

  • Original article: https://www.cnblogs.com/lujinhong2/p/4637218.html