  • Nutch 2.x: A Collection of Common Problems

    1. In nutch2.3-snapshot, a null batchId triggers a NullPointerException:

    Exception in thread "main" java.lang.NullPointerException
    at org.apache.nutch.parse.ParserJob.getBatchIdFilter(ParserJob.java:265)
    at org.apache.nutch.parse.ParserJob.run(ParserJob.java:253)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:69)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:174)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:253)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:260)

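    I don't fully understand this yet: a null batchId should belong to URLs that have not been fetched, so it is unclear how one ends up in ParserJob. Setting that question aside for now, the workaround is simply to ignore a null batchId. Add a null check to the getBatchIdFilter function in ParserJob.java so that it returns as soon as it sees null, as shown here: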

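     private MapFieldValueFilter<String, WebPage> getBatchIdFilter(String batchId) {
        // Added null check: return early instead of calling equals() on a null batchId.
        if (batchId == null || batchId.equals(REPARSE.toString())
            || batchId.equals(Nutch.ALL_CRAWL_ID.toString())) {
          return null;
        }
        // ... rest of the method unchanged ...
     }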
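
    The guard above can be tried out in isolation. What follows is only a minimal sketch, not Nutch code: the class BatchIdGuard, the method filterKey, and the two constant values are hypothetical stand-ins for ParserJob.REPARSE and Nutch.ALL_CRAWL_ID, and Optional takes the place of the real MapFieldValueFilter return value so that the example runs without Nutch or Gora on the classpath.

```java
import java.util.Optional;

public class BatchIdGuard {
    // Hypothetical stand-ins for ParserJob.REPARSE and Nutch.ALL_CRAWL_ID;
    // the literal values are assumptions made for this sketch.
    static final String REPARSE = "-reparse";
    static final String ALL_CRAWL_ID = "-all";

    /**
     * Returns the batch ID a filter would be built for, or an empty Optional
     * when no filter is needed (null, reparse marker, or "all" marker).
     */
    static Optional<String> filterKey(String batchId) {
        // Check for null first so that no method is ever invoked on a null reference.
        if (batchId == null
                || batchId.equals(REPARSE)
                || batchId.equals(ALL_CRAWL_ID)) {
            return Optional.empty();
        }
        return Optional.of(batchId);
    }

    public static void main(String[] args) {
        System.out.println(filterKey(null).isPresent());       // false
        System.out.println(filterKey("-reparse").isPresent()); // false
        System.out.println(filterKey("-all").isPresent());     // false
        System.out.println(filterKey("batch-1").isPresent());  // true
    }
}
```

    An alternative with the same effect is to put the constant on the left, for example REPARSE.equals(batchId), which is null-safe without a separate check. The explicit null test is kept here because it mirrors the change made in ParserJob.java.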
  • Original article: https://www.cnblogs.com/e-life/p/4122623.html