  • Exception: http://www.ly.com/news/visa.html: java.io.IOException: unzipBestEffort returned null

    Nutch runtime exception: http://www.ly.com/news/visa.html: java.io.IOException: unzipBestEffort returned null
    
    Reference: http://www.tuicool.com/articles/faUB73

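    This page is served with chunked transfer encoding, but the Nutch crawler reads the response body as non-chunked by default. The chunk framing is therefore still in the bytes handed to the GZIP stream, so constructing the GZIP stream fails and the subsequent GZIP decompression fails with it.
    Whether a response is chunked can be seen in the HTTP headers: a chunked response carries the header transfer-encoding: chunked.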
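
    To make the failure mode concrete, here is a minimal sketch that does not use Nutch at all. It gzips a short string, wraps the result in chunked framing, and tries to open both versions with java.util.zip.GZIPInputStream. The class and method names (ChunkedGzipFailureDemo, gzip, chunk, canGunzip) are made up for this illustration, and the chunk size of 8 is arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Nutch.
public class ChunkedGzipFailureDemo {

    /** Gzip-compresses a string, as a server sending Content-Encoding: gzip would. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Wraps a body in chunked framing: "<hex size>\r\n<data>\r\n" per chunk, then "0\r\n\r\n". */
    static byte[] chunk(byte[] body, int chunkSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < body.length; off += chunkSize) {
            int len = Math.min(chunkSize, body.length - off);
            out.write((Integer.toHexString(len) + "\r\n").getBytes(StandardCharsets.US_ASCII));
            out.write(body, off, len);
            out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
        }
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }

    /** Returns true if the bytes can be opened and fully read as a gzip stream. */
    static boolean canGunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] tmp = new byte[1024];
            while (in.read(tmp) != -1) {
                // drain the stream; we only care whether reading succeeds
            }
            return true;
        } catch (IOException e) {
            // The chunk-size line precedes the gzip magic bytes, so the header check fails.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello nutch");
        byte[] framed = chunk(compressed, 8);
        System.out.println("plain gzip body readable:   " + canGunzip(compressed));
        System.out.println("chunk-framed body readable: " + canGunzip(framed));
    }
}
```

    The framed body begins with a chunk-size line instead of the gzip magic bytes, which is why a reader that skips chunk decoding cannot decompress it.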

    Solution:


    Step 1 (modify the main program)
    cd /codes/download/apache-nutch-1.2/src/java/org/apache/nutch/metadata/
    vim HttpHeaders.java
    Add the field:
     public final static String TRANSFER_ENCODING = "Transfer-Encoding";

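    Step 2 (modify the protocol-http plugin)
    cd /codes/download/apache-nutch-1.2/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/
    In the code that reads the response body (presumably HttpResponse.java, where readPlainContent is defined), choose the reader based on the Transfer-Encoding header: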

    String transferEncoding = getHeader(Response.TRANSFER_ENCODING);
    if (transferEncoding != null && "chunked".equalsIgnoreCase(transferEncoding.trim())) {
      // chunked response: decode the chunk framing while reading
      this.readChunkedContent(in, line);
    } else {
      // non-chunked response: read the body as-is
      readPlainContent(in);
    }
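
    For readers who want to see what decoding a chunked body involves, the following is a simplified, standalone sketch of the same header-based dispatch. It is not Nutch's readChunkedContent; the names ChunkedBodyReader, readChunked, readPlain and readBody are hypothetical, and trailers after the last chunk are not parsed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sketch, not the Nutch implementation.
public class ChunkedBodyReader {

    /** Reads one line as ASCII, dropping the CR/LF terminator. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    /** Decodes repeated "<hex size>[;ext]\r\n<data>\r\n" chunks until a zero-size chunk. */
    static byte[] readChunked(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');          // ignore chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            sizeLine = sizeLine.trim();
            if (sizeLine.isEmpty()) throw new IOException("missing chunk size");
            int size = Integer.parseInt(sizeLine, 16);
            if (size == 0) break;                      // last chunk
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {
                int n = in.read(chunk, read, size - read);
                if (n == -1) throw new IOException("truncated chunk");
                read += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                              // consume the CRLF after the chunk data
        }
        return body.toByteArray();
    }

    /** Reads the stream to the end without any decoding. */
    static byte[] readPlain(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        int n;
        while ((n = in.read(tmp)) != -1) body.write(tmp, 0, n);
        return body.toByteArray();
    }

    /** Picks the reader from the Transfer-Encoding header value, mirroring the patch above. */
    static byte[] readBody(String transferEncoding, InputStream in) throws IOException {
        if (transferEncoding != null && "chunked".equalsIgnoreCase(transferEncoding.trim())) {
            return readChunked(in);
        }
        return readPlain(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "5\r\nhello\r\n6\r\n nutch\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] body = readBody("chunked", new ByteArrayInputStream(wire));
        System.out.println(new String(body, StandardCharsets.US_ASCII)); // prints "hello nutch"
    }
}
```

    Once the body has been de-chunked this way, the bytes that reach the GZIP step are the compressed payload alone.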
    
    

    Step 3: recompile: ant, ant jar

    Step 4: copy nutch-1.2.job and nutch-1.2.jar from the build folder into the corresponding directory under bin,
            and copy build/protocol-http/protocol-http.jar into the corresponding plugins directory under bin

    Tested and working.




  • Original article: https://www.cnblogs.com/i80386/p/3956766.html