  • Nutch 1.4 fails with an error when running the crawler to index a website.

    The command and its output:

    Administrator@f523540 ~
    $ cd  /cygdrive/d/nutch/apache-nutch-1.4-bin/runtime/local/
    
    Administrator@f523540 /cygdrive/d/nutch/apache-nutch-1.4-bin/runtime/local
    $ ./bin/nutch crawl urls -dir crawl -topN 5  -depth 3
    cygpath: can't convert empty path
    solrUrl is not set, indexing will be skipped...
    crawl started in: crawl
    rootUrlDir = urls
    threads = 10
    depth = 3
    solrUrl=null
    topN = 5
    Injector: starting at 2012-06-17 13:47:45
    Injector: crawlDb: crawl/crawldb
    Injector: urlDir: urls
    Injector: Converting injected urls to crawl db entries.
    Exception in thread "main" java.io.IOException: Job failed!
            at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
            at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
            at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
            at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
    

     Environment: Cygwin on Windows XP, Java 1.6, Nutch 1.4. Has anyone run into this problem before? Any help would be appreciated!
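The "Job failed!" line is thrown by Hadoop's `JobClient.runJob` and only says that the Injector job did not succeed; the task's own exception normally ends up in the Nutch log file instead of the console. The sketch below is a hypothetical diagnostic helper, assuming the default local runtime layout (`runtime/local/logs/hadoop.log`) and that the seed list lives in the `urls` directory passed to `crawl`. It checks that the seed directory is non-empty and prints the tail of the log so the underlying error can be read.

```shell
# Hypothetical helper: pass the runtime/local directory (defaults to ".").
check_nutch_run() {
  dir="${1:-.}"
  log="$dir/logs/hadoop.log"   # assumed default log location in local mode

  # The Injector reads seed URLs from the "urls" directory given to "crawl";
  # warn if it is missing or contains no non-empty lines.
  if ! cat "$dir"/urls/* 2>/dev/null | grep -q .; then
    echo "seed directory $dir/urls is missing or empty" >&2
  fi

  # Print the most recent log lines, where the job's real exception is usually recorded.
  if [ -f "$log" ]; then
    tail -n 50 "$log"
  else
    echo "log file not found: $log" >&2
  fi
}
```

Usage, from the directory the crawl was started in: `check_nutch_run /cygdrive/d/nutch/apache-nutch-1.4-bin/runtime/local`.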

  • Original post: https://www.cnblogs.com/likehua/p/2552582.html