  • Nutch 2.1 installation: a collection of problems

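    I followed the official documentation at http://nlp.solutions.asia/?p=180
    For the problems I ran into along the way, I used this post as a reference for the fixes:
    http://blog.javachen.com/2014/05/20/nutch-intro/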


    Problem 1:

    compile-core:
        [javac] Compiling 180 source files to /root/nutch/build/classes
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._zipfs.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunec.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunjce_provider.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunpkcs11.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._jfxrt.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._dnsns.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._nashorn.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._localedata.jar; error in opening zip file
        [javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._cldrdata.jar; error in opening zip file
        [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
        [javac] 9 errors
        [javac] 1 warning

    BUILD FAILED
    /root/nutch/build.xml:101: Compile failed; see the compiler error output for details.

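    The original ext folder has no ._*.jar files, only jars with the same base names (zipfs.jar and so on). Copying those over directly let the build pass.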
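To see which files javac is tripping over, it can help to list every `._*.jar` entry in the ext directory. Files with a `._` prefix are usually AppleDouble metadata files that macOS leaves behind when an archive is copied or unpacked; that this is their origin here is an assumption, not something the log confirms. The class name `DotUnderscoreJars` and the method `findStrayJars` are made up for this sketch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DotUnderscoreJars {

    /** Returns the sorted names of all files matching ._*.jar directly inside dir. */
    public static List<String> findStrayJars(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        // The glob treats the leading dot as an ordinary character.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "._*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default path copied from the error log above; pass a different one as the first argument.
        Path ext = Paths.get(args.length > 0 ? args[0] : "/usr/lib/jvm/jdk1.8.0_20/jre/lib/ext");
        for (String name : findStrayJars(ext)) {
            System.out.println(name);
        }
    }
}
```

The sketch only reports the files; what to do with them (replace them, as above, or remove them) is left to the reader.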



    Problem 2:

    root@iZ280izbfjqZ:~/nutch/runtime/local# bin/nutch crawl urls -depth 3 -topN 5
    Exception in thread "main" java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:259)
        at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

    For the fix, see the following article:

    http://blog.sina.com.cn/s/blog_3c9872d00101p4f0.html
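Before changing any configuration, a quick way to confirm the diagnosis is to ask the JVM whether the class named in the exception is visible on a given classpath. The following is a minimal sketch using only `Class.forName`; the class name `GoraStoreCheck` and the helper `isOnClasspath` are hypothetical and not part of Nutch or Gora.

```java
public class GoraStoreCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, GoraStoreCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default class name taken from the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.gora.sql.store.SqlStore";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

Run it with the same jars Nutch uses on the classpath (assuming they sit under `lib/` in `runtime/local`, something like `java -cp "lib/*:." GoraStoreCheck`). If it reports NOT found, the jar containing the SQL store is missing from the build.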


    Problem 3:

    root@iZ280izbfjqZ:~/nutch/runtime/local# bin/nutch crawl urls -depth 3 -topN 5
    InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
    InjectorJob: total number of urls rejected by filters: 0
    InjectorJob: total number of urls injected after normalization and filtering: 1
    Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: *, jobid=job_local1888916405_0002
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)




    Open nutch/src/java/org/apache/nutch/crawl/GeneratorReducer.java and look at around line 100:


    batchId=new Utf8(conf.get(GeneratorJob.BATCH_ID));


    Change it to:
    int randomSeed = Math.abs(new Random().nextInt());
    String batchIdStr = (System.currentTimeMillis()/1000)+"-"+randomSeed;
    batchId = new Utf8( batchIdStr );
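The replacement builds the batch ID as "seconds since the epoch, a dash, a random non-negative integer" instead of reading it from the job configuration. The standalone sketch below shows the same format without the Nutch and Avro dependencies, so it returns a plain `String` rather than a `Utf8`. The class name `BatchIdSketch` and the method `newBatchId` are illustrative only. One deliberate difference: it uses `Random.nextInt(bound)` instead of `Math.abs(nextInt())`, because `Math.abs(Integer.MIN_VALUE)` is still negative.

```java
import java.util.Random;

public class BatchIdSketch {

    /**
     * Builds an ID of the form "<epoch seconds>-<random int>".
     * The clock value and the Random are passed in so the result can be reproduced.
     */
    public static String newBatchId(long nowMillis, Random random) {
        // nextInt(bound) always returns a value in [0, bound), so no Math.abs is needed.
        int randomSeed = random.nextInt(Integer.MAX_VALUE);
        return (nowMillis / 1000) + "-" + randomSeed;
    }

    public static void main(String[] args) {
        System.out.println(newBatchId(System.currentTimeMillis(), new Random()));
    }
}
```

In `GeneratorReducer.java` the resulting string would still be wrapped as `new Utf8(batchIdStr)`, as in the change above.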


    Problem 4:
    Solution:

    alter table webpage add batchId varchar(767) DEFAULT NULL;




    After that, everything worked. Time to celebrate.

  • Original post: https://www.cnblogs.com/jpfss/p/7890588.html