  • Running the Twenty Newsgroups Classification example with Mahout

    According to the Mahout wiki (https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups), a single command is enough to run this algorithm:

    mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh 

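    However, my first run failed. I was not using the root account, so I first changed the work path: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, so that the user mahout has permission to work there. It still failed, this time reporting that the curl command could not be found. Fair enough, I had not installed it; sudo apt-get install curl took care of that. Ubuntu really is convenient.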
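    The path change described above can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the Mahout distribution: relocate_work_dir and have_curl are hypothetical helper names, and the sketch assumes the script refers to its work area only through the /tmp/mahout-work- prefix mentioned above.

```shell
# Hypothetical helper: rewrite the work-directory prefix inside a script.
# $1 = path to the script, $2 = new prefix (for example /home/mahout/mahout-work-)
relocate_work_dir() {
    script="$1"
    new_prefix="$2"
    # Keep the original as <script>.bak, then replace every occurrence
    # of the old prefix. '|' is the sed delimiter so the slashes in the
    # paths need no escaping.
    sed -i.bak "s|/tmp/mahout-work-|${new_prefix}|g" "$script"
}

# Hypothetical helper: succeeds only if curl is available on PATH.
have_curl() {
    command -v curl >/dev/null 2>&1
}
```

    For example, relocate_work_dir classify-20newsgroups.sh /home/mahout/mahout-work- edits the script in place and leaves the original in classify-20newsgroups.sh.bak. If have_curl fails, sudo apt-get install curl installs it on Ubuntu.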

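    I ran it again, and it still failed about two-thirds of the way through. Looking at the detailed output, the number of map input records was 0. What did that mean? Most likely, local file operations and HDFS file operations had been mixed up. In fact, before executing

    + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq

    the local 20news-all directory should be uploaded to HDFS; after that, re-running the first command works. The complete output of that run is reproduced below (it is long, so I am not sure all of it will fit).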
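    The upload itself does not appear in the output, so here is one way it might look. This is a sketch under assumptions: upload_to_hdfs and the HADOOP_BIN override are hypothetical names introduced for illustration, the target directory is assumed not to exist on HDFS yet, and the HDFS path simply mirrors the local one because seqdirectory is given that same absolute path. hadoop fs -mkdir and hadoop fs -put are the standard Hadoop file system shell commands.

```shell
# Hypothetical helper: copy a local directory to the same absolute path on HDFS.
# HADOOP_BIN is an assumed override hook; it defaults to the `hadoop` launcher.
upload_to_hdfs() {
    local_dir="$1"
    hadoop_bin="${HADOOP_BIN:-hadoop}"
    parent_dir="$(dirname "$local_dir")"
    # On Hadoop 1.x, -mkdir also creates missing parent directories;
    # Hadoop 2.x and later need `-mkdir -p` instead.
    # A failure here (for example, directory already exists) is tolerated.
    "$hadoop_bin" fs -mkdir "$parent_dir" || true
    # Copy the local directory tree to the identical path on HDFS.
    "$hadoop_bin" fs -put "$local_dir" "$local_dir"
}
```

    With this in place, upload_to_hdfs /home/mahout/mahout-work-mahout/20news-all would be run once before re-running classify-20newsgroups.sh.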

    mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh 
    Please select a number to choose the corresponding task to run
    1. cnaivebayes
    2. naivebayes
    3. sgd
    4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
    Enter your choice : 2
    ok. You chose 2 and we'll use naivebayes
    creating work directory at /home/mahout/mahout-work-mahout
    + echo 'Preparing 20newsgroups data'
    Preparing 20newsgroups data
    + rm -rf /home/mahout/mahout-work-mahout/20news-all
    + mkdir /home/mahout/mahout-work-mahout/20news-all
    + cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware 
/home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
    + echo 'Creating sequence files from 20newsgroups data'
    Creating sequence files from 20newsgroups data
    + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
    + echo 'Converting sequence files to vectors'
    Converting sequence files to vectors
    + ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
    13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
    13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
    13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056
    13/08/26 23:43:18 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:43:45 INFO mapred.JobClient:  map 78% reduce 0%
    13/08/26 23:43:51 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056
    13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19
    13/08/26 23:43:56 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:43:56 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=32883
    13/08/26 23:43:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:43:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:43:56 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:43:56 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:43:56 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/08/26 23:43:56 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:43:56 INFO mapred.JobClient:     Bytes Written=27503580
    13/08/26 23:43:56 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:43:56 INFO mapred.JobClient:     HDFS_BYTES_READ=36694022
    13/08/26 23:43:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=21899
    13/08/26 23:43:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=27503580
    13/08/26 23:43:56 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:43:56 INFO mapred.JobClient:     Bytes Read=36693889
    13/08/26 23:43:56 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:43:56 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:43:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=75157504
    13/08/26 23:43:56 INFO mapred.JobClient:     Spilled Records=0
    13/08/26 23:43:56 INFO mapred.JobClient:     CPU time spent (ms)=5730
    13/08/26 23:43:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=15859712
    13/08/26 23:43:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=974381056
    13/08/26 23:43:56 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:43:56 INFO mapred.JobClient:     SPLIT_RAW_BYTES=133
    13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057
    13/08/26 23:43:57 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:44:15 INFO mapred.JobClient:  map 3% reduce 0%
    13/08/26 23:44:18 INFO mapred.JobClient:  map 23% reduce 0%
    13/08/26 23:44:21 INFO mapred.JobClient:  map 60% reduce 0%
    13/08/26 23:44:24 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:44:48 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057
    13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29
    13/08/26 23:44:53 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:44:53 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:44:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=31312
    13/08/26 23:44:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:44:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:44:53 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:44:53 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:44:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=18422
    13/08/26 23:44:53 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:44:53 INFO mapred.JobClient:     Bytes Written=2315037
    13/08/26 23:44:53 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:44:53 INFO mapred.JobClient:     FILE_BYTES_READ=11857906
    13/08/26 23:44:53 INFO mapred.JobClient:     HDFS_BYTES_READ=27503742
    13/08/26 23:44:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15440401
    13/08/26 23:44:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2315037
    13/08/26 23:44:53 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:44:53 INFO mapred.JobClient:     Bytes Read=27503580
    13/08/26 23:44:53 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:44:53 INFO mapred.JobClient:     Map output materialized bytes=3538084
    13/08/26 23:44:53 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:44:53 INFO mapred.JobClient:     Spilled Records=849345
    13/08/26 23:44:53 INFO mapred.JobClient:     Map output bytes=39462740
    13/08/26 23:44:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
    13/08/26 23:44:53 INFO mapred.JobClient:     CPU time spent (ms)=14080
    13/08/26 23:44:53 INFO mapred.JobClient:     Combine input records=3026242
    13/08/26 23:44:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=162
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce input records=192904
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce input groups=192904
    13/08/26 23:44:53 INFO mapred.JobClient:     Combine output records=554873
    13/08/26 23:44:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=283111424
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce output records=93563
    13/08/26 23:44:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:44:53 INFO mapred.JobClient:     Map output records=2664273
    13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058
    13/08/26 23:44:56 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:45:13 INFO mapred.JobClient:  map 94% reduce 0%
    13/08/26 23:45:16 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:45:43 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058
    13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29
    13/08/26 23:45:48 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:45:48 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:45:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21298
    13/08/26 23:45:48 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:45:48 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:45:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24763
    13/08/26 23:45:48 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:45:48 INFO mapred.JobClient:     Bytes Written=29314118
    13/08/26 23:45:48 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:45:48 INFO mapred.JobClient:     FILE_BYTES_READ=27274291
    13/08/26 23:45:48 INFO mapred.JobClient:     HDFS_BYTES_READ=29440826
    13/08/26 23:45:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=54595105
    13/08/26 23:45:48 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
    13/08/26 23:45:48 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:45:48 INFO mapred.JobClient:     Bytes Read=27503580
    13/08/26 23:45:48 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:45:48 INFO mapred.JobClient:     Map output materialized bytes=27274291
    13/08/26 23:45:48 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:45:48 INFO mapred.JobClient:     Map output bytes=27199343
    13/08/26 23:45:48 INFO mapred.JobClient:     Total committed heap usage (bytes)=215695360
    13/08/26 23:45:48 INFO mapred.JobClient:     CPU time spent (ms)=12980
    13/08/26 23:45:48 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:45:48 INFO mapred.JobClient:     SPLIT_RAW_BYTES=162
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Physical memory (bytes) snapshot=332349440
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:45:48 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059
    13/08/26 23:45:50 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:46:10 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:46:25 INFO mapred.JobClient:  map 100% reduce 92%
    13/08/26 23:46:31 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059
    13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29
    13/08/26 23:46:36 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:46:36 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:46:36 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18217
    13/08/26 23:46:36 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:46:36 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:46:36 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:46:36 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:46:36 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20981
    13/08/26 23:46:36 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:46:36 INFO mapred.JobClient:     Bytes Written=29314118
    13/08/26 23:46:36 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:46:36 INFO mapred.JobClient:     FILE_BYTES_READ=29059398
    13/08/26 23:46:36 INFO mapred.JobClient:     HDFS_BYTES_READ=29314278
    13/08/26 23:46:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58163419
    13/08/26 23:46:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
    13/08/26 23:46:36 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:46:36 INFO mapred.JobClient:     Bytes Read=29314118
    13/08/26 23:46:36 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:46:36 INFO mapred.JobClient:     Map output materialized bytes=29059398
    13/08/26 23:46:36 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:46:36 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:46:36 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:46:36 INFO mapred.JobClient:     Map output bytes=28984080
    13/08/26 23:46:36 INFO mapred.JobClient:     Total committed heap usage (bytes)=205225984
    13/08/26 23:46:36 INFO mapred.JobClient:     CPU time spent (ms)=8650
    13/08/26 23:46:36 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:46:37 INFO mapred.JobClient:     SPLIT_RAW_BYTES=160
    13/08/26 23:46:37 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:46:37 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:46:37 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:46:37 INFO mapred.JobClient:     Physical memory (bytes) snapshot=313606144
    13/08/26 23:46:37 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:46:37 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:46:37 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
    13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060
    13/08/26 23:46:38 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:46:56 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:47:14 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060
    13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29
    13/08/26 23:47:19 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:47:19 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:47:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21504
    13/08/26 23:47:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:47:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:47:19 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:47:19 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:47:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14273
    13/08/26 23:47:19 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:47:19 INFO mapred.JobClient:     Bytes Written=1890073
    13/08/26 23:47:19 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:47:19 INFO mapred.JobClient:     FILE_BYTES_READ=4880788
    13/08/26 23:47:19 INFO mapred.JobClient:     HDFS_BYTES_READ=29314271
    13/08/26 23:47:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6235019
    13/08/26 23:47:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1890073
    13/08/26 23:47:19 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:47:19 INFO mapred.JobClient:     Bytes Read=29314118
    13/08/26 23:47:19 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:47:19 INFO mapred.JobClient:     Map output materialized bytes=1309902
    13/08/26 23:47:19 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:47:19 INFO mapred.JobClient:     Spilled Records=442187
    13/08/26 23:47:19 INFO mapred.JobClient:     Map output bytes=31005336
    13/08/26 23:47:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
    13/08/26 23:47:19 INFO mapred.JobClient:     CPU time spent (ms)=9210
    13/08/26 23:47:19 INFO mapred.JobClient:     Combine input records=2838837
    13/08/26 23:47:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce input records=93564
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce input groups=93564
    13/08/26 23:47:19 INFO mapred.JobClient:     Combine output records=348623
    13/08/26 23:47:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=284684288
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce output records=93564
    13/08/26 23:47:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:47:19 INFO mapred.JobClient:     Map output records=2583778
    13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061
    13/08/26 23:47:20 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:47:38 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:47:53 INFO mapred.JobClient:  map 100% reduce 67%
    13/08/26 23:47:59 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061
    13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29
    13/08/26 23:48:04 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:48:04 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:48:04 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18292
    13/08/26 23:48:04 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:48:04 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:48:04 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=19293
    13/08/26 23:48:04 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:48:04 INFO mapred.JobClient:     Bytes Written=28689283
    13/08/26 23:48:04 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:48:04 INFO mapred.JobClient:     FILE_BYTES_READ=29059398
    13/08/26 23:48:04 INFO mapred.JobClient:     HDFS_BYTES_READ=31204324
    13/08/26 23:48:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58165045
    13/08/26 23:48:04 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
    13/08/26 23:48:04 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:48:04 INFO mapred.JobClient:     Bytes Read=29314118
    13/08/26 23:48:04 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:48:04 INFO mapred.JobClient:     Map output materialized bytes=29059398
    13/08/26 23:48:04 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:48:04 INFO mapred.JobClient:     Map output bytes=28984080
    13/08/26 23:48:04 INFO mapred.JobClient:     Total committed heap usage (bytes)=205225984
    13/08/26 23:48:04 INFO mapred.JobClient:     CPU time spent (ms)=8770
    13/08/26 23:48:04 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:48:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Physical memory (bytes) snapshot=320401408
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:48:04 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062
    13/08/26 23:48:06 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:48:24 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:48:36 INFO mapred.JobClient:  map 100% reduce 33%
    13/08/26 23:48:39 INFO mapred.JobClient:  map 100% reduce 86%
    13/08/26 23:48:48 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:48:53 INFO mapred.JobClient: Job complete: job_201308212334_0062
    13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29
    13/08/26 23:48:53 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:48:53 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:48:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18225
    13/08/26 23:48:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:48:53 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:48:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=21045
    13/08/26 23:48:53 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:48:53 INFO mapred.JobClient:     Bytes Written=28689283
    13/08/26 23:48:53 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:48:53 INFO mapred.JobClient:     FILE_BYTES_READ=28437750
    13/08/26 23:48:53 INFO mapred.JobClient:     HDFS_BYTES_READ=28689443
    13/08/26 23:48:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56920127
    13/08/26 23:48:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
    13/08/26 23:48:53 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:48:53 INFO mapred.JobClient:     Bytes Read=28689283
    13/08/26 23:48:53 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:48:53 INFO mapred.JobClient:     Map output materialized bytes=28437750
    13/08/26 23:48:53 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:48:53 INFO mapred.JobClient:     Map output bytes=28362505
    13/08/26 23:48:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=204603392
    13/08/26 23:48:53 INFO mapred.JobClient:     CPU time spent (ms)=8340
    13/08/26 23:48:53 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:48:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=160
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=313868288
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:48:53 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
    13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035)
    + echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
    Creating training and holdout set with a random 80-20 split of the generated vector dataset
    + ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
    13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]}
    13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines
    13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
    13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
    13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
    13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0
    13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631)
    + echo 'Training Naive Bayes model'
    Training Naive Bayes model
    + ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
    13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
    13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
    13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
    13/08/26 23:49:27 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:49:49 INFO mapred.JobClient:  map 43% reduce 0%
    13/08/26 23:49:52 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:50:13 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
    13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
    13/08/26 23:50:18 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:50:18 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:50:18 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=22816
    13/08/26 23:50:18 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:50:18 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:50:18 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:50:18 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:50:18 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20680
    13/08/26 23:50:18 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:50:18 INFO mapred.JobClient:     Bytes Written=2718605
    13/08/26 23:50:18 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:50:18 INFO mapred.JobClient:     FILE_BYTES_READ=1404371
    13/08/26 23:50:18 INFO mapred.JobClient:     HDFS_BYTES_READ=12669237
    13/08/26 23:50:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2854477
    13/08/26 23:50:18 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2718605
    13/08/26 23:50:18 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:50:18 INFO mapred.JobClient:     Bytes Read=12668431
    13/08/26 23:50:18 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:50:18 INFO mapred.JobClient:     Map output materialized bytes=1404363
    13/08/26 23:50:18 INFO mapred.JobClient:     Map input records=11321
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce shuffle bytes=1404363
    13/08/26 23:50:18 INFO mapred.JobClient:     Spilled Records=40
    13/08/26 23:50:18 INFO mapred.JobClient:     Map output bytes=16682576
    13/08/26 23:50:18 INFO mapred.JobClient:     Total committed heap usage (bytes)=176164864
    13/08/26 23:50:18 INFO mapred.JobClient:     CPU time spent (ms)=8190
    13/08/26 23:50:18 INFO mapred.JobClient:     Combine input records=11321
    13/08/26 23:50:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=148
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce input records=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce input groups=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Combine output records=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Physical memory (bytes) snapshot=294400000
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce output records=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1961967616
    13/08/26 23:50:18 INFO mapred.JobClient:     Map output records=11321
    13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
    13/08/26 23:50:19 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:50:40 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:51:01 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
    13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
    13/08/26 23:51:06 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:51:06 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:51:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=24609
    13/08/26 23:51:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:51:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:51:06 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:51:06 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:51:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=15258
    13/08/26 23:51:06 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:51:06 INFO mapred.JobClient:     Bytes Written=893560
    13/08/26 23:51:06 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:51:06 INFO mapred.JobClient:     FILE_BYTES_READ=362674
    13/08/26 23:51:06 INFO mapred.JobClient:     HDFS_BYTES_READ=2718737
    13/08/26 23:51:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=771195
    13/08/26 23:51:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=893560
    13/08/26 23:51:06 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:51:06 INFO mapred.JobClient:     Bytes Read=2718605
    13/08/26 23:51:06 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:51:06 INFO mapred.JobClient:     Map output materialized bytes=362666
    13/08/26 23:51:06 INFO mapred.JobClient:     Map input records=20
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce shuffle bytes=362666
    13/08/26 23:51:06 INFO mapred.JobClient:     Spilled Records=4
    13/08/26 23:51:06 INFO mapred.JobClient:     Map output bytes=893434
    13/08/26 23:51:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=223264768
    13/08/26 23:51:06 INFO mapred.JobClient:     CPU time spent (ms)=5370
    13/08/26 23:51:06 INFO mapred.JobClient:     Combine input records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=132
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce input records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce input groups=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Combine output records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=300597248
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce output records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1961967616
    13/08/26 23:51:06 INFO mapred.JobClient:     Map output records=2
    13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
    + echo 'Self testing on training set'
    Self testing on training set
    + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
    13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
    13/08/26 23:51:22 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:51:45 INFO mapred.JobClient:  map 51% reduce 0%
    13/08/26 23:51:48 INFO mapred.JobClient:  map 89% reduce 0%
    13/08/26 23:51:54 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
    13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
    13/08/26 23:51:58 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:51:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=34216
    13/08/26 23:51:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:51:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:51:58 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:51:58 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:51:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/08/26 23:51:58 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:51:58 INFO mapred.JobClient:     Bytes Written=2132486
    13/08/26 23:51:58 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:51:58 INFO mapred.JobClient:     HDFS_BYTES_READ=16279896
    13/08/26 23:51:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22523
    13/08/26 23:51:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2132486
    13/08/26 23:51:58 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:51:58 INFO mapred.JobClient:     Bytes Read=12668431
    13/08/26 23:51:58 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:51:58 INFO mapred.JobClient:     Map input records=11321
    13/08/26 23:51:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=87547904
    13/08/26 23:51:58 INFO mapred.JobClient:     Spilled Records=0
    13/08/26 23:51:58 INFO mapred.JobClient:     CPU time spent (ms)=9380
    13/08/26 23:51:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=28131328
    13/08/26 23:51:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=976572416
    13/08/26 23:51:58 INFO mapred.JobClient:     Map output records=11321
    13/08/26 23:51:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=148
    13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
    Summary
    -------------------------------------------------------
    Correctly Classified Instances          :      11256	   99.4258%
    Incorrectly Classified Instances        :         65	    0.5742%
    Total Classified Instances              :      11321
    
    =======================================================
    Confusion Matrix
    -------------------------------------------------------
    a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k   l    	m    	n    	o    	p    	q    	r    	s    	t    	<--Classified as
    454  	0    	0    	1    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	3    	0    	 |  458     a     = alt.atheism
    0    	588  	0    	3    	0    	2    	0    	0    	0    	0    	0   1    	0    	1    	0    	0    	0    	0    	0    	0    	 |  595     b     = comp.graphics
    0    	3    	553  	7    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  563     c     = comp.os.ms-windows.misc
    0    	0    	0    	592  	1    	0    	2    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  595     d     = comp.sys.ibm.pc.hardware
    0    	0    	0    	1    	593  	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  594     e     = comp.sys.mac.hardware
    0    	2    	0    	1    	0    	576  	1    	0    	0    	0    	0   0    	0    	0    	1    	0    	0    	0    	0    	0    	 |  581     f     = comp.windows.x
    0    	1    	0    	0    	0    	0    	579  	0    	0    	0    	0   0    	1    	0    	0    	0    	0    	0    	0    	0    	 |  581     g     = misc.forsale
    0    	0    	0    	0    	0    	0    	1    	594  	0    	0    	0   0    	1    	0    	0    	0    	0    	0    	0    	0    	 |  596     h     = rec.autos
    0    	0    	0    	0    	0    	0    	1    	2    	591  	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  594     i     = rec.motorcycles
    0    	0    	0    	0    	0    	0    	0    	0    	0    	615  	1   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  616     j     = rec.sport.baseball
    0    	0    	0    	0    	0    	0    	1    	0    	0    	1    	581 0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  583     k     = rec.sport.hockey
    0    	0    	1    	0    	0    	0    	0    	0    	0    	0    	0   627  	1    	0    	0    	0    	0    	1    	0    	0    	 |  630     l     = sci.crypt
    0    	0    	0    	2    	0    	0    	1    	0    	0    	0    	0   0    	588  	0    	0    	0    	0    	0    	0    	0    	 |  591     m     = sci.electronics
    0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	586  	1    	0    	0    	0    	0    	0    	 |  588     n     = sci.med
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	615  	0    	0    	0    	0    	0    	 |  615     o     = sci.space
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	619  	1    	0    	0    	0    	 |  620     p     = soc.religion.christian
    1    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	1    	541  	0    	0    	0    	 |  543     q     = talk.politics.mideast
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   1    	0    	0    	0    	0    	0    	560  	0    	0    	 |  561     r     = talk.politics.guns
    3    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	4    	0    	1    	351  	0    	 |  359     s     = talk.religion.misc
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   1    	0    	0    	0    	0    	0    	4    	0    	453  	 |  458     t     = talk.politics.misc
    
    
    13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
    + echo 'Testing on holdout set'
    Testing on holdout set
    + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
    13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
    13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
    13/08/26 23:52:12 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:52:30 INFO mapred.JobClient:  map 85% reduce 0%
    13/08/26 23:52:36 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
    13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
    13/08/26 23:52:41 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:52:41 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=25113
    13/08/26 23:52:41 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:52:41 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:52:41 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:52:41 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:52:41 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/08/26 23:52:41 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:52:41 INFO mapred.JobClient:     Bytes Written=1417942
    13/08/26 23:52:41 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:52:41 INFO mapred.JobClient:     HDFS_BYTES_READ=12148944
    13/08/26 23:52:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22522
    13/08/26 23:52:41 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1417942
    13/08/26 23:52:41 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:52:41 INFO mapred.JobClient:     Bytes Read=8537480
    13/08/26 23:52:41 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:52:41 INFO mapred.JobClient:     Map input records=7525
    13/08/26 23:52:41 INFO mapred.JobClient:     Physical memory (bytes) snapshot=85057536
    13/08/26 23:52:41 INFO mapred.JobClient:     Spilled Records=0
    13/08/26 23:52:41 INFO mapred.JobClient:     CPU time spent (ms)=6630
    13/08/26 23:52:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=28155904
    13/08/26 23:52:41 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=976572416
    13/08/26 23:52:41 INFO mapred.JobClient:     Map output records=7525
    13/08/26 23:52:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=147
    13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
    Summary
    -------------------------------------------------------
    Correctly Classified Instances          :       6801	   90.3787%
    Incorrectly Classified Instances        :        724	    9.6213%
    Total Classified Instances              :       7525
    
    =======================================================
    Confusion Matrix
    -------------------------------------------------------
    a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k   l    	m    	n    	o    	p    	q    	r    	s    	t    	<--Classified as
    318  	0    	0    	0    	1    	0    	0    	0    	1    	0    	0   0    	0    	0    	1    	4    	0    	0    	15   	1    	 |  341     a     = alt.atheism
    1    	318  	7    	20   	4    	7    	7    	2    	0    	1    	0   1    	1    	2    	6    	0    	0    	0    	0    	1    	 |  378     b     = comp.graphics
    0    	25   	277  	78   	12   	15   	5    	0    	0    	0    	0   2    	4    	0    	1    	0    	0    	0    	0    	3    	 |  422     c     = comp.os.ms-windows.misc
    1    	4    	3    	336  	20   	3    	8    	0    	0    	0    	0   1    	11   	0    	0    	0    	0    	0    	0    	0    	 |  387     d     = comp.sys.ibm.pc.hardware
    0    	3    	1    	6    	350  	1    	3    	0    	0    	0    	0   1    	3    	1    	0    	0    	0    	0    	0    	0    	 |  369     e     = comp.sys.mac.hardware
    1    	20   	3    	6    	7    	365  	3    	0    	0    	0    	0   1    	0    	0    	0    	0    	1    	0    	0    	0    	 |  407     f     = comp.windows.x
    0    	1    	1    	19   	8    	0    	329  	13   	1    	0    	0   2    	14   	0    	4    	0    	0    	1    	1    	0    	 |  394     g     = misc.forsale
    0    	2    	1    	2    	3    	1    	10   	361  	8    	0    	0   0    	4    	0    	0    	0    	0    	1    	0    	1    	 |  394     h     = rec.autos
    0    	0    	0    	1    	0    	0    	2    	3    	393  	1    	0   0    	0    	0    	0    	0    	0    	1    	0    	1    	 |  402     i     = rec.motorcycles
    0    	0    	0    	1    	0    	0    	2    	3    	0    	360  	6   0    	2    	2    	1    	0    	0    	0    	0    	1    	 |  378     j     = rec.sport.baseball
    0    	1    	0    	2    	1    	0    	0    	0    	2    	5    	401 0    	1    	0    	0    	1    	0    	0    	0    	2    	 |  416     k     = rec.sport.hockey
    1    	1    	0    	1    	3    	2    	1    	1    	0    	0    	0   344  	1    	1    	2    	0    	1    	1    	0    	1    	 |  361     l     = sci.crypt
    0    	5    	0    	15   	14   	0    	5    	1    	1    	0    	0   2    	348  	1    	1    	0    	0    	0    	0    	0    	 |  393     m     = sci.electronics
    1    	2    	1    	1    	1    	0    	1    	0    	0    	1    	0   1    	4    	381  	5    	0    	0    	1    	1    	1    	 |  402     n     = sci.med
    1    	4    	0    	0    	2    	0    	2    	1    	0    	0    	0   1    	2    	1    	356  	0    	0    	1    	0    	1    	 |  372     o     = sci.space
    5    	0    	0    	1    	1    	0    	0    	1    	0    	0    	1   0    	0    	1    	0    	359  	3    	0    	4    	1    	 |  377     p     = soc.religion.christian
    0    	0    	0    	0    	0    	0    	0    	0    	0    	1    	1   0    	1    	0    	1    	2    	389  	0    	0    	2    	 |  397     q     = talk.politics.mideast
    0    	0    	1    	0    	1    	1    	0    	1    	0    	0    	0   2    	1    	1    	0    	0    	0    	335  	0    	6    	 |  349     r     = talk.politics.guns
    29   	1    	0    	1    	0    	0    	1    	0    	0    	1    	0   0    	0    	0    	2    	24   	0    	8    	197  	5    	 |  269     s     = talk.religion.misc
    2    	0    	0    	0    	2    	0    	0    	1    	0    	1    	1   1    	0    	1    	2    	0    	2    	17   	3    	284  	 |  317     t     = talk.politics.misc
    
    
    13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)
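The percentages in the two Summary blocks and the size of the split can be re-derived from the counts printed in the log above. The following is only a minimal sketch: the helper name `pct` is hypothetical, and every number is copied by hand from the log (the `SplitInput` line, the two Summary blocks, and two rows of the holdout confusion matrix).

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to 4 decimals (hypothetical helper)."""
    return round(100.0 * part / whole, 4)

# Counts reported by utils.SplitInput: "train: 11321, test: 7525"
train_docs, test_docs = 11321, 7525

# Share of documents held out for testing (the script asked for
# --randomSelectionPct 40, so this should be close to 40).
holdout_share = pct(test_docs, train_docs + test_docs)

# "Correctly Classified Instances" from the two Summary blocks
train_accuracy = pct(11256, train_docs)   # self test on the training set
holdout_accuracy = pct(6801, test_docs)   # test on the holdout set

# Per-class recall on the holdout set: diagonal cell / row total,
# for two of the weaker rows in the second confusion matrix.
holdout_rows = {
    "comp.os.ms-windows.misc": (277, 422),
    "talk.religion.misc": (197, 269),
}
recall = {label: pct(hit, total) for label, (hit, total) in holdout_rows.items()}

print("holdout share    :", holdout_share)
print("train accuracy   :", train_accuracy)
print("holdout accuracy :", holdout_accuracy)
for label, value in recall.items():
    print("recall", label, ":", value)
```

The gap between the self-test accuracy and the holdout accuracy is expected, since the first test scores the model on the same documents it was trained on.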

    The job information page shows all of the jobs that were run.


    By going through each job's details and looking at the corresponding mapper and reducer, you can then analyze how the algorithm works.
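To match each job to the step that launched it, one option is to scan the console output and group the `Running job:` lines under the preceding `+ ./bin/mahout ...` command. This is a rough sketch, not part of Mahout or Hadoop: the function name `jobs_by_step` is hypothetical, and the sample text is an abridged excerpt of the log above, with the command arguments dropped.

```python
import re

def jobs_by_step(log_text):
    """Group Hadoop job IDs under the mahout subcommand that launched them.

    Returns a list of (subcommand, [job ids]) pairs in log order, so a
    subcommand that runs twice (such as testnb) gets two entries.
    """
    steps = []
    for line in log_text.splitlines():
        line = line.strip()
        # Lines echoed by the script look like "+ ./bin/mahout trainnb -i ..."
        cmd = re.match(r"\+ \./bin/mahout (\S+)", line)
        if cmd:
            steps.append((cmd.group(1), []))
            continue
        # JobClient announces each job as "Running job: job_<tracker>_<seq>"
        job = re.search(r"Running job: (job_\d+_\d+)", line)
        if job and steps:
            steps[-1][1].append(job.group(1))
    return steps

# Abridged excerpt of the log above (command arguments omitted).
sample = """\
+ ./bin/mahout trainnb
13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
+ ./bin/mahout testnb
13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
+ ./bin/mahout testnb
13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
"""

for subcommand, job_ids in jobs_by_step(sample):
    print(subcommand, job_ids)
```

In this run, `trainnb` launched two MapReduce jobs (0063 and 0064), while each `testnb` call launched a single map-only job.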


    Share, enjoy, grow.


    Please credit the source when reposting: http://blog.csdn.net/fansy1990 




  • Original post: https://www.cnblogs.com/suncoolcat/p/3285734.html