  • Running the Twenty Newsgroups Classification example with Mahout

    According to the Mahout wiki at https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups, a single command is all I need to run this algorithm:

    mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh 

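    However, my first run failed. Since I am not using the root account, I changed the path first: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, so that the user mahout has permission to work there. It still failed, this time reporting that the curl command could not be found. Fair enough, I had not installed it: sudo apt-get install curl, and done. Ubuntu really is convenient.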
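    The two fixes above can be sketched as a pair of small shell functions. This is only an illustration under assumptions: the function names relocate_work_dir and ensure_curl are hypothetical, the sed call keeps a .bak backup that the original steps did not mention, and the -y flag is added so apt-get does not prompt.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, paths follow the post.
set -eu

# Replace the /tmp work-directory prefix inside the example script.
# $1 = path to classify-20newsgroups.sh, $2 = new prefix.
# A backup of the untouched script is kept next to it as <script>.bak.
relocate_work_dir() {
  sed -i.bak "s|/tmp/mahout-work-|$2|g" "$1"
}

# Install curl only when it is not already on the PATH.
ensure_curl() {
  command -v curl >/dev/null 2>&1 || sudo apt-get install -y curl
}

# Example call (not executed here):
#   relocate_work_dir ~/mahout-d-0.7/examples/bin/classify-20newsgroups.sh /home/mahout/mahout-work-
#   ensure_curl
```

    Keeping the backup makes it easy to diff the edited script against the original if a later run misbehaves.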

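    Then I ran it again, and it still failed about two thirds of the way through. Checking the detailed output, I saw that the number of map input records was 0. What does that mean? Most likely local file operations and HDFS file operations had been mixed up. Before this step runs:

    + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq

    the local 20news-all directory should be uploaded to HDFS; rerunning the first command then works. The complete output is reproduced below (it is long, so I am not sure all of it will paste).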
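    The upload described above might look like the following sketch. It assumes the HDFS path should mirror the local work directory, which is what the seqdirectory arguments suggest. The helper names run and upload_to_hdfs and the DRY_RUN switch are hypothetical additions for illustration; only the hadoop fs subcommands (-mkdir, -put, -ls) come from the Hadoop shell.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the locally prepared 20news-all directory to the same path in HDFS.
# $1 = work directory, e.g. /home/mahout/mahout-work-mahout
upload_to_hdfs() {
  # The parent directory may already exist in HDFS, so ignore that error.
  run hadoop fs -mkdir "$1" || true
  run hadoop fs -put "$1/20news-all" "$1/20news-all"
  # List the result to confirm the upload.
  run hadoop fs -ls "$1"
}

# Example call (not executed here):
#   DRY_RUN=1 upload_to_hdfs /home/mahout/mahout-work-mahout
```

    Running it once with DRY_RUN=1 prints the three hadoop commands without touching the cluster, which is a cheap way to check the paths before the real upload.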

    mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh 
    Please select a number to choose the corresponding task to run
    1. cnaivebayes
    2. naivebayes
    3. sgd
    4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
    Enter your choice : 2
    ok. You chose 2 and we'll use naivebayes
    creating work directory at /home/mahout/mahout-work-mahout
    + echo 'Preparing 20newsgroups data'
    Preparing 20newsgroups data
    + rm -rf /home/mahout/mahout-work-mahout/20news-all
    + mkdir /home/mahout/mahout-work-mahout/20news-all
    + cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware 
/home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
    + echo 'Creating sequence files from 20newsgroups data'
    Creating sequence files from 20newsgroups data
    + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
    + echo 'Converting sequence files to vectors'
    Converting sequence files to vectors
    + ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
    13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
    13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
    13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056
    13/08/26 23:43:18 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:43:45 INFO mapred.JobClient:  map 78% reduce 0%
    13/08/26 23:43:51 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056
    13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19
    13/08/26 23:43:56 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:43:56 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=32883
    13/08/26 23:43:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:43:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:43:56 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:43:56 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:43:56 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/08/26 23:43:56 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:43:56 INFO mapred.JobClient:     Bytes Written=27503580
    13/08/26 23:43:56 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:43:56 INFO mapred.JobClient:     HDFS_BYTES_READ=36694022
    13/08/26 23:43:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=21899
    13/08/26 23:43:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=27503580
    13/08/26 23:43:56 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:43:56 INFO mapred.JobClient:     Bytes Read=36693889
    13/08/26 23:43:56 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:43:56 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:43:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=75157504
    13/08/26 23:43:56 INFO mapred.JobClient:     Spilled Records=0
    13/08/26 23:43:56 INFO mapred.JobClient:     CPU time spent (ms)=5730
    13/08/26 23:43:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=15859712
    13/08/26 23:43:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=974381056
    13/08/26 23:43:56 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:43:56 INFO mapred.JobClient:     SPLIT_RAW_BYTES=133
    13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057
    13/08/26 23:43:57 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:44:15 INFO mapred.JobClient:  map 3% reduce 0%
    13/08/26 23:44:18 INFO mapred.JobClient:  map 23% reduce 0%
    13/08/26 23:44:21 INFO mapred.JobClient:  map 60% reduce 0%
    13/08/26 23:44:24 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:44:48 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057
    13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29
    13/08/26 23:44:53 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:44:53 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:44:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=31312
    13/08/26 23:44:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:44:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:44:53 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:44:53 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:44:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=18422
    13/08/26 23:44:53 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:44:53 INFO mapred.JobClient:     Bytes Written=2315037
    13/08/26 23:44:53 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:44:53 INFO mapred.JobClient:     FILE_BYTES_READ=11857906
    13/08/26 23:44:53 INFO mapred.JobClient:     HDFS_BYTES_READ=27503742
    13/08/26 23:44:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15440401
    13/08/26 23:44:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2315037
    13/08/26 23:44:53 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:44:53 INFO mapred.JobClient:     Bytes Read=27503580
    13/08/26 23:44:53 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:44:53 INFO mapred.JobClient:     Map output materialized bytes=3538084
    13/08/26 23:44:53 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:44:53 INFO mapred.JobClient:     Spilled Records=849345
    13/08/26 23:44:53 INFO mapred.JobClient:     Map output bytes=39462740
    13/08/26 23:44:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
    13/08/26 23:44:53 INFO mapred.JobClient:     CPU time spent (ms)=14080
    13/08/26 23:44:53 INFO mapred.JobClient:     Combine input records=3026242
    13/08/26 23:44:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=162
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce input records=192904
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce input groups=192904
    13/08/26 23:44:53 INFO mapred.JobClient:     Combine output records=554873
    13/08/26 23:44:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=283111424
    13/08/26 23:44:53 INFO mapred.JobClient:     Reduce output records=93563
    13/08/26 23:44:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:44:53 INFO mapred.JobClient:     Map output records=2664273
    13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058
    13/08/26 23:44:56 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:45:13 INFO mapred.JobClient:  map 94% reduce 0%
    13/08/26 23:45:16 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:45:43 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058
    13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29
    13/08/26 23:45:48 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:45:48 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:45:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21298
    13/08/26 23:45:48 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:45:48 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:45:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24763
    13/08/26 23:45:48 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:45:48 INFO mapred.JobClient:     Bytes Written=29314118
    13/08/26 23:45:48 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:45:48 INFO mapred.JobClient:     FILE_BYTES_READ=27274291
    13/08/26 23:45:48 INFO mapred.JobClient:     HDFS_BYTES_READ=29440826
    13/08/26 23:45:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=54595105
    13/08/26 23:45:48 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
    13/08/26 23:45:48 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:45:48 INFO mapred.JobClient:     Bytes Read=27503580
    13/08/26 23:45:48 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:45:48 INFO mapred.JobClient:     Map output materialized bytes=27274291
    13/08/26 23:45:48 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:45:48 INFO mapred.JobClient:     Map output bytes=27199343
    13/08/26 23:45:48 INFO mapred.JobClient:     Total committed heap usage (bytes)=215695360
    13/08/26 23:45:48 INFO mapred.JobClient:     CPU time spent (ms)=12980
    13/08/26 23:45:48 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:45:48 INFO mapred.JobClient:     SPLIT_RAW_BYTES=162
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:45:48 INFO mapred.JobClient:     Physical memory (bytes) snapshot=332349440
    13/08/26 23:45:48 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:45:48 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:45:48 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059
    13/08/26 23:45:50 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:46:10 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:46:25 INFO mapred.JobClient:  map 100% reduce 92%
    13/08/26 23:46:31 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059
    13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29
    13/08/26 23:46:36 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:46:36 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:46:36 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18217
    13/08/26 23:46:36 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:46:36 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:46:36 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:46:36 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:46:36 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20981
    13/08/26 23:46:36 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:46:36 INFO mapred.JobClient:     Bytes Written=29314118
    13/08/26 23:46:36 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:46:36 INFO mapred.JobClient:     FILE_BYTES_READ=29059398
    13/08/26 23:46:36 INFO mapred.JobClient:     HDFS_BYTES_READ=29314278
    13/08/26 23:46:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58163419
    13/08/26 23:46:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
    13/08/26 23:46:36 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:46:36 INFO mapred.JobClient:     Bytes Read=29314118
    13/08/26 23:46:36 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:46:36 INFO mapred.JobClient:     Map output materialized bytes=29059398
    13/08/26 23:46:36 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:46:36 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:46:36 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:46:36 INFO mapred.JobClient:     Map output bytes=28984080
    13/08/26 23:46:36 INFO mapred.JobClient:     Total committed heap usage (bytes)=205225984
    13/08/26 23:46:36 INFO mapred.JobClient:     CPU time spent (ms)=8650
    13/08/26 23:46:36 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:46:37 INFO mapred.JobClient:     SPLIT_RAW_BYTES=160
    13/08/26 23:46:37 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:46:37 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:46:37 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:46:37 INFO mapred.JobClient:     Physical memory (bytes) snapshot=313606144
    13/08/26 23:46:37 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:46:37 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:46:37 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
    13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060
    13/08/26 23:46:38 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:46:56 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:47:14 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060
    13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29
    13/08/26 23:47:19 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:47:19 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:47:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21504
    13/08/26 23:47:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:47:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:47:19 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:47:19 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:47:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14273
    13/08/26 23:47:19 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:47:19 INFO mapred.JobClient:     Bytes Written=1890073
    13/08/26 23:47:19 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:47:19 INFO mapred.JobClient:     FILE_BYTES_READ=4880788
    13/08/26 23:47:19 INFO mapred.JobClient:     HDFS_BYTES_READ=29314271
    13/08/26 23:47:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6235019
    13/08/26 23:47:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1890073
    13/08/26 23:47:19 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:47:19 INFO mapred.JobClient:     Bytes Read=29314118
    13/08/26 23:47:19 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:47:19 INFO mapred.JobClient:     Map output materialized bytes=1309902
    13/08/26 23:47:19 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:47:19 INFO mapred.JobClient:     Spilled Records=442187
    13/08/26 23:47:19 INFO mapred.JobClient:     Map output bytes=31005336
    13/08/26 23:47:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
    13/08/26 23:47:19 INFO mapred.JobClient:     CPU time spent (ms)=9210
    13/08/26 23:47:19 INFO mapred.JobClient:     Combine input records=2838837
    13/08/26 23:47:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce input records=93564
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce input groups=93564
    13/08/26 23:47:19 INFO mapred.JobClient:     Combine output records=348623
    13/08/26 23:47:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=284684288
    13/08/26 23:47:19 INFO mapred.JobClient:     Reduce output records=93564
    13/08/26 23:47:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:47:19 INFO mapred.JobClient:     Map output records=2583778
    13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061
    13/08/26 23:47:20 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:47:38 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:47:53 INFO mapred.JobClient:  map 100% reduce 67%
    13/08/26 23:47:59 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061
    13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29
    13/08/26 23:48:04 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:48:04 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:48:04 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18292
    13/08/26 23:48:04 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:48:04 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:48:04 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=19293
    13/08/26 23:48:04 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:48:04 INFO mapred.JobClient:     Bytes Written=28689283
    13/08/26 23:48:04 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:48:04 INFO mapred.JobClient:     FILE_BYTES_READ=29059398
    13/08/26 23:48:04 INFO mapred.JobClient:     HDFS_BYTES_READ=31204324
    13/08/26 23:48:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58165045
    13/08/26 23:48:04 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
    13/08/26 23:48:04 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:48:04 INFO mapred.JobClient:     Bytes Read=29314118
    13/08/26 23:48:04 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:48:04 INFO mapred.JobClient:     Map output materialized bytes=29059398
    13/08/26 23:48:04 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:48:04 INFO mapred.JobClient:     Map output bytes=28984080
    13/08/26 23:48:04 INFO mapred.JobClient:     Total committed heap usage (bytes)=205225984
    13/08/26 23:48:04 INFO mapred.JobClient:     CPU time spent (ms)=8770
    13/08/26 23:48:04 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:48:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:48:04 INFO mapred.JobClient:     Physical memory (bytes) snapshot=320401408
    13/08/26 23:48:04 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:48:04 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:48:04 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062
    13/08/26 23:48:06 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:48:24 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:48:36 INFO mapred.JobClient:  map 100% reduce 33%
    13/08/26 23:48:39 INFO mapred.JobClient:  map 100% reduce 86%
    13/08/26 23:48:48 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:48:53 INFO mapred.JobClient: Job complete: job_201308212334_0062
    13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29
    13/08/26 23:48:53 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:48:53 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:48:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18225
    13/08/26 23:48:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:48:53 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:48:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=21045
    13/08/26 23:48:53 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:48:53 INFO mapred.JobClient:     Bytes Written=28689283
    13/08/26 23:48:53 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:48:53 INFO mapred.JobClient:     FILE_BYTES_READ=28437750
    13/08/26 23:48:53 INFO mapred.JobClient:     HDFS_BYTES_READ=28689443
    13/08/26 23:48:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56920127
    13/08/26 23:48:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
    13/08/26 23:48:53 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:48:53 INFO mapred.JobClient:     Bytes Read=28689283
    13/08/26 23:48:53 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:48:53 INFO mapred.JobClient:     Map output materialized bytes=28437750
    13/08/26 23:48:53 INFO mapred.JobClient:     Map input records=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce shuffle bytes=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Spilled Records=37692
    13/08/26 23:48:53 INFO mapred.JobClient:     Map output bytes=28362505
    13/08/26 23:48:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=204603392
    13/08/26 23:48:53 INFO mapred.JobClient:     CPU time spent (ms)=8340
    13/08/26 23:48:53 INFO mapred.JobClient:     Combine input records=0
    13/08/26 23:48:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=160
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce input records=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce input groups=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Combine output records=0
    13/08/26 23:48:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=313868288
    13/08/26 23:48:53 INFO mapred.JobClient:     Reduce output records=18846
    13/08/26 23:48:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1957584896
    13/08/26 23:48:53 INFO mapred.JobClient:     Map output records=18846
    13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
    13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035)
    + echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
    Creating training and holdout set with a random 80-20 split of the generated vector dataset
    + ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
    13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]}
    13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines
    13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
    13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
    13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
    13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0
    13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631)
    + echo 'Training Naive Bayes model'
    Training Naive Bayes model
    + ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
    13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
    13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
    13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
    13/08/26 23:49:27 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:49:49 INFO mapred.JobClient:  map 43% reduce 0%
    13/08/26 23:49:52 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:50:13 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
    13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
    13/08/26 23:50:18 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:50:18 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:50:18 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=22816
    13/08/26 23:50:18 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:50:18 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:50:18 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:50:18 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:50:18 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20680
    13/08/26 23:50:18 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:50:18 INFO mapred.JobClient:     Bytes Written=2718605
    13/08/26 23:50:18 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:50:18 INFO mapred.JobClient:     FILE_BYTES_READ=1404371
    13/08/26 23:50:18 INFO mapred.JobClient:     HDFS_BYTES_READ=12669237
    13/08/26 23:50:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2854477
    13/08/26 23:50:18 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2718605
    13/08/26 23:50:18 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:50:18 INFO mapred.JobClient:     Bytes Read=12668431
    13/08/26 23:50:18 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:50:18 INFO mapred.JobClient:     Map output materialized bytes=1404363
    13/08/26 23:50:18 INFO mapred.JobClient:     Map input records=11321
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce shuffle bytes=1404363
    13/08/26 23:50:18 INFO mapred.JobClient:     Spilled Records=40
    13/08/26 23:50:18 INFO mapred.JobClient:     Map output bytes=16682576
    13/08/26 23:50:18 INFO mapred.JobClient:     Total committed heap usage (bytes)=176164864
    13/08/26 23:50:18 INFO mapred.JobClient:     CPU time spent (ms)=8190
    13/08/26 23:50:18 INFO mapred.JobClient:     Combine input records=11321
    13/08/26 23:50:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=148
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce input records=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce input groups=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Combine output records=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Physical memory (bytes) snapshot=294400000
    13/08/26 23:50:18 INFO mapred.JobClient:     Reduce output records=20
    13/08/26 23:50:18 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1961967616
    13/08/26 23:50:18 INFO mapred.JobClient:     Map output records=11321
    13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
    13/08/26 23:50:19 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:50:40 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:51:01 INFO mapred.JobClient:  map 100% reduce 100%
    13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
    13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
    13/08/26 23:51:06 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:51:06 INFO mapred.JobClient:     Launched reduce tasks=1
    13/08/26 23:51:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=24609
    13/08/26 23:51:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:51:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:51:06 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:51:06 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:51:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=15258
    13/08/26 23:51:06 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:51:06 INFO mapred.JobClient:     Bytes Written=893560
    13/08/26 23:51:06 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:51:06 INFO mapred.JobClient:     FILE_BYTES_READ=362674
    13/08/26 23:51:06 INFO mapred.JobClient:     HDFS_BYTES_READ=2718737
    13/08/26 23:51:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=771195
    13/08/26 23:51:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=893560
    13/08/26 23:51:06 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:51:06 INFO mapred.JobClient:     Bytes Read=2718605
    13/08/26 23:51:06 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:51:06 INFO mapred.JobClient:     Map output materialized bytes=362666
    13/08/26 23:51:06 INFO mapred.JobClient:     Map input records=20
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce shuffle bytes=362666
    13/08/26 23:51:06 INFO mapred.JobClient:     Spilled Records=4
    13/08/26 23:51:06 INFO mapred.JobClient:     Map output bytes=893434
    13/08/26 23:51:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=223264768
    13/08/26 23:51:06 INFO mapred.JobClient:     CPU time spent (ms)=5370
    13/08/26 23:51:06 INFO mapred.JobClient:     Combine input records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=132
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce input records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce input groups=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Combine output records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=300597248
    13/08/26 23:51:06 INFO mapred.JobClient:     Reduce output records=2
    13/08/26 23:51:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1961967616
    13/08/26 23:51:06 INFO mapred.JobClient:     Map output records=2
    13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
    + echo 'Self testing on training set'
    Self testing on training set
    + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
    13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
    13/08/26 23:51:22 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:51:45 INFO mapred.JobClient:  map 51% reduce 0%
    13/08/26 23:51:48 INFO mapred.JobClient:  map 89% reduce 0%
    13/08/26 23:51:54 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
    13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
    13/08/26 23:51:58 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:51:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=34216
    13/08/26 23:51:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:51:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:51:58 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:51:58 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:51:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/08/26 23:51:58 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:51:58 INFO mapred.JobClient:     Bytes Written=2132486
    13/08/26 23:51:58 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:51:58 INFO mapred.JobClient:     HDFS_BYTES_READ=16279896
    13/08/26 23:51:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22523
    13/08/26 23:51:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2132486
    13/08/26 23:51:58 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:51:58 INFO mapred.JobClient:     Bytes Read=12668431
    13/08/26 23:51:58 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:51:58 INFO mapred.JobClient:     Map input records=11321
    13/08/26 23:51:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=87547904
    13/08/26 23:51:58 INFO mapred.JobClient:     Spilled Records=0
    13/08/26 23:51:58 INFO mapred.JobClient:     CPU time spent (ms)=9380
    13/08/26 23:51:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=28131328
    13/08/26 23:51:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=976572416
    13/08/26 23:51:58 INFO mapred.JobClient:     Map output records=11321
    13/08/26 23:51:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=148
    13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
    Summary
    -------------------------------------------------------
    Correctly Classified Instances          :      11256	   99.4258%
    Incorrectly Classified Instances        :         65	    0.5742%
    Total Classified Instances              :      11321
    
    =======================================================
    Confusion Matrix
    -------------------------------------------------------
    a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k   l    	m    	n    	o    	p    	q    	r    	s    	t    	<--Classified as
    454  	0    	0    	1    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	3    	0    	 |  458     a     = alt.atheism
    0    	588  	0    	3    	0    	2    	0    	0    	0    	0    	0   1    	0    	1    	0    	0    	0    	0    	0    	0    	 |  595     b     = comp.graphics
    0    	3    	553  	7    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  563     c     = comp.os.ms-windows.misc
    0    	0    	0    	592  	1    	0    	2    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  595     d     = comp.sys.ibm.pc.hardware
    0    	0    	0    	1    	593  	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  594     e     = comp.sys.mac.hardware
    0    	2    	0    	1    	0    	576  	1    	0    	0    	0    	0   0    	0    	0    	1    	0    	0    	0    	0    	0    	 |  581     f     = comp.windows.x
    0    	1    	0    	0    	0    	0    	579  	0    	0    	0    	0   0    	1    	0    	0    	0    	0    	0    	0    	0    	 |  581     g     = misc.forsale
    0    	0    	0    	0    	0    	0    	1    	594  	0    	0    	0   0    	1    	0    	0    	0    	0    	0    	0    	0    	 |  596     h     = rec.autos
    0    	0    	0    	0    	0    	0    	1    	2    	591  	0    	0   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  594     i     = rec.motorcycles
    0    	0    	0    	0    	0    	0    	0    	0    	0    	615  	1   0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  616     j     = rec.sport.baseball
    0    	0    	0    	0    	0    	0    	1    	0    	0    	1    	581 0    	0    	0    	0    	0    	0    	0    	0    	0    	 |  583     k     = rec.sport.hockey
    0    	0    	1    	0    	0    	0    	0    	0    	0    	0    	0   627  	1    	0    	0    	0    	0    	1    	0    	0    	 |  630     l     = sci.crypt
    0    	0    	0    	2    	0    	0    	1    	0    	0    	0    	0   0    	588  	0    	0    	0    	0    	0    	0    	0    	 |  591     m     = sci.electronics
    0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	586  	1    	0    	0    	0    	0    	0    	 |  588     n     = sci.med
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	615  	0    	0    	0    	0    	0    	 |  615     o     = sci.space
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	619  	1    	0    	0    	0    	 |  620     p     = soc.religion.christian
    1    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	1    	541  	0    	0    	0    	 |  543     q     = talk.politics.mideast
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   1    	0    	0    	0    	0    	0    	560  	0    	0    	 |  561     r     = talk.politics.guns
    3    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   0    	0    	0    	0    	4    	0    	1    	351  	0    	 |  359     s     = talk.religion.misc
    0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0   1    	0    	0    	0    	0    	0    	4    	0    	453  	 |  458     t     = talk.politics.misc
    
    
    13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
    + echo 'Testing on holdout set'
    Testing on holdout set
    + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
    Warning: $HADOOP_HOME is deprecated.
    
    Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
    Warning: $HADOOP_HOME is deprecated.
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
    13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
    13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
    13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
    13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
    13/08/26 23:52:12 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/26 23:52:30 INFO mapred.JobClient:  map 85% reduce 0%
    13/08/26 23:52:36 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
    13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
    13/08/26 23:52:41 INFO mapred.JobClient:   Job Counters 
    13/08/26 23:52:41 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=25113
    13/08/26 23:52:41 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/08/26 23:52:41 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/08/26 23:52:41 INFO mapred.JobClient:     Launched map tasks=1
    13/08/26 23:52:41 INFO mapred.JobClient:     Data-local map tasks=1
    13/08/26 23:52:41 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/08/26 23:52:41 INFO mapred.JobClient:   File Output Format Counters 
    13/08/26 23:52:41 INFO mapred.JobClient:     Bytes Written=1417942
    13/08/26 23:52:41 INFO mapred.JobClient:   FileSystemCounters
    13/08/26 23:52:41 INFO mapred.JobClient:     HDFS_BYTES_READ=12148944
    13/08/26 23:52:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22522
    13/08/26 23:52:41 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1417942
    13/08/26 23:52:41 INFO mapred.JobClient:   File Input Format Counters 
    13/08/26 23:52:41 INFO mapred.JobClient:     Bytes Read=8537480
    13/08/26 23:52:41 INFO mapred.JobClient:   Map-Reduce Framework
    13/08/26 23:52:41 INFO mapred.JobClient:     Map input records=7525
    13/08/26 23:52:41 INFO mapred.JobClient:     Physical memory (bytes) snapshot=85057536
    13/08/26 23:52:41 INFO mapred.JobClient:     Spilled Records=0
    13/08/26 23:52:41 INFO mapred.JobClient:     CPU time spent (ms)=6630
    13/08/26 23:52:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=28155904
    13/08/26 23:52:41 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=976572416
    13/08/26 23:52:41 INFO mapred.JobClient:     Map output records=7525
    13/08/26 23:52:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=147
    13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
    Summary
    -------------------------------------------------------
    Correctly Classified Instances          :       6801	   90.3787%
    Incorrectly Classified Instances        :        724	    9.6213%
    Total Classified Instances              :       7525
    
    =======================================================
    Confusion Matrix
    -------------------------------------------------------
    a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k   l    	m    	n    	o    	p    	q    	r    	s    	t    	<--Classified as
    318  	0    	0    	0    	1    	0    	0    	0    	1    	0    	0   0    	0    	0    	1    	4    	0    	0    	15   	1    	 |  341     a     = alt.atheism
    1    	318  	7    	20   	4    	7    	7    	2    	0    	1    	0   1    	1    	2    	6    	0    	0    	0    	0    	1    	 |  378     b     = comp.graphics
    0    	25   	277  	78   	12   	15   	5    	0    	0    	0    	0   2    	4    	0    	1    	0    	0    	0    	0    	3    	 |  422     c     = comp.os.ms-windows.misc
    1    	4    	3    	336  	20   	3    	8    	0    	0    	0    	0   1    	11   	0    	0    	0    	0    	0    	0    	0    	 |  387     d     = comp.sys.ibm.pc.hardware
    0    	3    	1    	6    	350  	1    	3    	0    	0    	0    	0   1    	3    	1    	0    	0    	0    	0    	0    	0    	 |  369     e     = comp.sys.mac.hardware
    1    	20   	3    	6    	7    	365  	3    	0    	0    	0    	0   1    	0    	0    	0    	0    	1    	0    	0    	0    	 |  407     f     = comp.windows.x
    0    	1    	1    	19   	8    	0    	329  	13   	1    	0    	0   2    	14   	0    	4    	0    	0    	1    	1    	0    	 |  394     g     = misc.forsale
    0    	2    	1    	2    	3    	1    	10   	361  	8    	0    	0   0    	4    	0    	0    	0    	0    	1    	0    	1    	 |  394     h     = rec.autos
    0    	0    	0    	1    	0    	0    	2    	3    	393  	1    	0   0    	0    	0    	0    	0    	0    	1    	0    	1    	 |  402     i     = rec.motorcycles
    0    	0    	0    	1    	0    	0    	2    	3    	0    	360  	6   0    	2    	2    	1    	0    	0    	0    	0    	1    	 |  378     j     = rec.sport.baseball
    0    	1    	0    	2    	1    	0    	0    	0    	2    	5    	401 0    	1    	0    	0    	1    	0    	0    	0    	2    	 |  416     k     = rec.sport.hockey
    1    	1    	0    	1    	3    	2    	1    	1    	0    	0    	0   344  	1    	1    	2    	0    	1    	1    	0    	1    	 |  361     l     = sci.crypt
    0    	5    	0    	15   	14   	0    	5    	1    	1    	0    	0   2    	348  	1    	1    	0    	0    	0    	0    	0    	 |  393     m     = sci.electronics
    1    	2    	1    	1    	1    	0    	1    	0    	0    	1    	0   1    	4    	381  	5    	0    	0    	1    	1    	1    	 |  402     n     = sci.med
    1    	4    	0    	0    	2    	0    	2    	1    	0    	0    	0   1    	2    	1    	356  	0    	0    	1    	0    	1    	 |  372     o     = sci.space
    5    	0    	0    	1    	1    	0    	0    	1    	0    	0    	1   0    	0    	1    	0    	359  	3    	0    	4    	1    	 |  377     p     = soc.religion.christian
    0    	0    	0    	0    	0    	0    	0    	0    	0    	1    	1   0    	1    	0    	1    	2    	389  	0    	0    	2    	 |  397     q     = talk.politics.mideast
    0    	0    	1    	0    	1    	1    	0    	1    	0    	0    	0   2    	1    	1    	0    	0    	0    	335  	0    	6    	 |  349     r     = talk.politics.guns
    29   	1    	0    	1    	0    	0    	1    	0    	0    	1    	0   0    	0    	0    	2    	24   	0    	8    	197  	5    	 |  269     s     = talk.religion.misc
    2    	0    	0    	0    	2    	0    	0    	1    	0    	1    	1   1    	0    	1    	2    	0    	2    	17   	3    	284  	 |  317     t     = talk.politics.misc
    
    
    13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)
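The two summaries in the log can be cross-checked with a little arithmetic. The sketch below is not part of Mahout. The helper names `accuracy` and `recall` are made up for illustration, and every number is copied by hand from the output above. It recomputes the overall accuracy for the training-set self test and the holdout test, the actual share of documents that ended up in the holdout split, and the per-class recall for `talk.religion.misc`, the weakest row of the holdout confusion matrix.

```python
def accuracy(correct, total):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


def recall(row, index):
    """Per-class recall: the diagonal entry divided by the row sum."""
    return row[index] / sum(row)


# Counts copied from the two "Summary" blocks in the log.
train = accuracy(11256, 11321)
holdout = accuracy(6801, 7525)

# Share of all vectors that went to the holdout set (requested: 40%).
split_pct = 100.0 * 7525 / (11321 + 7525)

# Row "s" (talk.religion.misc) of the holdout confusion matrix, columns a..t.
row_s = [29, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 24, 0, 8, 197, 5]
recall_s = recall(row_s, 18)  # column "s" is index 18

print(f"training set accuracy: {train:.4f}%")
print(f"holdout set accuracy:  {holdout:.4f}%")
print(f"holdout share of data: {split_pct:.2f}%")
print(f"recall for talk.religion.misc: {recall_s:.4f}")
```

The gap between the two accuracy figures is expected: the first test scores the model on the same vectors it was trained on, so only the holdout figure says anything about generalization.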

    All of the jobs that ran can be seen in the job information, as follows:


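    By going through the jobs one at a time and reading the corresponding mapper and reducer for each, you can work out how the algorithm operates.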
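As a rough aid for matching jobs to their counters, the console output can also be grouped by job ID. This is a minimal sketch that assumes the output was saved as text. `summarize_jobs` is a hypothetical helper, not a Mahout or Hadoop API, and the regular expressions only cover the `mapred.JobClient` line format seen in the log above.

```python
import re

# Patterns based on the JobClient lines in the log above (an assumption:
# other Hadoop versions may format these lines differently).
JOB_RE = re.compile(r"Running job: (job_\d+_\d+)")
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+([^=]+)=(\d+)\s*$")


def summarize_jobs(log_text):
    """Group numeric counters under the job ID that precedes them."""
    jobs = {}
    current = None
    for line in log_text.splitlines():
        job_match = JOB_RE.search(line)
        if job_match:
            current = job_match.group(1)
            jobs[current] = {}
            continue
        counter_match = COUNTER_RE.search(line)
        if counter_match and current is not None:
            name = counter_match.group(1).strip()
            jobs[current][name] = int(counter_match.group(2))
    return jobs


# A few lines copied from the trainnb step of the log.
sample = """\
13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
13/08/26 23:50:18 INFO mapred.JobClient:     Map input records=11321
13/08/26 23:50:18 INFO mapred.JobClient:     Reduce output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
13/08/26 23:51:06 INFO mapred.JobClient:     Map input records=20
13/08/26 23:51:06 INFO mapred.JobClient:     Reduce output records=2
"""

jobs = summarize_jobs(sample)
for job_id, counters in jobs.items():
    print(job_id, counters)
```

In this sample the first `trainnb` job reads 11321 training vectors and emits 20 records, one per newsgroup, and the second job consumes those 20 records. Record counts like these are a quick way to tell which job does which part of the training.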


    Share, enjoy, grow.


    Please credit the source when reposting: http://blog.csdn.net/fansy1990




  • Original URL: https://www.cnblogs.com/suncoolcat/p/3285734.html