  • Installing and configuring mahout-distribution-0.9.tar.gz, then starting and running Mahout's bundled algorithms

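       Let's get straight to the practical part.

      First, don't bother setting anything up on Windows or installing Cygwin! Work directly on Linux: CentOS 6.5 is recommended for companies, and Ubuntu for school use.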

    Software required to install Mahout:

    Software            Version                     Notes

    Operating system    CentOS 6.5                  64-bit

    JDK                 jdk1.7.0_79

    Hadoop              2.6.0

    Mahout              mahout-distribution-0.8

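       Why use this version rather than 0.9 or later? Because the differences are fairly large (the fpg association-rule algorithm, for example), and there is less reference material online for the later versions.

      A few words up front:

      There are two ways to install and configure Mahout. The first is to download the source (directly or via svn) and build it with Maven. The second is to download the full binary package and extract it. Here I install by extracting the full package.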

    I. Downloading mahout-distribution-0.8.tar.gz

    http://archive.apache.org/dist/mahout/0.8/

       Here I use the stable release mahout-0.8 as the example.

      You can of course also download it online with the wget command, which is straightforward.
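      As a minimal sketch of the wget route, assuming the archive follows the file name pattern mahout-distribution-<version>.tar.gz under the URL above. The helper names `mahout_url` and `download_mahout` are made up for this example:

```shell
# Build the archive URL for a given Mahout version (default 0.8).
# Assumes the file name pattern mahout-distribution-<version>.tar.gz.
mahout_url() {
  version="${1:-0.8}"
  echo "http://archive.apache.org/dist/mahout/${version}/mahout-distribution-${version}.tar.gz"
}

# Download the archive into the current directory.
# -c resumes a partially downloaded file instead of starting over.
download_mahout() {
  wget -c "$(mahout_url "$1")"
}

# Usage (not run automatically):
#   cd /usr/local/mahout && download_mahout 0.8
```

      Downloading this way on the server itself would replace the rz upload step used later in this walkthrough.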


    II. Installing mahout-distribution-0.8.tar.gz

      1. Create the directory first

      I usually create it under /usr/local/

    [root@djt002 local]# pwd
    /usr/local
    [root@djt002 local]# ll
    total 72
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 bin
    drwxr-xr-x. 2 hadoop hadoop 4096 Mar 14 06:19 data
    drwxr-xr-x. 3 hadoop hadoop 4096 Feb 21 23:10 elasticsearch
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 etc
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 17 17:14 flume
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 games
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 16 23:33 hadoop
    drwxr-xr-x. 3 hadoop hadoop 4096 Mar 16 18:26 hbase
    drwxr-xr-x. 4 hadoop hadoop 4096 Mar 14 17:48 hive
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 include
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 16 23:25 jdk
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 lib
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 lib64
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 libexec
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 sbin
    drwxr-xr-x. 5 root   root   4096 Jan 16 20:09 share
    drwxr-xr-x. 4 hadoop hadoop 4096 Mar 17 23:33 sqoop
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 src
    [root@djt002 local]# mkdir mahout
    [root@djt002 local]# ll
    total 76
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 bin
    drwxr-xr-x. 2 hadoop hadoop 4096 Mar 14 06:19 data
    drwxr-xr-x. 3 hadoop hadoop 4096 Feb 21 23:10 elasticsearch
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 etc
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 17 17:14 flume
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 games
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 16 23:33 hadoop
    drwxr-xr-x. 3 hadoop hadoop 4096 Mar 16 18:26 hbase
    drwxr-xr-x. 4 hadoop hadoop 4096 Mar 14 17:48 hive
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 include
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 16 23:25 jdk
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 lib
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 lib64
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 libexec
    drwxr-xr-x  2 root   root   4096 Apr  7 00:21 mahout
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 sbin
    drwxr-xr-x. 5 root   root   4096 Jan 16 20:09 share
    drwxr-xr-x. 4 hadoop hadoop 4096 Mar 17 23:33 sqoop
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 src
    [root@djt002 local]# chown -R hadoop:hadoop mahout
    [root@djt002 local]# ll
    total 76
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 bin
    drwxr-xr-x. 2 hadoop hadoop 4096 Mar 14 06:19 data
    drwxr-xr-x. 3 hadoop hadoop 4096 Feb 21 23:10 elasticsearch
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 etc
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 17 17:14 flume
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 games
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 16 23:33 hadoop
    drwxr-xr-x. 3 hadoop hadoop 4096 Mar 16 18:26 hbase
    drwxr-xr-x. 4 hadoop hadoop 4096 Mar 14 17:48 hive
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 include
    drwxr-xr-x. 3 hadoop hadoop 4096 Jan 16 23:25 jdk
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 lib
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 lib64
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 libexec
    drwxr-xr-x  2 hadoop hadoop 4096 Apr  7 00:21 mahout
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 sbin
    drwxr-xr-x. 5 root   root   4096 Jan 16 20:09 share
    drwxr-xr-x. 4 hadoop hadoop 4096 Mar 17 23:33 sqoop
    drwxr-xr-x. 2 root   root   4096 Sep 23  2011 src
    [root@djt002 local]# 

       2. Upload the Mahout archive

    [root@djt002 local]# su hadoop
    [hadoop@djt002 local]$ cd mahout/
    [hadoop@djt002 mahout]$ pwd
    /usr/local/mahout
    [hadoop@djt002 mahout]$ ll
    total 0
    [hadoop@djt002 mahout]$ rz
    
    [hadoop@djt002 mahout]$ ll
    total 67628
    -rw-r--r-- 1 hadoop hadoop 69248331 Apr  6 16:09 mahout-distribution-0.8.tar.gz
    [hadoop@djt002 mahout]$ 

       3. Extract the archive

    [hadoop@djt002 mahout]$ pwd
    /usr/local/mahout
    [hadoop@djt002 mahout]$ ll
    total 67628
    -rw-r--r-- 1 hadoop hadoop 69248331 Apr  6 16:09 mahout-distribution-0.8.tar.gz
    [hadoop@djt002 mahout]$  tar  -zxvf  mahout-distribution-0.8.tar.gz

       4. Delete the archive and check the owner and group

    [hadoop@djt002 mahout]$ pwd
    /usr/local/mahout
    [hadoop@djt002 mahout]$ ll
    total 67632
    drwxrwxr-x 7 hadoop hadoop     4096 Apr  7 00:25 mahout-distribution-0.8
    -rw-r--r-- 1 hadoop hadoop 69248331 Apr  6 16:09 mahout-distribution-0.8.tar.gz
    [hadoop@djt002 mahout]$ rm mahout-distribution-0.8.tar.gz 
    [hadoop@djt002 mahout]$ ll
    total 4
    drwxrwxr-x 7 hadoop hadoop 4096 Apr  7 00:25 mahout-distribution-0.8
    [hadoop@djt002 mahout]$ 

       5. Configure Mahout

    [root@djt002 mahout-distribution-0.8]# pwd
    /usr/local/mahout/mahout-distribution-0.8
    [root@djt002 mahout-distribution-0.8]# vim /etc/profile

    #mahout
    export MAHOUT_HOME=/usr/local/mahout/mahout-distribution-0.8
    export MAHOUT_CONF_DIR=/usr/local/mahout/mahout-distribution-0.8/conf
    export PATH=$PATH:$MAHOUT_HOME/bin
    export CLASSPATH=.:$JAVA_HOME/lib:$MAHOUT_HOME/lib:$JRE_HOME/lib:$CLASSPATH

    [root@djt002 mahout-distribution-0.9]# source /etc/profile
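      To confirm that the profile changes took effect, a quick check along these lines can help. This is only a sketch: `path_contains` and `check_mahout_env` are helper names invented here, not part of Mahout.

```shell
# Return success if directory $1 appears as an entry in the PATH-style list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether MAHOUT_HOME is set and $MAHOUT_HOME/bin is on PATH.
check_mahout_env() {
  if [ -z "${MAHOUT_HOME:-}" ]; then
    echo "MAHOUT_HOME is not set"
    return 1
  fi
  if path_contains "$MAHOUT_HOME/bin" "$PATH"; then
    echo "OK: $MAHOUT_HOME/bin is on PATH"
  else
    echo "$MAHOUT_HOME/bin is missing from PATH"
    return 1
  fi
}

# Usage (run in a shell after `source /etc/profile`):
#   check_mahout_env
```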

       A look at Mahout's directory structure

    [hadoop@djt002 mahout-distribution-0.8]$ pwd
    /usr/local/mahout/mahout-distribution-0.8
    [hadoop@djt002 mahout-distribution-0.8]$ ll
    total 64924
    drwxrwxr-x 2 hadoop hadoop     4096 Apr 28 22:06 bin
    drwxrwxr-x 3 hadoop hadoop     4096 Apr 28 22:06 buildtools
    drwxr-xr-x 2 hadoop hadoop     4096 Jul  8  2013 conf
    drwxrwxr-x 3 hadoop hadoop     4096 Apr 28 22:06 core
    drwxrwxr-x 3 hadoop hadoop     4096 Apr 28 22:06 distribution
    drwxrwxr-x 6 hadoop hadoop     4096 Apr 28 22:06 docs
    drwxrwxr-x 5 hadoop hadoop     4096 Apr 28 22:06 examples
    drwxrwxr-x 3 hadoop hadoop     4096 Apr 28 22:06 integration
    drwxrwxr-x 3 hadoop hadoop     4096 Apr 28 22:06 lib
    -rw-r--r-- 1 hadoop hadoop    39588 Jul  8  2013 LICENSE.txt
    -rw-r--r-- 1 hadoop hadoop  1643245 Jul  8  2013 mahout-core-0.8.jar
    -rw-r--r-- 1 hadoop hadoop 19929354 Jul  8  2013 mahout-core-0.8-job.jar
    -rw-r--r-- 1 hadoop hadoop   273767 Jul  8  2013 mahout-examples-0.8.jar
    -rw-r--r-- 1 hadoop hadoop 42503144 Jul  8  2013 mahout-examples-0.8-job.jar
    -rw-r--r-- 1 hadoop hadoop   439078 Jul  8  2013 mahout-integration-0.8.jar
    -rw-r--r-- 1 hadoop hadoop  1590913 Jul  8  2013 mahout-math-0.8.jar
    drwxrwxr-x 3 hadoop hadoop     4096 Apr 28 22:06 math
    -rw-r--r-- 1 hadoop hadoop     1888 Jul  8  2013 NOTICE.txt
    -rw-r--r-- 1 hadoop hadoop     1212 Jul  8  2013 README.txt
    [hadoop@djt002 mahout-distribution-0.8]$ 

    III. Verifying that Mahout is installed correctly

    [hadoop@djt002 mahout-distribution-0.8]$ bin/mahout --help
    Running on hadoop, using /usr/local/hadoop/hadoop-2.6.0/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /usr/local/mahout/mahout-distribution-0.9/mahout-examples-0.9-job.jar
    Unknown program '--help' chosen.
    Valid program names are:
      arff.vector: : Generate Vectors from an ARFF file or directory
      baumwelch: : Baum-Welch algorithm for unsupervised HMM training
      canopy: : Canopy clustering
      cat: : Print a file or resource as the logistic regression models would see it
      cleansvd: : Cleanup and verification of SVD output
      clusterdump: : Dump cluster output to text
      clusterpp: : Groups Clustering Output In Clusters
      cmdump: : Dump confusion matrix in HTML or text formats
      concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
      cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
      cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
      evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
      fkmeans: : Fuzzy K-means clustering
      hmmpredict: : Generate random sequence of observations by given HMM
      itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
      kmeans: : K-means clustering
      lucene.vector: : Generate Vectors from a Lucene index
      lucene2seq: : Generate Text SequenceFiles from a Lucene index
      matrixdump: : Dump matrix in CSV format
      matrixmult: : Take the product of two matrices
      parallelALS: : ALS-WR factorization of a rating matrix
      qualcluster: : Runs clustering experiments and summarizes results in a CSV
      recommendfactorized: : Compute recommendations using the factorization of a rating matrix
      recommenditembased: : Compute recommendations using item-based collaborative filtering
      regexconverter: : Convert text files on a per line basis based on regular expressions
      resplit: : Splits a set of SequenceFiles into a number of equal splits
      rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
      rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
      runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
      runlogistic: : Run a logistic regression model against CSV data
      seq2encoded: : Encoded Sparse Vector generation from Text sequence files
      seq2sparse: : Sparse Vector generation from Text sequence files
      seqdirectory: : Generate sequence files (of Text) from a directory
      seqdumper: : Generic Sequence File dumper
      seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
      seqwiki: : Wikipedia xml dump to sequence file
      spectralkmeans: : Spectral k-means clustering
      split: : Split Input data into test and train sets
      splitDataset: : split a rating dataset into training and probe parts
      ssvd: : Stochastic SVD
      streamingkmeans: : Streaming k-means clustering
      svd: : Lanczos Singular Value Decomposition
      testnb: : Test the Vector-based Bayes classifier
      trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
      trainlogistic: : Train a logistic regression using stochastic gradient descent
      trainnb: : Train the Vector-based Bayes classifier
      transpose: : Take the transpose of a matrix
      validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
      vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
      vectordump: : Dump vectors from a sequence file to text
      viterbi: : Viterbi decoding of hidden states from given output states sequence
    [hadoop@djt002 mahout-distribution-0.9]$ 

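      If you see the output above, Mahout is installed correctly, because it lists all the commands Mahout implements. The "Unknown program '--help' chosen." line is expected: --help is not a program name, and Mahout prints the list of valid program names in response.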
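      The same check can be scripted. The sketch below assumes the listing keeps the "Valid program names are:" header and the kmeans entry shown above; the function name `looks_like_mahout_listing` is made up for this example.

```shell
# Succeed if the text on stdin contains Mahout's program-listing header
# and the kmeans entry from the list above.
looks_like_mahout_listing() {
  listing=$(cat)
  printf '%s\n' "$listing" | grep -q 'Valid program names are:' &&
    printf '%s\n' "$listing" | grep -q '^ *kmeans: '
}

# Usage (not run automatically; requires the installation above):
#   "$MAHOUT_HOME/bin/mahout" --help 2>&1 | looks_like_mahout_listing && echo "Mahout OK"
```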

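    Running Mahout's bundled examples (make sure the Hadoop cluster is running)

    Mahout's algorithms fall roughly into three categories:

      clustering, collaborative filtering, and classification

    Among them:

      Common clustering algorithms: canopy clustering, k-means (kmeans), fuzzy k-means, hierarchical clustering, LDA clustering, and others

      Common classification algorithms: Bayes, logistic regression, support vector machines, perceptron, neural networks, and others

      Because my version is mahout-0.8, the jar to use is mahout-examples-0.8-job.jar.

      The following runs Mahout's bundled kmeans algorithm: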
    $HADOOP_HOME/bin/hadoop jar /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job


      Or


      The following runs Mahout's bundled canopy algorithm:
    $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.canopy.Job


    [hadoop@djt002 mahout-distribution-0.9]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.9/mahout-examples-0.9-job.jar   org.apache.mahout.clustering.syntheticcontrol.canopy.Job
    17/04/28 06:42:49 INFO canopy.Job: Running with default arguments
    17/04/28 06:42:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    17/04/28 06:42:55 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    17/04/28 06:42:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1493332712225_0001
    Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://djt002:9000/user/hadoop/testdata
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
        at org.apache.mahout.clustering.syntheticcontrol.canopy.Job.run(Job.java:85)
        at org.apache.mahout.clustering.syntheticcontrol.canopy.Job.main(Job.java:55)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    [hadoop@djt002 mahout-distribution-0.9]$ 

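       Preparing the test data

      The run above fails with InvalidInputException because the default input path hdfs://djt002:9000/user/hadoop/testdata does not exist yet.

    Download address for the practice data: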

    http://download.csdn.net/detail/qq1010885678/8582941

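      The practice data above is meant for testing the kmeans clustering algorithm.

      Upload the practice data (data.txt) to the matching HDFS directory hdfs://djt002:9000/user/hadoop/testdata. (This is a sample dataset and works with the various algorithms.)

         Here I first upload the test data to a path I created on my local Linux machine. (This is just for my own convenience.)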

     

    [hadoop@djt002 mahout]$ pwd
    /usr/local/mahout
    [hadoop@djt002 mahout]$ ll
    total 4
    drwxrwxr-x 7 hadoop hadoop 4096 Apr  7 00:25 mahout-distribution-0.8
    [hadoop@djt002 mahout]$ mkdir mahoutData
    [hadoop@djt002 mahout]$ ll
    total 8
    drwxrwxr-x 2 hadoop hadoop 4096 Apr 28 06:59 mahoutData
    drwxrwxr-x 7 hadoop hadoop 4096 Apr  7 00:25 mahout-distribution-0.8
    [hadoop@djt002 mahout]$ cd mahoutData/
    [hadoop@djt002 mahoutData]$ pwd
    /usr/local/mahout/mahoutData
    [hadoop@djt002 mahoutData]$ ll
    total 0
    [hadoop@djt002 mahoutData]$ rz
    CC[hadoop@djt002 mahoutData]$ ll
    total 0
    [hadoop@djt002 mahoutData]$ rz
    
    [hadoop@djt002 mahoutData]$ ll
    total 284
    -rw-r--r-- 1 hadoop hadoop 288972 Apr 27 22:48 data.txt
    [hadoop@djt002 mahoutData]$ 

       Then upload the test data from /usr/local/mahout/mahoutData/ to hdfs://djt002:9000/user/hadoop/testdata

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -put /usr/local/mahout/mahoutData/data.txt  hdfs://djt002:9000/user/hadoop/testdata
    
    Or

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -copyFromLocal  /usr/local/mahout/mahoutData/data.txt  hdfs://djt002:9000/user/hadoop/testdata/



    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -ls hdfs://djt002:9000/user/hadoop/testdata/
    -rw-r--r-- 1 hadoop supergroup 288972 2017-04-28 07:02 hdfs://djt002:9000/user/hadoop/testdata

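      The upload may fail at this point. In the listing above, testdata is a regular file rather than a directory: the first put ran before the directory existed, so data.txt was stored under the name testdata, and a later put reports "File exists". The fix is to remove that file, create the directory, and upload again: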

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -put /usr/local/mahout/mahoutData/data.txt  hdfs://djt002:9000/user/hadoop/testdata/
    put: `hdfs://djt002:9000/user/hadoop/testdata': File exists
    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -rm   hdfs://djt002:9000/user/hadoop/testdata/
    17/04/28 07:16:58 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted hdfs://djt002:9000/user/hadoop/testdata
    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -mkdir   hdfs://djt002:9000/user/hadoop/testdata/
    [hadoop@djt002 mahoutData]$ 

     

     

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -put /usr/local/mahout/mahoutData/data.txt  hdfs://djt002:9000/user/hadoop/testdata/
    [hadoop@djt002 mahoutData]$ 
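      One way to avoid the problem on a fresh cluster is to create the directory before uploading. The sketch below uses the `-p` flag of `hadoop fs -mkdir` and the `-f` flag of `hadoop fs -put`; the `HADOOP_CMD` variable and the function name `upload_to_hdfs_dir` are assumptions of this example. It does not repair an existing plain file named testdata, which still has to be removed as shown above.

```shell
# Hadoop launcher; set HADOOP_CMD beforehand to point elsewhere.
HADOOP_CMD="${HADOOP_CMD:-${HADOOP_HOME:-}/bin/hadoop}"

# Upload local file $1 into HDFS directory $2.
# -mkdir -p creates the directory first, so -put cannot create a plain
# file named after the directory; -put -f overwrites an existing copy.
upload_to_hdfs_dir() {
  "$HADOOP_CMD" fs -mkdir -p "$2" &&
    "$HADOOP_CMD" fs -put -f "$1" "$2/"
}

# Usage (not run automatically):
#   upload_to_hdfs_dir /usr/local/mahout/mahoutData/data.txt hdfs://djt002:9000/user/hadoop/testdata
```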

     

    Using the kmeans algorithm

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

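      Note that no input or output path is needed here: when run without arguments, the job uses the defaults hard-coded in the bundled jar.

      (Note: if you run the kmeans job bundled in the Mahout archive this way, its input path is fixed to testdata,

            i.e. hdfs://djt002:9000/user/hadoop/testdata/  )

      Also, delete the previous output directory before every run!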

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop fs -rm -r hdfs://djt002:9000/user/hadoop/output/*
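      The cleanup and the run can be combined into one helper. This is a sketch under a few assumptions: the output directory is the default /user/hadoop/output, the whole directory is removed rather than only its contents, and the names `HADOOP_CMD`, `MAHOUT_JOB_JAR`, and `rerun_example` are invented for this example.

```shell
# Hadoop launcher and job jar; set these beforehand to override.
HADOOP_CMD="${HADOOP_CMD:-${HADOOP_HOME:-}/bin/hadoop}"
MAHOUT_JOB_JAR="${MAHOUT_JOB_JAR:-/usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar}"

# Remove the previous output directory (-f: no error if it is absent),
# then run the bundled synthetic-control job named by $1 (e.g. kmeans).
rerun_example() {
  "$HADOOP_CMD" fs -rm -r -f /user/hadoop/output
  "$HADOOP_CMD" jar "$MAHOUT_JOB_JAR" \
    "org.apache.mahout.clustering.syntheticcontrol.$1.Job"
}

# Usage (not run automatically):
#   rerun_example kmeans
#   rerun_example canopy
```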

      ....

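      Clustering is an iterative process (explained in a later post),

      so it keeps re-running MapReduce jobs until the stopping criteria are met (this can take a while).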

     

    The Kmeans run produces output like this:

    70, 7.311, 10.611, 6.924, 3.440, 9.465, 4.764, 2.838, 8.807, 1.960, 2.864, 6.728, 0.369, 1.374, -0.167, 2.125, 8.306, 4.908, -0.432]
        1.0 : [distance=29.095866076790845]: 60 = [30.817, 28.079, 24.628, 23.933, 28.660, 25.704, 27.501, 23.513, 30.377, 27.595, 22.938, 26.684, 25.208, 26.834, 22.931, 17.732, 17.544, 24.167, 25.602, 19.269, 14.978, 17.223, 18.962, 22.281, 17.035, 23.789, 14.878, 18.113, 10.981, 11.661, 14.331, 19.942, 11.175, 10.714, 15.675, 15.468, 16.010, 14.972, 15.101, 15.131, 15.154, 10.492, 14.754, 5.222, 5.393, 13.606, 11.775, 6.307, 3.370, 10.107, 7.779, 10.209, 1.493, 4.822, 0.019, 8.019, -0.279, -0.049, 5.757, 2.718]
        1.0 : [distance=24.674726284993667]: 60 = [31.347, 28.245, 34.275, 29.885, 30.573, 32.373, 24.031, 24.057, 24.099, 23.777, 28.993, 29.853, 26.485, 29.245, 28.145, 22.528, 20.390, 20.570, 27.921, 18.786, 22.144, 20.163, 17.616, 19.541, 20.342, 22.061, 21.358, 23.951, 13.447, 12.974, 18.406, 17.349, 17.425, 11.041, 14.912, 10.147, 16.731, 9.845, 14.840, 18.283, 18.426, 10.059, 16.760, 14.187, 14.301, 14.277, 12.823, 15.574, 10.789, 10.957, 8.361, 4.116, 3.732, 3.508, 2.288, 9.768, 9.661, 2.183, 6.933, 4.670]
        1.0 : [distance=31.366016794511612]: 60 = [35.439, 24.104, 27.345, 28.982, 34.488, 27.952, 32.550, 25.255, 29.188, 24.766, 29.235, 20.520, 19.745, 27.306, 29.226, 27.510, 21.879, 25.199, 19.470, 19.373, 19.371, 26.519, 19.270, 18.184, 24.926, 15.082, 17.402, 14.351, 22.618, 22.343, 22.627, 15.136, 16.385, 13.479, 21.914, 21.072, 18.025, 15.178, 19.715, 11.919, 18.650, 16.242, 12.783, 17.710, 17.715, 8.372, 13.702, 7.537, 9.190, 11.098, 13.714, 8.595, 11.006, 15.031, 10.061, 7.613, 13.295, 12.292, 12.478, 11.095]
        1.0 : [distance=26.598263851474357]: 60 = [26.273, 31.229, 29.741, 34.208, 33.329, 33.610, 31.072, 22.530, 28.587, 21.130, 23.557, 28.078, 27.546, 25.825, 18.454, 25.903, 24.448, 24.003, 23.199, 22.158, 17.711, 23.922, 20.550, 15.913, 17.699, 13.883, 17.494, 16.360, 20.679, 11.790, 18.424, 10.493, 11.001, 17.994, 11.673, 11.014, 11.437, 16.197, 16.435, 7.331, 15.089, 16.779, 14.449, 9.551, 11.331, 10.564, 5.992, 8.369, 11.402, 7.865, 2.526, 4.632, 9.335, 6.772, 3.018, 3.675, 0.455, 5.362, 6.945, 7.901]
        1.0 : [distance=27.50313693276032]: 60 = [26.148, 30.828, 27.122, 31.797, 26.812, 24.681, 31.379, 22.047, 22.034, 24.293, 30.875, 22.493, 30.889, 19.167, 19.199, 27.696, 17.370, 27.648, 23.842, 26.493, 23.635, 23.577, 20.884, 18.786, 18.898, 18.091, 22.021, 20.674, 23.890, 12.646, 18.448, 17.732, 17.897, 14.679, 13.598, 12.689, 19.832, 12.489, 9.745, 18.990, 18.820, 16.517, 12.024, 14.131, 13.394, 15.473, 11.140, 5.094, 15.265, 14.651, 8.299, 3.163, 12.039, 4.893, 7.552, 12.315, 9.581, 5.462, 2.984, 8.981]
        1.0 : [distance=41.63476648186727]: 60 = [30.822, 26.592, 32.747, 31.626, 31.853, 32.258, 34.720, 25.605, 24.215, 29.830, 28.270, 30.519, 27.139, 32.953, 29.208, 27.265, 31.003, 24.601, 27.746, 29.257, 25.375, 9.397, 11.854, 18.179, 11.058, 12.507, 14.945, 19.796, 9.565, 19.152, 11.940, 16.022, 17.441, 10.963, 10.996, 8.929, 15.033, 8.991, 20.548, 17.140, 13.223, 14.981, 10.412, 19.554, 19.192, 13.297, 15.799, 11.817, 12.925, 12.827, 13.102, 13.449, 11.540, 17.939, 8.543, 13.994, 15.765, 16.096, 16.662, 8.968]
        1.0 : [distance=47.92825575495409]: 60 = [35.675, 32.252, 33.359, 31.057, 24.062, 29.028, 24.791, 27.460, 25.859, 28.450, 30.435, 27.962, 28.948, 27.236, 28.649, 29.507, 35.871, 31.607, 25.408, 30.508, 32.454, 26.580, 27.593, 34.277, 27.145, 33.938, 27.016, 12.593, 10.910, 4.930, 4.463, 5.002, 11.772, 15.086, 10.525, 13.935, 10.900, 15.151, 8.885, 14.374, 13.364, 13.354, 6.827, 14.907, 4.364, 15.200, 14.254, 8.839, 13.155, 7.695, 8.300, 15.678, 14.164, 10.802, 9.084, 5.791, 10.142, 16.019, 12.784, 12.437]
        1.0 : [distance=48.93716831670561]: 60 = [31.775, 33.510, 25.615, 27.700, 24.828, 33.067, 34.310, 28.609, 34.490, 35.751, 25.563, 26.692, 34.970, 30.595, 26.545, 35.828, 29.338, 24.678, 33.323, 33.962, 34.928, 16.294, 8.878, 12.901, 7.906, 6.083, 6.624, 11.364, 9.335, 11.368, 10.111, 15.291, 13.921, 10.583, 15.977, 16.325, 11.815, 11.675, 11.011, 16.201, 9.244, 15.829, 10.276, 16.145, 13.675, 9.326, 10.849, 6.772, 17.498, 7.973, 16.450, 9.991, 6.178, 16.111, 17.548, 13.860, 10.801, 8.851, 10.028, 8.332]
        1.0 : [distance=45.830951493743164]: 60 = [28.636, 35.554, 28.989, 26.883, 30.280, 35.294, 33.550, 32.722, 30.094, 32.951, 34.356, 33.583, 27.756, 33.049, 25.218, 31.894, 34.318, 25.636, 32.570, 24.817, 27.464, 12.408, 9.314, 12.147, 8.343, 7.502, 11.223, 12.910, 10.207, 14.853, 6.479, 11.333, 14.162, 5.533, 14.142, 15.040, 13.506, 5.263, 6.361, 13.789, 13.502, 8.490, 11.222, 15.391, 9.330, 15.925, 13.675, 13.507, 12.027, 12.400, 11.421, 8.011, 12.951, 8.780, 11.031, 12.124, 12.020, 12.910, 8.291, 10.597]
        1.0 : [distance=48.07002341109426]: 60 = [34.335, 30.938, 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371, 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940, 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694, 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976, 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400, 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913, 11.743, 11.699, 10.152]
    17/04/28 07:35:13 INFO clustering.ClusterDumper: Wrote 6 clusters
    [hadoop@djt002 mahoutData]$ 

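      Mahout ran without errors!

      Note: the files produced by this kmeans run cannot be read in the ordinary way, because they are binary Hadoop SequenceFiles. Viewed directly, they look like meaningless data.

      Viewing the clustering results:

      Use Mahout's seqdumper command to dump them to a text file on the local Linux machine, where they can be read normally.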

    [hadoop@djt002 ~]$ $MAHOUT_HOME/bin/mahout seqdumper -i /user/hadoop/output/data/part-m-00000 -o ~/res.txt

     

    [hadoop@djt002 ~]$ $MAHOUT_HOME/bin/mahout seqdumper -i /user/hadoop/output/data/part-m-00000 -o ~/res.txt
    Running on hadoop, using /usr/local/hadoop/hadoop-2.6.0/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB: /usr/local/mahout/mahout-distribution-0.9/mahout-examples-0.9-job.jar
    17/04/28 18:31:20 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/hadoop/output/data/part-m-00000], --output=[/home/hadoop/res.txt], --startPhase=[0], --tempDir=[temp]}
    17/04/28 18:31:29 INFO driver.MahoutDriver: Program took 8750 ms (Minutes: 0.14583333333333334)
    [hadoop@djt002 ~]$ ll
    total 444
    -rw-r--r--. 1 hadoop hadoop   4176 Feb 21 09:01 anagram.jar
    drwxrwxr-x. 3 hadoop hadoop   4096 Mar 19 04:34 app
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Desktop
    drwxrwxr-x. 2 hadoop hadoop   4096 Feb 21 17:03 djt
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Documents
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Downloads
    drwxrwxr-x. 4 hadoop hadoop   4096 Jan 17 18:54 flume
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Music
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Pictures
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Public
    -rw-rw-r--  1 hadoop hadoop 397021 Apr 28 18:31 res.txt
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Templates
    drwxrwxr-x. 3 hadoop hadoop   4096 Mar 23 08:06 tvdata
    drwxr-xr-x. 2 hadoop hadoop   4096 Jan 17 18:23 Videos
    [hadoop@djt002 ~]$ sz res.txt 

    Input Path: /user/hadoop/output/data/part-m-00000
    Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.math.VectorWritable
    Key: 60: Value: {0:28.7812,31:26.6311,34:29.1495,4:28.9207,32:35.6541,5:33.7596,8:35.2479,6:25.3969,30:25.0293,24:33.0292,29:34.9424,17:26.5235,51:24.5556,36:26.1927,12:36.0253,23:29.5054,58:25.4652,21:29.27,11:29.2171,10:32.8717,15:32.8717,7:27.7849,28:26.1203,46:28.0721,33:28.4353,55:34.9879,54:34.9318,25:25.04,3:31.2834,49:29.747,41:26.2353,1:34.4632,26:28.9167,44:31.0558,37:33.3182,56:32.4721,42:28.9964,27:24.3437,50:31.4333,16:34.1173,40:35.5344,48:35.4973,39:27.0443,9:27.1159,52:33.7431,13:32.337,43:32.0036,19:26.3693,59:25.8717,2:31.3381,20:25.7744,18:27.6623,22:30.7326,35:28.1584,57:33.3759,45:34.2553,38:30.9772,47:28.9402,14:34.5249,53:25.0466}
    Key: 60: Value: {0:24.8923,31:32.5981,34:26.9414,4:27.8789,32:28.3038,5:31.5926,8:27.9516,6:31.4861,30:34.0765,24:31.9874,29:25.0701,17:35.6273,51:31.0205,36:33.1089,12:27.4867,23:30.4719,58:32.1005,21:24.1311,11:31.1887,10:27.5415,15:24.488,7:35.5469,28:33.6472,46:26.3458,33:26.1471,55:26.4244,54:33.6564,25:33.6615,3:32.8217,49:29.4047,41:26.5301,1:25.741,26:25.5511,44:32.8357,37:24.1491,56:28.4661,42:24.8578,27:30.4686,50:32.5577,16:27.5918,40:35.9519,48:28.9861,39:25.7906,9:31.6595,52:26.6418,13:31.391,43:25.9562,19:31.4167,59:26.691,2:27.5532,20:30.7447,18:35.4102,22:35.1422,35:31.5203,57:34.2484,45:28.5322,38:28.5157,47:30.6213,14:27.811,53:28.4331}
    Key: 60: Value: {0:31.3987,31:24.246,34:31.6114,4:27.8613,32:26.9631,5:28.5491,8:25.2239,6:24.9717,30:27.3086,24:24.3323,29:28.8778,17:32.5614,51:26.5966,36:27.4809,12:28.2572,23:32.3851,58:29.5446,21:31.4781,11:27.2587,10:31.8387,15:35.0625,7:32.4358,28:31.5137,46:29.6082,33:25.2919,55:29.9897,54:25.5772,25:30.2001,3:24.2905,49:27.1717,41:31.0561,1:30.6316,26:31.2452,44:31.4391,37:24.2075,56:31.351,42:26.3583,27:26.6814,50:33.6318,16:31.5717,40:32.6293,48:34.1444,39:35.1253,9:27.3068,52:25.5387,13:26.5819,43:28.0861,19:34.1202,59:29.343,2:26.3983,20:26.9337,18:31.0308,22:35.0173,35:24.7131,57:33.9002,45:27.3057,38:26.8059,47:35.9725,14:24.0455,53:32.5434}
    Key: 60: Value: {0:25.774,31:28.3714,34:35.9346,4:27.97,32:32.3667,5:25.2702,8:31.4549,6:28.132,30:27.5587,24:29.2806,29:24.824,17:35.0966,51:28.7261,36:24.3749,12:29.9578,23:31.6264,58:27.3659,21:25.0102,11:28.9916,10:28.9564,15:24.3037,7:29.4268,28:25.5265,46:35.769,33:26.9752,55:32.5492,54:34.6156,25:34.2021,3:25.6033,49:31.156,41:26.8908,1:30.5262,26:26.5077,44:34.3336,37:27.6083,56:30.9827,42:31.3209,27:32.2279,50:34.6292,16:24.314,40:32.4185,48:34.2054,39:29.8557,9:27.32,52:28.2979,13:30.2773,43:29.3849,19:32.0968,59:25.3069,2:35.4209,20:33.3303,18:25.3679,22:35.3155,35:35.1146,57:24.8938,45:24.7381,38:27.8433,47:31.8725,14:30.4447,53:31.5787}
    Key: 60: Value: {0:27.1798,31:33.4129,34:29.6526,4:24.6555,32:26.9245,5:28.9446,8:24.5596,6:35.798,30:33.1247,24:24.6081,29:28.0295,17:31.1274,51:27.9601,36:24.5119,12:35.4154,23:33.0321,58:31.1057,21:31.6565,11:25.3216,10:27.9634,15:29.4686,7:34.9446,28:35.8773,46:29.1348,33:30.2123,55:29.9993,54:35.3375,25:33.2025,3:25.6264,49:34.9244,41:27.9072,1:29.2498,26:27.4335,44:33.833,37:33.9931,56:34.2149,42:35.111,27:32.6355,50:27.7218,16:33.1739,40:31.2651,48:32.3223,39:33.204,9:34.2366,52:35.7198,13:34.862,43:35.0757,19:26.5173,59:31.0179,2:33.6928,20:28.6486,18:31.3701,22:35.9497,35:30.8644,57:33.1276,45:25.9481,38:33.3094,47:24.2875,14:25.1472,53:27.576}
    ....
    ....
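      The dumped vectors list their index:value pairs in no particular order, which makes them hard to scan. A small filter such as the following prints each vector with its indices in ascending order. It is a sketch that assumes the "Key: ... Value: {index:value,...}" line layout shown above; `sort_vector_entries` is a name made up for this example.

```shell
# Read seqdumper lines on stdin and print each vector's index:value pairs
# in ascending index order, one vector per output line.
sort_vector_entries() {
  grep 'Value: {' | while IFS= read -r line; do
    # Keep only the text inside {...}, put one pair per line,
    # sort numerically on the index, then join the pairs with spaces.
    printf '%s\n' "$line" |
      sed -e 's/^.*{//' -e 's/}.*$//' |
      tr ',' '\n' |
      sort -t: -k1,1n |
      paste -s -d ' ' -
  done
}

# Usage (not run automatically):
#   sort_vector_entries < ~/res.txt | head -n 1
```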

     

     

       You can of course also look at the other items under the output directory /user/hadoop/output, such as clusters-0 and clusters-1. Here I only looked at

    what is under /user/hadoop/output/data/.

    Using the canopy algorithm

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.canopy.Job

       No further detail here.



    Using the dirichlet algorithm

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job

       No further detail here.


    Using the meanshift algorithm

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.meanshift.Job

       No further detail here.

     Summary

      By default, the jobs in the Mahout archive use /user/hadoop/testdata as the input path and /user/hadoop/output as the output path.

      In fact, you can also supply your own input and output paths.

    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
    [hadoop@djt002 mahoutData]$ $HADOOP_HOME/bin/hadoop  jar  /usr/local/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar   org.apache.mahout.clustering.syntheticcontrol.kmeans.Job   -i   /user/hadoop/mahoutData/data.txt   -o  /user/hadoop/output
    
    
  • Original article: https://www.cnblogs.com/zlslch/p/6673708.html