  • Spark MLlib: Basic Statistics

    Spark MLlib provides several basic statistical algorithms. The main ones are described below.

    1. Summary statistics

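    For an RDD[Vector], Spark MLlib provides the Statistics.colStats method, which returns an instance of MultivariateStatisticalSummary. The summary holds the column-wise maximum, minimum, mean, variance, and number of nonzeros, as well as the total row count. For example: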

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.mllib.linalg.Vectors
        import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

        val conf = new SparkConf().setAppName("Simple Application").setMaster("yarn-cluster")
        val sc = new SparkContext(conf)
        // Each line of the input file is parsed as one row of space-separated numbers.
        val observations = sc.textFile("/user/liujiyu/spark/mldata1.txt")
          .map(_.split(' ')                 // RDD[Array[String]]
            .map(_.toDouble))               // RDD[Array[Double]]
          .map(line => Vectors.dense(line)) // RDD[Vector]

        // Compute column summary statistics.
        val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
        println(summary.mean) // a dense vector containing the mean value for each column
        println(summary.variance) // column-wise variance
        println(summary.numNonzeros) // number of nonzeros in each column
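
    The summary also exposes the maximum, minimum, and row count, which the snippet above does not print. A minimal sketch, assuming Spark MLlib is on the classpath and using a local master with small in-memory data instead of the HDFS file. The object name ColStatsSketch, the helper summarize, and the sample values are illustrative and not from the original post:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

    object ColStatsSketch {
      // Builds a tiny RDD[Vector] (3 rows, 3 columns) and summarizes its columns.
      def summarize(sc: SparkContext): MultivariateStatisticalSummary = {
        val rows = sc.parallelize(Seq(
          Vectors.dense(1.0, 10.0, 100.0),
          Vectors.dense(2.0, 20.0, 200.0),
          Vectors.dense(3.0, 30.0, 300.0)))
        Statistics.colStats(rows)
      }

      def main(args: Array[String]): Unit = {
        // "local[*]" is an assumption so the sketch can run without a cluster.
        val conf = new SparkConf().setAppName("ColStatsSketch").setMaster("local[*]")
        val sc = new SparkContext(conf)
        try {
          val summary = summarize(sc)
          println(summary.max)      // largest value in each column
          println(summary.min)      // smallest value in each column
          println(summary.mean)     // mean of each column
          println(summary.variance) // variance of each column
          println(summary.count)    // number of rows (vectors) seen
        } finally {
          sc.stop()
        }
      }
    }
    ```

    Each statistic except count is returned as a Vector with one entry per column, so summary.max(0) is the maximum of the first column.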
    

    2. Correlations

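    MLlib can compute the correlation between two series, or the pairwise correlation matrix of the columns of an RDD[Vector], using either Pearson's or Spearman's method. For example: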

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
        import org.apache.spark.mllib.stat.Statistics
        import org.apache.spark.rdd.RDD

        val conf = new SparkConf().setAppName("Simple Application").setMaster("yarn-cluster")
        val sc = new SparkContext(conf)

        val data1 = Array(1.0, 2.0, 3.0, 4.0, 5.0)
        val data2 = Array(1.0, 2.0, 3.0, 4.0, 5.0)
        val distData1: RDD[Double] = sc.parallelize(data1)
        // must have the same number of partitions and cardinality as distData1
        val distData2: RDD[Double] = sc.parallelize(data2)

        // Compute the correlation using Pearson's method. Enter "spearman" for Spearman's method.
        // If a method is not specified, Pearson's method will be used by default.
        val correlation: Double = Statistics.corr(distData1, distData2, "pearson")

        // Parse each line of space-separated numbers into a Vector.
        // Note that each Vector is a row and not a column.
        val data: RDD[Vector] = sc.textFile("/user/liujiyu/spark/mldata1.txt")
          .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))

        // Calculate the correlation matrix using Pearson's method. Use "spearman" for Spearman's method.
        // If a method is not specified, Pearson's method will be used by default.
        val correlMatrix: Matrix = Statistics.corr(data, "pearson")
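
    To make the difference between the two methods concrete, here is a plain Scala sketch that does not use Spark at all. Pearson's coefficient measures linear association, while Spearman's coefficient is Pearson's coefficient applied to the ranks of the values. The helpers pearson, ranks, and spearman are hypothetical names written for this illustration, not MLlib functions, and giving tied values their average rank is an assumption of this sketch:

    ```scala
    object CorrelationSketch {
      // Pearson: covariance of x and y divided by the product of their standard deviations.
      def pearson(x: Array[Double], y: Array[Double]): Double = {
        require(x.length == y.length && x.nonEmpty, "series must be non-empty and equally long")
        val meanX = x.sum / x.length
        val meanY = y.sum / y.length
        val dx = x.map(_ - meanX)
        val dy = y.map(_ - meanY)
        val cov = dx.zip(dy).map { case (a, b) => a * b }.sum
        cov / math.sqrt(dx.map(a => a * a).sum * dy.map(b => b * b).sum)
      }

      // 1-based ranks; tied values share the average of the positions they occupy.
      def ranks(values: Array[Double]): Array[Double] = {
        val sortedIdx = values.indices.sortWith((a, b) => values(a) < values(b))
        val result = new Array[Double](values.length)
        var i = 0
        while (i < sortedIdx.length) {
          var j = i
          // Extend j over the run of values equal to the one at position i.
          while (j + 1 < sortedIdx.length && values(sortedIdx(j + 1)) == values(sortedIdx(i))) j += 1
          val avgRank = (i + j) / 2.0 + 1.0
          (i to j).foreach(k => result(sortedIdx(k)) = avgRank)
          i = j + 1
        }
        result
      }

      // Spearman: Pearson's coefficient computed on the ranks.
      def spearman(x: Array[Double], y: Array[Double]): Double =
        pearson(ranks(x), ranks(y))

      def main(args: Array[String]): Unit = {
        val x = Array(1.0, 2.0, 3.0, 4.0, 5.0)
        val y = x.map(v => v * v * v) // monotonic in x, but not linear
        println(pearson(x, y))  // below 1, because the relationship is not linear
        println(spearman(x, y)) // 1.0, because both series have the same ordering
      }
    }
    ```

    With the identical series data1 and data2 used above, both methods give a correlation of 1, so a non-linear pair like this one is needed to see them diverge.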
    
  • Original article: https://www.cnblogs.com/ljy2013/p/5105549.html