  • MLlib documentation notes 1

    spark.mllib contains the original API built on top of RDDs.

    spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.

    MLlib supports local vectors and matrices stored on a single machine.

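    1. Data types

    1) Local vector

    Sparse vector: stores only the nonzero entries, as parallel arrays of indices and values.

    Dense vector: stores every entry in a double array.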

    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    // Create a dense vector (1.0, 0.0, 3.0).
    Vector dv = Vectors.dense(1.0, 0.0, 3.0);
    // Create a sparse vector (1.0, 0.0, 3.0) by specifying the indices and values of its nonzero entries.
    Vector sv = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0});
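
    To make the relationship between the two representations concrete, here is a minimal plain-Java sketch (no Spark dependency) that expands a sparse (size, indices, values) triple into the equivalent dense array. The class name SparseVectorSketch and the method toDense are hypothetical names chosen for illustration; they are not part of MLlib.

```java
import java.util.Arrays;

// Illustrative sketch only: SparseVectorSketch and toDense are hypothetical
// names, not MLlib API. It shows how (size, indices, values) maps to a dense array.
class SparseVectorSketch {

    // Expands a sparse representation into a dense double[] of length `size`.
    static double[] toDense(int size, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        double[] dense = new double[size]; // Java zero-initialises new arrays
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] = values[k];
        }
        return dense;
    }

    public static void main(String[] args) {
        // Same arguments as the Vectors.sparse call: size 3, indices {0, 2}, values {1.0, 3.0}.
        double[] dense = toDense(3, new int[] {0, 2}, new double[] {1.0, 3.0});
        System.out.println(Arrays.toString(dense)); // prints [1.0, 0.0, 3.0]
    }
}
```

    Both representations describe the same vector; the sparse form simply skips the zero entries, which saves memory when most entries are zero.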

    2) Labeled point

    A labeled point is a local vector, either dense or sparse, associated with a label/response. In MLlib, labeled points are used in supervised learning algorithms. The label is stored as a double, so labeled points can be used in both regression and classification.

    Sparse data

    Sparse training data is commonly stored as text in LIBSVM format, where each line represents one labeled sparse feature vector:

    label index1:value1 index2:value2 ...

    where the indices are one-based and in ascending order. After loading, the feature indices are converted to zero-based.
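
    The index shift can be illustrated with a small plain-Java sketch that parses a single line in this format. It is not how MLUtils is implemented; the class name LibSvmLineSketch, its fields, and the sample line are assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: LibSvmLineSketch is a hypothetical class, not MLlib API.
// It parses one "label index1:value1 index2:value2 ..." line and converts the
// one-based feature indices to zero-based ones.
class LibSvmLineSketch {

    final double label;
    final int[] indices;   // zero-based after parsing
    final double[] values;

    LibSvmLineSketch(double label, int[] indices, double[] values) {
        this.label = label;
        this.indices = indices;
        this.values = values;
    }

    static LibSvmLineSketch parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        double label = Double.parseDouble(tokens[0]);
        int n = tokens.length - 1;
        int[] indices = new int[n];
        double[] values = new double[n];
        for (int k = 0; k < n; k++) {
            String[] pair = tokens[k + 1].split(":");
            indices[k] = Integer.parseInt(pair[0]) - 1; // one-based -> zero-based
            values[k] = Double.parseDouble(pair[1]);
        }
        return new LibSvmLineSketch(label, indices, values);
    }

    public static void main(String[] args) {
        LibSvmLineSketch p = parse("1 1:0.5 3:2.0"); // made-up sample line
        System.out.println(p.label + " " + Arrays.toString(p.indices) + " " + Arrays.toString(p.values));
        // prints 1.0 [0, 2] [0.5, 2.0]
    }
}
```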

    Example:

    MLUtils.loadLibSVMFile reads training examples stored in LIBSVM format.

    Refer to the MLUtils Java docs for details on the API.

    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;
    import org.apache.spark.api.java.JavaRDD;

    // jsc is an existing JavaSparkContext.
    JavaRDD<LabeledPoint> examples =
      MLUtils.loadLibSVMFile(jsc.sc(), "data/mllib/sample_libsvm_data.txt").toJavaRDD();

    2. Local matrix

    Dense matrix: the entry values are stored in a single one-dimensional double array in column-major order, together with the matrix size (number of rows and columns).

    Sparse matrix: the nonzero entry values are stored in Compressed Sparse Column (CSC) format, also in column-major order.
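
    Column-major order means that entry (i, j) of a matrix with numRows rows sits at position j * numRows + i of the flat array. A minimal plain-Java sketch of that lookup follows; the class name ColumnMajorSketch and the method get are hypothetical, and the 3 x 2 matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)) is used as sample data.

```java
import java.util.Arrays;

// Illustrative sketch only: ColumnMajorSketch and get are hypothetical names, not MLlib API.
class ColumnMajorSketch {

    // Returns entry (i, j) of a matrix with `numRows` rows stored column by column.
    static double get(int numRows, double[] values, int i, int j) {
        return values[j * numRows + i];
    }

    public static void main(String[] args) {
        int numRows = 3;
        int numCols = 2;
        // Column 0 is (1.0, 3.0, 5.0), column 1 is (2.0, 4.0, 6.0).
        double[] values = {1.0, 3.0, 5.0, 2.0, 4.0, 6.0};
        for (int i = 0; i < numRows; i++) {
            double[] row = new double[numCols];
            for (int j = 0; j < numCols; j++) {
                row[j] = get(numRows, values, i, j);
            }
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [1.0, 2.0]
        // [3.0, 4.0]
        // [5.0, 6.0]
    }
}
```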

    Example:

    import org.apache.spark.mllib.linalg.Matrix;
    import org.apache.spark.mllib.linalg.Matrices;

    // Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)).
    // Matrices.dense(numRows, numCols, values), with values in column-major order.
    Matrix dm = Matrices.dense(3, 2, new double[] {1.0, 3.0, 5.0, 2.0, 4.0, 6.0});

    // Create a sparse matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0)).
    // Matrices.sparse(numRows, numCols, colPtrs, rowIndices, values)
    Matrix sm = Matrices.sparse(3, 2, new int[] {0, 1, 3}, new int[] {0, 2, 1}, new double[] {9, 6, 8});
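
    The three arrays passed to Matrices.sparse can be hard to read at first. In CSC format, the nonzero entries of column j occupy positions colPtrs[j] up to (but not including) colPtrs[j + 1] of the rowIndices and values arrays. The plain-Java sketch below decodes those arrays into a dense two-dimensional array; the class name CscSketch and the method toDense are hypothetical and are not part of MLlib.

```java
import java.util.Arrays;

// Illustrative sketch only: CscSketch and toDense are hypothetical names, not MLlib API.
class CscSketch {

    // Decodes a Compressed Sparse Column matrix into a dense row-indexed 2D array.
    static double[][] toDense(int numRows, int numCols,
                              int[] colPtrs, int[] rowIndices, double[] values) {
        double[][] dense = new double[numRows][numCols];
        for (int j = 0; j < numCols; j++) {
            // Entries of column j live at positions colPtrs[j] .. colPtrs[j + 1] - 1.
            for (int k = colPtrs[j]; k < colPtrs[j + 1]; k++) {
                dense[rowIndices[k]][j] = values[k];
            }
        }
        return dense;
    }

    public static void main(String[] args) {
        // Same arrays as the Matrices.sparse call above.
        double[][] dense = toDense(3, 2,
                new int[] {0, 1, 3}, new int[] {0, 2, 1}, new double[] {9, 6, 8});
        System.out.println(Arrays.deepToString(dense));
        // prints [[9.0, 0.0], [0.0, 8.0], [0.0, 6.0]]
    }
}
```

    Here column 0 holds one entry (9.0 at row 0) and column 1 holds two (6.0 at row 2 and 8.0 at row 1), which is why colPtrs is {0, 1, 3}.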

  • Original article: https://www.cnblogs.com/lwhp/p/5684534.html