  • Holiday Study 13

    Today I worked on the first part of the final lab, "Programming Practice with Spark's Machine Learning Library MLlib".

    Here is part of the code:

    import org.apache.spark.ml.feature.PCA
    import org.apache.spark.sql.Row
    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
    import org.apache.spark.ml.{Pipeline, PipelineModel}
    import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer, HashingTF, Tokenizer}
    import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel, BinaryLogisticRegressionSummary}
    import org.apache.spark.sql.functions
    import spark.implicits._

    // Each record: six continuous attributes as the feature vector, the income class as the label
    case class Adult(features: org.apache.spark.ml.linalg.Vector, label: String)
    // defined class Adult

    // Columns: 0 age, 2 fnlwgt, 4 education-num, 10 capital-gain, 11 capital-loss, 12 hours-per-week, 14 income
    // Blank lines are skipped (an empty line would make p(0).toDouble throw), and the label
    // is trimmed because fields in the raw file follow ", "
    val df = sc.textFile("adult.data.txt").filter(_.trim.nonEmpty).map(_.split(",")).map(p =>
      Adult(Vectors.dense(p(0).toDouble, p(2).toDouble, p(4).toDouble, p(10).toDouble,
        p(11).toDouble, p(12).toDouble), p(14).trim)).toDF()
    // df: org.apache.spark.sql.DataFrame = [features: vector, label: string]

    // The test set is parsed the same way
    val test = sc.textFile("adult.test.txt").filter(_.trim.nonEmpty).map(_.split(",")).map(p =>
      Adult(Vectors.dense(p(0).toDouble, p(2).toDouble, p(4).toDouble, p(10).toDouble,
        p(11).toDouble, p(12).toDouble), p(14).trim)).toDF()
    // test: org.apache.spark.sql.DataFrame = [features: vector, label: string]
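    To make the column selection behind `df` and `test` explicit, here is a minimal plain-Scala sketch of the same per-line parsing, with no Spark dependency. `AdultLineParser` and `parse` are hypothetical names. Trimming every field, returning None for lines with too few fields (blank lines, or a header-like first line in the test file) and dropping a trailing period from the label are assumptions about the raw UCI files, not part of the original code.

```scala
// Hypothetical helper (not from the original post): a plain-Scala sketch of the
// per-line parsing done inside the spark-shell code, with no Spark dependency.
object AdultLineParser {
  // Feature columns, same indices as the spark-shell code:
  // 0 age, 2 fnlwgt, 4 education-num, 10 capital-gain, 11 capital-loss, 12 hours-per-week
  val FeatureCols: Seq[Int] = Seq(0, 2, 4, 10, 11, 12)
  // Column 14 holds the income class (<=50K or >50K)
  val LabelCol: Int = 14

  // Parses one comma-separated record into (features, label).
  // Lines with fewer than 15 fields (blank or header-like lines) yield None.
  // A non-numeric value in a feature column still throws NumberFormatException.
  def parse(line: String): Option[(Array[Double], String)] = {
    // Assumption: the raw files put a space after each comma, so every field is trimmed
    val p = line.split(",").map(_.trim)
    if (p.length <= LabelCol) None
    else {
      val features = FeatureCols.map(i => p(i).toDouble).toArray
      // Assumption: labels in adult.test end with a period (">50K."), which is dropped
      Some((features, p(LabelCol).stripSuffix(".")))
    }
  }
}
```

    For example, parsing the record "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K" yields the features [39.0, 77516.0, 13.0, 2174.0, 0.0, 40.0] and the label "<=50K". In spark-shell the helper could replace the inline lambda, e.g. `sc.textFile("adult.test.txt").flatMap(line => AdultLineParser.parse(line).toList).map { case (f, l) => Adult(Vectors.dense(f), l) }.toDF()`, which keeps train and test labels consistent for a later StringIndexer; treat this wiring as a sketch.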
  • Original post: https://www.cnblogs.com/Excusezuo/p/12315306.html