zoukankan      html  css  js  c++  java
  • 假期学习13

    今天做的是最后一个实验Spark 机器学习库 MLlib 编程实践的前一部分。

    以下是部分代码:

    import org.apache.spark.ml.feature.PCA
    import org.apache.spark.sql.Row
    import org.apache.spark.ml.linalg.{Vector,Vectors}
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
    import org.apache.spark.ml.{Pipeline,PipelineModel}
    import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer,HashingTF, 
    Tokenizer}
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, 
    LogisticRegression}
    import org.apache.spark.sql.functions;
    scala> import spark.implicits._
    import spark.implicits._
    scala> case class Adult(features: org.apache.spark.ml.linalg.Vector, label: String)
    defined class Adult
    scala> val df = sc.textFile("adult.data.txt").map(_.split(",")).map(p => 
    Adult(Vectors.dense(p(0).toDouble,p(2).toDouble,p(4).toDouble, p(10).toDouble, p(11).toDouble, 
    p(12).toDouble), p(14).toString())).toDF()
    df: org.apache.spark.sql.DataFrame = [features: vector, label: string]
    scala> val test = sc.textFile("adult.test.txt").map(_.split(",")).map(p => 
    Adult(Vectors.dense(p(0).toDouble,p(2).toDouble,p(4).toDouble, p(10).toDouble, p(11).toDouble, 
    p(12).toDouble), p(14).toString())).toDF()
    test: org.apache.spark.sql.DataFrame = [features: vector, label: string]
  • 相关阅读:
    Alpine linux如何配置和管理自定义服务
    nginx仅允许域名访问禁止IP访问
    解决influxdb的log日志输出位置
    python配置文件INI/TOML/YAML/ENV的区别
    window获取本机所有IP
    学习本来的样子
    yum/编译安装Zabbix 5.0 LTS
    redis问题优化
    解决nginx同端口强制跳转https配置ssl证书问题
    通过DNS验证自动申请nginx证书
  • 原文地址:https://www.cnblogs.com/Excusezuo/p/12315306.html
Copyright © 2011-2022 走看看