  • 2020 Winter Break Study Log (14): Programmatically Converting an RDD to a DataFrame

    The source file contains the following records (fields: id, name, age):

    1,Ella,36

    2,Bob,29

    3,Jack,29 

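    Copy the data into a file named employee.txt on a Linux system, convert the RDD read from that file into a DataFrame, and print every row of the DataFrame in the format "id:1,name:Ella,age:36". Write the program code.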
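
As a quick check of the target format before involving Spark, the per-line transformation can be sketched in plain Scala. The object `EmployeeLine` and its `format` method are hypothetical names introduced here for illustration; they are not part of the original exercise.

```scala
// Hypothetical helper (not from the original post): turns one CSV line
// "id,name,age" into the required "id:<id>,name:<name>,age:<age>" string.
object EmployeeLine {
  def format(line: String): String = {
    // Split on commas and trim every field, because a line may carry
    // stray whitespace (the third record in the sample ends with a space).
    val Array(id, name, age) = line.split(",").map(_.trim)
    s"id:$id,name:$name,age:$age"
  }

  def main(args: Array[String]): Unit = {
    // The three sample records from the exercise.
    val lines = Seq("1,Ella,36", "2,Bob,29", "3,Jack,29 ")
    lines.map(format).foreach(println)
  }
}
```

A line that does not have exactly three comma-separated fields makes the pattern match fail with a `MatchError`, which is acceptable for a sketch but would need handling on real data.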

    scala> import org.apache.spark.sql.types._
    import org.apache.spark.sql.types._
    
    scala> import org.apache.spark.sql.Row
    import org.apache.spark.sql.Row
    
    scala> val peopleRDD = spark.sparkContext.textFile("file:///home/hadoop/77/employee.txt")
    peopleRDD: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/77/employee.txt MapPartitionsRDD[1] at textFile at <console>:27
    
    scala> val schemaString = "id name age"
    schemaString: String = id name age
    
    scala> val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
    fields: Array[org.apache.spark.sql.types.StructField] = Array(StructField(id,StringType,true), StructField(name,StringType,true), StructField(age,StringType,true))
    
    scala> val schema = StructType(fields)
    schema: org.apache.spark.sql.types.StructType = StructType(StructField(id,StringType,true), StructField(name,StringType,true), StructField(age,StringType,true))
    
    scala> val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim, attributes(2).trim))
    rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[3] at map at <console>:29
    
    scala> val peopleDF = spark.createDataFrame(rowRDD, schema)
    peopleDF: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]
    
    scala> peopleDF.createOrReplaceTempView("people")
    
    scala> val results = spark.sql("SELECT id,name,age FROM people")
    results: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]
    
    scala> results.map(attributes => "id:" + attributes(0) + ",name:" + attributes(1) + ",age:" + attributes(2)).show(false)
    +---------------------+
    |value                |
    +---------------------+
    |id:1,name:Ella,age:36|
    |id:2,name:Bob,age:29 |
    |id:3,name:Jack,age:29|
    +---------------------+
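
The transcript above defines the schema programmatically with `StructType`. Spark can also infer the schema by reflection from a case class, and the following is a minimal sketch of that alternative as a standalone application rather than a spark-shell session. The names `Employee`, `RddToDataFrame`, `load`, and `render` are hypothetical, and the `local[*]` master is an assumption for running on a single machine. Unlike the transcript, this version parses `id` and `age` as integers.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Assumed record type for illustration; Spark derives the DataFrame schema from it.
// It must be defined at top level so that Spark can build an encoder for it.
case class Employee(id: Int, name: String, age: Int)

object RddToDataFrame {
  // Reads the text file as an RDD[String], maps each line to an Employee,
  // and converts the resulting RDD to a DataFrame via reflection.
  def load(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._ // brings toDF() into scope
    spark.sparkContext
      .textFile(path)
      .map(_.split(","))
      .map(a => Employee(a(0).trim.toInt, a(1).trim, a(2).trim.toInt))
      .toDF()
  }

  // Collects the rows to the driver and renders each one in the required format.
  def render(df: DataFrame): Array[String] =
    df.collect().map {
      case Row(id: Int, name: String, age: Int) => s"id:$id,name:$name,age:$age"
    }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode; on a cluster the master is usually set by spark-submit.
    val spark = SparkSession.builder()
      .appName("RddToDataFrame")
      .master("local[*]")
      .getOrCreate()

    // Same file path as in the transcript.
    render(load(spark, "file:///home/hadoop/77/employee.txt")).foreach(println)

    spark.stop()
  }
}
```

Collecting to the driver before formatting is fine for three rows. For a large dataset the formatting would normally stay inside a distributed `map`, as in the transcript.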
  • Original article: https://www.cnblogs.com/Qi77/p/12324150.html