                 Spark SQL Programming: DataSet

                                         Author: Yin Zhengjie (尹正杰)

    Copyright notice: this is an original work. Reproduction is not permitted, and violations will be pursued legally.

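    1. Creating a DataSet

      Note:
        A Dataset is a strongly typed collection of data, so its element type must be supplied when it is created. In the example below, that type is the case class Person.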
    
    
    scala> case class Person(name: String, age: Long)            // Define a case class
    defined class Person
    
    scala> val caseClassDS = Seq(Person("YinZhengjie", 18)).toDS()    // Create a DataSet from a Seq
    caseClassDS: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
    
    scala> caseClassDS.show                            // DataSet methods are used much like DataFrame methods
    +-----------+---+
    |       name|age|
    +-----------+---+
    |YinZhengjie| 18|
    +-----------+---+
    
    
    scala> caseClassDS.createTempView("person")
    
    scala> spark.sql("select * from person").show
    +-----------+---+
    |       name|age|
    +-----------+---+
    |YinZhengjie| 18|
    +-----------+---+
    
    
    scala> 
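
The transcript above runs in spark-shell, which already provides `spark` and imports its implicits. As a rough sketch of the same steps in a standalone program, the block below builds its own session. The object name, the `local[*]` master and the app name are illustrative assumptions, not part of the original post. It also uses `createOrReplaceTempView` in place of `createTempView`, because `createTempView` raises an error when a view with that name already exists in the session.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at top level: encoder derivation typically fails for case classes
// declared inside the method that calls toDS.
case class Person(name: String, age: Long)

object CreateDatasetSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and app name are assumptions chosen for a local test run.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CreateDatasetSketch")
      .getOrCreate()

    // spark-shell imports these automatically; a standalone program must do it
    // itself to get toDS and the encoders for case classes.
    import spark.implicits._

    val caseClassDS: Dataset[Person] = Seq(Person("YinZhengjie", 18)).toDS()

    // Typed operations: each lambda receives a Person, so field names and
    // field types are checked by the compiler.
    val adults: Dataset[Person] = caseClassDS.filter(p => p.age >= 18)
    val names: Dataset[String] = adults.map(p => p.name)
    names.show()

    // Safe to re-run: replaces any existing temporary view named "person".
    caseClassDS.createOrReplaceTempView("person")
    spark.sql("select * from person").show()

    spark.stop()
  }
}
```

The `filter` and `map` calls are where the strong typing mentioned in the note becomes visible: a misspelled field such as `p.agee` is rejected at compile time rather than at query time.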

    2. Converting an RDD to a DataSet

    scala> case class Person(name: String, age: Long)            // Define a case class
    defined class Person
    
    scala> val listRDD = sc.makeRDD(List(("YinZhengjie",18),("Jason Yin",20),("Danny",28)))      // Create an RDD of (name, age) tuples
    listRDD: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[84] at makeRDD at <console>:27
    
    scala> val mapRDD = listRDD.map( t => { Person( t._1,t._2) })    // Use map to turn each element of listRDD into a Person
    mapRDD: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[102] at map at <console>:30
    
    scala> val ds = mapRDD.toDS                        // Convert the RDD to a DataSet
    ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
    
    scala> ds.show
    +-----------+---+
    |       name|age|
    +-----------+---+
    |YinZhengjie| 18|
    |  Jason Yin| 20|
    |      Danny| 28|
    +-----------+---+
    
    
    scala> 
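
The conversion shown above can also be written as a standalone program. The following is a minimal sketch under the same assumptions as any local test run (the object name, `local[*]` master and app name are illustrative). Besides the `map` plus `toDS` route from the transcript, it shows `spark.createDataset` as an alternative entry point for an RDD whose elements are already `Person` objects.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

case class Person(name: String, age: Long)

object RddToDatasetSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and app name are assumptions chosen for a local test run.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RddToDatasetSketch")
      .getOrCreate()
    import spark.implicits._

    // An RDD of (name, age) tuples, as in the transcript.
    val listRDD: RDD[(String, Int)] =
      spark.sparkContext.makeRDD(List(("YinZhengjie", 18), ("Jason Yin", 20), ("Danny", 28)))

    // Route 1: map each tuple to a Person, then call toDS.
    // The Int age is widened to Long by the Person constructor call.
    val personRDD: RDD[Person] = listRDD.map { case (name, age) => Person(name, age) }
    val ds1: Dataset[Person] = personRDD.toDS()

    // Route 2: hand the typed RDD to the session directly.
    val ds2: Dataset[Person] = spark.createDataset(personRDD)

    ds1.show()
    ds2.show()

    spark.stop()
  }
}
```

Both routes need an implicit `Encoder[Person]` in scope, which `import spark.implicits._` supplies for case classes.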

    3. Converting a DataSet to an RDD

    scala> ds.show      // Show the DataSet contents
    +-----------+---+
    |       name|age|
    +-----------+---+
    |YinZhengjie| 18|
    |  Jason Yin| 20|
    |      Danny| 28|
    +-----------+---+
    
    
    scala> ds
    res6: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
    
    scala> ds.rdd        // Convert the DataSet to an RDD
    res7: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[26] at rdd at <console>:29
    
    scala> res7.collect     // Collect the RDD contents
    res8: Array[Person] = Array(Person(YinZhengjie,18), Person(Jason Yin,20), Person(Danny,28))
    
    scala> 
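
As the transcript shows, `ds.rdd` on a `Dataset[Person]` yields an `RDD[Person]`, so the element type survives the conversion. The sketch below contrasts this with going through a DataFrame first, where the elements become untyped `Row` objects. The object name, the `local[*]` master and the app name are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, Row, SparkSession}

case class Person(name: String, age: Long)

object DatasetToRddSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and app name are assumptions chosen for a local test run.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DatasetToRddSketch")
      .getOrCreate()
    import spark.implicits._

    val ds: Dataset[Person] =
      Seq(Person("YinZhengjie", 18), Person("Jason Yin", 20), Person("Danny", 28)).toDS()

    // Dataset[Person].rdd keeps the element type, so fields are accessed directly.
    val personRDD: RDD[Person] = ds.rdd
    val names: Array[String] = personRDD.map(_.name).collect()

    // Converting to a DataFrame first drops the type: elements are Rows,
    // and fields are looked up by name (or position) with an explicit type.
    val rowRDD: RDD[Row] = ds.toDF().rdd
    val ages: Array[Long] = rowRDD.map(row => row.getAs[Long]("age")).collect()

    println(names.mkString(", "))
    println(ages.mkString(", "))

    spark.stop()
  }
}
```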
  • Original article: https://www.cnblogs.com/yinzhengjie2020/p/13197064.html