  • Introduction to Spark DataFrame (Part 2)

    Basic Spark DataFrame Operations

     

    Creating the SparkSession and SparkContext

    import org.apache.spark.sql.SparkSession
    val spark = SparkSession.builder.master("local").getOrCreate()
    val sc = spark.sparkContext
    

    Creating a DataFrame from an Array

    // spark.range(1000) generates the values 0 to 999; toDF renames the column to "number"
    spark.range(1000).toDF("number").show()
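To build a DataFrame from an actual local array rather than a generated range, one option is the `toDF` conversion that `spark.implicits._` provides for local sequences. A minimal sketch, meant for spark-shell or a script, where the array `numbers` and the name `numbersDf` are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
import spark.implicits._ // brings the toDF conversion for local Seq values into scope

// Hypothetical sample data; toDF is defined for Seq, so convert the Array first
val numbers = Array(1, 2, 3)
val numbersDf = numbers.toSeq.toDF("number")
numbersDf.show()
```

The `.toSeq` call is there because the implicit conversion applies to `Seq`, not directly to `Array`.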
    

    Creating a DataFrame with an Explicit Schema

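    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Each Row must match the schema below in field order and type
    val data = Seq(
      Row("A", 10, 112233),
      Row("B", 20, 223311),
      Row("C", 30, 331122))
    
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType),
      StructField("phone", IntegerType)))
    
    spark.createDataFrame(sc.makeRDD(data), schema).show()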
     

    Loading a DataFrame from a JSON File

    /* data.json
       {"name":"A","age":10,"phone":112233}
       {"name":"B", "age":20,"phone":223311}
       {"name":"C", "age":30,"phone":331122}
     */
    spark.read.format("json").load("/Users/tobe/temp2/data.json").show()
     

    Loading a DataFrame from a CSV File

    /* data.csv
       name,age,phone
       A,10,112233
       B,20,223311
       C,30,331122
     */
    spark.read.option("header", true).csv("/Users/tobe/temp2/data.csv").show()
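With only the `header` option set, Spark reads every CSV column as a string. If typed columns are wanted, one possibility is to also enable `inferSchema`. The sketch below assumes the same sample rows and writes them to a temporary file so it runs on its own; `csvPath` and `typedDf` are illustrative names, not part of the original post:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()

// Write the sample rows to a temporary file so the sketch is self-contained
val csvPath = Files.createTempFile("data", ".csv")
Files.write(csvPath, "name,age,phone\nA,10,112233\nB,20,223311\nC,30,331122\n".getBytes("UTF-8"))

// inferSchema makes Spark scan the data and pick column types instead of defaulting to string
val typedDf = spark.read
  .option("header", true)
  .option("inferSchema", true)
  .csv(csvPath.toString)

typedDf.printSchema()
typedDf.show()
```

Schema inference costs an extra pass over the file, so for large inputs an explicit schema may be the better choice.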
     

    Loading a DataFrame from a MySQL Database

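Reading from MySQL usually goes through Spark's JDBC data source. The following is only a sketch under stated assumptions: the URL, database `testdb`, table `people`, and credentials are all hypothetical placeholders, and the MySQL Connector/J driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical connection settings; replace every value with your own
val jdbcOptions = Map(
  "url" -> "jdbc:mysql://localhost:3306/testdb",
  "dbtable" -> "people",
  "user" -> "root",
  "password" -> "secret",
  "driver" -> "com.mysql.cj.jdbc.Driver")

// Opens a JDBC connection when called, so a reachable MySQL server is required
def loadPeople(spark: SparkSession): DataFrame =
  spark.read.format("jdbc").options(jdbcOptions).load()
```

With a running database, `loadPeople(spark).show()` would print the table contents the same way as the file-based examples.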
     

    Converting an RDD to a DataFrame

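One common way to turn an RDD into a DataFrame is to build an RDD of tuples and call `toDF` with column names. A minimal sketch, reusing the sample rows from the earlier sections; the names `rdd` and `df` are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
val sc = spark.sparkContext
import spark.implicits._ // provides toDF on RDDs of tuples and case classes

// An RDD of (name, age, phone) tuples
val rdd = sc.makeRDD(Seq(("A", 10, 112233), ("B", 20, 223311), ("C", 30, 331122)))

// Column types are derived from the tuple element types
val df = rdd.toDF("name", "age", "phone")
df.show()
```

For an RDD of `Row` objects, `spark.createDataFrame(rdd, schema)` with an explicit `StructType` is the alternative.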
     

    Creating Timestamp Data

    Spark's TimestampType corresponds to Java's java.sql.Timestamp.

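As a rough illustration of that mapping, a column built from `java.sql.Timestamp` values ends up with `TimestampType` in the schema. The column names and timestamp values below are invented for the example:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
import spark.implicits._

// Hypothetical sample rows: (name, created_at)
val events = Seq(
  ("A", Timestamp.valueOf("2020-03-23 10:00:00")),
  ("B", Timestamp.valueOf("2020-03-23 11:30:00")))

val eventsDf = events.toDF("name", "created_at")
eventsDf.printSchema() // created_at is reported as a timestamp column
eventsDf.show(false)
```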
     

    Creating DateType Data

    Spark's DateType corresponds to Java's java.sql.Date.

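The same pattern seems to work for dates: a column of `java.sql.Date` values is given `DateType`. A small sketch with made-up column names and dates:

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
import spark.implicits._

// Hypothetical sample rows: (name, birthday)
val birthdays = Seq(
  ("A", Date.valueOf("2010-01-15")),
  ("B", Date.valueOf("2000-06-30")))

val birthdaysDf = birthdays.toDF("name", "birthday")
birthdaysDf.printSchema() // birthday is reported as a date column
birthdaysDf.show()
```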
     
  • Original article: https://www.cnblogs.com/wenBlog/p/12553482.html