  • Introduction to Spark DataFrame (Part 2)

    Basic Spark DataFrame operations

     

    Creating a SparkSession and SparkContext

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local").getOrCreate()
    val sc = spark.sparkContext
    

    Creating a DataFrame from a range of numbers

    spark.range(1000).toDF("number").show()
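
    spark.range generates its own values. For a collection that already exists in memory, one possible approach is toDF from spark.implicits._. The sketch below assumes a small hypothetical array named numbers; it is an illustration, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
// Brings the toDF conversion for local Seq collections into scope
import spark.implicits._

// Hypothetical sample array (not from the original post)
val numbers = Array(1, 2, 3, 4, 5)

// toDF is provided for Seq, so the array is converted to a Seq first
val df = numbers.toSeq.toDF("number")
df.show()
```

    The explicit toSeq is needed because the implicit conversion is defined for Seq, and Scala does not chain two implicit conversions.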
    

    Creating a DataFrame with an explicit schema

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val data = Seq(
      Row("A", 10, 112233),
      Row("B", 20, 223311),
      Row("C", 30, 331122))
    
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType),
      StructField("phone", IntegerType)))
    
    spark.createDataFrame(sc.makeRDD(data), schema).show()
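
    To check that the DataFrame really carries the declared types, the schema can be inspected after creation. A minimal sketch reusing the same rows; the names df and fieldTypes are introduced here only for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder.master("local").getOrCreate()

val data = Seq(
  Row("A", 10, 112233),
  Row("B", 20, 223311),
  Row("C", 30, 331122))

val schema = StructType(List(
  StructField("name", StringType),
  StructField("age", IntegerType),
  StructField("phone", IntegerType)))

val df = spark.createDataFrame(spark.sparkContext.makeRDD(data), schema)

// Prints each column with its type and nullability
df.printSchema()

// Column name -> data type, convenient for programmatic checks
val fieldTypes = df.schema.fields.map(f => f.name -> f.dataType).toMap
```

    StructField defaults to nullable = true when no third argument is given, so all three columns are reported as nullable.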
     

    Loading a DataFrame from a JSON file

    /* data.json
       {"name":"A","age":10,"phone":112233}
       {"name":"B", "age":20,"phone":223311}
       {"name":"C", "age":30,"phone":331122}
     */
    spark.read.format("json").load("/Users/tobe/temp2/data.json").show()
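
    When the schema is inferred from JSON, integer values come back as LongType. If IntegerType columns are wanted, a schema can be passed to the reader. The sketch below writes the sample rows to a temporary file so that it runs on its own; the temporary path and the name df are assumptions for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.master("local").getOrCreate()

// Write the sample rows to a temporary file (stand-in for data.json)
val path = Files.createTempFile("data", ".json")
Files.write(path,
  """{"name":"A","age":10,"phone":112233}
    |{"name":"B","age":20,"phone":223311}
    |{"name":"C","age":30,"phone":331122}""".stripMargin.getBytes("UTF-8"))

val schema = StructType(List(
  StructField("name", StringType),
  StructField("age", IntegerType),
  StructField("phone", IntegerType)))

// Supplying the schema skips inference and fixes the column types
val df = spark.read.schema(schema).json(path.toString)
df.printSchema()
df.show()
```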
     

    Loading a DataFrame from a CSV file

    /* data.csv
       name,age,phone
       A,10,112233
       B,20,223311
       C,30,331122
     */
    spark.read.option("header", true).csv("/Users/tobe/temp2/data.csv").show()
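
    With only the header option set, every CSV column is read as a string. One way to get numeric columns is the inferSchema option, which makes Spark scan the data to guess the types. A minimal sketch, using a temporary file as a stand-in for data.csv; the names path and df are illustrative.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.master("local").getOrCreate()

// Write the sample rows to a temporary file (stand-in for data.csv)
val path = Files.createTempFile("data", ".csv")
Files.write(path,
  """name,age,phone
    |A,10,112233
    |B,20,223311
    |C,30,331122""".stripMargin.getBytes("UTF-8"))

val df = spark.read
  .option("header", true)       // first line holds the column names
  .option("inferSchema", true)  // scan the data to guess column types
  .csv(path.toString)

df.printSchema()
df.show()
```

    Inference costs an extra pass over the file, so for large inputs an explicit schema may be the better choice.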
     

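    Loading a DataFrame from a MySQL database

    // Sketch only: the URL, table name, user and password are placeholders.
    // The MySQL JDBC driver must be on the classpath.
    spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("dbtable", "people")
      .option("user", "root")
      .option("password", "password")
      .load()
      .show()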
     

    Converting an RDD to a DataFrame

    import spark.implicits._

    // Illustrative RDD of tuples, mirroring the sample rows used above
    val rdd = sc.makeRDD(Seq(
      ("A", 10, 112233),
      ("B", 20, 223311),
      ("C", 30, 331122)))

    rdd.toDF("name", "age", "phone").show()
     

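    Creating Timestamp data

    Spark's TimestampType corresponds to Java's java.sql.Timestamp:

    import java.sql.Timestamp
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Illustrative values; any java.sql.Timestamp can be used
    val data = Seq(
      Row(Timestamp.valueOf("2020-03-23 10:00:00")),
      Row(Timestamp.valueOf("2020-03-24 11:30:00")))

    val schema = StructType(List(StructField("ts", TimestampType)))

    spark.createDataFrame(sc.makeRDD(data), schema).show()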
     

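    Creating DateType data

    Spark's DateType corresponds to Java's java.sql.Date:

    import java.sql.Date
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Illustrative values; any java.sql.Date can be used
    val data = Seq(
      Row(Date.valueOf("2020-03-23")),
      Row(Date.valueOf("2020-03-24")))

    val schema = StructType(List(StructField("date", DateType)))

    spark.createDataFrame(sc.makeRDD(data), schema).show()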
     
  • Original source: https://www.cnblogs.com/wenBlog/p/12553482.html