  • Introduction to Spark DataFrame (Part 2)

    Basic Spark DataFrame Operations

     

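    Creating a SparkSession and SparkContext

    import org.apache.spark.sql.SparkSession
    // Start (or reuse) a local session; the SparkContext is obtained from it.
    val spark = SparkSession.builder.master("local").getOrCreate()
    val sc = spark.sparkContext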
    

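    Creating a DataFrame from an Array

    // spark.range(1000) yields the numbers 0 to 999; toDF names the single column.
    spark.range(1000).toDF("number").show()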
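
    The snippet above generates its numbers with spark.range. When the values already sit in a local array, a minimal sketch along the following lines should also work; the array contents and the names numbers and df are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
import spark.implicits._  // brings .toDF into scope for local Seq values

// Hypothetical sample data: a plain local array of integers.
val numbers = Array(1, 2, 3, 4, 5)

// toDF is defined for Seq, so the array is converted with toSeq first.
val df = numbers.toSeq.toDF("number")
df.show()
```

    The explicit toSeq is there because the implicit conversion that provides toDF applies to Seq, not directly to Array.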
    

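    Creating a DataFrame with an Explicit Schema

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Each Row must match the schema below in field order and type.
    val data = Seq(
      Row("A", 10, 112233),
      Row("B", 20, 223311),
      Row("C", 30, 331122))
    
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType),
      StructField("phone", IntegerType)))
    
    spark.createDataFrame(sc.makeRDD(data), schema).show()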
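
    To confirm that the DataFrame really carries the declared types, the schema can be inspected after creation. The following is only a sketch that rebuilds a two-row version of the example; the names rows and df are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder.master("local").getOrCreate()

val schema = StructType(List(
  StructField("name", StringType),
  StructField("age", IntegerType),
  StructField("phone", IntegerType)))

// Two sample rows whose values line up with the schema above.
val rows = spark.sparkContext.parallelize(Seq(
  Row("A", 10, 112233),
  Row("B", 20, 223311)))

val df = spark.createDataFrame(rows, schema)

// printSchema prints the column names and types as a tree.
df.printSchema()

// dtypes returns (column name, type name) pairs for programmatic checks.
df.dtypes.foreach { case (name, tpe) => println(s"$name: $tpe") }
```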
     

    Loading a DataFrame from a JSON File

    /* data.json
       {"name":"A","age":10,"phone":112233}
       {"name":"B", "age":20,"phone":223311}
       {"name":"C", "age":30,"phone":331122}
     */
    spark.read.format("json").load("/Users/tobe/temp2/data.json").show()
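
    By default the JSON reader expects one JSON object per line and infers the schema from the data, so integer fields come back as long. If the integer types from the earlier schema example are wanted, a schema can be passed to the reader. The sketch below writes its own temporary file so that it is self-contained; the temporary path and the names inferred and typed are assumptions for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder.master("local").getOrCreate()

// Write a small JSON Lines file to a temporary location (illustrative only).
val path = Files.createTempFile("data", ".json")
val lines = Seq(
  """{"name":"A","age":10,"phone":112233}""",
  """{"name":"B","age":20,"phone":223311}""")
Files.write(path, lines.mkString("\n").getBytes("UTF-8"))

// Schema inference: integer fields are inferred as long.
val inferred = spark.read.json(path.toString)
inferred.printSchema()

// Explicit schema: the reader uses the declared types instead of inferring them.
val schema = StructType(List(
  StructField("name", StringType),
  StructField("age", IntegerType),
  StructField("phone", IntegerType)))
val typed = spark.read.schema(schema).json(path.toString)
typed.printSchema()
```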
     

    Loading a DataFrame from a CSV File

    /* data.csv
       name,age,phone
       A,10,112233
       B,20,223311
       C,30,331122
     */
    spark.read.option("header", true).csv("/Users/tobe/temp2/data.csv").show()
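
    With only the header option set, every CSV column is read as a string. The inferSchema option asks Spark to scan the data and choose column types. The sketch below compares the two on a temporary copy of the sample file; the temporary path and the names asStrings and inferred are assumptions for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()

// Write the sample CSV to a temporary location (illustrative only).
val path = Files.createTempFile("data", ".csv")
Files.write(path,
  "name,age,phone\nA,10,112233\nB,20,223311\nC,30,331122\n".getBytes("UTF-8"))

// Header only: column names come from the first line, all types are string.
val asStrings = spark.read.option("header", true).csv(path.toString)
asStrings.printSchema()

// Header plus inferSchema: Spark scans the data to pick the column types.
val inferred = spark.read
  .option("header", true)
  .option("inferSchema", true)
  .csv(path.toString)
inferred.printSchema()
```

    Inference costs an additional pass over the file, so for large inputs an explicit schema may be the better choice.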
     

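    Loading a DataFrame from a MySQL Database

    // Requires the MySQL Connector/J driver on the classpath.
    // The URL, table name, and credentials below are placeholders.
    spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "data")
      .option("user", "root")
      .option("password", "password")
      .load()
      .show()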
     

    Converting an RDD to a DataFrame

    // toDF on an RDD of tuples is provided by spark.implicits._
    import spark.implicits._

    val rdd = sc.makeRDD(Seq(("A", 10, 112233), ("B", 20, 223311), ("C", 30, 331122)))
    rdd.toDF("name", "age", "phone").show()
     

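    Creating Timestamp Data

    Spark's TimestampType corresponds to Java's java.sql.Timestamp.

    import java.sql.Timestamp
    import spark.implicits._

    // java.sql.Timestamp values in a local Seq are encoded as TimestampType.
    val timestamps = Seq(
      Timestamp.valueOf("2020-01-01 10:00:00"),
      Timestamp.valueOf("2020-01-02 11:30:00")).toDF("ts")
    timestamps.printSchema()
    timestamps.show()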
     

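    Creating DateType Data

    Spark's DateType corresponds to Java's java.sql.Date.

    import java.sql.Date
    import spark.implicits._

    // java.sql.Date values in a local Seq are encoded as DateType.
    val dates = Seq(
      Date.valueOf("2020-01-01"),
      Date.valueOf("2020-01-02")).toDF("date")
    dates.printSchema()
    dates.show()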
     
  • Original article: https://www.cnblogs.com/wenBlog/p/12553482.html