  • 【sparkSQL】Common DataFrame Operations

    scala> import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.SparkSession
     
    scala> val spark=SparkSession.builder().getOrCreate()
    spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@2bdab835
     
    // Enable implicit conversions from RDDs to DataFrames and subsequent SQL operations
    scala> import spark.implicits._
    import spark.implicits._
     
    scala> val df = spark.read.json("file:///usr/local/spark/examples/src/main/resources/people.json")
    df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
     
    scala> df.show()
    +----+-------+
    | age|   name|
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|
    +----+-------+
    
    // Print the schema
    scala> df.printSchema()
    root
     |-- age: long (nullable = true)
     |-- name: string (nullable = true)
     
    // Select multiple columns
    scala> df.select(df("name"),df("age")+1).show()
    +-------+---------+
    |   name|(age + 1)|
    +-------+---------+
    |Michael|     null|
    |   Andy|       31|
    | Justin|       20|
    +-------+---------+
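
    The same projection can also be written with `selectExpr`, which takes SQL expression strings instead of `Column` objects. A minimal sketch, assuming the `df` loaded from people.json above:

    ```scala
    // Equivalent projection using SQL expression strings
    df.selectExpr("name", "age + 1").show()
    ```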
     
    // Filter by condition
    scala> df.filter(df("age") > 20 ).show()
    +---+----+
    |age|name|
    +---+----+
    | 30|Andy|
    +---+----+
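
    Because `spark.implicits._` is imported above, the same filter can use `$`-notation, and column conditions can be combined with `&&` / `||`; a plain SQL-style string condition also works. A sketch against the same `df`, assuming Spark 2.x:

    ```scala
    // $-notation column expression; =!= is the Column inequality operator
    df.filter($"age" > 20 && $"name" =!= "Justin").show()

    // SQL-style string condition
    df.filter("age > 20").show()
    ```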
     
    // Group and aggregate
    scala> df.groupBy("age").count().show()
    +----+-----+
    | age|count|
    +----+-----+
    |  19|    1|
    |null|    1|
    |  30|    1|
    +----+-----+
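
    `groupBy` is not limited to `count()`: its `agg` method can compute several aggregates at once, using functions from `org.apache.spark.sql.functions`. A sketch against the same `df`:

    ```scala
    import org.apache.spark.sql.functions.{avg, count, max}

    // Named aggregate per group
    df.groupBy("age").agg(count("name").as("cnt")).show()

    // Aggregates over the whole DataFrame (no grouping)
    df.agg(avg("age"), max("age")).show()
    ```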
     
    // Sort
    scala> df.sort(df("age").desc).show()
    +----+-------+
    | age|   name|
    +----+-------+
    |  30|   Andy|
    |  19| Justin|
    |null|Michael|
    +----+-------+
     
    // Sort by multiple columns
    scala> df.sort(df("age").desc, df("name").asc).show()
    +----+-------+
    | age|   name|
    +----+-------+
    |  30|   Andy|
    |  19| Justin|
    |null|Michael|
    +----+-------+
     
    // Rename a column
    scala> df.select(df("name").as("username"),df("age")).show()
    +--------+----+
    |username| age|
    +--------+----+
    | Michael|null|
    |    Andy|  30|
    |  Justin|  19|
    +--------+----+
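
    When you only want to rename one column, `withColumnRenamed` avoids listing every other column in `select()`; all remaining columns are kept as-is. A sketch against the same `df`:

    ```scala
    // Rename "name" to "username", keeping all other columns
    df.withColumnRenamed("name", "username").show()
    ```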
    
    // Use a Spark SQL query
    scala> df.createTempView("table1")
    scala> spark.sql("select * from table1 limit 10")
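
    Note that `spark.sql` returns a DataFrame rather than printing rows, so an action such as `show()` is still needed to see the results; and `createOrReplaceTempView` will not fail if the view name is already taken. A sketch continuing the same session:

    ```scala
    // Will not error if "table1" already exists
    df.createOrReplaceTempView("table1")

    // spark.sql returns a DataFrame; call an action to materialize results
    val result = spark.sql("SELECT name, age FROM table1 WHERE age IS NOT NULL")
    result.show()
    ```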
    

    These are the basic DataFrame operations we use most often.

    For more details, see the following blog post:

    https://blog.csdn.net/dabokele/article/details/52802150

    SparkSQL official API documentation:

    http://spark.apache.org/docs/1.6.2/api/scala/index.html#org.apache.spark.sql.DataFrame

  • Original post: https://www.cnblogs.com/zzhangyuhang/p/9044995.html