  • Spark: taking the first few rows (sort first, then limit)

    scala> val df = sc.parallelize(Seq(
         |   (0,"cat26",30.9), 
         |   (1,"cat67",28.5), 
         |   (2,"cat56",39.6),
         |   (3,"cat8",35.6))).toDF("Hour", "Category", "Value")
    df: org.apache.spark.sql.DataFrame = [Hour: int, Category: string ... 1 more field]
    
    scala> df.show
    +----+--------+-----+
    |Hour|Category|Value|
    +----+--------+-----+
    |   0|   cat26| 30.9|
    |   1|   cat67| 28.5|
    |   2|   cat56| 39.6|
    |   3|    cat8| 35.6|
    +----+--------+-----+
    
    
    scala> df.sort(col("Hour").asc).limit(1)
    res6: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Hour: int, Category: string ... 1 more field]
    
    scala> df.sort(col("Hour").asc).limit(1).show
    +----+--------+-----+
    |Hour|Category|Value|
    +----+--------+-----+
    |   0|   cat26| 30.9|
    +----+--------+-----+
    
    
    scala> df.sort(col("Hour").desc).limit(1).show
    +----+--------+-----+
    |Hour|Category|Value|
    +----+--------+-----+
    |   3|    cat8| 35.6|
    +----+--------+-----+
    
    // Ascending is the default sort order
    scala> df.sort(col("Hour")).limit(1).show
    +----+--------+-----+
    |Hour|Category|Value|
    +----+--------+-----+
    |   0|   cat26| 30.9|
    +----+--------+-----+
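The session above can be condensed into a small reusable snippet. This is only a minimal sketch under a few assumptions: it is pasted into spark-shell (or run as a Scala script with Spark on the classpath), the session settings `local[*]` and `sort-then-limit` are illustrative, and `topN` is a hypothetical helper name, not part of the Spark API. Sorting before `limit` is what makes the result well defined, since `limit` on its own does not promise which rows come back.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Illustrative local session; inside spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sort-then-limit")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the session above.
val df = Seq(
  (0, "cat26", 30.9),
  (1, "cat67", 28.5),
  (2, "cat56", 39.6),
  (3, "cat8", 35.6)
).toDF("Hour", "Category", "Value")

// Hypothetical helper: sort by one column, then keep only the first n rows.
def topN(data: DataFrame, column: String, n: Int, descending: Boolean = true): DataFrame = {
  val order = if (descending) col(column).desc else col(column).asc
  data.sort(order).limit(n)
}

// The two rows with the largest Value, reduced to their Category names.
val topCategories = topN(df, "Value", 2).collect().map(_.getAs[String]("Category"))
println(topCategories.mkString(", "))  // prints: cat56, cat8
```

Passing `descending = false`, as in `topN(df, "Hour", 1, descending = false)`, reproduces the ascending case from the session and returns the row with `Hour` 0.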
  • Original article: https://www.cnblogs.com/v5captain/p/14208557.html