zoukankan      html  css  js  c++  java
  • 【技术问题】大数据0001---使用SparkSQL关联两个表求和取前几行

    场景:

    有两个表,表可以是文本或Json数据,结构化后分别是Table1(A,B,C)和Table2(C、D、E),两个表通过C关联,要求求出D+E之和,并以(A、B、D+E)三列返回

    解答:

    思路:SparkSQL支持读取Json创建表,同时创建的表可以做联合查询,类似传统Sql语句进行关联查询和统计分析

    代码:

    package study
    
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SparkSession
    
    object TestDataFrame2{
      def main(args:Array[String]):Unit={
        import org.apache.spark.sql.SparkSession
    
        val spark = SparkSession
          .builder()
          .master("local[*]")
          .appName("Spark SQL basic example")
          .config("spark.some.config.option", "some-value")
          .getOrCreate()
     
    
        spark.sql("""create table table1 using org.apache.spark.sql.json options(path "F://0002_BigData//Soft//comoceanspark//src//resources//Table1.json")""")
    
        spark.sql("""create table table2 using org.apache.spark.sql.json options(path "F://0002_BigData//Soft//comoceanspark//src//resources//Table2.json")""")
    
    
        spark.sql("show tables").show()
    
        spark.sql("select A,B,(D+E) as DE from table1 inner join table2 on table1.C = table2.C order by DE desc limit 5").show()
    
    
    
      }
    }

    Table1.json:

    {"A":"A1", "B":30, "C":1}
    {"A":"A2", "B":31, "C":2}
    {"A":"A3", "B":32, "C":3}
    {"A":"A4", "B":33, "C":4}
    {"A":"A5", "B":34, "C":5}
    {"A":"A6", "B":35, "C":6}
    {"A":"A7", "B":36, "C":7}
    {"A":"A8", "B":37, "C":8}
    {"A":"A9", "B":38, "C":9}

    Table2.json:

    {"C":1, "D":1, "E":1}
    {"C":2, "D":2, "E":2}
    {"C":3, "D":3, "E":3}
    {"C":4, "D":4, "E":4}
    {"C":5, "D":5, "E":5}
    {"C":6, "D":6, "E":6}
    {"C":7, "D":7, "E":7}
    {"C":8, "D":8, "E":8}
    {"C":9, "D":9, "E":9}

    结果:

    表显示

    计算结果显示:

  • 相关阅读:
    php 扩展编译linux
    PHP 单态设计模式
    Apache 支持.htaccess
    Apache 禁止访问目录
    jQuery 获取父元素、子元素、同级元素
    CentOS 配置防火墙操作实例(启、停、开、闭端口):
    CentOS 安装 lamp(转)
    火狐浏览器 js 1到9月份 new DATE不返回时间
    web 打开子窗口提交数据或其他操作后 关闭子窗口且刷新父窗口实现
    jquery on 绑定多个事件 多个元素
  • 原文地址:https://www.cnblogs.com/defineconst/p/11553271.html
Copyright © 2011-2022 走看看