  • Joining two files on HDFS with a Spark SQL query

    order_created.txt (columns: order number, order creation time; tab-separated)

    10703007267488  2014-05-01 06:01:12.334+01
    10101043505096  2014-05-01 07:28:12.342+01
    10103043509747  2014-05-01 07:50:12.33+01
    10103043501575  2014-05-01 09:27:12.33+01
    10104043514061  2014-05-01 09:03:12.324+01

    order_picked.txt (columns: order number, order pick-up time; tab-separated)

    10703007267488  2014-05-01 07:02:12.334+01
    10101043505096  2014-05-01 08:29:12.342+01
    10103043509747  2014-05-01 10:55:12.33+01

    Upload the two files above to HDFS:

    hadoop fs -put order_created.txt /data/order_created.txt
    hadoop fs -put order_picked.txt /data/order_picked.txt

    Join the two files with Spark SQL (run in spark-shell, where sc is predefined):

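    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    // Brings the implicit RDD-to-table conversion into scope (early Spark 1.x style).
    // On Spark 1.3+ use `import hiveContext.implicits._` and call .toDF() on each RDD.
    import hiveContext._

    // One case class per file; both columns are kept as strings.
    case class OrderCreated(order_no: String, create_date: String)
    case class OrderPicked(order_no: String, picked_date: String)

    // Each input line is "<order_no><TAB><timestamp>".
    val order_created = sc.textFile("/data/order_created.txt").map(_.split("\t")).map(d => OrderCreated(d(0), d(1)))
    val order_picked = sc.textFile("/data/order_picked.txt").map(_.split("\t")).map(d => OrderPicked(d(0), d(1)))

    // Expose both RDDs as temporary tables that SQL can query.
    order_created.registerTempTable("t_order_created")
    order_picked.registerTempTable("t_order_picked")

    // Manually set the number of Spark SQL shuffle tasks (partitions)
    hiveContext.setConf("spark.sql.shuffle.partitions", "10")
    hiveContext.sql("select a.order_no, a.create_date, b.picked_date from t_order_created a join t_order_picked b on a.order_no = b.order_no").collect.foreach(println)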
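
    HiveContext and registerTempTable were deprecated in Spark 2.0. For readers on a newer release, the same query might be sketched as follows. This is not from the original post: it assumes Spark 2.x or later, the same two HDFS paths, and an application name ("OrderJoin") chosen only for illustration.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Assumption: Spark 2.x or later. In spark-shell a session already exists,
    // so getOrCreate() simply returns it.
    val spark = SparkSession.builder().appName("OrderJoin").getOrCreate()

    // Read each tab-separated file; without a schema every column is a string.
    val created = spark.read.option("sep", "\t").csv("/data/order_created.txt")
      .toDF("order_no", "create_date")
    val picked = spark.read.option("sep", "\t").csv("/data/order_picked.txt")
      .toDF("order_no", "picked_date")

    // Temporary views replace the older registerTempTable call.
    created.createOrReplaceTempView("t_order_created")
    picked.createOrReplaceTempView("t_order_picked")

    // Same shuffle-partition setting as in the HiveContext version.
    spark.conf.set("spark.sql.shuffle.partitions", "10")

    val joined = spark.sql(
      """select a.order_no, a.create_date, b.picked_date
        |from t_order_created a
        |join t_order_picked b on a.order_no = b.order_no""".stripMargin)

    joined.collect.foreach(println)
    ```

    Either version should return the same three joined rows for the sample data.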

    The result is as follows (row order may vary between runs):

    [10101043505096,2014-05-01 07:28:12.342+01,2014-05-01 08:29:12.342+01]
    [10703007267488,2014-05-01 06:01:12.334+01,2014-05-01 07:02:12.334+01]
    [10103043509747,2014-05-01 07:50:12.33+01,2014-05-01 10:55:12.33+01]
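
    Only three rows come back because the query is an inner join: orders 10103043501575 and 10104043514061 have no line in order_picked.txt, so they are dropped. The plain Scala sketch below (no Spark involved, sample rows copied from the two listings, variable names chosen for illustration) reproduces that behaviour on in-memory collections.

    ```scala
    // Sample rows copied from the two file listings: (order_no, timestamp).
    val created = Seq(
      "10703007267488" -> "2014-05-01 06:01:12.334+01",
      "10101043505096" -> "2014-05-01 07:28:12.342+01",
      "10103043509747" -> "2014-05-01 07:50:12.33+01",
      "10103043501575" -> "2014-05-01 09:27:12.33+01",
      "10104043514061" -> "2014-05-01 09:03:12.324+01"
    )
    val picked = Seq(
      "10703007267488" -> "2014-05-01 07:02:12.334+01",
      "10101043505096" -> "2014-05-01 08:29:12.342+01",
      "10103043509747" -> "2014-05-01 10:55:12.33+01"
    )

    // Index the picked rows by order number for constant-time lookup.
    val pickedByOrder = picked.toMap

    // Inner join: keep a created row only when a picked row shares its order number.
    val joined = created.collect {
      case (orderNo, createDate) if pickedByOrder.contains(orderNo) =>
        (orderNo, createDate, pickedByOrder(orderNo))
    }

    // Created orders with no pick record; the inner join leaves these out.
    val unmatched = created.map(_._1).filterNot(pickedByOrder.contains)

    joined.foreach(println)
    println(s"dropped by the join: ${unmatched.mkString(", ")}")
    ```

    To keep the unmatched orders as well, with an empty pick-up time, the SQL would use a left outer join instead of a plain join.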
  • Original source: https://www.cnblogs.com/luogankun/p/4268431.html