  • Using Spark SQL to Work with Hive Data

    Before Spark 2.0:
    val sparkConf = new SparkConf().setAppName("soyo")
    val spark = new SparkContext(sparkConf)
    Spark 2.0 and later (the style above still works):
    Use SparkSession directly:
    val spark = SparkSession
          .builder
          .appName("soyo")
          .getOrCreate()
    val tc = spark.sparkContext.parallelize(data).cache()  // data: any local collection
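    To make the `parallelize(data)` placeholder concrete, here is a minimal sketch of the SparkSession style. The object name `SessionSketch`, the helpers `buildSession` and `countCached`, and the `master("local[*]")` setting are assumptions added for illustration so the snippet can run without a cluster; they are not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

object SessionSketch {
  // Assumption: local[*] is used only so the sketch runs without a cluster.
  def buildSession(): SparkSession =
    SparkSession.builder().appName("soyo").master("local[*]").getOrCreate()

  // Distributes a local collection as an RDD, caches it, and counts its elements.
  def countCached(spark: SparkSession, data: Seq[Int]): Long = {
    val tc = spark.sparkContext.parallelize(data).cache()
    tc.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    println(countCached(spark, 1 to 100)) // prints 100
    spark.stop()
  }
}
```

    The SparkContext is still there in Spark 2.x; it is simply reached through `spark.sparkContext` instead of being constructed by hand.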


    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.hive._

    case class Persons(name: String, age: Int)
    case class Record(key: Int, value: String)

    object rdd_to_dataframe_parquet {
      // Not a Scala interpolation: the ${system:user.dir} placeholder is passed to Spark as-is
      val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
      val spark = SparkSession.builder()
        .config("spark.sql.warehouse.dir", warehouseLocation)
        .enableHiveSupport()
        .getOrCreate()
      import spark.implicits._

      def main(args: Array[String]): Unit = {
        spark.sql("CREATE TABLE IF NOT EXISTS soyo1(key INT,value STRING)")
        spark.sql("LOAD DATA LOCAL INPATH 'file:///home/soyo/桌面/spark编程测试数据/kv1.txt' INTO TABLE soyo1")
        // soyo is a Hive table that already exists (it appears in the "show tables" output)
        spark.sql("select * from soyo").show()  // show() prints only the first 20 rows by default
        spark.sql("select * from soyo").take(100).foreach(println)

        import spark.sql  // after this import, sql(...) can be called without the spark. prefix
        sql("SELECT COUNT(*) FROM soyo").show()
        sql("select count(*) from soyo1").show()
        sql("show tables").show()

        // people must already be registered as a temporary view; it is not created in this listing
        sql("select * from people").show()
        val result2 = sql("select * from people")
        val fin_result = result2.map {
          case Row(key: String, value: Int) => s"name=$key;value=$value"
        }
        fin_result.show()

        val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i, s"soyo_$i")))
        recordsDF.createOrReplaceTempView("records")
        // Queries can then join DataFrame data with data stored in Hive.
        sql("SELECT * FROM records ").show()
        // show() returns Unit, so its result is not assigned to a val
        sql("SELECT * FROM records ").map(x => "key:" + x(0) + ",value:" + x(1)).show()
        spark.stop()
      }
    }

    Output:
    +---+-------+
    |key|  value|
    +---+-------+
    |238|val_238|
    | 86| val_86|
    |311|val_311|
    | 27| val_27|
    |165|val_165|
    |409|val_409|
    |255|val_255|
    |278|val_278|
    | 98| val_98|
    |484|val_484|
    |265|val_265|
    |193|val_193|
    |401|val_401|
    |150|val_150|
    |273|val_273|
    |224|val_224|
    |369|val_369|
    | 66| val_66|
    |128|val_128|
    |213|val_213|
    +---+-------+
    only showing top 20 rows

    [238,val_238]
    [86,val_86]
    [311,val_311]
    [27,val_27]
    [165,val_165]
    [409,val_409]
    [255,val_255]
    [278,val_278]
    [98,val_98]
    [484,val_484]
    +--------+
    |count(1)|
    +--------+
    |    6000|
    +--------+

    +--------+
    |count(1)|
    +--------+
    |    8500|
    +--------+

    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    | default|     soyo|      false|
    | default|    soyo1|      false|
    |        |   people|       true|
    +--------+---------+-----------+

    +-----+---+
    | name|age|
    +-----+---+
    |soyo8| 35|
    |   小周| 30|
    |   小华| 19|
    | soyo| 88|
    +-----+---+

    +-------------------+
    |              value|
    +-------------------+
    |name=soyo8;value=35|
    |   name=小周;value=30|
    |   name=小华;value=19|
    | name=soyo;value=88|
    +-------------------+

    +---+-------+
    |key|  value|
    +---+-------+
    |  1| soyo_1|
    |  2| soyo_2|
    |  3| soyo_3|
    |  4| soyo_4|
    |  5| soyo_5|
    |  6| soyo_6|
    |  7| soyo_7|
    |  8| soyo_8|
    |  9| soyo_9|
    | 10|soyo_10|
    | 11|soyo_11|
    | 12|soyo_12|
    | 13|soyo_13|
    | 14|soyo_14|
    | 15|soyo_15|
    | 16|soyo_16|
    | 17|soyo_17|
    | 18|soyo_18|
    | 19|soyo_19|
    | 20|soyo_20|
    +---+-------+
    only showing top 20 rows

    +--------------------+
    |               value|
    +--------------------+
    |  key:1,value:soyo_1|
    |  key:2,value:soyo_2|
    |  key:3,value:soyo_3|
    |  key:4,value:soyo_4|
    |  key:5,value:soyo_5|
    |  key:6,value:soyo_6|
    |  key:7,value:soyo_7|
    |  key:8,value:soyo_8|
    |  key:9,value:soyo_9|
    |key:10,value:soyo_10|
    |key:11,value:soyo_11|
    |key:12,value:soyo_12|
    |key:13,value:soyo_13|
    |key:14,value:soyo_14|
    |key:15,value:soyo_15|
    |key:16,value:soyo_16|
    |key:17,value:soyo_17|
    |key:18,value:soyo_18|
    |key:19,value:soyo_19|
    |key:20,value:soyo_20|
    +--------------------+
    only showing top 20 rows
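
    One thing worth noting about the listing: the pattern match `case Row(key: String, value: Int)` throws a `scala.MatchError` at runtime if a column is null or has a different type. A possible alternative, sketched below, is to convert the query result to a typed Dataset with `.as[Persons]`, which also puts the otherwise unused `Persons` case class to work. The object name `TypedPeopleSketch`, the helper `format`, the in-memory sample rows, and `master("local[*]")` are assumptions for illustration; the original post reads `people` from an existing temporary view instead.

```scala
import org.apache.spark.sql.SparkSession

// Must be defined at top level so Spark can derive an encoder for it.
case class Persons(name: String, age: Int)

object TypedPeopleSketch {
  // Pure formatting helper, kept separate so it can be checked without Spark.
  def format(p: Persons): String = s"name=${p.name};value=${p.age}"

  def main(args: Array[String]): Unit = {
    // Assumption: local[*] is used only so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("typed-people-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the people view used in the post.
    Seq(Persons("soyo8", 35), Persons("soyo", 88))
      .toDS()
      .createOrReplaceTempView("people")

    // as[Persons] maps columns to fields by name, so no Row pattern match is needed.
    val formatted = spark.sql("select name, age from people")
      .as[Persons]
      .map(p => format(p))
    formatted.show()

    spark.stop()
  }
}
```

    With the typed version, a schema mismatch is reported when `.as[Persons]` is analyzed rather than surfacing as a `MatchError` in the middle of a job.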

  • Original article: https://www.cnblogs.com/soyo/p/7656322.html