zoukankan      html  css  js  c++  java
  • 大数据学习——sparkSql

    官网http://spark.apache.org/docs/1.6.2/sql-programming-guide.html

    val sc: SparkContext // An existing SparkContext.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val df = sqlContext.read.json("hdfs://mini1:9000/person.json")
    1.在本地创建一个文件,有三列,分别是id、name、age,用空格分隔,然后上传到hdfs上
    hdfs dfs -put person.json /
    
    2.在spark shell执行下面命令,读取数据,将每一行的数据使用列分隔符分割
    val lineRDD = sc.textFile("hdfs://mini1:9000/person.json").map(_.split(" ")) 

    3.定义case class(相当于表的schema) case class Person(id:Int, name:String, age:Int)

    4.将RDD和case class关联 val personRDD = lineRDD.map(x => Person(x(0).toInt, x(1), x(2).toInt))

    5.将RDD转换成DataFrame val personDF = personRDD.toDF

    6.对DataFrame进行处理 personDF.show

     DSL风格语法

     

     SQL风格语法 

    scala> val dataRDD=sc.textFile("hdfs://mini1:9000/person.json")
    dataRDD: org.apache.spark.rdd.RDD[String] = hdfs://mini1:9000/person.json MapPartitionsRDD[120] at textFile at <console>:27
    
    scala> case class Person(id:Int ,name: String, age: Int)
    defined class Person
    
    scala> val personDF=dataRDD.map(_.split(" ")).map(x=> Person(x(0).toInt,x(1),x(2).toInt)).toDF()
    scala>  personDF.registerTempTable("t_person")

    SparkSqlTest
    package org.apache.spark
    
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, SQLContext}
    
    /**
      * Created by Administrator on 2019/6/12.
      */
    object SparkSqlTest {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("sparksql").setMaster("local[1]")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        val file: RDD[String] = sc.textFile("hdfs://mini1:9000/person.json")
        val personRDD = file.map(_.split(" ")).map(x => Person(x(0).toInt, x(1), x(2).toInt))
        import sqlContext.implicits._
        val personDF: DataFrame = personRDD.toDF()
        personDF.registerTempTable("t_person")
        sqlContext.sql("select * from t_person").show
    
      }
    }
    
    case class Person(id: Int, name: String, age: Int)
    +---+--------+---+
    | id| name|age|
    +---+--------+---+
    | 1|zhangsan| 23|
    | 2| wangwu| 34|
    | 3| lisi| 43|
    +---+--------+---+
  • 相关阅读:
    sql server profiler 对TextData进行过滤
    简单账表"小计"无法正常显示
    从字符串转换日期和/或时间时,转换失败。
    [转载]Java中的final与static的区别
    POI Excel导出样式设置
    [转载]poi 设置Region后单元格边框不起作用
    [转载]将java程序编译成独立运行的exe文件
    Java 线程安全问题—synchronized锁机制
    彻底理解ThreadLocal
    ThreadLocal封装Connection,实现同一线程共享资源
  • 原文地址:https://www.cnblogs.com/feifeicui/p/11011787.html
Copyright © 2011-2022 走看看