zoukankan      html  css  js  c++  java
  • 大数据学习——sparkSql

    官网http://spark.apache.org/docs/1.6.2/sql-programming-guide.html

    val sc: SparkContext // An existing SparkContext.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val df = sqlContext.read.json("hdfs://mini1:9000/person.json")
    1.在本地创建一个文件,有三列,分别是id、name、age,用空格分隔,然后上传到hdfs上
    hdfs dfs -put person.json /
    
    2.在spark shell执行下面命令,读取数据,将每一行的数据使用列分隔符分割
    val lineRDD = sc.textFile("hdfs://mini1:9000/person.json").map(_.split(" ")) 

    3.定义case class(相当于表的schema) case class Person(id:Int, name:String, age:Int)

    4.将RDD和case class关联 val personRDD = lineRDD.map(x => Person(x(0).toInt, x(1), x(2).toInt))

    5.将RDD转换成DataFrame val personDF = personRDD.toDF

    6.对DataFrame进行处理 personDF.show

     DSL风格语法

     

     SQL风格语法 

    scala> val dataRDD=sc.textFile("hdfs://mini1:9000/person.json")
    dataRDD: org.apache.spark.rdd.RDD[String] = hdfs://mini1:9000/person.json MapPartitionsRDD[120] at textFile at <console>:27
    
    scala> case class Person(id:Int ,name: String, age: Int)
    defined class Person
    
    scala> val personDF=dataRDD.map(_.split(" ")).map(x=> Person(x(0).toInt,x(1),x(2).toInt)).toDF()
    scala>  personDF.registerTempTable("t_person")

    SparkSqlTest
    package org.apache.spark
    
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, SQLContext}
    
    /**
      * Created by Administrator on 2019/6/12.
      */
    object SparkSqlTest {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("sparksql").setMaster("local[1]")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        val file: RDD[String] = sc.textFile("hdfs://mini1:9000/person.json")
        val personRDD = file.map(_.split(" ")).map(x => Person(x(0).toInt, x(1), x(2).toInt))
        import sqlContext.implicits._
        val personDF: DataFrame = personRDD.toDF()
        personDF.registerTempTable("t_person")
        sqlContext.sql("select * from t_person").show
    
      }
    }
    
    case class Person(id: Int, name: String, age: Int)
    +---+--------+---+
    | id| name|age|
    +---+--------+---+
    | 1|zhangsan| 23|
    | 2| wangwu| 34|
    | 3| lisi| 43|
    +---+--------+---+
  • 相关阅读:
    linux-nginx
    mysql数据库的多实例与主从同步。
    MySQL的命令
    Mysql的管理
    linux之mariadb的安装
    Linux进程基础
    linux网络基础
    解锁HMC8及HMC9的root用户
    RHEL8.0-beta-1.ISO
    RHEL6误安装RHEL7的包导致glibc被升级后系统崩溃处理方法
  • 原文地址:https://www.cnblogs.com/feifeicui/p/11011787.html
Copyright © 2011-2022 走看看