  • [imooc Hands-On] Part 9: Entering the World of Big Data with Spark SQL, Using imooc Log Analysis as an Example

    Ad hoc queries
    Regular queries

    Load Data
    1) Target abstraction: RDD or DataFrame/Dataset
    2) Source location: local filesystem or cloud (HDFS/S3)

    Load the data as an RDD
    val masterLog = sc.textFile("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop001.out")
    val workerLog = sc.textFile("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop001.out")
    val allLog = sc.textFile("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/*out*")

    masterLog.count
    workerLog.count
    allLog.count
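The snippets above rely on the `sc` and `spark` values that spark-shell creates for you. As a rough, self-contained sketch of the same load-and-count step outside the shell, the following assumes a `local[*]` master and writes a few made-up log lines to a temporary file; the object name, file name, and log lines are hypothetical, not taken from the course.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object RddCountSketch {
  // Writes three made-up log lines to a temp file, loads it as an RDD, and counts the lines.
  def countLines(spark: SparkSession): Long = {
    val content = Seq(
      "17/03/01 10:00:00 INFO Master: hypothetical startup line",
      "17/03/01 10:00:01 INFO Master: hypothetical version line",
      "17/03/01 10:00:02 WARN Utils: hypothetical warning line"
    ).mkString("\n")
    val path = Files.createTempFile("master-sample", ".out")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))

    // Same call as in the notes: textFile with a file:// URI yields an RDD[String], one element per line
    val logRDD = spark.sparkContext.textFile(path.toUri.toString)
    logRDD.count()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running on a single machine
    val spark = SparkSession.builder().master("local[*]").appName("RddCountSketch").getOrCreate()
    try println(countLines(spark)) finally spark.stop()
  }
}
```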

    Open question: how can we query this data with SQL?

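    // Wrap each log line in a Row so that a schema can be attached to the RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._
    val masterRDD = masterLog.map(x => Row(x))

    // Single-column schema: one nullable string field named "line"
    val schemaString = "line"
    val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)

    // Build the DataFrame from the RDD[Row] plus the schema
    val masterDF = spark.createDataFrame(masterRDD, schema)
    masterDF.show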
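Once the log lines are in a DataFrame, one way to answer the SQL question is to register the DataFrame as a temporary view and query it with `spark.sql`. The sketch below is only an illustration under assumptions: it uses in-memory sample lines instead of the real master log, and the view name `master_logs` and the object name are made up.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object LogSqlSketch {
  // Builds a one-column DataFrame from sample lines and counts WARN lines using SQL.
  def countWarnLines(spark: SparkSession): Long = {
    // Hypothetical sample data standing in for the master log
    val lines = Seq(
      "17/03/01 10:00:00 INFO Master: hypothetical startup line",
      "17/03/01 10:00:01 WARN Utils: hypothetical warning line",
      "17/03/01 10:00:02 INFO Master: hypothetical info line"
    )
    val rowRDD = spark.sparkContext.parallelize(lines).map(x => Row(x))
    val schema = StructType(Seq(StructField("line", StringType, nullable = true)))
    val logDF = spark.createDataFrame(rowRDD, schema)

    // Register the DataFrame under a (made-up) view name so SQL can refer to it
    logDF.createOrReplaceTempView("master_logs")
    spark.sql("SELECT count(*) AS n FROM master_logs WHERE line LIKE '% WARN %'")
      .first()
      .getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("LogSqlSketch").getOrCreate()
    try println(countWarnLines(spark)) finally spark.stop()
  }
}
```

The view lives only for the lifetime of the SparkSession, which is usually enough for ad hoc exploration of a log file.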

    JSON/Parquet
    val usersDF = spark.read.format("parquet").load("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/users.parquet")
    usersDF.show

    spark.sql("select * from parquet.`file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/users.parquet`").show
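To try both access paths without the Spark distribution's `users.parquet`, a small round trip can be sketched as follows. It assumes a `local[*]` master and a temporary directory; the two-column sample data and the object name are invented for illustration and do not mirror the real example file exactly.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ParquetSketch {
  // Writes a tiny Parquet dataset, then reads it back via the reader API and via SQL on the path.
  def namesViaSql(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // Target path inside a fresh temp directory (the path itself must not exist yet)
    val path = Files.createTempDirectory("users-parquet").resolve("users.parquet").toUri.toString

    // Made-up sample rows
    Seq(("Alyssa", "red"), ("Ben", "blue"))
      .toDF("name", "favorite_color")
      .write.format("parquet").save(path)

    // Path 1: the DataFrameReader, as in the notes
    val usersDF = spark.read.format("parquet").load(path)
    usersDF.show()

    // Path 2: SQL directly on the file, using the parquet.`path` syntax from the notes
    spark.sql(s"SELECT name FROM parquet.`$path`")
      .collect()
      .map(_.getString(0))
      .sorted
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ParquetSketch").getOrCreate()
    try println(namesViaSql(spark)) finally spark.stop()
  }
}
```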

    Drill: a big data processing framework

    Reading data from the cloud: HDFS/S3
    val hdfsRDD = sc.textFile("hdfs://path/file")
    val s3RDD = sc.textFile("s3a://bucket/object")
    S3 URI schemes: s3a/s3n (s3a is the newer Hadoop S3 connector; s3n is the older one)

    spark.read.format("text").load("hdfs://path/file")
    spark.read.format("text").load("s3a://bucket/object")
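The `text` source returns a DataFrame with a single string column named `value`, one row per line. Since an HDFS or S3 location is not available in a standalone example, the sketch below substitutes a local temporary file; the file contents and the object name are hypothetical, and only the URI would change for `hdfs://` or `s3a://` paths (given a cluster and the matching connector on the classpath).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object TextSourceSketch {
  // Loads a text file through the DataFrame API and counts the lines containing "ERROR".
  def errorLineCount(spark: SparkSession): Long = {
    val content = Seq(
      "17/03/01 10:00:00 INFO Worker: hypothetical startup line",
      "17/03/01 10:00:01 ERROR Worker: hypothetical failure line",
      "17/03/01 10:00:02 INFO Worker: hypothetical connected line"
    ).mkString("\n")
    val path = Files.createTempFile("worker-sample", ".out")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))

    // Same reader call as in the notes, pointed at a local file:// URI instead of hdfs:// or s3a://
    val textDF = spark.read.format("text").load(path.toUri.toString)

    // The only column is "value"; filter on it like any other string column
    textDF.filter(col("value").contains("ERROR")).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("TextSourceSketch").getOrCreate()
    try println(errorLineCount(spark)) finally spark.stop()
  }
}
```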

    val df = spark.read.format("json").load("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/people.json")

    df.show
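With the JSON source, Spark infers the schema from the data, so the resulting DataFrame can be queried by column name right away. A minimal sketch, assuming a `local[*]` master and a temporary file holding three JSON lines in the style of `people.json`; the view name `people` and the object name are assumptions for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object JsonSourceSketch {
  // Loads line-delimited JSON, registers a view, and returns the names of people older than 20.
  def namesOlderThan20(spark: SparkSession): Seq[String] = {
    // One JSON object per line; the first record has no "age" field
    val content = Seq(
      """{"name":"Michael"}""",
      """{"name":"Andy", "age":30}""",
      """{"name":"Justin", "age":19}"""
    ).mkString("\n")
    val path = Files.createTempFile("people-sample", ".json")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))

    val df = spark.read.format("json").load(path.toUri.toString)
    df.printSchema() // shows the inferred columns: age and name

    // Rows with a missing age are NULL in that column and do not satisfy the predicate
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 20")
      .collect()
      .map(_.getString(0))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("JsonSourceSketch").getOrCreate()
    try println(namesOlderThan20(spark)) finally spark.stop()
  }
}
```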

    TPC-DS

    spark-packages.org

  • Original post: https://www.cnblogs.com/kkxwz/p/8493777.html