  • [imooc in Practice] 9. Entering the World of Big Data Spark SQL, Using imooc Log Analysis as an Example

    Ad hoc queries
    Regular queries

    Load Data
    1) Target abstraction: RDD, DataFrame/Dataset
    2) Source location: local, cloud (HDFS/S3)

    Load the data as RDDs
    val masterLog = sc.textFile("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop001.out")
    val workerLog = sc.textFile("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop001.out")
    val allLog = sc.textFile("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/*out*")

    masterLog.count
    workerLog.count
    allLog.count

    Open problem: how can this data be queried with SQL?

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Wrap each log line in a Row so that a schema can be attached.
    val masterRDD = masterLog.map(x => Row(x))

    // Single-column schema: one nullable string field named "line".
    val schemaString = "line"
    val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)

    val masterDF = spark.createDataFrame(masterRDD, schema)
    masterDF.show
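
    One way to answer the SQL question above is to register the DataFrame as a temporary view and query it with spark.sql. The following is a minimal sketch, not the course's own code: it assumes a local SparkSession, uses a few made-up log lines in place of the real master log, and the view name master_log is hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("log-sql-sketch").getOrCreate()

// Made-up sample lines standing in for the master log file.
val sampleLines = Seq(
  "17/03/01 10:00:00 INFO Master: Starting Spark master",
  "17/03/01 10:00:01 WARN Utils: hostname resolves to a loopback address",
  "17/03/01 10:00:02 INFO Master: I have been elected leader"
)

// Same RDD[Row] + schema approach as in the notes above.
val rowRDD = spark.sparkContext.parallelize(sampleLines).map(x => Row(x))
val schema = StructType(Seq(StructField("line", StringType, nullable = true)))
val logDF = spark.createDataFrame(rowRDD, schema)

// Register a temporary view (hypothetical name) so the data is reachable from SQL.
logDF.createOrReplaceTempView("master_log")

// Count the lines that contain the INFO level marker.
val infoCount = spark.sql("SELECT count(*) AS cnt FROM master_log WHERE line LIKE '%INFO%'")
  .first()
  .getLong(0)
```

    The view lives only as long as the session, which is usually enough for ad hoc exploration of a log file.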

    JSON/Parquet
    val usersDF = spark.read.format("parquet").load("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/users.parquet")
    usersDF.show

    spark.sql("select * from parquet.`file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/users.parquet`").show
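
    The two Parquet access styles above (DataFrameReader versus SQL directly on the file) can be tried without the Spark distribution's sample file. This is a rough sketch under stated assumptions: a local SparkSession, a temporary output directory, and two made-up rows whose column names are chosen for illustration only.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("parquet-sketch").getOrCreate()
import spark.implicits._

// Hypothetical output location inside a fresh temp directory, as a file:// URI.
val outDir = Files.createTempDirectory("users-parquet").resolve("users.parquet").toUri.toString

// Made-up rows written out as Parquet.
Seq(("Alyssa", "red"), ("Ben", "blue")).toDF("name", "favorite_color").write.parquet(outDir)

// Style 1: load through the DataFrameReader.
val viaReader = spark.read.format("parquet").load(outDir)

// Style 2: query the files directly from SQL, with no table or view registered.
val viaSql = spark.sql(s"SELECT name FROM parquet.`$outDir` ORDER BY name")
val names = viaSql.collect().map(_.getString(0)).toSeq
```

    Both styles read the same files; the SQL form is convenient for a quick look at a file that is not registered anywhere.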

    Drill: a big data processing framework

    Reading data from the cloud: HDFS/S3
    val hdfsRDD = sc.textFile("hdfs://path/file")
    val s3RDD = sc.textFile("s3a://bucket/object")
    s3a/s3n: URI schemes for S3 (s3a is the newer connector that supersedes s3n)

    spark.read.format("text").load("hdfs://path/file")
    spark.read.format("text").load("s3a://bucket/object")
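
    Reading s3a:// paths usually needs credentials in the Hadoop configuration. The sketch below shows one possible setup and is an assumption, not part of the original notes: it expects the hadoop-aws module on the classpath when the data is actually read, takes the keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and reuses the placeholder path from above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("s3a-config-sketch").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Set the S3A credentials only if the (assumed) environment variables are present.
sys.env.get("AWS_ACCESS_KEY_ID").foreach(v => hadoopConf.set("fs.s3a.access.key", v))
sys.env.get("AWS_SECRET_ACCESS_KEY").foreach(v => hadoopConf.set("fs.s3a.secret.key", v))

// textFile is lazy: nothing is fetched from S3 until an action such as count runs.
val s3RDD = spark.sparkContext.textFile("s3a://bucket/object")
```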

    val df = spark.read.format("json").load("file:///home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/people.json")

    df.show
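
    A JSON source gets its schema inferred on load, so it can be queried with SQL right away. A minimal sketch, assuming a local SparkSession and a small made-up JSON Lines file written to a temp location; the view name people is hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("json-sketch").getOrCreate()

// Made-up input: one JSON object per line; the first record has no "age" field.
val jsonPath = Files.createTempFile("people", ".json")
val jsonLines = Seq(
  """{"name":"Michael"}""",
  """{"name":"Andy", "age":30}""",
  """{"name":"Justin", "age":19}"""
)
Files.write(jsonPath, jsonLines.mkString("\n").getBytes("UTF-8"))

// The schema (age, name) is inferred from the data; a missing field becomes null.
val df = spark.read.format("json").load(jsonPath.toUri.toString)
df.printSchema()

// Register a temporary view (hypothetical name) and filter with SQL.
df.createOrReplaceTempView("people")
val adults = spark.sql("SELECT name FROM people WHERE age >= 20").collect().map(_.getString(0)).toSeq
```

    Rows whose age is null do not satisfy the comparison, so they drop out of the result without any explicit null handling.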

    TPC-DS

    spark-packages.org

  • Original article: https://www.cnblogs.com/kkxwz/p/8493777.html