  • A Quick Guide to Using the Avro External Data Source with Spark SQL

    Download the source and build it:

    git clone https://github.com/databricks/spark-avro.git
    sbt/sbt package

    Maven GAV:

    groupId: com.databricks.spark
    artifactId: spark-avro_2.10
    version: 0.1

    Add the built jar to the classpath in $SPARK_HOME/conf/spark-env.sh:

    export SPARK_CLASSPATH=/home/spark/software/source/spark_package/spark-avro/target/scala-2.10/spark-avro_2.10-0.1.jar:$SPARK_CLASSPATH

    Download the test data:

    wget https://github.com/databricks/spark-avro/raw/master/src/test/resources/episodes.avro 

    Scala API:

    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    import com.databricks.spark.avro._
    val episodes = sqlContext.avroFile("file:///home/spark/software/data/episodes.avro")
    import sqlContext._
    episodes.select('title).collect()
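
    As a follow-up to the snippet above, here is a minimal sketch of inspecting the loaded data and querying it by name. It assumes a Spark 1.2-era spark-shell session (where `sc` is provided by the shell and `avroFile` returns a SchemaRDD) with the spark-avro jar on the classpath. The table name `episodes_rdd` is an illustrative choice, not part of the original setup.

    ```scala
    import org.apache.spark.sql.SQLContext
    import com.databricks.spark.avro._

    // `sc` is the SparkContext that spark-shell creates automatically.
    val sqlContext = new SQLContext(sc)
    val episodes = sqlContext.avroFile("file:///home/spark/software/data/episodes.avro")

    // Print the schema that was derived from the Avro file.
    episodes.printSchema()

    // Register the data under a (hypothetical) table name so it can be queried with SQL.
    episodes.registerTempTable("episodes_rdd")
    sqlContext.sql("SELECT title FROM episodes_rdd").collect().foreach(println)
    ```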

    SQL:

    CREATE TEMPORARY TABLE episodes
    USING com.databricks.spark.avro
    OPTIONS (path "file:///home/spark/software/data/episodes.avro");
    
    select * from episodes;
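
    The same statements can also be submitted from the Scala shell instead of a SQL prompt. The following is a sketch under the same assumptions as the rest of this post: a Spark 1.2-era spark-shell (which provides `sc`), the spark-avro jar on the classpath, and the test file at the path used above. Note that the trailing semicolons are dropped when the statements are passed as strings.

    ```scala
    import org.apache.spark.sql.SQLContext

    // `sc` is the SparkContext that spark-shell creates automatically.
    val sqlContext = new SQLContext(sc)

    // Register the Avro file as a temporary table through the data source DDL.
    sqlContext.sql(
      """CREATE TEMPORARY TABLE episodes
        |USING com.databricks.spark.avro
        |OPTIONS (path "file:///home/spark/software/data/episodes.avro")""".stripMargin)

    // Query the table and print each row.
    sqlContext.sql("SELECT * FROM episodes").collect().foreach(println)
    ```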
  • Original article: https://www.cnblogs.com/luogankun/p/4181873.html