  • A Quick Guide to Spark SQL External Data Sources: CSV

    Download the source and build it:

    git clone https://github.com/databricks/spark-csv.git
    cd spark-csv
    sbt/sbt assembly

    Maven GAV:

    groupId: com.databricks.spark
    artifactId: spark-csv_2.10
    version: 0.2.0
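
    For an sbt project, the same coordinates can be written as a single dependency line. This is a minimal sketch; it assumes the artifact is available from a resolver your build can reach, which the original post does not state.

    ```scala
    // build.sbt fragment (assumption: the build uses Scala 2.10, matching the _2.10 suffix)
    // A plain % is used because the Scala version is already part of the artifactId.
    libraryDependencies += "com.databricks.spark" % "spark-csv_2.10" % "0.2.0"
    ```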

    Add the assembly JAR to the classpath in $SPARK_HOME/conf/spark-env.sh:

    export SPARK_CLASSPATH=/home/spark/software/source/spark_package/spark-csv/target/scala-2.10/spark-csv-assembly-0.2.0.jar:$SPARK_CLASSPATH

    Download the test data:

    wget https://github.com/databricks/spark-csv/raw/master/src/test/resources/cars.csv 

    Scala API:

    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    import com.databricks.spark.csv._
    val cars = sqlContext.csvFile("file:///home/spark/software/data/cars.csv")
    cars.collect
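
    Once loaded, the result can also be queried with SQL from the same shell session. The sketch below continues from the lines above (it reuses sqlContext and cars). It assumes that csvFile takes column names from the header row of cars.csv, so that columns named make and model exist; if your version behaves differently, use SELECT * instead.

    ```scala
    // Continues the spark-shell session above: sqlContext and cars are already defined.
    // Register the loaded data as a temporary table so SQL can refer to it by name.
    cars.registerTempTable("cars")

    // Assumption: the columns "make" and "model" come from the header row of cars.csv.
    val makesAndModels = sqlContext.sql("SELECT make, model FROM cars")
    makesAndModels.collect().foreach(println)
    ```

    A temporary table lives only for the lifetime of this SQLContext, which is the main difference from the CREATE TABLE ... USING form.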

    SQL:

    CREATE TABLE cars
    USING com.databricks.spark.csv
    OPTIONS (path "file:///home/spark/software/data/cars.csv", header "true");
    
    select * from cars;

    Or, with an explicit schema:

    CREATE TABLE cars (yearMade double, carMake string, carModel string, comments string, blank string)
    USING com.databricks.spark.csv
    OPTIONS (path "cars.csv", header "true");

    select * from cars;
  • Original article: https://www.cnblogs.com/luogankun/p/4181884.html