  • Reading CSV files with SparkR

    The general method for creating SparkDataFrames from data sources is read.df. This method takes in the path for the file to load and the type of data source, and the currently active SparkSession will be used automatically. SparkR supports reading JSON, CSV and Parquet files natively, and through packages available from sources like Third Party Projects, you can find data source connectors for popular file formats like Avro. These packages can either be added by specifying --packages with spark-submit or sparkR commands, or if initializing SparkSession with sparkPackages parameter when in an interactive R shell or from RStudio.

    http://spark.apache.org/docs/latest/sparkr.html

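    The spark-csv_2.11-1.4.0.jar file is not an R package, so there is nothing to install. On a machine without network access, however, it is not obvious where a manually downloaded jar should be placed. I downloaded and used the package on a machine that did have network access, and found that it ends up at the following paths: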

    (1)   A subdirectory under the home directory of the user that runs R, for example:

    /home/summer/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar

    (2)   /root/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar

    Note that the installed Scala version must match the jar above; here Scala should be version 2.11.
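    As a rough sketch of the cache layout described above, the snippet below builds the expected jar path from a Maven coordinate and checks whether the file is present. The helper name ivy_jar_path is hypothetical, and the layout is assumed from the two paths listed above rather than taken from the Ivy documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the Ivy cache path for "group:artifact:version".
# Assumed layout (from the paths above):
#   <ivy_home>/cache/<group>/<artifact>/jars/<artifact>-<version>.jar
ivy_jar_path() {
    ivy_home=$1
    coordinate=$2
    group=${coordinate%%:*}      # text before the first colon
    rest=${coordinate#*:}        # text after the first colon
    artifact=${rest%%:*}
    version=${rest#*:}
    printf '%s/cache/%s/%s/jars/%s-%s.jar\n' \
        "$ivy_home" "$group" "$artifact" "$artifact" "$version"
}

# Check the cache of the current user for the spark-csv jar.
jar=$(ivy_jar_path "$HOME/.ivy2" "com.databricks:spark-csv_2.11:1.4.0")
if [ -f "$jar" ]; then
    echo "found: $jar"
else
    echo "missing: $jar"
fi
```

    Running it as the same user that starts R or spark-shell shows whether the jar sits where that user's cache is expected to hold it.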

    ./bin/spark-shell

    bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

    =====================================

    .libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
    library(SparkR)

    Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')

    sc <- sparkR.init(master = "local[*]", sparkPackages = "com.databricks:spark-csv_2.11:1.4.0", sparkEnvir = list(spark.driver.memory = "2g"))
    sqlContext <- sparkRSQL.init(sc)

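    setwd("~/hgData")
    # Load the CSV as a SparkR DataFrame through the spark-csv data source.
    # Expected column types: six character columns, one numeric, one Date (inferred here).
    hgdata <- read.df(sqlContext, "db1014.csv", source = "com.databricks.spark.csv", header = "true", inferSchema = "true")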
  • Original article: https://www.cnblogs.com/llphhl/p/6102969.html