  • sparkR could not find function "textFile"

    Yeah, that’s probably because the head() you’re invoking there is defined for SparkR DataFrames
    [1] (note how you don’t have to put the SparkR::: namespace in front of it), but SparkR:::textFile()
    returns an RDD object, which behaves more like a distributed list when you apply it to that
    .md text file. If you want to look at the first item or the first several items in the RDD,
    I think you want SparkR:::first() or SparkR:::take(), both of which work on RDDs.
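
    As a rough sketch of that suggestion (assuming the Spark 1.4 sparkR shell, where sc is
    already defined, and assuming the private textFile(), first() and take() functions behave
    as described in the RDD source), something like the following should let you peek at the
    file. The helper name peek_lines is made up for illustration and is not part of SparkR.

    ```r
    # Hypothetical helper (not part of SparkR): peek at the start of a text
    # file through the private RDD API of SparkR 1.4.
    # `sc` is assumed to be the SparkContext created by the sparkR shell.
    peek_lines <- function(sc, path, n = 5L) {
      lines <- SparkR:::textFile(sc, path)  # an RDD of lines, not a DataFrame
      list(
        first = SparkR:::first(lines),      # the first element only
        head  = SparkR:::take(lines, n)     # an R list holding the first n elements
      )
    }

    # In the sparkR shell you would then call:
    # peek_lines(sc, "./README.md")
    ```

    Nothing touches Spark until the helper is called, so defining it outside the shell is harmless.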


    Just remember that the functions described in the public API [2] for SparkR right now
    mostly relate to working with DataFrames. For the private functions you’ll have to use the R
    command-line help or read the RDD source code [3] (which includes the doc strings used to
    build the R docs), whichever you find easier.


    Alek


    [1] -- http://spark.apache.org/docs/latest/api/R/head.html
    [2] -- https://spark.apache.org/docs/latest/api/R/index.html
    [3] -- https://github.com/apache/spark/blob/master/R/pkg/R/RDD.R


    From: Wei Zhou <zhweisophie@gmail.com>
    Date: Thursday, June 25, 2015 at 3:49 PM
    To: Aleksander Eskilson <Alek.Eskilson@cerner.com>
    Cc: "user@spark.apache.org" <user@spark.apache.org>
    Subject: Re: sparkR could not find function "textFile"


    Hi Alek,


    Just a follow-up question. This is what I did in the sparkR shell:


    lines <- SparkR:::textFile(sc, "./README.md")
    head(lines)


    And I am getting this error:


    "Error in x[seq_len(n)] : object of type 'S4' is not subsettable"


    I'm wondering what I did wrong. Thanks in advance.


    Wei


    2015-06-25 13:44 GMT-07:00 Wei Zhou <zhweisophie@gmail.com>:
    Hi Alek,


    Thanks for the explanation, it is very helpful.


    Cheers,
    Wei


    2015-06-25 13:40 GMT-07:00 Eskilson,Aleksander <Alek.Eskilson@cerner.com>:
    Hi there,


    The tutorial you’re reading there was written before the merge of SparkR for Spark 1.4.0.
    For the merge, the RDD API (which includes the textFile() function) was made private, as the
    devs felt many of its functions were too low level. They focused instead on finishing the
    DataFrame API which supports local, HDFS, and Hive/HBase file reads. In the meantime, the
    devs are trying to determine which functions of the RDD API, if any, should be made public
    again. You can see the rationale behind this decision on the issue’s JIRA [1].


    You can still make use of those now-private RDD functions by prefixing the function call
    with SparkR:::, which reaches into the package’s private namespace. For example, you’d use
    SparkR:::textFile(…).
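
    For instance, the word count from the older tutorial might be written against the private
    API roughly like this. It is only a sketch: it assumes that flatMap(), lapply(), reduceByKey()
    and collect() are still present in the private namespace with the signatures the old
    tutorial used, and the helper name word_count and its num_partitions argument are made up
    for illustration.

    ```r
    # Hypothetical helper (not part of SparkR): word count over a text file,
    # using the now-private RDD functions of SparkR 1.4 via `SparkR:::`.
    # `sc` is assumed to be the SparkContext created by the sparkR shell.
    word_count <- function(sc, path, num_partitions = 2L) {
      lines <- SparkR:::textFile(sc, path)
      # Split each line on single spaces and flatten into one RDD of words.
      words <- SparkR:::flatMap(lines, function(line) strsplit(line, " ")[[1]])
      # Turn each word into a (word, 1) key-value pair.
      pairs <- SparkR:::lapply(words, function(word) list(word, 1L))
      # Add up the counts for each distinct word.
      counts <- SparkR:::reduceByKey(pairs, "+", num_partitions)
      # Bring the result back to the driver as an R list of (word, count) pairs.
      SparkR:::collect(counts)
    }

    # In the sparkR shell you would then call:
    # word_count(sc, "./README.md")
    ```

    Because these functions are private, they may change or disappear between releases.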


    Hope that helps,
    Alek


    [1] -- https://issues.apache.org/jira/browse/SPARK-7230


    From: Wei Zhou <zhweisophie@gmail.com>
    Date: Thursday, June 25, 2015 at 3:33 PM
    To: "user@spark.apache.org" <user@spark.apache.org>
    Subject: sparkR could not find function "textFile"


    Hi all,


    I am exploring sparkR by starting the shell and following the tutorial here:
    https://amplab-extras.github.io/SparkR-pkg/


    When I tried to read in a local file with textFile(sc, "file_location"), it gave the error:
    could not find function "textFile".


    From reading through the sparkR docs for 1.4, it seems that we need sqlContext to import data,
    for example:


    people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")


    And we need to specify the file type.


    My question is: has sparkR stopped supporting the import of general file types? If not, I
    would appreciate any help on how to do this.


    PS, I am trying to recreate the word count example in sparkR, and want to import README.md
    file, or just any file into sparkR.


    Thanks in advance.


    Best,
    Wei



  • Original article: https://www.cnblogs.com/awishfullyway/p/6505283.html