    SparkR-Install

    Posted: 2017-03-30 23:05:18

    1. Download R

    https://cran.r-project.org/src/base/R-3/

    1.2 Configure the environment variables

    1.3 Test the installation

    2. Download Rtools33

    https://cran.r-project.org/bin/windows/Rtools/

    2.1 Configure the environment variables

    2.2 Test
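
One way to confirm that R and the Rtools binaries are reachable is to ask R itself which tools it can find on the PATH. This is a minimal sketch: the helper name `tools_on_path` and the tool names `Rscript`, `gcc` and `make` are assumptions chosen for illustration, not taken from the post.

```r
# Report which command-line tools R can find on the current PATH.
# Sys.which() returns "" for a tool that cannot be located.
tools_on_path <- function(tools) {
  found <- Sys.which(tools)
  setNames(nzchar(unname(found)), tools)
}

# Show the running R version, then check a few tools.
# The tool names below are illustrative; adjust them to your setup.
print(R.version.string)
print(tools_on_path(c("Rscript", "gcc", "make")))
```

A `FALSE` entry suggests that the corresponding bin directory is missing from PATH, or that the console was opened before the variable was changed.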

    3. Install RStudio

        Download it from https://www.rstudio.com/products/rstudio/download/ and click Next through the installer; the defaults are fine.

    4. Install the JDK and set the environment variables

    4.1 Configure the environment variables

    4.2 Test

    5. Download Spark

      5.1 URL: http://spark.apache.org/downloads.html

      5.2 Extract the archive to a directory on the local disk

    6. Install Spark and set the environment variables
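
A quick way to see whether the JDK and Spark variables took effect is to read them back from R. This is only a sketch: the variable names `JAVA_HOME` and `SPARK_HOME` follow the usual convention and are assumed here, and `check_home_vars` is a hypothetical helper name.

```r
# For each environment variable, report its value and whether it
# points at an existing directory.
check_home_vars <- function(vars) {
  values <- unname(Sys.getenv(vars, unset = ""))
  data.frame(
    variable = vars,
    value = values,
    dir_exists = nzchar(values) & dir.exists(values),
    stringsAsFactors = FALSE
  )
}

# JAVA_HOME and SPARK_HOME are the conventional names (an assumption).
print(check_home_vars(c("JAVA_HOME", "SPARK_HOME")))
```

The same helper can be reused later with `"HADOOP_HOME"` once the Hadoop variables are set.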

    7. Test SparkR

      Note: if you see the message "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", you need to install the native Hadoop library.

    8. Download and install the Hadoop library

      http://hadoop.apache.org/releases.html

    9. Set the Hadoop environment variables

    10. Test SparkR again

       10.1 If the test floods the console with INFO log output, change INFO to WARN in the log4j file under Spark's conf directory

       10.2 Edit the log4j file in conf

       10.3 Run SparkR again
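
The INFO-to-WARN change can also be scripted. The sketch below assumes the standard Spark 1.x/2.x layout, where `conf/log4j.properties.template` is copied to `conf/log4j.properties` and the root logger line reads `log4j.rootCategory=INFO, console`. The function name `set_root_log_level` is hypothetical.

```r
# Replace the level on the root logger line and leave all other lines alone.
set_root_log_level <- function(lines, level = "WARN") {
  pattern <- "^([[:space:]]*log4j\\.rootCategory[[:space:]]*=[[:space:]]*)[A-Za-z]+"
  sub(pattern, paste0("\\1", level), lines)
}

# Sample lines modeled on the template (assumed, not read from disk).
template <- c(
  "# Set everything to be logged to the console",
  "log4j.rootCategory=INFO, console",
  "log4j.appender.console=org.apache.log4j.ConsoleAppender"
)
print(set_root_log_level(template))
```

To apply it to a real file, read the lines with `readLines()`, pass them through the function, and write them back with `writeLines()`. Keep a backup copy of the file first.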

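    11. Run SparkR code

        Spark 2.0 adds RSparkSQL for running SQL queries

        dataframe covers data frame operations

        data-manipulation covers data transformation

        ml covers machine learning

       11.1 Use Ctrl+Alt+left mouse click to open a console in this folder

       11.2 Run spark-submit xxx.R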
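
The same launch can be driven from R, which avoids having to navigate to the folder first. This is a sketch under assumptions: the bundled R examples are taken to live under `examples/src/main/r` in the Spark directory, the Windows launcher is taken to be `bin/spark-submit.cmd`, and `example_script` is a hypothetical helper name.

```r
# Build the path to one of the bundled R example scripts.
example_script <- function(name, spark_home = Sys.getenv("SPARK_HOME")) {
  file.path(spark_home, "examples", "src", "main", "r", name)
}

script <- example_script("dataframe.R")
print(script)

# Launch only when the script is actually present on this machine.
if (file.exists(script)) {
  launcher <- if (.Platform$OS.type == "windows") "spark-submit.cmd" else "spark-submit"
  system2(file.path(Sys.getenv("SPARK_HOME"), "bin", launcher), shQuote(script))
}
```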

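    12. Install the SparkR package

        12.1 Copy the SparkR folder from R/lib under the Spark installation directory into ..\R-3.3.2\library. Note that you must copy the whole SparkR folder, not the individual files inside it.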
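
The copy step can be sketched in R as well. The helper name `copy_sparkr_package` is hypothetical, and the default destination `.libPaths()[1]` is an assumption standing in for the R-3.3.2 library folder named above.

```r
# Copy the whole SparkR folder (not its individual files) from the
# Spark distribution into an R library directory.
copy_sparkr_package <- function(spark_home, r_library = .libPaths()[1]) {
  src <- file.path(spark_home, "R", "lib", "SparkR")
  stopifnot(dir.exists(src), dir.exists(r_library))
  # With recursive = TRUE the directory itself lands inside r_library.
  file.copy(src, r_library, recursive = TRUE)
}

# Example call (commented out so that nothing is copied by accident):
# copy_sparkr_package(Sys.getenv("SPARK_HOME"))
```

An alternative that avoids copying is to load the package straight from the Spark directory with `library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))`.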

        12.2 Open the SparkR example files in RStudio and run dataframe.R, executing it one line at a time with Ctrl+Enter

    The source of the SparkR example dataframe.R is as follows:

    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    
    library(SparkR)
    
    # Initialize SparkContext and SQLContext
    sc <- sparkR.init(appName="SparkR-DataFrame-example")
    sqlContext <- sparkRSQL.init(sc)
    
    # Create a simple local data.frame
    localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
    
    # Convert local data frame to a SparkR DataFrame
    df <- createDataFrame(sqlContext, localDF)
    
    # Print its schema
    printSchema(df)
    # root
    #  |-- name: string (nullable = true)
    #  |-- age: double (nullable = true)
    
    # Create a DataFrame from a JSON file
    path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
    peopleDF <- read.json(sqlContext, path)
    printSchema(peopleDF)
    
    # Register this DataFrame as a table.
    registerTempTable(peopleDF, "people")
    
    # SQL statements can be run by using the sql methods provided by sqlContext
    teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
    
    # Call collect to get a local data.frame
    teenagersLocalDF <- collect(teenagers)
    
    # Print the teenagers in our dataset 
    print(teenagersLocalDF)
    
    # Stop the SparkContext now
    sparkR.stop()
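
The listing above uses the Spark 1.x entry points (`sparkR.init`, `sparkRSQL.init`). From Spark 2.0 these are deprecated in favor of a single `sparkR.session()`. Below is a hedged sketch of the same steps in the 2.x style. The wrapper name `run_dataframe_example` is hypothetical, and the block only runs when the SparkR package and `SPARK_HOME` are available.

```r
# Spark 2.x sketch: one SparkSession replaces the separate
# SparkContext and SQLContext objects, so they are no longer passed around.
run_dataframe_example <- function() {
  library(SparkR)
  sparkR.session(appName = "SparkR-DataFrame-example")
  on.exit(sparkR.session.stop())

  # Local data.frame converted to a SparkDataFrame
  localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
  df <- createDataFrame(localDF)
  printSchema(df)

  # JSON file bundled with the Spark distribution
  path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
  peopleDF <- read.json(path)

  # Temporary view plus SQL query, then collect to a local data.frame
  createOrReplaceTempView(peopleDF, "people")
  teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
  collect(teenagers)
}

# Guarded call: skipped when SparkR or SPARK_HOME is missing.
if (requireNamespace("SparkR", quietly = TRUE) && nzchar(Sys.getenv("SPARK_HOME"))) {
  print(run_dataframe_example())
}
```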

    13. Results of the run in RStudio

    END~

  • Original post: https://www.cnblogs.com/awishfullyway/p/6677055.html