  • Spark development environment setup

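    Spark, MapReduce, and MPI may eventually run together on a single platform. Each has a different focus, so this amounts to combining cloud computing with high-performance computing so that they complement one another. Having gone through the basics of Spark, I am now looking at the development environment, mainly in order to read the source code. The recent "Apache Spark source code walkthrough" series is quite good, and I have read part of it. Setting up the environment is not very complicated; for details see https://github.com/apache/spark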

    1. Download the code

    git clone  https://github.com/apache/spark.git

    2. Build Spark directly

    I am building against Hadoop 2.2.0, so I run the following:

    SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
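If the build has to be repeated for several Hadoop versions, the same command can be scripted. The following is only a rough sketch: the function names `spark_build_command` and `run_build` are made up for illustration, and it assumes it is run from the Spark source root where `sbt/sbt` exists. The two environment variables are the ones used in the command above.

```python
import os
import subprocess


def spark_build_command(hadoop_version, yarn=True):
    """Return (command, environment) mirroring the build line above.

    Hypothetical helper, not part of Spark.
    """
    env = dict(os.environ)
    env["SPARK_HADOOP_VERSION"] = hadoop_version
    if yarn:
        # Only set SPARK_YARN when YARN support is wanted.
        env["SPARK_YARN"] = "true"
    return ["sbt/sbt", "assembly"], env


def run_build(hadoop_version, yarn=True):
    """Run the build; assumes the current directory is the Spark source root."""
    cmd, env = spark_build_command(hadoop_version, yarn)
    subprocess.run(cmd, env=env, check=True)
```

Calling `run_build("2.2.0")` would then be equivalent to typing the command above by hand.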

    3. For usage, see https://github.com/apache/spark

    Interactive Scala Shell

    The easiest way to start using Spark is through the Scala shell:

    ./bin/spark-shell
    

    Try the following command, which should return 1000:

    scala> sc.parallelize(1 to 1000).count()
    

    Interactive Python Shell

    Alternatively, if you prefer Python, you can use the Python shell:

    ./bin/pyspark
    

    And run the following command, which should also return 1000:

    >>> sc.parallelize(range(1000)).count()
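Both shells report 1000 even though the ranges differ: Scala's `1 to 1000` is inclusive and runs from 1 to 1000, while Python's `range(1000)` runs from 0 to 999. A small plain-Python check (no Spark needed) makes the element counts explicit:

```python
# Python's range(1000) covers 0..999.
python_side = range(1000)

# range(1, 1001) has the same elements as Scala's inclusive `1 to 1000`.
scala_equivalent = range(1, 1001)

print(len(python_side), len(scala_equivalent))  # prints: 1000 1000
```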
    

    Example Programs

    Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:

    ./bin/run-example SparkPi
    

    will run the Pi example locally.

    You can set the MASTER environment variable when running examples to submit examples to a cluster. This can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client" to run on YARN, and "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:

    MASTER=spark://host:7077 ./bin/run-example SparkPi
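To keep the MASTER forms listed above straight, here is a small illustrative classifier. It is a sketch for reading purposes only and is not how Spark itself parses the value; the function name `describe_master` is hypothetical.

```python
import re


def describe_master(master):
    """Classify a MASTER string into the forms described above.

    Returns a (kind, detail) tuple. Illustrative only, not Spark's parser.
    """
    if master == "local":
        return ("local", 1)  # one thread
    match = re.fullmatch(r"local\[(\d+)\]", master)
    if match:
        return ("local", int(match.group(1)))  # N threads
    if master in ("yarn-cluster", "yarn-client"):
        return ("yarn", master.split("-")[1])
    for scheme in ("spark", "mesos"):
        prefix = scheme + "://"
        if master.startswith(prefix):
            return (scheme, master[len(prefix):])  # host:port part
    raise ValueError("unrecognized MASTER value: %r" % master)


print(describe_master("local[4]"))           # ('local', 4)
print(describe_master("spark://host:7077"))  # ('spark', 'host:7077')
```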
    

    Many of the example programs print usage help if no params are given.

    Running Tests

    Testing first requires building Spark. Once Spark is built, tests can be run using:

    ./sbt/sbt test


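    Using an IDE: install IntelliJ IDEA and the Scala plugin

    Download the IDEA tar.gz package from the IDEA website and extract it. Run IDEA and install the Scala plugin.

    In the source root directory, run the following command: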

    ./sbt/sbt gen-idea
    

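    This generates the IDEA project files. In IDEA, click File -> Open project, browse to the source folder (incubator-spark, or spark if you cloned with the command in step 1), and open the project. You can then modify the Spark code.

    For details, see: https://github.com/apache/spark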

    http://cn.soulmachine.me/blog/20140130/

  • Original article: https://www.cnblogs.com/fengbing/p/3807131.html