  • Spark configuration and usage

    Introduction

    • Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports **a rich set of higher-level tools** including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
    • Download the release directly from the official website.
    • A JDK is required; make sure java is on the system PATH, or set JAVA_HOME.

    Usage

    • Run the bundled examples:
      use bin/run-example <class> [params] in the top-level Spark directory, e.g.
      ./bin/run-example SparkPi 10
    • Explore Spark interactively through the Spark shell (Scala):
      ./bin/spark-shell --master local[2]
      Run the Spark shell with the --help option to see the available command-line options.
    • Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:
      ./bin/pyspark --master local[2]
      You can also make the pyspark package importable from a regular Python installation; one way to do so is described here:
      http://blog.csdn.net/sinat_26599509/article/details/51895999
      (A minimal sketch of an interactive PySpark session follows this list.)
    • Example applications are also provided in Python; their source is worth reading to see how they are implemented. Run one with (a sketch of such a standalone script also follows this list):
      ./bin/spark-submit examples/src/main/python/pi.py 10

    • R works the same way: launch the interactive environment with sparkR, and R scripts can likewise be run with spark-submit as above.
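
    As a rough illustration of the interactive PySpark session mentioned above, here is a minimal sketch of the kind of commands you might type at the pyspark prompt; the README.md path and the word "Spark" used in the filter are illustrative assumptions, not part of the original post.

      # Inside the pyspark shell, the SparkContext is already available as `sc`.
      lines = sc.textFile("README.md")                     # load a local text file as an RDD
      print(lines.count())                                 # number of lines in the file
      spark_lines = lines.filter(lambda l: "Spark" in l)   # keep only lines mentioning "Spark"
      print(spark_lines.count())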
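
    To give a feel for what a standalone script submitted via spark-submit looks like, below is a rough sketch in the spirit of the bundled pi.py; it is not the actual example, and the file name my_pi.py and the sampling constants are assumptions made for illustration.

      # my_pi.py -- hypothetical Monte Carlo estimate of pi.
      # Run with: ./bin/spark-submit my_pi.py 10
      import sys
      from random import random
      from pyspark import SparkContext

      if __name__ == "__main__":
          # A standalone script creates its own SparkContext (the shells do this for you).
          sc = SparkContext(appName="PythonPi")
          partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
          n = 100000 * partitions

          def inside(_):
              x, y = random(), random()
              return 1 if x * x + y * y <= 1.0 else 0

          count = sc.parallelize(range(n), partitions).map(inside).reduce(lambda a, b: a + b)
          print("Pi is roughly %f" % (4.0 * count / n))
          sc.stop()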
