Using Spark's "Hadoop Free" Build

    Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages “Hadoop free” builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify SPARK_DIST_CLASSPATH to include Hadoop’s package jars. The most convenient place to do this is by adding an entry in conf/spark-env.sh.

    This page describes how to connect Spark to Hadoop for different types of distributions.

    Apache Hadoop

    For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:

    ### in conf/spark-env.sh ###

    # If 'hadoop' binary is on your PATH
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    # With explicit path to 'hadoop' binary
    export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

    # Passing a Hadoop configuration directory
    export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
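    Once SPARK_DIST_CLASSPATH is set, a quick way to confirm that Spark is picking up your local Hadoop jars is to inspect the classpath and check the Hadoop version from inside spark-shell. A minimal sketch, assuming both the hadoop and spark-shell binaries are on your PATH:

    ### sanity check ###

    # Print the jars that Spark will prepend to its classpath
    hadoop classpath

    # Launch spark-shell and check which Hadoop version was loaded;
    # it should report the version of your local Hadoop installation:
    #   scala> org.apache.hadoop.util.VersionInfo.getVersion
    spark-shell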
Original post: https://www.cnblogs.com/hmy-blog/p/7837257.html