  • Compiling Spark on a Single Machine (on CentOS 6)

    Notes: 1. Before compiling Spark, set up the Java and Scala environments; see http://www.cnblogs.com/kevingu/p/4418779.html

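         2. Spark used to be built with sbt. Maven is now the recommended build tool; sbt is still supported, but the sbt build will gradually be phased out. This article uses Maven to build Spark 1.2.0.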

    Step 1: Set up Maven

    (I) Download the Maven binary package apache-maven-3.2.5-bin.tar.gz from http://maven.apache.org/download.cgi, extract it, and place it under /usr/maven.
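The download and extraction can also be scripted. This is a minimal sketch, assuming `wget` and `tar` are installed and that the Apache archive keeps old Maven releases under the path built by `maven_tarball_url`; both function names are our own, not part of Maven. Source it from a root shell so it can write to /usr/maven.

```shell
# Build the Apache archive URL for a Maven 3.x binary release
# (assumed layout: maven-3/<version>/binaries/).
maven_tarball_url() {
    local version="$1"
    echo "https://archive.apache.org/dist/maven/maven-3/${version}/binaries/apache-maven-${version}-bin.tar.gz"
}

# Hypothetical helper: download a Maven release and unpack it under a prefix.
# $1: Maven version (e.g. 3.2.5), $2: install prefix (e.g. /usr/maven)
install_maven() {
    local version="$1" prefix="$2"
    local url tarball
    url=$(maven_tarball_url "$version") || return 1
    tarball="/tmp/apache-maven-${version}-bin.tar.gz"
    mkdir -p "$prefix" || return 1
    wget -O "$tarball" "$url" || return 1
    # Unpacks to ${prefix}/apache-maven-${version}/ (bin/mvn inside).
    tar -xzf "$tarball" -C "$prefix"
}
```

For example, `install_maven 3.2.5 /usr/maven` should leave `mvn` at /usr/maven/apache-maven-3.2.5/bin/mvn, which matches the M2_HOME used below.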

    (II) Add the environment variables:

    export M2_HOME=/usr/maven/apache-maven-3.2.5
    export PATH=$PATH:$M2_HOME/bin
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
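These `export` lines only last for the current Terminal session. A minimal sketch for making them permanent, assuming a login shell that reads `/etc/profile.d/*.sh` (the CentOS default); the function `persist_maven_env` and the file name `maven.sh` are our own choices:

```shell
# Hypothetical helper: write the Maven environment variables above into a
# profile script that every new login shell sources.
# $1: target file (e.g. /etc/profile.d/maven.sh; writing there needs root)
# $2: Maven install directory, used as M2_HOME
persist_maven_env() {
    local target="$1" m2_home="$2"
    # M2_HOME is expanded now, so the file records the actual install path.
    printf 'export M2_HOME="%s"\n' "$m2_home" > "$target" || return 1
    # Quoted heredoc: $PATH and $M2_HOME stay literal and expand at login time.
    cat >> "$target" <<'EOF'
export PATH=$PATH:$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
EOF
}
```

From a root shell, `persist_maven_env /etc/profile.d/maven.sh /usr/maven/apache-maven-3.2.5`; then open a new Terminal, or `source` the file in the current one.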

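    (III) Edit the configuration file /usr/maven/apache-maven-3.2.5/conf/settings.xml, mainly the <proxies>, <mirrors> and <profiles> elements, so that Maven downloads from the domestic mirror http://maven.oschina.net/. The <mirror> entry with <mirrorOf>central</mirrorOf> is what redirects the downloads; the jdk-1.8 profile is only activated when Maven runs on a 1.8 JDK, so with the Java 1.7.0_72 shown in step (IV) it stays inactive.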

    <?xml version="1.0" encoding="UTF-8"?>
    
    <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor 
        license agreements. See the NOTICE file distributed with this work for additional 
        information regarding copyright ownership. The ASF licenses this file to 
        you under the Apache License, Version 2.0 (the "License"); you may not use 
        this file except in compliance with the License. You may obtain a copy of 
        the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required 
        by applicable law or agreed to in writing, software distributed under the 
        License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 
        OF ANY KIND, either express or implied. See the License for the specific 
        language governing permissions and limitations under the License. -->
    
    <!--
     | This is the configuration file for Maven. It can be specified at two levels:
     |
     |  1. User Level. This settings.xml file provides configuration for a single user,
     |                 and is normally provided in ${user.home}/.m2/settings.xml.
     |
     |                 NOTE: This location can be overridden with the CLI option:
     |
     |                 -s /path/to/user/settings.xml
     |
     |  2. Global Level. This settings.xml file provides configuration for all Maven
     |                 users on a machine (assuming they're all using the same Maven
     |                 installation). It's normally provided in
     |                 ${maven.home}/conf/settings.xml.
     |
     |                 NOTE: This location can be overridden with the CLI option:
     |
     |                 -gs /path/to/global/settings.xml
     |
     | The sections in this sample file are intended to give you a running start at
     | getting the most out of your Maven installation. Where appropriate, the default
     | values (values used when the setting is not specified) are provided.
     |
     |-->
    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
        <!-- localRepository
         | The path to the local repository maven will use to store artifacts.
         |
         | Default: ${user.home}/.m2/repository
        -->
        <!--localRepository>F:/Maven/repo/m2/</localRepository-->
    
        <!-- interactiveMode
         | This will determine whether maven prompts you when it needs input. If set to false,
         | maven will use a sensible default value, perhaps based on some other setting, for
         | the parameter in question.
         |
         | Default: true
        <interactiveMode>true</interactiveMode>
        -->
    
        <!-- offline
         | Determines whether maven should attempt to connect to the network when executing a build.
         | This will have an effect on artifact downloads, artifact deployment, and others.
         |
         | Default: false
        <offline>false</offline>
        -->
    
        <!-- pluginGroups
         | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
         | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
         | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
         |-->
        <pluginGroups>
            <!-- pluginGroup
             | Specifies a further group identifier to use for plugin lookup.
            <pluginGroup>com.your.plugins</pluginGroup>
            -->
        </pluginGroups>
    
        <!-- proxies
         | This is a list of proxies which can be used on this machine to connect to the network.
         | Unless otherwise specified (by system property or command-line switch), the first proxy
         | specification in this list marked as active will be used.
         |-->
        <proxies>
            <!--
            <proxy>
                <id>optional</id>
                <active>true</active>
                <protocol>http</protocol>
                <host>10.22.98.21</host>
                <port>8080</port>
            </proxy>
            -->
        </proxies>
    
        <!-- servers
         | This is a list of authentication profiles, keyed by the server-id used within the system.
         | Authentication profiles can be used whenever maven must make a connection to a remote server.
         |-->
        <servers>
            <!-- server
             | Specifies the authentication information to use when connecting to a particular server, identified by
             | a unique name within the system (referred to by the 'id' attribute below).
             |
             | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
             |       used together.
             |
            <server>
                <id>deploymentRepo</id>
                <username>repouser</username>
                <password>repopwd</password>
            </server>
            -->

            <!-- Another sample, using keys to authenticate.
            <server>
                <id>siteServer</id>
                <privateKey>/path/to/private/key</privateKey>
                <passphrase>optional; leave empty if not used.</passphrase>
            </server>
            -->
        </servers>
    
        <!-- mirrors
         | This is a list of mirrors to be used in downloading artifacts from remote repositories.
         |
         | It works like this: a POM may declare a repository to use in resolving certain artifacts.
         | However, this repository may have problems with heavy traffic at times, so people have mirrored
         | it to several places.
         |
         | That repository definition will have a unique id, so we can create a mirror reference for that
         | repository, to be used as an alternate download site. The mirror site will be the preferred
         | server for that repository.
         |-->
        <mirrors>
            <!-- mirror
             | Specifies a repository mirror site to use instead of a given repository. The repository that
             | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
             | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
             |-->
            <mirror>
                <id>nexus-osc</id>
                <mirrorOf>central</mirrorOf>
                <name>Nexus osc</name>
                <url>http://maven.oschina.net/content/groups/public/</url>
            </mirror>
            <mirror>
                <id>nexus-osc-thirdparty</id>
                <mirrorOf>thirdparty</mirrorOf>
                <name>Nexus osc thirdparty</name>
                <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
            </mirror>
    
        </mirrors>
    
        <!-- profiles
         | This is a list of profiles which can be activated in a variety of ways, and which can modify
         | the build process. Profiles provided in the settings.xml are intended to provide local machine-
         | specific paths and repository locations which allow the build to work in the local environment.
         |
         | For example, if you have an integration testing plugin - like cactus - that needs to know where
         | your Tomcat instance is installed, you can provide a variable here such that the variable is
         | dereferenced during the build process to configure the cactus plugin.
         |
         | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
         | section of this document (settings.xml) - will be discussed later. Another way essentially
         | relies on the detection of a system property, either matching a particular value for the property,
         | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
         | value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
         | Finally, the list of active profiles can be specified directly from the command line.
         |
         | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
         |       repositories, plugin repositories, and free-form properties to be used as configuration
         |       variables for plugins in the POM.
         |
         |-->
        <profiles>
            <!-- profile
             | Specifies a set of introductions to the build process, to be activated using one or more of the
             | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
             | or the command line, profiles have to have an ID that is unique.
             |
             | An encouraged best practice for profile identification is to use a consistent naming convention
             | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
             | This will make it more intuitive to understand what the set of introduced profiles is attempting
             | to accomplish, particularly when you only have a list of profile id's for debug.
             |
             | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
            -->
            <profile>
                <id>jdk-1.8</id>
    
                <activation>
                    <jdk>1.8</jdk>
                </activation>
    
                <repositories>
                    <repository>
                        <id>nexus</id>
                        <name>local private nexus</name>
                        <url>http://maven.oschina.net/content/groups/public/</url>
                        <releases>
                            <enabled>true</enabled>
                        </releases>
                        <snapshots>
                            <enabled>false</enabled>
                        </snapshots>
                    </repository>
                    <repository>
                        <id>osc_thirdparty</id>
                        <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
                    </repository>
                </repositories>
                <pluginRepositories>
                    <pluginRepository>
                        <id>nexus</id>
                        <name>local private nexus</name>
                        <url>http://maven.oschina.net/content/groups/public/</url>
                        <releases>
                            <enabled>true</enabled>
                        </releases>
                        <snapshots>
                            <enabled>false</enabled>
                        </snapshots>
                    </pluginRepository>
                </pluginRepositories>
            </profile>
    
    
            <!--
             | Here is another profile, activated by the system property 'target-env' with a value of 'dev',
             | which provides a specific path to the Tomcat instance. To use this, your plugin configuration
             | might hypothetically look like:
             |
             | ...
             | <plugin>
             |   <groupId>org.myco.myplugins</groupId>
             |   <artifactId>myplugin</artifactId>
             |
             |   <configuration>
             |     <tomcatLocation>${tomcatPath}</tomcatLocation>
             |   </configuration>
             | </plugin>
             | ...
             |
             | NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
             |       anything, you could just leave off the <value/> inside the activation-property.
             |
            <profile>
                <id>env-dev</id>

                <activation>
                    <property>
                        <name>target-env</name>
                        <value>dev</value>
                    </property>
                </activation>

                <properties>
                    <tomcatPath>/path/to/tomcat/instance</tomcatPath>
                </properties>
            </profile>
            -->
        </profiles>
    
        <!-- activeProfiles
         | List of profiles that are active for all builds.
         |
        <activeProfiles>
            <activeProfile>alwaysActiveProfile</activeProfile>
            <activeProfile>anotherAlwaysActiveProfile</activeProfile>
        </activeProfiles>
        -->
    </settings>
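Editing the global file under conf/ affects every user of that Maven installation. As the comment at the top of the file notes, the same settings can instead go in the per-user file ${user.home}/.m2/settings.xml. A minimal sketch that writes only the `central` mirror there, assuming that mirror is all you need; the function name is our own, and the mirror URL is the one used in this article, so substitute one that is reachable from your network:

```shell
# Hypothetical helper: write a minimal per-user settings.xml that routes
# requests for the "central" repository through a mirror.
# $1: target file (normally ~/.m2/settings.xml), $2: mirror URL
write_mirror_settings() {
    local target="$1" url="$2"
    mkdir -p "$(dirname "$target")" || return 1
    # Unquoted heredoc so ${url} is substituted into the <url> element.
    cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>nexus-osc</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus osc</name>
      <url>${url}</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

For example, `write_mirror_settings ~/.m2/settings.xml http://maven.oschina.net/content/groups/public/`; `mvn help:effective-settings` then prints the merged settings so you can confirm the mirror was picked up.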

    (IV) Verify the installation: open a Terminal and type

    mvn -v

    Output like the following means Maven is set up correctly:

    Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
    Maven home: /usr/maven/apache-maven-3.2.5
    Java version: 1.7.0_72, vendor: Oracle Corporation
    Java home: /usr/java/jdk1.7.0_72/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "2.6.32-504.8.1.el6.x86_64", arch: "amd64", family: "unix"
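The Spark 1.2 build documentation asks for Maven 3.0.4 or newer. A minimal sketch that checks this from the `mvn -v` output, assuming the first line has the `Apache Maven X.Y.Z (...)` shape shown above and that GNU `sort -V` is available (it is in the coreutils shipped with CentOS 6); the function name is our own:

```shell
# Hypothetical helper: succeed if the "mvn -v" text on stdin reports a Maven
# version >= the required one (default 3.0.4).
maven_version_ok() {
    local required="${1:-3.0.4}" found
    # Third field of the "Apache Maven X.Y.Z (...)" line is the version.
    found=$(awk '/^[ \t]*Apache Maven/ { print $3; exit }')
    [ -n "$found" ] || return 1
    # sort -V orders version strings; if the required one sorts first, found >= required.
    [ "$(printf '%s\n%s\n' "$required" "$found" | sort -V | head -n 1)" = "$required" ]
}
```

Usage: `mvn -v | maven_version_ok && echo "Maven version OK"`.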

    Step 2: Download the Spark 1.2.0 source package from http://spark.apache.org/downloads.html and extract it under /usr/spark.
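Older releases are also kept on the Apache archive, which makes the download scriptable. A minimal sketch, assuming the source tarball is published as `spark-<version>.tgz` under `https://archive.apache.org/dist/spark/spark-<version>/` (check the downloads page if the path differs); both function names are our own:

```shell
# Build the (assumed) Apache archive URL of a Spark source release.
spark_src_url() {
    local version="$1"
    echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}.tgz"
}

# Hypothetical helper: download and unpack the source tree under a prefix,
# which should give e.g. /usr/spark/spark-1.2.0.
# $1: Spark version, $2: install prefix
fetch_spark_source() {
    local version="$1" prefix="$2"
    local tarball="/tmp/spark-${version}.tgz"
    mkdir -p "$prefix" || return 1
    wget -O "$tarball" "$(spark_src_url "$version")" || return 1
    tar -xzf "$tarball" -C "$prefix"
}
```

Usage: `fetch_spark_source 1.2.0 /usr/spark`.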

    Step 3: Open a Terminal, change to the /usr/spark/spark-1.2.0 directory, and type

    mvn -DskipTests clean package
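A download that fails because of a network problem is fixed by simply running the build again, so the reruns can be automated. A minimal sketch; the `retry` function, its attempt count, and the `RETRY_DELAY` variable are our own choices:

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
# Waits RETRY_DELAY seconds (default 10) between attempts.
retry() {
    local attempts="$1" i=1
    shift
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "Attempt $i/$attempts failed: $*" >&2
        if [ "$i" -lt "$attempts" ]; then
            sleep "${RETRY_DELAY:-10}"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Usage: `retry 5 mvn -DskipTests clean package`. Because the command includes `clean`, each attempt recompiles modules that had already finished; what carries over between attempts are the artifacts Maven has already downloaded into its local repository.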

    Output like the following means the build has started:

    [INFO] Scanning for projects...
    Downloading: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom
    Downloaded: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom (15 KB at 5.6 KB/sec)
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Build Order:
    [INFO] 
    [INFO] Spark Project Parent POM
    [INFO] Spark Project Networking
    [INFO] Spark Project Shuffle Streaming Service
    [INFO] Spark Project Core
    [INFO] Spark Project Bagel
    [INFO] Spark Project GraphX
    [INFO] Spark Project Streaming
    [INFO] Spark Project Catalyst
    [INFO] Spark Project SQL
    [INFO] Spark Project ML Library
    [INFO] Spark Project Tools
    [INFO] Spark Project Hive
    [INFO] Spark Project REPL
    [INFO] Spark Project Assembly
    [INFO] Spark Project External Twitter
    [INFO] Spark Project External Flume Sink
    [INFO] Spark Project External Flume
    [INFO] Spark Project External MQTT
    [INFO] Spark Project External ZeroMQ
    [INFO] Spark Project External Kafka
    [INFO] Spark Project Examples
    [INFO]                                                                         
    [INFO] ------------------------------------------------------------------------

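    During the build, Maven downloads whatever packages it needs; on networks in China this can take a long time. If a download fails because of a network problem, run the build command again and the build continues: artifacts that were already downloaded are cached in the local repository (~/.m2/repository by default) and are not fetched again. Warnings can be ignored. The build is complete when the following finally appears: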

    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:
    [INFO] 
    [INFO] Spark Project Parent POM ........................... SUCCESS [35:17 min]
    [INFO] Spark Project Networking ........................... SUCCESS [16:53 min]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 26.230 s]
    [INFO] Spark Project Core ................................. SUCCESS [32:59 min]
    [INFO] Spark Project Bagel ................................ SUCCESS [ 25.566 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [01:45 min]
    [INFO] Spark Project Streaming ............................ SUCCESS [01:54 min]
    [INFO] Spark Project Catalyst ............................. SUCCESS [01:56 min]
    [INFO] Spark Project SQL .................................. SUCCESS [05:14 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [03:17 min]
    [INFO] Spark Project Tools ................................ SUCCESS [ 15.841 s]
    [INFO] Spark Project Hive ................................. SUCCESS [11:33 min]
    [INFO] Spark Project REPL ................................. SUCCESS [ 54.570 s]
    [INFO] Spark Project Assembly ............................. SUCCESS [ 46.018 s]
    [INFO] Spark Project External Twitter ..................... SUCCESS [ 47.342 s]
    [INFO] Spark Project External Flume Sink .................. SUCCESS [04:54 min]
    [INFO] Spark Project External Flume ....................... SUCCESS [ 37.416 s]
    [INFO] Spark Project External MQTT ........................ SUCCESS [ 34.923 s]
    [INFO] Spark Project External ZeroMQ ...................... SUCCESS [01:05 min]
    [INFO] Spark Project External Kafka ....................... SUCCESS [02:15 min]
    [INFO] Spark Project Examples ............................. SUCCESS [11:07 min]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 02:15 h
    [INFO] Finished at: 2015-01-02T17:21:15+08:00
    [INFO] Final Memory: 69M/1122M
    [INFO] ------------------------------------------------------------------------
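The Reactor Summary also shows where the 2 h 15 min went: Parent POM (35:17), Networking (16:53) and Core (32:59) together take about 85 minutes, most likely because those early modules trigger most of the dependency downloads rather than because they are slow to compile. To rank modules from a saved log, here is a minimal sketch, assuming the output was captured with `tee` (e.g. `mvn -DskipTests clean package | tee build.log`) and that durations use the `[ 26.230 s]`, `[35:17 min]` and `[02:15 h]` formats shown above; the function name is our own:

```shell
# Hypothetical helper: list the N slowest modules in a saved Maven build log,
# as "<seconds> <module name>", slowest first.
# $1: log file, $2: how many modules to show (default 5)
slowest_modules() {
    local log="$1" n="${2:-5}"
    awk '
        # Reactor Summary lines: [INFO] <module> ...... SUCCESS [32:59 min]
        /^[ \t]*\[INFO\] .* \.+ (SUCCESS|FAILURE)/ {
            # Module name: drop the "[INFO] " prefix and everything from the dotted leader on.
            name = $0
            sub(/^[ \t]*\[INFO\] /, "", name)
            sub(/ \.+ .*$/, "", name)
            # Duration: the text inside the last [...] group, e.g. "32:59 min".
            t = $0
            sub(/^.*\[ */, "", t)
            sub(/\][ \t]*$/, "", t)
            split(t, parts, " ")
            if (parts[2] == "s") {
                secs = parts[1] + 0
            } else {
                split(parts[1], f, ":")
                if (parts[2] == "min")    secs = f[1] * 60 + f[2]       # mm:ss
                else if (parts[2] == "h") secs = f[1] * 3600 + f[2] * 60 # hh:mm
                else next
            }
            printf "%d %s\n", secs, name
        }' "$log" | sort -rn | head -n "$n"
}
```

Usage: `slowest_modules build.log 3`.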

    Step 4: Start the Spark shell

    In the /usr/spark/spark-1.2.0 directory, type

    ./bin/spark-shell

    Output like the following means Spark has started successfully:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/04/13 09:50:52 INFO SecurityManager: Changing view acls to: kevin
    15/04/13 09:50:52 INFO SecurityManager: Changing modify acls to: kevin
    15/04/13 09:50:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
    15/04/13 09:50:52 INFO HttpServer: Starting HTTP Server
    15/04/13 09:50:52 INFO Utils: Successfully started service 'HTTP class server' on port 55842.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
          /_/
    
    Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
    Type in expressions to have them evaluated.
    Type :help for more information.
    15/04/13 09:50:57 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.131.151 instead (on interface eth0)
    15/04/13 09:50:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    15/04/13 09:50:57 INFO SecurityManager: Changing view acls to: kevin
    15/04/13 09:50:57 INFO SecurityManager: Changing modify acls to: kevin
    15/04/13 09:50:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
    15/04/13 09:50:58 INFO Slf4jLogger: Slf4jLogger started
    15/04/13 09:50:58 INFO Remoting: Starting remoting
    15/04/13 09:50:58 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.131.151:41278]
    15/04/13 09:50:58 INFO Utils: Successfully started service 'sparkDriver' on port 41278.
    15/04/13 09:50:58 INFO SparkEnv: Registering MapOutputTracker
    15/04/13 09:50:58 INFO SparkEnv: Registering BlockManagerMaster
    15/04/13 09:50:58 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150413095058-f481
    15/04/13 09:50:58 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
    15/04/13 09:50:59 INFO HttpFileServer: HTTP File server directory is /tmp/spark-15b2ae1c-3256-43a7-bc05-b79cb924911d
    15/04/13 09:50:59 INFO HttpServer: Starting HTTP Server
    15/04/13 09:50:59 INFO Utils: Successfully started service 'HTTP file server' on port 41609.
    15/04/13 09:50:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    15/04/13 09:50:59 INFO SparkUI: Started SparkUI at http://192.168.131.151:4040
    15/04/13 09:50:59 INFO Executor: Using REPL class URI: http://192.168.131.151:55842
    15/04/13 09:50:59 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.131.151:41278/user/HeartbeatReceiver
    15/04/13 09:50:59 INFO NettyBlockTransferService: Server created on 50724
    15/04/13 09:50:59 INFO BlockManagerMaster: Trying to register BlockManager
    15/04/13 09:50:59 INFO BlockManagerMasterActor: Registering block manager localhost:50724 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 50724)
    15/04/13 09:50:59 INFO BlockManagerMaster: Registered BlockManager
    15/04/13 09:50:59 INFO SparkILoop: Created spark context..
    Spark context available as sc.
    
    scala>
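The two WARN lines in the log are harmless on a single machine: the hostname localhost.localdomain resolves to 127.0.0.1, so Spark picked the eth0 address 192.168.131.151 instead. To pin the address, the warning itself points to `SPARK_LOCAL_IP`. A minimal sketch that sets it in conf/spark-env.sh, assuming Spark reads that file at startup and ships it as conf/spark-env.sh.template; the function name is our own:

```shell
# Hypothetical helper: pin the address Spark binds to by setting SPARK_LOCAL_IP
# in conf/spark-env.sh, creating that file from the shipped template if needed.
# $1: Spark home (e.g. /usr/spark/spark-1.2.0), $2: IP address to bind to
set_spark_local_ip() {
    local spark_home="$1" ip="$2"
    local env_file="$spark_home/conf/spark-env.sh"
    mkdir -p "$spark_home/conf" || return 1
    if [ ! -f "$env_file" ] && [ -f "$env_file.template" ]; then
        cp "$env_file.template" "$env_file" || return 1
    fi
    touch "$env_file" || return 1
    # Remove an earlier SPARK_LOCAL_IP line so repeated calls do not stack up.
    grep -v '^export SPARK_LOCAL_IP=' "$env_file" > "$env_file.tmp"
    mv "$env_file.tmp" "$env_file" || return 1
    echo "export SPARK_LOCAL_IP=$ip" >> "$env_file"
}
```

Usage: `set_spark_local_ip /usr/spark/spark-1.2.0 192.168.131.151`, then restart `./bin/spark-shell`.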

    With that, the single-machine Spark build is complete!
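Beyond reaching the `scala>` prompt, a quick end-to-end check is to run one of the bundled examples: `./bin/run-example SparkPi 10` runs a small local job and prints a line starting with `Pi is roughly`. A minimal sketch that wraps this check, assuming the built tree keeps `bin/run-example`; the function name is our own:

```shell
# Hypothetical helper: run the bundled SparkPi example and succeed only if
# its "Pi is roughly ..." result line appears in the output.
# $1: Spark home (e.g. /usr/spark/spark-1.2.0), $2: number of slices (default 10)
spark_smoke_test() {
    local spark_home="$1" slices="${2:-10}" out
    # Capture stdout and stderr together so the check does not depend on
    # which stream carries the result line.
    out=$("$spark_home/bin/run-example" SparkPi "$slices" 2>&1) || return 1
    printf '%s\n' "$out" | grep -q 'Pi is roughly'
}
```

Usage: `spark_smoke_test /usr/spark/spark-1.2.0 && echo "Spark build OK"`.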

    References: Maven: http://maven.apache.org/

            Spark: http://spark.apache.org/

  • Original article: https://www.cnblogs.com/kevingu/p/4421624.html