Getting Started with Spark

Download spark-1.6.1-bin-hadoop2.6.tgz

Extract the archive

Configure

In the conf directory, rename and edit the spark-env.sh.template file:

mv spark-env.sh.template spark-env.sh
vi spark-env.sh

Add the following settings to the file:

export JAVA_HOME=/usr/java/jdk1.7.0_45
export SPARK_MASTER_IP=mini1
export SPARK_MASTER_PORT=7077

Save and exit. Then rename and edit the slaves.template file:

mv slaves.template slaves
vi slaves

Add the hostnames of the child (Worker) nodes to this file:

mini2
mini3
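
Before starting, the configured Spark directory must also exist on the worker machines. A minimal sketch, assuming Spark was unpacked under /usr/local on every node (adjust the path to your own layout):

scp -r /usr/local/spark-1.6.1-bin-hadoop2.6 mini2:/usr/local/
scp -r /usr/local/spark-1.6.1-bin-hadoop2.6 mini3:/usr/local/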

Start the cluster

    sbin/start-all.sh
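
If the cluster came up, jps on mini1 should show a Master process and jps on mini2/mini3 a Worker process; the master also serves a web UI, by default at http://mini1:8080, where both workers should be listed.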

bin/spark-shell (with no --master) starts a local, single-machine spark-shell; it does not connect to the cluster, so you will not see it in the master's web UI in the browser.

Start a spark-shell connected to the cluster:

    bin/spark-shell --master spark://mini1:7077 --executor-memory 512m --total-executor-cores 1

--master spark://mini1:7077 specifies the master's address

--executor-memory 2g sets the memory available to each executor to 2 GB

--total-executor-cores 2 caps the total number of CPU cores the application uses across the cluster at 2
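
A quick sanity check once the shell is connected — a minimal sketch using the sc that spark-shell creates automatically:

// distribute the numbers 1..100 across the executors and add them up
val total = sc.parallelize(1 to 100).reduce(_ + _)
println(total)   // 5050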

Submitting jobs and word count (wc)

    bin/spark-submit --class org.apache.spark.examples.SparkPi  --master spark://mini1:7077   --total-executor-cores 1 --executor-memory 612m  lib/spark-examples-1.6.1-hadoop2.6.0.jar 50
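
Here --class names the example's main class inside the jar, and the trailing 50 is passed to SparkPi as the number of slices (tasks) used to estimate π. The word-count (wc) one-liner below runs in the cluster spark-shell, reading its input from HDFS and writing the sorted counts back to HDFS: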

    sc.textFile("hdfs://mini1:9000/wc/sparkInput").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_,1).sortBy(_._2,false).saveAsTextFile("hdfs://mini1:9000/wc/sparkOutput2/")
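
In this pipeline, the second argument to reduceByKey(_+_, 1) sets the number of result partitions to 1, so the output directory contains a single part file, and sortBy(_._2, false) orders the (word, count) pairs by descending count. The same logic as a standalone application: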

package cn.my.spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WC")
    val sc = new SparkContext(conf)

    // args(0) = input path, args(1) = output path, e.g.
    // hdfs://mini1:9000/wc/sparkInput and hdfs://mini1:9000/wc/sparkOutput2/
    sc.textFile(args(0))
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair every word with a count of 1
      .reduceByKey(_ + _)      // sum the counts per word
      .sortBy(_._2, false)     // sort by count, descending
      .saveAsTextFile(args(1))

    sc.stop()
  }
}
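
To run this on the cluster, package the project (see the pom.xml below) and submit the shaded jar. A sketch, assuming the jar ends up as target/helloSpark-2.0.jar and reusing the HDFS paths from above; adjust the jar path and resource flags to your setup:

bin/spark-submit --class cn.my.spark.WordCount --master spark://mini1:7077 --executor-memory 512m --total-executor-cores 1 /path/to/helloSpark-2.0.jar hdfs://mini1:9000/wc/sparkInput hdfs://mini1:9000/wc/sparkOutput3/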

    pom.xml

     
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>cn.my.spark</groupId>
        <artifactId>helloSpark</artifactId>
        <version>2.0</version>
    
    
        <properties>
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
            <encoding>UTF-8</encoding>
            <scala.version>2.10.6</scala.version>
            <spark.version>1.6.1</spark.version>
            <hadoop.version>2.6.4</hadoop.version>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
                <version>${spark.version}</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>${hadoop.version}</version>
            </dependency>
        </dependencies>
    
        <build>
            <sourceDirectory>src/main/scala</sourceDirectory>
            <testSourceDirectory>src/test/scala</testSourceDirectory>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                            <configuration>
                                <args>
<!-- Scala 2.11 does not support this flag and fails with an error: <arg>-make:transitive</arg> -->
                                    <arg>-make:transitive</arg>
                                    <arg>-dependencyfile</arg>
                                    <arg>${project.build.directory}/.scala_dependencies</arg>
                                </args>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
    
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>2.4.3</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <filters>
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.DSA</exclude>
                                            <exclude>META-INF/*.RSA</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <!--<mainClass>cn.my.spark.WordCount</mainClass>-->
                                        <mainClass></mainClass>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
    </project>
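
With this pom, mvn clean package compiles the Scala sources with the scala-maven-plugin and, in the package phase, the shade plugin rebuilds the jar as a self-contained fat jar (target/helloSpark-2.0.jar). Because <mainClass> is left empty here, the class to run must be passed to spark-submit via --class, as in the command above.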
Original post: https://www.cnblogs.com/rocky-AGE-24/p/7222004.html