zoukankan      html  css  js  c++  java
  • intellij-idea打包Scala代码在spark中运行

    、创建好Maven项目之后(记得添加Scala框架到该项目),修改pom.xml文件,添加如下内容:

    <properties>
            <spark.version>2.1.1</spark.version>
            <scala.version>2.11</scala.version>
        </properties>
    
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-mllib_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
    
        </dependencies>
    
        <build>
            <plugins>
    
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <scalaVersion>${scala.version}</scalaVersion>
                        <args>
                            <arg>-target:jvm-1.5</arg>
                        </args>
                    </configuration>
                </plugin>
    
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.6.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
    
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.19</version>
                    <configuration>
                        <skip>true</skip>
                    </configuration>
                </plugin>
    
            </plugins>
        </build>

    其中保存之后,需要点击下面的import change,这样相当于是下载jar包

    二、编写一个Scala程序,统计单词的个数

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    
    object WordCount {
      def main(args: Array[String]) {
        if (args.length == 0) {
          System.err.println("Usage: spark.example.WordCount <input> <output>")
          System.exit(1)
        }
        
        val input_path = args(0).toString
        val output_path = args(1).toString
        
        val conf = new SparkConf().setAppName("WordCount")
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        
        val sc = new SparkContext(conf)
        val inputFile = sc.textFile(input_path)
        val countResult = inputFile.flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .map(x => x._1 + "	" + x._2)
        .saveAsTextFile(output_path)
      }
    }

    三、打包

    file->Porject Structure->Artifacts->绿色的加号->JAR->from modules...

    然后填写定义的类名,选择copy to..选项(打包这一个类)

    点击ok之后,然后build->build Artifacts->build,等待build完成。然后可以在项目的这个目录中找到刚刚打包的这个jar包

    四、运行在spark集群上面

    1. 把jar包放到能访问spark集群的机器上面

    2. 运行

    /usr/local/spark/bin/spark-submit --class WordCount --master spark://master:7077 /data/wangzai/package/WordCount.jar 
    hdfs://master:9000/spark/test.data hdfs://master:9000/spark_output/spark_wordcount 
    --executor-memory 1G 
    --executor-cores 1 
    --num-executors 10
    

    3. 结果

  • 相关阅读:
    java多线程基本概述(三)——同步块
    java多线程基本概述(三)——同步方法
    java多线程基本概述(二)——Thread的一些方法
    java多线程基本概述(一)——线程的基本认知
    【linux命令】时间
    【tips】linux中单双引号,单双括号与反引号的区别
    【每天一个linux命令】wget
    【每天一个linux命令】cut
    【每天一个linux命令】du
    【每天一个linux命令】grep
  • 原文地址:https://www.cnblogs.com/654wangzai321/p/9513488.html
Copyright © 2011-2022 走看看