  • Learning Spark: A First Program, WordCount

    The WordCount Program

    Count how many times each word occurs in the file below, splitting each line on spaces.

    • input.txt
    java scala python hello world
    java pyfysf upuptop wintp top
    sfok sf sf 
    sf java android sf pyfysf upuptop 
    pyfysf upuptop java android spark
    hello world world hello top scala spark
    spark spark sql
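
    As a sanity check on the expected result, the counting can be sketched with plain Scala collections, without Spark. This is only a minimal sketch and not part of the original project: the object name `LocalWordCount` and the `count` helper are hypothetical, and the `filter(_.nonEmpty)` step is an added assumption that drops empty strings produced by blank lines.

```scala
object LocalWordCount {
  // The lines of input.txt, copied from the listing above.
  val lines: Seq[String] = Seq(
    "java scala python hello world",
    "java pyfysf upuptop wintp top",
    "sfok sf sf ",
    "sf java android sf pyfysf upuptop ",
    "pyfysf upuptop java android spark",
    "hello world world hello top scala spark",
    "spark spark sql"
  )

  // Split every line on spaces, drop empty tokens, then count each distinct word.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit =
    // Print the counts in alphabetical order, one "word<TAB>count" pair per line.
    count(lines).toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
}
```

    For the input above this gives 14 distinct words; `java`, `sf`, and `spark` each occur 4 times.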
    

    Create a Maven Project

    • pom.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <parent>
            <artifactId>SparkStudy</artifactId>
            <groupId>top.wintp.sparkstudy</groupId>
            <version>1.0-SNAPSHOT</version>
        </parent>
        <modelVersion>4.0.0</modelVersion>
    
        <artifactId>SparkCore</artifactId>
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.1</version>
            </dependency>
        </dependencies>
        <build>
            <finalName>WordCount</finalName>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.0.0</version>
                    <configuration>
                        <archive>
                            <manifest>
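                                <!-- Change this to the fully qualified name of your own main class. -->
                                <mainClass>top.wintp.sparkstudy.sparkcore.WordCount</mainClass>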
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
    
    </project>
    
    
    • WordCount.scala
    package top.wintp.sparkstudy.sparkcore
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    /**
      * description:
      * <p>
      * author:  upuptop
      * <p>
      * qq: 337081267
      * <p>
      * CSDN:   http://blog.csdn.net/pyfysf
      * <p>
      * cnblogs:   http://www.cnblogs.com/upuptop
      * <p>
      * blog:   http://wintp.top
      * <p>
      * email:  pyfysf@163.com
      * <p>
      * time: 2019/7/1
      * <p>
      */
    object WordCount {
      def main(args: Array[String]): Unit = {
        // Create the SparkConf.
        // setMaster: local, local[n], and local[*] all run locally; a remote master URL can be set instead.
        val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
        // Create the SparkContext.
        val sc = new SparkContext(conf)
        // Create an RDD from an external file.
        val line = sc.textFile("E:/input/input.txt")
        // flatMap: split each line into words and flatten the result.
        val words = line.flatMap(_.split(" "))
        // map: build (word, 1) key-value pairs.
        val k2v = words.map((_, 1))
        // Sum the counts for each word.
        val result = k2v.reduceByKey(_ + _)
        // Save the result to files.
        result.saveAsTextFile("E:/output/" + System.currentTimeMillis())
    
        // Stop the SparkContext.
        sc.stop()
      }
    }
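
    The program above hard-codes the master URL and the file paths. A possible variant, sketched here under stated assumptions, takes the paths from the command line and only falls back to `local[*]` when no master was supplied from outside (for example by `spark-submit --master`). The object name `WordCountArgs`, the `paths` helper, and the argument convention are hypothetical; the sketch assumes the `spark-core_2.11` dependency from the pom.xml above is on the classpath. The `filter(_.nonEmpty)` step is also an addition: it skips the empty strings that blank lines would otherwise produce.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountArgs {
  // Hypothetical convention: args(0) = input file, args(1) = output directory.
  // The defaults are the paths used in the original program.
  def paths(args: Array[String]): (String, String) = {
    val input = if (args.length > 0) args(0) else "E:/input/input.txt"
    val output = if (args.length > 1) args(1) else "E:/output/" + System.currentTimeMillis()
    (input, output)
  }

  def main(args: Array[String]): Unit = {
    val (input, output) = paths(args)

    // setIfMissing keeps a master that was already configured externally
    // and uses local[*] only when none is set.
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)

    try {
      val counts = sc.textFile(input)
        .flatMap(_.split(" "))   // split each line into words
        .filter(_.nonEmpty)      // drop empty tokens from blank lines
        .map((_, 1))             // build (word, 1) pairs
        .reduceByKey(_ + _)      // sum the counts per word

      // Write the result to files, as in the original program.
      counts.saveAsTextFile(output)

      // Also print the counts to the console, most frequent words first.
      counts.sortBy(_._2, ascending = false).collect().foreach(println)
    } finally {
      // Stop the context even if the job fails.
      sc.stop()
    }
  }
}
```

    Calling `collect()` brings the whole result to the driver, which is fine for a small file like input.txt but not for large data sets.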
    
    
    • Output
      [Image: screenshot of the output]

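    With the configuration above, the following problem should not occur. If you do run into the error below, move the Scala SDK to the end of the dependency list. This error usually points to a Scala version mismatch: spark-core_2.11 is built against Scala 2.11, so a Scala SDK of a different version that comes earlier on the classpath can trigger it.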

    
    Exception in thread "main"
     java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)
     Lscala/collection/mutable/ArrayOps;
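
    One way to check for such a mismatch is to print the Scala version that is actually loaded at runtime and compare it with the `_2.11` suffix of the `spark-core_2.11` artifact. The following is a minimal sketch; the object name `ScalaVersionCheck` and the `binaryVersion` helper are hypothetical and not part of the original project.

```scala
object ScalaVersionCheck {
  // Binary version that the "_2.11" suffix of spark-core_2.11 refers to.
  val expectedBinaryVersion = "2.11"

  // Reduce a full version such as "2.11.8" to its binary version "2.11".
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the scala-library that is actually on the runtime classpath.
    val runtimeVersion = scala.util.Properties.versionNumberString
    println(s"Scala runtime version: $runtimeVersion")

    if (binaryVersion(runtimeVersion) != expectedBinaryVersion)
      println(s"Mismatch: spark-core_2.11 expects Scala $expectedBinaryVersion.x")
    else
      println("Scala version matches spark-core_2.11")
  }
}
```

    If the check reports a mismatch, the Scala SDK configured in the IDE is probably taking precedence over the Scala library that Maven pulls in with Spark.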
    
    


  • Original article: https://www.cnblogs.com/shaofeer/p/11154488.html