  • Learning Spark: The First Program, WordCount

    The WordCount program

    Split each line of the following file on spaces and count how many times each word occurs.

    • input.txt
    java scala python hello world
    java pyfysf upuptop wintp top
    sfok sf sf 
    sf java android sf pyfysf upuptop 
    pyfysf upuptop java android spark
    hello world world hello top scala spark
    spark spark sql
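
Before bringing Spark in, the expected counts can be checked with plain Scala collections. The following is only a sketch: the object name `LocalWordCount` and the helper `countWords` are made up for illustration, and the input lines are copied from the listing above rather than read from a file.

```scala
object LocalWordCount {
  // The seven lines of input.txt, copied from the listing above.
  val lines: Seq[String] = Seq(
    "java scala python hello world",
    "java pyfysf upuptop wintp top",
    "sfok sf sf ",
    "sf java android sf pyfysf upuptop ",
    "pyfysf upuptop java android spark",
    "hello world world hello top scala spark",
    "spark spark sql"
  )

  // Split every line on single spaces, drop empty tokens,
  // and count how often each word occurs.
  def countWords(input: Seq[String]): Map[String, Int] =
    input
      .flatMap(_.split(" ").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit =
    countWords(lines).toSeq.sortBy(_._1).foreach {
      case (word, n) => println(s"$word\t$n")
    }
}
```

For this input there are 14 distinct words and 34 words in total; `java`, `sf` and `spark` each occur 4 times. The Spark job further down should produce the same counts.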
    

    Create a Maven project

    • pom.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <parent>
            <artifactId>SparkStudy</artifactId>
            <groupId>top.wintp.sparkstudy</groupId>
            <version>1.0-SNAPSHOT</version>
        </parent>
        <modelVersion>4.0.0</modelVersion>
    
        <artifactId>SparkCore</artifactId>
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.1</version>
            </dependency>
        </dependencies>
        <build>
            <finalName>WordCount</finalName>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.0.0</version>
                    <configuration>
                        <archive>
                            <manifest>
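                                <!-- Fully qualified name of the main class; change it to match your own package -->
                                <mainClass>top.wintp.sparkstudy.sparkcore.WordCount</mainClass>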
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
    
    </project>
    
    
    • WordCount.scala
    package top.wintp.sparkstudy.sparkcore
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    /**
      * description:
      * <p>
      * author:  upuptop
      * <p>
      * qq: 337081267
      * <p>
      * CSDN:   http://blog.csdn.net/pyfysf
      * <p>
      * cnblogs:   http://www.cnblogs.com/upuptop
      * <p>
      * blog:   http://wintp.top
      * <p>
      * email:  pyfysf@163.com
      * <p>
      * time: 2019/7/1
      * <p>
      */
    object WordCount {
      def main(args: Array[String]): Unit = {
        // Create the SparkConf.
        // setMaster: local, local[n] and local[*] all run locally; the URL of a remote master can be set instead.
        val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
        // Create the SparkContext
        val sc = new SparkContext(conf)
        // Create an RDD from an external file
        val line = sc.textFile("E:/input/input.txt")
        // flatMap: split each line into words and flatten the result
        val words = line.flatMap(_.split(" "))
        // map: build (word, 1) key-value pairs
        val k2v = words.map((_, 1))
        // Sum the counts for each word
        val result = k2v.reduceByKey(_ + _)
        // Save the result to text files
        result.saveAsTextFile("E:/output/" + System.currentTimeMillis())
    
        // Stop the SparkContext
        sc.stop()
      }
    }
    
    
    • Output
      [Image: output result]
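
`saveAsTextFile` writes each element using its `toString`, so a `(String, Int)` pair appears in the output files as a line such as `(java,4)`. The sketch below shows one way to read such lines back and rank the words. The object name `ParseWordCountOutput`, its helpers and the sample lines are illustrative assumptions; real lines would come from the `part-*` files in the output directory.

```scala
object ParseWordCountOutput {
  // Matches one line as written for a (String, Int) pair, e.g. "(java,4)".
  private val PairLine = """\((.*),(\d+)\)""".r

  // Returns Some((word, count)) for a well-formed line, None otherwise.
  def parseLine(line: String): Option[(String, Int)] = line.trim match {
    case PairLine(word, count) => Some((word, count.toInt))
    case _                     => None
  }

  // Parses all lines and orders them by descending count, then by word.
  def topWords(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(parseLine).sortBy { case (word, n) => (-n, word) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines, not taken from an actual run.
    val sample = Seq("(scala,2)", "(java,4)", "(sql,1)", "(hello,3)")
    topWords(sample).foreach(println)
  }
}
```

Because the order of the saved lines depends on how Spark partitions the data, sorting after parsing gives a stable view of the result.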

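    With the configuration above, the problem below should not occur. If it does, move the Scala SDK to the end of the dependency list. The error usually indicates that the Scala library found on the classpath does not match the Scala 2.11 build that spark-core_2.11 was compiled against.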

    
    Exception in thread "main"
     java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)
     Lscala/collection/mutable/ArrayOps;
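
A `NoSuchMethodError` on `scala.Predef$.refArrayOps` is commonly a sign that the Scala version used at run time differs from the one the Spark artifact was built for. One quick way to see which Scala library is actually on the classpath is to print `scala.util.Properties.versionNumberString`. This is a minimal sketch: the object name `ScalaVersionCheck` and its helper functions are hypothetical, and the artifact id is the one from the pom.xml above.

```scala
import scala.util.Properties

object ScalaVersionCheck {
  // Extracts the binary version ("2.11") from a full version number ("2.11.8").
  def binaryVersion(fullVersion: String): String =
    fullVersion.split("\\.").take(2).mkString(".")

  // True when the Scala version matches the suffix of an artifact
  // such as spark-core_2.11.
  def matchesArtifact(fullVersion: String, artifactId: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(fullVersion))

  def main(args: Array[String]): Unit = {
    val runtime = Properties.versionNumberString // Scala library on the classpath
    val artifact = "spark-core_2.11"             // artifactId from the pom.xml
    println(s"Scala at run time: $runtime, matches $artifact: ${matchesArtifact(runtime, artifact)}")
  }
}
```

If the check reports a mismatch, aligning the Scala SDK with the `_2.11` suffix, or letting the Scala library that Maven resolves through spark-core take precedence, is the likely fix.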
    
    


  • Original article: https://www.cnblogs.com/shaofeer/p/11154488.html