  • Learning Spark: A First Program, WordCount

    The WordCount program

    Count how many times each word occurs in the file below, splitting each line on spaces.

    • input.txt
    java scala python hello world
    java pyfysf upuptop wintp top
    sfok sf sf 
    sf java android sf pyfysf upuptop 
    pyfysf upuptop java android spark
    hello world world hello top scala spark
    spark spark sql
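
Before bringing Spark in, it can help to know what counts to expect. The following is a minimal sketch that runs the same split-and-count logic on plain Scala collections; the object name `LocalWordCount` and its helpers are hypothetical, and the sample lines are copied from input.txt above.

```scala
object LocalWordCount {
  // Sample lines copied from input.txt above (trailing spaces included).
  val sampleLines: Seq[String] = Seq(
    "java scala python hello world",
    "java pyfysf upuptop wintp top",
    "sfok sf sf ",
    "sf java android sf pyfysf upuptop ",
    "pyfysf upuptop java android spark",
    "hello world world hello top scala spark",
    "spark spark sql"
  )

  // Split every line on single spaces, drop empty tokens,
  // then count how many times each word occurs.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit =
    countWords(sampleLines).toSeq.sortBy(_._1).foreach {
      case (word, n) => println(s"$word\t$n")
    }
}
```

For this sample the file contains 34 words in total and 14 distinct ones; `java`, `sf`, and `spark` each occur 4 times.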
    

    Create a Maven project

    • pom.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <parent>
            <artifactId>SparkStudy</artifactId>
            <groupId>top.wintp.sparkstudy</groupId>
            <version>1.0-SNAPSHOT</version>
        </parent>
        <modelVersion>4.0.0</modelVersion>
    
        <artifactId>SparkCore</artifactId>
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.1</version>
            </dependency>
        </dependencies>
        <build>
            <finalName>WordCount</finalName>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.0.0</version>
                    <configuration>
                        <archive>
                            <manifest>
                                <!-- Change this to the fully qualified name of your own main class -->
                                <mainClass>top.wintp.sparkstudy.sparkcore.WordCount</mainClass>
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
    
    </project>
    
    
    • WordCount.scala
    package top.wintp.sparkstudy.sparkcore
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    /**
      * description:
      * <p>
      * author:  upuptop
      * <p>
      * qq: 337081267
      * <p>
      * CSDN:   http://blog.csdn.net/pyfysf
      * <p>
      * cnblogs:   http://www.cnblogs.com/upuptop
      * <p>
      * blog:   http://wintp.top
      * <p>
      * email:  pyfysf@163.com
      * <p>
      * time: 2019/7/1
      * <p>
      */
    object WordCount {
      def main(args: Array[String]): Unit = {
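        // Create the SparkConf.
        // setMaster: local, local[n], and local[*] all run locally; a remote master URL can be given instead.
        val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
        // Create the SparkContext.
        val sc = new SparkContext(conf)
        // Create an RDD from an external file (one element per line).
        val line = sc.textFile("E:/input/input.txt")
        // flatMap: split each line into words and flatten the result.
        val words = line.flatMap(_.split(" "))
        // map: build (word, 1) key-value pairs.
        val k2v = words.map((_, 1))
        // Sum the counts for each word.
        val result = k2v.reduceByKey(_ + _)
        // Save the result as text files under a timestamped output directory.
        result.saveAsTextFile("E:/output/" + System.currentTimeMillis())
    
        // Stop the SparkContext.
        sc.stop()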
      }
    }
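
As a variation on the program above, the sketch below prints the counts to the console, most frequent word first, instead of writing them to files. It is only an illustration: the object name `WordCountSorted`, the `tokenize` helper, and reading the input path from the first command-line argument are assumptions, not part of the original program. Splitting on runs of whitespace and dropping empty tokens keeps blank lines from producing an empty "word".

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSorted {
  // Split on runs of whitespace and drop empty tokens
  // (for example, those produced by blank lines).
  def tokenize(line: String): Seq[String] =
    line.trim.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Assumed convention: take the input path from the first argument,
    // falling back to the path used in the program above.
    val inputPath = if (args.nonEmpty) args(0) else "E:/input/input.txt"

    val conf = new SparkConf().setMaster("local[*]").setAppName("WordCountSorted")
    val sc = new SparkContext(conf)
    try {
      sc.textFile(inputPath)
        .flatMap(line => tokenize(line))
        .map((_, 1))
        .reduceByKey(_ + _)
        .sortBy(_._2, ascending = false) // most frequent words first
        .collect()                       // acceptable only for small results
        .foreach { case (word, n) => println(s"$word\t$n") }
    } finally {
      // Stop the context even if the job fails.
      sc.stop()
    }
  }
}
```

`collect()` pulls every result to the driver, so this form suits small inputs like the sample file; for large data sets, `saveAsTextFile` as in the original program is the safer choice.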
    
    
    • Output
      [Figure: screenshot of the output]

    With the configuration above, the following problem should not occur. If it does, move the Scala SDK to the end of the project's dependency list.

    
    Exception in thread "main"
     java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)
     Lscala/collection/mutable/ArrayOps;
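
A `NoSuchMethodError` on `scala.Predef$.refArrayOps` usually means the Scala version on the runtime classpath differs from the one the Spark artifact was built for (here `spark-core_2.11`, that is, Scala 2.11). The sketch below is one way to check which Scala version the program actually runs with; the object name `ScalaVersionCheck` and the `binaryVersion` helper are hypothetical.

```scala
object ScalaVersionCheck {
  // Binary version (e.g. "2.11") derived from a full version string (e.g. "2.11.8").
  def binaryVersion(full: String): String =
    full.split("\\.").take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the Scala library that was actually loaded at runtime.
    val running = scala.util.Properties.versionNumberString
    // "2.11" is the suffix of the spark-core_2.11 artifact declared in pom.xml.
    val expected = "2.11"
    println(s"Scala on the classpath: $running, Spark artifact built for: $expected")
    if (binaryVersion(running) != expected)
      println("Mismatch: align the project's Scala SDK with the Spark artifact's Scala version.")
  }
}
```

If the two versions differ, switching the IDE's Scala SDK to a 2.11.x release is likely a more direct fix than reordering the dependencies.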
    
    


  • Original article: https://www.cnblogs.com/shaofeer/p/11154488.html