  • Spark word count from a file (local mode)

    Read the words from a file and aggregate their counts.

    pom.xml:

        <properties>
            <scala.version>2.11.8</scala.version>
            <spark.version>2.2.0</spark.version>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.6.0</version>
            </dependency>
        </dependencies>
    
        <build>
            <sourceDirectory>src/main/scala</sourceDirectory>
            <testSourceDirectory>src/test/scala</testSourceDirectory>
            <plugins>
                <!-- Specify the Java version -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
    
                <!-- Scala plugin: provides Scala compilation support -->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.0</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                            <configuration>
                                <args>
                                    <arg>-dependencyfile</arg>
                                    <arg>${project.build.directory}/.scala_dependencies</arg>
                                </args>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
    
                <!-- Bundle all dependencies into a single fat jar -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>3.1.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <!-- Exclude signature files under META-INF; stale signatures would make the shaded jar fail to load -->
                                <filters>
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.DSA</exclude>
                                            <exclude>META-INF/*.RSA</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <mainClass>com.test.WordCount</mainClass>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>

    WordCount:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // local[2]: run Spark inside this JVM with two worker threads
        val conf = new SparkConf().setMaster("local[2]").setAppName("wordcount")

        val sc = new SparkContext(conf)

        // Read every file under data/wordcount as an RDD of lines
        val rdd1 = sc.textFile("data/wordcount")

        // Split each line into words
        val rdd2 = rdd1.flatMap(item => item.split(" "))

        // Pair each word with an initial count of 1
        val rdd3 = rdd2.map(item => (item, 1))

        // Sum the counts of identical words
        val rdd4 = rdd3.reduceByKey((curr, agg) => curr + agg)

        // Bring the aggregated pairs back to the driver
        val result = rdd4.collect()

        result.foreach(item => println(item))

        sc.stop()
      }
    }
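
    For a concrete picture, suppose (hypothetically) that data/wordcount contains a single file with these lines:

        hello spark
        hello scala
        spark streaming

    The program then prints the aggregated pairs, one per line (the order may vary, since collect() gives no ordering guarantee):

        (hello,2)
        (spark,2)
        (scala,1)
        (streaming,1)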

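    Build and run:

    Because the shade plugin declares com.test.WordCount as the main class, the job can also be packaged and submitted from the command line instead of being run from the IDE. A minimal sketch, assuming Spark is installed locally and that the build produces target/wordcount-1.0.jar (the actual file name depends on your artifactId and version):

        mvn clean package
        spark-submit --class com.test.WordCount --master "local[2]" target/wordcount-1.0.jar

    Since the code hard-codes setMaster("local[2]"), that value takes precedence over the --master flag passed to spark-submit; drop the setMaster call if you want to choose the master at submit time.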
  • Original article: https://www.cnblogs.com/chong-zuo3322/p/12910293.html