zoukankan      html  css  js  c++  java
  • Flink Batch File Word Count

    POM文件

        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-scala_2.11</artifactId>
                <version>1.10.2</version>
            </dependency>
            <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-scala_2.11</artifactId>
                <version>1.10.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
                <version>1.10.1</version>
            </dependency>
        </dependencies>
        <build>
            <plugins>
                <!-- 该插件用于将Scala 代码编译成class 文件-->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.4.6</version>
                    <executions>
                        <execution>
                            <!-- 声明绑定到maven 的compile 阶段-->
                            <goals>
                                <goal>compile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.0.0</version>
                    <configuration>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>

    源码:

    package com.kpwong.wc
    
    import org.apache.flink.api.scala.ExecutionEnvironment
    import org.apache.flink.api.scala._
    
    
    //批处理的wordcount
    object wordCount {
      def main(args: Array[String]): Unit = {
    
        //创建一个批处理执行环境
        val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    
        //从文件中读取数据
        val inputPath = "G:\flinkdemo\FlinkTutorial\src\main\resources\hello.txt"
    
        val inputDataSet: DataSet[String] = env.readTextFile(inputPath)
    
        val resultDS: AggregateDataSet[(String, Int)] = inputDataSet.flatMap(_.split(" ")).map((_,1)).groupBy(0).sum(1)
    
        resultDS.print()
      }
    }
    

     

     

  • 相关阅读:
    python 换行符的识别问题,Unix 和Windows 中是不一样的
    MaxCompute小文件问题优化方案
    MaxCompute小文件问题优化方案
    C++ 中的sort()排序函数用法
    C++ 中的sort()排序函数用法
    简单记录几个有用的sql查询
    bzoj1084(SCOI2005)最大子矩阵
    bzoj1025(SCOI2009)游戏——唯一分解的思路与应用
    bzoj1087互不侵犯King(状压)
    bzoj2748(HAOI2018)音量调节
  • 原文地址:https://www.cnblogs.com/kpwong/p/14084151.html
Copyright © 2011-2022 走看看