zoukankan      html  css  js  c++  java
  • Spark Batch Demo

    原创转载请注明出处:https://www.cnblogs.com/agilestyle/p/14700811.html

    Project Directory

    Maven Dependency

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.fool</groupId>
        <artifactId>hellospark</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <properties>
            <maven.compiler.source>8</maven.compiler.source>
            <maven.compiler.target>8</maven.compiler.target>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.12</artifactId>
                <version>3.1.1</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming_2.12</artifactId>
                <version>3.1.1</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.12</artifactId>
                <version>3.1.1</version>
            </dependency>
        </dependencies>
    
        <build>
            <pluginManagement>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-compiler-plugin</artifactId>
                        <version>3.8.1</version>
                    </plugin>
                </plugins>
            </pluginManagement>
        </build>
        
    </project>

    SpringBatchDemo.scala

    package org.fool.spark
    
    import org.apache.spark.sql.{SaveMode, SparkSession}
    
    object SparkBatchDemo {
      def main(args: Array[String]): Unit = {
        val hdfsSourcePath = "hdfs://127.0.0.1:9000/input/people.json"
    
        val sparkSession = SparkSession.builder().appName("SparkBatchDemo").master("local").getOrCreate();
    
        val df = sparkSession.read.json(hdfsSourcePath)
    
        df.show()
        df.printSchema()
    
        df.select("name").show()
        df.groupBy("age").count().show()
    
        val hdfsOutputPath = "hdfs://127.0.0.1:9000/output/"
    
        df.write.mode(SaveMode.Overwrite).json(hdfsOutputPath)
    
        sparkSession.close()
      }
    }

    Note: 先从HDFS读,然后show,最后再写回HDFS

    people.json

    {"name":"Caocao","create_time":"2021-06-22 01:45:52.478","update_time":"2021-06-22 02:45:52.478"}
    {"name":"Liubei", "age":30,"create_time":"2021-06-22 01:45:52.478","update_time":"2021-06-22 02:45:52.478"}
    {"name":"Guanyu", "age":19,"create_time":"2021-06-22 01:45:52.478","update_time":"2021-06-22 02:45:52.478"}
    {"name":"Zhangfei", "age":29,"create_time":"2021-06-22 01:45:52.478","update_time":"2021-06-22 02:45:52.478"}

    执行前需要把people.json文件上传到HDFS上

    cd $HADOOP_HOME/bin
    ./hdfs dfs -put ~/Desktop/people.json  /input

    Run

    HDFS Output

    查看HDFS中output目录下的结果

    强者自救 圣者渡人
  • 相关阅读:
    28完全背包+扩展欧几里得(包子凑数)
    HDU 3527 SPY
    POJ 3615 Cow Hurdles
    POJ 3620 Avoid The Lakes
    POJ 3036 Honeycomb Walk
    HDU 2352 Verdis Quo
    HDU 2368 Alfredo's Pizza Restaurant
    HDU 2700 Parity
    HDU 3763 CDs
    POJ 3279 Fliptile
  • 原文地址:https://www.cnblogs.com/agilestyle/p/14700811.html
Copyright © 2011-2022 走看看