  • Running Spark Streaming on YARN

    • In IDEA, build the jar
    Maven -> Lifecycle -> package
    
    • Upload the jar to the server

    • Run spark-submit to submit the job to YARN

    spark-submit \
      --class cn.ruige.data.genderalStat.gemeralStat.HistoryGenderTotal \
      --master yarn \
      --deploy-mode cluster \
      --queue default \
      --executor-memory 2g \
      --executor-cores 2 \
      --jars /opt/rely_jar/mysql-connector-java-5.1.38.jar \
      ./datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar \
      /opt/sparkstream_jar/config.properties historyGender groupGender
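    # --class: the application's main class (entry point)
    # --master: where the job is submitted
    	yarn
    	spark://<host>:<port>
    	local
    # --deploy-mode: where the driver runs
    	client   the driver runs on the machine that runs spark-submit
    	cluster  the driver runs inside the cluster (on YARN, in the ApplicationMaster),
    	         so local paths passed as arguments must be readable on the cluster nodes
    # --queue: name of the YARN queue
    # --executor-memory: memory per executor (default 1G)
    # --executor-cores: CPU cores per executor
    # --jars: extra jars to ship with the job, comma-separated
    	local file: /opt/rely_jar/mysql-connector-java-5.1.38.jar
    	hdfs:, http:, https:, ftp: URLs also work; the files are pulled down from the URL
    # ./datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar is the application jar you built; pass a local path, or upload it to HDFS and pass that path
    	hdfs://master:9000/user/spark/jars/datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar
    # Other options
    --packages: comma-separated Maven coordinates of jars to put on the driver and executor classpaths
    	mysql:mysql-connector-java:5.1.38
    	org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.8   (the Scala suffix must match the pom, 2.11 here)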
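The three steps above (package, upload, submit) can be wrapped in a small script. This is a minimal sketch, not the author's tooling: the function names, the `SUBMIT` variable, and the remote target passed to `scp` are assumptions made for illustration. The spark-submit options are the ones used in the command above.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the SUBMIT variable are hypothetical helpers.

APP_JAR="datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar"

# Build the fat jar (same as Maven -> Lifecycle -> package in IDEA)
# and copy it to the server. $1 is an scp target such as user@host:/some/dir/
# (a placeholder, not a path from the original post).
package_and_upload() {
  local remote="$1"
  mvn package && scp "target/$APP_JAR" "$remote"
}

# Submit the job to YARN in cluster mode.
# $1 = main class, $2 = application jar, remaining arguments go to the application.
# SUBMIT defaults to spark-submit; set SUBMIT=echo for a dry run that only prints the arguments.
submit_job() {
  local main_class="$1" app_jar="$2"
  shift 2
  "${SUBMIT:-spark-submit}" \
    --class "$main_class" \
    --master yarn \
    --deploy-mode cluster \
    --queue default \
    --executor-memory 2g \
    --executor-cores 2 \
    --jars /opt/rely_jar/mysql-connector-java-5.1.38.jar \
    "$app_jar" "$@"
}
```

A dry run such as `SUBMIT=echo submit_job cn.ruige.data.genderalStat.gemeralStat.HistoryGenderTotal "./$APP_JAR" /opt/sparkstream_jar/config.properties historyGender groupGender` prints the arguments without contacting the cluster, which makes it easy to review them before the real submission.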
    

    Common errors

    • Exception in thread "main" java.lang.NoSuchMethodError: org.apa
    The Scala version in pom.xml must match the Scala version on the server; otherwise this error appears at runtime.
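Scala 2.11 and 2.12 are not binary compatible, which is why a mismatch shows up as `NoSuchMethodError`. On the server, `spark-submit --version` reports the Scala version Spark was built with. The sketch below compares that version with the `_2.xx` suffix of an artifactId from the pom; the helper names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Reduce a full Scala version (e.g. 2.11.12) to its binary version (2.11).
scala_binary() {
  echo "$1" | cut -d. -f1,2
}

# Extract the Scala suffix from an artifactId such as spark-core_2.11.
artifact_scala() {
  echo "${1##*_}"
}

# Succeed only if the artifact's Scala suffix matches the server's Scala version.
# $1 = artifactId from the pom, $2 = Scala version reported on the server.
check_artifact() {
  [ "$(artifact_scala "$1")" = "$(scala_binary "$2")" ]
}
```

For example, `check_artifact spark-streaming_2.11 2.11.12` succeeds, while an artifact with a `_2.12` suffix fails the check against the same server.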
    

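    Common YARN operations

    # list active applications (submitted, accepted, running)
    yarn application -list
    # list applications in all states
    yarn application -list -appStates ALL
    # show the logs of an application, e.g. to find its errors
    yarn logs -applicationId [appid]
    # kill an application
    yarn application -kill [appid]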
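When a job fails, it is often handier to save its logs to a file than to scroll through them in the terminal. The following is a small sketch built on the commands above; the function names and the default output file name are assumptions. YARN application ids have the form `application_<cluster timestamp>_<sequence number>`, and `yarn logs` generally relies on log aggregation being enabled on the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default log file name are hypothetical.

# Succeed if $1 looks like a YARN application id (application_<timestamp>_<sequence>).
is_app_id() {
  printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

# Save the logs of one application to a file.
# $1 = application id, $2 = output file (defaults to <appid>.log).
fetch_logs() {
  local app_id="$1" out="${2:-$1.log}"
  if ! is_app_id "$app_id"; then
    echo "not a YARN application id: $app_id" >&2
    return 1
  fi
  yarn logs -applicationId "$app_id" > "$out"
}
```

Validating the id first avoids creating an empty log file when the argument is mistyped.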
    

    pom dependencies

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.example</groupId>
        <artifactId>datas_eagle</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <properties>
            <scala.version>2.11.12</scala.version>
            <spark.version>2.4.8</spark.version>
            <kafka.version>0.11.0.3</kafka.version>
    <!--        <scala.binary.version>2.12.12</scala.binary.version>-->
        </properties>
        <dependencies>
            <dependency>
                <groupId>org.apache.kafka</groupId>
                <artifactId>kafka-clients</artifactId>
                <version>2.4.0</version>
            </dependency>
    <!--        <dependency>-->
    <!--            <groupId>org.apache.kafka</groupId>-->
    <!--            <artifactId>kafka-clients</artifactId>-->
    <!--            <version>0.11.0.3</version>-->
    <!--            <scope>provided</scope>-->
    <!--        </dependency>-->
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>${spark.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>com.google.code.gson</groupId>
                <artifactId>gson</artifactId>
                <version>2.2.4</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>
                <version>${spark.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>mysql</groupId>
                <artifactId>mysql-connector-java</artifactId>
                <version>5.1.38</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming_2.11</artifactId>
                <version>${spark.version}</version>
                <scope>provided</scope>
            </dependency>
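            <!-- Not shipped with the Spark distribution. With provided scope it is
                 left out of the fat jar, so pass it to spark-submit (packages or jars option). -->
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
                <version>${spark.version}</version>
                <scope>provided</scope>
            </dependency>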
        </dependencies>
        <pluginRepositories>
            <pluginRepository>
                <id>ali-plugin</id>
                <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
                <snapshots>
                    <enabled>true</enabled>
                </snapshots>
            </pluginRepository>
        </pluginRepositories>
        <build>
            <plugins>
                <!-- Plugin that compiles the Java sources -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.5.1</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
                <!-- Plugin that compiles the Scala sources -->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                            <configuration>
                                <args>
                                    <arg>-dependencyfile</arg>
                                    <arg>${project.build.directory}/.scala_dependencies</arg>
                                </args>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
                <!-- Maven Assembly Plugin -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>2.4.1</version>
                    <configuration>
                        <!-- get all project dependencies -->
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                        <!-- A mainClass entry in the manifest makes the jar executable -->
                        <archive>
                            <manifest>
                                <!--<mainClass>util.Microseer</mainClass>-->
                            </manifest>
                        </archive>
    
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <!-- bind to the packaging phase -->
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    <!--    <repositories>-->
    <!--        <repository>-->
    <!--            <id>maven-ali</id>-->
    <!--            <url>http://maven.aliyun.com/nexus/content/groups/public//</url>-->
    <!--            <releases>-->
    <!--                <enabled>true</enabled>-->
    <!--            </releases>-->
    <!--            <snapshots>-->
    <!--                <enabled>true</enabled>-->
    <!--                <updatePolicy>always</updatePolicy>-->
    <!--                <checksumPolicy>fail</checksumPolicy>-->
    <!--            </snapshots>-->
    <!--        </repository>-->
    <!--    </repositories>-->
    </project>
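Dependencies with `provided` scope are left out of the `jar-with-dependencies` assembly, so it is worth checking what actually ended up in the fat jar before submitting. The sketch below reads a jar listing (one entry per line, as printed by `jar tf`) from standard input; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical.

# Read a jar listing on stdin and succeed if any entry lies under the
# package path given as $1 (slash-separated, e.g. org/apache/spark/streaming/kafka010).
listing_has_package() {
  grep -q "^$1/"
}

# Usage (run after mvn package):
#   jar tf target/datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar \
#     | listing_has_package org/apache/spark/streaming/kafka010 \
#     || echo "Kafka integration is not bundled; pass it via --packages or --jars"
```

The same check with `com/mysql/jdbc` shows whether the MySQL driver is bundled or has to come from `--jars`, as in the spark-submit command earlier.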
    
  • Original article: https://www.cnblogs.com/xujunkai/p/15116789.html