  • Spark: Using a Third-Party Dependency to Resolve IP Addresses

    Use the existing open-source project on GitHub:
    1)git clone https://github.com/wzhe06/ipdatabase.git

    2) Build the downloaded project: mvn clean package -DskipTests

    3) Install the jar into your local Maven repository:

      mvn install:install-file -Dfile=${path to the built jar}/target/ipdatabase-1.0-SNAPSHOT.jar -DgroupId=com.ggstar -DartifactId=ipdatabase -Dversion=1.0 -Dpackaging=jar

    4) Add the dependencies to your pom.xml:

    <dependency>
      <groupId>com.ggstar</groupId>
      <artifactId>ipdatabase</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.14</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.14</version>
    </dependency>

    5) Copy ipDatabase.csv and ipRegion.xlsx from the project's main/resources directory into your own project's resources directory.

    6) IP resolution utility class

    import com.ggstar.util.ip.IpHelper

    /**
      * IP resolution utility
      */
    object IpUtils {

      def getCity(ip: String): String = {
        IpHelper.findRegionByIp(ip)
      }
    }
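    A minimal sketch of wiring the utility into Spark SQL as a UDF (assumes a SparkSession, and the ipdatabase jar plus its two resource files on the classpath; the object name and sample IPs here are illustrative, not from the original post):

    ```scala
    import org.apache.spark.sql.SparkSession

    object IpUdfExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("IpUdfExample")
          .master("local[2]")
          .getOrCreate()
        import spark.implicits._

        // Expose the utility to SQL/DataFrame expressions
        spark.udf.register("ip2city", (ip: String) => IpUtils.getCity(ip))

        val logs = Seq("58.30.15.255", "218.75.35.226").toDF("ip")
        logs.selectExpr("ip", "ip2city(ip) AS city").show(false)

        spark.stop()
      }
    }
    ```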

    7) Package and run on YARN

    Mark the Spark and Scala dependencies as provided in the pom so they are excluded from the packaged jar; the cluster environment already supplies them.

    <!-- Scala dependency -->
        <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-library</artifactId>
          <version>${scala.version}</version>
          <scope>provided</scope>
        </dependency>
        <!--SparkSQL-->
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.11</artifactId>
          <version>${spark.version}</version>
          <scope>provided</scope>
        </dependency>
    
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-hive_2.11</artifactId>
          <version>${spark.version}</version>
          <scope>provided</scope>
        </dependency>

    When packaging, note that the following plugin needs to be added to pom.xml:

    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass></mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
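    With the assembly plugin configured, the fat jar can be produced with the plugin's assembly goal (a sketch; the exact artifact name in target/ depends on your pom's artifactId and version):

    ```shell
    # Produces target/<artifact>-<version>-jar-with-dependencies.jar
    mvn clean package assembly:single -DskipTests
    ```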

    Submit the job:

    ./bin/spark-submit \
    --class com.rz.log.SparkstatcleanJobYARN \
    --name SparkstatcleanJobYARN \
    --master yarn \
    --executor-memory 1G \
    --num-executors 1 \
    --files /home/hadoop/lib/ipDatabase.csv,/home/hadoop/lib/ipRegion.xlsx \
    /home/hadoop/lib/sql-1.0-jar-with-dependencies.jar \
    hdfs://hadoop001:8020/imooc/input/* hdfs://hadoop001:8020/imooc/clean
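    Files shipped with --files are copied into each executor's working directory, which is how the two ipdatabase data files become visible to the job. If code ever needs an explicit local path to one of them, Spark's SparkFiles API resolves it by file name (a short sketch; assumes a live SparkContext):

    ```scala
    import org.apache.spark.SparkFiles

    // Executor-local path of a file distributed via --files
    val ipDataPath: String = SparkFiles.get("ipDatabase.csv")
    ```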
  • Original post: https://www.cnblogs.com/RzCong/p/10612380.html