  • Working with the GeoIP Domain Database in Spark

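    // Load both GeoIP2 Domain CSV files as (domain, network) pairs.
    // Assumed column order: network first, domain second.
    val ipv4 = sc.textFile("hdfs://hbase11:9000/sparkTsData/GeoIP2-Domain-Blocks-IPv4.csv").map(_.split(",")).map(p => (p(1), p(0)))
    val ipv6 = sc.textFile("hdfs://hbase11:9000/sparkTsData/GeoIP2-Domain-Blocks-IPv6.csv").map(_.split(",")).map(p => (p(1), p(0)))

    // Combine the IPv4 and IPv6 blocks and save the merged pairs.
    val ip = ipv4 union ipv6
    ip.saveAsTextFile("hdfs://hbase11:9000/sparkTsData/combineIp")

    // Count the network blocks per domain. countByKey returns a Map[String, Long]
    // on the driver; despite its name, ipSorted is not sorted.
    val ipSorted = ip.countByKey()

    // Turn the counts back into an RDD and inspect them.
    val ipSortedRdd = sc.parallelize(ipSorted.toList)
    ipSortedRdd.collect

    // Keep only the domains that appear in more than one network block, then count them.
    val ipSortedRddDup = ipSortedRdd.filter(p => p._2 > 1)
    ipSortedRddDup.count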
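
    The duplicate check above goes through countByKey, which gathers every per-domain count on the driver before sc.parallelize sends them back out. A minimal sketch of a variant that keeps the counting on the cluster with reduceByKey could look like the following. The object name DuplicateDomains, the helper duplicatedDomains, the local master setting, and the sample rows are all hypothetical. The header filter assumes the CSV files start with a "network,domain" header line.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object DuplicateDomains {

      // Returns (domain, count) for every domain that occurs in more than one row.
      // Assumption: each line is "network,domain", optionally preceded by a header.
      def duplicatedDomains(lines: RDD[String]): RDD[(String, Long)] =
        lines
          .filter(line => !line.startsWith("network,")) // skip the assumed header row
          .map(_.split(","))
          .map(p => (p(1), 1L))                         // key by domain
          .reduceByKey(_ + _)                           // count on the executors
          .filter { case (_, n) => n > 1 }

      def main(args: Array[String]): Unit = {
        // Local master for illustration only; a real job would use the cluster settings.
        val conf = new SparkConf().setAppName("DuplicateDomains").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Hypothetical sample rows standing in for the two HDFS files.
        val lines = sc.parallelize(Seq(
          "network,domain",
          "1.0.0.0/24,example.com",
          "1.0.1.0/24,example.com",
          "2001:db8::/32,example.org"
        ))

        duplicatedDomains(lines).collect().foreach(println)
        sc.stop()
      }
    }
    ```

    In the spark-shell session above, the same helper could be applied to the union of the two sc.textFile RDDs before the split step, so that only the duplicated domains ever reach the driver.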
  • Original article: https://www.cnblogs.com/mayidudu/p/5761479.html