  • A First Look at Spark 04 (Advanced Operators)

      This is the last installment in the walkthrough of Spark's advanced operators. Today we cover a few simple ones.

      

    countByKey countByValue

    val rdd1 = sc.parallelize(List(("a", 1), ("b", 2), ("b", 2), ("c", 2), ("c", 1)))
    rdd1.countByKey
    rdd1.countByValue
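
    The difference between the two calls is easy to miss, so here is a minimal sketch of what each one counts. It assumes a spark-shell session or a local run with Spark on the classpath; the `local[2]` master, the app name, and the names `pairs`, `byKey`, and `byValue` are illustrative and not from the original post.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Reuses the shell's context if one exists; local[2] is an assumption for standalone runs.
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("countByKey-sketch"))

    val pairs = sc.parallelize(List(("a", 1), ("b", 2), ("b", 2), ("c", 2), ("c", 1)))

    // countByKey ignores the values and counts how many records each key has.
    val byKey = pairs.countByKey()      // a -> 1, b -> 2, c -> 2

    // countByValue treats the whole (key, value) tuple as the item being counted.
    val byValue = pairs.countByValue()  // (a,1) -> 1, (b,2) -> 2, (c,2) -> 1, (c,1) -> 1
    ```

    Both are actions that bring the full result map back to the driver, so they are best kept for cases where the number of distinct keys (or distinct records) is small.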

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    filterByRange

    val rdd1 = sc.parallelize(List(("e", 5), ("c", 3), ("d", 4), ("c", 2), ("a", 1)))
    val rdd2 = rdd1.filterByRange("b", "d")
    rdd2.collect
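
    One detail worth spelling out: both bounds of filterByRange are inclusive, so the key "d" survives here. A minimal sketch, assuming a spark-shell session or a local run; the `local[2]` master, the app name, and the names `pairs`, `inRange`, and `keptKeys` are illustrative and not from the original post.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Reuses the shell's context if one exists; local[2] is an assumption for standalone runs.
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("filterByRange-sketch"))

    val pairs = sc.parallelize(List(("e", 5), ("c", 3), ("d", 4), ("c", 2), ("a", 1)))

    // Keep only records whose key k satisfies "b" <= k <= "d" (both ends inclusive).
    val inRange = pairs.filterByRange("b", "d").collect()

    // Keys "a" and "e" fall outside the range; "c" and "d" remain.
    val keptKeys = inRange.map(_._1).toSet   // Set(c, d)
    ```

    If the RDD has already been sorted with sortByKey, it carries a range partitioner and filterByRange can skip partitions that cannot contain matching keys; otherwise it behaves like an ordinary filter over every partition.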

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    flatMapValues : expected result Array((a,1), (a,2), (b,3), (b,4))
    val rdd3 = sc.parallelize(List(("a", "1 2"), ("b", "3 4")))
    val rdd4 = rdd3.flatMapValues(_.split(" "))
    rdd4.collect

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    foldByKey

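    // Key each word by its length, then concatenate the words that share a length.
    val rdd1 = sc.parallelize(List("dog", "wolf", "cat", "bear"), 2)
    val rdd2 = rdd1.map(x => (x.length, x))
    val rdd3 = rdd2.foldByKey("")(_+_)
    rdd3.collect

    // Word count: with a zero value of 0, foldByKey sums the 1s for each word.
    val rdd = sc.textFile("hdfs://node-1.itcast.cn:9000/wc").flatMap(_.split(" ")).map((_, 1))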
    rdd.foldByKey(0)(_+_)
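
    The zero value passed to foldByKey has to be neutral for the combining function, because Spark may fold it in more than once (for example, once per partition that holds the key). A minimal sketch of that behaviour, assuming a spark-shell session or a local run; the `local[2]` master, the app name, the sample data, and the names `nums`, `neutral`, `reduced`, and `inflated` are illustrative and not from the original post.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Reuses the shell's context if one exists; local[2] is an assumption for standalone runs.
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("foldByKey-sketch"))

    // One key whose records are spread over two partitions.
    val nums = sc.parallelize(List(("a", 1), ("a", 2), ("a", 3), ("a", 4)), 2)

    // 0 is neutral for +, so this gives the same result as reduceByKey(_ + _).
    val neutral = nums.foldByKey(0)(_ + _).collect().toMap
    val reduced = nums.reduceByKey(_ + _).collect().toMap

    // 10 is NOT neutral for +, so the sum comes out larger than the true total of 10.
    val inflated = nums.foldByKey(10)(_ + _).collect().toMap
    ```

    This is why the examples here use "" for string concatenation and 0 for addition.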

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    foreachPartition
    val rdd1 = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)
    rdd1.foreachPartition(x => println(x.reduce(_ + _)))
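
    foreachPartition returns nothing, and its println runs wherever the task runs, which on a cluster means the executor output rather than the driver console. When the per-partition results are needed back on the driver, mapPartitions is one alternative. A minimal sketch, assuming a spark-shell session or a local run; the `local[2]` master, the app name, and the names `nums` and `sums` are illustrative and not from the original post.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Reuses the shell's context if one exists; local[2] is an assumption for standalone runs.
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("foreachPartition-sketch"))

    val nums = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)

    // Side effect only: prints one sum per partition, wherever that task runs.
    // fold(0) is used instead of reduce so an empty partition prints 0 instead of throwing.
    nums.foreachPartition(it => println(it.fold(0)(_ + _)))

    // Transformation: emits one sum per partition, then collect brings them to the driver.
    val sums = nums.mapPartitions(it => Iterator(it.sum)).collect()   // Array(6, 15, 24)
    ```

    A common use of foreachPartition is to open one external connection per partition instead of one per record.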

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    keyBy : uses the result of the passed-in function as the key
    val rdd1 = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
    val rdd2 = rdd1.keyBy(_.length)
    rdd2.collect
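
    keyBy(f) is shorthand for the map(x => (f(x), x)) pattern that appears in other snippets in this post. A minimal sketch of the equivalence, assuming a spark-shell session or a local run; the `local[2]` master, the app name, and the names `words`, `viaKeyBy`, and `viaMap` are illustrative and not from the original post.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Reuses the shell's context if one exists; local[2] is an assumption for standalone runs.
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("keyBy-sketch"))

    val words = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)

    // keyBy applies the function to each element and pairs the result with the element.
    val viaKeyBy = words.keyBy(_.length).collect()

    // The same pairs, built by hand with map.
    val viaMap = words.map(w => (w.length, w)).collect()

    // Both yield: (3,dog), (6,salmon), (6,salmon), (3,rat), (8,elephant)
    ```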

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    keys values
    val rdd1 = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
    val rdd2 = rdd1.map(x => (x.length, x))
    rdd2.keys.collect
    rdd2.values.collect

  • Original article: https://www.cnblogs.com/wnbahmbb/p/6234728.html