  • First Look at Spark 04 (Advanced Operators)

      This is the last chapter in the series on Spark's advanced operators. Today we look at a few simple ones.

      

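    countByKey, countByValue

    val rdd1 = sc.parallelize(List(("a", 1), ("b", 2), ("b", 2), ("c", 2), ("c", 1)))
    // number of pairs per key: Map(a -> 1, b -> 2, c -> 2) (entry order may vary)
    rdd1.countByKey
    // occurrences of each whole (key, value) pair:
    // Map((a,1) -> 1, (b,2) -> 2, (c,2) -> 1, (c,1) -> 1) (entry order may vary)
    rdd1.countByValue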
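The difference between the two calls can be sketched with plain Scala collections, no Spark required. This is only a model of the semantics, not how Spark implements them; the names `pairs`, `byKey`, and `byValue` are made up for illustration.

```scala
// Model of countByKey vs countByValue on an ordinary List (illustrative only).
val pairs = List(("a", 1), ("b", 2), ("b", 2), ("c", 2), ("c", 1))

// countByKey: how many pairs share each key
val byKey: Map[String, Long] =
  pairs.groupBy(_._1).map { case (k, vs) => (k, vs.size.toLong) }

// countByValue: how many times each whole element (here, each pair) occurs
val byValue: Map[(String, Int), Long] =
  pairs.groupBy(identity).map { case (p, ps) => (p, ps.size.toLong) }

println(byKey)
println(byValue)
```

Both Spark calls are actions that return a local Map to the driver, so they are best suited to data with a small number of distinct keys or values.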

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    filterByRange

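    val rdd1 = sc.parallelize(List(("e", 5), ("c", 3), ("d", 4), ("c", 2), ("a", 1)))
    // keep only the pairs whose key lies in the inclusive range "b" to "d"
    val rdd2 = rdd1.filterByRange("b", "d")
    // Array((c,3), (d,4), (c,2))
    rdd2.collect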
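On an RDD that is not range-partitioned, filterByRange behaves like an ordinary filter on the key with both bounds inclusive. A minimal sketch of that behavior using plain Scala collections (the names `pairs`, `lower`, `upper`, and `inRange` are illustrative):

```scala
// Model of filterByRange("b", "d") as an inclusive filter on the key (illustrative only).
val pairs = List(("e", 5), ("c", 3), ("d", 4), ("c", 2), ("a", 1))
val (lower, upper) = ("b", "d")

// keep a pair when lower <= key <= upper, preserving the original order
val inRange = pairs.filter { case (k, _) => k >= lower && k <= upper }

println(inRange)
```

When the RDD has already been partitioned with a RangePartitioner (for example after sortByKey), Spark can skip partitions that cannot contain keys in the range, which is the reason to prefer filterByRange over a hand-written filter.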

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    flatMapValues

    val rdd3 = sc.parallelize(List(("a", "1 2"), ("b", "3 4")))
    // split each value and pair every piece with the original key
    val rdd4 = rdd3.flatMapValues(_.split(" "))
    // Array((a,1), (a,2), (b,3), (b,4))
    rdd4.collect
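As a rough model, flatMapValues expands each value into several values and repeats the key for each of them. The sketch below uses plain Scala collections; `pairs` and `expanded` are illustrative names.

```scala
// Model of flatMapValues(_.split(" ")) on an ordinary List (illustrative only).
val pairs = List(("a", "1 2"), ("b", "3 4"))

// each value is split, and every piece keeps the key it came from
val expanded: List[(String, String)] =
  pairs.flatMap { case (k, v) => v.split(" ").toList.map(piece => (k, piece)) }

println(expanded)
```

Note that the resulting values are still Strings ("1", "2", ...), not numbers; a further mapValues(_.toInt) would be needed to convert them.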

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    foldByKey

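    val rdd1 = sc.parallelize(List("dog", "wolf", "cat", "bear"), 2)
    val rdd2 = rdd1.map(x => (x.length, x))
    // concatenate the words of equal length, starting from the zero value ""
    val rdd3 = rdd2.foldByKey("")(_+_)
    // e.g. Array((4,wolfbear), (3,dogcat)); element and concatenation order may vary
    rdd3.collect

    // word count: split each line into words, pair each word with 1, sum per word
    val rdd = sc.textFile("hdfs://node-1.itcast.cn:9000/wc").flatMap(_.split(" ")).map((_, 1))
    rdd.foldByKey(0)(_+_)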
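A minimal sketch of how foldByKey can be thought of: fold within each partition starting from the zero value, then merge the per-partition results by key. It uses plain Scala collections and assumes the four words are split into two partitions as shown; `partitions`, `zero`, `perPartition`, and `merged` are illustrative names, and this is a model rather than Spark's actual implementation.

```scala
// Model of rdd2.foldByKey("")(_ + _) with two assumed partitions (illustrative only).
val partitions = List(List("dog", "wolf"), List("cat", "bear"))
val zero = ""

// Step 1: inside each partition, fold the values of each key, starting from zero
val perPartition: List[Map[Int, String]] = partitions.map { part =>
  part.map(w => (w.length, w)).foldLeft(Map.empty[Int, String]) {
    case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, zero) + v)
  }
}

// Step 2: merge the partial results of all partitions with the same function
val merged: Map[Int, String] = perPartition.reduce { (left, right) =>
  right.foldLeft(left) {
    case (acc, (k, v)) => acc.updated(k, acc.get(k).map(_ + v).getOrElse(v))
  }
}

println(merged)
```

Because the zero value is applied once per key in every partition, it should be neutral for the function ("" for string concatenation, 0 for addition); a non-neutral zero would make the result depend on the number of partitions.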

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
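    foreachPartition

    val rdd1 = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)
    // the function runs once per partition and receives that partition's iterator;
    // the three partition sums are 6, 15 and 24 (printed where the tasks run, in any order)
    rdd1.foreachPartition(x => println(x.reduce(_ + _)))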
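The per-partition behavior can be modeled with plain Scala by splitting the list into three chunks and handing each chunk to the function as an Iterator. The chunking below assumes nine elements are split evenly into three partitions; `partitions` and `sums` are illustrative names.

```scala
// Model of foreachPartition over three assumed partitions (illustrative only).
val partitions: List[Iterator[Int]] =
  List(1, 2, 3, 4, 5, 6, 7, 8, 9).grouped(3).map(_.iterator).toList

// the function is applied once per partition, not once per element
val sums: List[Int] = partitions.map(it => it.reduce(_ + _))

sums.foreach(println)
```

Since each partition arrives as an Iterator, it can be traversed only once; anything that needs a second pass has to materialize the data first, for example with toList.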

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
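    keyBy : uses the result of the passed-in function as the key

    val rdd1 = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
    val rdd2 = rdd1.keyBy(_.length)
    // Array((3,dog), (6,salmon), (6,salmon), (3,rat), (8,elephant))
    rdd2.collect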

    -------------------------------------------------------------------------------------------
    -------------------------------------------------------------------------------------------
    keys, values

    val rdd1 = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
    val rdd2 = rdd1.map(x => (x.length, x))
    // Array(3, 5, 4, 3, 7, 5)
    rdd2.keys.collect
    // Array(dog, tiger, lion, cat, panther, eagle)
    rdd2.values.collect

  • Original article: https://www.cnblogs.com/wnbahmbb/p/6234728.html