
Spark Operators Explained (Part 2)

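1: glom

def glom(): RDD[Array[T]]

Coalesces all elements within each partition into a single array, returning an RDD whose elements are those arrays (one array per partition).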
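To make the per-partition behavior concrete, here is a minimal plain-Scala sketch that runs without Spark. It assumes a hypothetical model in which an "RDD" is just a Seq of partitions; the names Partitions and glomModel are illustrative and not part of Spark's API. For comparison, in spark-shell sc.parallelize(1 to 5, 2).glom().collect should return one array per partition, like the model below.

```scala
import scala.reflect.ClassTag

// Hypothetical model: an "RDD" is a Seq of partitions, and each partition
// is a Seq of elements. This is NOT Spark's API, only an illustration.
type Partitions[T] = Seq[Seq[T]]

// glom model: every partition becomes exactly one Array element.
def glomModel[T: ClassTag](parts: Partitions[T]): Seq[Array[T]] =
  parts.map(_.toArray)

val parts: Partitions[Int] = Seq(Seq(1, 2), Seq(3, 4, 5))
val glommed = glomModel(parts)

// Prints each partition's array on one line.
println(glommed.map(_.mkString("[", ",", "]")).mkString(" "))
```

Because the result has one element per partition, glom is handy for inspecting how data is distributed, or for computing a per-partition aggregate such as a maximum.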

2: getNumPartitions

final def getNumPartitions: Int

Returns the number of partitions of the RDD.

3: groupBy

def groupBy[K](f: (T) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

Groups the elements by the key that the given function returns for each element, for example:

scala> rdd1.collect
res61: Array[Int] = Array(1, 2, 3, 4, 5)

scala> rdd1.groupBy(x=>if(x%2==0) 0 else 1).collect
res62: Array[(Int, Iterable[Int])] = Array((0,CompactBuffer(4, 2)), (1,CompactBuffer(1, 3, 5)))

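4: randomSplit

def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[RDD[T]]

Randomly splits the RDD into several RDDs according to the weights array and returns them as an array. The weights are normalized if they do not sum to 1.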
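The idea behind a weighted random split can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's implementation (Spark samples each partition separately): the weights are normalized into cumulative bounds, each element draws one uniform random number, and the element goes to the split whose [lower, upper) range contains it. The function name randomSplitModel is hypothetical.

```scala
import scala.util.Random

// Hypothetical plain-Scala model of a weighted random split (not Spark's code).
def randomSplitModel[T](data: Seq[T], weights: Array[Double], seed: Long): Array[Seq[T]] = {
  require(weights.nonEmpty && weights.forall(_ >= 0) && weights.sum > 0, "invalid weights")
  val total = weights.sum
  // Normalize and accumulate, e.g. Array(0.7, 0.3) -> Array(0.0, 0.7, 1.0).
  val bounds = weights.map(_ / total).scanLeft(0.0)(_ + _)
  // Pin the last bound to exactly 1.0 so rounding error cannot drop elements.
  bounds(bounds.length - 1) = 1.0
  val rng = new Random(seed)
  // One uniform draw in [0, 1) per element, computed eagerly and in order.
  val draws = data.toVector.map(x => (x, rng.nextDouble()))
  Array.tabulate[Seq[T]](weights.length) { i =>
    draws.collect { case (x, r) if r >= bounds(i) && r < bounds(i + 1) => x }
  }
}

val splits = randomSplitModel(1 to 10, Array(0.7, 0.3), seed = 42L)
println(splits.map(_.mkString(",")).mkString(" | "))
```

Every element lands in exactly one split, so the splits are disjoint and together cover the input, but their sizes are only approximately proportional to the weights. The same seed on the same input yields the same split in this model.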

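5: countByValue

def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]

Returns the number of occurrences of each distinct element as a local Map on the driver, which makes word count very easy to implement. Because the whole map is loaded into driver memory, use it only when the number of distinct values is small.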

scala> sc.parallelize(Array(1,2,1,2,1,2,3,4,5)).countByValue
res73: scala.collection.Map[Int,Long] = Map(5 -> 1, 1 -> 3, 2 -> 3, 3 -> 1, 4 -> 1)
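The counting semantics can be reproduced on a local collection with Scala 2.13+ groupMapReduce. This is a plain-Scala sketch of what countByValue computes, not Spark's implementation; countByValueModel is a hypothetical name. In Spark, the word-count version would be rdd.flatMap(_.split(" ")).countByValue(), and for very large results the Spark docs suggest map(x => (x, 1L)).reduceByKey(_ + _) instead.

```scala
// Plain-Scala model of countByValue on a local collection (not Spark's code):
// group equal elements and count each group.
def countByValueModel[T](data: Seq[T]): Map[T, Long] =
  data.groupMapReduce(identity)(_ => 1L)(_ + _)

// Same input as the REPL example above.
val counts = countByValueModel(Seq(1, 2, 1, 2, 1, 2, 3, 4, 5))

// Word count: split lines into words, then count each word.
val lines = Seq("to be or", "not to be")
val wordCounts = countByValueModel(lines.flatMap(_.split(" ")))
println(wordCounts)
```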

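6: countByValueApprox

def countByValueApprox(timeout: Long, confidence: Double = 0.95)(implicit ord: Ordering[T] = null): PartialResult[Map[T, BoundedDouble]]

Approximate version of countByValue: it waits at most timeout milliseconds and returns a PartialResult whose counts are BoundedDouble estimates at the requested confidence, even if not all tasks have finished.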

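7: ++

def ++(other: RDD[T]): RDD[T]

Returns the union of this RDD and another (equivalent to union). Duplicate elements are kept; call distinct afterwards if they must be removed.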
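A plain-Scala sketch of the union behavior, assuming the same hypothetical model of an RDD as a Seq of partitions (Partitions and unionModel are illustrative names, not Spark's API). It reflects the common case where the two inputs do not share a partitioner: the partition lists are simply concatenated, with no shuffle and no deduplication.

```scala
// Hypothetical model: an "RDD" is a Seq of partitions (not Spark's API).
type Partitions[T] = Seq[Seq[T]]

// union / ++ model: concatenate partition lists; duplicates are preserved.
def unionModel[T](a: Partitions[T], b: Partitions[T]): Partitions[T] = a ++ b

val left: Partitions[Int] = Seq(Seq(1, 2), Seq(3))
val right: Partitions[Int] = Seq(Seq(3, 4))
val unioned = unionModel(left, right)

println(unioned.flatten) // List(1, 2, 3, 3, 4)
```

The element 3 appears twice in the result, which is why a distinct step (a shuffle in Spark) is needed when set semantics are required.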

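8: fold

def fold(zeroValue: T)(op: (T, T) ⇒ T): T

Aggregates the elements of each partition, and then the results of all partitions, using the associative function op with zeroValue as the starting value. zeroValue should be the neutral element of op (for example 0 for addition). For example: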

scala> rdd1.collect
res90: Array[Int] = Array(1, 2, 3, 4, 5)

scala> rdd1.fold(0)(_+_)
res91: Int = 15
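Why zeroValue should be neutral becomes clear from a plain-Scala model of how fold is evaluated: each partition is folded starting from zeroValue, and the per-partition results are then folded again starting from zeroValue. The partition layout and the name foldModel below are hypothetical, chosen only for illustration.

```scala
// Hypothetical model: an "RDD" is a Seq of partitions (not Spark's API).
// Fold each partition from zeroValue, then merge the partial results,
// again starting from zeroValue.
def foldModel[T](parts: Seq[Seq[T]], zeroValue: T)(op: (T, T) => T): T =
  parts.map(_.fold(zeroValue)(op)).fold(zeroValue)(op)

// Assumed layout: the elements 1..5 split across two partitions.
val parts = Seq(Seq(1, 2), Seq(3, 4, 5))

// With the neutral element 0 the result matches the REPL output: 15.
val sumWithNeutralZero = foldModel(parts, 0)(_ + _)

// A non-neutral zero is applied once per partition and once more when
// merging: 15 + 2 * 1 + 1 = 18, so the result depends on partitioning.
val sumWithZeroOne = foldModel(parts, 1)(_ + _)

println(s"$sumWithNeutralZero $sumWithZeroOne")
```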



Original article: https://www.cnblogs.com/leodaxin/p/7499552.html