
RDD has no reduceByKey method

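When writing Spark code you will often find that an RDD has no reduceByKey method. This happens in Spark 1.2 and earlier, because RDD itself does not define reduceByKey: the RDD must first be implicitly converted to PairRDDFunctions before the method becomes available, so you need to add import org.apache.spark.SparkContext._ to bring that conversion into scope.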

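From Spark 1.3 onward, however, the implicit conversions live in RDD's companion object (object RDD), so the compiler finds them automatically and no explicit import is needed.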

/**
 * Defines implicit functions that provide extra functionalities on RDDs of specific types.
 * For example, [[RDD.rddToPairRDDFunctions]] converts an RDD into a [[PairRDDFunctions]] for
 * key-value-pair RDDs, and enabling extra functionalities such as [[PairRDDFunctions.reduceByKey]].
 */
object RDD {
  // The following implicit functions were in SparkContext before 1.3 and users had to
  // `import SparkContext._` to enable them. Now we move them here to make the compiler find
  // them automatically. However, we still keep the old functions in SparkContext for backward
  // compatibility and forward to the following functions directly.
  implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V] = {
    new PairRDDFunctions(rdd)
  }
  // ... (rest of object RDD omitted)
}
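
To make the difference between versions concrete, here is a minimal word-count sketch. The object name ReduceByKeyExample, the helper countWords, the app name and the local[2] master are all made up for this illustration; only SparkConf, SparkContext, parallelize, map, reduceByKey and collect are real Spark API. On Spark 1.2 and earlier the second import is what makes reduceByKey resolve; on 1.3 and later it should be possible to drop it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Required on Spark 1.2 and earlier so that RDD[(K, V)] is implicitly
// converted to PairRDDFunctions. From 1.3 on this line can be dropped.
import org.apache.spark.SparkContext._

// Hypothetical example object; the names are assumptions for illustration.
object ReduceByKeyExample {

  // Counts occurrences of each word using reduceByKey.
  def countWords(sc: SparkContext, words: Seq[String]): Map[String, Int] = {
    val pairs = sc.parallelize(words).map(word => (word, 1)) // RDD[(String, Int)]
    // reduceByKey is not defined on RDD itself; the compiler inserts
    // rddToPairRDDFunctions(pairs) here.
    pairs.reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("reduceByKey-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(countWords(sc, Seq("a", "b", "a")))
    } finally {
      sc.stop() // always release the local Spark context
    }
  }
}
```

If the compiler reports that reduceByKey is not a member of RDD, checking the Spark version and this import is a reasonable first step.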

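As for what an implicit conversion is: put simply, Scala quietly pulls a switch behind your back and lets the guy next door do the job you cannot do yourself.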
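
The same mechanism can be sketched without Spark at all. In the toy example below, Box and PairBoxFunctions are hypothetical stand-ins for RDD and PairRDDFunctions (they are not Spark classes), and the implicit conversion sits in Box's companion object, which is roughly the arrangement Spark adopted in 1.3.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for RDD: it has no reduceByKey of its own.
class Box[T](val items: Seq[T])

// Hypothetical stand-in for PairRDDFunctions: extra methods that only
// make sense when the elements are key-value pairs.
class PairBoxFunctions[K, V](self: Box[(K, V)]) {
  def reduceByKey(f: (V, V) => V): Map[K, V] =
    self.items
      .groupBy(_._1) // group the pairs by key
      .map { case (key, pairs) => key -> pairs.map(_._2).reduce(f) }
}

object Box {
  // Defined in the companion object, so it is part of the implicit scope
  // of Box and the compiler finds it without any import.
  implicit def boxToPairBoxFunctions[K, V](box: Box[(K, V)]): PairBoxFunctions[K, V] =
    new PairBoxFunctions(box)
}

object ImplicitDemo {
  def counts(): Map[String, Int] = {
    val box = new Box(Seq(("a", 1), ("b", 1), ("a", 1)))
    // Box has no reduceByKey, so the compiler rewrites this call to
    // Box.boxToPairBoxFunctions(box).reduceByKey(_ + _).
    box.reduceByKey(_ + _)
  }

  def main(args: Array[String]): Unit = println(counts())
}
```

If the conversion were moved out of the companion into some unrelated object, the call would compile only after importing it explicitly, which mirrors the Spark 1.2 situation.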


Original article: https://www.cnblogs.com/luckuan/p/4479551.html