  • [Spark] Common transformations: keys, values, and mapValues

    1. keys

    Purpose:

      Returns the key of every key-value pair in a pair RDD as a new RDD. Duplicate keys are kept.

    Example

    val list = List("hadoop","spark","hive","spark")
    val rdd = sc.parallelize(list)
    val pairRdd = rdd.map(x => (x,1))
    pairRdd.keys.collect.foreach(println)
    

    Result

    hadoop
    spark
    hive
    spark
    list: List[String] = List(hadoop, spark, hive, spark)
    rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[142] at parallelize at command-3434610298353610:2
    pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[143] at map at command-3434610298353610:3
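
    As the output shows, keys does not deduplicate. The sketch below is an illustration only: it assumes a local SparkContext obtained through SparkContext.getOrCreate, and the names ctx, viaKeys, viaMap and uniqueKeys are made up for this example. It shows that keys behaves like projecting the first element of each pair, and that distinct can be chained when only the unique keys are wanted.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Assumption: reuse the running context if one exists, otherwise start a local one.
    val ctx = SparkContext.getOrCreate(
      new SparkConf().setAppName("keys-sketch").setMaster("local[*]"))

    val pairRdd = ctx.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))

    // keys is shorthand for taking the first element of every pair.
    val viaKeys = pairRdd.keys.collect().toList
    val viaMap = pairRdd.map(_._1).collect().toList

    // Duplicates are kept by keys; distinct removes them.
    // The order of elements after distinct is not guaranteed, so collect into a Set.
    val uniqueKeys = pairRdd.keys.distinct().collect().toSet
    ```

    Note that distinct triggers a shuffle, so it is worth calling only when unique keys are really needed.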
    

    2. values

    Purpose:

      Returns the value of every key-value pair in a pair RDD as a new RDD.

    Example

    val list = List("hadoop","spark","hive","spark")
    val rdd = sc.parallelize(list)
    val pairRdd = rdd.map(x => (x,1))
    pairRdd.values.collect.foreach(println)
    

    Result

    1
    1
    1
    1
    list: List[String] = List(hadoop, spark, hive, spark)
    rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[145] at parallelize at command-3434610298353610:2
    pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[146] at map at command-3434610298353610:3
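
    Because values returns an ordinary RDD, it can be fed straight into an action. The sketch below is a hedged illustration, not part of the original post: ctx, viaValues and total are hypothetical names, and a local SparkContext is assumed. It adds up the 1s attached to each word, which gives the total number of words.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Assumption: reuse the running context if one exists, otherwise start a local one.
    val ctx = SparkContext.getOrCreate(
      new SparkConf().setAppName("values-sketch").setMaster("local[*]"))

    val pairRdd = ctx.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))

    // values is shorthand for taking the second element of every pair.
    val viaValues = pairRdd.values.collect().toList

    // Summing the values counts the words, since every word was paired with 1.
    val total = pairRdd.values.reduce(_ + _)
    ```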
    

    3. mapValues(func)

    Purpose:

      Applies a function to the value of every key-value pair; the keys are left unchanged.

    Example

    val list = List("hadoop","spark","hive","spark")
    val rdd = sc.parallelize(list)
    val pairRdd = rdd.map(x => (x,1))
    pairRdd.mapValues(_+1).collect.foreach(println) // add 1 to every value
    

    Result

    (hadoop,2)
    (spark,2)
    (hive,2)
    (spark,2)
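
    One practical consequence of the keys staying unchanged is that mapValues keeps the RDD's partitioner, while a plain map over the pairs drops it, because Spark cannot know whether map altered the keys. The sketch below is an illustration under assumptions: a local SparkContext, a HashPartitioner with 2 partitions chosen arbitrarily, and the made-up names ctx, partitioned, viaMapValues and viaMap.

    ```scala
    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    // Assumption: reuse the running context if one exists, otherwise start a local one.
    val ctx = SparkContext.getOrCreate(
      new SparkConf().setAppName("mapValues-sketch").setMaster("local[*]"))

    // Give the pair RDD an explicit partitioner (2 partitions is an arbitrary choice).
    val partitioned = ctx.parallelize(List("hadoop", "spark", "hive", "spark"))
      .map(x => (x, 1))
      .partitionBy(new HashPartitioner(2))

    // Same transformation written two ways.
    val viaMapValues = partitioned.mapValues(_ + 1)
    val viaMap = partitioned.map { case (k, v) => (k, v + 1) }

    // mapValues keeps the partitioner; map discards it.
    println(viaMapValues.partitioner.isDefined)
    println(viaMap.partitioner.isDefined)
    ```

    Keeping the partitioner matters when a key-based operation such as reduceByKey or join follows, since Spark can then avoid an extra shuffle.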
  • Original article: https://www.cnblogs.com/zzhangyuhang/p/9001608.html