  • python spark: count the number of distinct values per key

    >>> rdd = sc.parallelize([("a", "1"), ("b", 1), ("a", 1), ("a", 1)])
    >>> sorted(rdd.distinct().countByKey().items())  # the string "1" and the int 1 count as distinct values
    [('a', 2), ('b', 1)]
    
    Alternatively, with reduceByKey:
    
    from operator import add

    # Deduplicate the (key, value) pairs, then count the surviving pairs per key.
    rdd.distinct().map(lambda x: (x[0], 1)).reduceByKey(add).collect()
    # Equivalent: take the keys of the deduplicated pairs and count them.
    rdd.distinct().keys().map(lambda x: (x, 1)).reduceByKey(add).collect()
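
    The distinct → map-to-1 → reduceByKey pipeline above can be mirrored in plain Python (no Spark needed) with a set for the dedup step and a Counter for the per-key sum; this is a local sketch of the same logic, not PySpark code:

    ```python
    from collections import Counter

    pairs = [("a", "1"), ("b", 1), ("a", 1), ("a", 1)]

    # set(pairs) plays the role of distinct(); Counter over the keys
    # plays the role of map(lambda x: (x[0], 1)).reduceByKey(add).
    counts = Counter(key for key, _ in set(pairs))
    print(sorted(counts.items()))  # [('a', 2), ('b', 1)]
    ```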

    distinct(numPartitions=None)

    Return a new RDD containing the distinct elements in this RDD.

    >>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect())
    [1, 2, 3]

    countByKey()

    Count the number of elements for each key, and return the result to the master as a dictionary.

    >>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
    >>> sorted(rdd.countByKey().items())
    [('a', 2), ('b', 1)]
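
    Putting the two operations together, distinct().countByKey() amounts to collecting the set of values seen per key and taking each set's size. A pure-Python sketch (the helper name is mine, not part of PySpark):

    ```python
    from collections import defaultdict

    def count_distinct_values_by_key(pairs):
        """Mirror rdd.distinct().countByKey(): number of distinct values per key."""
        seen = defaultdict(set)
        for key, value in pairs:
            seen[key].add(value)  # a set deduplicates values within each key
        return {key: len(values) for key, values in seen.items()}

    pairs = [("a", "1"), ("b", 1), ("a", 1), ("a", 1)]
    print(sorted(count_distinct_values_by_key(pairs).items()))
    # [('a', 2), ('b', 1)]
    ```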


  • Original post: https://www.cnblogs.com/bonelee/p/7155153.html