  • Groupby

    Groupby - collection processing

    Iterator and Iterable provide most of the methods you need when working with collections. fold, map and filter are probably the most common, but other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. This topic covers Iterable's groupBy method.

    groupBy is available in Scala 2.8 and later. It is similar to partition in that it divides a collection into parts. partition takes a function that returns a Boolean and splits the collection in two depending on the result. groupBy takes a function that returns an arbitrary object and returns a Map whose keys are the function's return values and whose values are the elements that produced each key. This allows the collection to be split into an arbitrary number of groups.
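    To make the contrast with partition concrete, here is a minimal sketch. The object name PartitionVsGroupBy and the sample data are made up for illustration and are not part of the original post.

```scala
object PartitionVsGroupBy {
  val numbers: List[Int] = (1 to 10).toList

  // partition: the predicate returns a Boolean, so the result is always exactly two lists.
  val (small, large) = numbers.partition(_ < 5)

  // groupBy: the function returns a key, so there is one group per distinct key.
  val byRemainder = numbers.groupBy(_ % 3)

  def main(args: Array[String]): Unit = {
    println(small)          // List(1, 2, 3, 4)
    println(large)          // List(5, 6, 7, 8, 9, 10)
    println(byRemainder(0)) // List(3, 6, 9)
    println(byRemainder(1)) // List(1, 4, 7, 10)
    println(byRemainder(2)) // List(2, 5, 8)
  }
}
```

    The predicate passed to partition can only produce two outcomes, while the key function passed to groupBy here produces three, and could produce as many as there are distinct return values.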

    Here is the method signature:

    def groupBy[K](f : (A) => K) : Map[K, This]
    

    A bit of context is required to understand the three type parameters A, K and This. The method is defined in a supertype of the collections called TraversableLike (I will briefly discuss this in the next topic). TraversableLike takes two type parameters: the type of the collection and the type contained in the collection. Therefore, in this method definition, 'This' refers to the collection type (List, for example) and A refers to the contained type (perhaps Int). Finally, K is the type returned by the function; its values become the keys of the groups formed by the method.
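    The roles of the three type parameters can be seen by spelling out the result types explicitly. This is a small sketch with made-up data (the object name GroupByTypes and the word lists are illustrative only), and it assumes a Scala version in which groupBy returns an immutable Map.

```scala
object GroupByTypes {
  // A = String, K = Int (the word length), collection type = Vector
  val words = Vector("apple", "fig", "kiwi", "pear", "plum")
  val byLength: Map[Int, Vector[String]] = words.groupBy(_.length)

  // The same key function on a Set: the groups are Sets rather than Vectors.
  val wordSet = Set("apple", "fig", "kiwi")
  val setByLength: Map[Int, Set[String]] = wordSet.groupBy(_.length)

  def main(args: Array[String]): Unit = {
    println(byLength(4))    // Vector(kiwi, pear, plum)
    println(setByLength(3)) // Set(fig)
  }
}
```

    In both cases the key type comes from the function, and each group has the same collection type as the collection groupBy was called on.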

    scala> val groups = (1 to 20).toList groupBy {
          case i if(i<5) => "g1"
          case i if(i<10) => "g2"
          case i if(i<15) => "g3"
          case _ => "g4"
          }
    
          groups: scala.collection.Map[java.lang.String,List[Int]] = Map(g1 -> List(1, 2, 3, 4), 
          g2 -> List(5, 6, 7, 8, 9), g3 -> List(10, 11, 12, 13, 14), g4 -> List(15, 16, 17, 18, 19, 20))
    
    scala> val mods = (1 to 20).toList groupBy ( _ % 4 )
    
    mods: scala.collection.Map[Int,List[Int]] = Map(1 -> List(1, 5, 9, 13, 17), 2 -> List(2, 6, 10, 14, 18), 
    3 -> List(3, 7, 11, 15, 19), 0 -> List(4, 8, 12, 16, 20))
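
    Because the result is an ordinary Map, the groups can be looked up by key or transformed like any other Map. The following sketch rebuilds the mods example; the object name UsingGroups and the names multiplesOfFour, missing and sizes are introduced here only for illustration.

```scala
object UsingGroups {
  val mods = (1 to 20).toList.groupBy(_ % 4)

  // Look up a single group by its key.
  val multiplesOfFour: List[Int] = mods(0)

  // No element produced the key 7, so there is no such group; get returns None.
  val missing: Option[List[Int]] = mods.get(7)

  // Replace each group with its size.
  val sizes = mods.map { case (key, group) => key -> group.size }

  def main(args: Array[String]): Unit = {
    println(multiplesOfFour) // List(4, 8, 12, 16, 20)
    println(missing)         // None
    println(sizes(0))        // 5
  }
}
```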
  • Original article: https://www.cnblogs.com/rollenholt/p/4170820.html