  • Using Scala's mapPartitionsWithIndex function

    var rdd1 = sc.makeRDD(Array((1,"A"),(2,"B"),(3,"C"),(4,"D")), 2)

    rdd1.partitions.size
    res20: Int = 2

    rdd1.mapPartitionsWithIndex{
      (partIdx, iter) => {
        var part_map = scala.collection.mutable.Map[String, List[(Int,String)]]()
        while (iter.hasNext) {
          var part_name = "part_" + partIdx
          var elem = iter.next()
          if (part_map.contains(part_name)) {
            var elems = part_map(part_name)
            elems ::= elem
            part_map(part_name) = elems
          } else {
            part_map(part_name) = List[(Int,String)](elem)
          }
        }
        part_map.iterator
      }
    }.collect
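The per-partition grouping above can be exercised without a cluster. The following is a minimal pure-Scala sketch (no SparkContext; the object name `PartMapSketch` and the two-element partition split are illustrative assumptions) that applies the same function to each simulated partition:

```scala
// Pure-Scala sketch of the mapPartitionsWithIndex body: collect each
// partition's elements into a Map keyed by "part_<index>".
object PartMapSketch {
  def partFunc(partIdx: Int,
               iter: Iterator[(Int, String)]): Iterator[(String, List[(Int, String)])] = {
    val partMap = scala.collection.mutable.Map[String, List[(Int, String)]]()
    val partName = "part_" + partIdx
    while (iter.hasNext) {
      val elem = iter.next()
      // Prepend, mirroring `elems ::= elem` in the original (order reverses).
      partMap(partName) = elem :: partMap.getOrElse(partName, Nil)
    }
    partMap.iterator
  }

  def main(args: Array[String]): Unit = {
    // Simulate makeRDD(..., 2): split the array into two "partitions".
    val data = Array((1, "A"), (2, "B"), (3, "C"), (4, "D"))
    data.grouped(2).toArray.zipWithIndex.flatMap {
      case (part, idx) => partFunc(idx, part.iterator)
    }.foreach(println)
  }
}
```

Because elements are prepended with `::`, each partition's list comes out in reverse insertion order, just as in the spark-shell run.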

     -----------------------------------------------------------

    val three = sc.textFile("/tmp/spark/three", 3)
    var idx = 0
    import org.apache.spark.HashPartitioner

    // partitionBy(new HashPartitioner(1)) pulls all keys into a single
    // partition, so the mutable counter idx increments sequentially there.
    val res = three.filter(_.trim().length > 0)
      .map(num => (num.trim.toInt, ""))
      .partitionBy(new HashPartitioner(1))
      .sortByKey()
      .map(t => {
        idx += 1
        (idx, t._1)
      })
      .collect
      .foreach(x => println(x._1 + " " + x._2))
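The pipeline above (drop blank lines, parse, sort globally, number from 1) can be sketched in plain Scala collections, no Spark required. The object name `SortRankSketch` and the sample input are illustrative assumptions:

```scala
// Pure-Scala sketch of the sort-and-rank pipeline.
object SortRankSketch {
  // Filter out blank lines, parse each line as an Int,
  // sort ascending, and attach a 1-based rank.
  def rank(lines: Seq[String]): Seq[(Int, Int)] =
    lines
      .filter(_.trim.nonEmpty)               // drop blank/whitespace lines
      .map(_.trim.toInt)                     // parse to Int
      .sorted                                // global ascending sort
      .zipWithIndex                          // attach 0-based position
      .map { case (num, i) => (i + 1, num) } // convert to 1-based rank

  def main(args: Array[String]): Unit =
    rank(Seq("3", "", "7", " 1 ", "5")).foreach {
      case (r, n) => println(r + " " + n)
    }
}
```

On an RDD the global sort needs `sortByKey` over a single partition (or a range partitioner) before numbering; the local version gets the same ordering from `sorted`.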

    ------------------------------------------------------------------

    Spark operator: partitionBy for partitioning data
    https://www.cnblogs.com/yy3b2007com/p/7800793.html

    Classic Hadoop cases implemented in Spark (3): data sorting
    https://blog.csdn.net/kwu_ganymede/article/details/50475788

  • Original article: https://www.cnblogs.com/chengjun/p/8954515.html