  • [Spark][Python]Mapping Single Rows to Multiple Pairs

    Mapping Single Rows to Multiple Pairs
    Goal:

    Take data like the following,

    Input Data

    00001 sku010:sku933:sku022
    00002 sku912:sku331
    00003 sku888:sku022:sku010:sku594
    00004 sku411


    and convert it into this form,
    where each value attached to a key is listed as its own (key, value) pair:

    (00001,sku010)
    (00001,sku933)
    (00001,sku022)

    ...
    (00002,sku912)
    (00002,sku331)
    (00003,sku888)

    This is what is meant by Mapping Single Rows to Multiple Pairs.
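
    The transformation itself can be illustrated without Spark. The following is a minimal plain-Python sketch using the input data shown above; the helper name row_to_pairs is hypothetical and is not part of any Spark API.

```python
# Plain-Python sketch (no Spark): one (key, value) pair per SKU in each row.
rows = [
    "00001 sku010:sku933:sku022",
    "00002 sku912:sku331",
    "00003 sku888:sku022:sku010:sku594",
    "00004 sku411",
]

def row_to_pairs(row):
    """Turn 'key v1:v2:...' into [(key, v1), (key, v2), ...]."""
    key, skus = row.split(" ", 1)          # key and the colon-separated SKU list
    return [(key, sku) for sku in skus.split(":")]

# Flatten the per-row lists into a single list of pairs.
pairs = [pair for row in rows for pair in row_to_pairs(row)]

for pair in pairs:
    print(pair)
```

    A row with a single SKU, such as 00004, yields exactly one pair.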

    The steps are as follows:

    [training@localhost ~]$ vim act001.txt
    [training@localhost ~]$
    [training@localhost ~]$ cat act001.txt
    00001 ku010:sku933:sku022
    00002 sku912:sku331
    00003 sku888:sku022:sku010:sku594
    00004 sku411
    [training@localhost ~]$ hdfs dfs -put act001.txt
    [training@localhost ~]$
    [training@localhost ~]$ hdfs dfs -cat act001.txt
    00001 ku010:sku933:sku022
    00002 sku912:sku331
    00003 sku888:sku022:sku010:sku594
    00004 sku411
    [training@localhost ~]$

    Then, in the pyspark shell, with mydata being the RDD of lines read from act001.txt (that step is not shown):

    In [6]: mydata01=mydata.map(lambda line: line.split(" "))

    In [7]: type(mydata01)
    Out[7]: pyspark.rdd.PipelinedRDD

    In [8]: mydata02=mydata01.map(lambda fields: (fields[0],fields[1]))

    In [9]: type(mydata02)
    Out[9]: pyspark.rdd.PipelinedRDD

    In [10]:

    In [11]: mydata03 = mydata02.flatMapValues(lambda skus: skus.split(":"))

    In [12]: type(mydata03)
    Out[12]: pyspark.rdd.PipelinedRDD

    In [13]: mydata03.take(1)
    Out[13]: [(u'00001', u'ku010')]
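
    The three shell steps can also be chained into one expression. This is a minimal sketch, assuming the input is an RDD of lines such as the one sc.textFile("act001.txt") would return; the function name rows_to_pairs is hypothetical, and only map and flatMapValues are actual RDD methods.

```python
def rows_to_pairs(lines_rdd):
    """Chain the session's steps: split the line, build (key, value), fan out the values.

    lines_rdd is assumed to be a pyspark RDD of strings like '00001 sku010:sku933'.
    """
    return (
        lines_rdd
        .map(lambda line: line.split(" "))             # ['00001', 'sku010:sku933']
        .map(lambda fields: (fields[0], fields[1]))    # ('00001', 'sku010:sku933')
        .flatMapValues(lambda skus: skus.split(":"))   # ('00001', 'sku010'), ('00001', 'sku933')
    )

# Usage inside the pyspark shell, where `sc` already exists (assumption):
# rows_to_pairs(sc.textFile("act001.txt")).collect()
```

    flatMapValues keeps the key unchanged and emits one pair for each element returned by the function, which is why a pair RDD must be built first with the second map.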

  • Original article: https://www.cnblogs.com/gaojian/p/7603900.html