  • [Spark][Python] DataFrame left and right join examples

    $ hdfs dfs -cat people.json

    {"name":"Alice","pcode":"94304"}
    {"name":"Brayden","age":30,"pcode":"94304"}
    {"name":"Carla","age":19,"pcoe":"10036"}
    {"name":"Diana","age":46}
    {"name":"Etienne","pcode":"94104"}

    $ hdfs dfs -cat pcodes.json

    {"pcode":"10036","city":"New York","state":"NY"}
    {"pcode":"87501","city":"Santa Fe","state":"NM"}
    {"pcode":"94304","city":"Palo Alto","state":"CA"}
    {"pcode":"94104","city":"San Francisco","state":"CA"}

    $ pyspark

    from pyspark.sql import HiveContext
    sqlContext = HiveContext(sc)  # sc: the SparkContext provided by the pyspark shell
    peopleDF = sqlContext.read.json("people.json")
    peopleDF.limit(5).show()

    +----+-------+-----+-----+
    | age|   name|pcode| pcoe|
    +----+-------+-----+-----+
    |null|  Alice|94304| null|
    |  30|Brayden|94304| null|
    |  19|  Carla| null|10036|
    |  46|  Diana| null| null|
    |null|Etienne|94104| null|
    +----+-------+-----+-----+
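Carla's record in people.json uses the key `pcoe` instead of `pcode`. `read.json` turns every distinct key it sees into a column and fills missing values with null, so the table gets a separate `pcoe` column and Carla's `pcode` is null. The plain-Python sketch below imitates that step on the same five lines. It does not use Spark, `read_json_lines` is a hypothetical helper, and the alphabetical column order is an assumption based on the output above.

```python
import json

# The five records from people.json, one JSON object per line.
PEOPLE_JSON = """\
{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}"""


def read_json_lines(text):
    """Parse JSON Lines text into (columns, rows), filling absent keys with None."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Union of every key seen in any record, sorted by name (assumed to mirror
    # the alphabetical column order of the inferred schema shown above).
    columns = sorted({key for record in records for key in record})
    rows = [tuple(record.get(col) for col in columns) for record in records]
    return columns, rows


columns, rows = read_json_lines(PEOPLE_JSON)
print(columns)
for row in rows:
    print(row)
```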

    pcodesDF = sqlContext.read.json("pcodes.json")
    pcodesDF.limit(5).show()

    +-------------+-----+-----+
    |         city|pcode|state|
    +-------------+-----+-----+
    |     New York|10036|   NY|
    |     Santa Fe|87501|   NM|
    |    Palo Alto|94304|   CA|
    |San Francisco|94104|   CA|
    +-------------+-----+-----+

    mydf000 = peopleDF.join(pcodesDF, "pcode")  # no join type given: inner join
    mydf000.limit(5).show()

    +-----+----+-------+----+-------------+-----+
    |pcode| age|   name|pcoe|         city|state|
    +-----+----+-------+----+-------------+-----+
    |94304|null|  Alice|null|    Palo Alto|   CA|
    |94304|  30|Brayden|null|    Palo Alto|   CA|
    |94104|null|Etienne|null|San Francisco|   CA|
    +-----+----+-------+----+-------------+-----+
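With no join type argument, `join` performs an inner join. Only people whose `pcode` also appears in pcodes.json survive, so Carla and Diana (null `pcode`) drop out, and so do New York and Santa Fe on the right. As a rough illustration of the semantics (not of how Spark actually executes the join), here is a plain-Python hash join over the name and pcode values copied from the two files. `inner_join` is a hypothetical helper.

```python
# (name, pcode) from people.json; None where the record has no "pcode" key.
people = [
    ("Alice", "94304"),
    ("Brayden", "94304"),
    ("Carla", None),
    ("Diana", None),
    ("Etienne", "94104"),
]
# (pcode, city, state) from pcodes.json.
pcodes = [
    ("10036", "New York", "NY"),
    ("87501", "Santa Fe", "NM"),
    ("94304", "Palo Alto", "CA"),
    ("94104", "San Francisco", "CA"),
]


def inner_join(left, right):
    """Return (pcode, name, city, state) for every left/right pair with equal pcode."""
    # Build phase: index the right side by its join key.
    index = {}
    for pcode, city, state in right:
        index.setdefault(pcode, []).append((city, state))
    # Probe phase: look up each left key; a None key never matches (like SQL NULL).
    result = []
    for name, pcode in left:
        if pcode is None:
            continue
        for city, state in index.get(pcode, []):
            result.append((pcode, name, city, state))
    return result


joined = inner_join(people, pcodes)
for row in joined:
    print(row)
```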

    mydf001 = peopleDF.join(pcodesDF, "pcode", "leftsemi")  # left rows that have a match; left columns only
    mydf001.limit(5).show()

    +-----+----+-------+----+
    |pcode| age|   name|pcoe|
    +-----+----+-------+----+
    |94304|null|  Alice|null|
    |94304|  30|Brayden|null|
    |94104|null|Etienne|null|
    +-----+----+-------+----+
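`leftsemi` keeps the same three people as the inner join but returns only the columns of `peopleDF`. It behaves as an existence test, roughly SQL's `WHERE EXISTS`, so a left row is emitted at most once even if several right rows share its key. A plain-Python sketch of that filter, with the data reduced to name and pcode and a hypothetical `left_semi_join` helper:

```python
# (name, pcode) from people.json; None where the record has no "pcode" key.
people = [
    ("Alice", "94304"),
    ("Brayden", "94304"),
    ("Carla", None),
    ("Diana", None),
    ("Etienne", "94104"),
]
# Only the keys of pcodes.json matter for a semi join.
pcode_keys = ["10036", "87501", "94304", "94104"]


def left_semi_join(left, right_keys):
    """Keep (pcode, name) for each left row whose pcode occurs in right_keys."""
    keys = set(right_keys)
    # None is never in keys, so rows without a pcode are filtered out.
    return [(pcode, name) for name, pcode in left if pcode in keys]


semi = left_semi_join(people, pcode_keys)
print(semi)
# Duplicating a key on the right does not duplicate any left row.
print(left_semi_join(people, pcode_keys + ["94304"]))
```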

    mydf002 = peopleDF.join(pcodesDF, "pcode", "left_outer")  # every left row; unmatched right columns are null
    mydf002.limit(5).show()

    +-----+----+-------+-----+-------------+-----+
    |pcode| age|   name| pcoe|         city|state|
    +-----+----+-------+-----+-------------+-----+
    |94304|null|  Alice| null|    Palo Alto|   CA|
    |94304|  30|Brayden| null|    Palo Alto|   CA|
    | null|  19|  Carla|10036|         null| null|
    | null|  46|  Diana| null|         null| null|
    |94104|null|Etienne| null|San Francisco|   CA|
    +-----+----+-------+-----+-------------+-----+
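`left_outer` keeps all five people; where no postal code matches, the `city` and `state` columns from `pcodesDF` are null. Carla stays unmatched even though `10036` is New York, because her value sits in `pcoe`, not in the join column. A plain-Python sketch of the same rule, with a hypothetical `left_outer_join` helper and the data reduced to the relevant fields:

```python
# (name, pcode) from people.json; None where the record has no "pcode" key.
people = [
    ("Alice", "94304"),
    ("Brayden", "94304"),
    ("Carla", None),
    ("Diana", None),
    ("Etienne", "94104"),
]
# (pcode, city, state) from pcodes.json.
pcodes = [
    ("10036", "New York", "NY"),
    ("87501", "Santa Fe", "NM"),
    ("94304", "Palo Alto", "CA"),
    ("94104", "San Francisco", "CA"),
]


def left_outer_join(left, right):
    """Emit each left row once per match, or once with (None, None) if unmatched."""
    index = {}
    for pcode, city, state in right:
        index.setdefault(pcode, []).append((city, state))
    result = []
    for name, pcode in left:
        # index.get(None) returns None, so a missing key falls through to the padding.
        for city, state in index.get(pcode) or [(None, None)]:
            result.append((pcode, name, city, state))
    return result


outer = left_outer_join(people, pcodes)
for row in outer:
    print(row)
```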

    mydf003 = peopleDF.join(pcodesDF, "pcode", "right_outer")  # every right row; unmatched left columns are null
    mydf003.limit(5).show()

    +-----+----+-------+----+-------------+-----+
    |pcode| age|   name|pcoe|         city|state|
    +-----+----+-------+----+-------------+-----+
    |10036|null|   null|null|     New York|   NY|
    |87501|null|   null|null|     Santa Fe|   NM|
    |94304|null|  Alice|null|    Palo Alto|   CA|
    |94304|  30|Brayden|null|    Palo Alto|   CA|
    |94104|null|Etienne|null|San Francisco|   CA|
    +-----+----+-------+----+-------------+-----+
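`right_outer` is the mirror image: every row of `pcodesDF` is kept, and New York and Santa Fe, which no person matches on `pcode`, come back with null `age`, `name` and `pcoe`. Their `pcode` is still filled in, so in this output the join column carries the right-hand value. A plain-Python sketch of the rule, again with a hypothetical helper (`right_outer_join`) and only the fields needed:

```python
# (name, pcode) from people.json; None where the record has no "pcode" key.
people = [
    ("Alice", "94304"),
    ("Brayden", "94304"),
    ("Carla", None),
    ("Diana", None),
    ("Etienne", "94104"),
]
# (pcode, city, state) from pcodes.json.
pcodes = [
    ("10036", "New York", "NY"),
    ("87501", "Santa Fe", "NM"),
    ("94304", "Palo Alto", "CA"),
    ("94104", "San Francisco", "CA"),
]


def right_outer_join(left, right):
    """Emit each right row once per matching left row, or once with name=None."""
    index = {}
    for name, pcode in left:
        if pcode is not None:
            index.setdefault(pcode, []).append(name)
    result = []
    for pcode, city, state in right:
        # The key is taken from the right row, so it is never None here.
        for name in index.get(pcode) or [None]:
            result.append((pcode, name, city, state))
    return result


outer = right_outer_join(people, pcodes)
for row in outer:
    print(row)
```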
  • Original post: https://www.cnblogs.com/gaojian/p/7633001.html