  • Spark Scala: replacing strings in a DataFrame with na.replace

    Create a sample DataFrame

    scala> val df = sc.parallelize(Seq(
         |   (0,"cat26","cat26"),
         |   (1,"cat67","cat26"),
         |   (2,"cat56","cat26"),
         |   (3,"cat8","cat26"))).toDF("Hour", "Category", "Value")
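    The snippet above relies on sc and the implicit conversions that spark-shell sets up automatically. Outside the shell, a minimal sketch could look like the following. The object name SampleData, the app name, and the local[*] master are assumptions made for illustration, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the name is illustrative only.
object SampleData {
  // Local session for experimentation; "local[*]" is an assumption.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("na-replace-demo")
    .getOrCreate()

  // Builds the same four-row DataFrame as the shell example.
  def sampleDf(): DataFrame = {
    import spark.implicits._ // provides toDF on a local Seq
    Seq(
      (0, "cat26", "cat26"),
      (1, "cat67", "cat26"),
      (2, "cat56", "cat26"),
      (3, "cat8", "cat26")
    ).toDF("Hour", "Category", "Value")
  }

  def main(args: Array[String]): Unit = {
    sampleDf().show()
    spark.stop()
  }
}
```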

    Method 1:

    scala> df.na.replace("*", Map[Any, Any](
         |      "cat26" -> "cat23"
         |    )).show()
    +----+--------+-----+
    |Hour|Category|Value|
    +----+--------+-----+
    |   0|   cat23|cat23|
    |   1|   cat67|cat23|
    |   2|   cat56|cat23|
    |   3|    cat8|cat23|
    +----+--------+-----+
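    With "*" as the column argument, both string columns (Category and Value) are rewritten in the output above. To limit the replacement to selected columns, na.replace also accepts a Seq of column names. The sketch below assumes a local session; the object and method names are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; names are illustrative only.
object ReplaceSelectedColumns {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]") // assumption: local experimentation
    .appName("na-replace-selected-columns")
    .getOrCreate()

  def replaceInCategory(): DataFrame = {
    import spark.implicits._
    val df = Seq(
      (0, "cat26", "cat26"),
      (1, "cat67", "cat26"),
      (2, "cat56", "cat26"),
      (3, "cat8", "cat26")
    ).toDF("Hour", "Category", "Value")

    // Only the listed column is considered, so Value keeps "cat26".
    df.na.replace(Seq("Category"), Map("cat26" -> "cat23"))
  }

  def main(args: Array[String]): Unit = {
    replaceInCategory().show()
    spark.stop()
  }
}
```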

    Example from the official Spark source: org/apache/spark/sql/DataFrameNaFunctionsSuite.scala
    Here, name is the column name.

    df.na.replace("name", Map(
            "Bob" -> "Bravo",
            "Alice" -> null
          ))
    
    df.na.replace("*", Map[Any, Any](
         false -> null
       ))
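    The test-suite excerpts above map a value to null. Applied to the sample data from this post, that could be sketched as follows. This assumes a Spark version whose na.replace accepts null as a replacement value, as the excerpts suggest; the object and method names are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; names are illustrative only.
object ReplaceWithNull {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]") // assumption: local experimentation
    .appName("na-replace-with-null")
    .getOrCreate()

  def nullOutCat8(): DataFrame = {
    import spark.implicits._
    val df = Seq(
      (0, "cat26", "cat26"),
      (1, "cat67", "cat26"),
      (2, "cat56", "cat26"),
      (3, "cat8", "cat26")
    ).toDF("Hour", "Category", "Value")

    // The explicit type parameters keep the map typed as String -> String
    // even though the only value is null.
    df.na.replace("Category", Map[String, String]("cat8" -> null))
  }

  def main(args: Array[String]): Unit = {
    nullOutCat8().show()
    spark.stop()
  }
}
```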

    Method 2:

    Replace 0 with 9 in the Hour column:

    import com.google.common.collect.ImmutableMap;

    scala> df.na.replace("hour", ImmutableMap.of(0, 9)).show()
    +----+--------+-----+
    |Hour|Category|Value|
    +----+--------+-----+
    |   9|   cat26|cat26|
    |   1|   cat67|cat26|
    |   2|   cat56|cat26|
    |   3|    cat8|cat26|
    +----+--------+-----+

    Replace "cat26" with "cat222" in all columns:

    scala> df.na.replace("*", ImmutableMap.of("cat26", "cat222")).show()
    +----+--------+------+
    |Hour|Category| Value|
    +----+--------+------+
    |   0|  cat222|cat222|
    |   1|   cat67|cat222|
    |   2|   cat56|cat222|
    |   3|    cat8|cat222|
    +----+--------+------+

    Example from the official Spark source:

    org/apache/spark/sql/DataFrameNaFunctions.scala
    * {{{
    *   import com.google.common.collect.ImmutableMap;
    *
    *   // Replaces all occurrences of 1.0 with 2.0 in column "height".
    *   df.na.replace("height", ImmutableMap.of(1.0, 2.0));
    *
    *   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
    *   df.na.replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));
    *
    *   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
    *   df.na.replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));
    * }}}
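
    The ImmutableMap examples above go through the java.util.Map overload of na.replace, which is convenient from Java. From Scala, a plain Scala Map should work for numeric replacements as well, without the Guava import. A minimal sketch, assuming a local session and illustrative object and method names:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; names are illustrative only.
object ReplaceWithScalaMap {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]") // assumption: local experimentation
    .appName("na-replace-scala-map")
    .getOrCreate()

  def replaceHour(): DataFrame = {
    import spark.implicits._
    val df = Seq(
      (0, "cat26", "cat26"),
      (1, "cat67", "cat26"),
      (2, "cat56", "cat26"),
      (3, "cat8", "cat26")
    ).toDF("Hour", "Category", "Value")

    // Plain Scala Map instead of Guava's ImmutableMap: 0 becomes 9 in Hour.
    df.na.replace("Hour", Map(0 -> 9))
  }

  def main(args: Array[String]): Unit = {
    replaceHour().show()
    spark.stop()
  }
}
```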

  • Original article: https://www.cnblogs.com/v5captain/p/14846377.html