  • Spark: Converting all DataFrame column types to double

    1. Casting a single column

    import org.apache.spark.sql.types._
    val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
    val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
    
    import org.apache.spark.sql.functions._
    df.select(col("col1").cast(DoubleType)).show()
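
    To check the result, printSchema can be used; the same cast can also be written with SQL expression syntax via selectExpr (a small sketch using the df defined above):

    df.select(col("col1").cast(DoubleType)).printSchema()
    // Equivalent cast written as a SQL expression string
    df.selectExpr("cast(col1 as double) as col1").show()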
    

    2. Casting in a loop

    val colNames = df.columns
    
    var df1 = df
    for (colName <- colNames) {
      df1 = df1.withColumn(colName, col(colName).cast(DoubleType))
    }
    df1.show()
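
    The var plus for loop above can also be expressed as a fold over the column names, which avoids the mutable variable (a minimal sketch, assuming the same df and colNames as above):

    // Fold over the column names, casting each one to DoubleType
    val df2 = colNames.foldLeft(df) { (acc, colName) =>
      acc.withColumn(colName, col(colName).cast(DoubleType))
    }
    df2.show()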
    

    3. Using : _* (varargs expansion)

    val cols = colNames.map(f => col(f).cast(DoubleType))
    df.select(cols: _*).show()
    
    +----+----+----+----+----+
    |col1|col2|col3|col4|col5|
    +----+----+----+----+----+
    | 1.0| 2.0| 3.0| 4.0| 5.0|
    | 6.0| 7.0| 8.0| 9.0|10.0|
    +----+----+----+----+----+
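
    Because select takes a varargs Column*, the : _* expansion also makes it easy to cast different columns to different types, for example driven by a map from column name to target type (a sketch; targetTypes is a hypothetical mapping, not from the original post):

    // Cast only the columns listed in targetTypes, keep the rest unchanged
    val targetTypes = Map("col1" -> DoubleType, "col2" -> LongType)
    val castedCols = df.columns.map { c =>
      targetTypes.get(c).map(t => col(c).cast(t)).getOrElse(col(c))
    }
    df.select(castedCols: _*).printSchema()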
    
    

    You can also select a specified set of columns and cast just those columns:

    val name = "col1,col3,col5"
    df.select(name.split(",").map(c => col(c)): _*).show()
    df.select(name.split(",").map(c => col(c).cast(DoubleType)): _*).show()
    
    +----+----+----+
    |col1|col3|col5|
    +----+----+----+
    |   1|   3|   5|
    |   6|   8|  10|
    +----+----+----+
    
    +----+----+----+
    |col1|col3|col5|
    +----+----+----+
    | 1.0| 3.0| 5.0|
    | 6.0| 8.0|10.0|
    +----+----+----+
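
    If the listed columns only need to be selected (without casting), the String-based overload of select works as well (a sketch reusing the name value above):

    // select(col: String, cols: String*) takes plain column names
    val parts = name.split(",")
    df.select(parts.head, parts.tail: _*).show()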
    
    

    Complete code for the examples above:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.DataFrame
    
    object ChangeAllColDatatypes {
    
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ChangeAllColDatatypes").master("local").getOrCreate()
        import org.apache.spark.sql.types._
        val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
        val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
    
        import org.apache.spark.sql.functions._
        df.select(col("col1").cast(DoubleType)).show()
    
        val colNames = df.columns
    
        var df1 = df
        for (colName <- colNames) {
          df1 = df1.withColumn(colName, col(colName).cast(DoubleType))
        }
        df1.show()
    
        val cols = colNames.map(f => col(f).cast(DoubleType))
        df.select(cols: _*).show()
        val name = "col1,col3,col5"
        df.select(name.split(",").map(c => col(c)): _*).show()
        df.select(name.split(",").map(c => col(c).cast(DoubleType)): _*).show()
    
        spark.stop()
      }
    }

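    To compile and run the complete program, a minimal sbt build along these lines should work (the Scala and Spark versions below are assumptions, not taken from the original post):

    // build.sbt (hypothetical versions)
    name := "ChangeAllColDatatypes"
    scalaVersion := "2.12.15"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.2"
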
    Source of the above: 董可伦

  • Original article: https://www.cnblogs.com/aixing/p/13327350.html