zoukankan      html  css  js  c++  java
  • Spark 将DataFrame所有的列类型改为double

    Spark 将DataFrame所有的列类型改为double

    1.单列转化方法

    import org.apache.spark.sql.types._
    val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
    val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
    
    import org.apache.spark.sql.functions._
    df.select(col("col1").cast(DoubleType)).show()
    

    2.循环转变

    val colNames = df.columns
    
    var df1 = df
    for (colName <- colNames) {
      df1 = df1.withColumn(colName, col(colName).cast(DoubleType))
    }
    df1.show()
    

    3.通过:_*

    val cols = colNames.map(f => col(f).cast(DoubleType))
    df.select(cols: _*).show()
    
    +----+----+----+----+----+
    |col1|col2|col3|col4|col5|
    +----+----+----+----+----+
    | 1.0| 2.0| 3.0| 4.0| 5.0|
    | 6.0| 7.0| 8.0| 9.0|10.0|
    +----+----+----+----+----+
    
    

    查询指定多列和转变指定列的类型了:

    val name = "col1,col3,col5"
    df.select(name.split(",").map(name => col(name)): _*).show()
    df.select(name.split(",").map(name => col(name).cast(DoubleType)): _*).show()
    
    +----+----+----+
    |col1|col3|col5|
    +----+----+----+
    |   1|   3|   5|
    |   6|   8|  10|
    +----+----+----+
    
    +----+----+----+
    |col1|col3|col5|
    +----+----+----+
    | 1.0| 3.0| 5.0|
    | 6.0| 8.0|10.0|
    +----+----+----+
    
    

    上部分完整代码:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.DataFrame
    
    object ChangeAllColDatatypes {
    
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ChangeAllColDatatypes").master("local").getOrCreate()
        import org.apache.spark.sql.types._
        val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
        val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
    
        import org.apache.spark.sql.functions._
        df.select(col("col1").cast(DoubleType)).show()
    
        val colNames = df.columns
    
        var df1 = df
        for (colName <- colNames) {
          df1 = df1.withColumn(colName, col(colName).cast(DoubleType))
        }
        df1.show()
    
        val cols = colNames.map(f => col(f).cast(DoubleType))
        df.select(cols: _*).show()
        val name = "col1,col3,col5"
        df.select(name.split(",").map(name => col(name)): _*).show()
        df.select(name.split(",").map(name => col(name).cast(DoubleType)): _*).show()
    
      }
    

    上部分原文地址:董可伦

  • 相关阅读:
    array_merge
    漏斗模型
    3 破解密码,xshell连接
    2-安装linux7
    1-Linux运维人员要求
    17-[模块]-time&datetime
    16-[模块]-导入方式
    Nginx服务器
    15-作业:员工信息表查询
    14-本章总结
  • 原文地址:https://www.cnblogs.com/aixing/p/13327350.html
Copyright © 2011-2022 走看看