Construct a DataFrame
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))

// Create a schema for the dataframe
val schema = StructType(
  StructField("Category", StringType, true) ::
  StructField("Count", IntegerType, true) ::
  StructField("Description", StringType, true) :: Nil)

// Convert each inner list to a Row
val rows = data.map(t => Row(t(0), t(1), t(2))).toList

// Create an RDD of Rows
val rdd = spark.sparkContext.parallelize(rows)

// Create the data frame from the RDD and the schema
val df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
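For reference, the same DataFrame can be built more concisely from a Seq of tuples with toDF, letting Spark infer the schema. A minimal sketch, assuming a SparkSession named spark is in scope (as it is in spark-shell):

import spark.implicits._

// Column names go to toDF; types (String, Int, String) are inferred
val dfAlt = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

dfAlt.show()

One difference from the explicit schema above: fields inferred from primitive tuple elements (here Count) come out non-nullable.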
Rename a single column:
val df2 = df.withColumnRenamed("Category", "category_new")
df2.show()
scala> df2.show()
+------------+-----+------------------+
|category_new|Count|       Description|
+------------+-----+------------------+
|  Category A|  100|This is category A|
|  Category B|  120|This is category B|
|  Category C|  150|This is category C|
+------------+-----+------------------+
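Since withColumnRenamed returns a new DataFrame, renaming several (but not all) columns is just a matter of chaining calls. A short sketch; note that a call whose old name matches no column is silently a no-op:

val df2b = df
  .withColumnRenamed("Category", "category_new")
  .withColumnRenamed("Count", "count_new")
df2b.show()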
Rename all columns (lowercase each column name and append _new):
// Rename all columns
val new_column_names = df.columns.map(c => c.toLowerCase() + "_new")
val df3 = df.toDF(new_column_names: _*)
df3.show()
scala> df3.show()
+------------+---------+------------------+
|category_new|count_new|   description_new|
+------------+---------+------------------+
|  Category A|      100|This is category A|
|  Category B|      120|This is category B|
|  Category C|      150|This is category C|
+------------+---------+------------------+
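An equivalent approach, shown here only for comparison, is to select every column under an alias. Unlike toDF, which assigns names purely by position, this maps old names to new ones explicitly:

import org.apache.spark.sql.functions.col

// Build one aliased Column per existing column, then select them all
val df3b = df.select(
  df.columns.map(c => col(c).alias(c.toLowerCase() + "_new")): _*)
df3b.show()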
Dots or other characters in the column names can be replaced as well. Since replaceAll takes a regular expression, a literal dot must be escaped:
c.toLowerCase().replaceAll("\\.", "_") + "_new"
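Putting this together as an end-to-end sketch (the dotted column names below are made up purely for illustration):

import spark.implicits._

// Hypothetical columns containing dots, for illustration only
val dfDots = Seq((1, 2)).toDF("price.usd", "qty.total")

val cleaned = dfDots.toDF(
  dfDots.columns.map(c => c.toLowerCase().replaceAll("\\.", "_") + "_new"): _*)

cleaned.show()
// Expected columns: price_usd_new, qty_total_new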
Reposted from: https://kontext.tech/column/spark/527/scala-change-data-frame-column-names-in-spark