  • PySpark: converting between RDD and DataFrame

    RDD to DataFrame

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("boye").getOrCreate()
    sc = spark.sparkContext

    # Read a tab-separated text file and keep only lines with exactly two fields
    textFile = sc.textFile("file:///usr/local/test/urls")
    rdd = textFile.map(lambda x: x.split("\t")).filter(lambda x: len(x) == 2)

    # Convert the RDD to a DataFrame, naming the two columns
    df = spark.createDataFrame(rdd, schema=["rowkey", "url"])

    # Save the data as JSON
    df.write.format("json").mode("overwrite").save("file:///usr/local/test/outPut")
    # Save as a tab-separated CSV file
    df.write.save(path='/usr/local/test/csv', format='csv', mode='overwrite', sep='\t')
    # Save permanently as a table named "ss"
    df.write.mode("overwrite").saveAsTable("ss")
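The parsing rule used in the map/filter step can be checked locally before running it on a cluster. The following is a minimal sketch, not part of the original post: `parse_line` and `urls_to_dataframe` are hypothetical helper names, and the sample lines are made up for illustration.

```python
def parse_line(line):
    """Split a tab-separated line; return [rowkey, url], or None if malformed."""
    fields = line.split("\t")
    return fields if len(fields) == 2 else None


def urls_to_dataframe(spark, path):
    """Hypothetical helper: same pipeline as above, using a named function.

    `spark` is an existing SparkSession; `path` is a text file of
    tab-separated "rowkey<TAB>url" lines.
    """
    rdd = (
        spark.sparkContext.textFile(path)
        .map(parse_line)
        .filter(lambda fields: fields is not None)
    )
    return spark.createDataFrame(rdd, schema=["rowkey", "url"])


# Local check of the parsing rule on made-up lines; no Spark needed.
sample = ["k1\thttp://a.example", "malformed line", "k2\thttp://b.example\textra"]
parsed = [p for p in map(parse_line, sample) if p is not None]
```

Only the first sample line survives, since the other two do not have exactly two fields. Using a named function instead of a lambda makes the filtering rule easy to test on its own.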
    

    DataFrame to RDD

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("boye").getOrCreate()
    sc = spark.sparkContext

    # Load a JSON file into a DataFrame
    df = spark.read.json("file:///usr/local/test/01.json")
    # Select two columns, take the first 10 rows, and get the underlying RDD of Row objects
    rdd = df.select("name", "age").limit(10).rdd
    # Format each Row as a tab-separated line
    rdd = rdd.map(lambda d: "{}\t{}".format(d.name, d.age))
    rdd.saveAsTextFile("file:///usr/local/test/rdd_json")
    # Alternatively, repartition to one partition so a single output file is written:
    # rdd.repartition(1).saveAsTextFile("file:///usr/local/test/rdd1")
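The row-formatting step can also be tried without Spark. This is a small sketch under stated assumptions: `format_row` is a hypothetical name, and a `namedtuple` stands in for `pyspark.sql.Row`, which likewise exposes fields as attributes. The sample values are invented.

```python
from collections import namedtuple


def format_row(row):
    """Format a row with `name` and `age` attributes as a tab-separated line."""
    return "{}\t{}".format(row.name, row.age)


# Stand-in for pyspark.sql.Row, used only for this local check (assumption).
Person = namedtuple("Person", ["name", "age"])
lines = [format_row(r) for r in [Person("Ann", 30), Person("Bob", None)]]
```

Note that a missing value is written as the literal text `None` by `str.format`, so null columns may need explicit handling before the text file is saved.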
  • Original article: https://www.cnblogs.com/boye169/p/13861789.html