  • Joining two tables with Python



    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType
    import os


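    def getUser(spark, path):
        # Explicit schema for the space-separated user file
        struct1 = StructType([
            StructField("user", StringType(), True),
            StructField("vedios", StringType(), True),
            StructField("id", IntegerType(), True),
        ])
        # The first line of the file is treated as a header row
        df = spark.read.csv(path, schema=struct1, sep=" ", header=True)
        # Register a temp view so the data can also be queried with SQL
        df.createOrReplaceTempView("users1")
        df = spark.sql("select * from users1")
        return df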


    def getMovies(spark, path):
        # No schema is given, so every column is read as a string
        df = spark.read.csv(path, header=True)
        df.createOrReplaceTempView("movies")
        df = spark.sql("select * from movies")
        return df


    if __name__ == '__main__':
        # Point Spark at the local JDK (raw string keeps the backslashes intact)
        os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk1.8.0_211'
        print(os.path)
        # Parentheses allow the builder chain to span several lines
        spark = (
            SparkSession
            .builder
            .appName("Python Spark SQL basic example")
            .config("spark.some.config.option", "some-value")
            .getOrCreate()
        )
        path_user = "C:/Users/Administrator/Desktop/guiliVideo/user/2008/0903/user.txt"
        path_movies = "C:/Users/Administrator/Desktop/vedios.txt"
        df1 = getUser(spark, path_user)
        df2 = getMovies(spark, path_movies)
        # Inner join: keep only rows whose user matches an uploader
        df3 = df1.join(df2, df1.user == df2.uploader, how='inner')
        df3.createOrReplaceTempView('table1')
        df4 = spark.sql('select * from table1 limit 10')
        df4.show()
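
The inner join on user == uploader can be illustrated without Spark. The following is a minimal plain-Python sketch of the join semantics only, not of how Spark executes it. The sample rows, the `title` column, and the `inner_join` helper are all hypothetical, since the post does not show the contents of either file.

```python
# Hypothetical sample rows; the real files' contents are not shown in the post.
users = [
    {"user": "alice", "vedios": "12", "id": 1},
    {"user": "bob", "vedios": "3", "id": 2},
    {"user": "carol", "vedios": "7", "id": 3},
]
movies = [
    {"uploader": "alice", "title": "clip-a"},  # "title" is an assumed column
    {"uploader": "alice", "title": "clip-b"},
    {"uploader": "dave", "title": "clip-c"},
]


def inner_join(left, right, left_key, right_key):
    """Return one merged row per (left, right) pair whose keys are equal."""
    # Index the right side by key so each left row is matched with one lookup.
    index = {}
    for row in right:
        key = row[right_key]
        if key is not None:  # as in SQL, a missing key never matches anything
            index.setdefault(key, []).append(row)

    joined = []
    for lrow in left:
        key = lrow[left_key]
        if key is None:
            continue
        for rrow in index.get(key, []):
            # Merge the columns of both sides into a single output row.
            joined.append({**lrow, **rrow})
    return joined


rows = inner_join(users, movies, "user", "uploader")
for row in rows:
    print(row)
```

With this sample data, only the user that also appears as an uploader survives, once per matching movie. Users without uploads and uploads without a matching user are dropped, which is what how='inner' asks for in the Spark version.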
     
  • Original article: https://www.cnblogs.com/ly570/p/11357427.html