  • Differences between Spark TempView and GlobalTempView

    TempView and GlobalTempView are both used frequently with Spark DataFrames. How do they differ, and which scenarios does each one suit?

    The example below compares the two.

    from pyspark.sql import SparkSession
    import numpy as np
    import pandas as pd
    
    spark = SparkSession.builder.getOrCreate()
    
    d = np.random.randint(1,100, 5*5).reshape(5,-1)
    data = pd.DataFrame(d, columns=list('abcde'))
    df = spark.createDataFrame(data)
    df.show()
    
    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 17| 30| 61| 61| 33|
    | 32| 23| 24|  7|  7|
    | 47|  6|  4| 95| 34|
    | 50| 69| 83| 21| 46|
    | 52| 12| 83| 49| 85|
    +---+---+---+---+---+
    

    Querying data from a temp view

    # createTempView returns None, so there is nothing useful to assign
    df.createTempView('temp')
    temp_sql = "select * from temp where a=50"
    res = spark.sql(temp_sql)
    res.show()
    
    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 50| 69| 83| 21| 46|
    +---+---+---+---+---+
    

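    Querying data from a global temp view

    # createGlobalTempView also returns None; query the view through the global_temp database
    df.createGlobalTempView('glob')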
    glob_sql = "select * from global_temp.glob where a = 17"
    res2 = spark.sql(glob_sql)
    res2.show()
    
    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 17| 30| 61| 61| 33|
    +---+---+---+---+---+
    

    Global temp view data can be shared across multiple SparkSessions

    # Create a new SparkSession
    spark2 = spark.newSession()
    spark2 == spark
    
    False
    
    # The new SparkSession can read data from the global temp view
    new_sql = "select * from global_temp.glob where a = 47"
    temp = spark2.sql(new_sql)
    temp.show()
    
    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 47|  6|  4| 95| 34|
    +---+---+---+---+---+
    
    # The new SparkSession cannot read data from the temp view
    # The query fails because table temp cannot be found
    
    new_sql2 = "select * from temp where a = 47"
    temp = spark2.sql(new_sql2)
    temp.show()
    
    # Adding the global_temp prefix does not help either
    new_sql2 = "select * from global_temp.temp where a = 47"
    temp = spark2.sql(new_sql2)
    temp.show()
    
    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    # (several lines of the exception trace omitted here)
    AnalysisException: "Table or view not found: `global_temp`.`temp`; line 1 pos 14;
    'Project [*]
    +- 'Filter ('a = 47)
       +- 'UnresolvedRelation `global_temp`.`temp`
    "
    

    A view can no longer be used after it is dropped

    spark.catalog.dropTempView('temp')
    spark.catalog.dropGlobalTempView('glob')
    
    # Raises an error: table temp not found
    temp_sql2 = "select * from temp where a = 47"
    temp = spark.sql(temp_sql2)
    
    # Raises an error: global_temp.glob not found, in both spark and spark2
    glob_sql2 = "select * from global_temp.glob where a = 47"
    temp = spark.sql(glob_sql2)
    temp = spark2.sql(glob_sql2)
    

    Summary

    A Spark DataFrame has four methods for creating temp views:

    • df.createGlobalTempView
    • df.createOrReplaceGlobalTempView
    • df.createOrReplaceTempView
    • df.createTempView

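    The createOrReplace variants create the view if it does not exist and replace it if it does. The plain create variants raise an error when a view with that name already exists.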


    A view can no longer be used after it is dropped

    The two drop methods:
    spark.catalog.dropTempView('temp')
    spark.catalog.dropGlobalTempView('glob')


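    Similarities and differences between TempView and GlobalTempView

    1. A temp view can only be used within the SparkSession that created it.
    2. A global temp view can be shared by multiple SparkSessions and is queried through the global_temp database.
    3. Neither of them can be used across Spark applications.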
  • Original article: https://www.cnblogs.com/StitchSun/p/13255607.html