  • PySpark data types and conversions

    Which data types does Spark have? See https://spark.apache.org/docs/latest/sql-reference.html

    Spark data types

    Data Types

    Spark SQL and DataFrames support the following data types:

    • Numeric types
      • ByteType: Represents 1-byte signed integer numbers. The range of numbers is from -128 to 127.
      • ShortType: Represents 2-byte signed integer numbers. The range of numbers is from -32768 to 32767.
      • IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
      • LongType: Represents 8-byte signed integer numbers. The range of numbers is from -9223372036854775808 to 9223372036854775807.
      • FloatType: Represents 4-byte single-precision floating point numbers.
      • DoubleType: Represents 8-byte double-precision floating point numbers.
      • DecimalType: Represents arbitrary-precision signed decimal numbers. Backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale.
    • String type
      • StringType: Represents character string values.
    • Binary type
      • BinaryType: Represents byte sequence values.
    • Boolean type
      • BooleanType: Represents boolean values.
    • Datetime type
      • TimestampType: Represents values comprising values of fields year, month, day, hour, minute, and second.
      • DateType: Represents values comprising values of fields year, month, day.
    • Complex types
      • ArrayType(elementType, containsNull): Represents values comprising a sequence of elements with the type of elementType. containsNull is used to indicate if elements in an ArrayType value can have null values.
      • MapType(keyType, valueType, valueContainsNull): Represents values comprising a set of key-value pairs. The data type of keys is described by keyType and the data type of values is described by valueType. For a MapType value, keys are not allowed to have null values. valueContainsNull is used to indicate if values of a MapType value can have null values.
      • StructType(fields): Represents values with the structure described by a sequence of StructFields (fields).
        • StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate if values of this field can have null values.

    The corresponding PySpark data types live in pyspark.sql.types.

    Some common conversion scenarios:

    1. date_format converts a date/timestamp/string to a string; the format of the resulting string is given by the second argument.

    from pyspark.sql import functions as F
    df.withColumn('test', F.date_format(F.col('Last_Update'), "yyyy/MM/dd")).show()

    2. Once the value is a string, you can cast it to whatever type you need, for example the date type below.

    df = df.withColumn('date', F.date_format(F.col('Last_Update'), "yyyy-MM-dd").cast("date"))

    3. Convert a timestamp in seconds (counted from 1970) to a date-formatted string

    4. unix_timestamp converts a date string to a timestamp in seconds; it is the inverse of the operation above.

      

       unix_timestamp ignores milliseconds; if you really need them, you can use the approach below.

    df1 = df.withColumn("unix_timestamp",F.unix_timestamp(df.TIME,'dd-MMM-yyyy HH:mm:ss.SSS z') + F.substring(df.TIME,-7,3).cast('float')/1000)

    5. To convert a timestamp in seconds to the timestamp type, you can use F.to_timestamp

      

    6. Extract the time, the date, and similar fields from a timestamp or a date string

      

     

    Ref:

    https://stackoverflow.com/questions/54337991/pyspark-from-unixtime-unix-timestamp-does-not-convert-to-timestamp

    Please credit the source when reposting: http://www.cnblogs.com/mashuai-191/
  • Original article: https://www.cnblogs.com/mashuai-191/p/12580628.html