  • Reading a CSV file with Spark (Scala)

    Save the following content as small_zipcode.csv:

    id,zipcode,type,city,state,population
    1,704,STANDARD,,PR,30100
    2,704,,PASEO COSTA DEL SUR,PR,
    3,709,,BDA SAN LUIS,PR,3700
    4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000
    5,76177,STANDARD,,TX,
    ,,,,,
    7,76179,STANDARD,,TX,

    Open the spark-shell interactive shell and load the file into a DataFrame:

    val filePath="small_zipcode.csv"
    val df=spark.read.options(
      Map("inferSchema"->"true","delimiter"->",","header"->"true")).csv(filePath)
    
    scala> df.show
    +----+-------+--------+-------------------+-----+----------+
    |  id|zipcode|    type|               city|state|population|
    +----+-------+--------+-------------------+-----+----------+
    |   1|    704|STANDARD|               null|   PR|     30100|
    |   2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
    |   3|    709|    null|       BDA SAN LUIS|   PR|      3700|
    |   4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
    |   5|  76177|STANDARD|               null|   TX|      null|
    |null|   null|    null|               null| null|      null|
    |   7|  76179|STANDARD|               null|   TX|      null|
    +----+-------+--------+-------------------+-----+----------+
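
    The read above relies on inferSchema, which makes Spark scan the data an extra time to guess the column types. As a minimal sketch, the same file could instead be read with an explicit schema. The names sample, tmpPath, zipSchema and typedDf are made up for this illustration, the column types are assumed from the output above, and the snippet assumes a spark-shell session where spark is already defined. It writes its own temporary copy of the sample data so that it does not depend on the working directory.

    ```scala
    import java.nio.charset.StandardCharsets
    import java.nio.file.Files
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Write the sample rows to a temporary file so the sketch runs on its own.
    val sample =
      """id,zipcode,type,city,state,population
        |1,704,STANDARD,,PR,30100
        |2,704,,PASEO COSTA DEL SUR,PR,
        |3,709,,BDA SAN LUIS,PR,3700
        |4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000
        |5,76177,STANDARD,,TX,
        |,,,,,
        |7,76179,STANDARD,,TX,
        |""".stripMargin
    val tmpPath = Files.createTempFile("small_zipcode", ".csv")
    Files.write(tmpPath, sample.getBytes(StandardCharsets.UTF_8))

    // Column types are an assumption based on what inferSchema produced above.
    val zipSchema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("zipcode", IntegerType, nullable = true),
      StructField("type", StringType, nullable = true),
      StructField("city", StringType, nullable = true),
      StructField("state", StringType, nullable = true),
      StructField("population", IntegerType, nullable = true)))

    // header=true skips the first line; the schema replaces inferSchema.
    // toUri yields a file:// URI, so the path is read from the local file system.
    val typedDf = spark.read.options(
      Map("header" -> "true")).schema(zipSchema).csv(tmpPath.toUri.toString)

    typedDf.printSchema()
    ```

    With a fixed schema the column types no longer depend on the contents of a particular file, which may matter when later files contain only empty values in a column.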
    
    Drop only the rows in which every column is null:

    scala> df.na.drop("all").show()
    +---+-------+--------+-------------------+-----+----------+
    | id|zipcode|    type|               city|state|population|
    +---+-------+--------+-------------------+-----+----------+
    |  1|    704|STANDARD|               null|   PR|     30100|
    |  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
    |  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
    |  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
    |  5|  76177|STANDARD|               null|   TX|      null|
    |  7|  76179|STANDARD|               null|   TX|      null|
    +---+-------+--------+-------------------+-----+----------+
    
    
    Drop every row that contains at least one null value:

    scala> df.na.drop().show()
    +---+-------+------+-----------------+-----+----------+
    | id|zipcode|  type|             city|state|population|
    +---+-------+------+-----------------+-----+----------+
    |  4|  76166|UNIQUE|CINGULAR WIRELESS|   TX|     84000|
    +---+-------+------+-----------------+-----+----------+
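
    The two calls above differ only in the "how" argument: "all" removes a row when every column is null, while the default "any" removes it when at least one column is null. The sketch below rebuilds the same seven rows in memory, so it does not need the CSV file, and adds a variant that checks only the population column. The names rows, zipDf, allNullDropped, anyNullDropped and hasPopulation are made up for this illustration, and the snippet assumes a spark-shell session where spark is already defined.

    ```scala
    import spark.implicits._

    // Same rows as small_zipcode.csv; None stands for an empty CSV field.
    val rows = Seq[(Option[Int], Option[Int], Option[String],
                    Option[String], Option[String], Option[Int])](
      (Some(1), Some(704), Some("STANDARD"), None, Some("PR"), Some(30100)),
      (Some(2), Some(704), None, Some("PASEO COSTA DEL SUR"), Some("PR"), None),
      (Some(3), Some(709), None, Some("BDA SAN LUIS"), Some("PR"), Some(3700)),
      (Some(4), Some(76166), Some("UNIQUE"), Some("CINGULAR WIRELESS"), Some("TX"), Some(84000)),
      (Some(5), Some(76177), Some("STANDARD"), None, Some("TX"), None),
      (None, None, None, None, None, None),
      (Some(7), Some(76179), Some("STANDARD"), None, Some("TX"), None))
    val zipDf = rows.toDF("id", "zipcode", "type", "city", "state", "population")

    // "all": drop a row only when every column is null (removes the empty row).
    val allNullDropped = zipDf.na.drop("all")

    // "any": drop a row when at least one column is null; same as na.drop().
    val anyNullDropped = zipDf.na.drop("any")

    // Restrict the null check to selected columns: keep rows with a population.
    val hasPopulation = zipDf.na.drop(Seq("population"))

    println(allNullDropped.count()) // 6
    println(anyNullDropped.count()) // 1
    println(hasPopulation.count())  // 3 (ids 1, 3 and 4)
    ```
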
    Reference:
    Many more Spark usage examples: https://sparkbyexamples.com/spark/spark-dataframe-drop-rows-with-null-values/
  • Original post: https://www.cnblogs.com/v5captain/p/14248659.html