  • Reading a CSV file with Spark (Scala)

    Save the following content as small_zipcode.csv:

    id,zipcode,type,city,state,population
    1,704,STANDARD,,PR,30100
    2,704,,PASEO COSTA DEL SUR,PR,
    3,709,,BDA SAN LUIS,PR,3700
    4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000
    5,76177,STANDARD,,TX,
    ,,,,,
    7,76179,STANDARD,,TX,

    Open the interactive spark-shell and read the file:

    val filePath="small_zipcode.csv"
    val df=spark.read.options(
      Map("inferSchema"->"true","delimiter"->",","header"->"true")).csv(filePath)
    
    scala> df.show
    +----+-------+--------+-------------------+-----+----------+
    |  id|zipcode|    type|               city|state|population|
    +----+-------+--------+-------------------+-----+----------+
    |   1|    704|STANDARD|               null|   PR|     30100|
    |   2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
    |   3|    709|    null|       BDA SAN LUIS|   PR|      3700|
    |   4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
    |   5|  76177|STANDARD|               null|   TX|      null|
    |null|   null|    null|               null| null|      null|
    |   7|  76179|STANDARD|               null|   TX|      null|
    +----+-------+--------+-------------------+-----+----------+
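The read above relies on inferSchema, which makes Spark take an extra pass over the data to guess the column types. As a minimal sketch, assuming the same small_zipcode.csv file and a spark-shell session where `spark` is predefined, the schema can instead be declared up front. The names `zipcodeSchema` and `typedDf` and the chosen column types are assumptions for illustration and do not come from the original post.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Column names mirror the header of small_zipcode.csv.
// The types are an assumption about what suits this data.
val zipcodeSchema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("zipcode", IntegerType, nullable = true),
  StructField("type", StringType, nullable = true),
  StructField("city", StringType, nullable = true),
  StructField("state", StringType, nullable = true),
  StructField("population", IntegerType, nullable = true)
))

// header=true skips the first line; no inference pass is needed
// because the schema is supplied explicitly.
val typedDf = spark.read
  .option("header", "true")
  .schema(zipcodeSchema)
  .csv("small_zipcode.csv")

typedDf.printSchema()
```

Declaring the schema also keeps the column types stable if a later version of the file contains values that would change what inference picks.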
    
    Drop only the rows in which every column is null:

    scala> df.na.drop("all").show()
    +---+-------+--------+-------------------+-----+----------+
    | id|zipcode|    type|               city|state|population|
    +---+-------+--------+-------------------+-----+----------+
    |  1|    704|STANDARD|               null|   PR|     30100|
    |  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
    |  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
    |  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
    |  5|  76177|STANDARD|               null|   TX|      null|
    |  7|  76179|STANDARD|               null|   TX|      null|
    +---+-------+--------+-------------------+-----+----------+
    
    
    Drop every row that contains at least one null (the default mode, "any"):

    scala> df.na.drop().show()
    +---+-------+------+-----------------+-----+----------+
    | id|zipcode|  type|             city|state|population|
    +---+-------+------+-----------------+-----+----------+
    |  4|  76166|UNIQUE|CINGULAR WIRELESS|   TX|     84000|
    +---+-------+------+-----------------+-----+----------+
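Between the two extremes shown above, `na.drop` can also be limited to selected columns or to a minimum number of non-null values. The following is a sketch only, assuming a spark-shell session where `spark` is predefined. The names `csvLines`, `zipDf`, `withPopulation`, `typeOrCity`, and `mostlyFilled` are hypothetical, and the data is an in-memory copy of small_zipcode.csv so the snippet does not depend on the file.

```scala
// Assumes spark-shell, where `spark` already exists.
import spark.implicits._

// In-memory copy of small_zipcode.csv (header line first).
val csvLines = Seq(
  "id,zipcode,type,city,state,population",
  "1,704,STANDARD,,PR,30100",
  "2,704,,PASEO COSTA DEL SUR,PR,",
  "3,709,,BDA SAN LUIS,PR,3700",
  "4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000",
  "5,76177,STANDARD,,TX,",
  ",,,,,",
  "7,76179,STANDARD,,TX,"
).toDS()

// Same options as the file-based read; csv() also accepts a Dataset[String].
val zipDf = spark.read
  .options(Map("inferSchema" -> "true", "header" -> "true"))
  .csv(csvLines)

// Check only the population column: rows with a null population are dropped.
val withPopulation = zipDf.na.drop(Seq("population"))

// Drop a row only when type and city are both null.
val typeOrCity = zipDf.na.drop("all", Seq("type", "city"))

// Keep only rows that have at least 5 non-null values.
val mostlyFilled = zipDf.na.drop(5)

withPopulation.show()
typeOrCity.show()
mostlyFilled.show()
```

Restricting the check to the columns that matter for a given analysis usually keeps far more rows than the bare `na.drop()`, which left a single row in the example above.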
    Reference:
    Many more Spark usage examples: https://sparkbyexamples.com/spark/spark-dataframe-drop-rows-with-null-values/
  • Original article: https://www.cnblogs.com/v5captain/p/14248659.html