  • PySpark data processing and analysis

    Compared with pandas, the PySpark DataFrame API is similar to SQL, which makes it fairly easy to pick up.

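    Set up a Python 3 environment

    Miniconda3 is recommended.

    Download: https://mirrors.bfsu.edu.cn/anaconda/miniconda/ (choose the py37 installer)

    Conda mirror configuration: https://mirrors.bfsu.edu.cn/help/anaconda/

    pip mirror configuration: https://mirrors.bfsu.edu.cn/help/pypi/

    Installing Miniconda is simple: just run sh minicondaxxxxxx.sh

    Choose an editor, or use PyCharm.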
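    Before running anything, a quick sanity check can help. This is a minimal sketch and is not part of the original post. It assumes PySpark has been installed into the Miniconda environment, for example with `pip install pyspark`. The log further down shows a pyspark 3.0.1 package under miniconda3's site-packages. It also assumes a Java runtime is available, since Spark runs on the JVM. The helper name `check_environment` is hypothetical.

```python
import importlib.util
import os
import shutil
import sys


def check_environment():
    """Hypothetical helper: report whether a local PySpark run has what it needs."""
    return {
        # Any Python 3 interpreter; the post suggests the py37 Miniconda build
        "python3": sys.version_info.major == 3,
        # True if `import pyspark` would succeed in this interpreter
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # Spark needs a JVM: either JAVA_HOME is set or `java` is on PATH
        "java": bool(os.environ.get("JAVA_HOME")) or shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```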

    Run PySpark in local (single-machine) mode

    Prepare the dataset data.csv:

    name,age
    张三,24
    李四,25
    小红,22
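    You can also generate data.csv instead of typing it. Here is a small sketch that uses Python's standard csv module. The helper names `sample_csv_text` and `write_sample_csv` are hypothetical. The file is written as UTF-8, which is the default encoding of Spark's CSV reader, so the Chinese names should be read back intact.

```python
import csv
import io

# The sample rows from the post
ROWS = [("张三", 24), ("李四", 25), ("小红", 22)]


def sample_csv_text():
    """Build the CSV text: a header line followed by one line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "age"])
    writer.writerows(ROWS)
    return buf.getvalue()


def write_sample_csv(path="data.csv"):
    """Write the sample dataset as UTF-8 and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample_csv_text())
    return path


if __name__ == "__main__":
    print("wrote", write_sample_csv())
```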
    

    Write the following code (running it in Jupyter is even better):

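    from pyspark.sql import SparkSession

    # Start a local Spark session that uses all available cores
    spark = SparkSession.builder.master("local[*]").getOrCreate()
    print("\n\napp start")

    # Read the CSV, treating the first line as column names
    df = spark.read.option('header','true').csv("data.csv")

    # Print the schema (without extra options, every column is read as a string)
    df.printSchema()

    # Print all rows
    df.show()

    # Keep only the rows whose age is below 25
    df.filter("age<25").show()

    # Release Spark's resources
    spark.stop()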
    
    20/12/05 22:14:07 WARN Utils: Your hostname, shuai-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.153.128 instead (on interface ens33)
    20/12/05 22:14:07 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/shuai/miniconda3/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
    WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    20/12/05 22:14:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    
    
    app start
    root
     |-- name: string (nullable = true)
     |-- age: string (nullable = true)
    
    +----+---+
    |name|age|
    +----+---+
    |张三| 24|
    |李四| 25|
    |小红| 22|
    +----+---+
    
    +----+---+
    |name|age|
    +----+---+
    |张三| 24|
    |小红| 22|
    +----+---+
    
  • Original article: https://www.cnblogs.com/startnow/p/14091285.html