  • Setting up a pyspark environment on a Mac

    https://blog.csdn.net/wapecheng/article/details/108071538

    1. Install the Java JDK

    https://www.oracle.com/java/technologies/javase-downloads.html

    Then just click through the installer.

    It later turned out that JDK 8 is required, otherwise the error shown further down occurs; JDK 8 can be downloaded from https://www.cr173.com/mac/122803.html
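
    After installing JDK 8, you can confirm that macOS can find it (the path printed below is only an example; the exact update number will differ on your machine):

    /usr/libexec/java_home -v 1.8
    # e.g. /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home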

    2. Install Scala, Spark, and Hadoop with Homebrew

    brew install scala
    brew install apache-spark
    brew install hadoop
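
    Once these finish, you can sanity-check that each tool runs (version numbers will vary with your install; if a command is not found yet, the PATH entries added in step 3 take care of that):

    scala -version
    spark-shell --version
    hadoop version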

    3. Configure environment variables

    vim ~/.bash_profile 

    Add the environment variable configuration below (remember to change the JDK path to the JDK 8 one):

    # HomeBrew
    export HOMEBREW_BOTTLE_DOMAIN=https://mirrors.tuna.tsinghua.edu.cn/homebrew-bottles
    export PATH="/usr/local/bin:$PATH"
    export PATH="/usr/local/sbin:$PATH"
    # HomeBrew END
    
    #Scala
    SCALA_HOME="/usr/local/Cellar/scala/2.13.3"
    export PATH="$PATH:$SCALA_HOME/bin"
    # Scala END
    
    # Hadoop
    HADOOP_HOME="/usr/local/Cellar/hadoop/3.3.0"
    export PATH="$PATH:$HADOOP_HOME/bin"
    # Hadoop END
    
    # spark
    export SPARK_PATH="/usr/local/Cellar/apache-spark/3.0.1"
    export PATH="$SPARK_PATH/bin:$PATH"
    # Spark End
    
    # JDK
    JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home"
    export PATH="$PATH:$JAVA_HOME/bin"
    # JDK END
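
    After saving, reload the profile so the variables take effect in the current shell, and verify:

    source ~/.bash_profile
    echo $SPARK_PATH    # should print /usr/local/Cellar/apache-spark/3.0.1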

    4. Install pyspark

    pip install pyspark
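
    The test script further down also uses findspark, a separate package that locates the Spark installation for Python, so install it as well:

    pip install findspark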

    Check where the downloaded JDK is installed:

    $ /usr/libexec/java_home -V
    Matching Java Virtual Machines (1):
        16.0.1, x86_64:    "Java SE 16.0.1"    /Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home
    
    /Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home

    Test it:

    import os
    os.environ['JAVA_HOME'] = '/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home'
    
    import findspark
    findspark.init()
    from pyspark import SparkContext, SparkConf
    sc = SparkContext()
    from pyspark.sql import SparkSession
    # initialize the Spark session
    spark = SparkSession.builder.getOrCreate()

    This fails with:

    Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:54)
        at org.apache.spark.internal.config.package$.<init>(package.scala:1006)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:157)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:115)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @40f9161a
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
        at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:188)
        at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:181)
        at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:56)
        ... 13 more
    Traceback (most recent call last):
      File "delete.py", line 14, in <module>
        sc = SparkContext()
      File "/usr/local/opt/apache-spark/libexec/python/pyspark/context.py", line 133, in __init__
        SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
      File "/usr/local/opt/apache-spark/libexec/python/pyspark/context.py", line 325, in _ensure_initialized
        SparkContext._gateway = gateway or launch_gateway(conf)
      File "/usr/local/opt/apache-spark/libexec/python/pyspark/java_gateway.py", line 105, in launch_gateway
        raise Exception("Java gateway process exited before sending its port number")
    Exception: Java gateway process exited before sending its port number

    Spark 3.0.x only supports Java 8 and 11; on newer JDKs (here JDK 16) the module system denies the reflective access Spark needs, which is what the InaccessibleObjectException above reports. Switching JAVA_HOME to JDK 8 makes it go away:

    os.environ['JAVA_HOME'] = '/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home'

    It now returns:

    21/05/10 11:19:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
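
    As a quick sanity check once the session is up, you can build a tiny DataFrame; this is a minimal sketch, and the column names and rows are made up purely for illustration:

    # hypothetical smoke test: build a two-row DataFrame and print it
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()
    spark.stop()
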
  • Original post: https://www.cnblogs.com/wanghui-garcia/p/14745208.html