  • PyCharm: connecting to a remote Python interpreter on Linux to run PySpark

    PySpark in PyCharm on a remote server

    1. Make sure Python and Spark are correctly installed on the remote machine

    2. Installation and setup on the remote machine

    vi /etc/profile
    Add one line (the py4j version must match the one bundled with your Spark distribution: Spark 1.4 ships py4j-0.8.2.1, Spark 2.x ships py4j-0.10.4):

    export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip

    source /etc/profile
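    To confirm which py4j version your Spark distribution actually bundles, here is a small sketch (assuming SPARK_HOME is already exported) that prints the zip path to put on PYTHONPATH:

    # Sketch: list the py4j zip(s) bundled with this Spark distribution.
    # Assumes the SPARK_HOME environment variable is set.
    import glob
    import os

    spark_home = os.environ["SPARK_HOME"]
    for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
        print(zip_path)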

     # Install pip and py4j

    Download pip-7.1.2.tar
    tar -xvf pip-7.1.2.tar
    cd pip-7.1.2
    python setup.py install
    pip install py4j
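    To verify the install on the remote machine, a quick check (the printed version will vary with what pip fetched; py4j keeps its version string in the py4j.version module):

    # Sketch: confirm py4j is importable and show its version.
    import py4j.version
    print(py4j.version.__version__)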

     # Avoid the tty check when connecting over ssh

    cd /etc
    chmod 640 sudoers
    vi /etc/sudoers
    # comment out the requiretty line (the directive is "Defaults", with an s):
    #Defaults    requiretty
    chmod 440 /etc/sudoers   # restore the original permissions after editing

    3. Local PyCharm settings

    File > Settings > Project Interpreter:

    Project Interpreter > Add Remote (prerequisite: Python is installed successfully on the remote machine):

    Note: the path here is the Python interpreter path; if Python is installed somewhere else, change the path accordingly.

    Run > Edit Configurations (prerequisite: the local directory is shared into the virtual machine):

    I configured the path mapping under Tools:

    Tools > Deployment > Configuration

    4. Test

    import os
    import sys

    # Point the script at the Spark installation on the remote machine
    os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
    sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")

    try:
        from pyspark import SparkContext
        from pyspark import SparkConf

        print("Successfully imported Spark Modules")

    except ImportError as e:
        print("Can not import Spark Modules", e)
        sys.exit(1)
    Result:

    ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py
    Successfully imported Spark Modules

    Process finished with exit code 0
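    If the imports succeed, a slightly longer smoke test confirms the interpreter can run a Spark job end to end. A minimal sketch (paths follow the Spark 1.4 layout above; the app name and local[2] master are arbitrary choices):

    import os
    import sys

    os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
    sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")
    # py4j is needed too, in case the PYTHONPATH from /etc/profile is not picked up
    sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip")

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("pycharm-remote-test").setMaster("local[2]")
    sc = SparkContext(conf=conf)

    # Sum 1..100 with a trivial RDD job; expected output: 5050
    print(sc.parallelize(range(1, 101)).reduce(lambda a, b: a + b))
    sc.stop()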

    Alternatively:

    import sys

    sys.path.append("/root/programs/spark-1.4.0-bin-hadoop2.6/python")

    try:
        import numpy as np
        import scipy.sparse as sps
        from pyspark.mllib.linalg import Vectors

        # The same vector [1.0, 0.0, 3.0] in four representations
        dv1 = np.array([1.0, 0.0, 3.0])              # NumPy dense array
        dv2 = [1.0, 0.0, 3.0]                        # plain Python list
        sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])  # MLlib sparse vector
        sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])), shape=(3, 1))  # SciPy sparse column

        print(sv2)

    except ImportError as e:
        print("Can not import Spark Modules", e)
        sys.exit(1)
    Result:

    ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py
      (0, 0)	1.0
      (2, 0)	3.0

    Process finished with exit code 0
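    As a sanity check on the output above, a short sketch showing that the dense and sparse forms encode the same vector [1.0, 0.0, 3.0]:

    # Sketch: the MLlib sparse vector expands to the dense NumPy array.
    import numpy as np
    from pyspark.mllib.linalg import Vectors

    dv1 = np.array([1.0, 0.0, 3.0])
    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])
    print(np.allclose(sv1.toArray(), dv1))  # expected: True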

    References:
    https://edumine.wordpress.com/2015/08/14/pyspark-in-pycharm/
    http://renien.github.io/blog/accessing-pyspark-pycharm/
    http://www.tuicool.com/articles/MJnYJb
    http://blog.csdn.net/u011196209/article/details/9934721

     
