  • Spark programming in Python: an example


    ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*])

    1. Developing, testing, and submitting PySpark applications in Jupyter Notebook

    1.1. Start Jupyter Notebook with PySpark

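    IPYTHON_OPTS="notebook" /opt/spark/bin/pyspark

    (This form works on Spark 1.x. Spark 2.0 removed IPYTHON_OPTS; on 2.x and later, use PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook /opt/spark/bin/pyspark instead.)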

    Download the application as a .py file (by default, a notebook is saved with the .ipynb extension).
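Jupyter performs this export with nbconvert. From the shell, the roughly equivalent command is jupyter nbconvert --to script pysparkdemo.ipynb. The stdlib-only sketch below illustrates what the export does for an nbformat 4 notebook: it joins the code cells into one script and adds # In[n]: markers, as in pysparkdemo.py in section 4. It is a simplification and not nbconvert's implementation. For example, markdown and raw cells are skipped here, while nbconvert turns them into comments. The function name notebook_to_script is hypothetical.

```python
import json


def notebook_to_script(nb_text):
    """Hypothetical helper: join the code cells of an nbformat-4 notebook (JSON text) into one script."""
    nb = json.loads(nb_text)
    parts = ["# coding: utf-8\n"]
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown/raw cells are simply skipped in this sketch
        source = cell["source"]
        # In nbformat 4, "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        # Unexecuted cells have execution_count null, shown as "In[ ]".
        count = cell.get("execution_count")
        label = " " if count is None else str(count)
        parts.append("\n# In[%s]:\n\n%s\n\n" % (label, source))
    return "".join(parts)
```

To convert a saved notebook with this helper, pass it the text of the file (for example, open("pysparkdemo.ipynb").read()) and write the returned string to a .py file.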

    Submit the application from the shell:

    wxl@wxl-pc:/opt/spark/bin$ ./spark-submit /home/wxl/Downloads/pysparkdemo.py


    3. Errors encountered and how they were solved


    3.1. The error


    ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/lib/python2.7/dist-packages/IPython/utils/py3compat.py:288


    3.2. Fix (the application now runs successfully)

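    Add the following right after the from pyspark import SparkContext line. When the PySpark shell launched the notebook, it already created a SparkContext named sc (app=PySparkShell, master=local[*]). Only one SparkContext may be active at a time, so that context has to be stopped before a new one is created: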

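    try:
        sc.stop()  # stop the context the PySpark shell created
    except NameError:
        pass  # no sc defined yet, e.g. when the script runs under spark-submit
    sc = SparkContext('local[2]', 'First Spark App')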


    This fix comes from Stack Overflow.
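Instead of replacing the context the shell already started, you can reuse it. SparkContext.getOrCreate() returns the active context and creates a new one from the given SparkConf only when no context is running. The following is a minimal sketch of that alternative. The helper name get_context is hypothetical. Note the trade-off: inside the notebook you get the shell's local[*] context rather than local[2], because the conf is ignored when a context already exists.

```python
def get_context(master="local[2]", app_name="First Spark App"):
    """Hypothetical helper: return the active SparkContext, or create one if none is running."""
    # Imported here so the helper can be defined even where pyspark is not installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master).setAppName(app_name)
    # getOrCreate() ignores conf when a context is already running (for example the
    # PySparkShell context in the notebook), so no "multiple SparkContexts" ValueError is raised.
    return SparkContext.getOrCreate(conf)
```

After sc = get_context(), sc.master shows which context you actually received. If you need exactly local[2], use the stop-and-recreate approach above.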

    4. Source code

    pysparkdemo.ipynb

    {
     "cells": [
      {
       "cell_type": "code",
       "execution_count": 1,
       "metadata": {
        "collapsed": true
       },
       "outputs": [],
       "source": [
        "from pyspark import SparkContext"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 2,
       "metadata": {
        "collapsed": true
       },
       "outputs": [],
       "source": [
        "try:\n",
        "    sc.stop()\n",
        "except NameError:\n",
        "    pass\n",
        "sc=SparkContext('local[2]','First Spark App')"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 3,
       "metadata": {
        "collapsed": true
       },
       "outputs": [],
       "source": [
        "data = sc.textFile(\"data/UserPurchaseHistory.csv\").map(lambda line: line.split(\",\")).map(lambda record: (record[0], record[1], record[2]))"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 4,
       "metadata": {
        "collapsed": false,
        "scrolled": true
       },
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Total purchases: 5\n"
         ]
        }
       ],
       "source": [
        "numPurchases = data.count()\n",
        "print(\"Total purchases: %d\" % numPurchases)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {
        "collapsed": true
       },
       "outputs": [],
       "source": []
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 2",
       "language": "python",
       "name": "python2"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 2
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython2",
       "version": "2.7.12"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 0
    }

    pysparkdemo.py

    
    # coding: utf-8
    
    # In[1]:
    
    from pyspark import SparkContext
    
    
    # In[2]:
    
    try:
        sc.stop()
    except NameError:  # sc is undefined when run via spark-submit
        pass
    sc=SparkContext('local[2]','First Spark App')
    
    
    # In[3]:
    
    data = sc.textFile("data/UserPurchaseHistory.csv").map(lambda line: line.split(",")).map(lambda record: (record[0], record[1], record[2]))
    
    
    # In[4]:
    
    numPurchases = data.count()
    print("Total purchases: %d" % numPurchases)
    
    
    # In[ ]:
    
  • Original post: https://www.cnblogs.com/lanzhi/p/6467680.html