  • Installing pyspider on Ubuntu: configuring supervisord to start and manage the pyspider processes, with notes

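    First, thanks to the segmentfault.com user "imperat0r_" and the Sina user "小菜一碟" for their articles. Their configuration files are reproduced below. I wrote my own based on them; it appears after theirs.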

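    The key point comes first. After I set up pyspider under supervisord, pyspider kept misbehaving and would not run properly. I spent a long time looking for the cause. Eventually it occurred to me to check whether the processes started by supervisord were actually healthy, so I ran supervisorctl to list every managed process. Sure enough, two processes had failed to start. The fix was to correct the wrong parameters right away.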

    Parameters, parameters, parameters! Getting every parameter right is what matters most.
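
The check described above can be scripted. The following is a minimal sketch, assuming `supervisorctl status` prints one process per line with the name in the first column and the state in the second. The helper name `failed_processes` and the sample text are made up for illustration; they are not output captured from the author's machine.

```python
def failed_processes(status_text):
    """Return (name, state) pairs for processes that are not RUNNING."""
    failed = []
    for line in status_text.splitlines():
        parts = line.split(None, 2)  # name, state, free-form description
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, state = parts[0], parts[1]
        if state != "RUNNING":
            failed.append((name, state))
    return failed


# Illustrative sample, shaped like `supervisorctl status` output.
SAMPLE_STATUS = """\
pyspider:pyspider-fetcher     RUNNING   pid 2101, uptime 0:05:12
pyspider:pyspider-processor   FATAL     Exited too quickly (process log may have details)
pyspider:pyspider-scheduler   BACKOFF   Exited too quickly (process log may have details)
pyspider:pyspider-webui       RUNNING   pid 2105, uptime 0:05:12
"""

if __name__ == "__main__":
    for name, state in failed_processes(SAMPLE_STATUS):
        print(name, state)
```

On a real machine the text could come from `subprocess.run(["supervisorctl", "status"], capture_output=True, text=True).stdout`. Running `supervisorctl tail <name> stderr` on a failed process then usually shows why it did not start.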

    Configuration from "imperat0r_"

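    If you launch pyspider from its source code, you can use this configuration. If you use an already-built (installed) pyspider, see the next configuration instead. The only difference between the two is the launch path. My own configuration file further down comes with a brief explanation of the parameters.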

    [group:pyspider]
    programs=pyspider-webui,pyspider-scheduler,pyspider-processor,pyspider-result_worker,pyspider-fetcher,pyspider-phantomjs
    priority=999
    
    [program:pyspider-webui]
    command=/usr/local/bin/pyspider/run.py -c /root/config.json webui
    directory=/root
    autostart=true
    autorestart=true
    priority=905
    user=root
    
    [program:pyspider-scheduler]
    command=/usr/local/bin/pyspider/run.py -c /root/config.json scheduler
    directory=/root
    autostart=true
    autorestart=true
    priority=900
    user=root
    
    [program:pyspider-processor]
    command=/usr/local/bin/pyspider/run.py -c /root/config.json processor
    directory=/root
    autostart=true
    autorestart=true
    priority=903
    user=root
    
    [program:pyspider-result_worker]
    command=/usr/local/bin/pyspider/run.py -c /root/config.json result_worker
    directory=/root
    autostart=true
    autorestart=true
    priority=904
    user=root
    
    [program:pyspider-fetcher]
    command=/usr/local/bin/pyspider/run.py -c /root/config.json --phantomjs-proxy="localhost:25555" fetcher
    directory=/root
    autostart=true
    autorestart=true
    priority=902
    user=root
    
    [program:pyspider-phantomjs]
    command=/usr/local/bin/pyspider/run.py -c /root/config.json phantomjs
    directory=/root
    autostart=true
    autorestart=true
    priority=901
    user=root
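
The `priority` values in this file control start order: according to Supervisor's documentation, lower priorities start first and shut down last. As a small sketch (the dictionary simply copies the numbers from the configuration above, and `start_order` is a hypothetical helper), sorting by priority shows the intended sequence:

```python
# Priority values copied from the [program:x] sections above.
PRIORITIES = {
    "pyspider-webui": 905,
    "pyspider-scheduler": 900,
    "pyspider-processor": 903,
    "pyspider-result_worker": 904,
    "pyspider-fetcher": 902,
    "pyspider-phantomjs": 901,
}


def start_order(priorities):
    """Return program names sorted so the lowest priority value comes first."""
    return sorted(priorities, key=priorities.get)


if __name__ == "__main__":
    print(" -> ".join(start_order(PRIORITIES)))
```

With these numbers the scheduler comes up first and the web UI last; on shutdown the order is reversed.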

    Configuration from the Sina user "小菜一碟":

    If you use an already-built (installed) pyspider, use this configuration as a reference. The only difference is the launch path.

    [group:pyspider]
    programs=pyspider-webui,pyspider-scheduler,pyspider-processor,pyspider-result_worker,pyspider-fetcher,pyspider-phantomjs
    priority=999
    
    [program:pyspider-webui]
    command=pyspider -c config.json webui
    autostart=true
    autorestart=true
    priority=905
    user=root
    directory=/usr/pyspider/
    
    
    [program:pyspider-scheduler]
    command=pyspider -c config.json scheduler
    directory=/usr/pyspider/
    autostart=true
    autorestart=true
    priority=900
    user=root
    
    [program:pyspider-processor]
    command=pyspider -c config.json processor
    directory=/usr/pyspider/
    autostart=true
    autorestart=true
    priority=903
    user=root
    
    [program:pyspider-result_worker]
    command=pyspider -c config.json result_worker
    directory=/usr/pyspider/
    autostart=true
    autorestart=true
    priority=904
    user=root
    
    [program:pyspider-fetcher]
    command=pyspider -c config.json --phantomjs-proxy="localhost:25555" fetcher
    directory=/usr/pyspider/
    autostart=true
    autorestart=true
    priority=902
    user=root
    
    [program:pyspider-phantomjs]
    command=pyspider -c config.json phantomjs --phantomjs-path ./phantomjs/bin/phantomjs
    directory=/usr/pyspider/
    autostart=true
    autorestart=true
    priority=901
    user=root

    My own configuration file.

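    This configuration file starts each pyspider component as its own process, so each one can be managed separately without affecting the rest. I spent a long time studying this file, and I record the details here in the hope that they help beginners. The next section explains each parameter.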

    [group:pyspider]
    programs=pyspider-webui,pyspider-scheduler,pyspider-processor,pyspider-result_worker,pyspider-fetcher,pyspider-phantomjs
    priority=999
    ; [group:x] reads only programs and priority; log files are set per program below
    
    [program:pyspider-webui]                                                                                  
    command=/home/chg/py3env-pyspider/bin/pyspider -c /home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/config.json webui
    directory=/home/chg/py3env-pyspider/bin/
    autostart=true
    autorestart=true
    priority=905
    user=chg
    stderr_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider_err.log
    stdout_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider.log
    
    [program:pyspider-scheduler]
    command=/home/chg/py3env-pyspider/bin/pyspider -c /home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/config.json scheduler
    directory=/home/chg/py3env-pyspider/bin/
    autostart=true
    autorestart=true
    priority=900
    user=chg
    stderr_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider_err.log
    stdout_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider.log
    
    
    [program:pyspider-processor]
    command=/home/chg/py3env-pyspider/bin/pyspider -c /home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/config.json processor
    directory=/home/chg/py3env-pyspider/bin/
    autostart=true
    autorestart=true
    priority=903
    user=chg
    stderr_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider_err.log
    stdout_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider.log
    
    [program:pyspider-result_worker]
    command=/home/chg/py3env-pyspider/bin/pyspider -c /home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/config.json result_worker
    directory=/home/chg/py3env-pyspider/bin/
    autostart=true
    autorestart=true
    priority=904
    user=chg
    stderr_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider_err.log
    stdout_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider.log
    
    [program:pyspider-fetcher]
    command=/home/chg/py3env-pyspider/bin/pyspider -c /home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/config.json --phantomjs-proxy="localhost:25555" fetcher
    directory=/home/chg/py3env-pyspider/bin/
    autostart=true
    autorestart=true
    priority=902
    user=chg
    stderr_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider_err.log
    stdout_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider.log
    
    [program:pyspider-phantomjs]
    command=/home/chg/py3env-pyspider/bin/pyspider -c /home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/config.json phantomjs
    directory=/home/chg/py3env-pyspider/bin/
    autostart=true
    autorestart=true
    priority=901
    user=chg
    stderr_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider_err.log
    stdout_logfile=/home/chg/py3env-pyspider/lib/python3.5/site-packages/pyspider/pyspider.log
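
Before reloading supervisord, a quick sanity check can catch the kinds of mistakes that stop a process from starting: a group section without a `programs` key, a group member that has no `[program:x]` section, a duplicated key, or a mistyped command path. The sketch below uses Python's standard `configparser`, on the assumption that it reads supervisor's INI syntax closely enough for this purpose; supervisord's own parser remains the authority. `check_config` and the deliberately broken `SAMPLE` are hypothetical.

```python
import configparser


def check_config(text):
    """Return a list of human-readable problems found in a supervisor config."""
    parser = configparser.ConfigParser(
        delimiters=("=",),
        interpolation=None,              # leave %(program_name)s style values alone
        inline_comment_prefixes=(";",),  # allow "value ; comment"
    )
    try:
        parser.read_string(text)
    except configparser.Error as exc:    # e.g. the same key twice in one section
        return ["parse error: " + str(exc).split("\n")[0]]

    problems = []
    defined = {s.split(":", 1)[1] for s in parser.sections() if s.startswith("program:")}
    for section in parser.sections():
        if section.startswith("group:"):
            if not parser.has_option(section, "programs"):
                problems.append("[%s] has no 'programs' key" % section)
                continue
            for name in parser.get(section, "programs").split(","):
                if name.strip() not in defined:
                    problems.append("[%s] lists undefined program '%s'" % (section, name.strip()))
        elif section.startswith("program:"):
            tokens = parser.get(section, "command", fallback="").split()
            if not tokens:
                problems.append("[%s] has no command" % section)
            elif "/" in tokens[0] and not tokens[0].startswith("/"):
                # Heuristic: a path with a slash that is not absolute is often a typo.
                problems.append("[%s] command '%s' has a '/' but is not absolute" % (section, tokens[0]))
    return problems


# Deliberately broken sample: wrong group key and a stray character in a path.
SAMPLE = """\
[group:pyspider]
program=pyspider-webui,pyspider-processor

[program:pyspider-webui]
command=/home/chg/py3env-pyspider/bin/pyspider webui

[program:pyspider-processor]
command=p/home/chg/py3env-pyspider/bin/pyspider processor
"""

if __name__ == "__main__":
    for problem in check_config(SAMPLE):
        print(problem)
```

The path check is only a heuristic: supervisor accepts relative commands and looks them up on `PATH`, so a bare `pyspider` is not flagged.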

    Parameter reference

    Thanks to the article "使用supervisord来管理process" ("Managing processes with supervisord").

    ; Sample supervisor config file.
    ;
    ; For more information on the config file, please see:
    ; http://supervisord.org/configuration.html
    ;
    ; Note: shell expansion ("~" or "$HOME") is not supported.  Environment
    ; variables can be expanded using this syntax: "%(ENV_HOME)s".
     
    [unix_http_server]          ; supervisord's UNIX socket server settings
    file=/tmp/supervisor.sock   ; path to the socket file
    ;chmod=0700                 ; socket file mode (default 0700)
    ;chown=nobody:nogroup       ; socket file owner and group
    ;username=user              ; default is no username (open server)
    ;password=123               ; default is no password (open server)
     
    ;[inet_http_server]         ; supervisord's TCP server settings
    ;port=127.0.0.1:9001        ; address and port to listen on
    ;username=user              ; login username for the TCP server
    ;password=123               ; login password for the TCP server
     
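    [supervisord]                ; settings for the main supervisord process
    logfile=/tmp/supervisord.log ; main log file
    logfile_maxbytes=50MB        ; max main log file size before rotation, default 50MB
    logfile_backups=10           ; number of main log file backups, default 10
    loglevel=info                ; log level, default info; others: debug,warn,trace
    pidfile=/tmp/supervisord.pid ; supervisord pidfile
    nodaemon=false               ; run in the foreground if true, default false (daemonize)
    minfds=1024                  ; min. available startup file descriptors, default 1024
    minprocs=200                 ; min. available process descriptors, default 200
    ;umask=022                   ; process file creation umask, default 022
    ;user=chrism                 ; default is current user, required if root
    ;identifier=supervisor       ; supervisord identifier, default is 'supervisor'
    ;directory=/tmp              ; default is not to cd during start
    ;nocleanup=true              ; don't clean up tempfiles at start, default false
    ;childlogdir=/tmp            ; ('AUTO' child log dir, default $TEMP)
    ;environment=KEY=value       ; key/value pairs added to the processes' environment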
    ;strip_ansi=false            ; (strip ansi escape codes in logs; def. false)
     
    ; the below section must remain in the config file for RPC
    ; (supervisorctl/web interface) to work, additional interfaces may be
    ; added by defining them in separate rpcinterface: sections
    [rpcinterface:supervisor]
    supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
     
    [supervisorctl]
    serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket
    ;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket
    ;username=chris              ; should be same as http_username if set
    ;password=123                ; should be same as http_password if set
    ;prompt=mysupervisor         ; command-line prompt, default "supervisor"
    ;history_file=~/.sc_history  ; command-line history file
     
    ; The below sample program section shows all possible program subsection values,
    ; create one or more 'real' program: sections to be able to control them under
    ; supervisor.
     
    ;[program:theprogramname]
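    ;command=/bin/cat              ; the program to run (relative paths use PATH, args allowed)
    ;process_name=%(program_name)s ; process_name expression, default %(program_name)s
    ;numprocs=1                    ; number of process copies to start, default 1
    ;directory=/tmp                ; directory to cwd to before exec, default no cwd
    ;umask=022                     ; umask for the process, default None
    ;priority=999                  ; relative start priority, default 999
    ;autostart=true                ; start when supervisord starts, default true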
    ;autorestart=unexpected        ; whether/when to restart (default: unexpected)
    ;startsecs=1                   ; number of secs prog must stay running (def. 1)
    ;startretries=3                ; max # of serial start failures (default 3)
    ;exitcodes=0,2                 ; 'expected' exit codes for the process, default 0,2
    ;stopsignal=QUIT               ; signal used to stop the process, default TERM
    ;stopwaitsecs=10               ; max num secs to wait b4 SIGKILL (default 10)
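    ;stopasgroup=false             ; send the stop signal to the UNIX process group, default false
    ;killasgroup=false             ; send SIGKILL to the UNIX process group, default false
    ;user=chrism                   ; setuid to this UNIX account to run the program
    ;redirect_stderr=true          ; redirect process stderr to stdout, default false
    ;stdout_logfile=/a/path        ; stdout log path, NONE for none; default AUTO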
    ;stdout_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
    ;stdout_logfile_backups=10     ; # of stdout logfile backups (default 10)
    ;stdout_capture_maxbytes=1MB   ; number of bytes in 'capturemode' (default 0)
    ;stdout_events_enabled=false   ; emit events on stdout writes (default false)
    ;stderr_logfile=/a/path        ; stderr log path, NONE for none; default AUTO
    ;stderr_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
    ;stderr_logfile_backups=10     ; # of stderr logfile backups (default 10)
    ;stderr_capture_maxbytes=1MB   ; number of bytes in 'capturemode' (default 0)
    ;stderr_events_enabled=false   ; emit events on stderr writes (default false)
    ;environment=A=1,B=2           ; process environment additions (def no adds)
    ;serverurl=AUTO                ; override serverurl computation (childutils)
     
    ; The below sample eventlistener section shows all possible
    ; eventlistener subsection values, create one or more 'real'
    ; eventlistener: sections to be able to handle event notifications
    ; sent by supervisor.
     
    ;[eventlistener:theeventlistenername]
    ;command=/bin/eventlistener    ; the program to run (relative paths use PATH, args allowed)
    ;process_name=%(program_name)s ; process_name expression, default %(program_name)s
    ;numprocs=1                    ; number of process copies to start, default 1
    ;events=EVENT                  ; event notif. types to subscribe to (req'd)
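    ;buffer_size=10                ; event buffer queue size, default 10
    ;directory=/tmp                ; directory to cwd to before exec, default no cwd
    ;umask=022                     ; umask for the process, default None
    ;priority=-1                   ; relative start priority, default -1
    ;autostart=true                ; start when supervisord starts, default true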
    ;autorestart=unexpected        ; whether/when to restart (default: unexpected)
    ;startsecs=1                   ; number of secs prog must stay running (def. 1)
    ;startretries=3                ; max # of serial start failures (default 3)
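    ;exitcodes=0,2                 ; 'expected' exit codes for the process, default 0,2
    ;stopsignal=QUIT               ; signal used to stop the process, default TERM
    ;stopwaitsecs=10               ; max num secs to wait b4 SIGKILL (default 10)
    ;stopasgroup=false             ; send the stop signal to the UNIX process group, default false
    ;killasgroup=false             ; send SIGKILL to the UNIX process group, default false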
    ;user=chrism                   ; setuid to this UNIX account to run the program
    ;redirect_stderr=true          ; redirect proc stderr to stdout (default false)
    ;stdout_logfile=/a/path        ; stdout log path, NONE for none; default AUTO
    ;stdout_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
    ;stdout_logfile_backups=10     ; # of stdout logfile backups (default 10)
    ;stdout_events_enabled=false   ; emit events on stdout writes (default false)
    ;stderr_logfile=/a/path        ; stderr log path, NONE for none; default AUTO
    ;stderr_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
    ;stderr_logfile_backups=10     ; # of stderr logfile backups (default 10)
    ;stderr_events_enabled=false   ; emit events on stderr writes (default false)
    ;environment=A=1,B=2           ; process environment additions
    ;serverurl=AUTO                ; override serverurl computation (childutils)
     
    ; The below sample group section shows all possible group values,
    ; create one or more 'real' group: sections to create "heterogeneous"
    ; process groups.
     
    ;[group:thegroupname]
    ;programs=progname1,progname2  ; each name refers to an x defined in a [program:x] section
    ;priority=999                  ; relative start priority, default 999
     
    ; The [include] section can just contain the "files" setting.  This
    ; setting can list multiple files (separated by whitespace or
    ; newlines).  It can also contain wildcards.  The filenames are
    ; interpreted as relative to this file.  Included files *cannot*
    ; include files themselves.
     
    ;[include]
    ;files = relative/directory/*.ini
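
The sample leaves `autorestart` and `exitcodes` without much explanation, although every pyspider program above sets `autorestart=true`. As a rough model of the rule described in Supervisor's documentation for a process that had reached the RUNNING state (this is not Supervisor's actual code, and `should_restart` is a hypothetical name):

```python
def should_restart(autorestart, exit_code, exitcodes=(0, 2)):
    """Model the documented autorestart rule for a process that has exited."""
    if autorestart == "true":
        return True                        # always restart
    if autorestart == "false":
        return False                       # never restart
    if autorestart == "unexpected":
        return exit_code not in exitcodes  # restart only on an unexpected code
    raise ValueError("autorestart must be true, false or unexpected")


if __name__ == "__main__":
    for mode in ("true", "false", "unexpected"):
        print(mode, should_restart(mode, exit_code=1))
```

With `autorestart=true`, as in the pyspider sections, a component that exits is restarted regardless of its exit code. The default `exitcodes` here follows the sample file above.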
  • Original article: https://www.cnblogs.com/microman/p/6138082.html