sudo apt-get install libcurl4-openssl-dev libxml2-dev libxslt1-dev
sudo apt-get install phantomjs
Activate the virtual environment (Python 3.6.7)
pip install pyspider
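Then run pyspider to start it.
If MySQL-related errors appear, run the following commands first. Note that they remove the installed MySQL packages and their configuration.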
sudo apt-get purge mysql*
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get dist-upgrade
Deployment
This document is based on MySQL + RabbitMQ.
config.json
Although you can specify the parameters on the command line, a config file is a better choice.
{
    "taskdb": "mysql+taskdb://username:password@host:port/taskdb",
    "projectdb": "mysql+projectdb://username:password@host:port/projectdb",
    "resultdb": "mysql+resultdb://username:password@host:port/resultdb",
    "message_queue": "amqp://username:password@host:port/%2F",
    "webui": {
        "username": "some_name",
        "password": "some_passwd",
        "need-auth": true
    }
}
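To avoid typing the same credentials several times, the file can also be generated. The following is a minimal sketch, not part of pyspider: `build_config` and all of its parameters are hypothetical names, and it assumes each database is named after its type, as in the example above.

```python
import json


def build_config(db_user, db_password, db_host, db_port,
                 mq_user, mq_password, mq_host, mq_port,
                 webui_user, webui_password):
    """Return a dict with the same keys as the config.json shown above."""
    db_netloc = f"{db_user}:{db_password}@{db_host}:{db_port}"
    # One MySQL database per store; the database name equals the type (assumption).
    config = {
        db_type: f"mysql+{db_type}://{db_netloc}/{db_type}"
        for db_type in ("taskdb", "projectdb", "resultdb")
    }
    # %2F is the URL-encoded form of "/", RabbitMQ's default virtual host.
    config["message_queue"] = (
        f"amqp://{mq_user}:{mq_password}@{mq_host}:{mq_port}/%2F"
    )
    config["webui"] = {
        "username": webui_user,
        "password": webui_password,
        "need-auth": True,
    }
    return config


if __name__ == "__main__":
    # Placeholder values; replace them with your own.
    example = build_config("username", "password", "127.0.0.1", 3306,
                           "username", "password", "127.0.0.1", 5672,
                           "some_name", "some_passwd")
    print(json.dumps(example, indent=4))
```

Redirecting the printed output to config.json gives a file with the same layout as the one above.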
In each database connection URI, the type after the `+` should be one of `taskdb`, `projectdb`, `resultdb`.
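To make the URI layout concrete, here is a small sketch that splits such a URI into its parts and rejects an unknown type. It uses only the Python standard library; `split_db_uri` is a hypothetical helper for illustration and is not how pyspider itself parses the URI.

```python
from urllib.parse import urlsplit

VALID_TYPES = {"taskdb", "projectdb", "resultdb"}


def split_db_uri(uri):
    """Split 'engine+type://user:password@host:port/database' into a dict."""
    parts = urlsplit(uri)
    # The scheme carries both the database engine and the pyspider store type.
    engine, sep, db_type = parts.scheme.partition("+")
    if not sep or db_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}: {uri!r}")
    return {
        "engine": engine,
        "type": db_type,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(split_db_uri("mysql+taskdb://root:secret@127.0.0.1:3306/taskdb"))
```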
Running
You should run each component separately with its subcommand. You may add & after a command to run it in the background, and use screen or nohup to keep it running after your SSH session ends. It's recommended to manage the components with Supervisor.
# start **only one** scheduler instance
pyspider -c config.json scheduler
# phantomjs
pyspider -c config.json phantomjs
# start as many fetcher / processor / result_worker instances as you need
pyspider -c config.json --phantomjs-proxy="localhost:25555" fetcher
pyspider -c config.json processor
pyspider -c config.json result_worker
# start webui, set `--scheduler-rpc` if scheduler is not running on the same host as webui
pyspider -c config.json webui
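Since Supervisor is the recommended way to manage these components, the commands above can be turned into Supervisor `[program:x]` sections. The sketch below only generates the text of such a file; the function name, the `pyspider-` program prefix, and the worker count are assumptions, and it keeps a single scheduler as required above.

```python
def render_supervisor_conf(config_path="config.json", workers=2):
    """Return Supervisor [program:x] sections for the pyspider components."""
    components = [
        # (name, arguments after "-c <config>", number of processes)
        ("scheduler", "scheduler", 1),  # only one scheduler instance
        ("phantomjs", "phantomjs", 1),
        ("fetcher", "--phantomjs-proxy=localhost:25555 fetcher", workers),
        ("processor", "processor", workers),
        ("result_worker", "result_worker", workers),
        ("webui", "webui", 1),
    ]
    sections = []
    for name, args, numprocs in components:
        sections.append("\n".join([
            f"[program:pyspider-{name}]",
            f"command=pyspider -c {config_path} {args}",
            # Supervisor needs process_num in the name when numprocs > 1.
            "process_name=%(program_name)s_%(process_num)02d",
            f"numprocs={numprocs}",
            "autostart=true",
            "autorestart=true",
        ]))
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(render_supervisor_conf("/path/to/config.json", workers=2))
```

If pyspider lives in a virtual environment, the `command` line would likely need the full path to the `pyspider` executable, because Supervisor does not activate the environment for you.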
You can get the complete list of options by running pyspider --help, and the options of a subcommand with, for example, pyspider webui --help.
The "webui" object in the JSON holds the options for the webui subcommand. You can add parameters for other components in the same way.
To deploy pyspider with each component running in its own process, you need at least one database service. pyspider currently supports MySQL, MongoDB and PostgreSQL; choose one of them.
You also need a message queue service to connect the components together. You can use RabbitMQ, Beanstalk or Redis as the message queue.
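For RabbitMQ, the trailing `%2F` in the `message_queue` URL of config.json is the URL-encoded form of `/`, RabbitMQ's default virtual host. A short sketch of that encoding with the standard library follows; `amqp_url` and `vhost_of` are hypothetical helper names, and `guest`/`guest` on port 5672 are RabbitMQ's defaults rather than values from this document.

```python
from urllib.parse import quote, unquote, urlsplit


def amqp_url(user, password, host, port, vhost="/"):
    """Build an AMQP URL, percent-encoding the virtual host name."""
    # safe="" forces "/" itself to be encoded as %2F.
    return f"amqp://{user}:{password}@{host}:{port}/{quote(vhost, safe='')}"


def vhost_of(url):
    """Recover the virtual host name from an AMQP URL."""
    # Drop the leading "/" that separates the host from the path, then decode.
    return unquote(urlsplit(url).path[1:])


if __name__ == "__main__":
    url = amqp_url("guest", "guest", "localhost", 5672)
    print(url)
    print(vhost_of(url))
```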
pip install "pyspider[all]"
Even if you have already installed pyspider with pip, installing pyspider[all] is necessary to pull in the requirements for MySQL/MongoDB/RabbitMQ.