  • Running Scrapy crawl jobs on a schedule

    On Ubuntu, I needed Scrapy to run a crawl on a schedule. Scrapy itself has no built-in scheduling feature, so I used crontab to trigger the crawl periodically:

    First, write the script that cron will run, cron.sh:

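    #!/bin/sh

    # cron starts jobs with a minimal PATH, so add /usr/local/bin (presumably where scrapy is installed)
    export PATH=$PATH:/usr/local/bin

    # scrapy crawl has to be started from inside the project directory
    cd /home/zhangchao/CVS/testCron || exit 1

    # Start the spider in the background and append stdout and stderr to example.log
    nohup scrapy crawl example >> example.log 2>&1 &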
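
    Because the job fires every minute and the crawl is started in the background, a slow crawl may still be running when the next one starts. As a minimal sketch (not part of the original setup), a lock directory can guard against overlapping runs. run_locked and the lock path in the usage note are hypothetical names introduced here; mkdir is used because creating a directory is atomic.

```shell
#!/bin/sh

# run_locked LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if LOCKDIR can be created. If the directory already
# exists, another run is assumed to be active and this one is skipped.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 0
    fi
    "$@"                # run the command in the foreground
    status=$?
    rmdir "$lockdir"    # release the lock once the command has finished
    return "$status"
}
```

    With this helper, the last line of cron.sh could become run_locked /tmp/example-crawl.lock scrapy crawl example >> example.log 2>&1, running in the foreground so the lock is held until the crawl ends. If the process is killed before rmdir runs, the stale lock directory has to be removed by hand.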

     

    Run crontab -e and specify the command to run and how often to run it. Here I need the scrapy crawl example command to run once every minute:

    # Edit this file to introduce tasks to be run by cron.
    #
    # Each task to run has to be defined through a single line
    # indicating with different fields when the task will be run
    # and what command to run for the task
    #
    # To define the time you can provide concrete values for
    # minute (m), hour (h), day of month (dom), month (mon),
    # and day of week (dow) or use '*' in these fields (for 'any').#
    # Notice that tasks will be started based on the cron's system
    # daemon's notion of time and timezones.
    #
    # Output of the crontab jobs (including errors) is sent through
    # email to the user the crontab file belongs to (unless redirected).
    #
    # For example, you can run a backup of all your user accounts
    # at 5 a.m every week with:
    # 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
    #
    # For more information see the manual pages of crontab(5) and cron(8)
    #
    # m h  dom mon dow   command
    
    */1 * * * *  sh /home/zhangchao/CVS/testCron/cron.sh
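
    If you would rather not open an editor, the same entry can be installed from a script. The helper below is only a sketch: append_once is a hypothetical name, and it merely filters text, so it is the surrounding pipeline with crontab -l and crontab - that actually reads and writes the crontab.

```shell
#!/bin/sh

# append_once LINE
# Copy stdin to stdout and add LINE at the end, unless an identical
# line is already present. Running it twice does not duplicate the entry.
append_once() {
    line=$1
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -F: fixed string (so * is literal), -x: whole-line match, -q: quiet
    if ! printf '%s\n' "$current" | grep -Fqx -- "$line"; then
        printf '%s\n' "$line"
    fi
}
```

    Usage: crontab -l 2>/dev/null | append_once '*/1 * * * *  sh /home/zhangchao/CVS/testCron/cron.sh' | crontab -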

     

    After saving the crontab, I found that there was no crontab log under /var/log/. The reason is that Ubuntu does not enable cron logging by default. To turn it on:

    Open the file with emacs /etc/rsyslog.d/50-default.conf and uncomment the line that begins with cron.*:
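
    The same change can also be made without an editor. This sketch assumes GNU sed (the default on Ubuntu) and assumes the line is commented out with a leading # as in the stock file; enable_cron_log is a hypothetical helper name.

```shell
#!/bin/sh

# enable_cron_log FILE
# Uncomment the "cron.*" selector line in FILE, editing the file in place.
# Other commented-out lines are left untouched.
enable_cron_log() {
    sed -i 's/^#[[:space:]]*\(cron\.\*\)/\1/' "$1"
}
```

    Run it from a root shell, for example: enable_cron_log /etc/rsyslog.d/50-default.conf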


    Then restart rsyslog: sudo service rsyslog restart

    Finally, tail -f /var/log/cron.log shows the crontab log, where you can see cron.sh being executed once every minute:
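
    To check the interval without watching the log scroll by, the matching entries can be counted. This sketch assumes the usual syslog format, in which cron records each started job on a line containing CMD (...); count_cron_runs is a hypothetical helper name.

```shell
#!/bin/sh

# count_cron_runs LOGFILE SCRIPT
# Print how many job-start lines in LOGFILE mention SCRIPT.
count_cron_runs() {
    # The first grep keeps only job-start lines, the second counts those naming SCRIPT.
    grep -F 'CMD (' "$1" | grep -cF -- "$2"
}
```

    Usage: count_cron_runs /var/log/cron.log /home/zhangchao/CVS/testCron/cron.sh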


     

    While we are at it, here is a quick review of common crontab schedules:

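    Every minute                          */1 * * * *   (same as * * * * *)

    Every hour, on the hour               0 * * * *

    Every day at 00:00                    0 0 * * *

    Every week, Sunday at 00:00           0 0 * * 0

    Every month, on the 1st at 00:00      0 0 1 * *

    Every year, on January 1 at 00:00     0 0 1 1 *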
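
    The cron shipped with Ubuntu also accepts nicknames such as @hourly and @daily in place of the five time fields (see crontab(5)). As an illustration, the correspondence with the schedules above can be written as a small shell function; expand_nickname is a hypothetical name used only here.

```shell
#!/bin/sh

# expand_nickname NAME
# Print the five-field schedule that a crontab nickname stands for.
# There is no nickname for "every minute".
expand_nickname() {
    case $1 in
        @yearly|@annually) echo '0 0 1 1 *' ;;
        @monthly)          echo '0 0 1 * *' ;;
        @weekly)           echo '0 0 * * 0' ;;
        @daily|@midnight)  echo '0 0 * * *' ;;
        @hourly)           echo '0 * * * *' ;;
        *) echo "unknown nickname: $1" >&2; return 1 ;;
    esac
}
```

    For example, a crontab line of @daily sh /home/zhangchao/CVS/testCron/cron.sh would run the script once a day at midnight.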

  • Original article: https://www.cnblogs.com/justinzhang/p/4500409.html