  • scrapy snippet

    1. Spider file

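    # Spider: collect the image URLs of a Wikipedia page for ImagesPipeline.
    # Modern import paths: scrapy.contrib.* and HtmlXPathSelector were removed
    # from Scrapy; selectors are now available directly on the response.
    from scrapy.spiders import CrawlSpider

    from myproject.items import DomzItem  # "myproject" is a placeholder project name


    class MySpider(CrawlSpider):
        name = "wikipedia"
        allowed_domains = ["wikipedia.org"]
        start_urls = [
            "http://en.wikipedia.org/wiki/Pune",
        ]
        # Control download speed: about 2 s between requests
        # (randomized to 0.5x-1.5x by default, see RANDOMIZE_DOWNLOAD_DELAY)
        download_delay = 2

        # No link-following rules are defined, so only the start pages are
        # processed; CrawlSpider passes each start_urls response to this method.
        def parse_start_url(self, response):
            item = DomzItem()
            image_urls = response.xpath('//img/@src').getall()
            # Wikipedia's src values are typically protocol-relative
            # ("//upload.wikimedia.org/..."); urljoin makes them (and relative paths) absolute.
            item['image_urls'] = [response.urljoin(x) for x in image_urls]
            return item

    # Pause/resume: keep the crawl state in a job directory (JOBDIR).
    $ scrapy crawl somespider -s JOBDIR=crawls/somespider-1
    # After the crawl starts, press Ctrl+C (once) to stop it; run the same command to resume:
    $ scrapy crawl somespider -s JOBDIR=crawls/somespider-1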
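    Prefixing each `src` with `"http:"` is correct only for protocol-relative values such as `//upload.wikimedia.org/...`; resolving the `src` against the page URL handles every form. A minimal standard-library sketch comparing the two, using `urllib.parse.urljoin` (Scrapy's `response.urljoin` performs the same kind of resolution against the response URL). The sample `src` values and the helper names `prefix_http` and `resolve` are hypothetical:

```python
from urllib.parse import urljoin

PAGE_URL = "http://en.wikipedia.org/wiki/Pune"

# Hypothetical src values of the three shapes an <img> tag may carry
SRCS = [
    "//upload.wikimedia.org/wikipedia/commons/a/a1/Example.jpg",  # protocol-relative
    "/static/images/icons/wikipedia.png",                          # root-relative
    "https://example.org/logo.png",                                 # already absolute
]


def prefix_http(src):
    """String-prefix approach: correct only for protocol-relative URLs."""
    return "http:" + src


def resolve(page_url, src):
    """Resolve src against the page URL, as a browser would."""
    return urljoin(page_url, src)


if __name__ == "__main__":
    for src in SRCS:
        print(f"{prefix_http(src):60} {resolve(PAGE_URL, src)}")
```

    For the first URL both approaches agree; for the root-relative and absolute values only `urljoin` yields a usable URL (the prefix gives `http:/static/...` and `http:https://...`).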
    

    2. Settings file (settings.py)

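    # Enable the built-in images pipeline (class path -> order, 0-1000); requires Pillow
    ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
    IMAGES_STORE = '...'  # directory where downloaded images are saved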
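    ImagesPipeline downloads every URL in `image_urls` and writes the result under `IMAGES_STORE`. As far as I know, recent Scrapy versions name each file after the SHA-1 hash of its URL, inside a `full/` subdirectory, with a `.jpg` extension (the pipeline converts images to JPEG). The sketch below reproduces that naming with the standard library so you can predict where an image will land; treat it as an approximation of `ImagesPipeline.file_path`, which may differ between versions or be overridden. The helper `expected_image_path` and the store directory in the example are hypothetical:

```python
import hashlib
import os


def expected_image_path(store, url):
    """Approximate where ImagesPipeline saves `url` under IMAGES_STORE.

    Assumption: default naming scheme full/<sha1 hex digest of the URL>.jpg
    """
    guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(store, "full", f"{guid}.jpg")


if __name__ == "__main__":
    # "downloaded_images" is a hypothetical IMAGES_STORE value
    print(expected_image_path(
        "downloaded_images",
        "http://upload.wikimedia.org/wikipedia/commons/a/a1/Example.jpg",
    ))
```

    Hashing the URL gives a stable, filesystem-safe name, so the same image URL always maps to the same file across runs.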
    

    3. Items file (items.py)

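    from scrapy import Field, Item
    class DomzItem(Item):
        image_urls = Field()  # URLs for ImagesPipeline to download
        images = Field()      # download results, filled in by ImagesPipeline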
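    After ImagesPipeline processes an item, it fills the `images` field with one dict per successfully downloaded image; in recent Scrapy versions each dict carries at least `url`, `path` (relative to `IMAGES_STORE`) and `checksum`, and failed downloads are simply left out. A minimal sketch of a post-processing step that works on any mapping shaped like that (a `scrapy.Item` supports `.get`); the helper names and the sample item values are made up for illustration:

```python
def stored_paths(item):
    """Map each downloaded image URL to its path relative to IMAGES_STORE."""
    return {result["url"]: result["path"] for result in item.get("images", [])}


def missing_images(item):
    """URLs listed in image_urls with no entry in images (e.g. failed downloads)."""
    done = {result["url"] for result in item.get("images", [])}
    return [url for url in item.get("image_urls", []) if url not in done]


if __name__ == "__main__":
    # Made-up item, shaped like ImagesPipeline output
    item = {
        "image_urls": ["http://example.org/a.png", "http://example.org/b.png"],
        "images": [
            {"url": "http://example.org/a.png",
             "path": "full/made-up-name.jpg",
             "checksum": "made-up-checksum"},
        ],
    }
    print(stored_paths(item))
    print(missing_images(item))
```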
    
  • Original post: https://www.cnblogs.com/bushe/p/4003392.html