  • Scraping decoration showcase images from a home-renovation website with Scrapy

    Crawling the image resources

    The spider file
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule
    import re
    import time
    from ..items import ZhuangxiuItem
    
    class ZhuangxiuspiderSpider(CrawlSpider):
        name = 'zhuangxiuSpider'
        allowed_domains = ['www.zhuangyi.com']
        start_urls = ['http://www.zhuangyi.com/xiaoguotu/keting/p1/']
    
        rules = (
            # Extract detail-page links; callback names the method that handles each response
            # Step 2: the next page of the category listing
            # Rule(LinkExtractor(allow=r'(.*?)/p\d+'), follow=True),
            # Step 3: detail pages
            Rule(LinkExtractor(allow=r'(.*?)\d+\.html'), follow=True, callback='parse_item'),
        )
    
        def parse_item(self, response):
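            # Collect every image URL on the detail page that matches the image host's path layout
            img_url_list = re.findall(r'http://pic\.zhuangyi\.com/Member/\d/\d+/./\d+\.jpg', response.text)
            item = ZhuangxiuItem()
            item['image_urls'] = img_url_list
            # No title is scraped from the page; the current timestamp is used as a placeholder
            item['title'] = time.time()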
            yield item
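
The patterns used by the spider can be checked offline with the standard `re` module before a crawl is started. The sketch below is only an illustration: the sample URLs and the HTML fragment are made up and are not taken from the site. `LinkExtractor` applies its `allow` patterns as regular-expression searches against each URL, which `re.search` imitates here.

```python
import re

# Patterns from the spider, with the backslashes written out
DETAIL_PAGE = re.compile(r'(.*?)\d+\.html')
NEXT_PAGE = re.compile(r'(.*?)/p\d+')
IMAGE_URL = re.compile(r'http://pic\.zhuangyi\.com/Member/\d/\d+/./\d+\.jpg')

# Hypothetical links, invented for this check
sample_links = [
    'http://www.zhuangyi.com/xiaoguotu/keting/p2/',
    'http://www.zhuangyi.com/xiaoguotu/12345.html',
    'http://www.zhuangyi.com/about/',
]

# Keep the links each pattern would allow
detail_links = [u for u in sample_links if DETAIL_PAGE.search(u)]
next_links = [u for u in sample_links if NEXT_PAGE.search(u)]

# Hypothetical page fragment: one image on the image host, one unrelated image
sample_html = (
    '<img src="http://pic.zhuangyi.com/Member/1/2345/a/678.jpg">'
    '<img src="http://www.zhuangyi.com/static/logo.png">'
)
image_urls = IMAGE_URL.findall(sample_html)

print(detail_links)
print(next_links)
print(image_urls)
```

Only the `.html` link counts as a detail page, only the `/p2/` link counts as pagination, and only the image on `pic.zhuangyi.com` is collected.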
    
    In items.py
    
    
    import scrapy
    
    
    class ZhuangxiuItem(scrapy.Item):
        # define the fields for your item here like:
        title = scrapy.Field()
        image_urls = scrapy.Field()
    
    In settings.py
    
    DEFAULT_REQUEST_HEADERS = {
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'en',
      'Referer': 'http://www.zhuangyi.com/'
    }
    
    
    IMAGES_STORE = 'img'
    ITEM_PIPELINES = {
       'scrapy.pipelines.images.ImagesPipeline': 300,
    }
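
With these settings the stock `ImagesPipeline` downloads every URL listed in `image_urls`. By default it names each file after the SHA-1 hash of its URL and places it in a `full/` directory under `IMAGES_STORE`, so the `title` field does not influence the file names. The sketch below reproduces that default naming with the standard library only. `default_image_path` is a hypothetical helper written for this illustration, not a Scrapy function, and the sample URL is made up.

```python
import hashlib
import os

IMAGES_STORE = 'img'  # same value as in the settings above


def default_image_path(url, store=IMAGES_STORE):
    """Mimic the default ImagesPipeline layout: <store>/full/<sha1 of url>.jpg."""
    image_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return os.path.join(store, 'full', image_guid + '.jpg')


# Hypothetical image URL, used only to show the resulting path shape
path = default_image_path('http://pic.zhuangyi.com/Member/1/2345/a/678.jpg')
print(path)
```

If readable file names are wanted instead, the usual approach is to subclass `ImagesPipeline` and override its `file_path` method; that goes beyond what the original post does.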
    
  • Original article: https://www.cnblogs.com/wangyue0925/p/11248709.html