  • Scrapy image-scraping example

    This is very simple, so here is the spider code directly.

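    # -*- coding: utf-8 -*-
    import logging
    import urllib.request

    import scrapy


    class TopitComSpider(scrapy.Spider):
        name = "topit.com"
        # The original listed "topit.com", which does not match the start URL's domain.
        allowed_domains = ["topit.me"]
        start_urls = [
            'http://www.topit.me',
        ]

        def parse(self, response):
            # The first 8 items carry the image URL in @src; later items keep it
            # in @data-original (presumably because they are lazy-loaded).
            image_urls1 = response.xpath("//div[@class='catalog']/div[@class='e m'][position()<=8]/a/img/@src").extract()
            image_urls2 = response.xpath("//div[@class='catalog']/div[@class='e m'][position()>8]/a/img/@data-original").extract()
            image_urls = image_urls1 + image_urls2
            for counter, url in enumerate(image_urls):
                # Resolve relative URLs against the page URL.
                url = response.urljoin(url)
                # Python 3 moved urlretrieve to urllib.request; /root/pic/ must already exist.
                urllib.request.urlretrieve(url, "/root/pic/" + str(counter) + '.jpg')
                logging.debug(url)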

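    Open question:

    When matching with XPath, joining the two expressions with `or` matched nothing, so I had to match them separately and merge the results. I don't know why; if anyone knows, please tell me. Thanks.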
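
    A likely explanation, offered as a sketch rather than a confirmed diagnosis of the original page: in XPath 1.0, `or` is a boolean operator, so `a or b` evaluates to true or false instead of returning nodes. Combining two node-sets is done with the union operator `|`, which returns the matches in document order. The example below uses lxml directly (Scrapy's selectors are built on it) against a made-up two-item page. The HTML, the example.com URLs, and the cut-off of 1 instead of 8 are assumptions for illustration only.

```python
from lxml import html

# Hypothetical miniature of the page: the first item has a real src,
# the second keeps its real URL in data-original.
SAMPLE = """
<html><body>
<div class="catalog">
  <div class="e m"><a><img src="http://example.com/1.jpg"></a></div>
  <div class="e m"><a><img src="placeholder.gif" data-original="http://example.com/2.jpg"></a></div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# "|" is the XPath union operator: it merges both node-sets into one result.
union_xpath = (
    "//div[@class='catalog']/div[@class='e m'][position()<=1]/a/img/@src"
    " | "
    "//div[@class='catalog']/div[@class='e m'][position()>1]/a/img/@data-original"
)
image_urls = [str(u) for u in tree.xpath(union_xpath)]
print(image_urls)

# "or" is a boolean operator: the whole expression collapses to True/False.
bool_result = tree.xpath("//img/@src or //img/@data-original")
print(bool_result)
```

    In the spider, the same `|` expression could be passed to a single `response.xpath(...)` call in place of the two separate queries.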

  • Original article: https://www.cnblogs.com/gordon0918/p/6531861.html