  • Scraping Amazon mobile phone listings with Python + Scrapy

    # -*- coding: utf-8 -*-

    # Define here the models for your scraped items
    #
    # See documentation in:
    # http://doc.scrapy.org/en/latest/topics/items.html

    import scrapy


    class AmazonItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        description = scrapy.Field()
        price = scrapy.Field()
        url = scrapy.Field()
        value = scrapy.Field()

    import scrapy


    class AmazonSpider(scrapy.Spider):
        name = 'amazon'
        allowed_domains = ['amazon.cn']
        base_url = 'http://www.amazon.cn/s/ref=sv_cps_0?ie=UTF8&node=665002051&page='
        start_urls = [base_url + '1']

        def parse(self, response):
            for item in response.xpath("//li[@class='s-result-item']"):
                # Yield the AmazonItem fields as a plain dict; "description" holds
                # the product title. .get() returns None instead of raising
                # IndexError when a node is missing.
                yield {
                    'description': item.xpath("./div/div[2]/div/a/h2/text()").get(),
                    'price': item.xpath("./div/div[3]/div[1]/a/span[1]/text()").get(),
                    'url': item.xpath("./div/div[1]/div/div/a[1]/@href").get(),
                }

            # The disabled pagination span holds the last page number. If it is
            # missing or not a number, stop instead of failing on int().
            last_page = response.xpath("//span[@class='pagnDisabled']/text()").get()
            if last_page and last_page.strip().isdigit():
                # Page 1 is the start URL; Scrapy's default duplicate filter drops
                # the repeat requests that later pages schedule again.
                for i in range(2, int(last_page) + 1):
                    yield scrapy.Request(self.base_url + str(i), callback=self.parse)
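The spider builds each results-page URL by appending the page number to the query string. Below is a sketch of the same step using only the standard library, so it can be checked without Scrapy or network access. It assumes pages run from 1 to the number shown in the `pagnDisabled` span and that page 1 is already covered by `start_urls`; `page_urls`, `BASE_URL` and `NODE` are hypothetical names, not part of the original project.

```python
from urllib.parse import urlencode

# Base URL and category node taken from the spider's start URL.
BASE_URL = 'http://www.amazon.cn/s/ref=sv_cps_0'
NODE = '665002051'


def page_urls(last_page):
    """Return the search URLs for pages 2..last_page.

    Page 1 is assumed to be fetched already via start_urls.
    """
    return [
        BASE_URL + '?' + urlencode({'ie': 'UTF8', 'node': NODE, 'page': n})
        for n in range(2, last_page + 1)
    ]


if __name__ == '__main__':
    for url in page_urls(3):
        print(url)
```

`urlencode` escapes the parameter values and keeps them in insertion order, so the category node or other query parameters can be changed without hand-editing the query string.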
  • Original post: https://www.cnblogs.com/tmyyss/p/4554173.html