  • Scrapy: crawling an XML (RSS) feed with XMLFeedSpider

    from scrapy.spiders import XMLFeedSpider
    # MyxmlItem is declared in the project's myxml/items.py and has a 'title' field
    from myxml.items import MyxmlItem
    
    class XmlspiderSpider(XMLFeedSpider):
        name = 'xmlspider'
        allowed_domains = ['sina.com.cn']
        start_urls = ['http://blog.sina.com.cn/rss/1165656262.xml']
        iterator = 'iternodes'  # fast regex-based default; 'xml' and 'html' are the alternatives
        itertag = 'rss'  # parse_node runs once per <rss> node, i.e. once for the whole feed
    
        def parse_node(self, response, selector):
            item = MyxmlItem()
            # Collect the titles of every <item> in the feed into a single list
            item['title'] = selector.xpath('/rss/channel/item/title/text()').getall()
            # item['url'] = selector.xpath('url').getall()
            # item['name'] = selector.xpath('name').getall()
            # item['description'] = selector.xpath('description').getall()
            for title in item['title']:
                print(title)
            return item
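Without a Scrapy project at hand, the difference between itertag = 'rss' (one parse_node call that sees the whole feed, as above) and itertag = 'item' (one call per <item> node) can be sketched with the standard library's xml.etree.ElementTree. This is an illustration only, not Scrapy's own iteration code: the sample feed and the function names all_titles and per_item are hypothetical, and ElementTree supports only a limited XPath subset, so the paths are written relative to the <rss> root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real feed from start_urls is not reproduced here.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def all_titles(rss_text):
    """Like itertag='rss': one call sees the whole document and
    returns every item title (cf. '/rss/channel/item/title/text()')."""
    root = ET.fromstring(rss_text)  # root is the <rss> element
    return [t.text for t in root.findall("./channel/item/title")]


def per_item(rss_text):
    """Like itertag='item': one record per <item> node, using paths
    relative to that node (cf. 'title/text()' in Scrapy)."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title in all_titles(SAMPLE_RSS):
        print(title)
    print(per_item(SAMPLE_RSS))
```

Note that the channel's own <title> is not picked up by either function, because both paths go through <item>. Iterating per <item> is usually the more convenient shape for feeds, since each entry becomes its own record instead of one list of titles.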
  • Original article: https://www.cnblogs.com/Erick-L/p/6835510.html
Copyright © 2011-2022 走看看