  • Scrapy Learning 4: The Items Class & the Pipelines Class

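    Using the Items class
    Purpose
      Items make field names easy to work with. Every field is declared once in items.py, and the item is then read and written like a dict. Assigning to an undeclared (for example, misspelled) field raises a KeyError.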

    Define our own class in items.py
    import scrapy


    class ArticleItem(scrapy.Item):
        # Each Field() declares one key that the item is allowed to hold
        title = scrapy.Field()
        create_time = scrapy.Field()
        url = scrapy.Field()
        url_id = scrapy.Field()            # MD5 digest of url, used as a fixed-length ID
        front_image_url = scrapy.Field()
        front_image_path = scrapy.Field()
        praise_nums = scrapy.Field()
        comment_nums = scrapy.Field()
        fav_nums = scrapy.Field()
        tags = scrapy.Field()
        content = scrapy.Field()
     
    In the spider, import ArticleItem and fill it in parse
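    import hashlib

    from ArticleSpider.items import ArticleItem  # package name assumed; adjust to your project


    def parse(self, response):
        # title, create_time, url, praise_nums, comment_nums, fav_nums, tags,
        # front_image_url and content are assumed to have been extracted from
        # response earlier in this method (that extraction code is not shown)
        article_item = ArticleItem()

        article_item['title'] = title
        article_item['create_time'] = create_time
        article_item['url'] = url

        # hashlib only accepts bytes in Python 3, so encode the URL before hashing
        m = hashlib.md5()
        m.update(url.encode('utf-8'))
        article_item['url_id'] = m.hexdigest()

        article_item['praise_nums'] = praise_nums
        article_item['comment_nums'] = comment_nums
        article_item['fav_nums'] = fav_nums
        article_item['tags'] = tags
        article_item['front_image_url'] = front_image_url
        article_item['content'] = content

        # Every yielded item is handed to the enabled item pipelines
        yield article_item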
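    Under Python 3, `m.update(url)` with a `str` URL raises `TypeError`, because hashlib works on bytes. The sketch below factors the URL-to-ID step into a small helper. The name `get_md5` is hypothetical. The helper accepts either text or bytes, and its output is always a 32-character hex string, which makes it a convenient fixed-length key for `url_id`.

```python
import hashlib


def get_md5(url):
    """Return the 32-character hex MD5 digest of a URL (str or bytes)."""
    if isinstance(url, str):
        # hashlib hashes bytes, so text must be encoded first
        url = url.encode("utf-8")
    return hashlib.md5(url).hexdigest()


if __name__ == "__main__":
    url_id = get_md5("https://www.cnblogs.com/cq146637/p/9053187.html")
    print(url_id, len(url_id))  # length is always 32
```

    Inside the spider, the three hashing lines then collapse to `article_item['url_id'] = get_md5(response.url)`.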

    The Pipelines class

    Steps
      In parse, fill an item with values and yield it. Scrapy then passes the item to the pipelines, where the data is processed.
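    The hand-off from parse to the pipelines can be pictured as a chain. The sketch below is a deliberately simplified model, not Scrapy's actual engine code. Each enabled pipeline's `process_item` receives the value returned by the previous one. The two pipeline classes and `run_pipelines` are hypothetical names. A plain dict stands in for an ArticleItem, which supports the same `[]` and `.get` access.

```python
class StripTitlePipeline:
    """Hypothetical first stage: tidy up the title."""

    def process_item(self, item, spider):
        item["title"] = item["title"].strip()
        return item  # must return the item so the next pipeline receives it


class UppercaseTagsPipeline:
    """Hypothetical second stage: normalise the tags."""

    def process_item(self, item, spider):
        item["tags"] = [tag.upper() for tag in item.get("tags", [])]
        return item


def run_pipelines(item, pipelines, spider=None):
    # Simplified model of what happens to every yielded item: call
    # process_item on each pipeline in order, feeding each one the
    # value returned by the previous one.
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


if __name__ == "__main__":
    scraped = {"title": "  Scrapy Items  ", "tags": ["python", "scrapy"]}
    result = run_pipelines(scraped, [StripTitlePipeline(), UppercaseTagsPipeline()])
    print(result)  # {'title': 'Scrapy Items', 'tags': ['PYTHON', 'SCRAPY']}
```

    This is also why every `process_item` ends with `return item`. If one stage returned `None`, the next stage would receive `None` instead of the item.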
     
    The default class (generated in pipelines.py)
    class ArticlespiderPipeline(object):
        def process_item(self, item, spider):
            # Process the item here, then return it so later pipelines receive it
            return item
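    As an example of real processing in `process_item`, here is a sketch of a custom pipeline. The name `NumberCleanPipeline` is hypothetical. It assumes the spider stores `praise_nums`, `comment_nums` and `fav_nums` as raw strings that may contain surrounding text. The pipeline turns each count into an `int`, treating a value with no digits as 0, and leaves fields the spider never filled in untouched.

```python
import re


class NumberCleanPipeline:
    """Hypothetical pipeline: turn scraped count strings into ints."""

    # Count fields declared on ArticleItem in items.py
    NUMBER_FIELDS = ("praise_nums", "comment_nums", "fav_nums")

    def process_item(self, item, spider):
        for field in self.NUMBER_FIELDS:
            raw = item.get(field)
            if raw is None:
                continue  # field not filled in by the spider
            match = re.search(r"\d+", str(raw))
            # Assumption: a value with no digits (e.g. an empty label) means 0
            item[field] = int(match.group()) if match else 0
        return item  # hand the item on to the next pipeline


if __name__ == "__main__":
    item = {"praise_nums": "3", "comment_nums": " 12 comments", "fav_nums": "fav"}
    print(NumberCleanPipeline().process_item(item, spider=None))
    # {'praise_nums': 3, 'comment_nums': 12, 'fav_nums': 0}
```

    A pipeline only runs once it is enabled in settings.py through the `ITEM_PIPELINES` dict. An example entry is `ITEM_PIPELINES = {"ArticleSpider.pipelines.NumberCleanPipeline": 300}`, where the package path is assumed. The values are integers in the 0-1000 range, and items pass through lower-valued pipelines first.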
     
     


  • Original article: https://www.cnblogs.com/cq146637/p/9053187.html