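1. Create an images folder in the project to store the downloaded images.
2. Install the module required for image handling:
    pip install pillow
3. Edit the settings file to enable the pipelines: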
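    import os

    ITEM_PIPELINES = {
        'ArticleSpider.pipelines.ArticlespiderPipeline': 300,
        # Custom image pipeline defined in pipelines.py (step 4); the lower number runs first
        'ArticleSpider.pipelines.ArticleImagePipeline': 1,
    }
    # Item field that holds the list of image URLs to download
    IMAGES_URLS_FIELD = 'front_image_url'
    # Store downloaded images in the images folder next to this settings file
    PROJECT_DIR = os.path.abspath(os.path.dirname(__file__))
    IMAGES_STORE = os.path.join(PROJECT_DIR, 'images')
    # Skip images smaller than this size (in pixels)
    IMAGES_MIN_HEIGHT = 100
    IMAGES_MIN_WIDTH = 100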
Note
The image pipeline treats the image URL field as a list by default, so when assigning the item field, wrap the value in a list: [ image_name ].
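The note above can be illustrated with a minimal sketch. It assumes a hypothetical helper, build_item, and uses a plain dict in place of the real Scrapy item; the title and URL are made-up sample values. Only the front_image_url field name comes from the settings above.

```python
def build_item(title, front_image_url):
    """Build an item-like dict; a plain dict stands in for the Scrapy item here."""
    return {
        "title": title,
        # The field named by IMAGES_URLS_FIELD must hold a list of URLs,
        # even when there is only one image.
        "front_image_url": [front_image_url],
    }


# Sample values for illustration only
item = build_item("Example post", "https://example.com/cover.jpg")
print(item["front_image_url"])
```

If the field held a bare string instead, the pipeline would iterate over it character by character and try to treat each character as a URL, which is why the list wrapper matters.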
4. While saving the image, we can also record its local path on the item so that it can be looked up later:
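    # Custom pipeline: add this class to pipelines.py
    from scrapy.pipelines.images import ImagesPipeline


    class ArticleImagePipeline(ImagesPipeline):
        def item_completed(self, results, item, info):
            # results is a list of (success, value) tuples, one per image URL
            for ok, value in results:
                if ok:  # on failure, value is an error object and has no 'path' key
                    item['front_image_path'] = value['path']
            # Return the item once the path is set: given the priorities configured
            # in settings, this pipeline then passes the item on to the next pipeline
            return item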
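To make the shape of results more concrete, here is a small standalone sketch that runs without Scrapy. The helper name first_image_path and the sample data are assumptions for illustration; the tuple layout (a success flag followed by either a dict with a path key or an error object) mirrors what item_completed receives.

```python
def first_image_path(results):
    """Return the stored path of the first successful download, or None."""
    for ok, value in results:
        if ok:
            # On success, value is a dict describing the stored file
            return value["path"]
    return None


# Made-up sample data: one failed download followed by one success
sample_results = [
    (False, Exception("download failed")),
    (True, {"url": "https://example.com/cover.jpg", "path": "full/abc123.jpg"}),
]
print(first_image_path(sample_results))
```

Checking the success flag before reading the path avoids an error when a download fails, since the failure entry is not a dict.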