Scrapy: downloading images

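Scrapy's image pipeline needs a third-party library:

Install the Pillow library (the command below uses the Douban PyPI mirror):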

pip install -i https://pypi.doubanio.com/simple pillow

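How can images be downloaded automatically?

First, be clear on three things: where the image URLs come from, how the images get downloaded, and where they are saved.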

settings.py:

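import os

ITEM_PIPELINES = {
    'ArticleSpider.pipelines.JobbolePipeline': 2,
    # Register Scrapy's built-in image pipeline; the lower the value, the earlier it runs
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
# Name of the item field that holds the image URLs (its value must be a list of URLs)
IMAGES_URLS_FIELD = "img_url"
# Store downloaded images in an images directory next to this settings file
project_dir = os.path.abspath(os.path.dirname(__file__))
IMAGES_STORE = os.path.join(project_dir, 'images')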
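
Because the pipeline iterates over the value of the field named by IMAGES_URLS_FIELD, the spider has to put a list of absolute URLs there, even for a single image. A minimal sketch of that step, using only the standard library; build_item and the example.com URLs are hypothetical and not part of the original project:

```python
from urllib.parse import urljoin


def build_item(page_url, img_src):
    """Hypothetical helper: build the dict a spider callback might yield."""
    # Resolve a relative src against the page URL so the downloader gets an absolute URL
    absolute = urljoin(page_url, img_src)
    # The image pipeline loops over the field's value, so wrap the single URL in a list
    return {"img_url": [absolute]}


item = build_item("https://example.com/post/1.html", "/media/cover.png")
print(item)
```

In a real spider the same dict (or an Item with an img_url field) would be yielded from the parse callback.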

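How do we get the path of a downloaded image and save it?

In pipelines.py, define a class for handling images that inherits from Scrapy's own image pipeline class. For Scrapy to call it, this class has to be registered in ITEM_PIPELINES in place of the built-in image pipeline.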

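from scrapy.pipelines.images import ImagesPipeline


class JobboleImagerPipeline(ImagesPipeline):
    """
    Record the storage path of each downloaded image on the item.
    """
    def item_completed(self, results, item, info):
        # Only handle items that carry the image URL field
        if 'img_url' in item:
            # results is a list of (success, value) tuples, one per image URL
            for ok, value in results:
                # On failure, value is a Failure object and has no 'path' key
                if ok:
                    # The path is relative to IMAGES_STORE; the last image wins
                    item['img_path'] = value['path']
        return item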
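
To make the shape of results easier to see, here is a small standalone sketch that runs without Scrapy. The sample data is made up for illustration, a plain Exception stands in for the Failure object Scrapy would pass on a failed download, and successful_paths is a hypothetical helper name:

```python
def successful_paths(results):
    """Return the stored path of every image that downloaded successfully."""
    return [value["path"] for ok, value in results if ok]


# Made-up sample in the shape of the results argument of item_completed
sample_results = [
    (True, {"url": "https://example.com/a.jpg", "path": "full/a.jpg", "checksum": "0" * 32}),
    (False, Exception("download failed")),  # stand-in for a Twisted Failure
]

paths = successful_paths(sample_results)
print(paths)
```

Collecting every successful path into a list, rather than keeping a single value, may be preferable when an item carries more than one image URL.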

Original article: https://www.cnblogs.com/2bjiujiu/p/7237441.html