  • python_scrapy_filespipe rewrite (overriding Scrapy's FilesPipeline)

    Motivation: the downloaded files need to keep their original extensions, but Scrapy's built-in FilesPipeline does not offer that option, so its behavior has to be redefined by subclassing it. The code below is adapted from other people's examples.

    import time
    from urllib import parse

    from scrapy.pipelines.files import FilesPipeline


    class FileRenamePipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None, *, item=None):
            # the item keyword is accepted for compatibility with Scrapy >= 2.4, which passes it in
            print('_' * 100)  # debug separator
            timest = str(int(time.time() * 1000))  # millisecond timestamp to avoid name collisions
            # the download URL carries the quoted file name after a ';'
            # (e.g. ...;filename="report.pdf"); unquote twice and pull it out
            name = parse.unquote(parse.unquote(request.url).split(';')[1]).split('"')[1]
            if '.' in name:
                # keep the original extension; rsplit so dots inside the name itself are preserved
                stem, ext = name.rsplit('.', 1)
                file_name = stem + '_' + timest + '.' + ext
            else:
                file_name = name + '_' + timest
            return 'full/' + file_name
    # The pipeline is enabled per spider via custom_settings (this dict belongs on the
    # spider class, not the pipeline):
    custom_settings = {
        'ITEM_PIPELINES': {
            'spider_dataPlat.pipelines.FileRenamePipeline': 2,
        },
        'FILES_STORE': 'E:下载',  # file download path
    }

    # In the spider's parse callback, yield an item carrying the file URL
    # (FilesPipeline reads file_urls and writes the download results into files):
    items = SpiderFileItem()
    items['file_urls'] = [final_url]
    items['files'] = name.split('.')[0]
    yield items
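    The fragments above reference a SpiderFileItem and a spider that yields it, neither of which is shown in the post. The sketch below shows how those pieces might fit together, assuming the item is defined in spider_dataPlat/items.py and only needs the standard file_urls/files fields that FilesPipeline expects; the SpiderFileItem fields, the spider name, start URL, and link selector are illustrative assumptions rather than the original author's code.

    # items.py - minimal sketch of the item consumed by FilesPipeline
    # (file_urls/files are the field names FilesPipeline looks for by default)
    import scrapy

    class SpiderFileItem(scrapy.Item):
        file_urls = scrapy.Field()  # list of URLs to download
        files = scrapy.Field()      # filled in by the pipeline with download results


    # spider sketch - shows where custom_settings and the yielded item live
    class FileSpider(scrapy.Spider):
        name = 'file_spider'                        # assumed spider name
        start_urls = ['https://example.com/files']  # placeholder start URL

        custom_settings = {
            'ITEM_PIPELINES': {
                'spider_dataPlat.pipelines.FileRenamePipeline': 2,
            },
            'FILES_STORE': 'E:下载',  # file download path
        }

        def parse(self, response):
            # placeholder selector: hand every link on the page to the pipeline
            for final_url in response.css('a::attr(href)').getall():
                item = SpiderFileItem()
                item['file_urls'] = [response.urljoin(final_url)]
                yield item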
  • Original post: https://www.cnblogs.com/hejianlong/p/10237167.html