  • How to control export order in Scrapy

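    1. The problem

    When exporting items with Scrapy, I found that the fields came out in the wrong order (they appear to be sorted alphabetically) rather than in the order defined in items.py. How can the order be controlled?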

    2. fields_to_export

    While reading the official documentation I found this attribute, which is described as follows:

    fields_to_export

    A list with the name of the fields that will be exported, or None if you want to export all fields. Defaults to None.

    Some exporters (like CsvItemExporter) respect the order of the fields defined in this attribute.

    When using item objects that do not expose all their possible fields, exporters that do not support exporting a different subset of fields per item will only export the fields found in the first item exported. Use fields_to_export to define all the fields to be exported.

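    In short: this attribute (a list) controls which fields are exported, and some exporters, such as CsvItemExporter, also write the fields in the order given in the list.

    So passing a fields_to_export argument when constructing the exporter is enough to control which fields are exported and, for exporters that respect it, in what order.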

    3. Example

    pipelines.py

    from scrapy.exporters import CsvItemExporter, JsonLinesItemExporter

    # Fields to export, listed in the desired output order.
    fields_to_export = ['city_name', 'house_addr', 'house_class', 'house_size', 'house_facility', 'house_price',
                        'house_release_time']


    class JsonLinesItemPipeline:

        def open_spider(self, spider):
            # Exporters write bytes, so the file is opened in binary mode.
            self.file = open('storages/renting.jl', 'wb')
            self.exporter = JsonLinesItemExporter(self.file, encoding='utf-8', fields_to_export=fields_to_export)
            self.exporter.start_exporting()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

        def close_spider(self, spider):
            # Signal the end of the export before closing the file.
            self.exporter.finish_exporting()
            self.file.close()


    class CsvItemPipeline:

        def open_spider(self, spider):
            self.file = open('storages/renting.csv', 'wb')
            # CsvItemExporter writes its columns in the order given by fields_to_export.
            self.exporter = CsvItemExporter(self.file, fields_to_export=fields_to_export)
            self.exporter.start_exporting()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

        def close_spider(self, spider):
            self.exporter.finish_exporting()
            self.file.close()
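
    For these pipelines to run, they also need to be enabled in the project's settings.py. A sketch of that fragment is below; the package name myproject is hypothetical (the original post does not give the project name), and the priority numbers are arbitrary values that only determine the order in which the pipelines run.

```python
# settings.py (fragment)
# 'myproject' is a hypothetical package name; replace it with your project's module path.
ITEM_PIPELINES = {
    'myproject.pipelines.JsonLinesItemPipeline': 300,
    'myproject.pipelines.CsvItemPipeline': 400,
}
```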
    

  • Original article: https://www.cnblogs.com/pineapple-py/p/14613390.html