  • Converting item fields to Simplified or Traditional Chinese in Scrapy

    1. Install hanziconv
    Install a package that converts between Simplified and Traditional Chinese:

    pip install hanziconv

    2. Write a custom item pipeline
    Open the pipelines.py file in your project

    and add the custom pipeline:

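    from hanziconv import HanziConv


    class HanziconvPipeline:
        """Convert every text value in item['project_info'] to Traditional Chinese."""

        def process_item(self, item, spider):
            project_info = item['project_info']
            for key, value in project_info.items():
                if value is None:
                    # Missing value: initialize it to an empty string
                    project_info[key] = ""
                elif isinstance(value, str):
                    # In Python 3, str is already Unicode text (Python 2 used `unicode`)
                    value = HanziConv.toTraditional(value)
                    print(key, value)  # debug output
                    project_info[key] = value
                # Non-text values (numbers, etc.) are left unchanged
            return item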

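    This code comes from my own project: if a value is a Unicode string, it is converted to Traditional Chinese.

    To convert Traditional to Simplified instead, change toTraditional to toSimplified.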
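
The per-field logic of the pipeline can also be pulled out into a small helper so that the caller picks the conversion direction. The following is a minimal sketch rather than the author's code: `convert_text_fields`, `fake_to_traditional`, and the sample `project_info` values are hypothetical names introduced for illustration. The stand-in converter only knows two characters, so the sketch runs without hanziconv installed.

```python
def convert_text_fields(fields, convert):
    """Return a copy of `fields` with every str value passed through `convert`.

    None values become empty strings; other non-str values are kept as-is.
    """
    result = {}
    for key, value in fields.items():
        if value is None:
            result[key] = ""
        elif isinstance(value, str):
            result[key] = convert(value)
        else:
            result[key] = value
    return result


# Stand-in converter for the demo only: a two-character table,
# NOT a real Simplified-to-Traditional mapping.
_DEMO_TABLE = str.maketrans({"简": "簡", "体": "體"})


def fake_to_traditional(text):
    return text.translate(_DEMO_TABLE)


project_info = {"name": "简体", "floors": 12, "remark": None}
converted = convert_text_fields(project_info, fake_to_traditional)
print(converted)
```

In a real project you would pass `HanziConv.toTraditional` or `HanziConv.toSimplified` as the `convert` argument instead of the stand-in.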

    3. Configure the project pipeline
    Find ITEM_PIPELINES in settings.py
    and add the custom pipeline:

    ITEM_PIPELINES = {
        'scrapy_redis.pipelines.RedisPipeline': 400,
        '<project_name>.pipelines.HanziconvPipeline': 300,
    }

    Warning: you must replace <project_name> with the name of your own project!
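
Scrapy runs the enabled pipelines in ascending order of these integer values, so with the settings above the conversion pipeline (300) processes each item before `RedisPipeline` (400) receives it. A small sketch that makes the ordering explicit; `myproject` is a placeholder project name, not taken from the original post:

```python
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 400,
    "myproject.pipelines.HanziconvPipeline": 300,  # placeholder project name
}

# A lower value means the pipeline runs earlier in the chain.
execution_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
for path in execution_order:
    print(ITEM_PIPELINES[path], path)
```

If the values were swapped, the items would reach Redis before their text had been converted.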

    Reposted from https://blog.csdn.net/weixin_34082854/article/details/87429754

  • Original article: https://www.cnblogs.com/hankleo/p/10507664.html