  • Converting Scrapy item fields to Simplified or Traditional Chinese

    1. Install hanziconv
    Install hanziconv, a package that converts between Simplified and Traditional Chinese:

    pip install hanziconv

    2. Define a custom item pipeline
    Open the pipelines.py file in your project and add the custom pipeline:

    from hanziconv import HanziConv


    class HanziconvPipeline:
        """Convert the strings in item['project_info'] to Traditional Chinese."""

        def process_item(self, item, spider):
            project_info = item['project_info']
            for key, value in project_info.items():
                if value is None:
                    # value is None: initialize it to an empty string
                    project_info[key] = ""
                elif isinstance(value, str):
                    # Text value: convert it to Traditional Chinese
                    value = HanziConv.toTraditional(value)
                    spider.logger.debug("%s: %s", key, value)
                    project_info[key] = value
                # Any other type (numbers, lists, ...) is left unchanged
            return item

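    This code is taken from my own project: when a value is a text string (unicode in Python 2, str in Python 3), it is converted to Traditional Chinese, and None values are replaced with an empty string.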
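    The pipeline only handles the flat project_info dict. A minimal, dependency-free sketch of the same rule (convert strings, turn None into "", leave other values alone) that also walks nested dicts and lists could look like this. The helper name convert_strings and the five-character mapping table are hypothetical and exist only so the example runs without hanziconv installed; in a real pipeline you would pass HanziConv.toTraditional or HanziConv.toSimplified as the convert argument.

```python
from typing import Any, Callable


def convert_strings(data: Any, convert: Callable[[str], str]) -> Any:
    """Return a copy of data with every str passed through convert.

    None becomes "" (the same rule as the pipeline); other scalars are
    returned unchanged. Nested dicts and lists are processed recursively.
    """
    if data is None:
        return ""
    if isinstance(data, str):
        return convert(data)
    if isinstance(data, dict):
        return {key: convert_strings(value, convert) for key, value in data.items()}
    if isinstance(data, list):
        return [convert_strings(value, convert) for value in data]
    return data


# Hypothetical stand-in for HanziConv.toTraditional: a tiny
# Simplified -> Traditional table, for testing only.
_TOY_TABLE = str.maketrans({"简": "簡", "体": "體", "转": "轉", "换": "換", "汉": "漢"})


def toy_to_traditional(text: str) -> str:
    return text.translate(_TOY_TABLE)


if __name__ == "__main__":
    info = {"title": "简体转换", "tags": ["汉字", None], "count": 3, "extra": None}
    print(convert_strings(info, toy_to_traditional))
    # {'title': '簡體轉換', 'tags': ['漢字', ''], 'count': 3, 'extra': ''}
```

    Inside process_item this would become item['project_info'] = convert_strings(item['project_info'], HanziConv.toTraditional). Unlike the original loop, it builds a new dict instead of modifying the existing one in place.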

    To convert Traditional Chinese to Simplified instead, replace toTraditional with toSimplified.
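    Instead of editing the code to switch direction, the direction could be read from settings.py. A sketch, assuming a hypothetical custom setting named HANZICONV_DIRECTION ("traditional" or "simplified"); from_crawler is Scrapy's standard hook for building a pipeline from the crawler's settings, and hanziconv is imported inside it so the class itself can be exercised with any converter function.

```python
from typing import Callable


class ConfigurableHanziconvPipeline:
    """Convert the strings in item['project_info'] in a configurable direction."""

    def __init__(self, convert: Callable[[str], str]):
        self.convert = convert

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class can be used/tested without hanziconv installed.
        from hanziconv import HanziConv

        converters = {
            "traditional": HanziConv.toTraditional,
            "simplified": HanziConv.toSimplified,
        }
        # HANZICONV_DIRECTION is a hypothetical custom setting, not a Scrapy built-in.
        direction = crawler.settings.get("HANZICONV_DIRECTION", "traditional")
        if direction not in converters:
            raise ValueError(
                f"HANZICONV_DIRECTION must be one of {sorted(converters)}, got {direction!r}"
            )
        return cls(converters[direction])

    def process_item(self, item, spider):
        project_info = item["project_info"]
        for key, value in project_info.items():
            if value is None:
                project_info[key] = ""  # same rule as before: None -> empty string
            elif isinstance(value, str):
                project_info[key] = self.convert(value)
        return item
```

    With this variant, adding HANZICONV_DIRECTION = "simplified" to settings.py would switch the conversion without touching pipelines.py.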

    3. Register the pipeline
    In settings.py, find ITEM_PIPELINES and add the custom pipeline:

    ITEM_PIPELINES = {
        'scrapy_redis.pipelines.RedisPipeline': 400,
        '<project_name>.pipelines.HanziconvPipeline': 300,
    }

    Note: replace <project_name> with the name of your own project!
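    Scrapy runs item pipelines in ascending order of these integer values (conventionally in the 0 to 1000 range), so with 300 the conversion happens before RedisPipeline (400) pushes the item to Redis. The snippet below only mimics that ordering rule with a plain sort to show the resulting run order; it is an illustration, not Scrapy's own code.

```python
# Same dict as in settings.py (<project_name> left as a placeholder).
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 400,
    "<project_name>.pipelines.HanziconvPipeline": 300,
}

# Lower value = earlier in the chain, so sort the paths by their priority.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)

if __name__ == "__main__":
    for position, path in enumerate(run_order, start=1):
        print(position, path)
    # 1 <project_name>.pipelines.HanziconvPipeline
    # 2 scrapy_redis.pipelines.RedisPipeline
```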

    Reposted from https://blog.csdn.net/weixin_34082854/article/details/87429754

  • Original article: https://www.cnblogs.com/hankleo/p/10507664.html