  • Python Language Summary 4.2. Functions for processing strings (str, unicode, etc.)

    4.2.7. Removing control characters: removeCtlChr


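    Removes the ASCII control characters (every code below 0x20 except tab, LF and CR, plus DEL) so that the processed string no longer contains the control characters that make an XML document invalid.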

    #------------------------------------------------------------------------------
    # Remove control characters from the input string.
    # Otherwise the WordPress importer fails:
    # if the WXR file contains a control char, the import does not succeed.
    # e.g.:
    # 1. http://againinput4.blog.163.com/blog/static/172799491201110111145259/
    #    content contains some invalid ASCII control chars
    # 2. http://hi.baidu.com/notebookrelated/blog/item/8bd88e351d449789a71e12c2.html
    #    165th comment contains an invalid control char: ETX
    # 3. http://green-waste.blog.163.com/blog/static/32677678200879111913911/
    #    title contains control chars: DC1, BS, DLE, DLE, DLE, DC1
    def removeCtlChr(inputString):
        # control characters that are kept
        validChrList = [
            9,   # tab
            10,  # LF = Line Feed (newline)
            13,  # CR = Carriage Return
        ]
        validContent = ''
        for c in inputString:
            asciiVal = ord(c)
            # filter out the other ASCII control characters, and DEL (0x7F)
            isValidChr = True
            if asciiVal == 0x7F:
                isValidChr = False
            elif asciiVal < 32 and asciiVal not in validChrList:
                isValidChr = False

            if isValidChr:
                validContent += c

        return validContent
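
The same filtering rule can also be written as a single precompiled regular expression. The sketch below is an alternative formulation, not the original author's code; the names `removeCtlChrRegex` and `_CTL_CHR_PATTERN` are hypothetical, and Python 3 `str` input is assumed.

```python
import re

# Hypothetical alternative to removeCtlChr (assumes Python 3 str input).
# The character class covers 0x00-0x1F except tab (0x09), LF (0x0A)
# and CR (0x0D), plus DEL (0x7F): the same set removeCtlChr filters out.
_CTL_CHR_PATTERN = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')

def removeCtlChrRegex(inputString):
    # delete every matched control character, keep everything else
    return _CTL_CHR_PATTERN.sub('', inputString)
```

For long strings this avoids building the result one character at a time, at the cost of a less self-explanatory character class.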
           

    Example 4.11. Usage example for removeCtlChr

    # remove the control chars in the title,
    # e.g.:
    # http://green-waste.blog.163.com/blog/static/32677678200879111913911/
    # title contains control chars: DC1, BS, DLE, DLE, DLE, DC1
    infoDict['title'] = removeCtlChr(infoDict['title'])
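
To see the effect on XML parsing, here is a minimal self-contained sketch. It assumes Python 3, uses a compact copy of `removeCtlChr`, and invents a sample title containing the control characters named above (DC1, BS, DLE); the helper `isWellFormedXml` and the title text are hypothetical, and the standard-library parser stands in for the WordPress importer.

```python
import xml.etree.ElementTree as ET

def removeCtlChr(inputString):
    # compact copy of the function above: keep tab, LF and CR,
    # drop every other code below 32 and DEL (0x7F)
    validChrList = (9, 10, 13)
    return ''.join(
        c for c in inputString
        if ord(c) != 0x7F and (ord(c) >= 32 or ord(c) in validChrList)
    )

def isWellFormedXml(text):
    # hypothetical helper: True if text can be parsed as element content
    try:
        ET.fromstring('<title>' + text + '</title>')
        return True
    except ET.ParseError:
        return False

# invented sample title: DC1, BS, DLE, DLE, DLE, DC1 followed by plain text
infoDict = {'title': '\x11\x08\x10\x10\x10\x11My post title'}

rawOk = isWellFormedXml(infoDict['title'])
infoDict['title'] = removeCtlChr(infoDict['title'])
cleanOk = isWellFormedXml(infoDict['title'])
print(rawOk, cleanOk)
```

The raw title is rejected by the parser, while the cleaned title parses normally.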
               

    [Tip]     About control characters

    If you are not familiar with control characters, see: Function/control characters in the ASCII character set
  • Original article: https://www.cnblogs.com/lexus/p/3323632.html