  • Converting file encodings with Python

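    I previously wrote a way to convert file encodings with PowerShell, but PowerShell's support for this is incomplete, so it was not very practical. Here is another version written in Python.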

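    Approach

    Detect the file's encoding. If it is not UTF-8, convert it to UTF-8; otherwise skip it. ASCII files are skipped too, because pure ASCII text is already valid UTF-8.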
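    The whole decision rests on "is this already UTF-8?". A minimal stdlib-only sketch of that one check, as a simpler alternative to chardet: try a strict UTF-8 decode. `is_valid_utf8` is a hypothetical helper, not part of the original script. It also shows why ASCII can be skipped, since ASCII bytes always decode as UTF-8. Its limitation is that it cannot name the source encoding when the check fails, which is why the script below uses chardet instead.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode: any invalid byte sequence raises
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8("hello".encode("ascii")))  # True: ASCII is a subset of UTF-8
    print(is_valid_utf8("中文".encode("utf-8")))    # True
    print(is_valid_utf8("中文".encode("gbk")))      # False: GBK bytes are not valid UTF-8
```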

    Code

    import sys

    import chardet


    def findEncoding(s):
        # Read the raw bytes and let chardet guess the encoding
        with open(s, mode='rb') as f:
            buf = f.read()
        result = chardet.detect(buf)
        # May be None, e.g. for an empty file
        return result['encoding']


    def convertEncoding(s):
        encoding = findEncoding(s)
        if encoding is None:
            print("%s: unable to detect encoding, skipped" % s)
            return
        # UTF-8 and ASCII (a subset of UTF-8) need no conversion
        if encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s %s to utf-8" % (s, encoding))
            # newline='' keeps the original line endings untouched
            with open(s, "r", encoding=encoding, newline='') as sourceFile:
                contents = sourceFile.read()

            with open(s, "w", encoding="utf-8", newline='') as targetFile:
                targetFile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))


    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("usage: %s <filename>" % sys.argv[0])
        else:
            convertEncoding(sys.argv[1])
    

    In my tests, the conversion worked.
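    To reproduce such a test without depending on chardet's guess, here is a self-contained round-trip sketch. It assumes the source encoding is already known (GBK, chosen only for illustration), writes a GBK file, converts it, and compares the bytes. `convert_to_utf8` is a hypothetical helper that mirrors the read/write step in convertEncoding.

```python
import os
import tempfile


def convert_to_utf8(path: str, source_encoding: str) -> None:
    """Hypothetical helper: re-encode path from source_encoding to UTF-8 in place."""
    # newline='' preserves the original line endings on both read and write
    with open(path, "r", encoding=source_encoding, newline="") as f:
        contents = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(contents)


if __name__ == "__main__":
    text = "// 注释\r\nint main(void) { return 0; }\r\n"
    fd, path = tempfile.mkstemp(suffix=".c")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(text.encode("gbk"))  # simulate a GBK-encoded source file
        convert_to_utf8(path, "gbk")
        with open(path, "rb") as f:
            data = f.read()
        # Same text, now UTF-8 encoded, with the CRLF line endings kept
        print(data == text.encode("utf-8"))
    finally:
        os.remove(path)
```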

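    Key points

    1. chardet: a module for detecting character encodings. chardet.detect() takes a bytes object and returns a dict with the keys 'encoding' and 'confidence' (newer versions also include 'language'); their meanings are self-explanatory. The detected encoding is only a guess, and it can be None, for example for an empty file.
    2. The with ... as statement is very handy, especially when opening files: the file is closed automatically when the block exits, even if an exception is raised, so a forgotten close() can no longer leave the file open and locked.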
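    A small sketch of point 2: a file opened with `with` is closed even when the block raises. The temporary file and the simulated ValueError exist only for illustration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()  # throwaway file, for illustration only
os.close(fd)

try:
    with open(path, "rb") as f:
        # Simulate an error while the file is still open
        raise ValueError("simulated failure")
except ValueError:
    pass

# The with statement closed the file on the way out, despite the exception
print(f.closed)  # True
os.remove(path)
```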

    Batch conversion

    import os
    import sys

    import chardet


    def findEncoding(s):
        # Read the raw bytes and let chardet guess the encoding
        with open(s, mode='rb') as f:
            buf = f.read()
        # May be None, e.g. for an empty file
        return chardet.detect(buf)['encoding']


    def convertEncoding(s):
        # Skip files the current user is not allowed to write
        if not os.access(s, os.W_OK):
            print("%s is read-only, skipped" % s)
            return
        encoding = findEncoding(s)
        if encoding is None:
            print("%s: unable to detect encoding, skipped" % s)
        elif encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s %s to utf-8" % (s, encoding))
            # newline='' keeps the original line endings untouched
            with open(s, "r", encoding=encoding, newline='') as sourceFile:
                contents = sourceFile.read()
            with open(s, "w", encoding="utf-8", newline='') as targetFile:
                targetFile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))


    def getAllFile(path, suffix=''):
        """Recursively collect files under path whose names end with suffix.

        suffix may be a string or a tuple of strings; '' matches every file.
        """
        fpath = []
        for root, dirs, fnames in os.walk(path):
            for name in fnames:
                if name.endswith(suffix):
                    fpath.append(os.path.join(root, name))
        return fpath


    def convertAll(path):
        # Only C source (.c) and header (.h) files are converted
        for fname in getAllFile(path, (".c", ".h")):
            convertEncoding(fname)


    if __name__ == "__main__":
        if len(sys.argv) == 1:
            path = os.getcwd()
        elif len(sys.argv) == 2:
            path = sys.argv[1]
        else:
            print("usage: %s [directory]" % sys.argv[0])
            sys.exit(1)

        convertAll(path)
    

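    You can pass a directory as the argument, or run the script with no argument to process the current directory. Subdirectories are traversed recursively, and only .c and .h files are converted.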
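    Because the batch script rewrites files in place, it can help to see which files a run would touch first. A minimal dry-run sketch, assuming the same .c/.h filter: `collect_sources` is a hypothetical helper that walks the tree once and relies on str.endswith accepting a tuple of suffixes; it only lists paths and changes nothing.

```python
import os
import tempfile


def collect_sources(path, suffixes=(".c", ".h")):
    """Hypothetical helper: recursively list files ending with any of suffixes."""
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            # str.endswith accepts a tuple, so one pass covers every suffix
            if name.endswith(suffixes):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "src", "sub"))
        for rel in ("main.c", "src/util.h", "src/sub/deep.c", "README.txt"):
            with open(os.path.join(top, rel), "w") as f:
                f.write("")
        # Dry run: print what would be converted, relative to the top directory
        for p in collect_sources(top):
            print(os.path.relpath(p, top))
```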

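    Key points

    1. os.walk: walks a directory tree recursively, yielding a (dirpath, dirnames, filenames) tuple for each directory.
    2. os.access: checks whether the current user has a given access permission to a path (here os.W_OK, write permission), rather than reading the file's attributes in general.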
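    A short sketch of the os.access flags on a freshly created temporary file (used only for illustration). Two caveats: on POSIX systems a process running as root is usually granted read/write access regardless of the permission bits, so the script's read-only check may not trigger there; and the Python documentation notes that checking with os.access and then opening leaves a race window, so simply trying the operation and handling the error is often preferable.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()  # a fresh file owned by the current user
os.close(fd)
try:
    exists = os.access(path, os.F_OK)                 # does the path exist?
    read_write = os.access(path, os.R_OK | os.W_OK)   # readable AND writable?
    missing = os.access(path + ".missing", os.F_OK)   # nonexistent path
finally:
    os.remove(path)

print(exists, read_write, missing)  # True True False
```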
  • Original post: https://www.cnblogs.com/WeyneChen/p/6670588.html