  • python decode unicode encode

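           Python represents text internally as unicode. Therefore, when converting between encodings, unicode is normally used as the intermediate form: first decode a string from its original encoding into unicode, then encode that unicode into the target encoding.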
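A minimal sketch of this decode-then-encode pivot. It assumes Python 3, where `str` is the unicode type and encoded data is `bytes` (unlike the Python 2 `unicode`/`str` pair used in the rest of this post). The helper name `transcode` is hypothetical.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding `src` to encoding `dst` via unicode."""
    text = data.decode(src)   # step 1: decode src bytes -> unicode (str)
    return text.encode(dst)   # step 2: encode unicode -> dst bytes


utf8_bytes = "你好".encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")
print(len(utf8_bytes), len(gbk_bytes))  # 6 4
```

The two Chinese characters take 3 bytes each in UTF-8 and 2 bytes each in GBK, which is why the lengths differ even though the text is the same.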

           A string literal in the code has the same encoding as the source file itself. Two cases need to be distinguished:

            1. s = u'你好'

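                This string is unicode, i.e. Python's internal representation, no matter how the source file itself is encoded. To convert it, just call encode with the target encoding. (Do not confuse the source file encoding with Python 2's default encoding, which is what implicit str/unicode conversions use. Check it with import sys; print sys.getdefaultencoding(), which prints ascii by default; the usual hack to change it is import sys; reload(sys); sys.setdefaultencoding('utf-8').)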

            2. # -*- coding: utf-8 -*-

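                s = '你好'

                Here s is a byte string (str) holding the UTF-8 encoding of 你好. The coding declaration is required because Python 2 otherwise assumes ASCII source, and ASCII cannot represent Chinese characters.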
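For comparison, here is how the two cases look in Python 3 (an assumption: this uses Python 3 semantics, not the Python 2 behavior described above). A plain literal such as '你好' is already unicode there, like case 1; the UTF-8 bytes of case 2 become an explicit `bytes` value; and the default encoding is UTF-8 instead of ASCII.

```python
import sys

# Python 3: the default encoding is always UTF-8 (Python 2 reported 'ascii').
print(sys.getdefaultencoding())  # utf-8

s_text = "你好"                   # like case 1: already unicode (str)
s_bytes = s_text.encode("utf-8")  # like case 2: the UTF-8 bytes of the same text

print(type(s_text).__name__, len(s_text))    # str 2
print(type(s_bytes).__name__, len(s_bytes))  # bytes 6

# Case 1 needs only encode(); case 2 needs decode() first.
gbk_from_text = s_text.encode("gbk")
gbk_from_bytes = s_bytes.decode("utf-8").encode("gbk")
print(gbk_from_text == gbk_from_bytes)  # True
```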

    isinstance(s, unicode)  # returns True if s is a unicode object, False otherwise
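This check is typically used to normalize input before encoding it. A minimal Python 3 sketch, where `bytes` plays the role of Python 2's `str` and `str` plays the role of `unicode`; the helper name `to_text` is hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as unicode text, decoding only if it is still bytes."""
    if isinstance(value, str):        # already unicode: nothing to do
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_text("你好"))                        # 你好
print(to_text("你好".encode("utf-8")))        # 你好
print(to_text("你好".encode("gbk"), "gbk"))   # 你好
```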

    unicode(s, 'gb2312') and s.decode('gb2312') are equivalent: both convert the GB2312-encoded byte string s into unicode.
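In Python 3 the same equivalence holds between the `str(bytes, encoding)` constructor and `bytes.decode(encoding)`. A quick check, assuming Python 3:

```python
raw = "中文".encode("gb2312")   # GB2312-encoded bytes

a = str(raw, "gb2312")          # constructor form, like Python 2's unicode(s, 'gb2312')
b = raw.decode("gb2312")        # method form, like Python 2's s.decode('gb2312')
print(a == b == "中文")          # True
```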

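    s.__class__ (or type(s)) tells you whether s is a str (byte string) or a unicode object; it does not tell you which encoding the bytes of a str are in.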
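To illustrate that limitation: the type distinguishes bytes from text, but the same bytes decode to different results depending on the encoding you assume. A Python 3 sketch:

```python
raw = "你好".encode("gbk")

print(type(raw).__name__)   # bytes: the type says nothing about the encoding
print(raw.decode("gbk"))    # 你好: decoding with the right encoding

# Guessing wrong either fails outright or yields mojibake.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
print(raw.decode("latin-1"))  # always decodes, but to the wrong characters
```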

    Enough theory; here is a cure-all :)


    #!/usr/bin/env python
    # coding=utf-8
    s = "中文"

    if isinstance(s, unicode):
        # s = u"中文": already unicode, encode directly
        print s.encode('gb2312')
    else:
        # s = "中文": UTF-8 bytes, decode to unicode first
        print s.decode('utf-8').encode('gb2312')
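The same cure-all can be sketched as a reusable Python 3 function. The name `to_gb2312` and the assumption that byte input is UTF-8 (as in the snippet above) are choices made for illustration.

```python
def to_gb2312(s, source_encoding="utf-8"):
    """Return `s` encoded as GB2312 bytes.

    `s` may be text (str) or bytes in `source_encoding` (assumed UTF-8).
    """
    if isinstance(s, str):
        # Already unicode: encode directly (the u"中文" branch).
        return s.encode("gb2312")
    # Bytes: decode to unicode first, then encode (the "中文" branch).
    return s.decode(source_encoding).encode("gb2312")


print(to_gb2312("中文") == to_gb2312("中文".encode("utf-8")))  # True
```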

    Speech module code (builds a text frame for the XFS speech module; the text is converted to GBK first):

    # -*- coding: utf-8 -*-
    from __future__ import print_function  # make print(a, b) behave as in Python 3
    import sys

    print('default encoding:', sys.getdefaultencoding())  # 'ascii' on Python 2

    def xfs_frame_info(words):
        """Build a frame for the XFS speech module.

        words may be a unicode object or a UTF-8 encoded str.
        """
        # decode UTF-8 to Python's internal unicode, unless already unicode
        if isinstance(words, unicode):
            wordu = words
        else:
            wordu = words.decode('utf-8')

        # encode the unicode text to GBK for the module
        data = wordu.encode('gbk')

        # length field: text bytes plus the two bytes at index 3 and 4
        length = len(data) + 2

        frame_info = bytearray(5)
        frame_info[0] = 0xfd                  # frame header
        frame_info[1] = (length >> 8) & 0xff  # length, high byte
        frame_info[2] = length & 0xff         # length, low byte
        frame_info[3] = 0x01                  # command byte (presumably "synthesize")
        frame_info[4] = 0x01                  # text-encoding byte (presumably GBK)

        buf = frame_info + data
        print("buf:", repr(buf))
        return buf

    if __name__ == "__main__":
        print("hello world")

        # Case 1: start from a unicode literal
        words1 = u'你好'
        print("origin unicode:", words1)
        words = words1.encode('utf-8')
        print("utf-8 encoded:", words)
        a = xfs_frame_info(words)
        print('a', repr(a))

        # Case 2: start from a UTF-8 byte string (no u prefix)
        words1 = '你好'
        print("original utf-8 bytes:", words1)
        print("is unicode:", isinstance(words1, unicode))
        wordu = words1.decode('utf-8')
        print("unicode from utf-8 decode:", wordu)
        word_utf8 = wordu.encode('utf-8')
        print("utf-8 encoded:", word_utf8)
        a = xfs_frame_info(word_utf8)
        print('a', repr(a))

    When 你好 is written without the u'' prefix, it is a UTF-8 byte string, so it needs an extra decode step to become unicode.
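A compact Python 3 sketch of the same frame layout as `xfs_frame_info`, using `struct`: header 0xFD, a big-endian 16-bit length covering the text plus the two following bytes, then 0x01, 0x01 and the GBK text. The function name `build_frame` is hypothetical, and the meaning of the two 0x01 bytes (command and text encoding) is inferred from the code above rather than from the module's documentation.

```python
import struct


def build_frame(text):
    """Build the 5-byte header plus GBK payload, as xfs_frame_info does."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")      # assume byte input is UTF-8
    data = text.encode("gbk")            # payload sent to the module
    length = len(data) + 2               # payload + the two 0x01 bytes
    # '>BHBB': header byte, big-endian unsigned short length, two single bytes
    header = struct.pack(">BHBB", 0xFD, length, 0x01, 0x01)
    return header + data


frame = build_frame("你好")
print(frame.hex())
```

Using `struct.pack` with `>H` replaces the manual `>> 8` / `& 0xff` split, and it raises `struct.error` if the length does not fit in 16 bits instead of silently truncating it.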

  • Original article: https://www.cnblogs.com/cj2014/p/4236114.html