
python decode unicode encode

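Python represents strings internally as unicode. When converting between encodings, unicode therefore usually serves as the intermediate form: first decode the string from its source encoding into unicode, then encode the unicode object into the target encoding.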
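The decode-then-encode route described above can be sketched as follows. This is a minimal sketch written for Python 3, where str plays the role of Python 2's unicode and bytes plays the role of Python 2's str. The helper name transcode is hypothetical and does not come from the original post.

```python
# -*- coding: utf-8 -*-

def transcode(data, src, dst):
    """Convert bytes from encoding `src` to encoding `dst` via unicode text.

    Hypothetical helper for illustration only.
    """
    text = data.decode(src)    # bytes in the source encoding -> unicode text
    return text.encode(dst)    # unicode text -> bytes in the target encoding

gbk_bytes = u'你好'.encode('gbk')
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')
print(utf8_bytes == u'你好'.encode('utf-8'))  # True
```

The two byte strings differ (4 bytes in GBK, 6 in UTF-8), but both decode to the same unicode text, which is why unicode works as the pivot.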

By default, a plain string literal in code carries the same encoding as the source file itself. Two cases are worth telling apart:

1. s = u'你好'

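This string is explicitly unicode, i.e. Python's internal representation, and is independent of the encoding of the source file. (The interpreter's default encoding, used for implicit str/unicode conversions, is a separate setting. To check it: import sys; print('hello', sys.getdefaultencoding()), which prints ascii in Python 2. To change it: import sys; reload(sys); sys.setdefaultencoding('utf-8').) To convert such a string, simply call its encode method with the target encoding.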

2. # -*- coding: utf-8 -*-

s = '你好'

Here s is a UTF-8 encoded byte string. ASCII cannot represent Chinese characters.
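To illustrate why ASCII cannot hold these characters while UTF-8 can, here is a small sketch. It is written for Python 3 (an assumption, since the post itself uses Python 2), and the variable name ascii_ok is only illustrative.

```python
# -*- coding: utf-8 -*-

text = u'你好'

# ASCII only covers code points 0-127, so encoding Chinese text fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)                    # False
print(len(text.encode('utf-8')))   # 6 (three bytes per character here)
```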

isinstance(s, unicode) # checks whether s is unicode: returns True if it is, False otherwise

unicode(str, 'gb2312') and str.decode('gb2312') are equivalent: both convert a gb2312-encoded str into unicode.
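The same equivalence can be checked in Python 3, where the unicode builtin no longer exists and str(bytes_value, encoding) is its counterpart. This is a sketch under that assumption; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-

raw = u'你好'.encode('gb2312')       # a gb2312-encoded byte string

via_constructor = str(raw, 'gb2312')  # Python 3 counterpart of unicode(raw, 'gb2312')
via_method = raw.decode('gb2312')     # same call as in the post

print(via_constructor == via_method)  # True
```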

Use s.__class__ to see the type of the object (str or unicode). It does not reveal which encoding a byte string uses.

Enough theory. To finish, here is a cure-all :)

#!/usr/bin/env python
#coding=utf-8
s="中文"

if isinstance(s, unicode):
    #s=u"中文"
    print s.encode('gb2312')
else:
    #s="中文"
    print s.decode('utf-8').encode('gb2312')
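For readers on Python 3, the same check-then-convert pattern might look like the sketch below. The function name to_gb2312 and the source_encoding parameter are hypothetical. The bytes check replaces the Python 2 unicode check because the type names changed between versions.

```python
# -*- coding: utf-8 -*-

def to_gb2312(s, source_encoding='utf-8'):
    """Return s as gb2312 bytes, whether s is text or encoded bytes.

    Hypothetical helper; assumes byte input is in `source_encoding`.
    """
    if isinstance(s, bytes):
        s = s.decode(source_encoding)   # bytes -> unicode text first
    return s.encode('gb2312')           # unicode text -> gb2312 bytes

from_text = to_gb2312(u'中文')
from_bytes = to_gb2312(u'中文'.encode('utf-8'))
print(from_text == from_bytes)  # True
```

As in the original, the byte-string branch only works if the input really is in the assumed source encoding; the bytes themselves carry no label saying which encoding they use.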

Speech module code:

# -*- coding: utf-8 -*-
import sys
print('hello',sys.getdefaultencoding())
def xfs_frame_info(words):

    #decode utf-8 to python internal unicode coding (skip if already unicode)
    if isinstance(words,unicode):
        wordu = words
    else:
        wordu = words.decode('utf-8')

    #encode python unicode to gbk
    data = wordu.encode('gbk')
    
    length = len(data) + 2

    frame_info = bytearray(5)
    frame_info[0] = 0xfd
    frame_info[1] = (length >> 8)
    frame_info[2] = (length & 0x00ff)
    frame_info[3] = 0x01
    frame_info[4] = 0x01

       
    buf = frame_info + data
    print("buf:",buf)

    return buf

if __name__ == "__main__":

    print("hello world")
    words1= u'你好'
    #encodetype = isinstance(words1,unicode)
    #print("encodetype",encodetype)
    print("origin unicode", words1)
    
    words= words1.encode('utf-8')
    print("utf-8 encoded", words)
    a = xfs_frame_info(words)
    print('a',a)

# Variant: the literal has no u'' prefix, so it is a utf-8 byte string
if __name__ == "__main__":

    print("hello world")
    words1= '你好'
    print("original utf-8 encoded:",words1)
    encodetype = isinstance(words1,unicode)
    wordu = words1.decode('utf-8')
    print("unicode from utf-8 decode:",wordu)
    #encodetype = isinstance(words1,utf-8)
    #encodetype = isinstance(words1,'ascii')
    #print("encodetype",encodetype)
    #print("origin unicode", words1)
    
    word_utf8 = wordu.encode('utf-8')
    #encodetype2 = isinstance(words,utf8)
    #print("encodetype2",encodetype2)
    print("utf-8 encoded",word_utf8)
    a = xfs_frame_info(word_utf8)
    print('a',a)

When 你好 is written without the u'' prefix, one extra step is needed: decode it to unicode first.
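To make the frame layout built by xfs_frame_info concrete, here is a worked example for the text 你好. It is a Python 3 sketch of the same byte layout (0xfd, a two-byte big-endian length, two 0x01 bytes, then the GBK payload). The function name build_frame is hypothetical, and the post does not state what the two 0x01 bytes mean, so they are copied as-is.

```python
# -*- coding: utf-8 -*-

def build_frame(text):
    """Build a frame with the same byte layout as xfs_frame_info (sketch)."""
    data = text.encode('gbk')        # unicode text -> GBK payload
    length = len(data) + 2           # as in the post: payload length plus 2
    header = bytearray([
        0xfd,                        # first byte, as in the post
        (length >> 8) & 0xff,        # length, high byte
        length & 0xff,               # length, low byte
        0x01,                        # copied from the post; meaning not stated there
        0x01,                        # copied from the post; meaning not stated there
    ])
    return bytes(header) + data

frame = build_frame(u'你好')
print(frame.hex())  # fd00060101c4e3bac3
```

The GBK payload for 你好 is four bytes (c4 e3 ba c3), so the length field holds 6 and the whole frame is nine bytes long.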


Original article: https://www.cnblogs.com/cj2014/p/4236114.html