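decode: decoding (bytes -> text)
encode: encoding (text -> bytes)
Unicode is a character set (often loosely called an encoding). It assigns every character a code point, and encodings such as UTF-8 or GBK turn those code points into bytes. Look it up (e.g. on Baidu) for details.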
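To make the code point vs. bytes distinction concrete, here is a minimal sketch in Python 3 (not the Python 2 used in the rest of this note). Unicode gives the character 汉 a single code point, U+6C49, and each encoding turns that code point into a different byte sequence.

```python
# Python 3 sketch: one code point, different bytes per encoding
ch = '汉'

code_point = ord(ch)                  # the Unicode code point
utf8_bytes = ch.encode('utf-8')       # UTF-8 uses 3 bytes for this character
utf16_bytes = ch.encode('utf-16-be')  # UTF-16 big-endian (no BOM) uses 2 bytes

print(hex(code_point))     # 0x6c49
print(utf8_bytes.hex())    # e6b189
print(utf16_bytes.hex())   # 6c49
```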
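# coding: UTF-8
# Python 2
u = u'汉'
print repr(u)            # u'\u6c49'
s = u.encode('UTF-8')
print repr(s)            # '\xe6\xb1\x89'
u2 = s.decode('UTF-8')
print repr(u2)           # u'\u6c49'
# Decoding a unicode object is wrong (Python 2 first encodes u with ASCII,
# which raises UnicodeEncodeError for 汉):
# s2 = u.decode('UTF-8')
# Likewise, encoding a str is wrong (Python 2 first decodes s with ASCII,
# which raises UnicodeDecodeError):
# u2 = s.encode('UTF-8')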
s = u.encode('UTF-8') encodes the unicode object u into a UTF-8 byte string (str).
u2 = s.decode('UTF-8') decodes the UTF-8 byte string s back into a unicode object.
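In Python 3 the two directions are enforced by the types themselves: `str` (text) has `encode()` but no `decode()`, and `bytes` has `decode()` but no `encode()`. A minimal round-trip sketch in Python 3 (unlike the Python 2 code above):

```python
# Python 3: str --encode--> bytes --decode--> str
u = '汉'
s = u.encode('UTF-8')    # text -> UTF-8 bytes
u2 = s.decode('UTF-8')   # UTF-8 bytes -> text
print(repr(s))           # b'\xe6\xb1\x89'
print(u2 == u)           # True

# The "wrong" directions from the Python 2 example simply do not exist here
print(hasattr(u, 'decode'))   # False
print(hasattr(s, 'encode'))   # False
```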
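On Windows with a Chinese locale, the encoding is usually GBK, so byte strings typed there must be decoded with decode('gbk'). In the session below, u is actually a str (GBK bytes) despite its name: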
>>> u='格式'
>>> u.decode('gbk')
u'\u683c\u5f0f'
>>> u.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#111>", line 1, in <module>
    u.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 0: invalid start byte
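The same failure can be reproduced in Python 3 without a Windows console. The sketch below encodes the text as GBK explicitly, which is an assumption standing in for what the console produces. It then decodes the resulting bytes with each codec. The names `raw`, `decoded` and `error` are only for illustration.

```python
# Python 3 sketch: GBK bytes, as a Chinese Windows console would produce them
raw = '格式'.encode('gbk')

# Decoding with the matching codec recovers the original text
decoded = raw.decode('gbk')
print(decoded == '格式')   # True

# Decoding with the wrong codec fails, as in the Python 2 session
try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as e:
    error = e
    print(e.reason, 'at position', e.start)   # invalid start byte at position 0
```

The first GBK byte of 格 is 0xb8. In UTF-8 that value is only valid as a continuation byte, so the decoder rejects it immediately at position 0.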