
Python string encoding problem

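Background:
Python represents strings internally as Unicode, so converting between encodings normally uses Unicode as the intermediate form: first decode the string from its source encoding into Unicode, then encode the Unicode text into the target encoding.
decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into Unicode. (In Python 3, only bytes objects have a decode method.)
encode converts Unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into gb2312. (In Python 3, the result is a bytes object.)
So before transcoding, first work out which encoding the data is actually in, decode it to Unicode, and only then encode it into the target encoding.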
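The decode-then-encode round trip described above can be sketched in Python 3 as follows. The helper name `transcode` and the sample text are assumptions made for illustration, not part of the original post.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one encoding to another via Unicode (str)."""
    text = data.decode(source)   # bytes in the source encoding -> Unicode str
    return text.encode(target)   # Unicode str -> bytes in the target encoding


# Sample data (assumed): the two characters "中文" encoded as GBK.
gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
```

The same text takes 4 bytes in GBK and 6 bytes in UTF-8, which is why the bytes must be decoded with the encoding that produced them.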

Problem:
Traceback (most recent call last):
  File "do_subprocess.py", line 17, in <module>
    print(output.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 2: invalid continuation byte


Cause (quoted from a similar question, where the offending byte was 0xE9 rather than 0xc8):
In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you’ll see that such a byte must be followed by two of the form 10xx xxxx. So, for example:
>>> b'\xe9\x80\x80'.decode('utf-8')
u'\u9000'
But that’s just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in latin 1. You can see how UTF-8 and latin 1 look different:
>>> u'\xe9'.encode('utf-8')
b'\xc3\xa9'
>>> u'\xe9'.encode('latin-1')
b'\xe9'
(Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)
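The same reasoning applies to the 0xc8 byte in the traceback above: 0xc8 is 1100 1000 in binary, the lead byte of a two-byte UTF-8 sequence, so it must be followed by one byte of the form 10xx xxxx. A minimal reproduction, assuming sample bytes chosen for illustration (they are not the real subprocess output, which the post does not show):

```python
# Assumed sample data: "ab" followed by the GBK bytes 0xC8 0xD5.
data = b"ab\xc8\xd5"

failure = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xD5 is 1101 0101, which is not a 10xx xxxx continuation byte.
    failure = exc
    print(exc.start, exc.reason)

# The same bytes are valid GBK and decode cleanly.
as_gbk = data.decode("gbk")
print(as_gbk)
```

The exception reports position 2 because that is where the lead byte 0xc8 sits, exactly as in the original error message.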

Solution:
Change utf-8 to gbk, since the output here is actually GBK-encoded:
print(output.decode('gbk'))
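If the script has to run on machines where the output may be either UTF-8 or GBK, one possible approach is to try a list of candidate encodings in order. This is only a sketch: `decode_with_fallback` is a hypothetical helper, and the candidate list is an assumption that should be adjusted to the encodings actually expected.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "gbk")) -> str:
    """Return data decoded with the first candidate encoding that succeeds."""
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    # No candidate worked: decode anyway, replacing undecodable bytes with U+FFFD.
    return data.decode(encodings[0], errors="replace")


# Assumed sample data standing in for the subprocess output.
print(decode_with_fallback(b"ab\xc8\xd5"))              # GBK input
print(decode_with_fallback("ab日".encode("utf-8")))     # UTF-8 input
```

Guessing has limits: bytes from an unrelated encoding can happen to decode as GBK without error and produce wrong text, so knowing the real encoding of the data remains the more reliable fix.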

Reference: CSDN
Reference: Stack Overflow


Original article: https://www.cnblogs.com/irockcode/p/7889548.html