zoukankan      html  css  js  c++  java
  • python字符串问题

    相关知识点:
    字符串在Python内部的表示是unicode编码,因此,在做编码转换时,通常需要以unicode作为中间编码,即先将其他编码的字符串解码(decode)成unicode,再从unicode编码(encode)成另一种编码。 
    decode的作用是将其他编码的字符串转换成unicode编码,如str1.decode('gb2312'),表示将gb2312编码的字符串str1转换成unicode编码。 
    encode的作用是将unicode编码转换成其他编码的字符串,如str2.encode('gb2312'),表示将unicode编码的字符串str2转换成gb2312编码。 
    因此,转码的时候一定要先搞明白,字符串str是什么编码,然后decode成unicode,然后再encode成其他编码
    
    问题:
    Traceback (most recent call last):
      File "do_subprocess.py", line 17, in <module>
        print(output.decode('utf-8'))
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 2: invalid continuation byte
    
    
    原因:(相似问题)
    In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you’ll see that such a byte must be followed by two of the form 10xx xxxx. So, for example:
    >>> b'xe9x80x80'.decode('utf-8')
    u'u9000'
    But that’s just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in latin 1. You can see how UTF-8 and latin 1 look different:
    >>> u'xe9'.encode('utf-8')
    b'xc3xa9'
    >>> u'xe9'.encode('latin-1')
    b'xe9'
    (Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)
    
    解决方法:
    将utf-8改为gbk
    print(output.decode('gbk'))
    

    参考CSDN
    参考stack overflow

  • 相关阅读:
    Linux 文件权限
    Linux 查看磁盘使用情况
    绑定到外部验证服务LDAP、配置 autofs
    创建逻辑卷
    查找一个字符串
    查找用户目录下的指定文件
    配置NTP时间服务器
    通过Roslyn构建自己的C#脚本(更新版)(转)
    Elon Musk
    可能改变世界的13个“终结”(上)
  • 原文地址:https://www.cnblogs.com/irockcode/p/7889548.html
Copyright © 2011-2022 走看看