
Fixing garbled Chinese text caused by content.encode("utf-8").decode("unicode-escape") in Python

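When you want to turn literal escape sequences such as \u002F inside a string back into the characters they stand for, any Chinese characters that are also in the string get garbled.
For example:

content = r"\u002F哈哈"  # raw string: the text really contains a backslash, "u", and four hex digits
content = content.encode("utf-8").decode("utf-8")
==> \u002F哈哈   (unchanged: a UTF-8 round trip does not interpret the escape)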

If you use .decode("unicode-escape") instead:

content = r"\u002F哈哈"
content = content.encode("utf-8").decode("unicode-escape")
==> /å“ˆå“ˆ   (the escape is decoded, but the Chinese characters are garbled)
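
Why the Chinese text breaks: the unicode-escape codec treats every byte that is not part of an escape sequence as a Latin-1 character, so the three UTF-8 bytes of each Chinese character become three unrelated characters. The sketch below checks this for the example string. The variable names are only illustrative, and the final repair step is an assumption that holds only when no decoded escape falls above U+00FF.

```python
raw = r"\u002F哈哈"  # raw literal: the escape stays as six literal characters

utf8_bytes = raw.encode("utf-8")
garbled = utf8_bytes.decode("unicode-escape")

# Every non-escape byte was mapped to the code point with the same value (Latin-1 behaviour)
assert garbled == "/" + "哈哈".encode("utf-8").decode("latin-1")

# That mapping is one-to-one, so for this particular string the damage can be reversed
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The reverse step is shown only to confirm the diagnosis; it raises an error as soon as the text contains an escape for a character outside Latin-1.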

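The fix is to decode piece by piece: apply unicode-escape only to the \uXXXX sequences and leave everything else untouched. The code is as follows:

import re
content = r"\u002F哈哈"
# Match a literal backslash, "u", and four hex digits; decode only the matched piece
content = re.sub(r'(\\u[0-9a-fA-F]{4})', lambda x: x.group(1).encode("utf-8").decode("unicode-escape"), content)
==> /哈哈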
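
The piece-by-piece idea can be wrapped in a small reusable helper. This is a minimal sketch: the function name `decode_unicode_escapes` and the constant `_ESCAPE` are hypothetical, and the pattern assumes escapes always have exactly four hex digits.

```python
import re

# A literal backslash, "u", and exactly four hex digits
_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def decode_unicode_escapes(text: str) -> str:
    # Decode only the matched escape; the match is pure ASCII, so the
    # unicode-escape codec never sees the surrounding non-ASCII text.
    return _ESCAPE.sub(
        lambda m: m.group(0).encode("ascii").decode("unicode-escape"),
        text,
    )


print(decode_unicode_escapes(r"\u002F哈哈"))      # /哈哈
print(decode_unicode_escapes(r"a\u4e2d\u6587b"))  # a中文b
```

Text without any escapes passes through unchanged. A sequence whose backslash is itself escaped (two backslashes before the `u`) would still be matched by this pattern, so input that relies on that distinction needs a stricter rule.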

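Addendum (reposter's own test):

content = "\u002F哈哈"  # ordinary (non-raw) literal: Python itself already turns \u002F into "/"
content.encode("utf-8").decode("unicode-escape")  # the result is discarded, so content is unchanged
print(content)
==> /哈哈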
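
The difference between the two kinds of literal is easy to check. The sketch below (variable names are only illustrative) shows that an ordinary literal never contains the escape text at all, which is why the problem only appears with raw literals or with text read from a file or network response.

```python
parsed = "\u002F哈哈"    # ordinary literal: the escape is resolved when the source is compiled
literal = r"\u002F哈哈"  # raw literal: the backslash and the five characters after it are kept

print(len(parsed))   # 3
print(len(literal))  # 8
print(parsed == "/哈哈")  # True
```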

Source: https://blog.csdn.net/wang785994599/article/details/97653329
