Python pymongo: garbled Chinese text problem



Original article:
http://windkeepblow.blog.163.com/blog/static/1914883312013988185783/

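As the title says, my problem was actually simple. While writing a web crawler, the page content I scraped contained strings like "\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c". These are Unicode escape sequences for Chinese characters; this one corresponds to "新浪微博注册" ("Sina Weibo registration"). All I wanted was a function that would display the string as Chinese, but I searched Baidu for ages without finding anything suitable. If you hit this problem, don't search with keywords like "python encoding", "unicode Chinese encoding" or "unicode decoding": you get piles of pages that have nothing to do with it.
In fact, a single function solves the problem: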
Example 1 (Python 2 interactive session):
>>> s = r"\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c"
>>> s
'\\u65b0\\u6d6a\\u5fae\\u535a\\u6ce8\\u518c'
>>> print s
\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c
>>> s = s.decode("unicode_escape")  # this is the function
>>> print s
新浪微博注册
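The session above is Python 2, where byte strings have a .decode() method. A rough Python 3 equivalent is sketched below, assuming the scraped text is already a str containing literal backslash-u sequences. The json.loads variant is an alternative added here, not the author's method, and it assumes the text contains no unescaped double quotes or other stray backslashes.

```python
import json

# Python 3 sketch of Example 1. The input is ordinary text that contains
# literal backslash-u escapes (for example, text scraped from a web page).
raw = r"\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c"
print(raw)  # \u65b0\u6d6a\u5fae\u535a\u6ce8\u518c

# In Python 3, str has no .decode(); encode to bytes first, then let the
# unicode_escape codec turn each \uXXXX sequence into its character.
decoded = raw.encode("ascii").decode("unicode_escape")
print(decoded)  # 新浪微博注册

# Alternative (assumption: raw is a valid JSON string body): json.loads
# resolves the same \uXXXX escapes when the text is wrapped in quotes.
decoded_json = json.loads('"' + raw + '"')
assert decoded_json == decoded
```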

Example 2 (Python 2):
>>> str_ = "Russopho\xe9bic, clichd and just pl\xe9ain stupid."
>>> print str_
Russopho?bic, clichd and just pl?ain stupid.
>>> str_ = str_.decode("unicode_escape")
>>> print str_
Russophoébic, clichd and just pléain stupid.
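Example 2 works because unicode_escape decodes every byte that is not part of an escape sequence as Latin-1, so the raw 0xE9 byte becomes "é". The Python 3 sketch below assumes the input arrives as bytes; it shows that for such input the codec behaves like a plain Latin-1 decode, and also the flip side: bytes that are already UTF-8 get mangled by the same call.

```python
# Python 3 analogue of Example 2 (assumption: the input arrives as raw bytes).
data = b"Russopho\xe9bic, clichd and just pl\xe9ain stupid."

# No backslashes here, so unicode_escape simply maps each byte through
# Latin-1: the lone 0xE9 byte becomes U+00E9 ("é").
via_escape = data.decode("unicode_escape")
via_latin1 = data.decode("latin-1")
print(via_escape)  # Russophoébic, clichd and just pléain stupid.
assert via_escape == via_latin1

# Caveat: the same trick mangles bytes that are already valid UTF-8.
utf8_bytes = "é".encode("utf-8")             # b'\xc3\xa9'
mangled = utf8_bytes.decode("unicode_escape")  # two Latin-1 chars: 'Ã©'
correct = utf8_bytes.decode("utf-8")           # 'é'
```

In other words, unicode_escape is the right tool for literal \uXXXX text, but for raw non-ASCII bytes the real source encoding has to be known.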
(This approach also fixed the "bson.errors.InvalidStringData: strings in documents must be valid UTF-8" error I ran into when inserting data into MongoDB.)
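The MongoDB error above shows up in Python 2 when a document holds a byte string that is not valid UTF-8. One way to avoid it is to normalize every byte value to text before inserting. The helper below, clean_value, is hypothetical (not part of pymongo): it tries UTF-8 first and falls back to Latin-1, which for bytes like those in Example 2 gives the same result as the unicode_escape trick. The fallback is an assumption: Latin-1 never fails, so text in another encoding (GBK, for instance) would come out as wrong characters rather than raising an error.

```python
# Hypothetical helper (not part of pymongo): turn every bytes value in a
# document into str so that the stored strings are valid text.
def clean_value(value):
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: non-UTF-8 bytes are Latin-1, as in Example 2.
            return value.decode("latin-1")
    if isinstance(value, dict):
        return {key: clean_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    return value  # numbers, None, already-decoded str, etc. pass through


doc = {
    "title": "新浪微博注册".encode("utf-8"),  # valid UTF-8 bytes
    "comment": b"Russopho\xe9bic",           # not valid UTF-8
    "tags": [b"weibo", "已解码"],
}
cleaned = clean_value(doc)
# With pymongo, the cleaned document would then be inserted,
# e.g. collection.insert_one(cleaned)
```

Here collection is assumed to be an existing pymongo Collection; the helper itself needs no MongoDB connection.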

A related blog post on this problem: http://www.cnblogs.com/yangze/archive/2010/11/16/1878469.html


Source: https://www.cnblogs.com/xibuhaohao/p/12101985.html
