python 中文编码处理方法 - 走看看

zoukankan html css js c++ java

python 中文编码处理方法

decode early, unicode everywhere, encode late

1.在输入或者声明字符串的时候，尽早地使用decode方法将字符串转化成unicode编码格式；

2.然后在程序内使用字符串的时候统一使用unicode格式进行处理，比如字符串拼接、字符串替换、获取字符串的长度等操作；

3.最后，在输出字符串的时候（控制台/网页/文件），通过encode方法将字符串转化为你所想要的编码格式，比如utf-8等。

输入（str，utf-8）--decode--> 操作unicode --encode--> 输出(str, utf-8)

#-*-coding:gb2312#-*-

__author__='Administrator'

importchardet

#变量若声明前面不加u则在python2.x中采用ascii字符集，类型为str

ss1="我是中文"

printtype(ss1)#<type'str'>

printchardet.detect(ss1)#{'confidence':0.99,'encoding':'GB2312'}

#前面加u，采用Unicode字符集，类型为Unicode

ss2=u'中文'

printtype(ss2)#<type'unicode'>

#调用decode函数，将str转换成unicode

ss3=ss1.decode('gb2312')

printtype(ss3)#<type'unicode'>

#调用encode，将unicode转换成str

ss4=ss3.encode('utf8')

printtype(ss4)#<type'str'>

printchardet.detect(ss4)#{'confidence':0.938125,'encoding':'utf-8'}

查看全文

相关阅读:
听较强节奏的歌曲，能让你更长时间投入到学习中
 小康网站维护笔记
 apache基础
 python安装多版本
 apache负载调优
 docker 进阶
 openstack 网络更改版
 linux 搭建testlink的问题总结
 26. Remove Duplicates from Sorted Array（LeetCode）
Add to List 172. Factorial Trailing Zeroes（LeetCode）

原文地址：https://www.cnblogs.com/alert123/p/4936768.html

Copyright © 2011-2022 走看看