
Python web scraping: encoding issues

1) Chinese text from scraped Chinese websites displays as garbled characters

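In Python 2, the garbling comes from an encoding mismatch. The response body arrives as raw bytes in the page's encoding, which for most sites is UTF-8, while the console expects text in the system's own encoding (GBK on a typical Chinese Windows installation). Printing the bytes unchanged makes the console interpret UTF-8 as if it were the system encoding, so Chinese characters come out garbled. Once the cause is clear, the fix is simple: decode the bytes with the page's encoding, then encode the result with the system's encoding before printing.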
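The mismatch can be reproduced offline. The following is a minimal sketch rather than the author's code: the sample string and the choice of GBK as the console encoding are assumptions made for illustration, and the snippet is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumed sample text; any Chinese string behaves the same way
text = u"中文"

# What the server sends: the text encoded as UTF-8 bytes
utf8_bytes = text.encode("utf-8")

# What a GBK console makes of those bytes if they are passed through unchanged
# (GBK is an assumption here; it is the usual default on Chinese Windows)
garbled = utf8_bytes.decode("gbk", "replace")

# The repair: decode with the page's encoding, then encode for the console
console_bytes = utf8_bytes.decode("utf-8").encode("gbk")

print(repr(garbled))                        # no longer the original text
print(console_bytes.decode("gbk") == text)  # True
```

The bytes themselves are never damaged; only the interpretation is wrong, which is why an explicit decode followed by an explicit encode is enough.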

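# -*- coding: utf-8 -*-

import sys
import urllib2

# Encoding of the console we print to; fall back to the filesystem
# encoding when stdout is redirected and reports no encoding
console_encoding = sys.stdout.encoding or sys.getfilesystemencoding()
response = urllib2.urlopen("http://www.baidu.com")
# Decode the page as UTF-8, then re-encode it for the console;
# characters the console cannot represent are replaced instead of raising
page = response.read().decode('utf-8').encode(console_encoding, 'replace')
print page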
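The script above hard-codes UTF-8, which holds for most sites but not all. One way to avoid that assumption is to read the charset from the response's Content-Type header. This is a hedged sketch: `charset_from_content_type` is a hypothetical helper, not part of urllib or urllib2, and the UTF-8 fallback is an assumption carried over from the text.

```python
# -*- coding: utf-8 -*-
import re


def charset_from_content_type(header, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    Hypothetical helper for illustration; falls back to `default`
    when the header is missing or names no charset.
    """
    if not header:
        return default
    match = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)", header, re.IGNORECASE)
    return match.group(1).lower() if match else default


print(charset_from_content_type("text/html; charset=GBK"))
print(charset_from_content_type("text/html"))
```

In the Python 2 script, the header value would most likely come from `response.info().getheader('Content-Type')`, and the returned name would replace the literal `'utf-8'` in the decode call. Pages that declare their encoding only in a `<meta>` tag are not covered by this sketch.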

2) Garbled Chinese when reading keyboard input with raw_input()

Both the Chinese prompt passed to raw_input() and the text it returns can show up garbled.

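For example, suppose you want to type a keyword at the keyboard and search with it. If the Chinese keyword is placed into the URL exactly as it was read, the search will not work: the input arrives as bytes in the console's encoding, whereas the URL needs the keyword in the encoding the site expects (UTF-8 here, as declared by ie=utf-8).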

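# -*- coding: utf-8 -*-
import sys
import urllib
import urllib2

# Encoding of the console used for both input and output
console_encoding = sys.stdin.encoding or sys.getfilesystemencoding()
# Encode the prompt ("Enter a keyword: ") for the console,
# then decode the typed bytes into a unicode string
prompt = u"请输入关键字: ".encode(console_encoding, 'replace')
word = raw_input(prompt).decode(console_encoding)
# The URL declares ie=utf-8, so send the keyword as percent-encoded UTF-8
query = urllib.quote(word.encode('utf-8'))
url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + query + '&pn=0'
request = urllib2.Request(url)
response = urllib2.urlopen(request)
# Decode the page as UTF-8 and re-encode it for the console
page = response.read().decode('utf-8').encode(console_encoding, 'replace')
print page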
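The URL-building step can be checked on its own, without a console or a network connection. The sketch below assumes the keyword is already a unicode string; `build_search_url` is a hypothetical helper introduced for illustration, and the import fallback is there only so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def build_search_url(word):
    """Build the image-search URL for a unicode keyword.

    Hypothetical helper for illustration: the keyword is encoded as UTF-8
    (matching ie=utf-8 in the URL) and then percent-encoded.
    """
    base = "https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word="
    return base + quote(word.encode("utf-8")) + "&pn=0"


print(build_search_url(u"中文"))
```

Percent-encoding keeps the URL pure ASCII, so the request no longer depends on the interpreter's default encoding.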


Original article: https://www.cnblogs.com/lzhc/p/7911641.html