Fixing garbled text in files scraped with the Python open-source project Scrapy

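When scraping pages with Scrapy, the saved files came out garbled. Analysis showed that the cause was the character encoding: converting the content to UTF-8 before saving fixes it. Code snippet: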

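......

import chardet

......

# html_content is the raw page body as bytes
content_type = chardet.detect(html_content)
encoding = content_type['encoding']
#print(encoding)

# chardet may return None, and the case of the reported name can vary
if encoding and encoding.lower() != "utf-8":
    # bytes -> Unicode text; undecodable bytes become U+FFFD instead of raising
    html_content = html_content.decode(encoding, errors="replace")
    # Unicode text -> UTF-8 bytes
    html_content = html_content.encode("utf-8")

with open(filename, "wb") as f:
    f.write(html_content)

....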

Files saved this way display the Chinese text correctly.
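If chardet is unavailable or guesses wrong, one possible fallback is to try a short list of candidate encodings in order. The sketch below is an assumption, not part of the original snippet: the helper name `to_utf8` and the candidate list are hypothetical, and `gb18030` is chosen only because it is a superset of GB2312 and GBK.

```python
# Hypothetical fallback: try candidate encodings in order, then re-encode as UTF-8.
CANDIDATES = ("utf-8", "gb18030")  # assumed list; adjust for the sites being scraped


def to_utf8(raw, candidates=CANDIDATES):
    """Return raw bytes re-encoded as UTF-8, using the first candidate that decodes."""
    for name in candidates:
        try:
            text = raw.decode(name)  # strict decode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    # No candidate worked: keep what can be read, replace the rest with U+FFFD
    return raw.decode("utf-8", errors="replace").encode("utf-8")


gbk_bytes = "中文".encode("gbk")
utf8_bytes = to_utf8(gbk_bytes)
```

Strict UTF-8 decoding goes first because it rejects most non-UTF-8 input, whereas `gb18030` accepts a much wider range of byte sequences and would mask a wrong guess.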

Steps:

First, decode the GB2312-encoded bytes into Unicode text.

Then, encode the Unicode text as UTF-8.
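The two steps above can be sketched with the standard library alone. This is a minimal illustration that assumes the input really is GB2312. The function name `gb2312_to_utf8` and the sample string are made up for the example.

```python
def gb2312_to_utf8(raw):
    """Convert GB2312-encoded bytes to UTF-8-encoded bytes."""
    text = raw.decode("gb2312")   # step 1: GB2312 bytes -> Unicode text (str)
    return text.encode("utf-8")   # step 2: Unicode text -> UTF-8 bytes


sample = "中文".encode("gb2312")   # 2 bytes per character in GB2312
converted = gb2312_to_utf8(sample)  # 3 bytes per character in UTF-8
```

The text is the same before and after. Only the byte representation changes, which is why the file has to be opened in binary mode when it is written.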


Original article: https://www.cnblogs.com/Byrd/p/4434463.html
