Commonly used libraries for Python web crawlers

1. Requests: requests

    requests.get(url, headers=headers)

    requests.post(url, data=data, files=files)
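The two calls above can be tried out without touching the network by preparing a request first and inspecting it. The following is a minimal sketch, assuming the requests package is installed; the header value, the query parameter, and the helper names build_get and fetch are hypothetical and chosen only for illustration.

```python
import requests

# Hypothetical header value, chosen only for illustration.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-crawler)"}


def build_get(url, params=None):
    """Prepare a GET request without sending it, so it can be inspected."""
    request = requests.Request("GET", url, headers=HEADERS, params=params)
    return request.prepare()


def fetch(url, params=None, timeout=10):
    """Send the prepared request and return the decoded response body."""
    with requests.Session() as session:
        response = session.send(build_get(url, params), timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        return response.text


# Inspect what would be sent; no network traffic happens here.
prepared = build_get("http://www.baidu.com/s", params={"wd": "python"})
print(prepared.method, prepared.url)
```

Headers are passed by keyword here because the second positional argument of requests.get is params, not headers. Calling fetch(...) performs the actual download and needs network access.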

    The urllib module:

    Python 2

    import urllib2

    response = urllib2.urlopen('http://www.baidu.com')

    Python 3

    import urllib.request

    response = urllib.request.urlopen('http://www.baidu.com')
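For Python 3, urlopen is often combined with a Request object so that headers or form data can be attached. The following is a rough sketch using only the standard library; the user agent string, the form field, and the helper names build_request and fetch are assumptions made for this example.

```python
import urllib.parse
import urllib.request


def build_request(url, form=None, user_agent="example-crawler/0.1"):
    """Build a Request; passing form data turns it into a POST."""
    data = urllib.parse.urlencode(form).encode("utf-8") if form else None
    return urllib.request.Request(url, data=data, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Open the URL and decode the body, closing the connection afterwards."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Building requests does not touch the network; only fetch() does.
get_req = build_request("http://www.baidu.com")
post_req = build_request("http://www.baidu.com", form={"wd": "python"})
print(get_req.get_method(), post_req.get_method())
```

Calling fetch('http://www.baidu.com') would perform the download. The with block makes sure the connection is closed even if decoding fails.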

2. Parsing:

    lxml (parses web pages)

    from lxml import etree

    # Parse the HTML text of the fetched page

    html = etree.HTML(text)
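Once etree.HTML has built the tree, data is usually pulled out with XPath expressions. The following is a small sketch, assuming lxml is installed; the HTML snippet stands in for a downloaded page and is made up for illustration.

```python
from lxml import etree

# Hypothetical page source standing in for a downloaded response body.
text = """
<html><body>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

html = etree.HTML(text)

# XPath returns lists: attribute values and text nodes come back as strings.
links = html.xpath("//li/a/@href")
titles = html.xpath("//li/a/text()")
print(list(zip(titles, links)))
```

In a real crawler, text would be the body returned by requests or urllib rather than a literal string.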

3. Storage:

    MongoDB

    MySQL

    Redis

4. Tools:

    Selenium, a browser automation tool

5. Frameworks:

    Scrapy and scrapy-redis


Original article: https://www.cnblogs.com/hellohorld/p/10189679.html