Python Web Crawling and Information Retrieval (Part 1): Web Crawling with the requests Library

1. Installation and testing

Open cmd (with administrator privileges) and install the library with pip: `pip install requests`.

Basic usage:
```
>>> import requests
>>> r = requests.get('http://www.baidu.com')  # request the Baidu home page
>>> r.status_code                             # status code; 200 means success
200
>>> r.encoding = 'utf-8'                      # set the encoding used to decode r.text
>>> r.text                                    # the page content as a string
```
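The basic usage above is often wrapped into a small reusable function. The sketch below is one common pattern, not the only one: `get_html_text` is a hypothetical helper name and the 10-second timeout is an arbitrary choice. It adds a timeout, turns 4xx/5xx status codes into exceptions with `raise_for_status()`, and returns `None` instead of raising when anything inside requests fails.

```python
import requests


def get_html_text(url, timeout=10):
    """Fetch a page and return its text, or None if the request fails.

    get_html_text is a hypothetical helper name used for illustration.
    """
    try:
        r = requests.get(url, timeout=timeout)  # timeout in seconds, so a dead server cannot hang us
        r.raise_for_status()                    # raise HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding        # decode with the encoding detected from the body
        return r.text
    except requests.RequestException:           # base class of ConnectionError, Timeout, HTTPError, MissingSchema, ...
        return None
```

For example, `get_html_text('http://www.baidu.com')` should return the Baidu home page as a string, while `get_html_text('www.baidu.com')` returns `None` because requests rejects a URL that has no scheme.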
2. The seven main methods of the requests library
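- request: constructs a request; it is the base method on which all of the following methods are built.
  
  Each of the other six methods calls request internally.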
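- get: the main method for fetching an HTML page, corresponding to HTTP GET.
  
  r = requests.get(url)
  
  This constructs a Request object that asks the server for a resource,
  
  and returns a Response object containing the resource sent back by the server.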
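- head: fetches only the header information of a page, corresponding to HTTP HEAD.
- post: submits a POST request to a URL, corresponding to HTTP POST.
- put: submits a PUT request to a URL, corresponding to HTTP PUT.
- patch: submits a partial-modification request (a "patch", i.e. a partial update of the resource), corresponding to HTTP PATCH.
- delete: submits a request to delete a resource, corresponding to HTTP DELETE.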
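To see what each method actually sends without contacting any server, the sketch below builds `requests.Request` objects and calls `.prepare()` on them, which is roughly the step requests performs internally before sending. The helper name `prepare` and the httpbin.org URLs are illustrative assumptions; no request is transmitted.

```python
import requests


# Build, but do not send, one request per HTTP method listed above.
# prepare is a hypothetical helper; Request(...).prepare() yields the
# PreparedRequest (method, URL, headers, body) that would go over the wire.
def prepare(method, url, **kwargs):
    return requests.Request(method, url, **kwargs).prepare()


# GET: query parameters are encoded into the URL
get_req = prepare('get', 'http://www.baidu.com/s', params={'wd': 'python'})

# POST: a dict passed as data= becomes a form-encoded request body
post_req = prepare('post', 'http://httpbin.org/post', data={'key': 'value'})

# The remaining methods differ mainly in the verb sent to the server
other = {m: prepare(m, 'http://httpbin.org/anything').method
         for m in ('head', 'put', 'patch', 'delete')}

print(get_req.method, get_req.url)
print(post_req.method, post_req.body, post_req.headers['Content-Type'])
print(other)
```

A prepared request can be sent with `requests.Session().send(prepared)`; one-line helpers such as `requests.get` bundle the build, prepare, and send steps together.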
4. Attributes of the Response object
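- r.status_code: the HTTP status code of the response.
  
  r.status_code == requests.codes.ok evaluates to True when the request succeeded (status 200).
- r.text: the response body as a string.
- r.content: the response body as raw bytes.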
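- r.encoding: the encoding guessed from the charset parameter in the HTTP headers. Not every server declares the encoding of its resources.
  
  If the headers contain no charset (for a text/* content type), requests assumes ISO-8859-1.
- r.apparent_encoding: the encoding inferred by analyzing the response content itself; use it as a fallback.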
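```
>>> r = requests.get('http://www.baidu.com')
>>> r.encoding                          # guessed from the headers
'ISO-8859-1'
>>> r.apparent_encoding                 # guessed from the content
'utf-8'
>>> r.encoding = r.apparent_encoding    # decode r.text with the detected encoding
```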
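The encoding behavior described above can be reproduced offline. The sketch below starts a throwaway local server (stdlib `http.server`) that returns UTF-8 Chinese text with `Content-Type: text/html` and no charset, so requests falls back to ISO-8859-1 for `r.encoding`. The page content, the `NoCharsetHandler` class, and the `fetch_local_page` helper are hypothetical; `trust_env = False` is set only so that proxy environment variables do not intercept the request to 127.0.0.1.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical demo page: UTF-8 Chinese text served WITHOUT a charset in Content-Type.
PAGE_TEXT = '<html><body>' + 'Python 网络爬虫与信息获取。' * 20 + '</body></html>'
PAGE_BYTES = PAGE_TEXT.encode('utf-8')


class NoCharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')  # deliberately no "; charset=utf-8"
        self.send_header('Content-Length', str(len(PAGE_BYTES)))
        self.end_headers()
        self.wfile.write(PAGE_BYTES)

    def log_message(self, *args):  # keep the console quiet
        pass


def fetch_local_page():
    """Start a local server, GET its page once, then shut the server down."""
    server = HTTPServer(('127.0.0.1', 0), NoCharsetHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with requests.Session() as s:
            s.trust_env = False  # ignore proxy environment variables for this local request
            return s.get('http://127.0.0.1:%d/' % server.server_address[1], timeout=5)
    finally:
        server.shutdown()
        server.server_close()


r = fetch_local_page()
ok = r.status_code == requests.codes.ok  # True for status 200
header_guess = r.encoding                # from headers: 'ISO-8859-1' (text/* without charset)
garbled = r.text                         # decoded as ISO-8859-1, so the Chinese is mojibake
content_guess = r.apparent_encoding      # detected from r.content; typically utf-8 here
r.encoding = r.apparent_encoding         # the fix shown in the session above
print(r.status_code, ok, header_guess, content_guess)
print(r.text[:40])
```

The exact name returned by `r.apparent_encoding` depends on the detection library installed alongside requests (charset_normalizer or chardet), so treat `utf-8` as the typical value rather than a guaranteed one.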
5. Combining requests with other libraries
- BeautifulSoup: parses HTML pages.
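```
>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('http://www.baidu.com')
>>> r.encoding = r.apparent_encoding
>>> BeautifulSoup(r.text, 'html.parser').get_text()   # page text with the tags stripped
```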
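Beyond `get_text()`, BeautifulSoup can pull out specific tags. The sketch below parses a small hypothetical HTML string standing in for `r.text`, so it runs without network access; it uses Python's built-in `'html.parser'` backend, which avoids installing lxml.

```python
from bs4 import BeautifulSoup

# A small hypothetical page standing in for r.text, so no network access is needed.
html = """
<html><head><title>示例页面</title></head>
<body>
  <p>Links:</p>
  <a href="http://www.baidu.com">Baidu</a>
  <a href="https://www.cnblogs.com">cnblogs</a>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')             # built-in parser, no lxml needed
title = soup.title.string                             # text inside the <title> tag
links = [a.get('href') for a in soup.find_all('a')]   # every href on the page, in order
text = soup.get_text(' ', strip=True)                 # all text, joined by single spaces

print(title)
print(links)
print(text)
```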

Original article: https://www.cnblogs.com/mtcnn/p/9421808.html
