Modules needed for web scraping:
requests:
requests is an HTTP library for Python, released under the Apache2 license.
It is a high-level wrapper around Python's built-in modules that makes issuing
network requests from Python much easier; with requests you can easily reproduce any operation a browser performs.
1. GET requests: a GET request can be sent with or without parameters.
Without parameters:
import requests
response = requests.get("URL to fetch with GET")
response.encoding = "character encoding to decode the response with"
print(response.text)
With parameters:
import requests
payload = {"key1": "value1", "key2": "value2"}
response = requests.get("URL to fetch with GET", params=payload)
response.encoding = "character encoding to decode the response with"
print(response.text)
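To see what params= actually does, you can build the request without sending it: requests encodes the dictionary into the URL's query string. The sketch below uses httpbin.org purely as an assumed example host.

```python
import requests

# Hypothetical endpoint; params are urlencoded into the query string.
payload = {"key1": "value1", "key2": "value2"}
prepared = requests.Request("GET", "https://httpbin.org/get", params=payload).prepare()
print(prepared.url)  # https://httpbin.org/get?key1=value1&key2=value2
```

Preparing the request this way makes no network call, so it is a handy way to inspect what would go on the wire.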
2. POST requests:
import requests
payload = {"key1": "value1", "key2": "value2"}
response = requests.post("URL to post to", data=payload)
response.encoding = "character encoding to decode the response with"
print(response.text)
# Example: sending request headers along with the data
import requests
import json
payload = {"some": "data"}
headers = {"content-type": "application/json"}
response = requests.post("URL to post to", data=json.dumps(payload), headers=headers)
response.encoding = "character encoding to decode the response with"
print(response.text)
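Instead of calling json.dumps yourself and setting the header manually, requests can serialize JSON for you via the json= keyword, which also sets the Content-Type header automatically. A minimal sketch (httpbin.org is an assumed example host; the request is prepared but not sent):

```python
import requests

payload = {"some": "data"}
# json= serializes the dict and sets Content-Type: application/json for us.
prepared = requests.Request("POST", "https://httpbin.org/post", json=payload).prepare()
print(prepared.headers["Content-Type"])  # application/json
print(prepared.body)                     # b'{"some": "data"}'
```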
3. Other request methods:
requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
# All of the methods above are built on top of this one
requests.request(method, url, **kwargs)
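Since every helper funnels into requests.request, any HTTP verb can be issued the same way. The sketch below prepares a PUT without sending it (httpbin.org is an assumed example host) so you can inspect the method, body, and headers that would be transmitted:

```python
import requests

# Build a PUT request; a form-encoded dict goes in the body, and
# requests sets the matching Content-Type header.
req = requests.Request("PUT", "https://httpbin.org/put", data={"key1": "value1"})
prepared = req.prepare()
print(prepared.method)                    # PUT
print(prepared.body)                      # key1=value1
print(prepared.headers["Content-Type"])   # application/x-www-form-urlencoded
```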
BeautifulSoup:
re: