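Using requests
With urllib, handling web authentication and Cookies means writing Openers and Handlers. The requests library makes these operations more convenient: it supports Cookies, login authentication, proxy settings, and more.
Basic usage of the requests library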
import requests
r = requests.get('http://www.baidu.com/')
print(type(r), r.status_code, r.text, r.cookies, sep='\n\n')
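GET requests
requests.get() sends a GET request and returns the response.
requests.get(url, params)
# url is the link of the page to fetch; params holds extra URL parameters (a dict or bytes)
Example 1: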
import requests
r = requests.get('http://httpbin.org/get')
print(r.text)
# Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
"origin": "120.85.108.192, 120.85.108.192",
"url": "https://httpbin.org/get"
}
Example 2:
import requests
data = {
'name': 'LiYihua',
'age': '21'
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)
# Output:
{
"args": {
"age": "21",
"name": "LiYihua"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
"origin": "120.85.108.92, 120.85.108.92",
"url": "https://httpbin.org/get?name=LiYihua&age=21"
}
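As a rough sketch of how params ends up in the url field shown above, the request can be prepared without being sent and the resulting URL inspected. The variable name prepared is only illustrative.

```python
import requests

data = {'name': 'LiYihua', 'age': '21'}

# Build the request but do not send it; prepare() encodes params into the URL
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)
```

Because nothing is sent, this is a convenient way to check query-string encoding before making a real request.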
Example 3:
import requests
r = requests.get('http://httpbin.org/get')
print(type(r.text), r.json(), type(r.json()), sep='\n\n')
# Output:
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.21.0'}, 'origin': '120.85.108.92, 120.85.108.92', 'url': 'https://httpbin.org/get'}
<class 'dict'>
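Example 3 assumes the body is valid JSON. A minimal sketch of a more defensive variant follows; the helper name parse_json is hypothetical and not part of requests.

```python
import requests


def parse_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # The JSON decoding errors raised by json() are subclasses of ValueError
        return None
```

It could be called as parse_json(requests.get('http://httpbin.org/get')); for an HTML page it would return None instead of raising.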
Example 4:
Fetching an image
import requests
r = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)
# After the script finishes, an icon file named favicon.ico has been created
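r.content loads the whole body into memory, which is fine for a small icon. For larger files, one possible approach is to stream the body in chunks. The sketch below assumes the helper names save_response and download, which are illustrative only.

```python
import requests


def save_response(response, path, chunk_size=8192):
    """Write a streamed response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def download(url, path):
    """Fetch url with stream=True so the body is not read into memory at once."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        return save_response(r, path)
```

For instance, download('https://github.com/favicon.ico', 'favicon.ico') would produce the same file as Example 4.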
POST requests
POST is another common type of HTTP request. Example:
import requests
data = {
'name': 'LiYihua',
'age': 21
}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
# Output:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "21",
"name": "LiYihua"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "19",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.21.0"
},
"json": null,
"origin": "120.85.108.90, 120.85.108.90",
"url": "https://httpbin.org/post"
}
# The POST request succeeded and the result was returned; the form field holds the submitted data
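The example above uses data=, which form-encodes the dict. As a hedged comparison, the sketch below prepares the same payload with data= and with json= (without sending anything) to show how the body and Content-Type differ; the variable names are illustrative.

```python
import json
import requests

payload = {'name': 'LiYihua', 'age': 21}
url = 'http://httpbin.org/post'

# data= form-encodes the dict; json= serializes it as a JSON document
form_req = requests.Request('POST', url, data=payload).prepare()
json_req = requests.Request('POST', url, json=payload).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json.loads(json_req.body))
```

With json=, httpbin would be expected to echo the payload under "json" rather than "form".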
Response
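- text and content hold the response body (as str and bytes respectively)
- status_code holds the status code
- headers holds the response headers
- cookies holds the Cookies
- url holds the URL
- history holds the request history
Example: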
import requests
r = requests.get('https://www.cnblogs.com/liyihua/')
print(type(r.status_code), r.status_code,
type(r.headers), r.headers,
type(r.cookies), r.cookies,
type(r.url), r.url,
type(r.history), r.history,
sep='\n\n')
# Output:
<class 'int'>
200
<class 'requests.structures.CaseInsensitiveDict'>
{'Date': 'Thu, 20 Jun 2019 08:18:00 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Cache-Control': 'private, max-age=10', 'Expires': 'Thu, 20 Jun 2019 08:18:10 GMT', 'Last-Modified': 'Thu, 20 Jun 2019 08:18:00 GMT', 'X-UA-Compatible': 'IE=10', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Encoding': 'gzip'}
<class 'requests.cookies.RequestsCookieJar'>
<RequestsCookieJar[]>
<class 'str'>
https://www.cnblogs.com/liyihua/
<class 'list'>
[]
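Beyond printing status_code, a script usually needs to react to failures. A minimal sketch, assuming a hypothetical helper named ok_or_none, uses raise_for_status() for that:

```python
import requests


def ok_or_none(response):
    """Return the response if its status is below 400, otherwise None."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx statuses
    except requests.exceptions.HTTPError:
        return None
    return response
```

Whether to swallow the error like this or let HTTPError propagate depends on the caller; this is only one option.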
Advanced usage of requests
- File upload
import requests
files = {
    'file': open('favicon.ico', 'rb')
}
r = requests.post('http://httpbin.org/post', files=files)
print(r.text)
# Output:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octetstream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4FAAAoAAAAEAAAACAAAAABACAAAAAAAAAFAAA...
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=c1b665273fc73e67e57ac97e78f49110",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  "origin": "120.85.108.71, 120.85.108.71",
  "url": "https://httpbin.org/post"
}
- Session persistence
A Session object makes it easy to maintain a session.
import requests
# Two independent requests do not share Cookies
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
# Output:
{
  "cookies": {}
}
import requests
# Requests sent through the same Session share Cookies
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# Output:
{
  "cookies": {
    "number": "123456789"
  }
}
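To make the Session behaviour more concrete, the following sketch shows that state stored on a Session is merged into every request it prepares. As an assumption for illustration, the cookie is set by hand rather than through the /cookies/set endpoint, and the client name is made up.

```python
import requests

s = requests.Session()
# Assumption for illustration: set the cookie by hand instead of via /cookies/set
s.cookies.set('number', '123456789', domain='httpbin.org', path='/')
s.headers.update({'User-Agent': 'example-client/0.1'})  # hypothetical client name

# prepare_request() merges the session's cookies and headers into the request
prepped = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepped.headers['Cookie'])
print(prepped.headers['User-Agent'])
```

Nothing is sent here; the prepared headers simply show what the Session would attach.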
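- SSL certificate verification
import requests
r = requests.get('https://www.12306.cn')
print(r.status_code)
# Prints 200 if nothing goes wrong
# Requesting an HTTPS site whose certificate fails verification raises an SSLError.
# To avoid the error, the example can be modified slightly:
import requests
from requests.packages import urllib3
urllib3.disable_warnings()  # suppress the warning about unverified requests
r = requests.get('https://www.12306.cn', verify=False)  # skip certificate verification
print(r.status_code)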
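Disabling verification removes the protection HTTPS provides. An alternative, sketched here under the assumption that a trusted CA bundle file is available, is to point verify at that file. The helper name make_session and any path passed to it are hypothetical.

```python
import requests


def make_session(ca_bundle=None):
    """Return a Session that verifies certificates, optionally against a custom CA bundle."""
    s = requests.Session()
    if ca_bundle is not None:
        # verify accepts a path to a CA bundle file (or a directory of certificates)
        s.verify = ca_bundle
    return s
```

A call such as make_session('/path/to/ca.pem').get(url) would then verify the server against that bundle instead of skipping the check.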
- Proxy settings
import requests
proxies = {
    'http': 'socks5://user:password@10.10.1.10:3128',
    'https': 'socks5://user:password@10.10.1.10:1080'
}
requests.get('https://www.taobao.com', proxies=proxies)
# Uses a SOCKS proxy (requires the optional SOCKS support: pip install 'requests[socks]')
- Timeout settings
import requests
# timeout=(connect timeout, read timeout), in seconds
r = requests.get('https://taobao.com', timeout=(0.1, 1))
print(r.status_code)
# Output: 200
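With limits as tight as (0.1, 1), a timeout is quite likely, and the call then raises an exception instead of returning. A minimal sketch of handling that case follows; the helper name status_or_none, its default timeout values, and the injectable get parameter are assumptions made for illustration.

```python
import requests


def status_or_none(url, timeout=(3.05, 10), get=requests.get):
    """Return the status code, or None if connecting or reading timed out."""
    try:
        return get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        # Timeout is the base class of ConnectTimeout and ReadTimeout
        return None
```

For example, status_or_none('https://taobao.com', timeout=(0.1, 1)) would return None on a timeout rather than crashing the script.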
- Authentication
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://localhost', auth=HTTPBasicAuth('liyihua', 'woshiyihua134'))
print(r.status_code)
# Output: 200
# OAuth1 can also be used
import requests
from requests_oauthlib import OAuth1
url = 'https://api.twitter.com/1.1/account/verify_credentials.json'
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
              'USER_OAUTH_TOKEN', 'USER_OAUTH_TOKEN_SECRET')
requests.get(url, auth=auth)
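For basic authentication, requests also accepts a plain (user, password) tuple. The sketch below prepares both forms without sending them, to show that they yield the same Authorization header; the variable names are illustrative.

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'http://localhost'
# Both forms produce the same Authorization header; nothing is sent here
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('liyihua', 'woshiyihua134')).prepare()
shorthand = requests.Request('GET', url, auth=('liyihua', 'woshiyihua134')).prepare()

print(shorthand.headers['Authorization'] == explicit.headers['Authorization'])
```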
- Prepared Request
To get a Prepared Request that carries the session's state, use Session.prepare_request().
from requests import Request, Session
url = 'http://httpbin.org/post'
data = {
    'name': 'LiYihua'
}  # form data
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537 (KHTML, like Gecko Chrome/53.0.2785.116 Safari/537.36'
}  # pretend to be a browser
s = Session()  # keeps session state
req = Request('POST', url, data=data, headers=header)
prepped = s.prepare_request(req)  # prepare_request() turns req into a Prepared Request object
r = s.send(prepped)  # send() sends the request
print(r.text)
# Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "LiYihua"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "12",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537 (KHTML, like Gecko Chrome/53.0.2785.116 Safari/537.36"
  },
  "json": null,
  "origin": "120.85.108.184, 120.85.108.184",
  "url": "https://httpbin.org/post"
}
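One reason to prepare a request separately is that it can be inspected or adjusted before it is sent. A small sketch follows; the X-Example header is hypothetical and only there for illustration.

```python
from requests import Request, Session

s = Session()
req = Request('POST', 'http://httpbin.org/post', data={'name': 'LiYihua'})
prepped = s.prepare_request(req)

# The prepared request can be inspected or adjusted before s.send(prepped)
prepped.headers['X-Example'] = 'demo'  # hypothetical header, for illustration only
print(prepped.method, prepped.url)
print(prepped.body, prepped.headers['Content-Length'])
```

The body and Content-Length printed here correspond to the "form" and "Content-Length" fields in the httpbin output above.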