Basic usage of requests:

import requests

response = requests.get("http://httpbin.org")
GET requests with parameters:
Method 1: put the parameters in the URL itself — the query string is separated from the path by "?", and parameters are joined with "&":

response = requests.get("http://httpbin.org/get?name=germey&age=22")

Method 2: pass them as a dict via the params argument:

data = {"name": "germey", "age": 22}
response = requests.get("http://httpbin.org/get", params=data)
print(response.text)  # print the body of the response
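To see how the params dict is turned into a query string without actually sending anything, you can prepare the request and inspect its URL — a small sketch using the same httpbin endpoint as above:

```python
import requests

# Build and prepare the request without sending it, to see how the
# params dict is encoded into the query string (ints become strings).
req = requests.Request("GET", "http://httpbin.org/get",
                       params={"name": "germey", "age": 22})
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```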
Parsing JSON
response = requests.get("http://httpbin.org/get")
print(response.text)
print(type(response.text))    # <class 'str'>
print(response.json())
print(type(response.json()))  # <class 'dict'>
Fetching binary data
response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))  # <class 'str'> <class 'bytes'>
print(response.text)
print(response.content)
with open("logo.ico", "wb") as f:  # the with block closes the file automatically
    f.write(response.content)
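The relationship between .text and .content can be seen without a network round trip by building a Response by hand — note this pokes at the private _content attribute purely for illustration, so treat it as a sketch: .text is simply .content decoded with the response's encoding.

```python
import requests

# Hand-built Response (uses the private _content attribute, for
# illustration only): .text decodes .content using .encoding.
r = requests.models.Response()
r._content = "héllo".encode("utf-8")
r.encoding = "utf-8"
print(type(r.content))  # <class 'bytes'>
print(type(r.text))     # <class 'str'>
print(r.text)           # héllo
```

For truly binary data such as the favicon above, that decoding is meaningless, which is why the file is written from response.content, never response.text.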
Adding headers
headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)
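You can check that the header is really attached, again without sending the request, by preparing it first. A detail worth knowing: header names in requests are case-insensitive, so a header set as "user-agent" can be read back as "User-Agent". A minimal sketch (the shortened User-Agent string is illustrative):

```python
import requests

# Headers are stored case-insensitively: set as "user-agent",
# readable as "User-Agent".
req = requests.Request("GET", "https://www.zhihu.com/explore",
                       headers={"user-agent": "Mozilla/5.0"})
prepared = req.prepare()
print(prepared.headers["User-Agent"])  # Mozilla/5.0
```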
Responses
Response attributes
response = requests.get("http://www.jianshu.com", headers=headers)
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
Advanced usage
File upload
with open("logo.ico", "rb") as f:  # open inside "with" so the file handle is closed afterwards
    files = {"file": f}
    response = requests.post("http://httpbin.org/post", files=files)
print(response.text)
Getting cookies
response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + "=" + value)
Session persistence
s = requests.Session()
# This step sets a cookie by hand; on a real site the cookie is generated
# by the server itself, so this step is not needed there.
s.get("http://httpbin.org/cookies/set/number/123456789")
# This is the step that actually reads the cookies back.
response = s.get("http://httpbin.org/cookies")
print(response.text)
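The reason the second request sees the cookie is that a Session keeps cookies in a local jar (a RequestsCookieJar) and replays them on every later request made through the same session. A minimal offline sketch of that jar, setting by hand what the /cookies/set endpoint would set server-side:

```python
import requests

# A Session stores cookies in a RequestsCookieJar; anything in the jar
# is sent along with every subsequent request through this session.
s = requests.Session()
s.cookies.set("number", "123456789")  # what /cookies/set/number/123456789 does
print(dict(s.cookies))  # {'number': '123456789'}
```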
Certificate verification:
Some HTTPS sites require certificate verification. If the site's certificate is not issued by a trusted authority, the request raises an error; to avoid the error, disable verification.
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
The method above skips verification and the request goes through, but a warning line is still printed. Use the following to suppress the warning:
from requests.packages import urllib3  # in recent versions, "import urllib3" directly

urllib3.disable_warnings()
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
Proxy settings
proxies = {
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743"
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
If the proxy requires a username and password, use this format:
proxies = {
    "http": "http://user:password@127.0.0.1:9743"
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
Timeout settings
response = requests.get("http://httpbin.org/get", timeout=0.5)
print(response.status_code)
Timeout with exception handling
from requests.exceptions import ReadTimeout

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print("Timeout")
Authentication: some sites ask for a username and password before showing any content; pass the credentials with the auth argument.
from requests.auth import HTTPBasicAuth

r = requests.get("http://120.27.34.24:9001", auth=HTTPBasicAuth("user", "123"))
print(r.status_code)
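HTTPBasicAuth just adds a base64-encoded Authorization header, and requests also accepts a plain (user, password) tuple as shorthand for it. A sketch that prepares the request without sending it, reusing the placeholder credentials above, to show the header that gets attached:

```python
import requests

# auth=(user, password) is shorthand for HTTPBasicAuth(user, password);
# both set the same base64-encoded Authorization header.
req = requests.Request("GET", "http://120.27.34.24:9001", auth=("user", "123"))
prepared = req.prepare()
print(prepared.headers["Authorization"])  # Basic dXNlcjoxMjM=
```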
Exception handling:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print("TimeOut")
except ConnectionError:
    print("Connection error")
except RequestException:
    print("Error")
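The order of the except clauses matters because the exception classes form a hierarchy: every requests exception derives from RequestException, so the catch-all must come last or it would swallow the more specific cases. A quick check of that hierarchy:

```python
from requests.exceptions import (ConnectionError, ReadTimeout,
                                 RequestException, Timeout)

# ReadTimeout is a Timeout, and both Timeout and ConnectionError derive
# from RequestException — hence catch the specific classes first.
print(issubclass(ReadTimeout, Timeout))               # True
print(issubclass(Timeout, RequestException))          # True
print(issubclass(ConnectionError, RequestException))  # True
```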
Getting the IP addresses of a connection with requests
import requests

# stream=True is required: without it the connection is already released
# by the time we inspect it, and the attribute lookups below fail.
rsp = requests.get("http://www.baidu.com", stream=True)
print(rsp.raw._connection.sock.getpeername()[0])  # remote (peer) IP
print(rsp.raw._connection.sock.getsockname()[0])  # local IP