```python
import requests

response = requests.get("https://www.baidu.com")
# print(response)
# print(type(response))
print(response.text)
print(response.encoding)
print(response.content.decode("utf-8"))
```
- r.text returns the page source, decoded to a string
- r.content returns the raw bytes of the source; .decode(encoding) decodes those bytes with the given encoding
- r.encoding returns the encoding requests detected; if the guess is wrong, r.text comes out garbled
- r.status_code returns the HTTP status code; print(response.status_code) prints 200 on success
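The relationship between these attributes can be sketched offline with a hand-built Response object. This is only an illustration: in real use requests.get() fills in all of these fields, and the bytes and encoding below are made up for the example.

```python
import requests

# Build a Response by hand to illustrate content/text/encoding/status_code.
# Normally requests.get() populates these for you.
r = requests.Response()
r._content = "你好".encode("utf-8")  # r.content: the raw bytes of the body
r.encoding = "utf-8"                 # the encoding requests would have detected
r.status_code = 200

print(r.content)                  # the raw bytes
print(r.content.decode("utf-8"))  # decode the bytes yourself
print(r.text)                     # requests decodes using r.encoding
print(r.status_code)              # 200
```

When r.encoding is detected incorrectly, r.text decodes with the wrong codec, which is exactly the garbled-text case described above; decoding r.content yourself sidesteps the detection.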
requests.get() parameters
- url: the address to request
- params: query-string parameters appended to the URL
- headers: request headers to send with the request
```python
response = requests.get("http://www.antvv.com/?cate=4")
print(response.text)

# Equivalent: pass the query parameters as a dict via params
params = {"cate": 4}
response = requests.get("http://www.antvv.com", params=params)
```
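To see exactly what params does to the URL without sending anything over the network, the request can be prepared locally; the parameter values here are made up for illustration.

```python
import requests

# Preparing a request shows the final URL with params encoded into the
# query string, without actually performing the request.
req = requests.Request("GET", "http://httpbin.org/get",
                       params={"cate": 4, "page": 1})
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?cate=4&page=1
```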
Sites for testing HTTP requests:
- http://httpbin.org/get echoes back information about your GET request
- http://httpbin.org/post does the same for POST requests
```python
response = requests.get("http://httpbin.org/get")
print(response.text)
```
The result is:

```
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1",
    "X-Amzn-Trace-Id": "Root=1-5e9fd4f8-4e3d91cc100f2c6674d3c0b2"
  },
  "origin": "124.64.16.230",
  "url": "http://httpbin.org/get"
}
```
The User-Agent here gives away that the request came from a script; you can supply your own headers to disguise it.

Custom User-Agent:
- User-Agent: tells the server which browser the client claims to be. If you don't set it, requests sends python-requests/<version>; override it with the headers parameter.
- Referer: the address of the previous page, i.e. which page you navigated from; some sites reject requests whose Referer is missing or wrong.
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400",
    "Referer": "http://httpbin.org"
}
response = requests.get("http://httpbin.org/get", headers=headers)
print(response.text)
```
The result is:

```
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400",
    "X-Amzn-Trace-Id": "Root=1-5e9fd5d3-e0944316a8c4783b8e08fd2e"
  },
  "origin": "124.64.16.230",
  "url": "http://httpbin.org/get"
}
```
Now the User-Agent no longer gives the crawler away.
- stream: streaming transfer
```python
# Fetch an image and write it to disk in one go
url = "https://ss3.bdstatic.com/70cFv8Sh_Q1YnxGkpoWK1HF6hhy/it/u=1208538952,1443328523&fm=26&gp=0.jpg"
r = requests.get(url, headers=headers)
print(r.content)
with open("1.jpg", "wb") as file:
    file.write(r.content)
```
```python
# Fetch the same image with streaming: download in chunks instead of all at once
url = "https://ss3.bdstatic.com/70cFv8Sh_Q1YnxGkpoWK1HF6hhy/it/u=1208538952,1443328523&fm=26&gp=0.jpg"
r = requests.get(url, headers=headers, stream=True)
with open("1.jpg", "wb") as file:
    for chunk in r.iter_content(102400):  # read up to 100 KB at a time
        file.write(chunk)
        print(chunk)
```
- timeout: maximum time to wait; if the server takes longer, requests raises an exception
```python
url = "https://www.zhihu.com"
try:
    r = requests.get(url, timeout=2)
    print(r.text)
except requests.exceptions.Timeout:  # catch the specific timeout exception
    print("timed out")
```
- proxies: route requests through a proxy
```python
# proxies
url = "http://httpbin.org/get"
proxies = {
    "http": "182.35.84.181:9999",
    "https": "",
}
r = requests.get(url, proxies=proxies)
print(r.text)
```
- SSL

verify=False skips certificate verification; set it if you hit an SSLError. (12306 no longer requires this workaround, but it still serves as the example.)
```python
import requests

response = requests.get("http://www.12306.cn", verify=False)
print(response.status_code)
print(response.content.decode("utf-8"))
```
- JSON-format responses
```python
url = "http://httpbin.org/get"
r = requests.get(url)
resp_str = r.text
print(resp_str)
print(type(resp_str))  # <class 'str'>
```
The output:

```
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1",
    "X-Amzn-Trace-Id": "Root=1-5ea13127-3bf7c712fb636862fd58c91c"
  },
  "origin": "117.136.0.252",
  "url": "http://httpbin.org/get"
}
<class 'str'>
```
- json.loads()

json.loads() parses a JSON string into a Python dict or list.
```python
import json

url = "http://httpbin.org/get"
r = requests.get(url)
resp_str = r.text
resp_dict = json.loads(resp_str)  # parse the JSON string into a dict
print(resp_dict)
print(type(resp_dict))  # <class 'dict'>
print(resp_dict["url"])
```
The output:

```
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1', 'X-Amzn-Trace-Id': 'Root=1-5ea134f7-dc8428a433fcf066a2fde876'}, 'origin': '117.136.0.252', 'url': 'http://httpbin.org/get'}
<class 'dict'>
http://httpbin.org/get
```
- json.dumps()

```python
# json.dumps(obj) serializes a Python dict or list into a JSON string
print(json.dumps({"name": "tom", "age": 18, "sex": "male"}))
print(type(json.dumps({"name": "tom", "age": 18, "sex": "male"})))  # <class 'str'>
```
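Since loads and dumps are inverses of each other, a quick offline round trip shows the two directions side by side:

```python
import json

# dumps: dict -> JSON string; loads: JSON string -> dict (a round trip)
user = {"name": "tom", "age": 18, "sex": "male"}
s = json.dumps(user)
print(s)                      # {"name": "tom", "age": 18, "sex": "male"}
print(type(s))                # <class 'str'>
assert json.loads(s) == user  # parsing the string recovers the dict
```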
- r.json() parses the JSON response body directly; this parser is built into requests
```python
url = "http://httpbin.org/get"
r = requests.get(url)
resp_dict = r.json()  # already a dict, no json.loads needed
print(resp_dict)
```
The output:

```
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1', 'X-Amzn-Trace-Id': 'Root=1-5ea13733-900d16cad13db5581550d818'}, 'origin': '117.136.0.252', 'url': 'http://httpbin.org/get'}
```
- POST requests: send form data, files, etc. to the server

```python
# POST request
url = "http://httpbin.org/post"
data = {
    "uname": "admin",
    "upwd": "123456",  # field names from the form at http://www.antvv.com/login/login.html (view page source)
}
r = requests.post(url, data=data)  # note: requests.post, not request.post
print(r.text)
```
- files: upload files to the server

```python
url = "http://httpbin.org/post"
data = {
    "uname": "admin",
    "upwd": "123456",
}
# files uploads one or more files alongside the form data
with open("./1.jpg", "rb") as img:
    files = {"img1": img}
    r = requests.post(url, data=data, files=files)
print(r.text)
```