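The urllib library
Python's built-in HTTP request library
1. urllib.request: request module
2. urllib.error: exception-handling module (used with try/except)
3. urllib.parse: URL-parsing module
4. urllib.robotparser: robots.txt-parsing module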
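As a rough illustration of the two parsing modules in the list above, the sketch below splits a URL with urllib.parse and checks paths against robots.txt rules with urllib.robotparser. The URL and the rules are made up for the example, and nothing is fetched over the network.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical URL, used only for illustration.
parts = urlparse("http://example.com/private/page?id=1")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# Hypothetical robots.txt rules, parsed from memory instead of downloaded.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch() answers whether the given user agent may request the URL.
blocked = robots.can_fetch("*", "http://example.com/private/page")
allowed = robots.can_fetch("*", "http://example.com/index.html")
print(blocked, allowed)
```

In a real crawler the rules would normally come from set_url() followed by read(), which does need network access.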
urlopen
GET request
import urllib.request

# Send a GET request and print the decoded body.
response = urllib.request.urlopen("http://www.baidu.com")
print(response.read().decode('utf-8'))  # .read() returns the response body as bytes
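POST request
# urllib.request happens to load urllib.parse internally, so the code runs
# without this import, but importing it explicitly is clearer.
import urllib.parse
import urllib.request

# Passing data= makes urlopen send a POST; the body must be bytes.
data = bytes(urllib.parse.urlencode({'name': '汪国强'}), encoding='utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode('utf-8'))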
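To see what the POST body above actually contains, the following sketch runs the same encoding steps without sending anything. It only uses urlencode() and parse_qs() from urllib.parse; the variable names are chosen for this example.

```python
from urllib.parse import parse_qs, urlencode

form = {'name': '汪国强'}

# urlencode() returns a str; non-ASCII characters are percent-encoded as UTF-8.
encoded = urlencode(form)
print(encoded)

# urlopen(data=...) needs bytes, so the string is encoded before sending.
body = encoded.encode('utf-8')
print(type(body))

# parse_qs() reverses the encoding, which is a quick way to check the body.
decoded = parse_qs(body.decode('utf-8'))
print(decoded)
```

Note that parse_qs() maps each key to a list of values, because a form field may appear more than once.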
urllib.error
import socket
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.01)
except urllib.error.URLError as e:
    # A timeout while connecting is reported as a URLError
    # whose reason is a socket.timeout instance.
    if isinstance(e.reason, socket.timeout):
        print('timeout')
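The timeout check above covers one case. A slightly broader sketch is shown below, assuming you also want to tell HTTP status errors apart from connection problems. describe_error is a hypothetical helper written for this example, and the exceptions are constructed by hand so the snippet runs without a network.

```python
import email.message
import io
import socket
import urllib.error


def describe_error(exc):
    """Return a short label for an exception raised by urlopen (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return 'http error {}'.format(exc.code)
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, socket.timeout):
            return 'timeout'
        return 'url error: {}'.format(exc.reason)
    # A timeout while reading the body can surface as socket.timeout directly.
    if isinstance(exc, socket.timeout):
        return 'timeout'
    return 'unexpected: {!r}'.format(exc)


# Exceptions are built by hand here so the example runs offline.
timeout_error = urllib.error.URLError(socket.timeout('timed out'))
not_found = urllib.error.HTTPError(
    'http://httpbin.org/status/404', 404, 'Not Found',
    email.message.Message(), io.BytesIO(b''))

print(describe_error(timeout_error))
print(describe_error(not_found))
```

In real code the helper would be called from an except block wrapped around urlopen().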
Working with the response
Status code and response headers
import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
print(response.status)               # HTTP status code
print('-----------------')
print(response.getheaders())         # all response headers as (name, value) pairs
print('-----------------')
print(response.getheader('Server'))  # value of a single header
Output:
200 (status code)
-----------------
Response headers:
[('Date', 'Mon, 25 Dec 2017 09:59:01 GMT'), ('Content-Type', 'text/html; charset=utf-8'), ('Transfer-Encoding', 'chunked'), ('Connection', 'Close'), ('Vary', 'Accept-Encoding'), ('Set-Cookie', 'BAIDUID=C941C9CEBE13F4D6264663E5A10D4603:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'BIDUPSID=C941C9CEBE13F4D6264663E5A10D4603; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'PSTM=1514195941; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'BDSVRTM=0; path=/'), ('Set-Cookie', 'BD_HOME=0; path=/'), ('Set-Cookie', 'H_PS_PSSID=25394_1453_21119_25178_22157; path=/; domain=.baidu.com'), ('P3P', 'CP=" OTI DSP COR IVA OUR IND COM "'), ('Cache-Control', 'private'), ('Cxy_all', 'baidu+e8e6fa769a31bd4f787c267655da18e6'), ('Expires', 'Mon, 25 Dec 2017 09:58:11 GMT'), ('X-Powered-By', 'HPHP'), ('Server', 'BWS/1.1'), ('X-UA-Compatible', 'IE=Edge,chrome=1'), ('BDPAGETYPE', '1'), ('BDQID', '0xc958493100031c89'), ('BDUSERID', '0')]
-----------------
Value of the specified response header:
BWS/1.1
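One practical use of the response headers is choosing the right encoding before calling decode(). The sketch below pulls the charset out of (name, value) pairs like the ones getheaders() returns. charset_from_headers is a hypothetical helper, and the header list is a trimmed copy of the output above, so no request is made.

```python
from email.message import Message


def charset_from_headers(header_pairs, default='utf-8'):
    """Pick the charset out of (name, value) header pairs (hypothetical helper)."""
    msg = Message()
    for name, value in header_pairs:
        msg[name] = value
    # get_content_charset() reads the charset parameter of Content-Type.
    return msg.get_content_charset() or default


# Header pairs copied from the output above, trimmed to two entries.
headers = [
    ('Content-Type', 'text/html; charset=utf-8'),
    ('Server', 'BWS/1.1'),
]
print(charset_from_headers(headers))
```

With a live response the same lookup is available as response.headers.get_content_charset(), since response.headers is a subclass of email.message.Message.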
What if you want to send custom request headers?
import urllib.parse
import urllib.request

head = {
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
}
dic = {'name': '165498489'}
data = bytes(urllib.parse.urlencode(dic), encoding='utf-8')
# Build a Request object that carries the headers, body and method.
request = urllib.request.Request('http://httpbin.org/post', data=data, headers=head, method='POST')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
Alternatively, use request.add_header()
import urllib.parse
import urllib.request

dic = {'name': '165498489'}
data = bytes(urllib.parse.urlencode(dic), encoding='utf-8')
request = urllib.request.Request('http://httpbin.org/post', data=data, method='POST')
# Add one header at a time after the Request has been created.
request.add_header("Upgrade-Insecure-Requests", "1")
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
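Before sending a request, it can help to inspect what the Request object will actually carry. The sketch below builds the same request as above and reads it back with get_method(), header_items() and get_header(); it never calls urlopen(), so it runs offline.

```python
import urllib.parse
import urllib.request

data = urllib.parse.urlencode({'name': '165498489'}).encode('utf-8')
request = urllib.request.Request('http://httpbin.org/post', data=data)
request.add_header('Upgrade-Insecure-Requests', '1')

# With a body attached and no explicit method, the request defaults to POST.
print(request.get_method())

# Request stores header names via str.capitalize(),
# so only the first letter stays upper-case.
print(request.header_items())
print(request.get_header('Upgrade-insecure-requests'))
```

The capitalized key matters only when reading headers back from the Request object; HTTP header names are case-insensitive on the wire.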
Handler
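Proxy
Using a proxy IP
import urllib.request

# Route http requests through the given proxy.
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://116.199.115.78:80/'
})
opener = urllib.request.build_opener(proxy_handler)
# httpbin.org/ip echoes the caller's IP, which should be the proxy's address.
response = opener.open('http://httpbin.org/ip')
print(response.read().decode('utf-8'))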
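Public proxy addresses tend to stop working, so it can be useful to check how the opener is wired before sending anything through it. The sketch below wraps the steps above in a hypothetical helper, make_proxy_opener, and inspects the resulting handler chain. The address 127.0.0.1:8080 is a placeholder, not a working proxy.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that routes http requests through proxy_url (hypothetical helper)."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address; substitute a proxy you actually control.
opener = make_proxy_opener('http://127.0.0.1:8080/')

# Inspect the handler chain instead of sending a request.
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(proxy_handlers[0].proxies)

# Optional: urllib.request.install_opener(opener) would make
# plain urlopen() calls use this opener as well.
```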