  • A Brief Look at Python's requests Module, Part 1: How requests Calls urllib3

    I only just discovered that cnblogs supports Markdown, so writing here is finally a pleasure!

    While reading some code today, a question suddenly struck me: how does requests implement persistent (keep-alive) connections?

    • After some digging, I learned that requests does this through the Session class's default request headers (custom headers work just as well):

      class Session(SessionRedirectMixin):
      
          __attrs__ = [
              'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
              'cert', 'prefetch', 'adapters', 'stream', 'trust_env',
              'max_redirects',
          ]
      
          def __init__(self):
      
              self.headers = default_headers()          
      # ...
      
      def default_headers():
          """
          :rtype: requests.structures.CaseInsensitiveDict
          """
          return CaseInsensitiveDict({
              'User-Agent': default_user_agent(),
              'Accept-Encoding': ', '.join(('gzip', 'deflate')),
              'Accept': '*/*',
              'Connection': 'keep-alive',
          })
      As you can see, the default headers already ask for a persistent connection: Connection: keep-alive.
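
      Incidentally, the case-insensitive lookup of these headers comes from the CaseInsensitiveDict that default_headers() returns. Here is a minimal sketch of that idea; this is a simplified stand-in, not requests' real requests.structures.CaseInsensitiveDict, and the User-Agent string is a placeholder:

```python
class CaseInsensitiveDict(dict):
    """Simplified stand-in: store keys lower-cased, look them up lower-cased."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

def default_headers():
    # Same four defaults the requests source shows above.
    headers = CaseInsensitiveDict()
    headers['User-Agent'] = 'python-requests-sketch/0.1'  # placeholder value
    headers['Accept-Encoding'] = ', '.join(('gzip', 'deflate'))
    headers['Accept'] = '*/*'
    headers['Connection'] = 'keep-alive'
    return headers

h = default_headers()
print(h['connection'])   # lookup ignores case -> 'keep-alive'
```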
      
    • So what is requests' Session actually for? More digging led me to this passage in the requests documentation:

      Session object: The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and uses urllib3's connection pooling.
      

      In plain terms, it maintains a session. What really caught my interest, and prompted this post, is the last clause: "uses urllib3's connection pooling".
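
      To see what keep-alive actually buys you, here is a stdlib-only sketch (using http.client and http.server instead of requests, so it runs without third-party packages): two requests made on one HTTPConnection ride the same TCP socket.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'    # HTTP/1.1 keeps the connection alive by default

    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # silence per-request logging
        pass

server = ThreadingHTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
conn.request('GET', '/')
first = conn.getresponse().read()
sock1 = conn.sock                    # remember the underlying socket

conn.request('GET', '/')
second = conn.getresponse().read()
sock2 = conn.sock                    # same object -> the connection was reused

print(sock1 is sock2)                # True: both requests shared one TCP connection
server.shutdown()
conn.close()
```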

    • So where does requests actually call into urllib3, and what does connection pooling implement? Let's chase the first question through the source code; the second is left for the next post.

    • Let's start with requests' Session class:

      class Session(SessionRedirectMixin):
          __attrs__ = [
              'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
              'cert', 'prefetch', 'adapters', 'stream', 'trust_env',
              'max_redirects',
          ]
      
          def __init__(self):

              # ...
              self.adapters = OrderedDict()
              self.mount('https://', HTTPAdapter())
              self.mount('http://', HTTPAdapter())
              # ...
      

      In Session's __init__ there are calls to self.mount(), each registering an HTTPAdapter(). So first, what does mount actually do?

          def mount(self, prefix, adapter):
              """Registers a connection adapter to a prefix.
      
              Adapters are sorted in descending order by prefix length.
              """
              self.adapters[prefix] = adapter
              keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]
      
              for key in keys_to_move:
                  self.adapters[key] = self.adapters.pop(key)
      
          def __getstate__(self):
              state = {attr: getattr(self, attr, None) for attr in self.__attrs__}
              return state
      
          def __setstate__(self, state):
              for attr, value in state.items():
                  setattr(self, attr, value)
      Roughly speaking, this builds an ordered dict of adapters whose keys are URL prefixes ('http://' / 'https://') and whose values are HTTPAdapter objects, kept sorted so that longer prefixes come first.
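
      The mount logic above can be sketched on its own; this toy class re-implements Session's mount/get_adapter pair (strings stand in for real adapter objects) to show why longest-prefix-first ordering matters for adapter lookup:

```python
from collections import OrderedDict

class Sketch:
    """Minimal re-implementation of Session.mount / Session.get_adapter."""
    def __init__(self):
        self.adapters = OrderedDict()

    def mount(self, prefix, adapter):
        # Insert the new prefix, then move every shorter key behind it,
        # so iteration order is always longest-prefix-first.
        self.adapters[prefix] = adapter
        for key in [k for k in self.adapters if len(k) < len(prefix)]:
            self.adapters[key] = self.adapters.pop(key)

    def get_adapter(self, url):
        # First (i.e. longest) matching prefix wins.
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter

s = Sketch()
s.mount('http://', 'generic-adapter')
s.mount('http://special.example/', 'special-adapter')   # longer prefix wins
print(list(s.adapters))                                 # longest prefix listed first
print(s.get_adapter('http://special.example/path'))     # 'special-adapter'
```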
      

      After this point, adapters is essentially only used inside Session's send method, and nothing pool-related happens there; the real substance is in the HTTPAdapter object that gets passed along.

      First, a look at HTTPAdapter's docstring:
          Usage::
            >>> import requests
            >>> s = requests.Session()
            >>> a = requests.adapters.HTTPAdapter(max_retries=3)
            >>> s.mount('http://', a)
      The docstring spells it out: the basic usage is to construct an adapter and mount it yourself, and as the second-to-last line shows, you can pass arguments to HTTPAdapter. The constructor takes these parameters:
            pool_connections=DEFAULT_POOLSIZE,     # number of per-host connection pools to cache
            pool_maxsize=DEFAULT_POOLSIZE,         # max connections kept per pool (not the same thing as the previous parameter)
            max_retries=DEFAULT_RETRIES,           # retry count
            pool_block=DEFAULT_POOLBLOCK           # whether to block waiting when a pool is full
          
          
      class HTTPAdapter(BaseAdapter):
          __attrs__ = ['max_retries', 'config', '_pool_connections', '_pool_maxsize',
                       '_pool_block']
      
          def __init__(self, pool_connections=DEFAULT_POOLSIZE,
                       pool_maxsize=DEFAULT_POOLSIZE, max_retries=DEFAULT_RETRIES,
                       pool_block=DEFAULT_POOLBLOCK):
              if max_retries == DEFAULT_RETRIES:
                  self.max_retries = Retry(0, read=False)
              else:
                  self.max_retries = Retry.from_int(max_retries)
              self.config = {}
              self.proxy_manager = {}
      
              super(HTTPAdapter, self).__init__()
      
              self._pool_connections = pool_connections
              self._pool_maxsize = pool_maxsize
              self._pool_block = pool_block
      
              self.init_poolmanager(pool_connections, pool_maxsize, block=pool_block)
      
      

      As you can see, this class defines the whole family of pool-related attributes, and not only those: a good chunk of requests' configuration lives here (proxy_headers / add_headers / request_url), along with methods such as get_connection / build_response. HTTPAdapter is clearly one of requests' core classes.
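
      The difference between pool_connections and pool_maxsize trips people up, so here is a toy model of the two sizes. This is an illustration of the concept, not urllib3's implementation: pool_connections is how many per-host pools are cached, pool_maxsize is how many connections each pool may retain.

```python
from collections import OrderedDict
import queue

class ToyPoolManager:
    """Toy model of the two pool sizes (not the urllib3 implementation)."""
    def __init__(self, pool_connections=10, pool_maxsize=10):
        self.pool_connections = pool_connections   # number of host pools cached
        self.pool_maxsize = pool_maxsize           # connections retained per pool
        self.pools = OrderedDict()                 # host -> Queue of connections

    def pool_for(self, host):
        if host not in self.pools:
            if len(self.pools) >= self.pool_connections:
                self.pools.popitem(last=False)     # evict least recently used pool
            self.pools[host] = queue.Queue(maxsize=self.pool_maxsize)
        self.pools.move_to_end(host)               # mark as most recently used
        return self.pools[host]

pm = ToyPoolManager(pool_connections=2, pool_maxsize=5)
pm.pool_for('a.example')
pm.pool_for('b.example')
pm.pool_for('c.example')          # third host: the 'a.example' pool is evicted
print(list(pm.pools))             # ['b.example', 'c.example']
```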

      So let's walk the requests source from the top. # to save space, only the relevant code is quoted
      Say I send a POST request: requests.post('http://127.0.0.1:12345', data={'data': 'hello world'})
      
      # in requests/api.py
      def post(url, data=None, json=None, **kwargs):
          return request('post', url, data=data, json=json, **kwargs)
      
      def request(method, url, **kwargs):
          with sessions.Session() as session:
              return session.request(method=method, url=url, **kwargs)
           
      # a Session object is created and its request method is called; on to requests/sessions.py
      class Session(SessionRedirectMixin):
          ...
          def request(self, method, url,
                  params=None, data=None, headers=None, cookies=None, files=None,
                  auth=None, timeout=None, allow_redirects=True, proxies=None,
                  hooks=None, stream=None, verify=None, cert=None, json=None):
                  ....
              resp = self.send(prep, **send_kwargs)   # this enters send(); recall that send() is where the adapters get used; the exact call path is laid out below
              return resp
      # this ties back to what we saw above: the adapters here are HTTPAdapter objects
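
      This chain also explains why the module-level requests.post cannot reuse connections: every call builds a throwaway Session and discards it. A sketch of the pattern, with a hypothetical ToySession standing in for requests.Session:

```python
class ToySession:
    """Stand-in for requests.Session, just enough to show the pattern."""
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.closed = True          # a real Session would close its pools here
        return False

    def request(self, method, url, **kwargs):
        return (method.upper(), url, kwargs)   # a real Session would send here

def request(method, url, **kwargs):
    # Mirrors requests.api.request: one throwaway Session per call.
    with ToySession() as session:
        return session.request(method=method, url=url, **kwargs)

def post(url, data=None, **kwargs):
    # Mirrors requests.api.post: a thin wrapper over request().
    return request('post', url, data=data, **kwargs)

print(post('http://127.0.0.1:12345', data={'data': 'hello world'}))
```

      Keeping one long-lived Session and calling session.post() yourself is what lets the pooled connections survive between requests.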
      
      

      How Session's send function reaches the adapters:

          def send(self, request, **kwargs):
              .............
              # Get the appropriate adapter to use
              adapter = self.get_adapter(url=request.url)  # the function is shown below
      
              # Start time (approximately) of the request
              start = preferred_clock()
      
              # Send the request
              r = adapter.send(request, **kwargs)   # calls HTTPAdapter's send method
              ..........
              
              
              
          def get_adapter(self, url):         # get_adapter pulls out the matching HTTPAdapter object
              for (prefix, adapter) in self.adapters.items():
                  if url.lower().startswith(prefix.lower()):
                      return adapter
      
      

      Now for the main event: what does HTTPAdapter's send implement? Below is HTTPAdapter's send function; take care not to confuse it with Session's send above.

          def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
              try:
                  conn = self.get_connection(request.url, proxies)   # 函数在下方
              except LocationValueError as e:
                  raise InvalidURL(e, request=request)
      
              # ... (a stretch of configuration and checks, omitted)
                  
                          # Receive the response from the server
                          try:
                              # For Python 2.7, use buffering of HTTP responses
                              r = low_conn.getresponse(buffering=True)
                          except TypeError:
                              # For compatibility with Python 3.3+
                              r = low_conn.getresponse()
      
                          resp = HTTPResponse.from_httplib(
                              r,
                              pool=conn,
                              connection=low_conn,
                              preload_content=False,
                              decode_content=False
                          )
                      except:
                          # If we hit any problems here, clean up the connection.
                          # Then, reraise so that we can handle the actual exception.
                          low_conn.close()
                          raise
              # ... (a stretch of error handling and raises, omitted)
              return self.build_response(request, resp)
      
          
          
          # get_connection
          # I kept the docstring this time on purpose: it shows that the connection returned by get_connection inside send is a urllib3 connection. Here we finally jump from requests into urllib3; the proxy_manager.connection_from_url / self.poolmanager.connection_from_url calls below are calls into the urllib3 module.
          def get_connection(self, url, proxies=None):
              """Returns a urllib3 connection for the given URL. This should not be
              called from user code, and is only exposed for use when subclassing the
              :class:`HTTPAdapter <requests.adapters.HTTPAdapter>`.
              :param url: The URL to connect to.
              :param proxies: (optional) A Requests-style dictionary of proxies used on this request.
              :rtype: urllib3.ConnectionPool
              """
              proxy = select_proxy(url, proxies)
      
              if proxy:
                  proxy = prepend_scheme_if_needed(proxy, 'http')
                  proxy_url = parse_url(proxy)
                  if not proxy_url.host:
                      raise InvalidProxyURL("Please check proxy URL. It is malformed"
                                            " and could be missing the host.")
                  proxy_manager = self.proxy_manager_for(proxy)
                  conn = proxy_manager.connection_from_url(url)
              else:
                  # Only scheme should be lower case
                  parsed = urlparse(url)
                  url = parsed.geturl()
                  conn = self.poolmanager.connection_from_url(url)
      
              return conn
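
      The key idea behind connection_from_url is that each URL maps to a pool keyed by (scheme, host, port). Here is a toy version of that key derivation with a hypothetical pool_key helper (this is an illustration, not urllib3's actual API):

```python
from urllib.parse import urlparse

DEFAULT_PORTS = {'http': 80, 'https': 443}

def pool_key(url):
    """Toy version of how a pool manager decides which pool serves a URL:
    one pool per (scheme, host, port) triple."""
    parsed = urlparse(url)
    scheme = (parsed.scheme or 'http').lower()
    port = parsed.port or DEFAULT_PORTS.get(scheme)   # fill in the default port
    return (scheme, parsed.hostname, port)

# Two URLs on the same scheme+host+port share a pool; a different scheme does not.
print(pool_key('http://example.com/a'))      # ('http', 'example.com', 80)
print(pool_key('http://example.com:80/b'))   # same key -> same pool
print(pool_key('https://example.com/a'))     # ('https', 'example.com', 443)
```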
      
      

      It took quite a chase through the source to reach the actual call site. The requests codebase isn't large and its logic is clear. I haven't dug into how each feature is implemented here, as that felt too complex for my writing to do justice :P, so this was just a quick tour of how requests hands off to urllib3. Interested readers should take a look for themselves; next time I'll try to read the urllib3 source.

    ------------------------------------------------------------ Feel free to reach out with any questions~ Email: araise1@163.com
  • Original article: https://www.cnblogs.com/seasen/p/12888512.html