  • requests source code analysis

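    0. Preface

    (1) Take apart some of the interesting building blocks in requests

    (2) Summarize some of the interesting Pythonic idioms

    1. Using object.__setattr__ when initializing an object

    I have always passed every parameter to the constructor when instantiating an object, as most people do. The early version of requests shown below instead creates an empty object and assigns its attributes one by one, routing each assignment through __setattr__. That approach apparently did not work out well (probably), because the author later changed it. I honestly cannot say what the trade-offs between the two are.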

    class Request(object):
        """The :class:`Request` object. It carries out all functionality of
        Requests. Recommended interface is with the Requests functions.
        
        """
        
        _METHODS = ('GET', 'HEAD', 'PUT', 'POST', 'DELETE')
        
        def __init__(self):
            self.url = None
            self.headers = dict()
            self.method = None
            self.params = {}
            self.data = {}
            self.response = Response()
            self.auth = None
            self.sent = False
            
        def __repr__(self):
            try:
                repr = '<Request [%s]>' % (self.method)
            except:
                repr = '<Request object>'
            return repr
        
        def __setattr__(self, name, value):
            if (name == 'method') and (value):
                if not value in self._METHODS:
                    raise InvalidMethod()
            
            object.__setattr__(self, name, value)

    Initialization:

    def get(url, params={}, headers={}, auth=None):
        """Sends a GET request. Returns :class:`Response` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary of GET Parameters to send with the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to sent with the :class:`Request`.
        :param auth: (optional) AuthObject to enable Basic HTTP Auth.
        """
        
        r = Request()
        
        r.method = 'GET'
        r.url = url
        r.params = params
        r.headers = headers
        r.auth = _detect_auth(url, auth)
        
        r.send()
        
        return r.response
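
    The validation idea in the class above can be tried in isolation. This is a minimal sketch, not the requests source: InvalidMethod is defined here as a stand-in because the excerpt raises it without showing its definition, and the class is trimmed to two attributes.

```python
class InvalidMethod(Exception):
    """Stand-in for the exception raised in the excerpt (assumed definition)."""


class Request(object):
    _METHODS = ('GET', 'HEAD', 'PUT', 'POST', 'DELETE')

    def __init__(self):
        # These assignments also pass through __setattr__ below.
        self.url = None
        self.method = None

    def __setattr__(self, name, value):
        # Validate only non-empty values assigned to `method`.
        if name == 'method' and value:
            if value not in self._METHODS:
                raise InvalidMethod(value)
        # Delegate to the default implementation to store the value;
        # calling setattr(self, ...) here would recurse forever.
        object.__setattr__(self, name, value)


r = Request()
r.method = 'GET'          # accepted
try:
    r.method = 'FETCH'    # rejected before it is stored
except InvalidMethod:
    rejected = True
else:
    rejected = False
```

    Because the check lives in __setattr__, it applies to every later assignment as well, not only to the values given at construction time.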

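    2. Using **kwargs when passing many parameters around

    With **kwargs, a large number of parameters can be forwarded between functions without building a dict by hand for every call (which, embarrassingly, is what I used to do).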

    def get(url, params={}, headers={}, cookies=None, auth=None):
        return request('GET', url, params=params, headers=headers, cookiejar=cookies, auth=auth)
    
    def request(method, url, **kwargs):
        data = kwargs.pop('data', dict()) or kwargs.pop('params', dict())
    
        r = Request(method=method, url=url, data=data, headers=kwargs.pop('headers', {}),
                    cookiejar=kwargs.pop('cookies', None), files=kwargs.pop('files', None),
                    auth=kwargs.pop('auth', auth_manager.get_auth(url)))
        r.send()
    
        return r.response
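
    A small self-contained sketch of the same forwarding pattern follows. The function bodies are illustrative assumptions, not the requests implementation: request returns a plain dict instead of sending anything. It also rejects leftover keys, since in the excerpt above get passes cookiejar= while request pops 'cookies', and a mismatched key like that is silently ignored.

```python
def request(method, url, **kwargs):
    # Pop each option with a default so callers may omit any of them.
    params = kwargs.pop('params', None) or {}
    headers = kwargs.pop('headers', None) or {}
    timeout = kwargs.pop('timeout', None)
    if kwargs:
        # Fail loudly on unknown keys instead of silently dropping them.
        raise TypeError('unexpected arguments: %s' % sorted(kwargs))
    return {'method': method, 'url': url, 'params': params,
            'headers': headers, 'timeout': timeout}


def get(url, **kwargs):
    # Forward everything untouched; only the method is fixed here.
    return request('GET', url, **kwargs)


result = get('http://example.org', params={'q': 'x'}, timeout=3)
```

    Using None as the default and substituting {} inside the function also avoids sharing one mutable default dict between calls.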

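    3. Monkey patching

    Monkey patching is a hot-fix technique. Coroutine libraries are a good reference: to get asynchronous behavior they replace many of Python's standard library modules. The idea is to swap your own implementation in for the original module before it is loaded and used by other code, and so achieve your own (undisclosed) ends.

    Strictly speaking, it is not requests that monkey patches here but the pyopenssl library. The patch exists to fix the SNI bug in Python 2.7 by replacing the original ssl_wrap_socket function (though I did not see requests perform any injection itself, which is annoying...).

    # Patch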
    def inject_into_urllib3():
        'Monkey-patch urllib3 with PyOpenSSL-backed SSL-support.'
    
        connection.ssl_wrap_socket = ssl_wrap_socket
        util.HAS_SNI = HAS_SNI
        util.IS_PYOPENSSL = True
    
    # Restore
    def extract_from_urllib3():
        'Undo monkey-patching by :func:`inject_into_urllib3`.'
    
        connection.ssl_wrap_socket = orig_connection_ssl_wrap_socket
        util.HAS_SNI = orig_util_HAS_SNI
        util.IS_PYOPENSSL = False

    If you run into SNIMissing problems when making https requests, consider the following fix:

    pip install pyopenssl ndg-httpsclient pyasn1
    
    try:
        import urllib3.contrib.pyopenssl
        urllib3.contrib.pyopenssl.inject_into_urllib3()
    except ImportError:
        pass

    This amounts to performing the injection explicitly (but shouldn't requests integrate this itself...).
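
    The inject/extract pair can be reproduced without urllib3. The sketch below uses a throwaway module object named connection and two dummy wrapper functions, all of which are hypothetical stand-ins chosen to mirror the names in the excerpt.

```python
import types

# A stand-in for a third-party module (hypothetical, not urllib3 itself).
connection = types.ModuleType('connection')


def _stdlib_wrap(sock):
    return 'stdlib:%s' % sock


connection.ssl_wrap_socket = _stdlib_wrap

# Keep a reference to the original so the patch can be undone later.
orig_ssl_wrap_socket = connection.ssl_wrap_socket


def _pyopenssl_wrap(sock):
    return 'pyopenssl:%s' % sock


def inject():
    """Replace the module attribute with our implementation."""
    connection.ssl_wrap_socket = _pyopenssl_wrap


def extract():
    """Restore the attribute that was saved before patching."""
    connection.ssl_wrap_socket = orig_ssl_wrap_socket


before = connection.ssl_wrap_socket('s')
inject()
patched = connection.ssl_wrap_socket('s')
extract()
after = connection.ssl_wrap_socket('s')
```

    The patch only affects code that looks the function up through the module at call time. Code that already bound the old function to a local name before inject() ran keeps calling the original.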

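    4. Hook functions

    requests has a hook mechanism. Judging by older versions, it used to expose several callback entry points; today only the response entry point remains. Test code: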

    import requests
    
    def print_url(r, *args, **kwargs):
        # print() with a single argument works on both Python 2 and Python 3
        print(r.content)
        print(r.url)
    
    requests.get('http://httpbin.org', hooks=dict(response=print_url))

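    What happens here? requests calls print_url before the requests.Response is returned. As the code below shows, the callback runs only after requests has received the result of the request.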

        def send(self, request, **kwargs):
            """
            Send a given PreparedRequest.
    
            :rtype: requests.Response
            """
            ...
    
            # Get the appropriate adapter to use
            adapter = self.get_adapter(url=request.url)
    
            # Start time (approximately) of the request
            start = datetime.utcnow()
    
            # Send the request
            r = adapter.send(request, **kwargs)
    
            # Total elapsed time of the request (approximately)
            r.elapsed = datetime.utcnow() - start
    
            # Response manipulation hooks
            r = dispatch_hook('response', hooks, r, **kwargs)    

    And what does dispatch_hook do?

    def dispatch_hook(key, hooks, hook_data, **kwargs):
        """Dispatches a hook dictionary on a given piece of data."""
        hooks = hooks or dict()
        hooks = hooks.get(key)
        if hooks:
            if hasattr(hooks, '__call__'):
                hooks = [hooks]
            for hook in hooks:
                _hook_data = hook(hook_data, **kwargs)
                if _hook_data is not None:
                    hook_data = _hook_data
        return hook_data

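    As the code shows, dispatch_hook itself is extensible, but unfortunately requests now only has the response entry point, perhaps for safety reasons.

    Honestly, the hooks in requests are not very pleasant to use. For hooks that really work well, take a look at flask.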
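
    To see how return values flow through dispatch_hook, the function can be run on plain strings. The sketch below repeats the logic of the excerpt (using callable() in place of the hasattr check); the two hook functions and the string payload are made up for illustration.

```python
def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Same logic as the excerpt, with callable() instead of hasattr."""
    hooks = (hooks or {}).get(key)
    if hooks:
        if callable(hooks):
            # A single function is treated as a one-element list.
            hooks = [hooks]
        for hook in hooks:
            _hook_data = hook(hook_data, **kwargs)
            if _hook_data is not None:
                hook_data = _hook_data
    return hook_data


seen = []


def log_only(data, **kwargs):
    seen.append(data)      # returns None, so data passes through unchanged


def upper(data, **kwargs):
    return data.upper()    # a non-None return value replaces the data


result = dispatch_hook('response', {'response': [log_only, upper]}, 'ok')
untouched = dispatch_hook('response', None, 'ok')
```

    A hook that only observes returns nothing, while a hook that wants to modify the response returns the replacement; hooks run in list order, each receiving the output of the previous one.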

    5. Context manager (historical version)

    with requests.settings(timeout=0.5):
        requests.get('http://example.org')
        requests.get('http://example.org', timeout=10)

    Inside the with block, all settings take effect locally: even with requests.get('http://example.org', timeout=10), the timeout attribute on the requests module is still 0.5 rather than 10. How is this implemented?

    class settings:
        """Context manager for settings."""
        
        cache = {}
        
        def __init__(self, timeout):
            self.module = inspect.getmodule(self)
            
            # Cache settings
            self.cache['timeout'] = self.module.timeout
            
            self.module.timeout = timeout
            
        def __enter__(self):
            pass
            
        def __exit__(self, type, value, traceback):
            # Restore settings 
            for key in self.cache:
                setattr(self.module, key, self.cache[key])

    It is quite simple: save the existing attributes when entering the context, and set them back when leaving it.
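
    The same save-and-restore idea can be written with contextlib. This is a sketch under assumptions, not the historical requests code: config is a hypothetical namespace standing in for module-level settings, and the override is applied on entry rather than in __init__ as in the excerpt.

```python
import contextlib
import types

# Stand-in for module-level configuration (hypothetical).
config = types.SimpleNamespace(timeout=None)


@contextlib.contextmanager
def settings(**overrides):
    # Save the current value of everything we are about to override.
    saved = {key: getattr(config, key) for key in overrides}
    for key, value in overrides.items():
        setattr(config, key, value)
    try:
        yield config
    finally:
        # Restore the saved values even if the body raised.
        for key, value in saved.items():
            setattr(config, key, value)


with settings(timeout=0.5):
    inside = config.timeout
outside = config.timeout
```

    Keeping the saved values in a local dict, rather than in a class-level cache shared by all instances as in the excerpt, lets nested with blocks each restore their own previous value.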

    6. Redirects

    requests checks for a redirect on every send. If the response is a redirect, it runs the following method:

        def resolve_redirects(self, resp, req, stream=False, timeout=None,
                              verify=True, cert=None, proxies=None, **adapter_kwargs):
            """Receives a Response. Returns a generator of Responses."""
    
            i = 0
            hist = [] # keep track of history
    
            while resp.is_redirect:
                prepared_request = req.copy()
    
                if i > 0:
                    # Update history and keep track of redirects.
                    hist.append(resp)
                    new_hist = list(hist)
                    resp.history = new_hist
           ...
    
                url = resp.headers['location']
    
                # Handle redirection without scheme (see: RFC 1808 Section 4)
                if url.startswith('//'):
                    parsed_rurl = urlparse(resp.url)
                    url = '%s:%s' % (parsed_rurl.scheme, url)
    
           ...
                extract_cookies_to_jar(prepared_request._cookies, req, resp.raw)
                prepared_request._cookies.update(self.cookies)
                prepared_request.prepare_cookies(prepared_request._cookies)
    
                # Rebuild auth and proxy information.
                proxies = self.rebuild_proxies(prepared_request, proxies)
                self.rebuild_auth(prepared_request, resp)
    
                # Override the original request.
                req = prepared_request
    
                resp = self.send(
                    req,
                    stream=stream,
                    timeout=timeout,
                    verify=verify,
                    cert=cert,
                    proxies=proxies,
                    allow_redirects=False,
                    **adapter_kwargs
                )
    
                extract_cookies_to_jar(self.cookies, prepared_request, resp.raw)
    
                i += 1
                yield resp

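    As the code shows, requests takes the redirect target from url = resp.headers['location'], appends resp to the history, rebuilds the headers, cookies, proxies and auth, and then calls self.send. After yield resp it enters the next iteration and checks again whether the response is a redirect. The maximum number of redirects is 30.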
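
    The control flow of the generator can be exercised without any network access. In the sketch below, FakeResponse, TooManyRedirects and the routes table are hypothetical stand-ins, and the limit check is written out explicitly because the excerpt elides that part; cookies, auth and proxies are left out entirely.

```python
class FakeResponse:
    """Minimal stand-in for a response object (hypothetical)."""

    def __init__(self, url, location=None):
        self.url = url
        self.headers = {'location': location} if location else {}
        self.history = []

    @property
    def is_redirect(self):
        return 'location' in self.headers


class TooManyRedirects(Exception):
    pass


def resolve_redirects(resp, send, max_redirects=30):
    """Yield each response reached by following `location` headers."""
    hist = []
    while resp.is_redirect:
        if len(hist) >= max_redirects:
            raise TooManyRedirects(max_redirects)
        hist.append(resp)
        # `send` plays the role of self.send(..., allow_redirects=False).
        resp = send(resp.headers['location'])
        resp.history = list(hist)
        yield resp


# A canned routing table instead of real HTTP calls.
routes = {'/a': FakeResponse('/a', '/b'),
          '/b': FakeResponse('/b', '/c'),
          '/c': FakeResponse('/c')}

chain = list(resolve_redirects(routes['/a'], routes.__getitem__))
final = chain[-1]
```

    Because the function is a generator, nothing is fetched until the caller iterates over it, and the caller decides whether to consume the whole chain or stop early.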

  • Original article: https://www.cnblogs.com/alexkn/p/6266573.html