  • 2. Django source analysis: what happens when the WSGI server starts

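    1 Preface

    How does Django receive data at the network socket layer and hand the request over to its urls layer?

    Some people answer without thinking: through WSGI (Web Server Gateway Interface), of course!

    Django fully follows the WSGI protocol. Under the hood its development server is built on socket, socketserver, and a select-style network model, so it can use operating-system features such as non-blocking I/O and threads.

    Django's built-in server is a WSGI server written in pure Python. Its concurrency is very low (the author gives a default of 6), so production Django projects are usually deployed behind uWSGI, which is implemented in C.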
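
    To make "following the WSGI protocol" concrete, here is a minimal sketch of the contract: the application is a callable taking (environ, start_response), and the server calls it and collects the body. The names simple_app and call_app are made up for illustration and are not part of Django.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """A minimal WSGI application: a callable taking (environ, start_response)."""
    body = ("Hello from %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Play the server's role: build an environ, call the app, collect the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

    Everything traced in the rest of this article is, in the end, the machinery that builds environ from a socket and makes this one call.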

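    2 Following the trail

    The first step in reading source code is to find the program's entry point. That is easy here: how do we start Django?

    python3.6 manage.py runserver 8088 
    # Note:
    # Python interpreter version: 3.6
    # Django version: 2.2.7
    

    So manage.py it is. Let's see what it does (when reading source you don't need to cover everything, just find the key code).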

    #!/usr/bin/env python
    """Django's command-line utility for administrative tasks."""
    import os
    import sys
    
    
    def main():
        os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Foo.settings')
        try:
            from django.core.management import execute_from_command_line 
        except ImportError as exc:
            raise ImportError(
                "Couldn't import Django. Are you sure it's installed and "
                "available on your PYTHONPATH environment variable? Did you "
                "forget to activate a virtual environment?"
            ) from exc
        execute_from_command_line(sys.argv)
    
    
    if __name__ == '__main__':
        main()
    

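    It calls execute_from_command_line from django.core.management, and that function is the key code to follow. Before that, note that django.core.management is a package, so importing it runs its __init__.py, which contains execute_from_command_line and also does a lot of other work.

    # Importing `from django.apps import apps` runs django/apps/__init__.py. This is where the whole Django program begins, and a lot happens around it (for example: initializing logging, loading INSTALLED_APPS, checking that each app is sound, checking the cache module). Only when everything passes does startup continue; otherwise the program reports an error and exits.
    

    Let's look at execute_from_command_line in django/core/management/__init__.py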

    def execute_from_command_line(argv=None):
        """Run a ManagementUtility."""
        utility = ManagementUtility(argv) # instantiate ManagementUtility, defined just above this function in the same file
        utility.execute()                 # call ManagementUtility.execute
    

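    The key code is utility.execute(). You can find it in the ManagementUtility class; the method is very long, so it isn't listed here. Its long run of if conditions mostly checks whether the arguments are valid.

    The key line is self.fetch_command(subcommand).run_from_argv(self.argv), a chained call. Let's take it one piece at a time.

    First, fetch_command(subcommand), that is, fetch_command('runserver'). Scroll up within the ManagementUtility class to find it.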

    def fetch_command(self, subcommand):
            """self.fetch_command
            uses Django's built-in command management tooling to match a specific module. For example, self.fetch_command(subcommand) here amounts to self.fetch_command('runserver'), which ends up finding django.contrib.staticfiles.management.commands.runserver.Command.
            Django organizes its command code with a strategy pattern plus a common interface: the django.core.management.commands directory holds the individual commands, and each one exposes a Command class. Matching 'runserver' invokes the Command of the 'runserver' tool; matching 'migrate' invokes the Command of the 'migrate' tool.
            """
            commands = get_commands() # key code 1
            try:
                app_name = commands[subcommand] # key code 2
            except KeyError:
                if os.environ.get('DJANGO_SETTINGS_MODULE'):
                    settings.INSTALLED_APPS
                else:
                    sys.stderr.write("No Django settings specified.\n")
                possible_matches = get_close_matches(subcommand, commands)
                sys.stderr.write('Unknown command: %r' % subcommand)
                if possible_matches:
                    sys.stderr.write('. Did you mean %s?' % possible_matches[0])
                sys.stderr.write("\nType '%s help' for usage.\n" % self.prog_name)
                sys.exit(1)
            if isinstance(app_name, BaseCommand):
                klass = app_name
            else:
                klass = load_command_class(app_name, subcommand) # key code 3
            return klass # key code 4
    

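    Key code 1: get_commands() returns a dict that maps each command name to the app that provides it.

    Key code 2: app_name = commands[subcommand] is a plain lookup, giving app_name='django.core'. (When django.contrib.staticfiles is in INSTALLED_APPS, its runserver overrides the core one and the value is 'django.contrib.staticfiles', which is why the docstring above names that module.)

    Key code 3: klass = load_command_class(app_name, subcommand), that is, klass = load_command_class('django.core', 'runserver'). It is easy to read for yourself: load_command_class returns module.Command(), so klass is an instance of the django.core.management.commands.runserver.Command class.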
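
    The lookup described above can be sketched without Django as a registry that maps a subcommand name to a command class. This is a simplified illustration: the COMMANDS dict and the two toy command classes are hypothetical, and Django's real get_commands() maps names to app names and imports the module lazily.

```python
import difflib


class BaseCommand:
    """Stand-in for a command base class (hypothetical, not Django's)."""

    def run_from_argv(self, argv):
        return self.handle(*argv[2:])

    def handle(self, *args):
        raise NotImplementedError


class RunserverCommand(BaseCommand):
    def handle(self, *args):
        return "runserver %s" % " ".join(args)


class MigrateCommand(BaseCommand):
    def handle(self, *args):
        return "migrate %s" % " ".join(args)


# Hypothetical registry: subcommand name -> command class.
COMMANDS = {"runserver": RunserverCommand, "migrate": MigrateCommand}


def fetch_command(subcommand):
    """Look up the command object, suggesting a close match on failure."""
    try:
        klass = COMMANDS[subcommand]
    except KeyError:
        matches = difflib.get_close_matches(subcommand, COMMANDS)
        hint = " Did you mean %s?" % matches[0] if matches else ""
        raise SystemExit("Unknown command: %r.%s" % (subcommand, hint))
    return klass()


result = fetch_command("runserver").run_from_argv(["manage.py", "runserver", "8088"])
```

    The chained call at the end has the same shape as self.fetch_command(subcommand).run_from_argv(self.argv): the lookup picks the strategy, and the shared interface runs it.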

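    So self.fetch_command(subcommand) gives us a Command object. Many readers get stuck here: the next link in the chain is run_from_argv(self.argv), but it is nowhere to be found in the Command class. The answer is to look in Command's parent class, BaseCommand.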

    class BaseCommand:
        def run_from_argv(self, argv):
            """
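            run_from_argv initializes middleware and starts the service, i.e. it brings up WSGI (though not directly; much later code does the actual work). At first glance it looks like a method of the runserver.Command object, but things are slightly more involved. The related code isn't listed here, so it is explained in the next code block.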
            """
            self._called_from_command_line = True
            parser = self.create_parser(argv[0], argv[1])
    
            options = parser.parse_args(argv[2:])
            cmd_options = vars(options)
            args = cmd_options.pop('args', ())
            handle_default_options(options)
            try:
                self.execute(*args, **cmd_options) # key code
            except Exception as e:
                if options.traceback or not isinstance(e, CommandError):
                    raise
                if isinstance(e, SystemCheckError):
                    self.stderr.write(str(e), lambda x: x)
                else:
                    self.stderr.write('%s: %s' % (e.__class__.__name__, e))
                sys.exit(1)
            finally:
                try:
                    connections.close_all()
                except ImproperlyConfigured:
    
                    pass
    

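    The key code is self.execute(*args, **cmd_options). Note that this execute must be looked up in the Command class first, because self is an instance of Command. So let's go back to the Command class and find execute.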

    class Command(BaseCommand):
        
        def execute(self, *args, **options):
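            if options['no_color']:
                # Rely on the environment: it is currently the only way to reach WSGIRequestHandler.
                os.environ["DJANGO_COLORS"] = "nocolor"
            super().execute(*args, **options) # key code 1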
    
        def get_handler(self, *args, **options):
            """Return the default WSGI handler for the runner."""
            return get_internal_wsgi_application()
    
        def handle(self, *args, **options):
            if not settings.DEBUG and not settings.ALLOWED_HOSTS:
                raise CommandError('You must set settings.ALLOWED_HOSTS if DEBUG is False.')
    
            self.use_ipv6 = options['use_ipv6']
            if self.use_ipv6 and not socket.has_ipv6:
                raise CommandError('Your Python does not support IPv6.')
            self._raw_ipv6 = False
            if not options['addrport']:
                self.addr = ''
                self.port = self.default_port
            else:
                m = re.match(naiveip_re, options['addrport'])
                if m is None:
                    raise CommandError('"%s" is not a valid port number '
                                       'or address:port pair.' % options['addrport'])
                self.addr, _ipv4, _ipv6, _fqdn, self.port = m.groups()
                if not self.port.isdigit():
                    raise CommandError("%r is not a valid port number." % self.port)
                if self.addr:
                    if _ipv6:
                        self.addr = self.addr[1:-1]
                        self.use_ipv6 = True
                        self._raw_ipv6 = True
                    elif self.use_ipv6 and not _fqdn:
                        raise CommandError('"%s" is not a valid IPv6 address.' % self.addr)
            if not self.addr:
                self.addr = self.default_addr_ipv6 if self.use_ipv6 else self.default_addr
                self._raw_ipv6 = self.use_ipv6
            self.run(**options) # key code 2
    
        def run(self, **options):
            use_reloader = options['use_reloader']
    
            if use_reloader:
                autoreload.run_with_reloader(self.inner_run, **options)
            else:
                self.inner_run(None, **options) # key code 3
    
        def inner_run(self, *args, **options):
            autoreload.raise_last_exception()
    
            threading = options['use_threading']
            shutdown_message = options.get('shutdown_message', '')
            quit_command = 'CTRL-BREAK' if sys.platform == 'win32' else 'CONTROL-C'
    
            self.stdout.write("Performing system checks...\n\n")
            self.check(display_num_errors=True)
            self.check_migrations()
            now = datetime.now().strftime('%B %d, %Y - %X')
            self.stdout.write(now)
            self.stdout.write((
                "Django version %(version)s, using settings %(settings)r\n"
                "Starting development server at %(protocol)s://%(addr)s:%(port)s/\n"
                "Quit the server with %(quit_command)s.\n"
            ) % {
                "version": self.get_version(),
                "settings": settings.SETTINGS_MODULE,
                "protocol": self.protocol,
                "addr": '[%s]' % self.addr if self._raw_ipv6 else self.addr,
                "port": self.port,
                "quit_command": quit_command,
            })
    
            try:
                handler = self.get_handler(*args, **options)
                run(self.addr, int(self.port), handler,
                    ipv6=self.use_ipv6, threading=threading, server_cls=self.server_cls) # key code 4
            except socket.error as e:
                ERRORS = {
                    errno.EACCES: "You don't have permission to access that port.",
                    errno.EADDRINUSE: "That port is already in use.",
                    errno.EADDRNOTAVAIL: "That IP address can't be assigned to.",
                }
                try:
                    error_text = ERRORS[e.errno]
                except KeyError:
                    error_text = e
                self.stderr.write("Error: %s" % error_text)
                os._exit(1)
            except KeyboardInterrupt:
                if shutdown_message:
                    self.stdout.write(shutdown_message)
                sys.exit(0)
    

    Key code 1: super().execute(*args, **options) goes to the parent class BaseCommand and runs its execute method. The key line there is output = self.handle(*args, **options). Since self is an instance of Command, we continue in Command's handle method.
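
    The back-and-forth between Command and BaseCommand is ordinary method resolution: the base class drives the flow and the subclass supplies the steps. A toy sketch of that call order follows; these two classes are simplified stand-ins, not Django's code.

```python
class BaseCommand:
    """Toy base class mirroring the call order described above (not Django's code)."""

    def run_from_argv(self, argv):
        # self is the subclass instance, so self.execute resolves to the subclass first.
        return self.execute(*argv[2:])

    def execute(self, *args):
        # The base execute() delegates to handle(), which subclasses must supply.
        return self.handle(*args)

    def handle(self, *args):
        raise NotImplementedError("subclasses must implement handle()")


class Command(BaseCommand):
    def execute(self, *args):
        self.trace = ["Command.execute"]
        return super().execute(*args)  # continue in BaseCommand.execute

    def handle(self, *args):
        self.trace.append("Command.handle")
        return "serving on port %s" % args[0]


cmd = Command()
output = cmd.run_from_argv(["manage.py", "runserver", "8088"])
```

    After the call, cmd.trace records the same path the article follows: Command.execute first, then Command.handle via the base class.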

    Key code 2 -> key code 3 -> key code 4 lead to a run function, which is imported at the top of this file

    from django.core.servers.basehttp import (
        WSGIServer, get_internal_wsgi_application, run,
    )
    

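    Everything up to this point is really just initialization in service of 'runserver'. Much of the code isn't listed, but it does real work: parsing arguments, validating the given port, IPv4 and IPv6 checks, detecting whether the port is in use, thread settings, and so on.

    Next, turn to the run function in django.core.servers.basehttp. Its code is as follows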

    def run(addr, port, wsgi_handler, ipv6=False, threading=False, server_cls=WSGIServer): # wsgi_handler is a StaticFilesHandler here
        """Tell the objects involved to start the WSGI service."""
        server_address = (addr, port) 
        if threading:
            httpd_cls = type('WSGIServer', (socketserver.ThreadingMixIn, server_cls), {}) # key code 1
        else:
            httpd_cls = server_cls
        httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6) # key code 2
        if threading:
            httpd.daemon_threads = True
        httpd.set_app(wsgi_handler) # key code 3
        httpd.serve_forever() # key code 4
    

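    Key code 1: the built-in metaclass type is called to create a class named WSGIServer whose bases are (socketserver.ThreadingMixIn, WSGIServer). Django's own WSGIServer class only inherits from wsgiref.simple_server.WSGIServer (and, implicitly, object); rebuilding it with type forcibly gives it an extra parent, socketserver.ThreadingMixIn. The point is that each request the server accepts is then handled in its own thread. That covers the first base class; the second base, WSGIServer, has the following full inheritance chain

    django.core.servers.basehttp.WSGIServer
    wsgiref.simple_server.WSGIServer, socketserver.ThreadingMixIn
    http.server.HTTPServer
    socketserver.TCPServer
    socketserver.BaseServer
    object
    

    Once httpd_cls is defined, thanks to this long inheritance chain it no longer belongs purely to Django: it is a conventional WSGI server class.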
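
    The type(...) call can be tried in isolation. The sketch below uses the standard library's wsgiref.simple_server.WSGIServer in place of Django's subclass, and the name ThreadedWSGIServer is chosen here for illustration.

```python
import socketserver
from wsgiref.simple_server import WSGIServer

# Build a new class at runtime; equivalent to
# class ThreadedWSGIServer(socketserver.ThreadingMixIn, WSGIServer): pass
ThreadedWSGIServer = type(
    "ThreadedWSGIServer",
    (socketserver.ThreadingMixIn, WSGIServer),
    {},
)

mro_names = [cls.__name__ for cls in ThreadedWSGIServer.__mro__]
# ThreadingMixIn sits before the server classes, so its process_request wins.
uses_threaded_process_request = (
    ThreadedWSGIServer.process_request is socketserver.ThreadingMixIn.process_request
)
```

    Because the mixin comes first in the method resolution order, its process_request (which starts a thread per request) replaces the synchronous one from BaseServer.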

    Key code 2: httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6) is a very important line, because it is the single junction through which the WSGI server and Django communicate. When the server object receives a socket request, it passes that request to Django's WSGIRequestHandler (the next section shows how WSGIRequestHandler works).

    Key code 3: httpd.set_app(wsgi_handler) hands django.contrib.staticfiles.handlers.StaticFilesHandler to WSGIServer as its application. When WSGIServer receives a network request it dispatches the data to django.core.servers.basehttp.WSGIRequestHandler, which ultimately passes it to the application (that is, django.contrib.staticfiles.handlers.StaticFilesHandler).

    Key code 4: httpd.serve_forever() starts the non-blocking network listening service.

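    Summary: everything above is preparation for starting the Django service. Roughly:

    #1. Parse the arguments given to python manage.py, e.g. runserver.
    #2. Use the arguments to find the matching management command.
    #3. Load all the apps.
    #4. Run the checks: port, IPv4, IPv6, whether the port is in use, threading, ORM (whether the tables have been created).
    #5. Register the WSGIRequestHandler class with the WSGIServer from the Python standard library.
    #6. Finally, start the WSGIServer from the Python standard library.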
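
    The last two steps can be reproduced with the standard library alone. The following is a sketch under stated assumptions, not Django's run(): it uses wsgiref's server and handler classes, a trivial app, and a hypothetical helper fetch_once that serves one request on an OS-assigned port and then shuts down.

```python
import http.client
import socketserver
import threading
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer


def app(environ, start_response):
    body = b"ok"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the example's output clean


def fetch_once():
    httpd_cls = type("ThreadedServer", (socketserver.ThreadingMixIn, WSGIServer), {})
    httpd = httpd_cls(("127.0.0.1", 0), QuietHandler)  # port 0: let the OS pick
    httpd.daemon_threads = True
    httpd.set_app(app)  # register the WSGI application with the server
    worker = threading.Thread(target=httpd.serve_forever, daemon=True)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", httpd.server_address[1], timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        status, body = resp.status, resp.read()
        conn.close()
        return status, body
    finally:
        httpd.shutdown()      # stops the serve_forever loop
        httpd.server_close()  # releases the listening socket
```

    The four calls in the middle (build the class, instantiate it with a handler class, set_app, serve_forever) mirror key code 1 through 4 in the run function.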
    

    3 What happens after httpd.serve_forever()

    Picking up from httpd.serve_forever in the previous section: it resolves to socketserver.BaseServer.serve_forever.

    #1. socketserver.BaseServer.serve_forever waits for data using the selectors module, polling the file descriptor every 0.5 seconds. When data arrives, the ready variable becomes non-empty, and the remaining work is handed to self._handle_request_noblock (i.e. socketserver.BaseServer._handle_request_noblock).
    
    #2. socketserver.BaseServer._handle_request_noblock does very little: it accepts the connection with get_request() (self.verify_request checks nothing by default and simply returns True) and hands the work straight to socketserver.BaseServer.process_request.
    
    #3. socketserver.BaseServer.process_request does little as well: it passes the work on to socketserver.BaseServer.finish_request and then calls shutdown_request to close the request.
    
    #4. socketserver.BaseServer.finish_request likewise just hands the work to socketserver.BaseServer.RequestHandlerClass.
    
    #5. socketserver.BaseServer.RequestHandlerClass is the django.core.servers.basehttp.WSGIRequestHandler argument passed in by httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6) in the previous section. In other words, executing self.RequestHandlerClass(request, client_address, self) is the same as executing django.core.servers.basehttp.WSGIRequestHandler(request, client_address, self). 
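
    The hand-off chain in the list above can be observed directly by overriding each hook and recording when it runs. This is an illustrative sketch: TracingServer and EchoHandler are made-up names, and handle_request() is used instead of serve_forever() so that exactly one request is served.

```python
import socket
import socketserver
import threading

calls = []


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        calls.append("RequestHandlerClass.handle")
        self.wfile.write(self.rfile.readline())


class TracingServer(socketserver.TCPServer):
    """Records the order in which BaseServer's hooks run for one request."""

    def verify_request(self, request, client_address):
        calls.append("verify_request")
        return super().verify_request(request, client_address)

    def process_request(self, request, client_address):
        calls.append("process_request")
        super().process_request(request, client_address)

    def finish_request(self, request, client_address):
        calls.append("finish_request")
        super().finish_request(request, client_address)

    def shutdown_request(self, request):
        calls.append("shutdown_request")
        super().shutdown_request(request)


def trace_one_request():
    with TracingServer(("127.0.0.1", 0), EchoHandler) as server:
        # handle_request() serves exactly one request, via the same
        # _handle_request_noblock path that serve_forever() uses.
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection(server.server_address, timeout=5) as client:
            client.sendall(b"ping\n")
            reply = client.makefile("rb").readline()
        worker.join()
    return reply
```

    After trace_one_request() returns, the calls list holds the hooks in the order they fired, ending with shutdown_request, the "close the request" step mentioned in #3.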
    

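    serve_forever simply runs a while loop that listens indefinitely for socket requests at the network layer. When a request arrives, it is handed down layer by layer to django.core.servers.basehttp.WSGIRequestHandler, whose code is as follows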

    class WSGIRequestHandler(simple_server.WSGIRequestHandler):
        protocol_version = 'HTTP/1.1'
    
        def address_string(self):
            return self.client_address[0]
    
        def log_message(self, format, *args):
            extra = {
                'request': self.request,
                'server_time': self.log_date_time_string(),
            }
            if args[1][0] == '4':
                if args[0].startswith('\x16\x03'):
                    extra['status_code'] = 500
                    logger.error(
                        "You're accessing the development server over HTTPS, but "
                        "it only supports HTTP.\n", extra=extra,
                    )
                    return
    
            if args[1].isdigit() and len(args[1]) == 3:
                status_code = int(args[1])
                extra['status_code'] = status_code
    
                if status_code >= 500:
                    level = logger.error
                elif status_code >= 400:
                    level = logger.warning
                else:
                    level = logger.info
            else:
                level = logger.info
    
            level(format, *args, extra=extra)
    
        def get_environ(self):
            for k in self.headers:
                if '_' in k:
                    del self.headers[k]
    
            return super().get_environ()
    
        def handle(self): # key code
            self.close_connection = True
            self.handle_one_request()
            while not self.close_connection:
                self.handle_one_request() 
            try:
                self.connection.shutdown(socket.SHUT_WR)
            except (socket.error, AttributeError):
                pass
    
        def handle_one_request(self): 
            """Copy of WSGIRequestHandler.handle() but with different ServerHandler"""
            self.raw_requestline = self.rfile.readline(65537)
            if len(self.raw_requestline) > 65536:
                self.requestline = ''
                self.request_version = ''
                self.command = ''
                self.send_error(414)
                return
    
            if not self.parse_request(): 
                return
    
            handler = ServerHandler(
                self.rfile, self.wfile, self.get_stderr(), self.get_environ()
            )
            handler.request_handler = self     
            handler.run(self.server.get_app())
    

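    Key code: the handle method. To see how it gets called we have to start from the instantiation of WSGIRequestHandler. As noted above, executing self.RequestHandlerClass(request, client_address, self) is the same as executing django.core.servers.basehttp.WSGIRequestHandler(request, client_address, self), and WSGIRequestHandler's parent classes are as follows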

    #1、django.core.servers.basehttp.WSGIRequestHandler
    #2、wsgiref.simple_server.WSGIRequestHandler
    #3、http.server.BaseHTTPRequestHandler
    #4、socketserver.StreamRequestHandler
    #5、socketserver.BaseRequestHandler
    #6、object
    

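    When WSGIRequestHandler is instantiated, you'll find that it defines no __init__ (or __call__) method of its own, so you have to look in the parent classes. __init__ is finally found in socketserver.BaseRequestHandler, and it calls self.handle. Note that self.handle does not directly call BaseRequestHandler's own handle: by attribute lookup order it is resolved on django.core.servers.basehttp.WSGIRequestHandler first, where handle is defined. In effect handle is invoked as a callback. The code is as follows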

        def handle(self):
            self.close_connection = True
            self.handle_one_request()
            while not self.close_connection:
                self.handle_one_request() # key code
            try:
                self.connection.shutdown(socket.SHUT_WR)
            except (socket.error, AttributeError):
                pass
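    
    That handle is reached simply by instantiating the handler can be checked with a small subclass of socketserver.BaseRequestHandler. In this sketch, TracingHandler and the placeholder constructor arguments are made up; a real server passes a socket, the peer address, and itself.

```python
import socketserver

events = []


class TracingHandler(socketserver.BaseRequestHandler):
    # No __init__ here: instantiation falls through to BaseRequestHandler.__init__,
    # which calls self.setup(), self.handle() and self.finish() in that order.
    def setup(self):
        events.append("setup")

    def handle(self):
        events.append("handle for %s" % (self.client_address,))

    def finish(self):
        events.append("finish")


# The arguments below are placeholders; a real server passes a socket,
# the peer address and itself.
TracingHandler(request=None, client_address="client", server=None)
```

    Creating the object is enough to run all three hooks, which is why finish_request only needs to call self.RequestHandlerClass(request, client_address, self).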
    

    Key code: self.handle_one_request() is found right in the current class. The code is as follows

        def handle_one_request(self):
            """Copy of WSGIRequestHandler.handle() but with different ServerHandler"""
            self.raw_requestline = self.rfile.readline(65537)
            if len(self.raw_requestline) > 65536:
                self.requestline = ''
                self.request_version = ''
                self.command = ''
                self.send_error(414)
                return
    
            if not self.parse_request():  
                return
    
            # key code 1
            handler = ServerHandler(
                self.rfile, self.wfile, self.get_stderr(), self.get_environ()
            )
            # key code 2
            handler.request_handler = self      
            handler.run(self.server.get_app())
    

    Key code 1: instantiates a ServerHandler object.

    Key code 2: hands django.contrib.staticfiles.handlers.StaticFilesHandler to ServerHandler to run. ServerHandler itself has no run method, so look through its parent classes:

    #1、django.core.servers.basehttp.ServerHandler
    #2、wsgiref.simple_server.ServerHandler
    #3、wsgiref.handlers.SimpleHandler
    #4、wsgiref.handlers.BaseHandler # the run method is found here
    #5、object
    

    The run method is finally found in wsgiref.handlers.BaseHandler. The code is as follows

    class BaseHandler:
        ...  # other members omitted
    
        def run(self, application):
            try:
                self.setup_environ()
                self.result = application(self.environ, self.start_response) # key code
                self.finish_response()
            except:
                try:
                    self.handle_error()
                except:
                    self.close()
                    raise  
    

    Key code: application(self.environ, self.start_response) is therefore equivalent to django.contrib.staticfiles.handlers.StaticFilesHandler.__call__(self.environ, self.start_response).
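
    BaseHandler.run can be exercised without a socket by using wsgiref.handlers.SimpleHandler with in-memory streams. This is a minimal sketch: the app and the helper run_app_in_memory are made up, and the environ is filled with wsgiref's testing defaults rather than values parsed from a real request.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def run_app_in_memory(application):
    """Drive BaseHandler.run() with in-memory streams instead of a socket."""
    environ = {}
    setup_testing_defaults(environ)  # supplies REQUEST_METHOD, SERVER_PROTOCOL, ...
    stdin, stdout, stderr = io.BytesIO(), io.BytesIO(), io.StringIO()
    handler = SimpleHandler(stdin, stdout, stderr, environ)
    handler.run(application)  # setup_environ -> application(...) -> finish_response
    return stdout.getvalue()


raw_response = run_app_in_memory(app)
```

    raw_response holds the raw bytes that would be written to the client: a status line and headers, a blank line, then the body returned by the application.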

    class StaticFilesHandler(WSGIHandler): # Django's class dedicated to serving static files
        """
        WSGI middleware that intercepts calls to the static files directory, as
        defined by the STATIC_URL setting, and serves those files.
        """
        # May be used to differentiate between handler types (e.g. in a
        # request_finished signal)
        handles_files = True
    
        def __init__(self, application):
            self.application = application
            self.base_url = urlparse(self.get_base_url())
            super().__init__()
    
        def load_middleware(self):
            # Middleware are already loaded for self.application; no need to reload
            # them for self.
            pass
    
        def get_base_url(self):
            utils.check_settings()
            return settings.STATIC_URL
    
        def _should_handle(self, path):
            """
            Check if the path should be handled. Ignore the path if:
            * the host is provided as part of the base_url
            * the request's path isn't under the media path (or equal)
            """
            return path.startswith(self.base_url[2]) and not self.base_url[1]
    
        def file_path(self, url):
            """
            Return the relative path to the media file on disk for the given URL.
            """
            relative_url = url[len(self.base_url[2]):]
            return url2pathname(relative_url)
    
        def serve(self, request):
            """Serve the request path."""
            return serve(request, self.file_path(request.path), insecure=True)
    
        def get_response(self, request):
            from django.http import Http404
    
            if self._should_handle(request.path):
                try:
                    return self.serve(request)
                except Http404 as e:
                    return response_for_exception(request, e)
            return super().get_response(request)
    
        def __call__(self, environ, start_response):
            if not self._should_handle(get_path_info(environ)):
                return self.application(environ, start_response) # key code 1
            return super().__call__(environ, start_response)
    

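    Key code 1: self.application(environ, start_response). First, what is self.application? The class's __init__ method runs self.application = application, but what is its actual value?

    Here's a tip: having read this far, there is no need to go back. The trick to reading source code is to take notes as you go and, when you hit a variable you don't understand, print its value. Try not to keep jumping back, which only makes you more confused. For example, as an administrative user (editing Django's installed source requires permission), add a print line to django.contrib.staticfiles.handlers.StaticFilesHandler: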

        def __init__(self, application):
            self.application = application
            print('django source debug ---> self.application is', self.application) # debug print
            self.base_url = urlparse(self.get_base_url())
            super().__init__()
    

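    Then restart Django, and you'll see that self.application is <django.core.handlers.wsgi.WSGIHandler object at 0x106cf0278>. Looking at how django.core.handlers.wsgi.WSGIHandler is instantiated, you'll find that it loads the middleware via self.load_middleware(). With that, we have traced how a URL request travels from the WSGI server into Django. The rest is Django's internal flow, which we'll dissect another time.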
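
    The relationship between StaticFilesHandler and the WSGIHandler it wraps is the usual WSGI wrapper pattern: handle the request yourself when the path matches, otherwise delegate to the wrapped application. A toy sketch of that pattern follows; PrefixInterceptor, inner_app and call are hypothetical names, and no real files are served.

```python
class PrefixInterceptor:
    """Toy WSGI wrapper: serve paths under a prefix itself, delegate the rest."""

    def __init__(self, application, prefix="/static/"):
        self.application = application  # the wrapped WSGI callable
        self.prefix = prefix

    def _should_handle(self, path):
        return path.startswith(self.prefix)

    def __call__(self, environ, start_response):
        if not self._should_handle(environ.get("PATH_INFO", "")):
            # Not ours: hand the request to the wrapped application unchanged.
            return self.application(environ, start_response)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"static file placeholder"]


def inner_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by inner app"]


def call(app, path):
    """Call a WSGI app with a bare-bones environ and return (status, body)."""
    statuses = []
    body = b"".join(app({"PATH_INFO": path}, lambda status, headers: statuses.append(status)))
    return statuses[0], body


wrapped = PrefixInterceptor(inner_app)
```

    Calling call(wrapped, "/static/site.css") stays in the wrapper, while call(wrapped, "/admin/") falls through to inner_app, just as non-static paths fall through to WSGIHandler.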

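    One more note: you can inspect the environ variable in the same way. It is very important, because all the request information from the HTTP protocol is placed in environ. In the flow we analyzed, the WSGIServer class mainly handles socket requests and connects to WSGIRequestHandler; the WSGIRequestHandler class mainly pre-processes environ and connects to ServerHandler; and the ServerHandler class mainly runs the application and returns the response to WSGIServer.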

  • Original article: https://www.cnblogs.com/amgulen/p/13951113.html