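I have recently been studying Python's network programming modules, and the most basic part of network programming is socket programming. There are plenty of articles online about Python socket programming, but I have a habit: after using someone else's code I want to understand the principles behind it, because only once I understand the author's thinking can I use it with ease. The best way to understand that thinking is to read the source. I originally meant to start with the basic socket module, but on reflection that would be little different from studying Unix socket programming directly, so I started with the SocketServer module instead. The SocketServer source is 733 lines long, although with the comments removed it is probably only a few hundred lines of code.
Let us start by going through the module's official introduction (help(SocketServer)):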
For socket-based servers:
- address family:
- AF_INET{,6}: IP (Internet Protocol) sockets (default)
- AF_UNIX: Unix domain sockets
- others, e.g. AF_DECNET are conceivable (see <socket.h>
- socket type:
- SOCK_STREAM (reliable stream, e.g. TCP)
- SOCK_DGRAM (datagrams, e.g. UDP)
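The constants above name the address families and socket types a socket can use.
address family: the kind of address the socket uses
AF_INET: the Internet Protocol, which is what we usually call TCP/IP
AF_UNIX: Unix domain sockets
socket type: the kind of socket
SOCK_STREAM: a connection-oriented socket. A connection must be established before any communication takes place, much like phoning a friend. This style of communication is also called a "virtual circuit" or a "stream socket". The protocol that implements it is the Transmission Control Protocol (TCP). HTTP is built on top of TCP.
SOCK_DGRAM: a connectionless socket, meaning that communication can happen without first establishing a connection. It works like the postal service: nobody notifies you before a delivery, the parcel is simply left at your door (at least, that is how the US postal service does it) for you to pick up. The protocol that implements it is the User Datagram Protocol (UDP), which some chat tools use as their transport.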
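To make the two constants concrete, here is a minimal sketch that creates one socket of each type with the standard socket module. The helper name describe is made up for this illustration, and the code assumes Python 3.

```python
import socket


def describe(family, kind):
    """Create a socket, report its family and type names, then close it."""
    with socket.socket(family, kind) as sock:
        return sock.family.name, sock.type.name


if __name__ == "__main__":
    # Connection-oriented stream socket (TCP over IPv4).
    print(describe(socket.AF_INET, socket.SOCK_STREAM))
    # Connectionless datagram socket (UDP over IPv4).
    print(describe(socket.AF_INET, socket.SOCK_DGRAM))
```

Neither socket is bound or connected here; the point is only that the address family and the socket type are chosen independently when the socket is created.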
For request-based servers (including socket-based):
- client address verification before further looking at the request
(This is actually a hook for any processing that needs to look
at the request before anything else, e.g. logging)
I do not fully understand this passage (can anyone explain it?), but it does not affect what follows.
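One likely reading of this passage is that it describes the verify_request() method of the server classes, which is called before a request handler is created and drops the request when it returns False. The sketch below assumes Python 3 (where the module is named socketserver rather than SocketServer). LoggingServer, EchoHandler, the seen list and the loopback-only rule are all hypothetical names and choices made for this illustration.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Echo back whatever the client sent.
        self.request.sendall(self.request.recv(1024))


class LoggingServer(socketserver.TCPServer):
    def __init__(self, *args, **kwargs):
        self.seen = []  # hypothetical log of client addresses
        super().__init__(*args, **kwargs)

    def verify_request(self, request, client_address):
        # Runs before the handler sees the request, so it can log or reject.
        self.seen.append(client_address)
        return client_address[0] == "127.0.0.1"  # assumed policy: loopback only


def demo():
    # Port 0 asks the operating system for any free port.
    with LoggingServer(("127.0.0.1", 0), EchoHandler) as server:
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection(server.server_address, timeout=5) as client:
            client.sendall(b"ping")
            reply = client.recv(1024)
        worker.join()
        return reply, len(server.seen)


if __name__ == "__main__":
    print(demo())
```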
- how to handle multiple requests:
- synchronous (one request is handled at a time)
- forking (each request is handled by a new process)
- threading (each request is handled by a new thread)
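How are multiple requests handled?
Synchronous (only one request is handled at a time)
Forking (each request is handled by a new process)
Threading (each request is handled by a new thread)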
The classes in this module favor the server type that is simplest to
write: a synchronous TCP/IP server. This is bad class design, but
save some typing. (There's also the issue that a deep class hierarchy
slows down method lookups.)
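The classes in this module favor the server type that is simplest to write: a synchronous TCP/IP server. That is bad class design, but it saves some typing (said tongue in cheek, perhaps). There is a further issue: a deep class hierarchy slows down method lookups.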
There are five classes in an inheritance diagram, four of which represent
synchronous servers of four types:
        +------------+
        | BaseServer |
        +------------+
              |
              v
        +-----------+        +------------------+
        | TCPServer |------->| UnixStreamServer |
        +-----------+        +------------------+
              |
              v
        +-----------+        +--------------------+
        | UDPServer |------->| UnixDatagramServer |
        +-----------+        +--------------------+
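The above is the class design of the module; a more detailed class diagram can be produced with a tool.
From the class diagram above we can see that the design branches in two directions: connection-oriented (Stream) and connectionless (Datagram).
The most basic class is of course BaseServer. With the overall design in mind, let us continue with the rest of the introduction.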
Note that UnixDatagramServer derives from UDPServer, not from
UnixStreamServer -- the only difference between an IP and a Unix
stream server is the address family, which is simply repeated in both
unix server classes.
UnixDatagramServer derives from UDPServer, not from UnixStreamServer. The only difference between an IP stream server and a Unix stream server is the address family.
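The inheritance relationships and the address family difference can be checked directly on the classes. This is a small sketch assuming Python 3 (module name socketserver); server_traits is a helper invented for this illustration, and the Unix classes are only inspected when the platform provides them.

```python
import socketserver


def server_traits(cls):
    """Return (direct base class, address family, socket type) for a server class."""
    return cls.__bases__[0].__name__, cls.address_family.name, cls.socket_type.name


if __name__ == "__main__":
    names = ["TCPServer", "UDPServer", "UnixStreamServer", "UnixDatagramServer"]
    for name in names:
        cls = getattr(socketserver, name, None)  # Unix classes are missing on some platforms
        if cls is not None:
            print(name, server_traits(cls))
```

The address_family and socket_type values are plain class attributes, which is why a subclass only needs to repeat one of them to change the kind of server.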
Forking and threading versions of each type of server can be created
using the ForkingMixIn and ThreadingMixIn mix-in classes. For
instance, a threading UDP server class is created as follows:
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
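Forking and threading versions of each server type can be created with the two mix-in classes ForkingMixIn and ThreadingMixIn.
For example, a threading UDP server class is created like this: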
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
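As you can see, ThreadingUDPServer uses multiple inheritance, deriving from both ThreadingMixIn and UDPServer. In fact, the purpose of each of the classes above can be worked out from its name.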
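As a runnable sketch of that one-line class, the example here pairs it with a small handler and sends it a single datagram. It assumes Python 3 (module name socketserver). UpperHandler and demo are names invented for this illustration, and the server binds to an arbitrary free port on the loopback interface.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For datagram servers, self.request is a (data, socket) pair.
        data, sock = self.request
        sock.sendto(data.upper(), self.client_address)


class ThreadingUDPServer(socketserver.ThreadingMixIn, socketserver.UDPServer):
    pass


def demo():
    with ThreadingUDPServer(("127.0.0.1", 0), UpperHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
            client.settimeout(5)
            client.sendto(b"hello", server.server_address)
            reply, _ = client.recvfrom(1024)
        server.shutdown()  # stop the serve_forever() loop
        return reply


if __name__ == "__main__":
    print(demo())
```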
The Mix-in class must come first, since it overrides a method defined
in UDPServer! Setting the various member variables also changes
the behavior of the underlying server mechanism.
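The mix-in class must come first, because it overrides a method defined in UDPServer. Setting the various member variables also changes the behavior of the underlying server mechanism.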
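Why the order matters can be seen from the method resolution order. The sketch here assumes Python 3 (module name socketserver); Good, Bad and process_request_owner are names invented for this illustration. It reports which class supplies process_request() for each ordering of the base classes.

```python
import socketserver


class Good(socketserver.ThreadingMixIn, socketserver.UDPServer):
    # A member variable read by ThreadingMixIn: handler threads become daemon threads.
    daemon_threads = True


class Bad(socketserver.UDPServer, socketserver.ThreadingMixIn):
    pass


def process_request_owner(cls):
    """Return the name of the first class in the MRO that defines process_request."""
    for klass in cls.__mro__:
        if "process_request" in vars(klass):
            return klass.__name__
    return None


if __name__ == "__main__":
    print(process_request_owner(Good))
    print(process_request_owner(Bad))
```

With the mix-in listed second, the lookup finds the synchronous process_request() inherited through UDPServer first, so the threading version is never used.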
To implement a service, you must derive a class from
BaseRequestHandler and redefine its handle() method. You can then run
various versions of the service by combining one of the server classes
with your request handler class.
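To implement a service, you must derive a class from BaseRequestHandler and redefine its handle() method.
You can then run different versions of the service by combining your handler class with one of the server classes.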
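A minimal sketch of this pattern, assuming Python 3 (module name socketserver): a single handler class is combined first with the synchronous TCPServer and then with ThreadingTCPServer. ReverseHandler and ask are names invented for this illustration.

```python
import socket
import socketserver
import threading


class ReverseHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # self.request is the connected socket for stream servers.
        data = self.request.recv(1024)
        self.request.sendall(data[::-1])


def ask(server_class, payload):
    """Start server_class with ReverseHandler, send one request, return the reply."""
    with server_class(("127.0.0.1", 0), ReverseHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                client.sendall(payload)
                return client.recv(1024)
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(ask(socketserver.TCPServer, b"abc"))
    print(ask(socketserver.ThreadingTCPServer, b"abc"))
```

The handler does not change between the two calls; only the server class, and therefore the way requests are dispatched, is different.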
The request handler class must be different for datagram or stream
services. This can be hidden by using the request handler
subclasses StreamRequestHandler or DatagramRequestHandler.
Datagram and stream services need different request handler classes.
The difference can be hidden by using the handler subclasses StreamRequestHandler and DatagramRequestHandler.
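Both subclasses expose the request as the file-like objects self.rfile and self.wfile, so the same handle() body can serve either transport. The following is a sketch assuming Python 3 (module name socketserver); LineUpperMixin, StreamUpper, DatagramUpper and the two roundtrip helpers are names invented for this illustration.

```python
import socket
import socketserver
import threading


class LineUpperMixin:
    def handle(self):
        # rfile and wfile are set up by the Stream/Datagram handler base classes.
        line = self.rfile.readline()
        self.wfile.write(line.upper())


class StreamUpper(LineUpperMixin, socketserver.StreamRequestHandler):
    pass


class DatagramUpper(LineUpperMixin, socketserver.DatagramRequestHandler):
    pass


def tcp_roundtrip(payload):
    with socketserver.TCPServer(("127.0.0.1", 0), StreamUpper) as server:
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection(server.server_address, timeout=5) as client:
            client.sendall(payload)
            reply = client.recv(1024)
        worker.join()
        return reply


def udp_roundtrip(payload):
    with socketserver.UDPServer(("127.0.0.1", 0), DatagramUpper) as server:
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
            client.settimeout(5)
            client.sendto(payload, server.server_address)
            reply, _ = client.recvfrom(1024)
        worker.join()
        return reply


if __name__ == "__main__":
    print(tcp_roundtrip(b"hi\n"), udp_roundtrip(b"hi\n"))
```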
Of course, you still have to use your head!
Of course, you still have to think for yourself.
For instance, it makes no sense to use a forking server if the service
contains state in memory that can be modified by requests (since the
modifications in the child process would never reach the initial state
kept in the parent process and passed to each child). In this case,
you can use a threading server, but you will probably have to use
locks to avoid two requests that come in nearly simultaneous to apply
conflicting changes to the server state.
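Keep a clear head. For example, if the service keeps state in memory that any request may modify, a forking server makes no sense: changes made in a child process never reach the state held by the parent, so the parent cannot pass them on to other children. In that case a threading server should be used, although you will probably need a lock to stop two threads from modifying the state at the same time.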
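A sketch of the threading-plus-lock approach, assuming Python 3 (module name socketserver). The shared state is a simple request counter stored on the server object; CounterServer, CounterHandler, hits and hits_lock are names invented for this illustration.

```python
import socket
import socketserver
import threading


class CounterServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True

    def __init__(self, *args, **kwargs):
        self.hits = 0                      # shared in-memory state
        self.hits_lock = threading.Lock()  # guards every update to hits
        super().__init__(*args, **kwargs)


class CounterHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Handlers run in separate threads, so the update is done under the lock.
        with self.server.hits_lock:
            self.server.hits += 1
            current = self.server.hits
        self.request.sendall(str(current).encode())


def demo(n_clients=5):
    replies = []

    def client(address):
        with socket.create_connection(address, timeout=5) as conn:
            replies.append(int(conn.recv(64).decode()))

    with CounterServer(("127.0.0.1", 0), CounterHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        clients = [threading.Thread(target=client, args=(server.server_address,))
                   for _ in range(n_clients)]
        for thread in clients:
            thread.start()
        for thread in clients:
            thread.join()
        server.shutdown()
        return sorted(replies), server.hits


if __name__ == "__main__":
    print(demo())
```

Because every handler thread shares the same server object, each client receives a distinct counter value. With a forking server each child would increment its own copy instead.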
On the other hand, if you are building e.g. an HTTP server, where all
data is stored externally (e.g. in the file system), a synchronous
class will essentially render the service "deaf" while one request is
being handled -- which may be for a very long time if a client is slow
to read all the data it has requested. Here a threading or forking
server is appropriate.
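On the other hand, if you are building something like an HTTP server, where all data is stored externally (for example in the file system), a synchronous class leaves the server effectively "deaf" while a request is being handled, which can last a long time if a client is slow to read the data it asked for. In that case a forking or threading server is appropriate.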
In some cases, it may be appropriate to process part of a request
synchronously, but to finish processing in a forked child depending on
the request data. This can be implemented by using a synchronous
server and doing an explicit fork in the request handler class
handle() method.
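In some cases it may make sense to process part of a request synchronously and then, depending on the request data, finish the work in a forked child process. This can be done by using a synchronous server and calling fork() explicitly in the handle() method of the request handler class.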
Another approach to handling multiple simultaneous requests in an
environment that supports neither threads nor fork (or where these are
too expensive or inappropriate for the service) is to maintain an
explicit table of partially finished requests and to use select() to
decide which request to work on next (or whether to handle a new
incoming request). This is particularly important for stream services
where each client can potentially be connected for a long time (if
threads or subprocesses cannot be used).
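Sometimes we need to handle several requests at once but cannot use fork or threads (for example because they are too expensive). In that case we can keep an explicit table of partially finished requests and use select() to decide which one to work on next.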
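A rough sketch of the select() approach using only the standard select and socket modules (Python 3 assumed). The pending dictionary plays the role of the table of partially finished requests. The line-based protocol, serve_lines and demo are assumptions made for this illustration, not something SocketServer itself provides.

```python
import select
import socket
import threading


def serve_lines(listener, max_clients):
    """Answer one upper-cased line per client, multiplexing with select()."""
    pending = {}  # socket -> bytes received so far (partially finished requests)
    served = 0
    while served < max_clients:
        readable, _, _ = select.select([listener, *pending], [], [], 5)
        if not readable:
            break  # nothing happened within the timeout; give up
        for sock in readable:
            if sock is listener:
                conn, _ = listener.accept()  # new incoming request
                pending[conn] = b""
                continue
            chunk = sock.recv(1024)
            pending[sock] += chunk
            if not chunk or b"\n" in pending[sock]:
                data = pending.pop(sock)  # request complete (or client closed)
                if data:
                    sock.sendall(data.upper())
                sock.close()
                served += 1
    for sock in pending:
        sock.close()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    worker = threading.Thread(target=serve_lines, args=(listener, 2))
    worker.start()
    address = listener.getsockname()
    with socket.create_connection(address, timeout=5) as slow, \
            socket.create_connection(address, timeout=5) as fast:
        slow.sendall(b"he")      # first client leaves its request unfinished
        fast.sendall(b"yo\n")    # second client is served in the meantime
        fast_reply = fast.recv(1024)
        slow.sendall(b"llo\n")   # first client completes its request
        slow_reply = slow.recv(1024)
    worker.join()
    listener.close()
    return fast_reply, slow_reply


if __name__ == "__main__":
    print(demo())
```

The thread in demo() exists only so that the client and server can run in one script; the server loop itself is single-threaded and never blocks on a slow client.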
Further information
Future work:
- Standard classes for Sun RPC (which uses either UDP or TCP)
- Standard mix-in classes to implement various authentication
and encryption schemes
- Standard framework for select-based multiplexing