http://www.cppblog.com/qinqing1984/archive/2015/05/03/210521.html
Introduction
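In the Unix world, everything is a file. Through the virtual file system (VFS), a program can use the standard Unix system calls to read and write different file systems, even file systems on different media. Network sockets are no exception: in addition to the dedicated Berkeley Sockets API, they support standard file I/O system calls such as read/readv, write/writev, and close. Why do sockets support file I/O system calls? On Linux this is achieved by the socket pseudo file system, sockfs. sockfs implements the four main VFS objects (the super block, the inode, the dentry, and the file), so when a file I/O system call is issued, the VFS forwards the request to sockfs, which in turn calls the specific protocol implementation. From top to bottom, the layers are the system call interface, the VFS, sockfs, and the protocol implementation.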
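The claim that a socket accepts ordinary file I/O calls can be checked from user space. The following is a minimal sketch, not kernel code: the function name socket_file_io_roundtrip is made up for illustration, and it assumes a POSIX system with AF_UNIX sockets. It pushes a message through a socket pair using only write, read, and close.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: sends msg into one end of a socket pair with
 * write(2) and reads it back from the other end with read(2).
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t socket_file_io_roundtrip(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;
    size_t len;

    assert(msg != NULL && out != NULL);
    len = strlen(msg);

    /* The only socket-specific call: create two connected sockets. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* From here on, only generic file I/O system calls are used. */
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[1], out, outlen);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Called with the message "hello" and a 16-byte buffer, the function returns 5 and the buffer holds the same five bytes, even though no send or recv call was made.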
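This article is based on the Linux 2.6.34 implementation. This first part covers initialization and socket creation; the second part covers socket operations and destruction.
Initialization
The network subsystem is initialized during kernel boot, which in turn calls sock_init, defined in net/socket.c. Its main steps are: create the inode cache, then register and mount sockfs.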
Creating the inode cache
init_inodecache creates a SLAB cache for socket_alloc objects; the cache is referenced through the variable sock_inode_cachep. socket_alloc is defined in include/net/sock.h.
Registering sockfs
Registration is done by calling the VFS function register_filesystem; sock_fs_type is defined in net/socket.c.
sock_fs_type holds the name of the sockfs file system and the functions that create and destroy its super block. Of these, sockfs_get_sb is implemented in net/socket.c.
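sockfs_get_sb is executed inside kern_mount. By calling get_sb_pseudo it creates a super block (with one dentry and one associated inode): the operations object is sockfs_ops, the root directory name is socket:, and the root inode number is 1.
sockfs_ops is defined in net/socket.c.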
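sock_alloc_inode allocates an inode object and is called during socket creation; sock_destroy_inode frees an inode object and is called during socket destruction; simple_statfs returns status information about the sockfs file system.
Mounting sockfs
kern_mount mounts the pseudo file system (which, of course, has no mount point) and returns a vfsmount object, which is stored in the static variable sock_mnt.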
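After these steps, the VFS objects created are the vfsmount sock_mnt, the sockfs super block with sockfs_ops as its operations object, the root dentry named socket:, and the root inode numbered 1.
Two fields are left out of the original figure of these objects. The root dentry needs no path name translation, so its d_op is empty; and for a pseudo file system, operating on the inode itself is meaningless, so the inode's i_op is empty as well.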
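Socket creation
The system calls socket, accept, and socketpair are the ways user space creates a socket. Their core call chains share three steps: first construct the inode, then construct the corresponding file, and finally install the file into the current process (that is, map it to an unused file descriptor). Each step is described in detail below.
Constructing the inode
This step is implemented by sock_alloc, defined in net/socket.c.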
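sock_alloc first calls new_inode to create the inode object and then sets its type to S_IFSOCK, which marks the file type of the inode as a socket. new_inode is a file system interface function that creates an inode object, defined in fs/inode.c. It calls the sock_alloc_inode method of sockfs_ops, the operations object of the sockfs super block. Because sock_alloc_inode actually creates a socket_alloc compound object, the SOCKET_I helper is needed to obtain the associated socket object from the inode for the return value.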
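The compound-object idea behind socket_alloc can be modelled in user space. The sketch below is an illustration under stated assumptions rather than the kernel source: the struct fields are reduced to a bare minimum, and toy_sock_alloc, toy_sock_free, and TOY_S_IFSOCK are invented names. What it keeps from the description above is the layout (a socket and an inode in one allocation) and the conversion between the two by pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel types; the fields are illustrative. */
struct inode  { unsigned long i_ino; unsigned int i_mode; };
struct socket { int type; };

/* One allocation that holds both the socket and its inode. */
struct socket_alloc {
    struct socket socket;
    struct inode  vfs_inode;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define TOY_S_IFSOCK 0140000u   /* numeric value of S_IFSOCK on Linux */

/* inode -> socket living in the same socket_alloc */
static inline struct socket *SOCKET_I(struct inode *inode)
{
    return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
}

/* socket -> inode living in the same socket_alloc */
static inline struct inode *SOCK_INODE(struct socket *sock)
{
    return &container_of(sock, struct socket_alloc, socket)->vfs_inode;
}

/* Toy flow: allocate the compound object, mark the inode as a socket,
 * and hand back the embedded socket. Returns NULL if allocation fails. */
struct socket *toy_sock_alloc(void)
{
    struct socket_alloc *ei = calloc(1, sizeof(*ei));
    struct inode *inode;

    if (ei == NULL)
        return NULL;
    inode = &ei->vfs_inode;               /* what the VFS layer would see */
    inode->i_mode = TOY_S_IFSOCK | 0777;  /* file type: socket */
    return SOCKET_I(inode);
}

/* Frees the whole compound object through the socket pointer. */
void toy_sock_free(struct socket *sock)
{
    assert(sock != NULL);
    free(container_of(sock, struct socket_alloc, socket));
}
```

Because both objects share one allocation, neither needs to store a pointer to the other; converting between them costs only a constant offset.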
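Constructing the file
With the inode object in place, the next step is to construct the corresponding file object. This is done by sock_alloc_file, defined in net/socket.c.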
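sock is the socket object returned by the previous step. The function does the following:
1) Obtains an unused file descriptor fd, which is simply an index into the fd array, to be used as the return value.
2) Initializes the path: its dentry has the root dentry of the super block as its parent, an empty name, sockfs_dentry_operations as its operations object, and the inode associated with sock, that is SOCK_INODE(sock), as its inode; its mount is sock_mnt.
sockfs_dentry_operations is defined in net/socket.c.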
sockfs_dname is called by d_path to compute the dentry name of a socket object.
3) Sets the file operations object of the inode to socket_file_ops, defined in net/socket.c.
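4) Calls alloc_file with path and socket_file_ops as input parameters, so the returned file is associated with the inode of sock and uses socket_file_ops as its operations object; the file is then stored in the output parameter f.
5) Establishes the one-to-one mapping between the file and the socket.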
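Steps 3) to 5) are what let a generic read on a file end up in socket code. The following user-space model is only a sketch: the structures are cut down to the fields involved, and toy_attach, toy_vfs_read, toy_sock_read, toy_socket_file_ops, and the pending field are all invented for illustration. It shows the two links of the mapping (file to socket through private_data, socket to file through a back pointer) and the dispatch through the file operations table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file;

/* Heavily reduced operations table: only read. */
struct file_operations {
    long (*read)(struct file *f, char *buf, size_t len);
};

struct socket {
    struct file *file;     /* back pointer: socket -> file */
    const char  *pending;  /* toy receive queue holding one string */
};

struct file {
    const struct file_operations *f_op;
    void *private_data;    /* forward pointer: file -> socket */
};

/* Socket-side read handler: recovers the socket from the file. */
static long toy_sock_read(struct file *f, char *buf, size_t len)
{
    struct socket *sock = f->private_data;
    size_t n = strlen(sock->pending);

    if (n > len)
        n = len;
    memcpy(buf, sock->pending, n);
    return (long)n;
}

static const struct file_operations toy_socket_file_ops = {
    .read = toy_sock_read,
};

/* Give the file its operations table and link file and socket both ways. */
void toy_attach(struct socket *sock, struct file *file)
{
    assert(sock != NULL && file != NULL);
    file->f_op = &toy_socket_file_ops;
    file->private_data = sock;
    sock->file = file;
}

/* Generic layer: knows nothing about sockets, only calls through f_op. */
long toy_vfs_read(struct file *f, char *buf, size_t len)
{
    return f->f_op->read(f, buf, len);
}
```

The generic read path never inspects what kind of file it holds; the operations table chosen when the file was built decides where the call lands.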
Installing the file
This step is implemented by fd_install, defined in fs/open.c.
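fd and file are the unused file descriptor and the file object returned by the previous step. fd_install uses RCU to store file in the fd array of the current process.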
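The split between reserving a descriptor and installing a file into it can be illustrated with a toy descriptor table. This is a deliberately simplified sketch: it is single-threaded, so it has none of the locking or RCU publication the kernel needs, and the names toy_get_unused_fd, toy_fd_install, TOY_MAX_FDS, and the reserved array are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

struct file { int unused; };

/* Toy per-process descriptor table: fd is just an index into fd[]. */
struct fdtable {
    struct file  *fd[TOY_MAX_FDS];
    unsigned char reserved[TOY_MAX_FDS];  /* 1 = number already handed out */
};

/* Reserve the lowest free descriptor number; returns -1 if the table is full.
 * The slot itself stays NULL until a file is installed. */
int toy_get_unused_fd(struct fdtable *t)
{
    int i;

    for (i = 0; i < TOY_MAX_FDS; i++) {
        if (!t->reserved[i]) {
            t->reserved[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Publish the file in a previously reserved, still empty slot. */
void toy_fd_install(struct fdtable *t, int fd, struct file *file)
{
    assert(fd >= 0 && fd < TOY_MAX_FDS);
    assert(t->reserved[fd] && t->fd[fd] == NULL);
    t->fd[fd] = file;
}
```

Keeping the two operations separate means the file can be fully constructed before it becomes reachable through the table, so a lookup never sees a half-built file.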
After these steps, the VFS objects created are related as shown in the figure below:
![](http://www.cppblog.com/images/cppblog_com/qinqing1984/sockfs_vfsobj_sock_create.png)
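fd is an index into the file* array, not a member field; the vfsmount is the same object as the vfsmount created during initialization, namely sock_mnt; and because operating on the inode itself is meaningless for a pseudo file system, the inode's i_op is empty (not drawn).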