File I/O
Introduction
We’ll start our discussion of the UNIX System by describing the functions available for file I/O: open a file, read a file, write a file, and so on. Most file I/O on a UNIX system can be performed using only five functions: open, read, write, lseek, and close. We then examine the effect of various buffer sizes on the read and write functions.
The functions described in this chapter are often referred to as unbuffered I/O, in contrast to the standard I/O routines, which we describe in Chapter 5. The term unbuffered means that each read or write invokes a system call in the kernel. These unbuffered I/O functions are not part of ISO C, but are part of POSIX.1 and the Single UNIX Specification.
Whenever we describe the sharing of resources among multiple processes, the concept of an atomic operation becomes important. We examine this concept with regard to file I/O and the arguments to the open function. This leads to a discussion of how files are shared among multiple processes and which kernel data structures are involved. After describing these features, we describe the dup, fcntl, sync, fsync, and ioctl functions.
File Descriptors
To the kernel, all open files are referred to by file descriptors. A file descriptor is a non-negative integer. When we open an existing file or create a new file, the kernel returns a file descriptor to the process. When we want to read or write a file, we identify the file with the file descriptor that was returned by open or creat as an argument to either read or write.
By convention, UNIX System shells associate file descriptor 0 with the standard input of a process, file descriptor 1 with the standard output, and file descriptor 2 with the standard error. This convention is used by the shells and many applications; it is not a feature of the UNIX kernel. Nevertheless, many applications would break if these associations weren’t followed.
Although their values are standardized by POSIX.1, the magic numbers 0, 1, and 2 should be replaced in POSIX-compliant applications with the symbolic constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO to improve readability. These constants are defined in the <unistd.h> header.
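As a small illustration of the symbolic constants, the sketch below writes a string to a chosen descriptor. The helper name put_msg is hypothetical and used only for this example; write is the call introduced in this chapter.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write a NUL-terminated string to descriptor fd.
 * Returns the byte count reported by write(), or -1 on error.
 * put_msg is a hypothetical helper, not part of any library. */
ssize_t
put_msg(int fd, const char *msg)
{
    assert(msg != NULL);
    return write(fd, msg, strlen(msg));
}
```

A caller would write put_msg(STDOUT_FILENO, "result\n") for normal output and put_msg(STDERR_FILENO, "warning\n") for diagnostics, rather than passing the bare numbers 1 and 2.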
File descriptors range from 0 through OPEN_MAX−1. (Recall Figure 2.11.) Early historical implementations of the UNIX System had an upper limit of 19, allowing a maximum of 20 open files per process, but many systems subsequently increased this limit to 63.
With FreeBSD 8.0, Linux 3.2.0, Mac OS X 10.6.8, and Solaris 10, the limit is essentially infinite, bounded by the amount of memory on the system, the size of an integer, and any hard and soft limits configured by the system administrator.
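Because the limit varies by system and configuration, a program can query it at run time instead of assuming a value. The following is a minimal sketch using sysconf and getrlimit; the function names open_max and nofile_limits are assumptions made for this example.

```c
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/* Current per-process limit on open descriptors as seen by sysconf(),
 * or -1 if the limit is indeterminate or the query fails. */
long
open_max(void)
{
    return sysconf(_SC_OPEN_MAX);
}

/* Fetch the soft and hard RLIMIT_NOFILE values.
 * Returns 0 if OK, -1 on error. */
int
nofile_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *soft = rl.rlim_cur;    /* may be RLIM_INFINITY */
    *hard = rl.rlim_max;
    return 0;
}
```

The soft limit is the one enforced on the process; the hard limit is the ceiling to which an unprivileged process may raise its soft limit.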
open and openat Functions
A file is opened or created by calling either the open function or the openat function.
#include <fcntl.h>
int open(const char *path, int oflag, ... /* mode_t mode */ );
int openat(int fd, const char *path, int oflag, ... /* mode_t mode */ );
Both return: file descriptor if OK, −1 on error
O_RDONLY | Open for reading only. |
O_WRONLY | Open for writing only. |
O_RDWR | Open for reading and writing. |
Most implementations define O_RDONLY as 0, O_WRONLY as 1, and O_RDWR as 2, for compatibility with older programs. |
O_EXEC | Open for execute only. |
O_SEARCH | Open for search only (applies to directories). |
The purpose of the O_SEARCH constant is to evaluate search permissions at the time a directory is opened. Further operations using the directory’s file descriptor will not reevaluate permission to search the directory. None of the versions of the operating systems covered in this book support O_SEARCH yet. |
One and only one of the previous five constants must be specified. The following constants are optional:
O_APPEND | Append to the end of file on each write. We describe this option in detail in Section 3.11. |
O_CLOEXEC | Set the FD_CLOEXEC file descriptor flag. We discuss file descriptor flags in Section 3.14. |
O_CREAT | Create the file if it doesn’t exist. This option requires a third argument to the open function (a fourth argument to the openat function) — the mode, which specifies the access permission bits of the new file. (When we describe a file’s access permission bits in Section 4.5, we’ll see how to specify the mode and how it can be modified by the umask value of a process.) |
O_DIRECTORY | Generate an error if path doesn’t refer to a directory. |
O_EXCL | Generate an error if O_CREAT is also specified and the file already exists. This test for whether the file already exists and the creation of the file if it doesn’t exist is an atomic operation. We describe atomic operations in more detail in Section 3.11. |
O_NOCTTY | If path refers to a terminal device, do not allocate the device as the controlling terminal for this process. We talk about controlling terminals in Section 9.6. |
O_NOFOLLOW | Generate an error if path refers to a symbolic link. We discuss symbolic links in Section 4.17. |
O_NONBLOCK | If path refers to a FIFO, a block special file, or a character special file, this option sets the nonblocking mode for both the opening of the file and subsequent I/O. We describe this mode in Section 14.2. |
In earlier releases of System V, the O_NDELAY (no delay) flag was introduced. This option is similar to the O_NONBLOCK (nonblocking) option, but an ambiguity was introduced in the return value from a read operation. The no-delay option causes a read operation to return 0 if there is no data to be read from a pipe, FIFO, or device, but this conflicts with a return value of 0, indicating an end of file. SVR4-based systems still support the no-delay option, with the old semantics, but new applications should use the nonblocking option instead. |
O_SYNC | Have each write wait for physical I/O to complete, including I/O necessary to update file attributes modified as a result of the write. We use this option in Section 3.14. |
O_TRUNC | If the file exists and if it is successfully opened for either write-only or read–write, truncate its length to 0. |
O_TTY_INIT | When opening a terminal device that is not already open, set the nonstandard termios parameters to values that result in behavior that conforms to the Single UNIX Specification. We discuss the termios structure when we discuss terminal I/O in Chapter 18. |
The following two flags are also optional. They are part of the synchronized input and output option of the Single UNIX Specification (and thus POSIX.1).
O_DSYNC | Have each write wait for physical I/O to complete, but don’t wait for file attributes to be updated if they don’t affect the ability to read the data just written. |
The O_DSYNC and O_SYNC flags are similar, but subtly different. The O_DSYNC flag affects a file’s attributes only when they need to be updated to reflect a change in the file’s data (for example, update the file’s size to reflect more data). With the O_SYNC flag, data and attributes are always updated synchronously. When overwriting an existing part of a file opened with the O_DSYNC flag, the file times wouldn’t be updated synchronously. In contrast, if we had opened the file with the O_SYNC flag, every write to the file would update the file’s times before the write returns, regardless of whether we were writing over existing bytes or appending to the file. |
O_RSYNC | Have each read operation on the file descriptor wait until any pending writes for the same portion of the file are complete. |
Solaris 10 supports all three synchronization flags. Historically, FreeBSD (and thus Mac OS X) have used the O_FSYNC flag, which has the same behavior as O_SYNC. Because the two flags are equivalent, they define the flags to have the same value. FreeBSD 8.0 doesn’t support the O_DSYNC or O_RSYNC flags. Mac OS X doesn’t support the O_RSYNC flag, but defines the O_DSYNC flag, treating it the same as the O_SYNC flag. Linux 3.2.0 supports the O_DSYNC flag, but treats the O_RSYNC flag the same as O_SYNC. |
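To illustrate how the O_CREAT and O_EXCL flags listed above combine, here is a minimal sketch of an "create only if absent" helper. The name create_exclusive and the permission value 0644 are assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create path only if it does not already exist.
 * Returns a write-only descriptor, or -1 with errno set
 * (EEXIST when the file is already present).
 * The existence test and the creation happen as one atomic step. */
int
create_exclusive(const char *path)
{
    assert(path != NULL);
    /* 0644 = rw-r--r--, still subject to the process umask */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

Calling the helper twice with the same name succeeds the first time and fails with EEXIST the second time, which is why this flag combination is commonly used for lock files.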
The file descriptor returned by open and openat is guaranteed to be the lowest numbered unused descriptor. This fact is used by some applications to open a new file on standard input, standard output, or standard error. For example, an application might close standard output (normally file descriptor 1) and then open another file, knowing that it will be opened on file descriptor 1. We’ll see a better way to guarantee that a file is open on a given descriptor in Section 3.12, when we explore the dup2 function.
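The lowest-numbered-descriptor guarantee can be observed without touching the standard descriptors. The sketch below opens /dev/null twice, closes the first descriptor, and checks that the next open reuses it. The function name lowest_fd_is_reused is hypothetical, and the check assumes no other thread opens files at the same time.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a new open() reuses the lowest freed descriptor,
 * 0 if it does not, and -1 if any open() fails. */
int
lowest_fd_is_reused(void)
{
    int a, b, c;

    if ((a = open("/dev/null", O_RDONLY)) == -1)
        return -1;
    if ((b = open("/dev/null", O_RDONLY)) == -1) {
        close(a);
        return -1;
    }
    assert(b > a);  /* a took the lowest free slot, so b is higher */

    close(a);       /* a is now the lowest unused descriptor again */
    c = open("/dev/null", O_RDONLY);

    if (c != -1)
        close(c);
    close(b);
    return (c == -1) ? -1 : (c == a);
}
```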
The fd parameter distinguishes the openat function from the open function. There are three possibilities:
1) The path parameter specifies an absolute pathname. In this case, the fd parameter is ignored and the openat function behaves like the open function.
2) The path parameter specifies a relative pathname and the fd parameter is a file descriptor that specifies the starting location in the file system where the relative pathname is to be evaluated. The fd parameter is obtained by opening the directory where the relative pathname is to be evaluated.
3) The path parameter specifies a relative pathname and the fd parameter has the special value AT_FDCWD. In this case, the pathname is evaluated starting in the current working directory and the openat function behaves like the open function.
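The second and third cases above can be sketched as follows. The helper open_in_dir is a hypothetical name; it opens a directory, resolves a relative name against that directory with openat, and then releases the directory descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open name for reading, resolved relative to directory dir.
 * Returns a descriptor, or -1 with errno set. */
int
open_in_dir(const char *dir, const char *name)
{
    int dirfd, fd, saved;

    assert(dir != NULL && name != NULL);
    if ((dirfd = open(dir, O_RDONLY | O_DIRECTORY)) == -1)
        return -1;

    fd = openat(dirfd, name, O_RDONLY);   /* case 2: relative to dirfd */

    saved = errno;                        /* close() may change errno */
    close(dirfd);
    errno = saved;
    return fd;
}
```

Passing AT_FDCWD in place of a directory descriptor, as in openat(AT_FDCWD, name, O_RDONLY), corresponds to the third case and behaves like open(name, O_RDONLY).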
The openat function is one of a class of functions added to the latest version of POSIX.1 to address two problems. First, it gives threads a way to use relative pathnames to open files in directories other than the current working directory. As we’ll see in Chapter 11, all threads in the same process share the same current working directory, so this makes it difficult for multiple threads in the same process to work in different directories at the same time. Second, it provides a way to avoid time-of-check-to-time-of-use (TOCTTOU) errors.
The basic idea behind TOCTTOU errors is that a program is vulnerable if it makes two file-based function calls where the second call depends on the results of the first call. Because the two calls are not atomic, the file can change between the two calls, thereby invalidating the results of the first call, leading to a program error. TOCTTOU errors in the file system namespace generally deal with attempts to subvert file system permissions by tricking a privileged program into either reducing permissions on a privileged file or modifying a privileged file to open up a security hole. Wei and Pu [2005] discuss TOCTTOU weaknesses in the UNIX file system interface.
Filename and Pathname Truncation
What happens if NAME_MAX is 14 and we try to create a new file in the current directory with a filename containing 15 characters? Traditionally, early releases of System V, such as SVR2, allowed this to happen, silently truncating the filename beyond the 14th character. BSD-derived systems, in contrast, returned an error status, with errno set to ENAMETOOLONG. Silently truncating the filename presents a problem that affects more than simply the creation of new files. If NAME_MAX is 14 and a file exists whose name is exactly 14 characters, any function that accepts a pathname argument, such as open or stat, has no way to determine what the original name of the file was, as the original name might have been truncated.
With POSIX.1, the constant _POSIX_NO_TRUNC determines whether long filenames and long components of pathnames are truncated or an error is returned. As we saw in Chapter 2, this value can vary based on the type of the file system, and we can use fpathconf or pathconf to query a directory to see which behavior is supported.
Whether an error is returned is largely historical. For example, SVR4-based systems do not generate an error for the traditional System V file system, S5. For the BSD-style file system (known as UFS), however, SVR4-based systems do generate an error. Figure 2.20 illustrates another example: Solaris will return an error for UFS, but not for PCFS, the DOS-compatible file system, as DOS silently truncates filenames that don’t fit in an 8.3 format. BSD-derived systems and Linux always return an error.
If _POSIX_NO_TRUNC is in effect, errno is set to ENAMETOOLONG, and an error status is returned if any filename component of the pathname exceeds NAME_MAX.
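The run-time query and the resulting error can be sketched as below. Both function names are assumptions made for this example. The probe opens a name one byte longer than the reported limit without O_CREAT, so nothing is created; on a system that truncates names silently, the probe would typically fail with a different error instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Maximum filename length for directory dir, or -1 if unknown. */
long
name_max_for(const char *dir)
{
    assert(dir != NULL);
    return pathconf(dir, _PC_NAME_MAX);
}

/* Probe the current directory with a name of NAME_MAX + 1 bytes.
 * Returns 1 if open() rejects it with ENAMETOOLONG, 0 if not,
 * and -1 if the limit cannot be determined. */
int
long_name_rejected(void)
{
    long max = name_max_for(".");
    char *name;
    int fd, rejected;

    if (max <= 0)
        return -1;
    if ((name = malloc((size_t)max + 2)) == NULL)
        return -1;
    memset(name, 'x', (size_t)max + 1);
    name[max + 1] = '\0';

    fd = open(name, O_RDONLY);          /* no O_CREAT: nothing is created */
    rejected = (fd == -1 && errno == ENAMETOOLONG);
    if (fd != -1)
        close(fd);
    free(name);
    return rejected;
}
```

A similar call, pathconf(dir, _PC_NO_TRUNC), reports whether the no-truncation behavior is in effect for a given directory.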
Most modern file systems support a maximum of 255 characters for filenames. Because filenames are usually shorter than this limit, this constraint tends to not present problems for most applications.
creat Function
A new file can also be created by calling the creat function.
#include <fcntl.h>
int creat(const char *path, mode_t mode);
Returns: file descriptor opened for write-only if OK, −1 on error
Note that this function is equivalent to
open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
Historically, in early versions of the UNIX System, the second argument to open could be only 0, 1, or 2. There was no way to open a file that didn’t already exist. Therefore, a separate system call, creat, was needed to create new files. With the O_CREAT and O_TRUNC options now provided by open, a separate creat function is no longer needed.
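We’ll show how to specify mode in Section 4.5 when we describe a file’s access permissions in detail.
One deficiency of creat is that the file is opened only for writing. To create a file that will be written and then read back, call open directly:
open(path, O_RDWR | O_CREAT | O_TRUNC, mode);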
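The practical difference between creat and an O_RDWR open shows up when the same descriptor must be read after being written. The following sketch, with the hypothetical name write_then_read and an assumed mode of 0644, writes a string, rewinds with lseek, and reads it back.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create or truncate path, write msg, rewind, and read the data back
 * into buf (at most buflen bytes) through the same descriptor.
 * Returns the number of bytes read, or -1 on error. */
ssize_t
write_then_read(const char *path, const char *msg, char *buf, size_t buflen)
{
    int fd;
    size_t len;
    ssize_t n = -1;

    assert(path != NULL && msg != NULL && buf != NULL);
    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644)) == -1)
        return -1;

    len = strlen(msg);
    if (write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) != -1)     /* back to the start */
        n = read(fd, buf, buflen);

    close(fd);
    return n;
}
```

Had the descriptor come from creat, the read would fail, because that descriptor is open for writing only.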
close Function
An open file is closed by calling the close function.
#include <unistd.h>
int close(int fd);
Returns: 0 if OK, −1 on error
Closing a file also releases any record locks that the process may have on the file. We’ll discuss this point further in Section 14.3.
When a process terminates, all of its open files are closed automatically by the kernel. Many programs take advantage of this fact and don’t explicitly close open files. See the program in Figure 1.4, for example.
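A Brief Command-Line Introduction to Redirection and Pipes
Redirection
Suppose you want a list of all files in the images directory whose names end in .png:
$ ls images/*.png 1>file_list
This redirects the standard output (1) of the command to (>) the file file_list. The > operator is the output redirection operator. If the target file does not exist, it is created; if it already exists, its previous contents are overwritten.
Standard output is the default descriptor for this operator, so it need not be given explicitly on the command line. The command above can therefore be shortened to
$ ls images/*.png >file_list
The result is the same. You can then view the file with a text viewer such as less.
Now suppose you want to know how many such files there are:
$ wc -l 0<file_list
The < operator is the input redirection operator, and its default descriptor is standard input (0). So you only need
$ wc -l <file_list
Suppose you also want to strip the "extension" from every file name and save the result in another file. Redirect the standard input of sed from file_list and its output to the result file the_list:
$ sed -e 's/\.png$//' <file_list >the_list
Redirecting standard error is also useful. For example, you may want to know which directories under /shared you cannot access. One way is to list the directory recursively, redirect the error output to a file, and discard the standard output:
$ ls -R /shared >/dev/null 2>errors
This redirects standard output to (>) /dev/null and standard error (2) to (>) the file errors.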
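At the system-call level, an output redirection of this kind amounts to opening the target file and making the chosen descriptor refer to it. The sketch below shows that step using open and dup2 (dup2 is covered in Section 3.12). The helper name redirect_to_file and the mode 0644 are assumptions for illustration; a shell would normally perform this in the child process before running the command.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make descriptor target refer to path, creating or truncating it,
 * roughly what a shell does for "target>path".
 * Returns 0 if OK, -1 on error. */
int
redirect_to_file(int target, const char *path)
{
    int fd;

    assert(path != NULL);
    if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
        return -1;
    if (fd != target) {
        if (dup2(fd, target) == -1) {   /* target now shares fd's file */
            close(fd);
            return -1;
        }
        close(fd);                      /* the extra descriptor is not needed */
    }
    return 0;
}
```

With this helper, the effect of 2>errors corresponds to redirect_to_file(STDERR_FILENO, "errors").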
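Pipes
A pipe is, in a sense, a combination of standard input and standard output redirection. The principle resembles a physical pipe: one process sends data into one end, and another process reads that data from the other end. As Figure 3.0 shows, the standard output of cmd1 and cmd2 goes into the pipe and is therefore not displayed on the screen.
The pipe operator is |.
Figure 3.0 Pipes
Let’s return to the file list example. Suppose you want to find out directly how many matching files there are, without first saving the list in a temporary file. You can use
$ ls images/*.png | wc -l
This connects the standard output of the ls command (the file list) to the standard input of the wc command, so you get the result directly.
Note:
1) A pipe carries only the standard output of the preceding command; standard error does not pass through it.
2) The command on the right-hand side of a pipe must be one that reads from standard input.
You can also use the following command to get the list of files with the extension removed:
$ ls images/*.png | sed -e 's/\.png$//' >the_list
Or, if you want to view the result directly instead of saving it to a file:
$ ls images/*.png | sed -e 's/\.png$//' | less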
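The "one process writes, another reads" idea behind the | operator can be sketched in C with pipe and fork, which are not described in this chapter. The function name pipe_from_child is hypothetical. The child plays the role of the left-hand command and the parent the right-hand one; a real shell would additionally connect the pipe ends to descriptors 1 and 0 and run the commands.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes msg into a pipe; the parent reads it into buf.
 * Returns the number of bytes read, or -1 on error. */
ssize_t
pipe_from_child(const char *msg, char *buf, size_t buflen)
{
    int fds[2];     /* fds[0]: read end, fds[1]: write end */
    pid_t pid;
    ssize_t n;

    assert(msg != NULL && buf != NULL);
    if (pipe(fds) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the writing side */
        close(fds[0]);
        if (write(fds[1], msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);                      /* parent: the reading side */
    n = read(fds[0], buf, buflen);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```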
lseek Function
Every open file has an associated “current file offset,” normally a non-negative integer that measures the number of bytes from the beginning of the file. (We describe some exceptions to the “non-negative” qualifier later in this section.) Read and write operations normally start at the current file offset and cause the offset to be incremented by the number of bytes read or written. By default, this offset is initialized to 0 when a file is opened, unless the O_APPEND option is specified.
An open file’s offset can be set explicitly by calling lseek.
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
Returns: new file offset if OK, −1 on error
- If whence is SEEK_SET, the file’s offset is set to offset bytes from the beginning of the file.
- If whence is SEEK_CUR, the file’s offset is set to its current value plus the offset. The offset can be positive or negative.
- If whence is SEEK_END, the file’s offset is set to the size of the file plus the offset. The offset can be positive or negative.
Because a successful call to lseek returns the new file offset, we can seek zero bytes from the current position to determine the current offset:
off_t currpos;
currpos = lseek(fd, 0, SEEK_CUR);
This technique can also be used to determine if a file is capable of seeking. If the file descriptor refers to a pipe, FIFO, or socket, lseek sets errno to ESPIPE and returns −1.
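Some common special cases:
To move the read/write position to the beginning of the file: lseek(fd, 0, SEEK_SET);
To move the read/write position to the end of the file: lseek(fd, 0, SEEK_END);
To obtain the current file position: lseek(fd, 0, SEEK_CUR);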
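These idioms can be wrapped in small helpers. The sketch below uses the hypothetical names is_seekable and size_by_seek: the first applies the zero-byte SEEK_CUR probe, and the second uses SEEK_END to learn a file’s size. Both compare the result of lseek with -1 rather than testing for a negative value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd supports seeking, 0 if it refers to a pipe, FIFO,
 * or socket (ESPIPE), and -1 on any other error. */
int
is_seekable(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) != -1)
        return 1;
    return (errno == ESPIPE) ? 0 : -1;
}

/* Size of the file named by path, found by seeking to its end.
 * Returns -1 on error. */
off_t
size_by_seek(const char *path)
{
    int fd;
    off_t end;

    assert(path != NULL);
    if ((fd = open(path, O_RDONLY)) == -1)
        return -1;
    end = lseek(fd, 0, SEEK_END);   /* new offset == file size */
    close(fd);
    return end;
}
```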
The three symbolic constants—SEEK_SET, SEEK_CUR, and SEEK_END—were introduced with System V. Prior to this, whence was specified as 0 (absolute), 1 (relative to the current offset), or 2 (relative to the end of file). Much software still exists with these numbers hard coded.
The character l in the name lseek means “long integer.” Before the introduction of the off_t data type, the offset argument and the return value were long integers. lseek was introduced with Version 7 when long integers were added to C. (Similar functionality was provided in Version 6 by the functions seek and tell.)
Example
The program in Figure 3.1 tests its standard input to see whether it is capable of seeking.
/*
 * File:    fileio/seek.c
 * Purpose: test whether the offset of standard input can be set
 * Date:    Tue Aug 23 16:03:00 CST 2016
 * Author:  firewaywei
 */
#include "apue.h"

int
main(void)
{
    /* Seek zero bytes from the current position; -1 means "cannot seek" */
    if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1)
    {
        printf("cannot seek\n");
    }
    else
    {
        printf("seek OK\n");
    }
    exit(0);
}
Figure 3.1 Test whether standard input is capable of seeking
If we invoke this program interactively, we get
$ ./a.out < /etc/passwd
seek OK
$ cat < /etc/passwd | ./a.out
cannot seek
$ ./a.out < /var/spool/cron/FIFO
cannot seek
Normally, a file’s current offset must be a non-negative integer. It is possible, however, that certain devices could allow negative offsets. But for regular files, the offset must be non-negative. Because negative offsets are possible, we should be careful to compare the return value from lseek as being equal to or not equal to −1, rather than testing whether it is less than 0.
The /dev/kmem device on FreeBSD for the Intel x86 processor supports negative offsets. Because the offset (off_t) is a signed data type (Figure 2.21), we lose a factor of 2 in the maximum file size. If off_t is a 32-bit integer, the maximum file size is 2^31−1 bytes.
lseek only records the current file offset within the kernel—it does not cause any I/O to take place. This offset is then used by the next read or write operation.
The file’s offset can be greater than the file’s current size, in which case the next write to the file will extend the file. This is referred to as creating a hole in a file and is allowed. Any bytes in a file that have not been written are read back as 0.
A hole in a file isn’t required to have storage backing it on disk. Depending on the file system implementation, when you write after seeking past the end of a file, new disk blocks might be allocated to store the data, but there is no need to allocate disk blocks for the data between the old end of file and the location where you start writing.
Example
The program shown in Figure 3.2 creates a file with a hole in it.
/*
 * File:    fileio/hole.c
 * Purpose: create a file with a hole in it
 * Date:    Tue Aug 23 16:03:00 CST 2016
 * Author:  firewaywei
 */
#include "apue.h"
#include <fcntl.h>

char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";

int
main(void)
{
    int fd;

    if ((fd = creat("file.hole", FILE_MODE)) < 0)
    {
        err_sys("creat error");
    }
    if (write(fd, buf1, 10) != 10)
    {
        err_sys("buf1 write error");
    }
    /* offset now = 10 */
    if (lseek(fd, 16384, SEEK_SET) == -1)
    {
        err_sys("lseek error");
    }
    /* offset now = 16384 */
    if (write(fd, buf2, 10) != 10)
    {
        err_sys("buf2 write error");
    }
    /* offset now = 16394 */
    exit(0);
}
Figure 3.2 Create a file with a hole in it
Running this program gives us
$ ./hole
$ ll file.hole
-rw-r--r-- 1 fireway fireway 16394 Aug 23 16:18 file.hole
$ od -c file.hole
0000000 a b c d e f g h i j