Chapter 3 File I/O

3.1 Introduction

This chapter begins our discussion of the UNIX system by describing the functions available for file I/O: opening a file, reading a file, writing a file, and so on. Most UNIX file I/O can be performed using only five functions: open, read, write, lseek, and close. We then examine the effect of different buffer sizes on the read and write functions.

The functions described in this chapter are often referred to as unbuffered I/O, in contrast to the standard I/O functions described in Chapter 5. The term unbuffered means that each read or write invokes a system call in the kernel. These unbuffered I/O functions are not part of ANSI C, but they are part of POSIX.1 and XPG3.

Whenever resources are shared among multiple processes, the concept of an atomic operation becomes important. We examine this concept with regard to file I/O and the arguments to the open function. This leads to a discussion of how files are shared among multiple processes and of the kernel data structures involved. After describing these features, we describe the dup, fcntl, and ioctl functions.

3.2 File Descriptors

To the kernel, all open files are referred to by file descriptors. A file descriptor is a nonnegative integer. When we open an existing file or create a new file, the kernel returns a file descriptor to the process. When we want to read or write a file, we identify it with the file descriptor returned by open or creat, passing it as an argument to read or write.

By convention, UNIX shells associate file descriptor 0 with the standard input of a process, file descriptor 1 with the standard output, and file descriptor 2 with the standard error. This convention is used by the UNIX shells and by many applications; it is not a feature of the kernel.
Nevertheless, many UNIX applications would break if these associations were not followed.

In POSIX.1 applications, the magic numbers 0, 1, and 2 should be replaced with the symbolic constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO. These constants are defined in a header file.

File descriptors range from 0 through OPEN_MAX (see Table 2-7). Early versions of UNIX had an upper limit of 19 (allowing a maximum of 20 open files per process), and many systems have since raised the limit to 63. SVR4 and 4.3BSD do not specify a fixed range for file descriptors; it is bounded only by the amount of memory on the system, the size of an integer, and any soft and hard limits configured by the system administrator.

3.3 open Function

A file is opened or created by calling the open function.

#include
• O_WRONLY Open for writing only.
• O_RDWR Open for reading and writing.

Most implementations define O_RDONLY as 0, O_WRONLY as 1, and O_RDWR as 2, for compatibility with older systems. One and only one of these three constants must be specified. The following constants are optional:

• O_APPEND Append to the end of the file on each write. Section 3.11 describes this option in detail.
• O_CREAT Create the file if it does not exist. This option requires a third argument to open, the mode, which specifies the access permission bits of the new file. (Section 4.5 describes the file permission bits, how to specify mode, and how it is modified by the umask value of the process.)
• O_EXCL Generate an error if O_CREAT is also specified and the file already exists. The test for whether the file exists and the creation of the file if it does not exist form a single atomic operation. Section 3.11 describes atomic operations in more detail.
• O_TRUNC If the file exists and is successfully opened for write-only or read-write, truncate its length to 0.
• O_NOCTTY If pathname refers to a terminal device, do not allocate the device as the controlling terminal for this process. Section 9.6 describes controlling terminals.
• O_NONBLOCK If pathname refers to a FIFO, a block special file, or a character special file, this option sets the nonblocking mode for both the opening of the file and subsequent I/O. Section 12.2 describes this mode.

Earlier releases of System V introduced the O_NDELAY (no delay) flag. It is similar to the O_NONBLOCK (nonblocking) option, but it introduces an ambiguity in the return value of a read operation. With the no-delay option, a read returns 0 if no data can be read from a pipe, FIFO, or device, which conflicts with a return value of 0 indicating end of file.
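The ambiguity just described can be made concrete with a small experiment. The following is a minimal sketch, assuming a POSIX system; the helper name probe_empty_pipe is hypothetical, and pipe and fcntl are used here only as a convenient way to get a nonblocking descriptor. With O_NONBLOCK, a read that finds no data fails with -1 and errno set to EAGAIN, while a read at end of file returns 0, so the two cases can be told apart.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Read one byte from an empty pipe whose read end is nonblocking.
 * If close_writer is nonzero, the write end is closed first, so the
 * read sees end of file rather than "no data yet".
 * Returns whatever read() returned (-2 if the setup itself failed);
 * *err receives errno when a call fails, 0 otherwise.
 */
ssize_t probe_empty_pipe(int close_writer, int *err)
{
    int fds[2];
    char c;
    ssize_t n;

    assert(err != NULL);
    *err = 0;
    if (pipe(fds) < 0) {
        *err = errno;
        return -2;
    }

    /* Turn on O_NONBLOCK for the read end, keeping the other flags. */
    int fl = fcntl(fds[0], F_GETFL, 0);
    if (fl < 0 || fcntl(fds[0], F_SETFL, fl | O_NONBLOCK) < 0) {
        *err = errno;
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    if (close_writer)
        close(fds[1]);

    n = read(fds[0], &c, 1);
    if (n < 0)
        *err = errno;          /* save errno before any other call */

    close(fds[0]);
    if (!close_writer)
        close(fds[1]);
    return n;
}
```

Calling probe_empty_pipe(0, &e) exercises the "no data yet" case and probe_empty_pipe(1, &e) the end-of-file case.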
SVR4 still supports the no-delay option with the old semantics, but new applications should use the nonblocking option instead.

• O_SYNC Have each write wait for physical I/O to complete. This option is used later in this chapter.

The O_SYNC option is not part of POSIX.1, but SVR4 supports it.

The file descriptor returned by open is guaranteed to be the lowest-numbered unused descriptor. Some applications rely on this to open a new file on standard input, standard output, or standard error. For example, an application might close standard output (normally file descriptor 1) and then open some other file, knowing that it will be opened on file descriptor 1. When we describe the dup2 function in Section 3.12, we'll see a better way to guarantee that a file is open on a given descriptor.

Filename and Pathname Truncation

What happens if NAME_MAX is 14 and we try to create a new file in the current directory with a filename of 15 characters? Traditionally, early releases of System V allowed this, silently truncating the filename to 14 characters, whereas BSD-derived systems returned the error ENAMETOOLONG. The issue affects more than the creation of new files. If NAME_MAX is 14 and a file exists whose name is exactly 14 characters, any function that takes a pathname argument (open, stat, and so on) runs into the same problem.

In POSIX.1 the constant _POSIX_NO_TRUNC determines whether long filenames and pathnames are truncated or an error is returned. As Chapter 12 illustrates, this value can vary from one file system to another.
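Because the limit and the truncation behavior can differ per file system, a program can ask about a particular directory at run time. The following is a minimal sketch using the POSIX pathconf function with _PC_NAME_MAX and _PC_NO_TRUNC; the helper names name_max_for and names_not_truncated are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Return the maximum filename length for the file system that holds
 * dir, or -1 if there is no fixed limit or the query failed.
 */
long name_max_for(const char *dir)
{
    assert(dir != NULL);
    errno = 0;
    return pathconf(dir, _PC_NAME_MAX);
}

/*
 * Return 1 if over-long names in dir are rejected with ENAMETOOLONG,
 * 0 if they may be silently truncated, and -1 if the query failed.
 */
int names_not_truncated(const char *dir)
{
    assert(dir != NULL);
    errno = 0;
    long v = pathconf(dir, _PC_NO_TRUNC);
    if (v == -1)
        return errno == 0 ? 0 : -1;   /* -1 with errno unchanged: option not in effect */
    return v != 0;
}
```

A caller might check names_not_truncated(".") before relying on ENAMETOOLONG to detect an over-long filename.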
FIPS 151-1 requires that an error be returned. SVR4 does not guarantee an error for the traditional System V file system (S5) (see Table 2-6), but it does for the BSD-style file system (UFS). 4.3BSD always returns an error.

If _POSIX_NO_TRUNC is in effect, the error ENAMETOOLONG is returned whenever an entire pathname exceeds PATH_MAX or any filename component of the pathname exceeds NAME_MAX.

3.4 creat Function

A new file can also be created by calling the creat function.

#include
#include
Program 3-2 Create a file with a hole in it

Running this program gives us:

    $ a.out
    $ ls -l file.hole                  check its size
    -rw-r--r-- 1 stevens 50 Jul 31 05:50 file.hole
    $ od -c file.hole                  look at the actual contents
    0000000   a   b   c   d   e   f   g   h   i   j  \0  \0  \0  \0  \0  \0
    0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000040  \0  \0  \0  \0  \0  \0  \0  \0   A   B   C   D   E   F   G   H
    0000060   I   J
    0000062

We use the od(1) command to look at the actual contents of the file. The -c flag tells it to print the contents as characters. We can see that the 30 unwritten bytes in the middle of the file are read back as zero. The seven-digit number at the beginning of each line is the byte offset in octal.

This example calls the write function described in Section 3.8. Section 4.12 has more to say about files with holes.

3.7 read Function

Data is read from an open file with the read function.

#include
#include
Table 3-1 Timing results for reading with different buffer sizes

    BUFFSIZE   User CPU    System CPU   Clock time   Number of
               (seconds)   (seconds)    (seconds)    loops
           1     23.8        397.9        423.4      1,468,802
           2     12.3        202.0          -          734,401
           4      6.1        100.6        107.2        367,201
           8      3.0         50.7         54.0        183,601
          16      1.5         25.3         27.0         91,801
          32       -            -          13.7         45,901
          64      0.3          6.6          7.0         22,951
         128      0.2          3.3          3.6         11,476
         256      0.1          1.8          1.9          5,738
         512       -            -           1.1          2,869
       1,024      0.0          0.6          0.6          1,435
       2,048      0.0          0.4          0.4            718
       4,096      0.0          0.4           -             359
       8,192      0.0          0.3          0.3            180
      16,384      0.0          0.3          0.3             90
      32,768      0.0          0.3          0.3             45
      65,536      0.0          0.3          0.3             23
     131,072      0.0          0.3          0.3             12

The file was read using Program 3-3, with standard output redirected to /dev/null. The file system used for this test was a Berkeley fast file system with 8192-byte blocks. (The block size is given by st_blksize, described in Section 4.12, and is 8192 here.) The system CPU time reaches its minimum at a BUFFSIZE of 8192; increasing the buffer size beyond this has no further effect on it.

We will return to this example later: Section 3.13 uses it to show the effect of synchronous writes, and Chapter 5 revisits it.

3.10 File Sharing

UNIX supports the sharing of open files between different processes. Before describing the dup function, we need to describe this sharing. To do so, we examine the data structures used by the kernel for all I/O.

The kernel uses three data structures, and the relationships among them determine the effect one process has on another with regard to file sharing.

(1) Every process has an entry in the process table. Within each process table entry is a table of open file descriptors, which we can think of as a vector, with one entry per descriptor. Associated with each file descriptor are:
(a) The file descriptor flags.
(b) A pointer to a file table entry.
(2) The kernel maintains a file table for all open files. Each file table entry contains:
(a) The file status flags for the file (read, write, append, sync, nonblocking, and so on).
(b) The current file offset.
(c) A pointer to the v-node table entry for the file.
(3) Each open file (or device) has a v-node structure. The v-node contains information about the type of file and pointers to the functions that operate on the file. For most files the v-node also contains the i-node (index node) for the file. This information is read from disk when the file is opened, so that all the pertinent information about the file is readily available. For example, the i-node contains the owner of the file, the device the file is located on, and pointers to the actual data blocks used on disk. (Section 4.14 describes the typical UNIX file system in detail and says more about i-nodes.)

We are ignoring some implementation details that do not affect our discussion. For example, the table of open file descriptors is usually kept in the user area, not in the process table. In SVR4 this data structure is a linked list. The file table can be implemented in numerous ways; it need not be an array of file table entries.
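The three tables described above can be sketched as C structures. This is a deliberately simplified toy model, not kernel code: every type, field, and function name below (vnode, file_entry, fd_slot, process, model_install, model_write) is an assumption made for illustration, and MAX_FD is an arbitrary small limit.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_FD = 8 };            /* arbitrary limit for the model */

struct vnode {                  /* one per open file (or device) */
    long size;                  /* current file size, as kept with the i-node */
};

struct file_entry {             /* one per open of a file */
    int status_flags;           /* read, write, append, sync, ... */
    long offset;                /* current file offset */
    struct vnode *vp;           /* pointer to the v-node table entry */
};

struct fd_slot {                /* one per descriptor in a process */
    int fd_flags;               /* file descriptor flags */
    struct file_entry *fp;      /* NULL when the descriptor is unused */
};

struct process {
    struct fd_slot fds[MAX_FD]; /* the per-process descriptor vector */
};

/* Install fp in the lowest unused descriptor slot; return it, or -1 if full. */
int model_install(struct process *p, struct file_entry *fp)
{
    assert(p != NULL && fp != NULL);
    for (int fd = 0; fd < MAX_FD; fd++) {
        if (p->fds[fd].fp == NULL) {
            p->fds[fd].fp = fp;
            p->fds[fd].fd_flags = 0;
            return fd;
        }
    }
    return -1;
}

/* Model a write of nbytes: advance the offset, extending the file if needed. */
void model_write(struct process *p, int fd, long nbytes)
{
    assert(p != NULL && fd >= 0 && fd < MAX_FD && p->fds[fd].fp != NULL);
    struct file_entry *fp = p->fds[fd].fp;
    fp->offset += nbytes;
    if (fp->offset > fp->vp->size)
        fp->vp->size = fp->offset;
}
```

In this model, two processes that open the same file each get their own file_entry pointing at one shared vnode, whereas installing the same file_entry in two slots corresponds to two descriptors sharing one offset.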
In 4.3BSD the v-node contains the actual i-node (see Figure 3-1). SVR4 stores the v-node in the i-node for most of its file system types. These implementation differences do not affect our discussion of file sharing.

Figure 3-1 shows the relationship among these three tables for a single process. The process has two different files open: one open on standard input (file descriptor 0) and the other open on standard output (file descriptor 1).

Figure 3-1 Kernel data structures for open files

The basic relationship among these three tables has existed since the early versions of UNIX, and this arrangement is critical to the way files are shared between different processes. We return to this figure in later chapters, when we describe other ways that files are shared.

The v-node structure is a more recent addition. It evolved when support for multiple file system types on a given system was added, work done independently by Peter Weinberger (Bell Laboratories) and Bill Joy (Sun Microsystems). Sun called this the Virtual File System and called the file-system-independent portion of the i-node the v-node [Kleiman 1986]. The v-node came into wide use as various vendors added support for Sun's Network File System (NFS). In SVR4 the v-node replaced the file-system-independent i-node structure of SVR3.

If two independent processes each have the same file open, we have the arrangement shown in Figure 3-2. We assume that the first process has the file open on descriptor 3 and that the second process has the same file open on descriptor 4.
Each process that opens the file gets its own file table entry, but only a single v-node table entry is required for a given file. One reason each process gets its own file table entry is so that each process has its own current offset for the file.

Figure 3-2 Two independent processes with the same file open

Given these data structures, we can now be more specific about what happens with the operations described earlier.

• After each write completes, the current file offset in the file table entry is incremented by the number of bytes written. If this causes the current file offset to exceed the current file size, the current file size in the i-node table entry is set to the current file offset (that is, the file is extended).
• If a file is opened with the O_APPEND flag, a corresponding flag is set in the file status flags of the file table entry. Each time a write is performed on a file with this append flag set, the current file offset in the file table entry is first set to the current file size from the i-node table entry. This forces every write to be appended to the current end of the file.
• The lseek function only modifies the current file offset in the file table entry. No I/O takes place.
• If a file is positioned to its current end using lseek, the current file offset in the file table entry is set to the current file size from the i-node table entry.

It is possible for more than one file descriptor to point to the same file table entry, as we'll see when we discuss the dup function in Section 3.12. The same thing happens after a fork, when the parent and the child share the same file table entry for each open descriptor.

Note the difference in scope between the file descriptor flags and the file status flags. The former apply only to a single descriptor in a single process, whereas the latter apply to all descriptors in any process that point to the given file table entry. When we describe the fcntl function in Section 3.13, we'll see how to fetch and modify both the file descriptor flags and the file status flags.
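The difference between sharing a file table entry and having separate entries can be observed from user space. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the helper name offset_seen_by_second and the scratch-file template are hypothetical. It writes 10 bytes through one descriptor and reports the current offset seen through a second descriptor that was obtained either with dup (same file table entry) or with a second open (separate entry).

```c
#define _XOPEN_SOURCE 700   /* assumption: asks for POSIX.1-2008 declarations such as mkstemp */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write 10 bytes through one descriptor and return the current offset
 * seen through a second descriptor for the same file.
 * share != 0: the second descriptor comes from dup() and therefore
 *             shares the file table entry (and its offset).
 * share == 0: the second descriptor comes from a separate open() and
 *             has its own file table entry.
 * Returns -1 if the scratch file could not be set up or written.
 */
off_t offset_seen_by_second(int share)
{
    char path[] = "/tmp/fileio-demo-XXXXXX";   /* hypothetical scratch file */
    int fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;

    int fd2 = share ? dup(fd1) : open(path, O_RDWR);
    unlink(path);                 /* the file lives on until both descriptors close */
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    off_t seen = -1;
    if (write(fd1, "0123456789", 10) == 10) {
        /* The writer's own offset always advances by the bytes written. */
        assert(lseek(fd1, 0, SEEK_CUR) == 10);
        seen = lseek(fd2, 0, SEEK_CUR);
    }
    close(fd1);
    close(fd2);
    return seen;
}
```

With share set, the second descriptor reports the offset advanced by the write; without it, the second descriptor still reports its own untouched offset.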
Everything described so far works correctly when multiple processes read the same file. Each process has its own file table entry with its own current file offset. Unexpected results can arise, however, when multiple processes write to the same file. To see how to avoid some surprises, we need to understand the concept of atomic operations.

3.11 Atomic Operations

3.11.1 Appending to a File

Consider a single process that wants to append to the end of a file. Early versions of UNIX did not support the O_APPEND option to open, so the program was coded as follows:

    if (lseek(fd, 0L, 2) < 0)               /* position to EOF */
        err_sys("lseek error");
    if (write(fd, buff, 100) != 100)        /* and write */
        err_sys("write error");

This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file. (This scenario can arise, for example, if multiple instances of the same program are appending messages to a log file.)

Assume that two independent processes, A and B, are appending to the same file. Each has opened the file, but without the O_APPEND flag. This gives us the same picture as Figure 3-2: each process has its own file table entry, but they share a single v-node table entry. Assume that process A calls lseek, which sets the current offset of the file for process A to byte offset 1,500 (the current end of file). Then the kernel switches processes, and B runs. Process B calls lseek, which sets its current offset for the file to byte offset 1,500 as well (the current end of file). Then B calls write, which increments B's current file offset for the file to 1,600.
Because the file's size has been extended, the kernel also updates the current file size in the v-node to 1,600. Then the kernel switches processes and A resumes. When A calls write, the data is written starting at A's current file offset, which is byte offset 1,500. This overwrites the data that B just wrote to the file.

The problem here is that our logical operation of "position to the end of file and write" requires two separate function calls. The solution is to make the positioning and the write a single atomic operation with regard to other processes. Any operation that requires more than one function call cannot be atomic, because the kernel can always suspend the process temporarily between the two calls (as we assumed above).

UNIX provides an atomic way to do this operation: set the O_APPEND flag when the file is opened. As described in the previous section, this causes the kernel to position the file to its current end before each write, so we no longer have to call lseek before each write.

3.11.2 Creating a File

We saw another example of an atomic operation when we described the O_CREAT and O_EXCL options of the open function. When both options are specified, the open fails if the file already exists. We also said that the check for the existence of the file and the creation of the file are performed as a single atomic operation.
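As a minimal sketch of this atomic create, assuming a POSIX system: the helper name create_exclusive is hypothetical, and the flags are the ones named above. The second of two attempts on the same pathname fails with EEXIST, because the existence test and the creation happen inside a single open call.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create path only if it does not already exist.
 * Returns a descriptor open for writing, or -1 with errno set;
 * errno is EEXIST when the file was already there.
 */
int create_exclusive(const char *path, mode_t mode)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```

A lock-file scheme, for example, could treat a successful return as "we created it" and EEXIST as "someone else got there first".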
If we did not have this atomic operation, we might try

    if ((fd = open(pathname, O_WRONLY)) < 0)
        if (errno == ENOENT) {
            if ((fd = creat(pathname, mode)) < 0)
                err_sys("creat error");
        } else
            err_sys("open error");

The problem occurs if the file is created by another process between the open and the creat. If that process creates the file and writes some data to it, that data is erased when this creat is executed. Combining the test for existence and the creation into a single atomic operation avoids the problem.

In general, the term atomic operation refers to an operation that is composed of multiple steps. If the operation is performed atomically, either all the steps are performed or none is performed; it must not be possible for only a subset of the steps to be performed. We return to atomic operations when we describe the link function in Section 4.15 and record locking in Section 12.3.

3.12 dup and dup2 Functions

An existing file descriptor is duplicated by either of the following functions:

#include
The differences between them are:

(1) dup2 is an atomic operation, whereas the alternative form involves two function calls, close and fcntl. It is possible for a signal handler to run between the close and the fcntl and to modify the file descriptors. (Chapter 10 describes signals.)
(2) There are some errno differences between dup2 and fcntl.

The dup2 system call originated with Version 7 and propagated through the BSD releases. The fcntl method for duplicating file descriptors first appeared in System III and continued with System V. SVR3.2 picked up the dup2 function, and 4.2BSD picked up the fcntl function and its F_DUPFD functionality. POSIX.1 requires both dup2 and the F_DUPFD feature of fcntl.

3.13 fcntl Function

The fcntl function can change the properties of a file that is already open.

#include
The file status flags were described with the open function. They are listed in Table 3-2.

Table 3-2 File status flags for fcntl

    File status flag   Description
    O_RDONLY           open for reading only
    O_WRONLY           open for writing only
    O_RDWR             open for reading and writing
    O_APPEND           append to end of file on each write
    O_NONBLOCK         nonblocking mode
    O_SYNC             wait for writes to complete
    O_ASYNC            asynchronous I/O (4.3BSD)

Unfortunately, the three access-mode flags (O_RDONLY, O_WRONLY, and O_RDWR) do not each occupy a separate bit that can be tested. (As mentioned earlier, these three often have the values 0, 1, and 2 for historical reasons. The three values are also mutually exclusive: a file can have only one of them.) We must therefore first use the O_ACCMODE mask to obtain the access-mode bits and then compare the result against each of the three values.

• F_SETFL Set the file status flags to the value of the third argument (taken as an integer). The flags that can be changed are O_APPEND, O_NONBLOCK, O_SYNC, and O_ASYNC.
• F_GETOWN Get the process ID or process group ID currently receiving the SIGIO and SIGURG signals. Section 12.6.2 describes these two 4.3BSD asynchronous I/O signals.
• F_SETOWN Set the process ID or process group ID to receive the SIGIO and SIGURG signals. A positive arg specifies a process ID. A negative arg specifies a process group ID equal to the absolute value of arg.

The return value from fcntl depends on the command. All commands return -1 on an error or some other value if OK. Four commands have special return values: F_DUPFD, F_GETFD, F_GETFL, and F_GETOWN. The first returns the new file descriptor, the next two return the corresponding flags, and the last returns a positive process ID or a negative process group ID.

Example

Program 3-4 takes a single command-line argument that specifies a file descriptor and prints a description of the file flags for that descriptor.
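The O_ACCMODE test described above might look like the following minimal sketch. It is written in the same spirit as Program 3-4 but is not that listing; the function name describe_flags and its buffer-based interface are assumptions made for this illustration, and only two of the optional flags are reported.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a short description of fd's file status flags into buf.
 * Returns 0 on success, -1 if fcntl fails (for example, bad descriptor).
 */
int describe_flags(int fd, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;

    /* The access mode is a small value, not a bit: mask, then compare. */
    const char *mode;
    switch (val & O_ACCMODE) {
    case O_RDONLY: mode = "read only";  break;
    case O_WRONLY: mode = "write only"; break;
    case O_RDWR:   mode = "read write"; break;
    default:       mode = "unknown access mode"; break;
    }
    snprintf(buf, len, "%s", mode);

    /* The remaining flags are ordinary bits and can be tested directly. */
    if (val & O_APPEND)
        strncat(buf, ", append", len - strlen(buf) - 1);
    if (val & O_NONBLOCK)
        strncat(buf, ", nonblocking", len - strlen(buf) - 1);
    return 0;
}
```

A small main could convert its argument with atoi and print the resulting string for that descriptor.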
Program 3-4 Print file flags for a specified descriptor

Note that we use the feature test macro _POSIX_SOURCE and conditionally compile the file access flags that are not part of POSIX.1. The following shows the program being run from the Korn shell:

    $ a.out 0 < /dev/tty
    read only
    $ a.out 1 > temp.foo
    $ cat temp.foo
    write only
    $ a.out 2 2>>temp.foo
    write only, append
    $ a.out 5 5<>temp.foo
    read write

The Korn shell clause 5<>temp.foo opens the file temp.foo for reading and writing on file descriptor 5.

Example

When we modify either the file descriptor flags or the file status flags, we must be careful to fetch the existing flag value, modify it as desired, and then set the new flag value. We cannot simply issue an F_SETFD or an F_SETFL command, as this could turn off flag bits that were previously set.

Program 3-5 is a function that sets one or more of the file status flags for a descriptor.
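The fetch, modify, set sequence described above might look like the following minimal sketch. It is not the listing of Program 3-5; the function name change_status_flags and its on argument are assumptions made for this illustration, combining the "turn on" and "turn off" variants in one helper.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Turn the given file status flags on (on != 0) or off (on == 0)
 * for fd without disturbing the other flags.
 * Returns 0 on success, -1 if either fcntl call fails.
 */
int change_status_flags(int fd, int flags, int on)
{
    /* The access mode cannot be changed with F_SETFL. */
    assert((flags & O_ACCMODE) == 0);

    int val = fcntl(fd, F_GETFL, 0);        /* 1. fetch the current flags */
    if (val < 0)
        return -1;

    if (on)
        val |= flags;                       /* 2a. turn flags on */
    else
        val &= ~flags;                      /* 2b. turn flags off */

    return fcntl(fd, F_SETFL, val) < 0 ? -1 : 0;   /* 3. store the result */
}
```

Calling F_SETFL with only the new flag, skipping step 1, would clear any flag such as O_APPEND that had been set earlier.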
Program 3-5 Turn on one or more of the file status flags for a descriptor

If we change the middle statement to

    val &= ~flags;        /* turn flags off */

we have a function named clr_fl, which we use in some later examples. This statement logically ANDs the one's complement of flags with the current val.

If we add the line

    set_fl(STDOUT_FILENO, O_SYNC);

to the beginning of Program 3-3, we turn on the synchronous-write flag. This causes each write to wait for the data to be written to disk before returning. Normally in UNIX, a write only queues the data for writing, and the actual I/O operation can take place sometime later. A database system is a likely candidate for O_SYNC, so that it knows on return from a write that the data is actually on disk, in case of a system crash.

We expect the O_SYNC flag to increase the clock time when the program runs. To test this, we run Program 3-3, copying one disk file to another, and then run a version of the same program that sets the O_SYNC flag. The results are compared in Table 3-3.

Table 3-3

    Operation                                       User CPU    System CPU   Clock time
                                                    (seconds)   (seconds)    (seconds)
    read time from Table 3-1 for BUFFSIZE = 8192      0.0         0.3          0.3
    write to disk file                                0.0         1.0          2.3
    write to disk file with O_SYNC set                0.0         1.4         13.4

All three rows of Table 3-3 were measured with a BUFFSIZE of 8192. The results in Table 3-1 were measured reading a disk file and writing to /dev/null, so there was no disk output. The second row of Table 3-3 corresponds to reading a disk file and writing to another disk file. This is why the first and second rows of Table 3-3 differ.
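For comparison, the same synchronous behavior can be requested when a file is opened rather than afterward. The following is a minimal sketch under that assumption; the helper name write_message is hypothetical. It passes O_SYNC to open directly, which sidesteps the fact that some systems (Linux is one) do not let F_SETFL change O_SYNC on an already open descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write msg to path, replacing any previous contents.
 * If sync is nonzero the file is opened with O_SYNC, so write() does
 * not return until the data has been handed to the storage device.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_message(const char *path, const char *msg, int sync)
{
    assert(path != NULL && msg != NULL);

    int oflag = O_WRONLY | O_CREAT | O_TRUNC | (sync ? O_SYNC : 0);
    int fd = open(path, oflag, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, msg, strlen(msg));
    if (close(fd) < 0)
        return -1;
    return n;
}
```

Timing many calls with and without the sync argument is one way to reproduce the kind of clock-time difference shown in Table 3-3, though the numbers will depend heavily on the system.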
The system time increases when we write to a disk file because the kernel now copies the data from our process and queues it for writing by the disk driver. The clock time also increases when we write to a disk file. With synchronous writes enabled, the system time increases slightly and the clock time increases by a factor of about 6.

With this example we see the need for fcntl. Our program operates on a descriptor (standard output), never knowing the name of the file that the shell opened on that descriptor. Because the shell opened the file, we cannot set the O_SYNC flag at the time the file is opened. fcntl lets us modify the properties of a descriptor, knowing only the descriptor for the open file. We'll see another need for fcntl when we describe nonblocking pipes (Section 14.2), because all we have with a pipe is a descriptor.

3.14 ioctl Function

The ioctl function has always been the catchall for I/O operations. Anything that cannot be expressed using one of the other functions in this chapter usually ends up being specified with an ioctl. Terminal I/O was the biggest user of this function. (Chapter 11 shows that POSIX.1 has replaced the terminal I/O operations of ioctl.)
#include
Even if the system ignores the open mode, and the following call succeeds,

    fd = open("/dev/fd/0", O_RDWR);

we still cannot write to fd.

We can also call creat with a /dev/fd pathname argument, or call open and specify O_CREAT. This allows a program that calls creat to still work if the pathname argument is, for example, /dev/fd/1.

Some systems provide the pathnames /dev/stdin, /dev/stdout, and /dev/stderr. These are equivalent to /dev/fd/0, /dev/fd/1, and /dev/fd/2.

The main use of the /dev/fd files is from the shell. They allow programs that take pathname arguments to handle standard input and standard output in the same manner as other pathnames. For example, the cat(1) program specifically looks for an input filename of - and uses it to mean standard input. The command

    filter file2 | cat file1 - file3 | lpr

is an example. First cat reads file1, next its standard input (the output of the filter program on file2), and then file3. If /dev/fd is supported, the special handling of - can be removed from cat, and we can enter

    filter file2 | cat file1 /dev/fd/0 file3 | lpr

The special meaning of - as a command-line argument referring to standard input or standard output has been built into many programs. It also causes problems: if we specify - as the first file, it looks like the start of another command-line option. /dev/fd makes filename arguments more uniform and cleaner.

3.16 Summary

This chapter has described the traditional UNIX I/O functions. They are often called the unbuffered I/O functions because each read or write invokes a system call into the kernel. Using only read and write, we looked at the effect of different I/O sizes on the time required to read a file.

Atomic operations were introduced for the cases where multiple processes append to the same file and where multiple processes create the same file.
We also looked at the data structures the kernel uses to share information about open files. We'll return to these data structures later in the text.

We also described the ioctl and fcntl functions. Chapter 12 returns to both of them: ioctl is used with the streams I/O system, and fcntl is used for record locking.

Exercises

3.1 When reading or writing a disk file, are the functions described in this chapter really unbuffered? Explain.

3.2 Write your own function that does the same thing as the dup2 function described in Section 3.12, without calling the fcntl function. Be sure to handle errors correctly.

3.3 Assume that a process executes the following three function calls:

    fd1 = open(pathname, oflags);
    fd2 = dup(fd1);
    fd3 = open(pathname, oflags);

Draw the resulting picture (see Figure 3-3).