Abstract
Traditionally, file system consistency has been maintained across power failures and system crashes in one of two ways: by using synchronous writes to sequence dependent metadata updates, or by using write-ahead logging to group them into atomic operations. Soft updates, an alternative to both, is an implementation mechanism that tracks and enforces metadata update dependencies so that the file system image on disk is always consistent. Using soft updates removes the need for a separate log and for most synchronous writes. Because it can also merge many operations that were previously performed individually and synchronously, soft updates reduces the number of disk writes by 40% to 70% in file-intensive environments (such as program development and mail servers). Besides improving performance, soft updates can maintain better file system consistency. By ensuring that the only possible inconsistencies are unclaimed blocks or inodes, soft updates eliminates the need to run a file system check program after every system crash, so the file system is available immediately after a restart. Lost blocks and inodes can then be reclaimed by a background task running on the active file system.
This paper describes an implementation of soft updates integrated into the 4.4BSD Fast File System (FFS). It details the changes that were needed, both to the research prototype and to the BSD system, to build a production-quality system. It also discusses the experiences, difficulties, and lessons learned in moving soft updates from research to reality; in particular, non-focal file system operations (for example, fsck and "fsync") had to be rethought and required additional code. Experience with the resulting system validates the earlier research: soft updates integrates well into existing file systems, enforces metadata dependencies, and delivers performance close to optimal.
Section 1 Research Background and Introduction
In a file system, metadata (such as directories, inodes, and free block maps) gives structure to raw storage. Metadata provides the pointers and descriptions that link disk sectors into files and identify those files. To be useful for long-term storage, a file system must preserve the integrity of its metadata in the face of unpredictable system crashes, such as power failures and operating system failures. Because such crashes usually lose all information held in volatile main memory, the information in non-volatile storage (e.g., disk) must always be consistent enough to deterministically reconstruct a coherent file system state. Specifically, the on-disk image of the file system must contain no dangling pointers to uninitialized space, no ambiguous resource ownership caused by multiple pointers, and no live resources to which there are no pointers. Maintaining these invariants generally requires that updates to small on-disk metadata objects be sequenced (or grouped into atomic operations).
Traditionally, the BSD Fast File System (FFS) and its derivatives have used synchronous writes to sequence changes to stable storage. For example, creating a file in a BSD system involves first allocating and initializing a new inode and then filling in a new directory entry that points to it. With synchronous writes, the file system forces the application creating the file to wait for the disk write that initializes the inode. As a result, operations such as file creation and deletion proceed at disk speed rather than at processor/memory speed. Because disk access is slow compared with other system components, synchronous writes reduce system performance.
The metadata update problem can also be addressed by other mechanisms. One is to remove the need to keep the on-disk state consistent by using NVRAM technology, such as an uninterruptible power supply (UPS) or Flash memory. Then only the NVRAM contents must be kept consistent, and updates can be propagated to disk in any order and whenever convenient. Another is to group each set of dependent updates into an atomic operation using some form of write-ahead logging or shadow paging. Generally speaking, these approaches augment the on-disk state with additional information that can be used to reconstruct the committed metadata after any system failure other than media corruption. Many modern file systems successfully use write-ahead logging to obtain better performance than synchronous writes.
[Ganger & Patt, 1994] proposed an alternative, soft updates, and evaluated it in a research prototype. With soft updates, the file system uses delayed writes (i.e., write-back caching) for metadata changes, tracks the dependencies between updates, and enforces those dependencies at write-back time. Because most metadata blocks contain many pointers, cyclic dependencies arise frequently if dependencies are recorded only at the block level. Soft updates therefore tracks dependencies on a per-pointer basis, which allows blocks to be written in any order. Any updates in a metadata block that still have unsatisfied dependencies are rolled back before the block is written and rolled forward afterwards, so dependency cycles are no longer an issue.
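The roll-back and roll-forward idea can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the BSD implementation: the names `Update`, `Cache`, `needs`, and the block and slot labels are all hypothetical, and each pending update carries at most one dependency. It creates file "a" and removes file "b" in the same directory block and the same inode block, which forms a cycle at the block level but not at the per-pointer level.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Update:
    """A pending change to one slot of a metadata block (hypothetical model)."""
    old: int                                  # value that is safe to leave on disk
    new: int                                  # value the in-memory copy holds
    needs: Optional[Tuple[str, str]] = None   # (block, slot) that must reach disk first


class Cache:
    def __init__(self):
        self.pending = {}       # block name -> {slot: Update}
        self.disk = {}          # block name -> {slot: value}, as last written
        self.committed = set()  # (block, slot) pairs whose new value is on disk

    def update(self, block, slot, old, new, needs=None):
        self.pending.setdefault(block, {})[slot] = Update(old, new, needs)

    def read(self, block):
        """Applications always see the newest values."""
        return {slot: u.new for slot, u in self.pending[block].items()}

    def write(self, block):
        """Write one block; still-dependent updates are undone in the disk image only."""
        image = {}
        for slot, u in self.pending[block].items():
            if u.needs is not None and u.needs not in self.committed:
                image[slot] = u.old           # rolled back for this write
            else:
                image[slot] = u.new
                self.committed.add((block, slot))
        self.disk[block] = image              # the in-memory copy keeps the new values
        return image


cache = Cache()
# Create file "a": the directory entry may not reach disk before inode 4 is initialized.
cache.update("inode_blk", "inode4", old=0, new=1)
cache.update("dir_blk", "entry_a", old=0, new=4, needs=("inode_blk", "inode4"))
# Remove file "b": inode 5 may not be cleared on disk before its directory entry is.
cache.update("dir_blk", "entry_b", old=5, new=0)
cache.update("inode_blk", "inode5", old=1, new=0, needs=("dir_blk", "entry_b"))

first = cache.write("dir_blk")     # entry_a is rolled back, the removal of entry_b is written
second = cache.write("inode_blk")  # both inode updates are now safe to write
third = cache.write("dir_blk")     # entry_a can finally reach disk
```

In this model the blocks could equally be written in the opposite order; the first write of the inode block would then carry inode 5 in its old state. Either way, no intermediate disk image contains a directory entry that points to an uninitialized inode.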
With soft updates, applications always see the most recent copies of metadata blocks, and the disk always sees copies that are consistent with its other contents.
In this paper, we describe the integration of soft updates into the 4.4BSD FFS used in the NetBSD, OpenBSD, FreeBSD, and BSDI operating systems. Along the way, we discuss experiences and lessons learned, and describe the problems that proved more complex than expected. As is often the case, non-focal operations required rethinking and account for much of the code complexity: bounding the kernel memory used to track dependencies, fully implementing the semantics of the "fsync" system call, correctly detecting and handling lost resources in fsck, and cleanly and correctly completing certain system calls. Despite these difficulties, our performance measurements confirm the conclusions of the earlier research. Specifically, using soft updates in BSD FFS eliminates most synchronous writes and delivers performance within 5 percent of the theoretical optimum (FFS with no update ordering enforced). At the same time, soft updates gives BSD FFS cleaner semantics, stronger integrity and security guarantees, and immediate recovery after a crash (with no need to run fsck first).
The rest of this paper is organized as follows. Section 2 describes the update dependencies that arise in BSD FFS operations. Section 3 describes how the BSD soft updates implementation handles them, including the key data structures, how they are used, and how they are integrated into the 4.4BSD operating system. Section 4 discusses the experiences and lessons learned in turning a prototype into an implementation suitable for production environments. Section 5 summarizes the performance improvements obtained by introducing soft updates into 4.4BSD systems. Section 6 discusses new support for file system snapshots and how it can be used to run a partial fsck in the background. Section 7 summarizes the status and availability of the BSD soft updates code.
Section 2 Update Dependencies in the BSD Fast File System
Many important file system operations consist of a series of related modifications to separate metadata structures. To ensure recoverability after unpredictable failures, these modifications often must be propagated to stable storage in a specific order. For example, when creating a new file, the file system allocates an inode, initializes it, and constructs a directory entry that points to it. If the system crashes after the new directory entry has been written to disk but before the initialized inode has been written, consistency may be compromised because the contents of the on-disk inode are unknown. To ensure metadata consistency, the initialized inode must reach stable storage before the new directory entry. We call this requirement an update dependency, because safely writing the directory entry depends on first writing the inode. The ordering constraints can be expressed as three simple rules:
1. Never point to a structure before it has been initialized (for example, an inode must be initialized before a directory entry references it).
2. Never reuse a resource before all previous pointers to it have been nullified (for example, an inode's pointer to a data block must be nullified before that block is reallocated to another inode).
3. Never reset the old pointer to a live resource before the new pointer has been set (for example, when renaming a file, do not remove the old name for an inode until the new name has been written).
This section describes the update dependencies that arise in BSD FFS. Because of space limitations, we assume that readers have a basic understanding of BSD FFS as described in [McKusick et al., 1996].
Eight BSD FFS operations require ordered updates to ensure recoverability after a crash: file creation, file removal, directory creation, directory removal, file/directory rename, block allocation, indirect block manipulation, and free map management.
Inodes and data blocks are the two main resources managed by BSD FFS. Two bitmaps hold the allocation information for these resources. For each inode in the file system, the inode bitmap has a corresponding bit that is set to 1 when the inode is in use and to 0 when it is free. Similarly, for each data block, the data block bitmap has a corresponding bit indicating whether the block is free or in use. An FFS file system is divided into fixed-size pieces called cylinder groups. Each cylinder group has a cylinder group block that holds the bitmaps for the inodes and data blocks residing in that cylinder group. For a large file system, this organization allows only the parts of the bitmaps that are in active use to be brought into kernel memory. Each active cylinder group block is stored in a separate I/O buffer and can be written to disk independently of the other cylinder group blocks.
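As a rough illustration of the per-cylinder-group inode bitmap, the following sketch keeps one bit per inode, with 1 meaning in use and 0 meaning free. The class name `CylinderGroup`, the first-fit allocation policy, and the `locate` helper are assumptions made for this example only; they do not reproduce the FFS on-disk layout or its allocation policy.

```python
class CylinderGroup:
    """Holds the inode bitmap for one cylinder group (illustrative model only)."""

    def __init__(self, inodes_per_group):
        self.inodes_per_group = inodes_per_group
        self.inode_map = bytearray((inodes_per_group + 7) // 8)  # all bits 0: free

    def is_allocated(self, index):
        return bool(self.inode_map[index // 8] & (1 << (index % 8)))

    def allocate(self):
        """Set the first clear bit and return its index (first-fit, for simplicity)."""
        for index in range(self.inodes_per_group):
            if not self.is_allocated(index):
                self.inode_map[index // 8] |= 1 << (index % 8)
                return index
        raise RuntimeError("no free inode in this cylinder group")

    def free(self, index):
        """Clear the bit so the inode can be reused."""
        self.inode_map[index // 8] &= ~(1 << (index % 8)) & 0xFF


def locate(inode_number, inodes_per_group):
    """Map a file-system-wide inode number to (cylinder group, index within group)."""
    return divmod(inode_number, inodes_per_group)
```

Because each group owns its own bitmap, only the groups that are actually being modified need a buffer in memory, which mirrors the organization described above.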
When a file is created, three metadata structures located in separate blocks are modified. The first is the new inode, which is initialized with its type field set to the new file type, its link count set to 1 (showing that it is live, i.e., referenced by a directory entry), its permissions, and other fields. The second is the inode bitmap, which is modified to show that the inode has been allocated. The third is the new directory entry, which is filled in with the new file name and a pointer to the new inode. To ensure that the bitmaps always reflect all allocated resources, the bitmap must be written to disk before the inode or the directory entry. Because the inode is in an unknown state until it has been initialized on disk, rule 1 requires that the inode be written before the directory entry that references it. Although not strictly necessary, most FFS implementations also write the directory block synchronously before the system call that creates the file returns. This additional synchronous write ensures that the file name is already on stable storage if the application later issues an "fsync" system call. Otherwise, the "fsync" call would have to find every unwritten directory block containing a name for the file and write it to disk. A similar update dependency exists when a second name is added for an inode (a hard link), because adding the second name requires the file system to increment the link count in the inode and write the inode to disk before the new entry may appear in the directory.
When a file is deleted, a directory block, an inode block, and one or more cylinder group bitmaps are modified. In the directory block, the relevant directory entry is "removed", usually by nullifying its inode pointer. In the inode block, the type field, link count, and block pointers of the relevant inode are zeroed. The blocks and inode of the deleted file are then added to the appropriate free block and free inode maps. Rule 2 implies update dependencies between the directory entry and the inode, and between the inode and any modified free map bits. To keep the link count conservatively high, the same dependency between directory entry and inode exists when one of several names (hard links) for a file is removed.
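The orderings for file creation and file deletion described above can be written down as small dependency graphs and sorted. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the labels for the individual writes and the `write_order` helper are names chosen for this example, not identifiers from the FFS code.

```python
from graphlib import TopologicalSorter

# Each key is a disk write; its value is the set of writes that must reach disk first.
create_file = {
    "inode bitmap": set(),
    "initialized inode": {"inode bitmap"},
    "directory entry": {"inode bitmap", "initialized inode"},
}

remove_file = {
    "cleared directory entry": set(),
    "cleared inode": {"cleared directory entry"},
    "free maps": {"cleared inode"},
}


def write_order(dependencies):
    """Return one order of writes that respects every update dependency."""
    return list(TopologicalSorter(dependencies).static_order())


create_order = write_order(create_file)
remove_order = write_order(remove_file)
```

Both graphs are simple chains, so each has exactly one valid order. A synchronous-write implementation enforces that order by waiting for each write in turn; soft updates records the same edges and enforces them when the blocks are eventually written back.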
Creating and removing directories is largely the same as described above for regular files. However, the ".." entry is a link from the child directory to its parent, which introduces additional update dependencies. Specifically, during creation, the parent's link count must be incremented on disk before the new directory's ".." pointer is written. Likewise, during removal, the parent's link count must be decremented only after the removed directory's ".." pointer has been nullified. (This nullification is implicit in deleting the child directory's pointer to the corresponding directory block; it is not performed as a separate step.)
When a new block is allocated, its bitmap location is updated to show that it is in use, and the block's contents are initialized with newly written data or zeros. In addition, a pointer to the new block is added to an inode or indirect block (discussed below). To ensure that the on-disk bitmap always reflects allocated resources, the bitmap must be written to disk before the pointer. Likewise, because the contents of the newly allocated disk location are unknown, rule 1 implies an update dependency between the new block and the pointer to it. Because enforcing this dependency with synchronous writes reduces data throughput, many implementations ignore it for regular data blocks. This design choice weakens integrity and security, because newly allocated blocks often contain data from previously deleted files. Soft updates allows all block allocations to be protected in this way with almost no loss of performance.
Manipulating indirect blocks does not introduce fundamentally different update dependencies, but it merits separate discussion. Allocation, both of indirect blocks and of the blocks they point to, proceeds as described above. File deletion, and deallocation in particular, is more interesting for indirect blocks. Because the inode reference is the only way to identify an indirect block and the blocks connected to it (directly or indirectly), nullifying the inode's pointer to an indirect block is enough to eliminate all recoverable pointers to those blocks. Once the pointer has been nullified on disk, all the blocks it reaches can be freed. Only when a file is partially truncated do update dependencies arise between indirect block pointers and the blocks they point to. Some FFS implementations do not exploit this distinction, even though it can greatly affect the time needed to remove a large file.
Renaming a file affects two directory entries. A new entry (with the new name) is created and set to point to the relevant inode, and the old entry is removed. Rule 3 states that the new entry must be written to disk before the old entry is removed, so that the file is not left unreferenced after a crash. If link counts are kept conservatively, a rename requires four disk updates in sequence: increment the inode's link count, add the new directory entry, remove the old directory entry, and decrement the link count. If the new name already exists, adding the new directory entry also serves as the first step of removing the file that previously had that name. Interestingly, rename is the one POSIX file operation that would need an atomic update to multiple user-visible metadata structures to provide ideal semantics. POSIX does not require such semantics, and most implementations cannot provide them.
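To see why the four-step rename sequence is safe at every point, the sketch below replays it and checks the state that a crash would leave behind after each step. The model is deliberately minimal and hypothetical: a single directory represented as a dictionary, one inode, and a check named `is_recoverable` that requires the file to have at least one name and a link count no lower than the number of names.

```python
def rename_steps(old_name, new_name, inode):
    """Yield the on-disk state before the rename and after each ordered update."""
    directory = {old_name: inode}
    link_count = 1
    yield dict(directory), link_count      # state before the rename starts

    link_count += 1                        # 1. increment the link count
    yield dict(directory), link_count
    directory[new_name] = inode            # 2. add the new directory entry
    yield dict(directory), link_count
    del directory[old_name]                # 3. remove the old directory entry
    yield dict(directory), link_count
    link_count -= 1                        # 4. decrement the link count
    yield dict(directory), link_count


def is_recoverable(directory, link_count, inode):
    """A crash at this point leaves the file named and its link count not too low."""
    references = sum(1 for target in directory.values() if target == inode)
    return references >= 1 and link_count >= references


states = list(rename_steps("old.txt", "new.txt", inode=7))
all_safe = all(is_recoverable(d, count, 7) for d, count in states)
```

In this model a crash can leave the link count one higher than the number of names, which a later check can correct, but it never leaves the file without a name. Swapping steps 2 and 3 would produce an intermediate state with no reference to the inode, which is exactly what rule 3 forbids.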
On an active file system, the bitmaps change constantly, so the copy of the bitmaps in kernel memory often differs from the copy stored on disk. If the system halts before the in-memory bitmaps are written out, some recently allocated inodes and data blocks may not be reflected in the out-of-date bitmaps on disk. To restore consistency, the file system check program fsck traditionally must be run after a crash to examine every inode in the file system, determine which inodes and blocks are in use, and bring the bitmaps up to date. An added benefit of soft updates is that it tracks the writing of the bitmaps to disk and uses this information to ensure that no newly allocated inode, and no pointer to a newly allocated data block, is written to disk until after the bitmap that references it has been written. This guarantees that there is never an allocated inode or data block that is not marked in the on-disk bitmap, so it is no longer necessary to run fsck after a system crash. This feature is discussed further in Section 6.