The File System

This chapter describes how Linux maintains the files in the file systems that it supports. It describes the Virtual File System (VFS) and explains how Linux's real file systems are supported. One of the most important features of Linux is its support for many different file systems. This makes it very flexible and well able to coexist with other operating systems. At the time of writing, Linux supports 15 file systems: EXT, EXT2, XIA, Minix, UMSDOS, MSDOS, VFAT, PROC, SMB, NCP, ISO9660, SYSV, HPFS, AFFS and UFS, and no doubt, over time, more will be added. In Linux, as in Unix, the separate file systems that the system may use are not accessed by device identifiers (such as a drive number or a drive name) but instead are combined into a single hierarchical tree structure that represents the file system as one single entity. Linux adds each new file system into this single file system tree as it is mounted. All file systems, of whatever type, are mounted onto a directory, and the files of the mounted file system cover up the existing contents of that directory. This directory is known as the mount directory or mount point. When the file system is unmounted, the mount directory's own files are once again revealed. When disks are initialized (say with fdisk) they have a partition structure imposed on them that divides the physical disk into a number of logical partitions. Each partition may hold a single file system, for example an EXT2 file system. File systems organize files into logical hierarchical structures with directories, soft links and so on, held in blocks on physical devices. Devices that can contain file systems are known as block devices. The IDE disk partition /dev/hda1, the first partition of the first IDE disk drive in the system, is a block device. The Linux file systems regard these block devices as simple linear collections of blocks; they do not know, or care, about the underlying physical disk's geometry.
It is the task of each block device driver to map a request for a particular block of its device into terms meaningful to that device: the track, sector and cylinder of the hard disk where the block is kept. A file system has to look, feel and operate in the same way no matter what device holds it. Moreover, using Linux's file systems, it does not matter (at least to the system user) that these different file systems are on different physical media controlled by different hardware controllers. The file system might not even be on the local system; it could just as well be a disk remotely mounted over a network link. Consider the following example, where a Linux system has its root file system on a SCSI disk:

A         E         boot      etc       lib       opt       tmp       usr
C         F         cdrom     fd        proc      root      var       sbin
D         bin       dev       home      mnt       lost+found

Neither the users nor the programs that operate on the files themselves need to know that /C is in fact a mounted VFAT file system that is on the first IDE disk in the system. In the example (which is actually my home Linux system), /E is the master IDE disk on the second IDE controller. It does not matter either that the first IDE controller is a PCI controller and that the second is an ISA controller which also drives the IDE CD-ROM. I can dial into the network where I work using a modem and the PPP network protocol, and in this case I can remotely mount my Alpha AXP Linux system's file systems on /mnt/remote.
The files in a file system are collections of data; the file holding the sources of this chapter is an ASCII file called filesystems.tex. A file system not only holds the data that is contained within the files of the file system but also the structure of the file system. It holds all of the information that Linux users and processes see as files, directories, soft links, file protection information and so on. Moreover it must hold that information safely and securely; the basic integrity of the operating system depends on its file systems. Nobody would use an operating system that randomly lost data and files (well, not knowingly, although I have been bitten by operating systems with more lawyers than Linux has developers). Minix, the first file system that Linux had, is rather restrictive and lacking in performance. Its filenames cannot be longer than 14 characters (which is still better than 8.3 filenames) and the maximum file size is 64 Mbytes. 64 Mbytes might at first glance seem large enough, but even modest databases need larger file sizes. The first file system designed specifically for Linux, the Extended File system, or EXT, was introduced in April 1992 and cured a lot of the problems, but it was still felt to lack performance. So, in 1993, the Second Extended File system, or EXT2, was added. It is that file system which is described in detail later in this chapter. An important development took place when the EXT file system was added into Linux. The real file systems were separated from the operating system and system services by an interface layer known as the Virtual File system, or VFS. VFS allows Linux to support many, often very different, file systems, each presenting a common software interface to the VFS. All of the details of the Linux file systems are translated by software so that all file systems appear identical to the rest of the Linux kernel and to programs running in the system. Linux's Virtual File system layer allows you to transparently mount many different file systems at the same time.
The Linux Virtual File system is implemented so that access to its files is as fast and efficient as possible. It must also make sure that the files and their data are kept correctly. These two requirements can be at odds with each other. The Linux VFS caches information in memory from each file system as it is mounted and used. A lot of care must be taken to update the file system correctly as data within these caches is modified as files and directories are created, written to and deleted. If you could see the file system's data structures within the running kernel, you would be able to see data blocks being read and written by the file system, and the data structures describing the files and directories being accessed being created and destroyed, with the device drivers working away all the time, fetching and saving data. The most important of these caches is the Buffer Cache, which is integrated into the way that the individual file systems access their underlying block devices. As blocks are accessed they are put into the Buffer Cache and kept on various queues depending on their state. The Buffer Cache not only caches data buffers, it also helps manage the asynchronous interface with the block device drivers.

9.1 The Second Extended File system (EXT2)

The Second Extended File system (EXT2) is an extensible and powerful file system for Linux. It is also the most successful file system so far in the Linux community and is the basis for all of the current Linux distributions. The EXT2 file system, like a lot of file systems, is built on the premise that the data held in files is kept in data blocks. These data blocks are all of the same length and, although that length can vary between different EXT2 file systems, the block size of a particular EXT2 file system is set when it is created (using mke2fs). Every file's size is rounded up to an integral number of blocks.
If the block size is 1024 bytes, then a file of 1025 bytes will occupy two 1024-byte blocks. Unfortunately this means that on average you waste half a block per file. Usually in computing you trade off CPU usage against memory and disk space utilisation; in this case Linux, along with most operating systems, trades a relatively inefficient disk usage in order to reduce the workload on the CPU. Not all of the blocks in the file system hold data; some must be used to contain the information that describes the structure of the file system. EXT2 defines the file system topology by describing each file in the system with an inode data structure. An inode describes which blocks the data within a file occupies as well as the access rights of the file, the file's modification times and the type of the file. Every file in the EXT2 file system is described by a single inode and each inode has a single unique number identifying it. The inodes for the file system are all kept together in inode tables. EXT2 directories are simply special files (themselves described by inodes) which contain pointers to the inodes of their directory entries. Figure 9.1 shows the layout of the EXT2 file system as occupying a series of blocks in a block-structured device. So far as each file system is concerned, block devices are just a series of blocks that can be read and written. A file system does not need to concern itself with where on the physical media a block should be put; that is the job of the device's driver. Whenever a file system needs to read information or data from the block device containing it, it requests that its supporting device driver reads an integral number of blocks. The EXT2 file system divides the logical partition that it occupies into Block Groups. Each group duplicates information critical to the integrity of the file system as well as holding real files and directories as blocks of information and data. This duplication is necessary should a disaster occur and the file system need recovering. The following subsections describe in more detail the contents of each Block Group.
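The rounding described above is easy to check with a little arithmetic. The following sketch (illustrative Python, not kernel code; the names are invented for this example) computes how many blocks a file occupies and how much of the final block is wasted:

```python
# Sketch (not kernel code): how file sizes round up to whole blocks,
# and the "slack" space this implies for the final block of a file.

BLOCK_SIZE = 1024  # fixed at mke2fs time for a real EXT2 file system

def blocks_needed(file_size: int) -> int:
    """Number of BLOCK_SIZE blocks a file of file_size bytes occupies."""
    return max(1, -(-file_size // BLOCK_SIZE))  # ceiling division

def slack(file_size: int) -> int:
    """Bytes allocated but unused in the file's final block."""
    return blocks_needed(file_size) * BLOCK_SIZE - file_size

# The 1025-byte file from the text occupies two blocks, wasting 1023 bytes.
print(blocks_needed(1025), slack(1025))  # 2 1023
```

A file of exactly 1024 bytes wastes nothing, which is why, averaged over many files, about half a block per file is lost.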
9.1.1 The EXT2 Inode

In the EXT2 file system, the inode is the basic building block: every file and directory in the file system is described by one and only one inode. The EXT2 inodes for each Block Group are kept in the inode table together with a bitmap that allows the system to keep track of allocated and unallocated inodes. Figure 9.2 shows the format of an EXT2 inode; amongst other information, it contains the following fields:

See include/linux/ext2_fs_i.h

Mode: This holds two pieces of information: what this inode describes and the permissions that users have to it. For EXT2, an inode can describe one of a file, directory, symbolic link, block device, character device or FIFO.
Owner Information: The user and group identifiers of the owners of this file or directory. These allow the file system to correctly enforce file access rights.
Size: The size of the file in bytes.
Timestamps: The time that the inode was created and the last time that it was modified.
Datablocks: Pointers to the blocks that contain the data that this inode is describing. The first twelve are pointers to the physical blocks containing the data described by this inode, and the last three pointers contain more and more levels of indirection. For example, the double indirect block pointer points at a block of pointers to blocks of pointers to data blocks. This means that files less than or equal to twelve data blocks in length are more quickly accessed than larger files.

You should note that EXT2 inodes can describe special device files. These are not real files but handles that programs can use to access devices. All of the device files in /dev are there to allow programs to access Linux's devices. For example, the mount program takes as an argument the device file that it wishes to mount.

9.1.2 The EXT2 Superblock

The Superblock contains a description of the basic size and shape of this file system.
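The twelve direct pointers plus single, double and triple indirect blocks set an upper bound on how large a file an inode can address. A small sketch (illustrative only; it assumes 4-byte block pointers, which is the EXT2 convention, and ignores any other limits the implementation imposes) shows the arithmetic:

```python
# Sketch of EXT2 inode block-addressing arithmetic (not kernel code).
# With 12 direct pointers plus single, double and triple indirect
# blocks, the number of addressable blocks depends only on the block
# size and the size of a block pointer (assumed 4 bytes here).

def ext2_max_blocks(block_size: int, ptr_size: int = 4) -> int:
    ptrs = block_size // ptr_size          # pointers per indirect block
    return 12 + ptrs + ptrs**2 + ptrs**3   # direct + 1x, 2x, 3x indirect

# With 1024-byte blocks an indirect block holds 256 pointers, so a
# single inode can address 12 + 256 + 256**2 + 256**3 blocks.
blocks = ext2_max_blocks(1024)
print(blocks)  # 16843020
```

Multiplying by the block size gives the maximum addressable file size, roughly 16 Gbytes for 1024-byte blocks; small files, fitting in the twelve direct blocks, need no indirect reads at all.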
The information within it allows the file system manager to use and maintain the file system. Usually only the Superblock in Block Group 0 is read when the file system is mounted, but each Block Group contains a duplicate copy in case of file system corruption. Amongst other information it holds:

See include/linux/ext2_fs_sb.h

Magic Number: This allows the mounting software to check that this is indeed the Superblock for an EXT2 file system. For the current version of EXT2 this is 0xEF53.
Revision Level: The major and minor revision levels allow the mounting code to determine whether or not this file system supports features that are only available in particular revisions of the file system. There are also feature compatibility fields which help the mounting code to determine which new features can safely be used on this file system.
Mount Count and Maximum Mount Count: Together these allow the system to determine if the file system should be fully checked. The mount count is incremented each time the file system is mounted and when it equals the maximum mount count the warning message "maximal mount count reached, running e2fsck is recommended" is displayed.
Block Group Number: The Block Group number that holds this copy of the Superblock.
Block Size: The size of the block for this file system in bytes, for example 1024 bytes.
Blocks per Group: The number of blocks in a group. Like the block size, this is fixed when the file system is created.
Free Blocks: The number of free blocks in the file system.
Free Inodes: The number of free inodes in the file system.
First Inode: This is the inode number of the first inode in the file system. The first inode in an EXT2 root file system would be the directory entry for the '/' directory.

9.1.3 The EXT2 Group Descriptor

Each Block Group has a data structure describing it. Like the Superblock, all of the group descriptors for all of the Block Groups are duplicated in each Block Group in case of file system corruption.
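The mount count check is simple bookkeeping, and can be sketched as follows (illustrative Python with invented field names, not the real superblock layout or mount code):

```python
# Sketch of the mount-count check described above: each mount bumps a
# counter held in the superblock; reaching the maximum means a full
# file system check (e2fsck) is recommended. Field names are invented.

def check_due_after_mount(superblock: dict) -> bool:
    """Increment the mount count; True means a full check is recommended."""
    superblock["mount_count"] += 1
    return superblock["mount_count"] >= superblock["max_mount_count"]

sb = {"mount_count": 18, "max_mount_count": 20}
print(check_due_after_mount(sb))  # False: 19th mount, check not yet due
print(check_due_after_mount(sb))  # True: 20th mount, e2fsck recommended
```

In a real system it is e2fsck itself that resets the mount count after a successful full check.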
Each Group Descriptor contains the following information:

See include/linux/ext2_fs.h ext2_group_desc

Blocks Bitmap: The block number of the block allocation bitmap for this Block Group. This is used during block allocation and deallocation.
Inode Bitmap: The block number of the inode allocation bitmap for this Block Group. This is used during inode allocation and deallocation.
Inode Table: The block number of the starting block for the inode table for this Block Group. Each inode is represented by the EXT2 inode data structure described below.
Free blocks count, Free inodes count, Used directory count.

The group descriptors are placed one after another and together they make the group descriptor table. Each Block Group contains the entire table of group descriptors after its copy of the Superblock. Only the first copy (in Block Group 0) is actually used by the EXT2 file system. The other copies are there, like the copies of the Superblock, in case the main copy is corrupted.

9.1.4 EXT2 Directories

In the EXT2 file system, directories are special files that are used to create and hold access paths to the files in the file system. Figure 9.3 shows the layout of a directory entry in memory. A directory file is a list of directory entries, each one containing the following information:

See include/linux/ext2_fs.h ext2_dir_entry

inode: The inode for this directory entry. This is an index into the array of inodes held in the inode table of the Block Group.
name length: The length of this directory entry in bytes.
name: The name of this directory entry.

The first two entries for every directory are always the standard "." and ".." entries, meaning "this directory" and "the parent directory" respectively.

9.1.5 Finding a File in an EXT2 File System

A Linux filename has the same format as all Unix filenames. It is a series of directory names, separated by forward slashes ("/"), and ending in the file's name. One example filename would be /home/rusling/.cshrc, where /home and /rusling are directory names and the file's name is .cshrc. Like all other Unix systems, Linux does not care about the format of the filename itself; it can consist of any printable characters. To find the inode representing this file within an EXT2 file system, the system must parse the filename a directory at a time until we get to the file itself. The first inode we need is the inode for the root of the file system, and we find its number in the file system's Superblock. To read an EXT2 inode we must look for it in the inode table of the appropriate Block Group. If, for example, the root inode number is 42, then we need the 42nd inode from the inode table of Block Group 0. The root inode is for an EXT2 directory; in other words the mode of the root inode describes it as a directory and its data blocks contain EXT2 directory entries. home is just one of these directory entries, and this directory entry gives us the number of the inode describing the /home directory. We have to read this directory (by first reading its inode and then reading the directory entries from the data blocks described by its inode) to find the rusling entry, which gives us the number of the inode describing the /home/rusling directory. Finally we read the directory entries pointed at by the inode describing /home/rusling to find the inode number of the .cshrc file, and from this we get the data blocks containing the information in the file.

9.1.6 Changing the Size of a File in an EXT2 File System

One common problem with a file system is its tendency to fragment.
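The lookup just described can be sketched as a walk from inode to inode. The following is illustrative only; the inode numbers and the in-memory "file system" are invented for the example, and a real lookup would read inodes and directory data blocks from disk at each step:

```python
# Sketch of resolving /home/rusling/.cshrc one component at a time.
# Each directory maps names to inode numbers; the numbers are invented.

fs = {
    # inode number -> directory entries (name -> inode number)
    42:  {"home": 11},          # root directory, found via the Superblock
    11:  {"rusling": 57},       # /home
    57:  {".cshrc": 101},       # /home/rusling
}

ROOT_INODE = 42  # the root inode number used in the text's example

def namei(path: str) -> int:
    """Resolve an absolute path to an inode number, directory by directory."""
    inode = ROOT_INODE
    for part in path.strip("/").split("/"):
        inode = fs[inode][part]  # one directory-entry lookup per component
    return inode

print(namei("/home/rusling/.cshrc"))  # 101
```

Each hop in the loop corresponds to reading one directory's inode and then scanning its directory entries for the next name.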
The blocks that hold a file's data get spread all over the file system, and the more spread out the data blocks are, the less efficient sequential access to the file's data becomes. The EXT2 file system tries to overcome this by allocating the new blocks for a file physically close to its current data blocks or at least in the same Block Group as its current data blocks. Only when that fails does it allocate data blocks in another Block Group. Whenever a process attempts to write data into a file, the Linux file system checks to see if the data has gone off the end of the file's last allocated block. If it has, then it must allocate a new data block for this file. Until the allocation is complete, the process cannot run; it must wait for the file system to allocate a new data block and write the rest of the data to it before it can continue. The first thing that the EXT2 block allocation routines do is to lock the EXT2 Superblock for this file system. Allocating and deallocating blocks changes fields within the Superblock, and the Linux file system cannot allow more than one process to do this at the same time. If another process needs to allocate more data blocks, it will have to wait until this process has finished. Processes waiting for the Superblock are suspended, unable to run, until control of the Superblock is relinquished by its current user. Access to the Superblock is granted on a first come, first served basis, and once a process has control of the Superblock, it keeps control until it has finished.
Having locked the Superblock, the process checks that there are enough free blocks left in this file system. If there are not enough free blocks, this attempt to allocate more will fail and the process will relinquish control of the file system's Superblock. If there are enough free blocks in the file system, the process tries to allocate one. If the EXT2 file system has been built to preallocate data blocks, then we may be able to take one of those. The preallocated blocks do not actually exist; they are just reserved within the block allocation bitmap. The VFS inode representing the file that we are trying to allocate a new data block for has two EXT2-specific fields, prealloc_block and prealloc_count, which are the block number of the first preallocated data block and how many of them there are, respectively. If there were no preallocated blocks or block preallocation is not enabled, the EXT2 file system must allocate a new block. It first looks to see if the data block after the last data block in the file is free. Logically this is the most efficient block to allocate, as it makes sequential accesses much quicker. If this block is not free, then the search widens, and it looks for the ideal data block within the next 64 blocks. This block, although not ideal, is at least fairly close to the file's other data blocks and within the same Block Group.

See fs/ext2/balloc.c ext2_new_block()

If even those blocks are not free, the process starts looking in all of the other Block Groups in turn until it finds some free blocks. The block allocation code looks for a cluster of eight free data blocks somewhere in one of those Block Groups. If it cannot find eight together, it will settle for fewer. If block preallocation is wanted and enabled, it will update prealloc_block and prealloc_count accordingly. Wherever it finds a free data block, the block allocation code updates the Block Group's block bitmap and allocates a data buffer in the Buffer Cache. That data buffer is uniquely identified by the file system's supporting device identifier and the block number of the allocated block.
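The search order above (goal block, then a window of 64 blocks, then anywhere free) can be sketched against a plain allocation bitmap. This is illustrative only, not the real ext2_new_block(): it ignores locking, preallocation, clustering and the split into Block Groups, and represents the bitmap as a Python list:

```python
# Sketch of the allocation policy described above (not real kernel
# code): prefer the block right after the file's last block ("goal"),
# then any free block within the next 64, then any free block at all.

def new_block(bitmap, goal):
    """Allocate a block near 'goal'; return its index, or -1 if full."""
    n = len(bitmap)
    # 1. Ideal case: the block immediately after the file's last block.
    if goal < n and not bitmap[goal]:
        bitmap[goal] = True
        return goal
    # 2. A block within the next 64: close enough for cheap sequential reads.
    for b in range(goal + 1, min(goal + 64, n)):
        if not bitmap[b]:
            bitmap[b] = True
            return b
    # 3. Fall back: scan the rest of the file system for any free block.
    for b in range(n):
        if not bitmap[b]:
            bitmap[b] = True
            return b
    return -1  # no free blocks: the allocation fails

bitmap = [False] * 16
print(new_block(bitmap, 3))  # 3: the goal block itself was free
print(new_block(bitmap, 3))  # 4: goal taken, next free block nearby
```

The real allocator applies the same idea per Block Group, using the on-disk block bitmaps.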
The data in the buffer is zeroed and the buffer is marked as "dirty" to show that its contents have not yet been written to the physical disk. Finally, the Superblock itself is marked as "dirty" to show that it has been changed, and it is unlocked. If there were any processes waiting for the Superblock, the first one in the queue is allowed to run again and will gain exclusive control of the Superblock for its file operations. The process's data is written out to the new data block and, if that data block is filled, the entire process is repeated and another data block is allocated.

9.2 The Virtual File System (VFS)

Figure 9.4 shows the relationship between the Linux kernel's Virtual File System and its real file systems. The Virtual File System must manage all of the different file systems that are mounted at any given time. To do this it maintains data structures that describe the whole (virtual) file system and the real, mounted, file systems. Rather confusingly, the VFS describes the system's files in terms of superblocks and inodes in much the same way as the EXT2 file system uses superblocks and inodes. Like the EXT2 inodes, the VFS inodes describe files and directories within the system: the contents and topology of the Virtual File System. From now on, to avoid confusion, I will write about VFS inodes and VFS superblocks to distinguish them from EXT2 inodes and EXT2 superblocks.

See fs/*

As each file system is initialized, it registers itself with the VFS. This happens as the operating system initializes itself at system boot time. The real file systems are either built into the kernel itself or are built as loadable modules. File system modules are loaded as the system needs them, so, for example, if the VFAT file system is implemented as a kernel module, then it is only loaded when a VFAT file system is mounted.
When a block-device-based file system is mounted, and this includes the root file system, the VFS must read its superblock. Each file system type's superblock read routine must work out the file system's topology and map that information onto a VFS superblock data structure. The VFS keeps a list of the mounted file systems in the system together with their VFS superblocks. Each VFS superblock contains information and pointers to routines that perform particular functions. So, for example, the superblock representing a mounted EXT2 file system contains a pointer to the EXT2-specific inode reading routine. This EXT2 inode read routine, like all of the file system specific inode read routines, fills out the fields of a VFS inode. Each VFS superblock contains a pointer to the first VFS inode on the file system. For the root file system, this is the inode that represents the "/" directory. This mapping of information is very efficient for the EXT2 file system but only moderately so for other file systems. As the system's processes access directories and files, system routines are called that traverse the VFS inodes in the system. For example, typing ls for a directory or cat for a file causes the Virtual File System to search through the VFS inodes that represent the file system. As every file and directory on the system is represented by a VFS inode, a number of inodes will be repeatedly accessed. These inodes are kept in the inode cache, which makes access to them quicker. If an inode is not in the inode cache, then a file system specific routine must be called in order to read the appropriate inode. The action of reading the inode causes it to be put into the inode cache, and further accesses keep it in the cache. The less used VFS inodes get removed from the cache.

See fs/inode.c

All of the Linux file systems use a common buffer cache to cache data buffers from the underlying devices, so that access to the physical devices holding the file systems is sped up and thereby access to the file systems themselves.
This buffer cache is independent of the file systems and is integrated into the mechanisms that the Linux kernel uses to allocate, read and write data buffers. It has the distinct advantage of making the Linux file systems independent of the underlying media and of the device drivers that support them. All block-structured devices register themselves with the Linux kernel and present a uniform, block-based, usually asynchronous interface. Even relatively complex block devices such as SCSI devices do this. As the real file systems read data from the underlying physical disks, this results in requests to the block device drivers to read physical blocks from the devices that they control. Integrated into this block device interface is the buffer cache. As blocks are read by the file systems, they are saved in the global buffer cache shared by all of the file systems and the Linux kernel. Buffers within it are identified by their block number and a unique identifier for the device that they were read from. So, if the same data is needed often, it will be retrieved from the buffer cache rather than read from the disk, which would take somewhat longer. Some devices support read ahead, where data blocks are speculatively read in advance in case they are needed soon.

See fs/buffer.c

The VFS also keeps a cache of directory lookups so that the inodes for frequently used directories can be quickly found. As an experiment, try listing a directory that you have not listed recently. The first time you list it you may notice a slight pause, but the second time you list its contents the result is immediate.
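The core of the buffer cache idea is a lookup keyed by the (device identifier, block number) pair. The sketch below is illustrative only (invented class and names, a simulated "device read", no dirty flags, queues or eviction), but it shows why a repeated read costs nothing:

```python
# Sketch of the buffer-cache idea: buffers are keyed by (device, block
# number), so a repeated read is served from memory, not the disk.

class BufferCache:
    def __init__(self, read_from_device):
        self.buffers = {}                    # (device, block) -> data
        self.read_from_device = read_from_device
        self.misses = 0                      # how often we hit the driver

    def bread(self, device, block):
        key = (device, block)
        if key not in self.buffers:          # miss: ask the device driver
            self.misses += 1
            self.buffers[key] = self.read_from_device(device, block)
        return self.buffers[key]             # hit: no device access at all

# Simulated block device driver returning identifiable data.
cache = BufferCache(lambda dev, blk: b"data-%d-%d" % (dev, blk))
cache.bread(0x301, 7)   # first read: fetched from the "device"
cache.bread(0x301, 7)   # second read: served from the cache
print(cache.misses)     # 1
```

The real cache additionally tracks buffer state (clean, dirty, locked) and writes dirty buffers back asynchronously.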
The directory cache does not store the inodes for the directories itself; those are held in the inode cache. The directory cache simply stores the mapping between the full directory names and their inode numbers.

See fs/dcache.c

9.2.1 The VFS Superblock

Every mounted file system is represented by a VFS superblock; amongst other information, the VFS superblock contains:

See include/linux/fs.h

Device: This is the device identifier for the block device that this file system is held on. For example, /dev/hda1, the first IDE hard disk in the system, has a device identifier of 0x301.
Inode pointers: The mounted inode pointer points at the first inode in this file system. The covered inode pointer points at the inode representing the directory that this file system is mounted on. The root file system's VFS superblock does not have a covered pointer.
Blocksize: The block size in bytes of this file system, for example 1024 bytes.
Superblock operations: A pointer to a set of superblock routines for this file system. Amongst other things, these routines are used by the VFS to read inodes and superblocks.
File System type: A pointer to the mounted file system's file_system_type data structure.
File System specific: A pointer to information needed by this file system.

9.2.2 The VFS Inode

As in the EXT2 file system, every file, directory and so on in the VFS is represented by one and only one VFS inode. The information in each VFS inode is built from information in the underlying file system by file system specific routines. VFS inodes exist only in the kernel's memory and are kept in the VFS inode cache as long as they are useful to the system. Amongst other information, VFS inodes contain the following fields:

See include/linux/fs.h

Device: The device identifier of the device holding the file (or whatever entity this VFS inode represents).
Inode number: The number of the inode, unique within this file system. The combination of device and inode number is unique within the Virtual File System.
Mode: Like EXT2, this field describes what this VFS inode represents as well as the access rights to it.
User ids: The owner identifiers.
Times: The creation, modification and write times.
Block size: The size of a block for this file in bytes, for example 1024 bytes.
Inode operations: A pointer to a block of routine addresses. These routines are specific to the file system and perform operations for this inode, for example truncating the file that is represented by this inode.
Count: The number of system components currently using this VFS inode. A count of zero means that the inode is free to be discarded or reused.
Lock: This field is used to lock the VFS inode, for example when it is being read from the file system.
Dirty: Indicates whether this VFS inode has been written to; if so, the underlying file system will need to be modified.
File system specific information.

9.2.3 Registering the File Systems

When you build the Linux kernel, you are asked if you want each of the supported file systems. When the kernel is built, the file system startup code contains calls to the initialization routines of all of the built-in file systems. Linux file systems may also be built as modules and, in this case, they may be demand loaded as they are needed or loaded by hand.
Whenever a file system module is loaded, it registers itself with the kernel, and it unregisters itself when it is unloaded. Each file system's initialization routine registers itself with the Virtual File System and is represented by a file_system_type data structure, which contains the name of the file system and a pointer to its VFS superblock read routine. Figure 9.5 shows that file_system_type data structures are put into the list pointed at by the file_systems pointer. Each file_system_type data structure contains the following information:

See fs/filesystems.c sys_setup()
See include/linux/fs.h file_system_type

Superblock read routine: This routine is called by the VFS when an instance of the file system is mounted.
File System name: The name of this file system, for example ext2.
Device needed: Does this file system need a device to support it? Not all file systems need a device to hold them. The /proc file system, for example, does not require a block device.

You can see which file systems are registered by looking in /proc/filesystems. For example:

      ext2
nodev proc
      iso9660

9.2.4 Mounting a File System

When the superuser attempts to mount a file system, the Linux kernel must first validate the arguments passed in the system call. Although mount does some basic checking, it does not know which file systems this kernel has been built to support or whether the proposed mount point actually exists. Consider the following mount command:

$ mount -t iso9660 -o ro /dev/cdrom /mnt/cdrom

This mount command will pass the kernel three pieces of information: the name of the file system, the physical block device that contains the file system and, thirdly, where in the existing file system topology the new file system is to be mounted. The first thing that the Virtual File System must do is to find the file system. To do this it searches through the list of known file systems, looking at each file_system_type data structure in the list pointed at by file_systems.
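Registration and the lookup that mount performs can be sketched together. This is illustrative Python with simplified fields, not the kernel's file_system_type structure; the names mirror the real ones but the bodies are invented:

```python
# Sketch of file system registration: each file system contributes a
# record with its name, its superblock read routine and whether it
# needs a block device. Mounting begins by searching this list by name.

file_systems = []  # stands in for the kernel's file_systems list

def register_filesystem(name, read_super, requires_dev):
    file_systems.append({"name": name,
                         "read_super": read_super,
                         "requires_dev": requires_dev})

def get_fs_type(name):
    """Find a registered file system by name, as mount must do first."""
    for fs in file_systems:
        if fs["name"] == name:
            return fs
    return None  # a real kernel might now try to demand-load a module

register_filesystem("ext2", lambda dev: "ext2 superblock", True)
register_filesystem("proc", lambda dev: "proc superblock", False)

print(get_fs_type("proc")["requires_dev"])  # False, the 'nodev' case
```

The "requires_dev" flag corresponds to the nodev column seen in /proc/filesystems above.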
If it finds a matching name, it now knows that this file system type is supported by the kernel, and it has the address of the file system specific routine for reading this file system's superblock. If it cannot find a matching file system name, then all is not lost if the kernel is built to demand load kernel modules (see Chapter 12). In this case the kernel will request that the kernel daemon loads the appropriate file system module before continuing.

See fs/super.c do_mount()
See fs/super.c get_fs_type()

Next, if the physical device passed by mount is not already mounted, the VFS inode of the directory that is to be the new file system's mount point must be found. This VFS inode may be in the inode cache or it might have to be read from the block device supporting the file system of the mount point. Once the inode has been found, it is checked to see that it is a directory and that there is not already some other file system mounted there. The same directory cannot be used as a mount point for more than one file system. At this point the VFS mounting code must allocate a VFS superblock and pass it the mount information to the superblock read routine for this file system. All of the system's VFS superblocks are kept in the super_blocks vector of super_block data structures, and one must be allocated for this mount. The superblock read routine must fill out the VFS superblock fields based on the information that it reads from the physical device.
For the EXT2 file system this mapping, or translation, of information is quite easy; it simply reads the EXT2 superblock and fills out the VFS superblock from there. For other file systems, such as the MS-DOS file system, it is not quite such an easy task. Whatever the file system, filling out the VFS superblock means that the file system must read whatever describes it from the block device that supports it. If the block device cannot be read from, or if it does not contain this type of file system, then the mount command will fail. Each mounted file system is described by a vfsmount data structure; see Figure 9.6. These are queued on a list pointed at by vfsmntlist. Another pointer, vfsmnttail, points at the last entry in the list, and the mru_vfsmnt pointer points at the most recently used file system. Each vfsmount structure contains the device number of the block device holding this file system, the directory where this file system is mounted and a pointer to the VFS superblock allocated when this file system was mounted. In turn the VFS superblock points at the file_system_type data structure for this sort of file system and at the root inode for this file system. This inode is kept resident in the VFS inode cache all of the time that the file system is loaded.

See fs/super.c add_vfsmnt()

9.2.5 Finding a File in the Virtual File System

To find the VFS inode of a file in the Virtual File System, the VFS must resolve the name a directory at a time, looking up the VFS inode representing each of the intermediate directories in the name. Each directory lookup involves calling the file system specific lookup routine, whose address is held in the VFS inode representing the parent directory. This works because we always have the VFS inode of the root of each file system available and pointed at by the VFS superblock of that system. Each time an inode is looked up by the real file system, it checks the directory cache for the directory. If there is no entry in the directory cache, the real file system gets the VFS inode either from the underlying file system or from the inode cache.
9.2.6 Unmounting a File System

Workshop manuals usually describe reassembly as the reverse of disassembly, and the same is more or less true of unmounting a file system. A file system cannot be unmounted if something in the system is using one of its files. For example, if a process is using the /mnt/cdrom directory or any of its children, you cannot unmount /mnt/cdrom. If anything is using the file system to be unmounted, its VFS inodes will be in the VFS inode cache, so the unmount code checks the whole inode list, looking for inodes owned by the device that this file system occupies. If the VFS superblock of the mounted file system is dirty, that is, it has been modified, it must be written back to the file system on disk. Once it has been written to disk, the memory occupied by the VFS superblock is returned to the kernel's free memory pool. Finally, the vfsmount data structure for this mount is removed from vfsmntlist and released.

See fs/super.c do_umount()
See fs/super.c remove_vfsmnt()

9.2.7 The VFS Inode Cache

As the mounted file systems are navigated, their VFS inodes are continually being read and, sometimes, written. The virtual file system maintains an inode cache to speed up access to all of the mounted file systems. Every time a VFS inode can be read from the inode cache, the system is saved an access to a physical device.
See fs/inode.c

The VFS inode cache is implemented as a hash table whose entries are pointers to lists of VFS inodes that share the same hash value. The hash value of an inode is calculated from its inode number and from the device number of the physical device containing the underlying file system. Whenever the virtual file system needs to access an inode, it first looks in the VFS inode cache. To find an inode in the hash table, the system first calculates its hash value and then uses it as an index into the table. This gives it a pointer to a list of inodes with the same hash value. It then reads each inode in turn until it finds one with both the same inode number and the same device identifier as the inode it is looking for. If it finds the inode in the cache, its count is incremented, showing that it has another user, and the file system access continues. Otherwise a free VFS inode must be found so that the file system can read the inode into memory.

The VFS has a number of options for getting a free inode. If the system is allowed to allocate more VFS inodes, this is what it does: it allocates kernel pages, breaks them up into new, free inodes, and puts them onto the inode list. As well as being in the inode hash table, every VFS inode in the system is on a list pointed at by first_inode. If the system already has all of the inodes it is allowed to have, it must find a candidate for reuse. Good candidates are inodes with a count of zero: this means that nothing in the system is currently using them. Really important VFS inodes, such as a file system's root inode, always have a count greater than zero and so are never candidates for reuse. Once a candidate for reuse has been located, it is cleaned up. The VFS inode might be dirty, in which case it must be written back to the file system, or it might be locked, in which case the system must wait for it to be unlocked before continuing. The candidate VFS inode must be cleaned up before it can be reused.

However the new VFS inode is obtained, a file-system-specific routine must be called to fill it in from information read from the underlying real file system.
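The hash lookup just described can be sketched as follows. This is a hypothetical, much-simplified user-space model of the cache in fs/inode.c: the table size, the hash function, and the field set are all invented for illustration; only the idea (hash on inode number and device number, then walk the chain, then bump the count on a hit) follows the text above.

```c
#include <stddef.h>

/* Hypothetical sketch of the VFS inode cache hash table. */
#define NR_IHASH 64

struct inode {
    unsigned short i_dev;    /* device identifier            */
    unsigned long  i_ino;    /* inode number on that device  */
    unsigned int   i_count;  /* number of current users      */
    struct inode  *i_next;   /* chain of same-hash inodes    */
};

static struct inode *inode_hash[NR_IHASH];

/* Hash value derived from the inode number and the device number. */
static unsigned int ihash(unsigned short dev, unsigned long ino)
{
    return (unsigned int)((ino ^ dev) % NR_IHASH);
}

/* Look an inode up in the cache; bump its count if found. */
struct inode *iget_cached(unsigned short dev, unsigned long ino)
{
    struct inode *i = inode_hash[ihash(dev, ino)];
    for (; i != NULL; i = i->i_next)
        if (i->i_dev == dev && i->i_ino == ino) {
            i->i_count++;   /* one more user of this inode */
            return i;
        }
    return NULL;            /* caller must read it from the device */
}

/* Add a newly read inode to the front of its hash chain. */
void icache_insert(struct inode *i)
{
    unsigned int h = ihash(i->i_dev, i->i_ino);
    i->i_next = inode_hash[h];
    inode_hash[h] = i;
}
```

A miss (NULL) is the point at which the real VFS would go looking for a free inode, or a reusable one with a count of zero, to fill in from the real file system.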
While it is being filled in, the new VFS inode has a count of one and is locked, so that nothing else can access it until it contains valid information. To get the VFS inode that it actually needs, the file system may have to access several other inodes. This happens when a directory is read: only the inode of the final directory is needed, but the inodes of the intermediate directories must be read as well. As the VFS inode cache is used and filled up, the less used inodes are discarded and the more used inodes remain in the cache.

9.2.8 The Directory Cache

To speed up access to commonly used directories, the VFS maintains a cache of directory entries. As directories are looked up by the real file systems, their details are added to the directory cache. The next time the same directory is looked up, for example to list it or to open a file within it, it will be found in the directory cache. Only short directory entries (up to 15 characters long) are cached, but this is reasonable because the shorter directory names are the most commonly used ones. For example, /usr/X11R6/bin is accessed very frequently while the X server is running.
See fs/dcache.c

The directory cache consists of a hash table, each entry of which points at a list of directory cache entries that have the same hash value. The hash function uses the device number of the device holding the file system and the directory's name to calculate the offset, or index, into the hash table. This allows cached directory entries to be found quickly; a cache whose lookups took too long, or failed to find entries, would be of no use.

To keep the cache valid and up to date, the VFS keeps lists of Least Recently Used (LRU) directory cache entries. When a directory entry is first put into the cache, that is, the first time it is looked up, it is added onto the end of the first-level LRU list. If the cache is full, this displaces the entry at the front of that LRU list. When the directory entry is accessed again, it is promoted to the end of the second-level LRU list; again, this may displace the entry at the front of the second-level LRU list. Displacing entries from the fronts of the first- and second-level lists is fine: entries are only at the front of a list because they have not been accessed recently; had they been accessed, they would be near the end of the list. Entries in the second-level LRU list are safer than entries in the first-level list, because these entries have not only been looked up but have been referenced again.

9.3 The Buffer Cache

As the mounted file systems are used, they generate a lot of requests to the block devices to read and write blocks of data. All block data read and write requests are passed to the device drivers, via standard kernel routine calls, in the form of buffer_head data structures. These give the block device drivers all of the information they need: the device identifier uniquely identifies the device, and the block number tells the driver which block to read. All block devices are viewed as linear collections of blocks of the same size.
To speed up access to the physical block devices, Linux maintains a cache of block buffers. All of the block buffers in the system are kept somewhere in this buffer cache, even the new, unused buffers. The cache is shared between all of the physical block devices: at any one time it holds many block buffers, belonging to any of the system's block devices and often in many different states. If valid data is available in the buffer cache, the system is saved an access to a physical device. Any block buffer that has been used to read data from a block device, or to write data to one, goes into the buffer cache. Over time it may be removed from the cache to make way for a more deserving buffer, or it may stay in the cache because it is frequently accessed. Block buffers within the cache are uniquely identified by the owning device identifier and the block number of the buffer.

The buffer cache is made up of two functional parts. The first part is the lists of free block buffers. There is one list per supported buffer size, and the system's free block buffers are queued onto these lists when they are first created or when they are discarded. The currently supported buffer sizes are 512, 1024, 2048, 4096 and 8192 bytes. The second functional part is the cache itself. This is a hash table, a vector of pointers to chains of buffers that share the same hash index. The hash index is generated from the owning device identifier and the block number of the data block. Figure 9.7 shows the hash table together with a few entries. Block buffers are either on one of the free lists or they are in the buffer cache. When they are in the buffer cache they are also queued onto LRU lists. There is an LRU list for each buffer type, and these lists are used by the system when it performs operations on buffers of one type, for example, when writing buffers containing new data out to disk.
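The two functional parts described above can be sketched together. This is a hypothetical, much-simplified user-space model: the real buffer_head has many more fields (state flags, wait queues, LRU links), and the table size, hash function, and helper names here are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch of the buffer cache: per-size free lists plus a
 * hash table keyed on device identifier and block number. */
#define NR_HASH 31
static const int buffer_sizes[] = { 512, 1024, 2048, 4096, 8192 };

struct buffer_head {
    unsigned short      b_dev;      /* device identifier     */
    unsigned long       b_blocknr;  /* block number          */
    struct buffer_head *b_next;     /* same-hash-value chain */
};

static struct buffer_head *hash_table[NR_HASH];

/* Hash index generated from the device identifier and block number. */
static unsigned int bhash(unsigned short dev, unsigned long block)
{
    return (unsigned int)((block ^ dev) % NR_HASH);
}

/* Find a cached buffer for (dev, block), if there is one. */
struct buffer_head *find_buffer(unsigned short dev, unsigned long block)
{
    struct buffer_head *bh = hash_table[bhash(dev, block)];
    for (; bh != NULL; bh = bh->b_next)
        if (bh->b_dev == dev && bh->b_blocknr == block)
            return bh;
    return NULL;   /* miss: take a buffer from a free list instead */
}

/* Add a buffer to the front of its hash chain. */
void insert_buffer(struct buffer_head *bh)
{
    unsigned int h = bhash(bh->b_dev, bh->b_blocknr);
    bh->b_next = hash_table[h];
    hash_table[h] = bh;
}

/* Only the listed buffer sizes have a free list to draw from. */
int supported_size(int size)
{
    unsigned int i;
    for (i = 0; i < sizeof(buffer_sizes)/sizeof(buffer_sizes[0]); i++)
        if (buffer_sizes[i] == size)
            return 1;
    return 0;
}
```

On a miss, the real cache takes a clean buffer of the right size from the matching free list and inserts it into the hash table, exactly the flow the next section describes.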
A buffer's type reflects its state, and Linux currently supports the following types:

clean      Unused, new buffers.
locked     Buffers that are locked, waiting to be written.
dirty      Dirty buffers. These contain new, valid data and will be
           written to disk, but have not yet been scheduled for writing.
shared     Shared buffers.
unshared   Buffers that were once shared but are now unshared.

Whenever a file system needs to read a buffer from its underlying physical device, it first tries to get a block from the buffer cache. If it cannot get a buffer from the buffer cache, it takes a clean one from the free list of the appropriate size, and this new buffer goes into the buffer cache. If the buffer it needs is already in the buffer cache, it may or may not be up to date. If it is not up to date, or if it is a new block buffer, the file system must ask the device driver to read the appropriate block of data from the disk.

Like all caches, the buffer cache must be maintained so that it runs efficiently and shares cache entries fairly between the block devices using it. Linux uses the bdflush kernel daemon to perform a lot of the housekeeping on this cache, although some of it happens automatically as a result of the cache being used.

9.3.1 The bdflush Kernel Daemon

The bdflush kernel daemon is a simple kernel daemon that provides a dynamic response to the system having too many dirty buffers, buffers containing data that must be written out to disk at some time. It is started as a kernel thread at system startup and, rather confusingly, calls itself "kflushd"; that is the name you will see if you use ps to list the processes in the system. Mostly this daemon sleeps, waiting for the number of dirty buffers in the system to grow too large. As buffers are allocated and discarded, the number of dirty buffers in the system is checked, and bdflush is woken if necessary. The default threshold is 60%, but bdflush will also be woken if the system is badly in need of buffers.
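The wake-up test is just a percentage check. A minimal sketch, assuming the default 60% threshold described above (the function name is invented for illustration):

```c
/* Hypothetical sketch of the check that wakes bdflush: the daemon is
 * woken when the fraction of dirty buffers passes a threshold. */
static int bdflush_threshold = 60;   /* per cent; tunable via update -d */

int need_bdflush(int nr_dirty, int nr_buffers)
{
    if (nr_buffers == 0)
        return 0;                    /* nothing to flush yet */
    return (nr_dirty * 100) / nr_buffers >= bdflush_threshold;
}
```

The real kernel performs this check as buffers are allocated and discarded, so the daemon can stay asleep the rest of the time.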
This threshold can be seen and changed using the update command:

# update -d
bdflush version 1.4
0:    60 Max fraction of LRU list to examine for dirty blocks
1:   500 Max number of dirty blocks to write each time bdflush activated
2:    64 Num of clean buffers to be loaded onto free list by refill_freelist
3:   256 Dirty block threshold for activating bdflush in refill_freelist
4:    15 Percentage of cache to scan for free clusters
5:  3000 Time for data buffers to age before flushing
6:   500 Time for non-data (dir, bitmap, etc) buffers to age before flushing
7:  1884 Time buffer cache load average constant
8:     2 LAV ratio (used).

All of the dirty buffers are linked into the BUF_DIRTY LRU list whenever they are made dirty by having data written to them, and bdflush tries to write a reasonable number of them out to their owning disks. Again, this number can be seen and controlled by the update command; the default is 500 (see the example above).

9.3.2 The update Process

The update command is more than just a command; it is also a daemon. When run as superuser (during system initialization), it periodically flushes all of the older dirty buffers out to disk. It does this by calling a system service routine that does more or less the same thing as bdflush. Whenever a dirty buffer is created, it is tagged with the system time at which it should be written out to its owning disk. Every time that update runs, it looks at all of the dirty buffers in the system, looking for ones with an expired flush time. Every expired buffer is written out to disk.

See fs/buffer.c sys_bdflush()

9.4 The /proc File System

The /proc file system really shows the power of the Linux virtual file system. It does not really exist (yet another of Linux's conjuring tricks): neither the /proc directory nor its subdirectories and files actually exist. So how can you cat /proc/devices? The /proc file system, like a real file system, registers itself with the virtual file system, but when its files and directories are opened and their VFS inodes are requested, the /proc file system creates those files and directories on the fly from information within the kernel.
For example, the kernel's /proc/devices file is generated from the kernel's data structures describing its devices. The /proc file system presents a user-readable window into the kernel's inner workings. Several Linux subsystems, such as the Linux kernel modules described in Chapter 12, create entries in the /proc file system.

9.5 Device Special Files

Linux, like all versions of Unix, presents its hardware devices as special files. For example, /dev/null is the null device.
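The /proc trick, generating a file's contents at the moment it is read rather than storing them on disk, is easy to mimic in user space. A hypothetical, much-simplified sketch (the function name and the device table are invented; the real kernel walks its own registration tables):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of on-the-fly file generation in the style of
 * /proc/devices: the text is built from in-memory data structures
 * whenever the "file" is read, never stored anywhere. */
struct chr_device { int major; const char *name; };

static struct chr_device chrdevs[] = {
    { 1, "mem" }, { 4, "ttyp" }, { 5, "cua" },
};

/* Fill buf with /proc/devices-style text; return its length. */
int read_proc_devices(char *buf, int len)
{
    int i, n = 0;
    n += snprintf(buf + n, len - n, "Character devices:\n");
    for (i = 0; i < (int)(sizeof(chrdevs)/sizeof(chrdevs[0])); i++)
        n += snprintf(buf + n, len - n, "%2d %s\n",
                      chrdevs[i].major, chrdevs[i].name);
    return n;   /* number of characters generated */
}
```

In the real /proc file system the equivalent routine is invoked through the VFS when the file's inode is opened and read, which is why the file appears to exist even though it occupies no disk space.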