Linux inode cache mechanism analysis
Blue forest http://www.lslnet.com, November 8, 2002 16:24
Author: Zhan Rongkai (Zhan Rouzhan zhanrk@sohu.com) Linux Inode Cache mechanism is implemented in the fs / inode.c file. 1.1. Inode's SLAB Assigner Cache Index Node Cache, the implementation of the ICache mechanism is based on the SLAB distributor cache of the Inode object, so it is necessary to apply or release an inode object from physical memory, must pass KMEM_CACHE_ALLOC ( The function is performed by the function and the KMEM_CACHE_FREE () function. The SLAB allocation cache of the Inode object is defined by a KMEM_CACHE_T type pointer inode_cachep. This SLAB distributor cache is created by the KMEM_CACHE_CREATE () function in Inode_init () in Inode Cache's initialization function inode_init (). Linux also defines two package functions in the inode.c file to implement a Inode object from the Inode_Cachep SLAB distributor cache or release a no longer inode object to the SLAB distributor, as shown below: #define alloc_inode () / ((struct inode *) kmem_cache_alloc (inode_cachep, SLAB_KERNEL)) static void destroy_inode (struct inode * inode) {if (! list_empty (& inode-> i_dirty_buffers)) BUG (); kmem_cache_free (inode_cachep, (inode)); } 1.2 and Inode objects Related to the underwent operation Source file inode.c implements some basic operations of the underlay of the Inode object, as follows: (1) Clean_inode () - Initialization section Inode object member domain This function is used A certain member that has just been assigned from the Inode_Cachep SLAB dispenser is initialized to known values (usually 0), but there is an exception, and the linked number i_nlink is initialized to 1. This is a static static internal function, so it can only be called by other functions in Inode.c, such as: get_empty_inode () and get_new_inode ().
/ * * This just initializes the inode fields * to known values before returning the inode .. * * i_sb, i_ino, i_count, i_state and the lists have * been initialized elsewhere .. * / static void clean_inode (struct inode * inode) { static struct address_space_operations empty_aops; static struct inode_operations empty_iops; static struct file_operations empty_fops; memset (& inode-> u, 0, sizeof (inode-> u)); inode-> i_sock = 0; inode-> i_op = & empty_iops; inode-> i_fop = & empty_fops; inode-> i_nlink = 1; / * Note! I_nlink is initialized to 1 * / atomic_set (& inode-> i_writecount, 0); inode-> i_gment = 0; inode-> i_generation = 0; MEMSET (& Inode-> i_dquot, 0, sizeof (inode-> i_dquot); inode -> i_pipe = null; inode-> i_bdev = null; inode-> i_data.a_ops = & empty_aops; inode-> i_data.host = inode; inode-> i_mapping = & inode-> i_data;} (2) get_empty_inode () - Assign an empty inode object from the SLAB dispenser This function is allocated from the SLAB dispenser from the SLAB dispenser from the SLAB dispenser, then initials all domains other than the i_count reference count and the link count I_nlink (some domains) Initialization is done by calling the Clean_Inde function) and connects this inode object into the inode_in_use lin list.
Finally, the pointer returns inode object, as follows: struct inode * get_empty_inode (void) {static unsigned long last_ino; struct inode * inode; inode = alloc_inode (); if (inode) {spin_lock (& inode_lock); inodes_stat.nr_inodes ; List_add (& inode-> i_list, & inode_in_use); inode-> i_sb = null; inode-> i_DEV = 0; inode-> i_ino = last_ino; inode-> i_flags = 0; atomic_set (& inode-> i_count, 1); Inode-> i_State = 0; spin_unlock (& inode_lock); clean_inode;} Linux kernel module usually does not call this function to assign an inode object. Those who want to get a kernel module (such as network layers, etc.) without an index node number, and those fs without any known information, usually use this function to get a new inode object. (3) CLEAR_INODE () - Clear the content in an inode object before the DESTROY_INODE () function releases an Inode object, usually calls the function to clear the contents of the inode object, such as: making the buffer referenced by the inode, unwind Quote for other objects, etc. / ** * clear_inode - clear an inode * @inode: inode to clear * * This is called by the filesystem to tell us * that the inode is no longer useful We just * terminate it with extreme prejudice * / void clear_inode (.. struct inode * inode) {if (list_empty (& inode-> i_dirty_buffers!)) invalidate_inode_buffers (inode); if (inode-> i_data.nrpages) BUG (); if ((inode-> i_state & I_FREEING!)) BUG () ; if (inode-> i_state & i_clear) bug (); wait_on_inode (inode); if (is_quotainit (inode)) DQUOT_DROP (Inode); if (inode-> i_sb && inode-> i_sb-> s_op && inode-> i_SB -> S_OP-> Clear_inode) Inode-> i_SB-> s_op-> clear_inode (inode); if (inode-> i_bdev) {bdput (inode-> i_bdev); inode-> i_bdev = null;} inode-> i_state = I_clear;} 1.3 ICACHE Data Structure Linux Implements the Inode Cache Mechanism by defining a variety of bidirectional linked lists above the Inode_Cachep SLAB dispenser cache to effectively manage memory inode objects.
These lists include: inode list is being used, unused inode list, inode hash chain and the anonymous inode hash chain, they are defined as follows: static LIST_HEAD (inode_in_use); static LIST_HEAD (inode_unused); static struct list_head * inode_hashtable static list_head; / * for inodes with null i_SB * / In addition, there is also a modified and is using the inode two-way linked list S_DIRTY in each super block object Super_block. Each inode object exists in two separate two-way linies: (1) One is the inode hash chain inode_hashtable, used to speed up inode lookup, each inode object is chaind into the hash chain table by i_hash pointer. (2) The other is the "type" linked list of Inode: l If i_count> 0, i_nlink> 0 and the inode is not dirty, this inode is chaind into the system globally inode_in_use bidirectional linked list. L If i_count and i_nlink are greater than 0, this inode is dirty (I_State domain sets I_DIRTY flags), then link the S_DIRTY lin list of the super_block object he belongs through the i_list pointer. l If i_count = 0, link the inode_unused linked list through its I_List. For those anonymous inode objects that do not belong to any super block object (i.e., I_SB = NULL), the overall anonymous inode hash chain table Anon_hash_chain is entered through the i_hash pointer. 1.3.1 Lock protection for the Inode Cache Lin table defines self-spin lock inode_lock in Inode.c to implement mutual exclusive access to all inode cache tags. That is, any code segment that accesses any INODE cache chain must be held by calling the spin_lock () function, and the self-locking lock is released by spin_unlock () after the end of the access. The definition of inode_lock is as follows: spinlock_t inode_lock = spin_lock_unlocked; Note! If you want to change the i_state domain of the Inode object being used, you must first hold the spin lock. 1.3.2 Inode Cache Statistics Global Variable Inodes_STAT Defines the statistics of Inode Cache, which mainly includes the total number of Inode objects in Cache and the number of inode in which they are not used, which are defined as follows: struct {int NR_INODES; int NR_UNUSED; INT Dummy [5];} inodes_stat; 1.3.3 Inode Hash Chain Table Inode Hash List The main purpose is to speed up a specific Inode object in Icache. Pointer inode_hashtable points to a group hash chain table head, all hash functions (b) The same inode object is made into a bidirectional linked list through the i_hash pointer as an interface, and hung in the INode_HashTable [h] hash chain table head. All hash chain table tops set together into an array, the first address of the array is pointed out by pointer inode_hashtable.
In Linux, the Inode Hash Chart Header Array is stored in a continuous physical page frame in the 2ORDER, where ORDER ≥ 1, and its value is related to the value of the total physical page frame number Num_physpages. Therefore, the number of hash chain table heads is: 2ORDER * Page_SIZE / SIZEOF (Struct List_Head). Since the size of the List_Head structure type is 8 bytes (2 32-bit pointers), the number of the inode hash chain table headers can be represented as: 2 (Order 12-3). l Hash chain list Inode Cache's initialization work is done by the inode_init () function, it mainly completes two works: (1) Inode hash chain table initialization, including the INODE hash chain table head array allocated physical memory Wait; (2) Create an Inode SLAB distributor cache.
The function of the source code is as follows: / * * Initialize the hash tables * / void __init inode_init (unsigned long mempages) {struct list_head * head; unsigned long order; unsigned int nr_hash; int i; / * calc order, however. I don't know why do you calculate this? :) * / Mempages >> = (14 - page_shift); mempages * = sizeof; for (ORDER = 0; (1 ul << order) << Page_shift)
(3) In the above DO ... While cycle, the function will first calculate the bit mask (i_haash_mask) of the hash chart header array (i_hash_shift) before allocating physical memory. Among them, the bit mask I_HASH_MASK = NR_HASH-1, NR_HASH is the number of inode Haxi table heads. And i_hash_shift is actually equal to (ORDER 12-3). And I_hash_shift I_hash_mask defined as follows: #define I_HASHBITS i_hash_shift #define I_HASHMASK i_hash_mask static unsigned int i_hash_mask; static unsigned int i_hash_shift; (4) after successful allocation of physical memory, the hash function starts inode list head array table for header Initialization, that is, the PREV in each header is initialized by the init_list_head macro to point to yourself. (5) Finally, the function creates the SLAB distributor cache inode_cachep of the Inode object by calling the KMEM_CACHE_CREATE () function. l Calculating a Hash Tissue value of a inode object Linux unique to determine a value of an inode object is a binary group (device number, index node), and the device number and super block object correspond to one, so Hash Column Value Calculation Function HASH is parameter as a super_block object and index node i_ino. As follows: static inline unsigned long hash (struct super_block * sb, unsigned long i_ino) {unsigned long tmp = i_ino | ((unsigned long) sb / L1_CACHE_BYTES); tmp = tmp (tmp >> I_HASHBITS) (tmp >> I_HASHBITS * 2); Return TMP & I_Hashmask;} l The operational function of the Inode Hash List with the inode hash chain table with: INSERT_INODE_HASH () and remove_inode_hash (), and find_inode ().
Insert_inode_hash () function is used to a non-target inode hash into a respective index node hash chain: void insert_inode_hash (struct inode * inode) {struct list_head * head = & anon_hash_chain; if (inode-> i_sb) head = inode_hashtable hash (inode-> i_sb, inode-> i_ino); spin_lock (& inode_lock); list_add (& inode-> i_hash, hEAD); spin_unlock (& inode_lock);} Remove_inode_hash () function is used to put an inode object from him from him Greek the list extraction: void remove_inode_hash (struct inode * inode) {spin_lock (& inode_lock); list_del (& inode-> i_hash); INIT_LIST_HEAD (& inode-> i_hash); spin_unlock (& inode_lock);} while find_inode () function is used in Inode objects that are uniquely identified by (SB, INO) in a list, as shown in: / * * Called with the inode lock held. * Note: We are not increasing the inode-refcount, you must call __iGet () * by hand after calling find_inode now! This simplifies iunique and will not * add any additional branch in the common code. * / static struct inode * find_inode (struct super_block * sb, unsigned long ino, struct list_head * head, find_inode_t find_actor, void * opaque) {struct list_head * TMP; struct inode * inode; tmp = head; ; {TMP = TMP-> Next; Inode = NULL; if (TMP == HEAD) Break; Inode = List_ENTRY (TMP, Struct Inode, i_hash); if (Inode-> i_ino! = INO) Continue; if (Inode -> i_sb! = SB) Continue; if (inode, ino, opaque) Continue; Break;} returnore;} Note! Be sure to hold an inode_lock lock when calling the Find_inode () function. 1.4 ICache Inode object Access interface --iGet / IPUT 1.4.1 Iget () - Reference an Inode object Other kernel module To access an Inode object, you must access the interface via the Iget (). This function will first look for the corresponding inode object in the Hash chain table of ICache. If you find it, add the reference count of the object 1, and then return the pointer to the inode object.
If you are not found, a new Inode object is assigned by calling the GET_NEW_INODE () function. The source code of this function is shown below (including / linux / fs.h): Static Inline Struct Inode * Iget (Struct Super_Block * SB, Unsigned Long Ino) {Return Iget4 (SB, INO, NULL, NULL);} Out, the actual work is done by the Iget4 () function (with 4 parameters of Iget functions, called Iget4), the source code is as follows (inode.c): struct inode * Iget4 (Struct Super_block * SB, unsigned long ino, find_inode_t find_actor, void * opaque) {struct list_head * head = inode_hashtable hash (sb, ino); struct inode * inode; spin_lock (& inode_lock); inode = find_inode (sb, ino, head, find_actor, opaque); if (inode) {__iget (inode); spin_unlock (& inode_lock); wait_on_inode (inode); return inode;} spin_unlock (& inode_lock); / * * get_new_inode () will do the right thing, re-trying the search * in case it had To block at any point. * / return get_new_inode (SB, INO, HEAD, FIND_AACTOR, OPAQUE);} Note: (1) First, find the inode object through the hashing function hash should be in that hash chain table, then call Find_inode () Functions look for whether there is this inode object in this hash chain table. . (2) If the corresponding Inode object is found in the hash chain list (uniquely determined by the Super Block Object Pointer SB and Index Node INO), first call the __iGet () function to increase the reference count of the Inode object. Then call the wait_on_inode () function Wait for the Inode object to be unlocked by other kernel modules. Finally, return the pointer to this inode object directly. (3) If not found, the GET_NEW_INODE () function is called allocated a new inode object. Inline function __iGet () is used to add a reference count for an inode object. This function first judges the current value of the I_Count of the Inode object. If it is not 0, then the i_count plus 1 will return directly. If it is 0, then add i_count to 1, then look at this inode object is currently dirty, if it is dirty (i_state & i_dirty is not 0), then nothing, let the inode object continue to stay in a super block object The S_DIRTY Link list. If it is not dirty, the Inode object is removed from the original "Type" linked list (should be in the inode_unused list) and connect it into the inode_in_use linked list. Finally, the number of unused INODEs in ICACHE statistics is 1.
static inline void __iget (struct inode * inode) {if (atomic_read (& inode-> i_count)) {atomic_inc (& inode-> i_count); return;} atomic_inc (& inode-> i_count);! if ((inode-> i_state & I_DIRTY)) {list_del (& inode-> i_list); List_add (& Inode-> i_list, & inode_in_use);} inodes_stat .nr_unused--;} wait_on_inode () function Test if the INODE object has been locked, if yes, call _ _wait_on_inode () function Wait waiting for the inode being unlocked. Otherwise, return directly: static inline void wait_on_inode (struct inode * inode) {if (inode-> i_state & i_lock) __wait_on_inode (inode);} get_new_inode () function is used to get a new Inode object. This function first calls alloc_inode macros allocates a new Inode object from the inode_cachep this SLAB spinner cache.
As follows: static struct inode * get_new_inode (struct super_block * sb, unsigned long ino, struct list_head * head, find_inode_t find_actor, void * opaque) {struct inode * inode; inode = alloc_inode (); if (inode) {struct inode * old; spin_lock (& inode_lock); / * We released the lock, so .. * / old = find_inode (SB, INO, HEAD, FIND_AACTOR, OPAQUE); if (! o) {inodes_stat .nr_inodes ; list_add (& inode-> i_list, & inode_in_use; list_add; inode-> i_sb = sb; inode-> i_dev = SB; Inode-> i_DEV = SB-> S_DEV; inode-> i_ino = ino; inode-> i_flags = 0; atomic_set (& inode- > i_count, 1); inode-> i_state = i_lock; spin_unlock (& inode_lock); clean_inode (inode); sb-> s_op-> read_inode (inode); / * * this is special! we do not need the spinlock * when clearing I_LOCK, because we're guaranteed * that nobody else tries to do anything about the * state of the inode when it is locked, as we * just created it (so there can be no old holders * that have not tested I_LOCK). * / inode-> i_state & = ~ i_lock; wake_up (& inode-> i_wait); Retur n inode;} / * * Uhhuh, somebody else created the same inode under * us Use the old inode instead of the one we just * allocated * / __iget (old);.. spin_unlock (& inode_lock); destroy_inode (inode); inode = OLD; WAIT_ON_INODE (INODE);} Return Inode;} (1) Since the caller has released the inode_lock lock when the GET_NEW_INODE function is called, other kernel modules may have created during the inode_lock to call the GET_NEW_INODE function. (SB, INO) determined index nodes. Therefore, the GET_NEW_INODE function must re-call the Find_inode function to find the corresponding inode object in the hash chain table again.
(2) If FIND_INODE () returns a valid pointer, the other modules have created (SB, INO) Inode objects, so call the destroy_inode () function to destroy the Inode object created, then call __iget () Function increases the reference count of the INode object found, and then waits him to be unlocked with the wait_on_inode () function. (3) If FIND_INODE () returns NULL, initialize the previously allocated inode object. Among them, you want to call the clean_inode () function and the S_OP-> Read_inode () method of the super block to read the contents of the disk index node on the block device. Finally, return the pointer of the assigned inode object. 1.4.2 IPUT () Function - Releases a reference to an Inode object When other kernel objects or modules no longer need an inode object, the reference to the INODE object must be released via the IPUT () function. The IPUT () function minus the reference number of the Inode object. If it is reduced to 0, the inode is released. Otherwise it will return directly.
/ ** * iput - put an inode * @inode: inode to put * * Puts an inode, dropping its usage count If the inode use count hits * zero the inode is also then freed and may be destroyed * / void iput.. (struct inode * inode) {if (inode) {struct super_operations * op = null; if (inode-> i_sb && inode-> i_sb-> s_op) op = inode-> i_SB-> s_op; if (OP && OP- > put_inode) op-> put_inode (inode);! if (atomic_dec_and_lock (& inode-> i_count, & inode_lock)) return; if (inode-> i_nlink) {list_del (& inode-> i_hash);! INIT_LIST_HEAD (& inode-> i_hash) ; list_del (& inode-> i_list); INIT_LIST_HEAD (& inode-> i_list); inode-> i_state | = I_FREEING; inodes_stat.nr_inodes--; spin_unlock (& inode_lock); if (inode-> i_data.nrpages) truncate_inode_pages (& inode-> i_data, 0); if (op && op-> delete_inode) {void (* delete) = OP-> delete_inode; / * s_op-> delete_inode intels clear_inode () * / delete (inode);} Else Clear_inode (inode); if (inode-> i_state! = i_clear) bug ();} else {if (! list_empty (& inode-> i_hash) {if (! (inode-> i _state & I_DIRTY)) {list_del (& inode-> i_list); list_add (& inode-> i_list, & inode_unused);} inodes_stat.nr_unused ; spin_unlock (& inode_lock); return;} else {/ * magic nfs path * / list_del (& inode- > i_list); INIT_LIST_HEAD (& inode-> i_list); inode-> i_state | = I_FREEING; inodes_stat.nr_inodes--; spin_unlock (& inode_lock); clear_inode (inode);}} destroy_inode (inode);}} Note the function As follows: (1) The function first calls the S_OP-> PUT_INODE () method (if any) in the super block object to release this inode object.