Lucene Dictionary file structure

xiaoxiao 2021-03-06

Lucene's dictionary files use neither the B-tree structure common in commercial databases nor a hash table. The dictionary is made up of two files, .tii and .tis.

While the .tis file is being written, group points are generated along the way: whenever the term number in the .tis file (counting from 0) is divisible by IndexInterval, the predecessor of the current term is taken as a group point (the first group point is "") and saved in the .tii file. The .tii and .tis files are filled in alternately like this until the two-level file structure is complete.
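The writing rule above can be sketched in a few lines of Java. This is only an in-memory illustration, not Lucene's actual writer: the class `TermDictSketch` and its fields are hypothetical names, lists stand in for the two files, and the "pointer" is simply a position in the term list. As a simplification, each pointer refers to the group-point term itself, so that a later scan can also match that term exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the .tis/.tii pair (not Lucene's real classes).
class TermDictSketch {
    final List<String> tis = new ArrayList<>();          // every term, in sorted order
    final List<String> tiiTerms = new ArrayList<>();     // group-point terms
    final List<Integer> tiiPointers = new ArrayList<>(); // scan start position in tis
    final int indexInterval;

    TermDictSketch(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    // Terms must be added in ascending order, as in a sorted dictionary.
    void add(String term) {
        if (tis.size() % indexInterval == 0) {
            // The group point is the predecessor of the current term;
            // the very first group point is the empty string.
            String predecessor = tis.isEmpty() ? "" : tis.get(tis.size() - 1);
            tiiTerms.add(predecessor);
            // Simplification: point at the predecessor itself (or 0 for the first group).
            tiiPointers.add(Math.max(0, tis.size() - 1));
        }
        tis.add(term);
    }
}
```

With an interval of 2 and the terms a, b, c, d, e added in order, the group points come out as "", "b" and "d".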

Some people will ask: why not use a hash instead of the .tii file? The answer is that at query time you need to find the terms neighboring the query term, and a hash is not up to that job (a hash can only find exact matches, not the nearest smaller term).
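A small Java comparison may make the point concrete. It uses only standard library collections; the class name `HashVsSortedSketch`, the sample terms, and the pointer values are invented for illustration. An exact-match hash lookup returns nothing for a term that is not itself a group point, while a sorted structure can return the nearest group point at or below the query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: sample group points with made-up pointer values.
class HashVsSortedSketch {
    static Map<String, Long> sampleGroupPoints() {
        Map<String, Long> m = new HashMap<>();
        m.put("", 0L);
        m.put("banana", 100L);
        m.put("mango", 200L);
        return m;
    }

    // Hash lookup: succeeds only if the query is exactly a stored key.
    static Long exactLookup(String query) {
        return sampleGroupPoints().get(query);
    }

    // Sorted lookup: greatest stored key less than or equal to the query.
    static String floorLookup(String query) {
        return new TreeMap<>(sampleGroupPoints()).floorKey(query);
    }
}
```

Here `exactLookup("cherry")` yields `null`, whereas `floorLookup("cherry")` yields `"banana"`, which is the group point a scan for "cherry" would need to start from.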

The .tii file stores pointers into the .tis file. At search time the .tii file is preloaded into memory, and a binary search finds the nearest group-point term that is less than or equal to the query term. Starting from the .tis position that this group point's pointer refers to, the terms in the .tis file are scanned sequentially until either the query term is found or a term that sorts after the query term is reached (which means the query term does not exist).
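The lookup just described might look roughly like the following Java sketch. It is not Lucene's reader code: `TermLookupSketch` and its parameters are hypothetical, arrays stand in for the two files, and each pointer is assumed to be an array position at or before the first term of its group.

```java
// Hypothetical two-level lookup over in-memory arrays (not Lucene's real API).
class TermLookupSketch {
    // Returns the position of query in tis, or -1 if the term is absent.
    static int find(String[] tiiTerms, int[] tiiPointers, String[] tis, String query) {
        if (tiiTerms.length == 0) {
            return -1;
        }
        // Step 1: binary search for the last group point <= query.
        // The first group point is "", so a match always exists.
        int lo = 0, hi = tiiTerms.length - 1, group = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (tiiTerms[mid].compareTo(query) <= 0) {
                group = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Step 2: sequential scan of tis from the group's pointer.
        for (int i = tiiPointers[group]; i < tis.length; i++) {
            int cmp = tis[i].compareTo(query);
            if (cmp == 0) {
                return i;   // found the query term
            }
            if (cmp > 0) {
                return -1;  // passed where it would be: term does not exist
            }
        }
        return -1;
    }
}
```

For example, with group points "", "b", "d" pointing at positions 0, 1, 3 of the term list a, b, c, d, e, a query for "c" selects the group point "b" and the scan stops at position 2, while a query for "bb" stops at "c" and reports that the term is missing.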

Lucene's default group span is 128 terms. Treat the .tii file as the root layer: as long as it holds no more than 128 terms, a lookup resembles one in a B-tree (although a parent node is never split once it grows past 128 entries, which is why the Lucene dictionary file is not a B-tree structure), and the .tis file can then serve fast lookups over 128 × 128 terms. Of course, this is not the final bottleneck. Broadly speaking, lookups stay fast as long as the .tii file fits the OS's page frame structure, but that is too optimistic.
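To put rough numbers on this, here is a back-of-the-envelope estimate in Java. The class `LookupCostSketch` and its cost model are my own assumptions: one group point per IndexInterval terms, a binary search over the group points, then a scan of at most one group. It counts comparisons only and ignores disk access entirely.

```java
// Rough cost model for the two-level lookup (an assumption, not a measurement).
class LookupCostSketch {
    // Number of group points needed for termCount terms (rounded up).
    static long groupPoints(long termCount, int indexInterval) {
        return (termCount + indexInterval - 1) / indexInterval;
    }

    // Worst-case comparisons: binary search over the group points
    // plus a sequential scan of one full group.
    static long worstCaseComparisons(long termCount, int indexInterval) {
        long groups = Math.max(groupPoints(termCount, indexInterval), 1);
        long binarySteps = 64 - Long.numberOfLeadingZeros(groups); // floor(log2(groups)) + 1
        return binarySteps + indexInterval;
    }
}
```

For 128 × 128 = 16384 terms with an interval of 128, this gives 128 group points and on the order of 136 comparisons in the worst case, most of them in the sequential scan.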

Lucene does not use a specialized dictionary file structure (or the structures common in commercial systems). This follows from Lucene's own characteristics (dynamic indexing, speed), which I will cover in later articles. I will post a clearer diagram tomorrow, so please stay tuned. Discussion is welcome.

When reposting, please cite the original address: https://www.9cbs.com/read-68590.html
