Using the JDBM open-source file database for full-text search

zhaozj 2021-02-11

I was recently asked to write a full-text search program that builds a file-based index keyed by keyword, with a structure like this:

    word 1: article 1, article 2, article 3, article 4 ...
    word 2: article 3, article 4, article 5, article 6 ...
    word 3: article 1, article 2, article 8, article 7 ...

Some friends asked why I don't just use Lucene. I did propose it, but the idea was rejected, so I had to bite the bullet and write it myself. With this storage layout, a keyword lookup quickly returns every article that contains the word; for a query with several keywords, you have to combine the per-keyword result sets.

I originally intended to store the index in a GDBM file database (a relational database is not a good fit for this), but Java does not seem to support GDBM and I did not want to call GDBM through JNI. On SourceForge I found JDBM, which can be used instead. If your program needs to save some simple data and you don't want to use a database, JDBM is an option.

Also attached is a bigram (two-character) word segmenter used to split articles into words. It needs no dictionary to maintain, so it is simple and good enough for the job.
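The keyword-to-article layout described above can be sketched in memory as follows. This is only an illustration: the class `KeywordIndexSketch` and its methods are hypothetical, a `HashMap` stands in for the file-backed store, and it assumes that a multi-keyword query means "articles containing all of the keywords" (an intersection of the per-keyword results).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the keyword index described in the text.
class KeywordIndexSketch {
    // word -> set of article numbers that contain the word
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(String word, int articleId) {
        postings.computeIfAbsent(word, k -> new TreeSet<>()).add(articleId);
    }

    // Single keyword: one lookup returns every article containing the word.
    Set<Integer> search(String word) {
        return postings.getOrDefault(word, Collections.emptySet());
    }

    // Several keywords: intersect the per-keyword result sets (assumption).
    Set<Integer> searchAll(String... words) {
        if (words.length == 0) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(search(words[0]));
        for (int i = 1; i < words.length; i++) {
            result.retainAll(search(words[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        KeywordIndexSketch index = new KeywordIndexSketch();
        for (int id : new int[] {1, 2, 3, 4}) index.add("word1", id);
        for (int id : new int[] {3, 4, 5, 6}) index.add("word2", id);
        System.out.println(index.searchAll("word1", "word2")); // prints [3, 4]
    }
}
```

If the intended query semantics were "any of the keywords", `retainAll` would be replaced by `addAll`.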

JDBM is very simple to use: give it a file name and you can use it like a Hashtable:

    import jdbm.*;
    ...
    JDBMRecordManager newsIndex = new JDBMRecordManager(filename);
    JDBMHashtable hashtable = newsIndex.getHashtable("words");
    ...
    // Build the index; wordBreaker (a WordBreaker instance) splits the article into words
    public void index(String docid, String body) {
        try {
            wordBreaker.setText(body);
            String[] words = wordBreaker.breakAll();
            for (int i = 0; i < words.length; i++) {
                // ... (loop body not preserved in the source)
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
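The body of the indexing loop is cut off in the text. A minimal sketch of what such a loop might do is shown here, under two assumptions: each word maps to a comma-separated string of article numbers (the `show()` method reads values back as `String`), and a plain `java.util.HashMap` stands in for the JDBM hashtable. The class name `IndexLoopSketch` and its methods are hypothetical, not part of JDBM or of the original program.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical sketch of the indexing step; HashMap replaces the JDBM hashtable.
class IndexLoopSketch {
    // word -> "docId1,docId2,..."
    private final Map<String, String> table = new HashMap<>();

    void index(String docId, String[] words) {
        // De-duplicate first so a word repeated within one article is recorded once.
        for (String word : new LinkedHashSet<>(Arrays.asList(words))) {
            String existing = table.get(word);
            table.put(word, existing == null ? docId : existing + "," + docId);
        }
    }

    String get(String word) {
        return table.get(word);
    }

    public static void main(String[] args) {
        IndexLoopSketch sketch = new IndexLoopSketch();
        sketch.index("1", new String[] {"alpha", "beta", "alpha"});
        sketch.index("2", new String[] {"alpha"});
        System.out.println(sketch.get("alpha")); // prints 1,2
        System.out.println(sketch.get("beta"));  // prints 1
    }
}
```

Storing the list as one string keeps each entry a simple key/value pair, at the cost of rewriting the whole value whenever an article is added.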

    // Print all entries; for testing only
    public void show() {
        try {
            // "enum" is a reserved word since Java 5, so the variable is named keys
            JDBMEnumeration keys = hashtable.keys();
            while (keys.hasMoreElements()) {
                String akey = (String) keys.nextElement();
                String value = (String) hashtable.get(akey);
                System.out.println(akey + ":" + value);
            }
        } catch (Exception ex) {}
    }

    // Remember to close the JDBM file
    public void close() {
        try {
            hashtable.dispose();
            newsIndex.close();
        } catch (Exception ex) {}
    }

The attached bigram word segmenter, WordBreaker:

    import java.text.*;
    import java.util.*;

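    public class WordBreaker {
        // Punctuation that is not treated as part of a word. The entries are
        // assumed to be full-width forms, since only characters above 255 are
        // checked against this list; the last few entries are garbled in the source.
        char[] punctures = new char[] {
            '。', ',', ';', ':', '“', '”', '(', ')', '!', '?', '◎', '#',
            '¥', '%', '…', '※', '×', '[', ']' /* ... */
        };

        public WordBreaker() {}

        public void setText(String text) {
            this.sourceText = text;
        }

        private String sourceText = "";

        public String[] breakAll() {
            StringBuffer enWord = new StringBuffer();
            StringBuffer cnWord = new StringBuffer();
            boolean lastInsertCn = false;
            for (int i = 0; i < sourceText.length(); i++) {
                char c = sourceText.charAt(i);
                if (c > 255 && isWord(c)) {
                    // Chinese (non-Latin) character
                    cnWord.append(c);
                    lastInsertCn = true;
                } else if (c < 255) {
                    // Latin character; insert a space after a run of Chinese text
                    if (lastInsertCn) {
                        enWord.append(' ');
                        enWord.append(c);
                        lastInsertCn = false;
                    } else {
                        enWord.append(c);
                    }
                }
            }
            String str = cnWord.toString();
            String[] result = new String[str.length() - 1];
            for (int i = 0; i < result.length; i++) {
                // ... (loop body not preserved in the source)
            }
            // ... (rest of breakAll not preserved in the source)
        }

        public static void main(String[] args) {
            WordBreaker wb = new WordBreaker();
            // Sample text (Chinese in the original post; shown here in translation)
            wb.setText("Under the current 80-issuer mechanism, the supervisory department is less than 1/3, there is no one's right to speak. The work of the review party is equally independent of the supervision department. In this sense, in the face of the market, the SFC has been suspected for people.");
            wb.printArray(wb.breakAll());
        }

        private void printArray(Object[] os) {
            for (int i = 0; i < os.length; i++) {
                // ... (loop body not preserved in the source)
            }
        }

        private boolean isWord(char c) {
            for (int i = 0; i < punctures.length; i++) {
                // ... (loop body not preserved in the source)
            }
            // ... (rest of isWord not preserved in the source)
        }
    }

Chinese text is split into overlapping two-character terms, for example 北京天安门 -> "北京 京天 天安 安门". English is split on spaces and punctuation.

Download JDBM: jdbm.sf.net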
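The two-character splitting described above can be sketched as follows. This is not the original WordBreaker: the class `BigramSketch` and its methods are hypothetical, `Character.isLetter` is used in place of the punctuation table, and keeping a lone non-Latin character as its own term is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bigram tokenizer: overlapping two-character terms for runs of
// non-Latin letters, whole words for runs of ASCII letters and digits.
class BigramSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cjk = new StringBuilder();
        StringBuilder latin = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 255 && Character.isLetter(c)) {
                flushLatin(latin, tokens);
                cjk.append(c);
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                flushCjk(cjk, tokens);
                latin.append(c);
            } else {
                // Spaces and punctuation end the current run.
                flushCjk(cjk, tokens);
                flushLatin(latin, tokens);
            }
        }
        flushCjk(cjk, tokens);
        flushLatin(latin, tokens);
        return tokens;
    }

    // Emit every overlapping pair in the run; a single character is kept as-is.
    private static void flushCjk(StringBuilder run, List<String> out) {
        if (run.length() == 1) {
            out.add(run.toString());
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        run.setLength(0);
    }

    private static void flushLatin(StringBuilder run, List<String> out) {
        if (run.length() > 0) {
            out.add(run.toString());
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        // "\u5317\u4eac\u5929\u5b89\u95e8" is 北京天安门
        System.out.println(tokenize("\u5317\u4eac\u5929\u5b89\u95e8 JDBM test"));
    }
}
```

Because every adjacent pair becomes a term, no dictionary is needed, but the index contains pairs such as 京天 that are not real words; a query has to be segmented the same way for lookups to match.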

When reposting, please cite the original address: https://www.9cbs.com/read-5774.html
