Lucene is a full-text search API

xiaoxiao2021-03-06  70

Lucene is a full-text retrieval API with plenty of examples and real-world applications.

Lucene and the references for this article.

This study is practical and has four parts: the first is a simple application, the second is a web application, the third is Chinese-language support, and the fourth is related applications (the Sandbox on the Lucene homepage).

0. Preparation. Download the current stable release, lucene-1.2.tar.gz, from the Lucene homepage and unpack it. Copy the two JAR files in the lucene-1.2 directory, lucene-1.2.jar and lucene-demos-1.2.jar, to a suitable directory, then add them to the CLASSPATH environment variable.

tar zxvf lucene-1.2.tar.gz <---- unpack

cd lucene-1.2

cp *.jar $dp

<--- $dp is the directory where the JAR files are stored; replace it with the actual directory for your setup

CLASSPATH=$CLASSPATH:$dp/lucene-1.2.jar:$dp/lucene-demos-1.2.jar; export CLASSPATH

If you don't want to set this every time you log in, edit /etc/profile or the .profile in your home directory and add the CLASSPATH line above as the last line of the file. On Windows, right-click "My Computer" on the desktop, select "Advanced" -> "Environment Variables" -> select CLASSPATH -> "Edit", and add the full path names of the two JAR files in the input box. Note that the separator on Windows is a semicolon (;). See the figure on the right.
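One quick way to confirm that the CLASSPATH setting took effect is to try loading a Lucene class by name. The sketch below assumes the class name org.apache.lucene.index.IndexWriter (the writer class used later in this article); the helper class ClasspathCheck is made up for illustration.

```java
class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class name; any class packaged in lucene-1.2.jar would do.
        String name = "org.apache.lucene.index.IndexWriter";
        System.out.println(name
                + (isOnClasspath(name) ? " found" : " NOT found; check CLASSPATH"));
    }
}
```

If the class is reported as not found, the JAR paths in CLASSPATH are the first thing to recheck.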

1. Run the demo

$ java org.apache.lucene.demo.IndexFiles /usr/local/man/man1/

<- Build an index of the man files

adding /usr/local/man/man1/mysql.1

...........

adding /usr/local/man/man1/cvs.1

1614 total milliseconds

$ java org.apache.lucene.demo.SearchFiles

<- Search

Query: password

Searching for: password

7 total matching documents

0. /usr/local/man/man1/mysql.1

......

6. /usr/local/man/man1/mysqlshow.1

Query:

OK! The Lucene demo ran successfully.

The main API calls made by this demo program:

/* Main calls for indexing */

File file = new File(argv[0]);

// The third argument, true, creates a new index in the "index" directory
IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);

Document doc = new Document();
doc.add(Field.Text("path", file.getPath()));
doc.add(Field.Keyword("modified", DateField.timeToString(file.lastModified())));
FileInputStream is = new FileInputStream(file);
Reader reader = new BufferedReader(new InputStreamReader(is));
doc.add(Field.Text("contents", reader));

writer.addDocument(doc);

writer.optimize();
writer.close();

/* Main calls for searching */

Searcher searcher = new IndexSearcher("index");
Analyzer analyzer = new StandardAnalyzer();
Query query = QueryParser.parse(lineForSearch, "contents", analyzer);
Hits hits = searcher.search(query);
for (int i = start; i
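To make the index-then-search flow above more concrete, here is a toy inverted index in plain Java. It is not how Lucene is implemented and it uses none of Lucene's classes; the class ToyIndex, its method names, the sample contents, and the simple letters-and-digits tokenizer are all assumptions made for illustration. It only mirrors the two phases: adding documents while indexing, then looking up a term while searching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

class ToyIndex {
    // term -> sorted ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // document id -> path, playing the role of the stored "path" field
    private final List<String> paths = new ArrayList<>();

    // Indexing phase: split contents into lowercase terms and record the document under each.
    void addDocument(String path, String contents) {
        int id = paths.size();
        paths.add(path);
        for (String term : contents.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Search phase: look up one term and return the matching paths in the order they were added.
    List<String> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return Collections.emptyList();
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(paths.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Hypothetical file contents, for illustration only
        index.addDocument("/usr/local/man/man1/mysql.1", "mysql prompts for a password");
        index.addDocument("/usr/local/man/man1/cvs.1", "cvs checks out a module");
        List<String> hits = index.search("password");
        System.out.println(hits.size() + " total matching documents");
        for (int i = 0; i < hits.size(); i++) {
            System.out.println(i + ". " + hits.get(i));
        }
    }
}
```

The point of the sketch is that the cost of scanning the files is paid once, at indexing time; a search is then a lookup by term rather than a scan of every file.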

3. Run luceneweb. Assume that Tomcat is installed in the $TOMCATHOME directory; replace $TOMCATHOME with the real directory when you follow these steps.

cd $TOMCATHOME/webapps

mkdir lucenedb

cd lucenedb

java org.apache.lucene.demo.IndexHTML -create -index $TOMCATHOME/webapps/lucenedb ../examples

<- The relative path ".." serves two purposes: first, it points to the location of the files being indexed; second, it lets the search results display the URL of each indexed file, because the JSP search pages live in the luceneweb subdirectory. In a real application, "examples" can be replaced with another directory name.
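The effect of the relative path on the displayed links can be sketched with java.net.URI. This assumes that the search results are served from a page under /luceneweb/ and that the index stores each path exactly as it was given on the command line; the page name results.jsp and the file name index.html are hypothetical.

```java
import java.net.URI;

class RelativeLink {
    // Resolves a path stored in the index against the URL of the page that displays it.
    static String resolve(String pageUrl, String storedPath) {
        return URI.create(pageUrl).resolve(storedPath).toString();
    }

    public static void main(String[] args) {
        String page = "http://yourdomain.com:8080/luceneweb/results.jsp"; // hypothetical page name
        String stored = "../examples/index.html";                          // hypothetical indexed file
        System.out.println(resolve(page, stored));
    }
}
```

Because the stored path starts with "..", the link climbs out of luceneweb and lands in the sibling examples directory under webapps, which is why the index is built with a relative path rather than an absolute file system path.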

cd ..

cp ~/lucene-1.2/luceneweb.war .

<- luceneweb.war is in the directory where you unpacked lucene-1.2

../bin/shutdown.sh

../bin/startup.sh

Then visit http://yourdomain.com:8080/luceneweb from a client; the browser should display the page shown in the figure on the right. Now go back to the server:

cd luceneweb

vi configuration.jsp

<- Change the value of indexLocation to "$TOMCATHOME/webapps/lucenedb";

cd ..

jar -uf luceneweb.war luceneweb

Go back to the client, refresh the page, and enter a word to search for. Unfortunately, this setup can only search English words, and there is a problem if the title of an HTML page is in Chinese characters. See the figure.

The IndexHTML class used here can index files of type htm, html, and txt, and it uses an HTMLParser; apart from that, it is basically the same as the previous example.
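The file-type filter described here can be sketched as a simple suffix test. The class and method names are hypothetical, and the case-insensitive comparison is an assumption; the demo itself may compare suffixes differently.

```java
import java.util.Locale;

class IndexableFilter {
    // Returns true for the three file types mentioned above: htm, html, and txt.
    static boolean isIndexable(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(".htm") || name.endsWith(".html") || name.endsWith(".txt");
    }

    public static void main(String[] args) {
        // Hypothetical file names
        for (String f : new String[] {"index.html", "notes.txt", "logo.gif"}) {
            System.out.println(f + " -> " + (isIndexable(f) ? "index" : "skip"));
        }
    }
}
```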

