Lucene 1.3-final now supports Chinese word segmentation and retrieval

zhaozj2021-02-16 156


Author: lotus (xuhb@ah163.com)

There is plenty of Lucene material on the Internet. In Chinese, the article everyone knows is Che Dong's (http://www.chedong.com/tech/lucene.html), the most widely discussed and most influential Chinese article on the subject online, along with WebLucene (http://www.chedong.com/tepid/weblucene.html), also written by Che Dong. But those cover Lucene 1.2; Lucene 1.3-final is now said to fully support Chinese full-text retrieval.

The reason is item 5 in CHANGES.txt inside the lucene-1.3-final.zip package, which reads:

5. Fix StandardTokenizer's handling of CJK characters (Chinese, Japanese and Korean ideograms). Previously contiguous sequences were combined in a single token, which is not very useful. Now each ideogram generates a separate token, which is more useful.

This shows that Lucene 1.3-final can index and search Chinese, Japanese, and Korean text.
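To make the change concrete, here is a minimal Java sketch of the splitting rule that item 5 describes. It is not Lucene's StandardTokenizer, which is generated from a grammar and also recognizes things like numbers, e-mail addresses and acronyms. The class CjkSplitSketch and its tokenize method are hypothetical names, only the main Han ideograph Unicode blocks are treated as "ideograms", and the code targets a modern JDK (7 or later) rather than the JDK 1.3.1 used in the test below.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration (NOT Lucene's StandardTokenizer) of CHANGES.txt item 5:
// a run of letters/digits forms one token, but every CJK ideograph is its own token.
class CjkSplitSketch {

    // Assumption: only these Han ideograph blocks count as "ideograms" here.
    static boolean isCjkIdeograph(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                || block == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();  // pending non-CJK word
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isCjkIdeograph(c)) {
                flush(word, tokens);               // close any pending word first
                tokens.add(String.valueOf(c));     // one token per ideograph
            } else if (Character.isLetterOrDigit(c)) {
                word.append(c);                    // extend the current word
            } else {
                flush(word, tokens);               // spaces/punctuation end a word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emit the pending word (if any) as a token and reset the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene 全文检索"));  // prints [Lucene, 全, 文, 检, 索]
    }
}
```

The Latin word stays whole, while the four characters of 全文检索 ("full-text search") become four separate tokens. Under the old behavior described in the changelog, the contiguous run 全文检索 would have been one token, so a search for just 检索 could not match it.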

Let's give it a try:

Test environment: Windows 2000 Pro, JDK 1.3.1 or later

1. Download lucene-1.3-final.zip.

2. Extract lucene-1.3-final.zip and add lucene-1.3-final.jar and lucene-demos-1.3-final.jar to the system CLASSPATH.

3. Create a directory and copy some HTML or TXT files into it to serve as the full-text corpus (make sure the files actually contain text!). For example, create the directory D:\lucenetest\index and copy in some files with Chinese content; it can also contain multiple levels of subdirectories.

OK, the environment is ready; time to test!

4. Open a DOS prompt and run: java org.apache.lucene.demo.IndexFiles d:\lucenetest\index

For example, at C:\> type java org.apache.lucene.demo.IndexFiles d:\lucenetest\index and press Enter. This indexes every file under d:\lucenetest\index, including files in subdirectories, and writes the index files to C:\index (the index directory is created automatically, inside whatever directory your DOS prompt is currently in).
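As described above, IndexFiles descends into every subdirectory of the directory you pass it. The following Java sketch shows only that recursive walk, using java.io.File. FileWalkSketch and collectFiles are hypothetical names, not part of the Lucene demo; in the real demo each file found is then turned into a Lucene Document and added to the index, which is omitted here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects every regular file under a root directory,
// descending into subdirectories at any depth.
class FileWalkSketch {

    static List<File> collectFiles(File root) {
        List<File> result = new ArrayList<>();
        walk(root, result);
        return result;
    }

    private static void walk(File f, List<File> out) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) return;  // unreadable directory: skip it
            for (File child : children) {
                walk(child, out);          // recurse into files and subdirectories
            }
        } else if (f.isFile()) {
            out.add(f);                    // a regular file that would be indexed
        }
        // Paths that do not exist are simply ignored.
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collectFiles(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

For example, java FileWalkSketch d:\lucenetest\index would list the same set of files that the indexing step is expected to visit.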

OK, the index is built; now let's test searching.

5. Run the command: java org.apache.lucene.demo.SearchFiles

For example, type C:\> java org.apache.lucene.demo.SearchFiles and press Enter.

At the "Query:" prompt, type what you want to search for, for example "It is best to do a grammar check first". Something about that long is fine :)

Success! Here are the results:

Searching for: "It is best to do a grammar check first"
1 total matching documents
0. d:\lucenetest\index\learning Lucene's personal.txt

See the screenshot for the test results.

As you can see, Lucene 1.3-final fully supports Chinese full-text search, using single-character segmentation!
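Single-character segmentation still finds exact multi-character matches because Lucene records each token's position, and its QueryParser typically turns a query that analyzes into several tokens into a phrase query, which requires those tokens to be adjacent. The toy positional index below illustrates that idea in Java. CharPhraseIndex, addDocument and searchPhrase are hypothetical names, not Lucene's API; the sketch assumes the text is purely CJK (it does not group Latin letters into words) and targets Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy positional index (NOT Lucene's implementation): every character is a
// token, and each token is stored together with its position in the document.
class CharPhraseIndex {
    // token -> (docId -> positions where that token occurs in the document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    void addDocument(int docId, String text) {
        int pos = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) continue;  // whitespace is not a token
            postings.computeIfAbsent(String.valueOf(c), k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos++);
        }
    }

    // Returns the ids (ascending) of documents that contain the phrase's
    // characters at consecutive positions.
    List<Integer> searchPhrase(String phrase) {
        String p = phrase.replaceAll("\\s+", "");  // tokens ignore whitespace
        List<Integer> hits = new ArrayList<>();
        if (p.isEmpty()) return hits;
        Map<Integer, List<Integer>> firstToken = postings.get(String.valueOf(p.charAt(0)));
        if (firstToken == null) return hits;
        for (Map.Entry<Integer, List<Integer>> entry : firstToken.entrySet()) {
            for (int start : entry.getValue()) {
                if (matchesAt(entry.getKey(), p, start)) {
                    hits.add(entry.getKey());
                    break;  // one occurrence per document is enough
                }
            }
        }
        return hits;
    }

    // True if phrase character i occurs in the document at position start + i.
    private boolean matchesAt(int docId, String p, int start) {
        for (int i = 1; i < p.length(); i++) {
            Map<Integer, List<Integer>> byDoc = postings.get(String.valueOf(p.charAt(i)));
            List<Integer> positions = (byDoc == null) ? null : byDoc.get(docId);
            if (positions == null || !positions.contains(start + i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        CharPhraseIndex index = new CharPhraseIndex();
        index.addDocument(1, "全文检索很有用");
        index.addDocument(2, "检索全文");
        System.out.println(index.searchPhrase("全文检索"));  // prints [1]
        System.out.println(index.searchPhrase("检索"));      // prints [1, 2]
    }
}
```

Document 2 contains all four characters of 全文检索, but not in that order, so only document 1 matches the full phrase, while the shorter 检索 matches both.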

Not bad, right? Hurry up and switch to Lucene 1.3-final :)

Reference:

http://jakarta.apache.org/lucene/docs/gettingstarted.html
