JIVE Note 2: About Chinese Search in Jive2

zhaozj  2021-02-16

Jive is good, but it is useless when it comes to handling Chinese. The same is true of Lucene. In the latest version, Lucene 1.2 RC2, the QueryParser class can only handle search terms made of a-z, A-Z and 0-9. If you enter Chinese, it throws a "Lexical Parse Error". Moreover, keywords in English text are delimited by spaces, which does not work for Chinese, so Chinese text is generally segmented either with a dictionary or by overlapping segmentation (every pair of adjacent characters becomes a term).

With that understood, we can start modifying Lucene to support Chinese search.

(1) Modify QueryParser.jj and change the term definition inside it to accept Unicode double-byte characters. QueryParser is generated by JavaCC, so it is worth learning the JavaCC syntax and the concept of EBNF first.

(2) Write your own analyzer and tokenizer. I got a ChineseTokenizer.java from the Lucene mailing list and wrote a ChineseAnalyzer.java modeled on StandardAnalyzer.java. This analyzer uses the overlapping mode to split a complete Chinese sentence.

(3) Modify DbQuery.java and SearchManager.java in Jive2: change import com.lucene.*... to import org.apache.lucene.*..., and change the StandardAnalyzer used inside to ChineseAnalyzer.

(4) Compile, start Jive, and rebuild the index. Your Jive2 can now support Chinese search.

Note: Jive and the app server must run in a Chinese environment. For example, use the Chinese edition of NT/2000, or on UNIX/Linux set export LC_ALL=zh_CN. In addition, add request.setCharacterEncoding("GB2312"); to global.jsp, and add contentType="text/html; charset=GB2312" to each page. This takes care of the vast majority of app servers, including the lousy Tomcat 4.0.1. Also change the DB URL to jdbc:mysql://localhost/jive2?useUnicode=true&characterEncoding=GB2312

Attached: a ZIP package of the three files I modified or wrote: QueryParser.jj, zh/ChineseAnalyzer.java and zh/ChineseTokenizer.java.

This explanation is rather vague; experts may find it beneath them, and newcomers may not be able to follow it.
I can't help it, my writing is just very poor. Good luck!
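
To make the overlapping segmentation from step (2) more concrete, here is a minimal sketch in plain Java. It is not the ChineseTokenizer.java from the Lucene mailing list and does not use the Lucene API at all; the class name OverlapBigramSketch and its tokenize method are hypothetical names chosen for illustration. It assumes that runs of Han characters are cut into overlapping two-character terms, while runs of other letters and digits are kept as lowercase words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not the tokenizer shipped in the ZIP package.
class OverlapBigramSketch {

    // Returns lowercase words for non-Han letters/digits and
    // overlapping two-character terms for runs of Han characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();   // current run of non-Han letters/digits
        List<Integer> han = new ArrayList<>();      // current run of Han code points
        for (int cp : text.codePoints().toArray()) {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flushWord(word, tokens);
                han.add(cp);
            } else if (Character.isLetterOrDigit(cp)) {
                flushHan(han, tokens);
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                // Whitespace and punctuation end whatever run is open.
                flushWord(word, tokens);
                flushHan(han, tokens);
            }
        }
        flushWord(word, tokens);
        flushHan(han, tokens);
        return tokens;
    }

    // Emits the pending word, if any, and resets the buffer.
    private static void flushWord(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // Emits a lone Han character as-is, or every adjacent pair of a longer run.
    private static void flushHan(List<Integer> han, List<String> tokens) {
        if (han.size() == 1) {
            tokens.add(new String(Character.toChars(han.get(0))));
        }
        for (int i = 0; i + 1 < han.size(); i++) {
            tokens.add(new StringBuilder()
                    .appendCodePoint(han.get(i))
                    .appendCodePoint(han.get(i + 1))
                    .toString());
        }
        han.clear();
    }
}
```

With this sketch, OverlapBigramSketch.tokenize("Jive2 中文搜索") gives [jive2, 中文, 文搜, 搜索]. Because the same cutting is applied to both the indexed text and the query, any two adjacent characters in a post can be matched without a dictionary, at the cost of a larger index.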

When reposting, please cite the original address: https://www.9cbs.com/read-27857.html
