[Repost] Several free Chinese word segmentation modules
http://emuch.net/html/200512/152933.html
Author: zsglly  Release date: 2005-12-13  From: http://emuch.net
Several free Chinese word segmentation modules
Ultra Posted on 2005-8-14 20:19:05
I needed Chinese word segmentation for my graduation thesis, so here is a summary of the material I found.
First, what is Chinese word segmentation
As we all know, English is written in units of words, and words are separated by spaces. Chinese is written in units of characters, and the characters of a sentence only convey a meaning once they are read together. For example, the English sentence "I am a student" is "我是一个学生" in Chinese. A computer can tell very easily from the spaces that "student" is one word, but it cannot easily tell that the two characters "学" and "生" together form a single word. Cutting a sequence of Chinese characters into meaningful words is Chinese word segmentation; some people also call it word cutting. For "我是一个学生", the segmentation result is: 我 / 是 / 一个 / 学生.
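To make the idea concrete, here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based segmentation methods. It is only an illustration: the six-entry dictionary is made up for this example, and nothing here is claimed to be the method used by any of the modules listed in this post. The input is the Chinese sentence 我是一个学生 ("I am a student").

```python
# Toy dictionary, invented for this example; a real segmenter would load
# a lexicon with tens of thousands of entries.
TOY_DICT = {"我", "是", "一个", "学生", "学", "生"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)


def forward_max_match(text, dictionary=TOY_DICT, max_len=MAX_WORD_LEN):
    """Greedy left-to-right segmentation.

    At each position, take the longest dictionary word that starts there.
    A single character is always accepted as a fallback, so characters
    missing from the dictionary never stall the loop.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    print(" / ".join(forward_max_match("我是一个学生")))
    # prints: 我 / 是 / 一个 / 学生
```

Because the matcher prefers the longest entry, it picks 学生 as one word even though 学 and 生 are also in the dictionary. Greedy matching fails on genuinely ambiguous strings and on words missing from the dictionary, which is one reason statistical systems exist.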
Chinese word segmentation is the foundation of other Chinese information processing, and search engines are just one of its applications. Others, such as machine translation (MT), speech synthesis, automatic classification, automatic summarization, and automatic proofreading, all depend on it as well.
Most of the work on Chinese word segmentation is currently done at research institutes and universities. Tsinghua University, Peking University, the Chinese Academy of Sciences, Beijing Language Institute, Northeastern University, IBM Research, Microsoft Research China, and others all have their own research teams. Among commercial companies, apart from Massive Technology (Hylanda), almost none specialize in Chinese word segmentation.
Google's Chinese word segmentation uses technology provided by a company called Basis Technology (http://www.basistech.com). Baidu uses segmentation technology developed in-house. Zhongsou uses segmentation technology provided by the domestic company Massive Technology (Hylanda, http://www.hylanda.com). Industry commentary currently regards Massive Technology's segmentation as the best Chinese word segmentation technology in China, with a segmentation accuracy above 99%, which in turn keeps the error rate of Zhongsou's search results low. (The above content is extracted from Appendix 1)
Second, the Institute of Computing Technology's Chinese lexical analysis system ICTCLAS
Building on many years of research, the Institute of Computing Technology of the Chinese Academy of Sciences developed ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), a Chinese lexical analysis system based on a multi-layer hidden Markov model. Its functions are Chinese word segmentation, part-of-speech tagging, and recognition of unregistered (out-of-vocabulary) words. Segmentation accuracy is as high as 97.58% (according to the most recent evaluation by the 973 expert group). Unregistered word recognition based on role tagging achieves a recall above 90%, and recall for Chinese personal names is close to 98%. Segmentation plus part-of-speech tagging runs at 31.5 KB/s. ICTCLAS and 14 other results released for free by the institute have been widely reported by Chinese and foreign media, and many free Chinese word segmentation modules in China draw on ICTCLAS to a greater or lesser extent.
Download Page: http://www.nlp.org.cn/project/project.php?proj_id=6
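The post does not say how the accuracy and recall figures quoted for ICTCLAS were measured. As a rough illustration only, one common way to score a segmenter is to compare the character spans of its output words with those of a hand-segmented reference. The function names and the sample data below are invented for this sketch.

```python
def to_spans(words):
    """Turn a word list into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall(gold_words, predicted_words):
    """Word-level precision and recall of a predicted segmentation.

    A predicted word counts as correct only if a gold word covers
    exactly the same character span.
    """
    gold = to_spans(gold_words)
    predicted = to_spans(predicted_words)
    correct = len(gold & predicted)
    return correct / len(predicted), correct / len(gold)


if __name__ == "__main__":
    gold = ["我", "是", "一个", "学生"]             # reference segmentation
    predicted = ["我", "是", "一个", "学", "生"]    # output that wrongly splits 学生
    p, r = precision_recall(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f}")
    # prints: precision=0.60 recall=0.75
```

Here three of the five predicted words match the reference, so precision is 3/5, and three of the four reference words are recovered, so recall is 3/4.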
ICTCLAS is written in C, which is not very convenient to use from today's mainstream development tools, so some enthusiastic programmers have ported it to other languages such as Java and C#.
(1) Fenci, a Java version of ICTCLAS. Download page: http://www.xml.org.cn/printpage.asp?boardid=2&id=11502
(2) AutoSplit, another Java version of ICTCLAS. I can no longer find its download page.
(3) Xiaoxi Chinese word segmentation. It once had a download page, but I can no longer find it. According to the author, it is derived from ICTCLAS and comes in three versions: Java, C#, and C++. Introduction page: http://www.donews.net/accesine
Third, the Massive (Hylanda) intelligent word segmentation research edition
To let researchers in Chinese information processing share its results and to help raise the overall level of the field, the Massive Intelligent Computing Technology Research Center has released the "Massive Intelligent Word Segmentation Research Edition" for experts, scholars, and enthusiasts to study.
Download Page: http://www.hylanda.com/cgi-bin/download/download.asp?id=8
Fourth, others
(1) CSW Chinese intelligent word segmentation component
Operating environment: Windows NT, 2000, XP, or higher. It can be called from ASP, VB, and other Microsoft development languages.
Introduction: The CSW Chinese intelligent word segmentation DLL component automatically splits a piece of text into conventional Chinese phrases, separates them with a specified delimiter, and can annotate the resulting phrases with semantic and word-frequency labels. It is widely used for information retrieval and analysis across many industries.
Download Page: http://www.vgoogle.net/
(2) A Chinese word segmentation component written in C#
According to the author, it is a single DLL file that can segment both Chinese and English text. It is written entirely in managed C# code and was developed independently.
Download page: http://www.rainsts.net/Article.asp?id=48
Appendix: 1. Winter; Chinese search engine technology revealed: Chinese word segmentation; http://www.e800.com.cn/Articles/98/1091788186451.html