Word-based text analysis technology
Source: http://www.kmcenter.org/blog/more.asp?name=crop&iD=204
Word-based text analysis
1 Core technology:
1.1 Use words to analyze text content and extract its key elements.
1.2 Self-learning of the word library.
1.3 A unique word technique integrated with the technology as a whole.
2 Technical features:
2.1 Because the word library is self-learning, there is no need to build a large dictionary in advance. Most importantly, learning is continuous and runs on its own: the system's understanding changes as the outside world changes, and new text can be added at any time. This avoids the shortcomings of the sample-based learning methods in common use today, which require a great deal of manual intervention (they cannot work without it) and cannot update their knowledge in time.
2.2 The same word can mean different things to different people, so the system can build a personal sub-library of words from each user's habits and use it to analyze the text a second time, producing personalized results.
2.3 System learning takes two forms, knowledge learning and experience learning:
2.3.1 Knowledge learning: the system runs automatically on the Internet, absorbs information of all kinds with no specific target, analyzes what it collects, and stores it as knowledge. The whole process needs no manual intervention and can run 24 hours a day.
2.3.2 Experience learning: the results of each user's actual use are also stored as experience and are used to correct the results of knowledge learning.
2.4 The word technique does not pursue 100% accuracy; it emphasizes practicality and speed. It does not rely on a huge vocabulary or knowledge base, so it is not tied to any specific domain, and it can handle ordinary words as well as personal names, place names, and new words. These are hard to handle with traditional word segmentation methods; new vocabulary in particular is a problem almost everywhere in the world.
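The source does not describe how the word library learns, so the following is only a minimal sketch of one common dictionary-free approach: counting character n-grams that recur across incoming text and treating the frequent ones as candidate words. The class name `SelfLearningLexicon`, its parameters, and the sample sentences are hypothetical, and the simple frequency threshold stands in for whatever criterion the real system uses.

```python
from collections import Counter


class SelfLearningLexicon:
    """Hypothetical sketch: collects candidate words from raw text, with no seed dictionary."""

    def __init__(self, max_len=4):
        self.max_len = max_len   # longest candidate, in characters
        self.counts = Counter()  # n-gram -> number of occurrences seen so far

    def learn(self, text):
        """Absorb one more text; may be called again whenever new text arrives."""
        run = []
        for ch in text + " ":    # trailing space flushes the final run
            if ch.isalnum():
                run.append(ch)
            else:
                # Candidates never span punctuation or whitespace.
                self._count_run("".join(run))
                run = []

    def _count_run(self, run):
        for n in range(2, self.max_len + 1):
            for i in range(len(run) - n + 1):
                self.counts[run[i:i + n]] += 1

    def candidates(self, min_count=2):
        """N-grams seen at least min_count times, most frequent first.

        A shorter n-gram is dropped when a longer frequent n-gram contains it
        with the same count, since it then never occurs on its own.
        """
        frequent = {g: c for g, c in self.counts.items() if c >= min_count}
        keep = [
            g for g, c in frequent.items()
            if not any(g != h and g in h and frequent[h] == c for h in frequent)
        ]
        return sorted(keep, key=lambda g: (-frequent[g], g))


lex = SelfLearningLexicon()
# Invented sample sentences: "I like blogs", "blogs are new", "write a blog".
for sentence in ["我喜欢博客", "博客很新", "写博客"]:
    lex.learn(sentence)
print(lex.candidates())  # ['博客'] ("blog"), learned without any dictionary
```

Because `learn` only increments counters, it can be called again as new text arrives, which mirrors the continuous learning described in 2.1.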
2.5 The core algorithm is not based on dictionaries or grammar; it imitates the way people come to understand language. A child knows nothing of dictionaries or grammar, yet can understand what others say. So with only slight changes to the core, the technique can be applied to English and other languages, just as a baby will learn the local language of whichever country you put him in.
3 Applications:
3.1 Relevance search:
3.1.1 Going beyond keyword-based full-text search. For example, an article about computers may contain few or even none of the words software, hardware, program, or memory, and may refer to the machine by some word other than "computer". When a user searches for "computer", the system can tell from the key elements extracted from the article that it is closely related to that word, so it includes the article in the results. It can also avoid returning wrong results, such as putting an article about "Apple Pigment" in the results of a search for "Apple" when the two are in fact different things.
3.1.2 Ranking search results this way is clearly the most scientific, because results are sorted by how well the keywords match the meaning of the article rather than by factors outside the text, such as links.
3.1.3 Personalized rankings can also be produced according to who submits the query, because the same keyword means different things to different people. For example, when searching for "football", fans of one league want articles about that league ranked first, while Premier League fans want articles about the Premier League ranked first.
3.1.4 Suggestions for refining a search. When a user searches with one word, the system can offer related words to help the user retrieve more of the content they need.
For example, after a search for "space", it can offer a series of related words such as universe, galaxy, earth, sun, spacecraft, and astronomy.
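One way to suggest related words that do not contain the query string is to count how often terms appear together in the same document. The sketch below assumes each document has already been reduced to a list of key terms. The function names `build_cooccurrence` and `related_words` and the sample data are hypothetical, and the source does not say that the system works this way.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """docs: iterable of keyword lists, one list per document (assumed already extracted)."""
    co = defaultdict(Counter)
    for keywords in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(keywords)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def related_words(co, query, top_n=5):
    """Terms that most often share a document with `query`, most frequent first."""
    neighbours = co.get(query, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Invented sample data: key terms of four documents.
docs = [
    ["space", "universe", "galaxy"],
    ["space", "universe", "spacecraft"],
    ["space", "astronomy", "universe"],
    ["gold", "metallurgy"],
]
co = build_cooccurrence(docs)
print(related_words(co, "space", top_n=3))  # ['universe', 'astronomy', 'galaxy']
```

The suggestions come from shared context rather than shared characters, so "universe" and "astronomy" can be offered for "space" even though neither contains the query string.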
Current search engines can only suggest terms that contain the original keyword. For example, a search for Jinxin Company brings up "Gold Information" and "Metallurgical Information" among the related terms. These are obviously unrelated, and even the largest search engine, Baidu, has not solved this problem.
3.1.5 Natural-language query retrieval: users can ask in natural language, for example "Which brand of computer has good quality?", rather than having to use one or a few words, which better matches everyday habits. Google currently has this feature, but its technique is very simple and therefore causes many mismatches. Take the "Apple" and "Apple Pigment" example again. A general search engine (one without natural-language questions) that is asked for "Apple Pigment" retrieves "Apple Pigment" and does not return plain "Apple", so no mistake occurs. Google simply splits "Apple Pigment" into "Apple" and "Pigment" and retrieves each word separately, so results for "Apple" wrongly appear in the results. From this point of view it does worse than a general search engine. Word-based text analysis supports natural-language queries while avoiding the kind of error that Google makes.
3.1.6 Market strategy for search products: Google and Baidu are huge, and as competitors they are too powerful, so general web search should not be the main direction. By comparison, competition in site search is much weaker, the market prospects are broad, and there is a great deal of room. For many websites today, especially e-commerce and supply-and-demand information sites (such as Alibaba, China.alibaba.com), search speed is not what matters: finding tens of thousands of results in a few seconds is meaningless to them. What they want is accuracy, that is, finding the information most useful to merchants. Many websites are currently looking for such technologies and products, but because artificial intelligence is difficult, few such products exist.
3.2 Automatically finding the content a user needs based on their interests and habits. The information people want because of their interests and habits is often hard to express in one or two keywords. With the technology above, the system can infer a user's interests and habits from the pages they browse, and then automatically present their favorite content when they enter the website. Because this analysis is continuous, it can track changes in the user's habits and interests as they happen. Many people have tried to build this feature. For example, Zhang Shuxin, general manager of Yinghaiwei, led a team to Runxun to build such a website, but the effort failed because there was no technology for analyzing articles. Microsoft's MSN and Yahoo have also launched similar features, "My MSN" (http://china.msn.com/help/default.asp?ihelppageid=0) and "My Yahoo" (http://help.yahoo.com/help/gb/my/my-01.html), but both require users to set keywords, which raises three problems:
3.2.1 The biggest problem is that users find it troublesome and do not want to use it.
3.2.2 As mentioned above, habits and interests cannot be captured in a few keywords, and sometimes users themselves do not know which keywords would summarize them.
3.2.3 Habits and interests change over time, and the change is usually subtle. Users rarely notice it themselves and so do not update the keywords they have set; even when they do notice, they often forget or cannot be bothered to change them.
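The idea of learning interests from browsing instead of from user-set keywords can be sketched as a weighted term profile that is updated on every page view and decays over time, so that it follows gradual changes in taste. This is only an illustration under stated assumptions: the class `InterestProfile`, the `decay` parameter, and the sample pages are hypothetical, and the key terms of each page are assumed to have been extracted elsewhere.

```python
from collections import Counter


class InterestProfile:
    """Hypothetical sketch of a per-user interest profile built from browsing."""

    def __init__(self, decay=0.9):
        self.decay = decay        # below 1.0, so older interests fade gradually
        self.weights = Counter()  # term -> current interest weight

    def observe(self, page_keywords):
        """Update the profile after the user views one page."""
        for term in list(self.weights):
            self.weights[term] *= self.decay
        for term in set(page_keywords):
            self.weights[term] += 1.0

    def score(self, item_keywords):
        """Sum of the user's interest weights over the item's key terms."""
        return sum(self.weights.get(term, 0.0) for term in set(item_keywords))

    def rank(self, items):
        """items: dict of title -> keyword list; best-matching titles first."""
        return sorted(items, key=lambda title: (-self.score(items[title]), title))


# Invented browsing history: the user drifts from one league to another.
profile = InterestProfile(decay=0.5)
profile.observe(["football", "serie a"])
profile.observe(["football", "premier league"])
profile.observe(["premier league", "transfer"])

articles = {
    "Serie A roundup": ["football", "serie a"],
    "Premier League preview": ["football", "premier league"],
}
print(profile.rank(articles))  # ['Premier League preview', 'Serie A roundup']
```

No keyword settings are required from the user, and the decay factor lets recent browsing outweigh older habits, which addresses the three problems listed in 3.2.1 to 3.2.3.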