New Direction of Information Search Technology Application: Popular Retrieval and Knowledge Retrieval

zhaozj2021-02-16  53

New Direction of Information Retrieval Technology Application: Popular Retrieval and Knowledge Retrieval [2001-09-26] Water

Information retrieval and the development of full text retrieval

Finding information quickly, accurately, and comprehensively is especially important in the era of the knowledge economy. In recent years, information retrieval technology has developed rapidly. Chinese full-text search technology deserves particular mention: it has advanced very quickly, and domestically developed products have captured most of the market, which is a rare and commendable achievement in a core technology field. The well-known full-text search system TRS has achieved excellent results in government, enterprise, media, and education, with a market share of more than 70%. Full-text retrieval technology is now mature and widely used.

The development and shortcomings of search engine technology

The development of the Internet has greatly promoted the development and application of information retrieval technology. A large number of search engine products have appeared, giving Internet users fast tools for acquiring information and navigating the network; the best-known search engines include Google and AltaVista. Search engine services and search engine technology are entirely different concepts: every portal offers a search engine service, but the technology behind it is something the ordinary user does not see. Search engine technology is also widely used, but information on the Internet differs from the internal information of a typical enterprise, and two key issues need to be resolved. The first is speed: the index of a traditional information retrieval system is generally on the order of gigabytes, whereas Internet web search must cover tens of millions or even hundreds of millions of web pages. The second is relevance: because there is so much information, precision and ranking are particularly important. The basic strategy for the first issue is to use clusters of retrieval servers; approaches to the second involve link analysis techniques such as those developed by Google and Baidu.
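The text names link analysis without describing a specific algorithm. As a rough illustration only, the sketch below uses a PageRank-style power iteration, one publicly documented form of link analysis, in which a page gains importance from the pages that link to it. The function name `link_rank`, the damping value 0.85, the iteration count, and the sample graph are all assumptions made for this example, not details from the article.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links using a PageRank-style power iteration.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with equal importance for every page.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph: three pages link to "c", and "c" links to "a".
sample_links = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
for page, score in sorted(link_rank(sample_links).items()):
    print(page, round(score, 3))
```

In this sample, the page with the most incoming links ends up with the highest score, which is the behaviour the article relies on when it describes link counts as a measure of importance.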

Internet web search engines currently face three main challenges. The first is that search quality still needs to improve. At present, search engines contain essentially no intelligent technology; claims such as "Sun Wukong's natural language search" are basically marketing, with no substantial natural language processing or understanding behind them. The second is knowledge compression: too much information on the Internet is duplicated and must be deduplicated, which some experts call knowledge compression. The third is how to move into enterprise applications. Search engines largely grew up with the .com wave, so with the .com collapse the companies developing search engine technology have also fallen into difficulty. Abroad, AltaVista for example has not been listed, and domestic search engine companies are probably no exception, so everyone is turning to the enterprise market (the traditional market) in search of real profit.

However, some advantages of search engine technology often do not carry over to enterprise applications and can even become disadvantages, making the search technology incomplete, heuristic, and unstable. Take ranking: enterprise applications require results to be ranked by content, and link analysis does not work for ranking their query results. Link analysis treats the number of times a page is linked to as a measure of its importance, but the links to a page inside a single website are determined by the site's content editing system, so the link count is entirely incidental and cannot serve as a basis for judging importance.
As another example, search results are required to be stable, but search engines often fail to achieve this. In many search engine applications, the retrieval strategies (one could call them tricks) and indexing methods adopted for large-scale web page collections often lead to unstable and incomprehensible search results. For example, on one portal's search engine we looked for the China mirror of www.shuku.net. The retrieval expression "Mirror China WWW Shuku Net" returned no results, yet the queries "WWW Shuku Net", "China WWW Shuku Net", and "Mirror WWW Shuku Net" returned 368, 368, and 361 results respectively, which is clearly wrong. This is unacceptable to enterprise users.

In addition, search engine applications generally use server cluster technology, which is unsuitable and unnecessary for most enterprise applications. The index and the service of a search engine are separated, which cannot accommodate the dynamic growth and modification of data in enterprise applications. Internet search engines are based on the file system, whereas the content of enterprise applications is generally stored securely and centrally in a data warehouse. Advanced search engine technology therefore has many limitations in enterprise applications.

Ubiquitous information retrieval: pervasive retrieval

On the one hand, information is growing explosively and becoming more diverse; on the other hand, business decisions increasingly depend on timely and accurate information. The direction for applying information retrieval is therefore ubiquitous information retrieval, that is, pervasive (popularized) content retrieval. The main directions for pervasive retrieval applications are:

1. Content retrieval engines built into operating systems. The upcoming Windows XP operating system will have a built-in advanced search engine that helps users quickly search for and locate files on the hard disk, and it is characterized by support for documents in a variety of formats.

2. Content search engines in various electronic publications and information reading tools, such as CD publications, eBook readers, PDAs, and even content retrieval interfaces.

3. Search in massive databases or digital library applications.

4. Web information retrieval. This covers traditional search engine applications and in-site search for specific websites.

5. Intelligent information retrieval in e-commerce applications, providing strong content integration and search capabilities for B2C and B2B applications and thereby improving customer satisfaction in e-commerce applications.

Pervasive retrieval places new requirements on search technology, including: scalability (from as small as a PDA to as large as whole-Internet search and corporate data warehouses); support for standards (such as XML, J2EE, and Z39.50); hybrid search capabilities; and usability (for everyone from experts to general consumers).

Search technology provides the engine for content management

For pervasive retrieval to succeed and create profit for enterprises, a major challenge is to integrate it closely with content management, so that it becomes the core engine in the content management value chain. Content management has emerged with the spread of the Internet and the development of e-commerce. Content is an extension of the traditional concepts of data and information: a content management system must handle things beyond traditional data and information, such as site pages, the template files that define a site's style, or even whole websites. For a content management system whose content has grown greatly in both quantity and variety, the absence of a powerful search engine would leave it incomplete, indeed unimaginable. As the core of content management, search must offer different upper-level service interfaces to suit different user needs and different data objects: some take the form of queries over structured data, some take the form of document or knowledge base search, and some take the form of personalization. These different forms mainly serve the needs of different types of users.

Knowledge retrieval: the focus and direction of information retrieval technology development

Full-text search solves the problem of querying general unstructured text and effectively addresses the inability of relational database management systems to query unstructured information. However, the effectiveness of full-text retrieval needs further improvement, as does its ability to adapt to different applications, and the core of this is to develop knowledge retrieval. The development of knowledge retrieval should effectively solve several key issues:

1. Hybrid retrieval of structured and unstructured data. In e-commerce applications, the system usually needs to solve the hybrid search problem over structured and unstructured data efficiently. For example, in a talent database query, beyond querying some attributes of a candidate, it is more important to query the contents of the candidate's resume. Although some products offer hybrid retrieval functions, the core data model has not been well resolved and needs further development.

2. Semi-structured content retrieval: an XML content retrieval engine. XML has become a standard for describing and exchanging data, so exploiting the semi-structured nature of XML can achieve better retrieval results than traditional full-text retrieval.

3. Intelligent knowledge retrieval. Intelligent retrieval is often presented misleadingly by some vendors: for example, a search for "Chinese" does not retrieve content that contains "People's Republic", or a search for "Computer" can retrieve content containing "Computer". These are only the primary stage of intelligent retrieval. Intelligent knowledge retrieval should pay more attention to text mining, and we believe that an intelligent retrieval system should include at least the following functions:

(1) Chinese word segmentation backed by a large-scale instance library.

(2) Subject thesaurus support, general synonym handling, pinyin retrieval, and homophone retrieval.

(3) Content-based similarity retrieval, automatic classification (automatic clustering), and automatic summarization, together with knowledge compression and deduplication.

(4) Text mining functions, such as understanding of numbers and learning of new words.

(5) Intelligent agents for automatic and self-service retrieval.

Knowledge retrieval depends on breakthroughs in linguistics and cannot be achieved overnight. IBM, Microsoft Research China, and a number of universities and research institutes have invested heavily in this area. In terms of productization, Elbao, following its work on Chinese full-text retrieval, has recently achieved significant breakthroughs in knowledge retrieval research and development. It was the first to introduce practical Chinese automatic classification, automatic summarization, automatic deduplication, and similarity retrieval technologies, laying a solid core competitive advantage for Chinese content management technology and products.
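The content-based similarity retrieval and deduplication ("knowledge compression") functions listed in item (3) can be illustrated with a very small sketch. It is only one simple possibility, not the method used by any product named in the article: it compares texts by their overlapping character shingles and the Jaccard coefficient, which works on Chinese text without word segmentation. The function names, the shingle length `k=2`, and the threshold `0.8` are assumptions chosen for this example.

```python
def shingles(text, k=2):
    """Return the set of overlapping k-character shingles of a text."""
    # Whitespace is dropped so that spacing differences do not matter.
    cleaned = "".join(text.split())
    if len(cleaned) < k:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + k] for i in range(len(cleaned) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets: shared items over all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(query, documents, k=2):
    """Rank documents by shingle similarity to the query, best first."""
    query_shingles = shingles(query, k)
    scored = [(jaccard(query_shingles, shingles(doc, k)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def deduplicate(documents, threshold=0.8, k=2):
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept = []
    kept_shingles = []
    for doc in documents:
        doc_shingles = shingles(doc, k)
        if all(jaccard(doc_shingles, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(doc_shingles)
    return kept


# Hypothetical sample collection containing one near-duplicate.
docs = ["full text retrieval", "full text  retrieval", "link analysis"]
print(deduplicate(docs))
print(most_similar("text retrieval", docs)[0][1])
```

A production system would need far more than this, for example the word segmentation and thesaurus support listed in items (1) and (2), but the sketch shows why similarity retrieval and deduplication can share the same underlying comparison.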

When reposting, please cite the original address: https://www.9cbs.com/read-19740.html
