Full-text search and XML

xiaoxiao, 2021-03-06


(Ksu99@163.com) Adapted from the XQuery 1.0 and XPath 2.0 Full-Text W3C Working Draft, 09 July 2004 (http://www.w3.org/tr/2004/wd-xquery-full-text-20040709/)

An XML document may contain highly structured data (numbers, dates), unstructured data (untagged free-flowing text), and semi-structured data (text with embedded tags). When a document contains unstructured or semi-structured data, it is important to be able to search it with information retrieval techniques such as full-text search. Full-text search differs from substring search in many ways:

Full-text search looks for phrases (sequences of words) rather than substrings. A substring search for news items containing the string "lease" will return a news item containing "FooBar Corporation Releases the 20.9 Version ...", whereas a full-text search for the phrase "lease" will not.
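The difference between substring search and phrase search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The regex-based `tokenize` helper, which lowercases runs of word characters and drops punctuation and spaces, is a hypothetical stand-in for a real tokenizer, since the draft leaves word boundaries implementation-defined.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of word characters;
    # punctuation and whitespace are discarded.
    return re.findall(r"\w+", text.lower())


def substring_match(text, needle):
    # Plain substring test: "lease" is found inside "releases".
    return needle.lower() in text.lower()


def phrase_match(text, phrase):
    # Phrase test: the phrase's words must appear as consecutive whole words.
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


headline = "FooBar Corporation Releases the 20.9 Version ..."
print(substring_match(headline, "lease"))  # True
print(phrase_match(headline, "lease"))     # False
```

A real engine would index tokens once instead of re-tokenizing per query. The point here is only that phrase search compares whole words, not characters.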

Full-text search is expected to support language-based and token-based searches, which substring search cannot. An example of a language-based search is "find all news items containing a word with the same linguistic stem as 'mouse'" (which finds both "mouse" and "mice"). An example of a token-based search is "find all news items containing the word 'XML' within 3 words (tokens) of the word 'Query'."

Full-text search is subject to the vagaries and nuances of language, and the results it returns vary in usefulness. Searching a website for all cameras that cost less than $100 is an exact search: one set of cameras matches, and another set does not. Similarly, a string search over news items for "mouse" has exactly one expected result set. A full-text search for all news items containing the word "mouse", however, is expected to find items containing "mice" as well, and possibly "rodents" (or even "computers"!). But not all results are equal: some are more "mousey" than others. Because full-text search can be inexact, it comes with a notion of score, or relevance: we generally expect the most relevant results to appear at the top of the result list. Of course, relevance is in the eye of the beholder.

Note: as XQuery/XPath evolves, the notion of score may also be applied to queries over structured data. For example, when making travel plans or shopping for a camera, an ordered list of near matches is sometimes useful. If XQuery/XPath defines a generalized inexact match, we expect it to be able to use the scoring framework provided by the full-text language.
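To show how stem-based matching leads naturally to ranked rather than yes/no results, here is a toy Python sketch. Everything in it is an assumption for illustration. The `STEMS` table stands in for a real stemmer or lemmatizer. The score (the fraction of tokens that share the query word's stem) is an invented formula, not one taken from the draft.

```python
import re

# Hypothetical stem table standing in for a real stemmer/lemmatizer:
# each surface form maps to a shared stem.
STEMS = {"mouse": "mouse", "mice": "mouse", "mouses": "mouse"}


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def stem(word):
    # Unknown words are treated as their own stem.
    return STEMS.get(word, word)


def score(text, query_word):
    """Toy relevance score: fraction of tokens sharing the query word's stem."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    target = stem(query_word.lower())
    hits = sum(1 for t in tokens if stem(t) == target)
    return hits / len(tokens)


def search(items, query_word):
    """Return (score, item) pairs with a non-zero score, most relevant first."""
    scored = [(score(item, query_word), item) for item in items]
    return sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)


items = ["Mice invade the lab", "A mouse and two mice", "New camera under $100"]
for s, item in search(items, "mouse"):
    print(f"{s:.2f}  {item}")
```

The camera item drops out entirely, while the two mouse items are ordered by how "mousey" they are. Finding "rodents" would need a thesaurus lookup on top of stemming, which this sketch does not attempt.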

As XML becomes mainstream, users expect to be able to store and search all of their documents in XML. This requires a standard way to perform full-text search, as well as structured search, over XML documents. A similar need for full-text search led ISO to define the SQL/MM-FT standard. SQL/MM-FT extends SQL to express full-text queries. It provides functionality similar to what the full-text language described here adds to XQuery 1.0 and XPath 2.0.

Full-text queries are performed on tokenized text, that is, text that has been broken into a sequence of words, units of punctuation, and spaces.
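A tokenizer of this kind can be sketched with a single regular expression that classifies every character run as a word, a space, or punctuation. This is a minimal Python sketch. The pattern and the three unit names are assumptions chosen for illustration, not definitions from the draft.

```python
import re

# Hypothetical token pattern: a run of word characters, a run of whitespace,
# or any single other character (treated as punctuation).
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def tokenize(text):
    """Split text into (kind, value) units: 'word', 'space' or 'punct'."""
    # Exactly one named group matches per unit, so lastgroup names its kind.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


print(tokenize("Hello, world!"))
```

Every character falls into one of the three alternatives, so joining the values reproduces the input exactly. That property is convenient when match positions must be mapped back to the original text.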

A word is defined as any character, n-gram, or sequence of characters that a tokenizer returns as a basic unit to be queried. Each instance of a word consists of one or more consecutive characters. Beyond that, what counts as a word is implementation-defined. Note that consecutive words need not be separated by punctuation or spaces, and words may overlap. A phrase is an ordered sequence of any number of words. Tokenization enables functions and operators that work with the relative positions of words (for example, proximity operators). It also uniquely identifies the sentences and paragraphs in which words occur. Finally, tokenization enables functions and operators that work on part of a word or on its root (for example, wildcards and stemming).
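The following Python sketch shows how positions recorded at tokenization time support the operators mentioned above. It is a hedged illustration built on several assumptions:

- Sentences are assumed to end at `.`, `!` or `?`.
- The hypothetical `within_words` counts *intervening* words, which is one plausible reading of "within 3 words".
- `wildcard_words` uses shell-style patterns from Python's `fnmatch`, not the draft's own wildcard syntax.

```python
import re
from fnmatch import fnmatchcase


def positioned_words(text):
    """Return (word, position, sentence) triples.

    Assumptions: sentences end at '.', '!' or '?'; words are runs of
    word characters, lowercased; positions count words across the text.
    """
    result = []
    position = 0
    for sentence_no, sentence in enumerate(re.split(r"[.!?]+", text)):
        for word in re.findall(r"\w+", sentence.lower()):
            result.append((word, position, sentence_no))
            position += 1
    return result


def within_words(text, a, b, max_between):
    """True if words a and b occur with at most max_between words between them."""
    words = positioned_words(text)
    pos_a = [p for w, p, _ in words if w == a.lower()]
    pos_b = [p for w, p, _ in words if w == b.lower()]
    return any(x != y and abs(x - y) - 1 <= max_between for x in pos_a for y in pos_b)


def wildcard_words(text, pattern):
    """Words matching a shell-style wildcard pattern such as 'quer*'."""
    return [w for w, _, _ in positioned_words(text) if fnmatchcase(w, pattern.lower())]


text = "Query languages for XML are maturing. Search is next."
print(within_words(text, "XML", "Query", 3))  # True: two words in between
print(wildcard_words(text, "quer*"))          # ['query']
```

Because each word carries both a position and a sentence number, the same index could also answer "same sentence" queries. This sketch does not add such an operator.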

We use the namespace prefix "ft" (for full-text), bound to the namespace URI http://www.w3.org/2004/07/xquery-full-text, for functions defined in the full-text namespace. We also use the prefix "fts" for definitions in the semantics sections.

Original article (please credit when reposting): https://www.9cbs.com/read-76526.html
