0, Foreword
This article mainly describes the concepts involved in using the full-text retrieval features of Exchange Server and SharePoint Portal Server in a Windows 2000 Server environment, together with some related concepts from SQL Server.
This article explains only some basic concepts, based mainly on personal understanding and examples. For detailed information, follow the links in the reference documentation. Corrections for any errors and shortcomings in the article are welcome.
1. Basic concept
I believe many people are amazed by full-text search: when Exchange finds content inside a PDF document attached to a message, Exchange seems very powerful. In fact, Exchange itself does almost nothing here (it does do something, as discussed below). Full-text search is implemented by another important service, the "MS Search Service"; Exchange only needs to interact with MS Search to gain a full-text search function. This is why, among the MS enterprise servers, Exchange, SharePoint Portal Server and SQL Server all support full-text retrieval.
Windows also has a service called Index Service that does the same job. With it, even the Windows file system and IIS can perform full-text retrieval, which is very convenient.
I don't know whether Index Service can be called a special case of MS Search Service; I cannot see what differs between them in architecture or function. This is one of the points I am still confused about.
To discuss full-text retrieval, the concepts must first be clear, so that we know what each service does and at which level we need to develop when implementing a specific function.
Full-text search is divided into two parts: the full-text index and the full-text query. We know that to improve query efficiency there is a mechanism called an "index"; the index stores each keyword together with the logical storage location of the corresponding record. A database management system (DBMS) has index tables for this purpose, and full-text search works on the same principle.
Full-text index: the process of creating an index, which establishes the correspondence between keywords and records. Index information can be built completely or modified incrementally.
Full-text catalog: can be regarded as the organization in which keywords are stored.
Full-text query: uses the index catalog to find the corresponding records by keyword.
In the MS product family, the main job of MS Search Service and Index Service (collectively called the Search Service below) is to create and maintain index tables and index catalogs; they can be called search engines. Products such as Exchange and SQL Server provide the storage for records and can collectively be called storage engines; a storage engine can use MS Search as long as it supports certain MS Search interfaces. During full-text indexing, MS Search sends requests to the storage engine, obtains the data that needs indexing, extracts keywords, and creates or maintains index information. During a full-text query, the storage engine sends a request to MS Search, MS Search returns the locations of the records that match the keywords, and the storage engine assembles those records and returns them to the caller.
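The division between index and query described above can be illustrated with a toy inverted index. This is only a minimal sketch for intuition, not how MS Search is implemented; the class name `ToyFullTextIndex`, its methods, and the sample records are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a real search service also handles filters,
// word breaking, persistence and incremental updates.
public static class ToyFullTextIndex
{
    // Full-text index: map each lowercased keyword to the ids of the records containing it.
    public static Dictionary<string, SortedSet<int>> BuildIndex(IDictionary<int, string> records)
    {
        var index = new Dictionary<string, SortedSet<int>>();
        foreach (var record in records)
        {
            string[] words = record.Value.ToLowerInvariant().Split(
                new[] { ' ', ',', '.', ';', ':', '!', '?' },
                StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                SortedSet<int> ids;
                if (!index.TryGetValue(word, out ids))
                {
                    ids = new SortedSet<int>();
                    index[word] = ids;
                }
                ids.Add(record.Key);
            }
        }
        return index;
    }

    // Full-text query: look the keyword up in the index instead of scanning every record.
    public static IEnumerable<int> Query(Dictionary<string, SortedSet<int>> index, string keyword)
    {
        SortedSet<int> ids;
        return index.TryGetValue(keyword.ToLowerInvariant(), out ids)
            ? (IEnumerable<int>)ids
            : new int[0];
    }

    public static void Main()
    {
        var records = new Dictionary<int, string>
        {
            { 1, "Exchange stores messages and attachments." },
            { 2, "MS Search builds the index." },
            { 3, "The index maps keywords to messages." }
        };
        var index = BuildIndex(records);
        Console.WriteLine(string.Join(",", Query(index, "index")));    // prints 2,3
        Console.WriteLine(string.Join(",", Query(index, "messages"))); // prints 1,3
    }
}
```

In this sketch the dictionary of records plays the role of the storage engine, and `BuildIndex` and `Query` play the role of the search engine that returns record locations.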
From this we learn that full-text search requires two things: a full-text retrieval service provider and a storage engine that supports full-text retrieval. On the Windows Server platform with Exchange, both conditions are met, so we can start!
2. Development for full-text search
Having understood the basic architecture of full-text search on the Windows platform, we can carry out development work in several areas.
2-1, Full-text query
A full-text query uses query statements to find the records in the storage engine that meet the conditions. This is probably the application everyone is most familiar with, and it is the focus of this article; a dedicated chapter below covers it in detail.
One thing to point out: a full-text query relies on MS Search and the index data it maintains, but it is a function exposed by the storage engine. The application interacts with the storage engine, and the MS Search engine is transparent to us. Query statements (essentially SELECT) differ between storage engines, but not in any essential way.
2-2, Developing an IFilter for MS Search
The most amazing thing about full-text search is how it can retrieve text strings from binary or specially encoded files such as Word or PDF documents. Another question: does MS Search support all file formats?
MS Search does not support all file formats; however strong MS is, it cannot cover every format in the world. But MS Search provides a mechanism that lets it support documents in any format. This mechanism is the Index Filter, referred to as IFilter. I personally think an IFilter ultimately has two jobs. The first is to read files in the specified format and parse out the internal text content (rather than formatting, graphics, or other embedded binary objects) and the document properties (for example author, category, and so on). The second is word breaking (parsing words or phrases). How well full-text retrieval works depends on whether the keywords are generated reasonably. I have always felt that the existing IFilters are not ideal for Chinese, and searching Chinese text often gives inexplicable results.
Windows provides four IFilters by default: for plain text, for HTML documents, for Office documents, and for MIME files. To support other document types, you must download the IFilter provided by the developer of that document type. For PDF, for example, you need to download it from Adobe's website (http://www.adobe.com/support/downloads/product.jsp?product=1&platform=Windows).
Unfortunately, if your system needs to support documents for which no IFilter exists, or a document format you defined yourself, you need to develop a custom IFilter. How to do so depends on the specific case, so there is not much to say here.
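The two jobs attributed to an IFilter above (extracting text from a formatted document, then breaking it into words) can be pictured with a small toy. This sketch does not implement the real IFilter COM interface; `ToyFilter`, `ExtractText` and `BreakWords` are hypothetical names, and the "format" handled is just text with angle-bracket tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Toy illustration only: not the real IFilter COM interface.
public static class ToyFilter
{
    // Job 1: pull the text content out of a very simple tagged document,
    // discarding the formatting tags.
    public static string ExtractText(string markup)
    {
        return Regex.Replace(markup, "<[^>]*>", " ");
    }

    // Job 2: break the extracted text into lowercase words.
    public static List<string> BreakWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\w+"))
        {
            words.Add(m.Value.ToLowerInvariant());
        }
        return words;
    }

    public static void Main()
    {
        string text = ExtractText("<p>Full-text <b>search</b> works</p>");
        Console.WriteLine(string.Join("|", BreakWords(text))); // prints full|text|search|works
    }
}
```

A word breaker like this one depends on spaces and punctuation between words, so an unbroken run of Chinese characters would come out as a single "word". That may be one reason Chinese text needs a language-aware word breaker to give sensible keywords.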
2-3, Developing a storage engine that supports full-text retrieval
2-1 is concerned with user requirements and is closest to application development; 2-2 is concerned with the stored content; this part is concerned with the storage itself. Simply put, if your job is to develop a storage management system and you want to use an existing retrieval engine (the Search Service), then your system needs to interact with the retrieval engine itself.
As described when the concepts were introduced, this interaction falls into two categories: one for creating or maintaining the index, the other for querying. Looking at the search engine itself, it is divided into Indexing Support and the Index Engine, and Querying Support and the Search Engine. When an index is created, the storage engine deals mainly with the Indexing Support part, and index creation and maintenance are carried out by the Index Engine. During a full-text query, it deals mainly with Querying Support, and the query itself is carried out by the Search Engine. "Support" here means implementing one series of interfaces and calling another. I have never done this myself, so I can say no more than this.
3, Full-text query guide
The following describes in detail how to perform full-text queries on the Exchange Server 2000 platform.
3-1, Configuration
By default, Exchange does not enable full-text retrieval, and some simple configuration is required. The configuration process creates an index for a specified store (Public Store or Private Store).
With the Exchange System Manager management tool, this configuration is easy to complete in only three steps.
a) Select a specific Public Store and run "Create Full-Text Index"; you will need to choose the directory where the index catalog is stored.
b) The full-text index just created is empty, so "Start Full Population" is required. This is a time-consuming operation with no visible effect, but it is precisely because the index information is created in advance that records can be located relatively quickly later.
c) Then open the "Full-Text Indexing" tab and modify the "Update interval" property to select how often the index information is updated automatically. If you choose "Always run", the index is updated in real time, at the cost of relatively high system overhead. You can also update the index information manually with "Start Incremental Population".
d) In addition, on the "Full-Text Indexing" tab, select the "This index is currently available for searching by clients" property, so that queries use the index information created here.
I did not find a programming interface for this part, so it can only be configured manually. For details, please consult the related MS white papers.
3-2, Syntax of the query statement
As mentioned in 2-1, the query statement is provided by the storage engine and is usually an extension of the SELECT statement. The following explains the full-text query using Exchange Store SQL as an example.
The Exchange statement for a full-text query is not complicated. Exchange Store SQL provides the following predicates; including them in the WHERE clause performs full-text retrieval:
CONTAINS: matches the whole word of a keyword. Format:
CONTAINS(["PropertyName" | *,] 'SearchSpec')
If the "PropertyName" part is * (asterisk), the keyword is searched for in all properties. If "PropertyName" is omitted, the keyword is searched for in the body of the message or document. The same applies to the predicates below. The 'SearchSpec' part contains the keywords, which may use the wildcard *, as in "good*". Several keywords can be combined, using the logical operators AND and OR. Each keyword must be enclosed in double quotes ("), otherwise a syntax error occurs.
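Because each keyword must be double-quoted inside a single-quoted search spec, it is easy to get the quoting wrong when building the predicate by hand. The helper below is a minimal sketch that follows the CONTAINS format given above; the class `ContainsPredicate` and its methods are hypothetical, and it rejects keywords containing quote characters because this article does not cover the escaping rules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a CONTAINS predicate in the format described above.
public static class ContainsPredicate
{
    // Wrap one keyword in double quotes, as the search spec requires.
    public static string Quote(string keyword)
    {
        if (string.IsNullOrEmpty(keyword))
            throw new ArgumentException("keyword must not be empty", "keyword");
        if (keyword.IndexOf('"') >= 0 || keyword.IndexOf('\'') >= 0)
            throw new ArgumentException("quote characters are not handled by this sketch", "keyword");
        return "\"" + keyword + "\"";
    }

    // propertyName: null searches the body, "*" searches all properties,
    // anything else is emitted as a quoted property name.
    public static string Build(string propertyName, string logicalOperator, params string[] keywords)
    {
        if (logicalOperator != "AND" && logicalOperator != "OR")
            throw new ArgumentException("operator must be AND or OR", "logicalOperator");
        if (keywords == null || keywords.Length == 0)
            throw new ArgumentException("at least one keyword is required", "keywords");

        var quoted = new List<string>();
        foreach (string keyword in keywords)
        {
            quoted.Add(Quote(keyword));
        }
        string spec = string.Join(" " + logicalOperator + " ", quoted.ToArray());

        string target;
        if (propertyName == null) target = "";
        else if (propertyName == "*") target = "*, ";
        else target = "\"" + propertyName + "\", ";

        return "CONTAINS(" + target + "'" + spec + "')";
    }

    public static void Main()
    {
        Console.WriteLine(Build(null, "OR", "good*", "fine")); // prints CONTAINS('"good*" OR "fine"')
        Console.WriteLine(Build("*", "AND", "java", "vb"));    // prints CONTAINS(*, '"java" AND "vb"')
    }
}
```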
FREETEXT: compared with CONTAINS, it matches loosely on the various forms of a word or set of words in the keywords. Format:
FREETEXT(["PropertyName",] 'SearchSpec')
The emphasis is on matching the variant forms of a keyword, not on matching the keyword as a substring. That is, a search can find a variant form such as 'rose', but searching for 'public' cannot find 'republican'.
FORMSOF: this predicate must be placed inside a CONTAINS or FREETEXT predicate. With this modifier, each variant form of the keyword can be matched; the variant forms of a keyword are determined by the search engine. Format:
FORMSOF(Type, "String" [, "String"])
In Exchange, the value of the Type parameter is INFLECTIONAL.
RANK BY: this predicate is usually used to modify a CONTAINS or FREETEXT predicate and indicates the weight given to the keyword. Format:
RANK BY Clause(Mechanism, Weight)
The Clause parameter is WEIGHT or COERCION. WEIGHT sets a weight; COERCION actually has no documentation. The Mechanism parameter indicates the behavior, such as Weight or Multiply. Weight is a value between 0 and 1.
RANK BY is very useful when the WHERE clause contains several CONTAINS and FREETEXT predicates, because it controls how much each one counts toward a match. After adding the RANK BY predicate, you can read the value of the "urn:schemas.microsoft.com:fulltextqueryinfo:rank" property to compare how well different records match. I don't know the value range of this property, but judging from the data actually obtained, the maximum is 128.
Look at a complete example:
SELECT "DAV:href", "urn:schemas.microsoft.com:fulltextqueryinfo:rank"
FROM SCOPE('DEEP TRAVERSAL OF ""')
WHERE FREETEXT('"Program" OR "Software"') RANK BY WEIGHT(1.0) OR CONTAINS('FORMSOF(INFLECTIONAL, "Java") AND "vb"') RANK BY WEIGHT(0.5)
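A query like the example above can also be assembled in code, which keeps the nested quoting in one place. The following is a minimal sketch under the assumption that only the two properties from the example are selected and only the RANK BY WEIGHT form is used; `FullTextQuery`, `WithWeight` and `Build` are hypothetical names, and the folder URL is a placeholder.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that assembles the example query shown above.
public static class FullTextQuery
{
    // Append RANK BY WEIGHT(w) to a predicate; the weight must lie between 0 and 1.
    public static string WithWeight(string predicate, double weight)
    {
        if (weight < 0.0 || weight > 1.0)
            throw new ArgumentOutOfRangeException("weight", "weight must be between 0 and 1");
        // Invariant culture keeps the decimal point a '.' on every system locale.
        string formatted = weight.ToString("0.0##", CultureInfo.InvariantCulture);
        return predicate + " RANK BY WEIGHT(" + formatted + ")";
    }

    // Build the SELECT statement for a deep traversal of the given folder URL.
    public static string Build(string scopeUrl, string whereClause)
    {
        return "SELECT \"DAV:href\", \"urn:schemas.microsoft.com:fulltextqueryinfo:rank\" "
             + "FROM SCOPE('DEEP TRAVERSAL OF \"" + scopeUrl + "\"') "
             + "WHERE " + whereClause;
    }

    public static void Main()
    {
        string where =
            WithWeight("FREETEXT('\"Program\" OR \"Software\"')", 1.0)
            + " OR "
            + WithWeight("CONTAINS('FORMSOF(INFLECTIONAL, \"Java\") AND \"vb\"')", 0.5);
        // The URL below is a placeholder, not a real server.
        Console.WriteLine(Build("http://server/public/docs/", where));
    }
}
```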
The above covers only the parts related to full-text queries; for the complete Store SQL syntax, consult the documentation in MSDN. By the way, Exchange mainly provides collaboration services rather than document management, so its full-text support is not particularly powerful or flexible. Another MS server product, SharePoint Portal Server, has document management as its main function, and providing retrieval services is one of its important applications, so its full-text retrieval is more powerful. In addition to the predicates above, it provides NEAR, ISABOUT, RANKMETHOD and other terms that give finer control over query conditions. For details, please refer to the relevant documentation.
3-3, ADO & WebDAV
Two methods, ADO or WebDAV, can be used to perform queries. Only the implementation code is given here.
3-3-1, WebDAV
A query is made by sending an HTTP SEARCH request to the URL being queried; the command parameters and the returned data are XML documents in a specific format (see the WebDAV reference manual). An example follows:
// Requires: using System; using System.Diagnostics; using System.IO;
//           using System.Net; using System.Text; using System.Xml;
private XmlDocument SendSearchRequest(string sUrl, string sQuery)
{
    HttpWebRequest oRequest = null;
    HttpWebResponse oResponse = null;
    NetworkCredential oCredential = null;
    Stream oStream = null;
    UTF8Encoding oEncoder = new UTF8Encoding();
    byte[] abData = null;
    XmlDocument xmlDoc = null;

    if (sUrl == null || sUrl == String.Empty) return null;
    if (sQuery == null || sQuery == String.Empty) return null;

    abData = oEncoder.GetBytes(sQuery);
    if (abData == null) return null;

    // The constructor arguments are user name, password and domain; replace with real values.
    oCredential = new NetworkCredential("administrator", "server", String.Empty);
    oRequest = (HttpWebRequest)WebRequest.Create(sUrl);
    if (oRequest != null)
    {
        // preparing search request
        oRequest.ProtocolVersion = HttpVersion.Version11;
        oRequest.Method = "SEARCH";
        if (oCredential != null)
            oRequest.Credentials = oCredential.GetCredential(new Uri(sUrl), String.Empty);
        oRequest.ContentType = "text/xml";
        oRequest.ContentLength = abData.Length;
        oStream = oRequest.GetRequestStream();
        oStream.Write(abData, 0, abData.Length);
        oStream.Close();

        // waiting for response
        try
        {
            oResponse = (HttpWebResponse)oRequest.GetResponse();
            oRequest = null;
        }
        catch (Exception e)
        {
            Trace.WriteLine("SendSearchRequest: " + e.Message);
        }

        if (oResponse != null)
        {
            oStream = oResponse.GetResponseStream();
            // get data from stream
            if (oStream != null)
            {
                try
                {
                    xmlDoc = new XmlDocument();
                    xmlDoc.Load(oStream);
                }
                catch (Exception e)
                {
                    Trace.WriteLine("SendSearchRequest: " + e.Message);
                }
                oStream.Close();
                oStream = null;
            }
        }
        oResponse = null;
    }
    return xmlDoc;
}
The incoming parameters are the root path (which can be thought of as a table name; here it is an HTTP URL) and the query statement (in the format described above). The return value is an XmlDocument instance containing the results of the query.
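Since the method above sends its query string as the request body unchanged, the SQL text has to be wrapped in the WebDAV search XML before the call. The sketch below shows one way to build that body and to read the item URLs out of the returned multistatus document. It assumes the usual DAV: `searchrequest`/`sql` wrapper and the standard `multistatus`/`response`/`href` layout; the class and method names are hypothetical, and the sample response is hand-written rather than captured from a server.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helpers around the SEARCH request body and its response.
public static class WebDavSearchXml
{
    // Wrap a SQL statement in a DAV: searchrequest document.
    // Building it through XmlDocument escapes characters such as < and & in the SQL.
    public static string BuildSearchBody(string sql)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("D", "searchrequest", "DAV:");
        XmlElement sqlNode = doc.CreateElement("D", "sql", "DAV:");
        sqlNode.InnerText = sql;
        root.AppendChild(sqlNode);
        doc.AppendChild(root);
        return "<?xml version=\"1.0\"?>" + doc.OuterXml;
    }

    // Collect the href of every response element in a multistatus document.
    public static List<string> ReadHrefs(XmlDocument response)
    {
        var ns = new XmlNamespaceManager(response.NameTable);
        ns.AddNamespace("d", "DAV:");
        var hrefs = new List<string>();
        foreach (XmlNode node in response.SelectNodes("/d:multistatus/d:response/d:href", ns))
        {
            hrefs.Add(node.InnerText);
        }
        return hrefs;
    }

    public static void Main()
    {
        Console.WriteLine(BuildSearchBody("SELECT \"DAV:href\" FROM SCOPE('DEEP TRAVERSAL OF \"\"')"));

        // Hand-written sample response, for illustration only.
        var sample = new XmlDocument();
        sample.LoadXml(
            "<a:multistatus xmlns:a=\"DAV:\">" +
            "<a:response><a:href>http://server/public/one.doc</a:href></a:response>" +
            "<a:response><a:href>http://server/public/two.pdf</a:href></a:response>" +
            "</a:multistatus>");
        foreach (string href in ReadHrefs(sample))
        {
            Console.WriteLine(href);
        }
    }
}
```

With these helpers, a call might look like `SendSearchRequest(folderUrl, WebDavSearchXml.BuildSearchBody(sql))`, followed by `ReadHrefs` on the returned document.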
3-3-2, ADO
Exchange provides two providers for ADO, ExOLEDB and MSDAIPP, but only the MSDAIPP provider supports full-text retrieval. Using MSDAIPP on the machine where Exchange is installed produces unknown errors, such as hanging, so WebDAV is recommended. The ADO example is as follows:
// Requires a reference to the ADODB interop assembly, plus:
// using System; using System.Diagnostics; using ADODB;
private ADODB.RecordsetClass GetQueryResult(string sUrl, string sQuery)
{
    ADODB.RecordsetClass rsResult = null;
    ADODB.ConnectionClass cnnExchange = null;
    ADODB.CommandClass cmdQuery = null;
    object objAffectedRecords = null, objParams = null;

    if (sUrl == null || sUrl == String.Empty) return null;
    if (sQuery == null || sQuery == String.Empty) return null;

    try
    {
        cnnExchange = new ConnectionClass();
        if (cnnExchange == null) return null;
        cnnExchange.Provider = "MSDAIPP.DSO";
        cnnExchange.Open(sUrl, String.Empty, String.Empty, 0);
    }
    catch (Exception e)
    {
        Trace.WriteLine("GetQueryResult: create connection failed! " + e.Message);
        cnnExchange = null;
        return null;
    }

    cmdQuery = new CommandClass();
    if (cmdQuery != null)
    {
        cmdQuery.ActiveConnection = cnnExchange;
        cmdQuery.CommandType = CommandTypeEnum.adCmdText;
        cmdQuery.CommandText = sQuery;
        try
        {
            rsResult = (ADODB.RecordsetClass)cmdQuery.Execute(out objAffectedRecords, ref objParams, 0);
        }
        catch (Exception e)
        {
            Trace.WriteLine("GetQueryResult: query data failed! " + e.Message);
        }
        cmdQuery = null;
    }
    return rsResult;
}
The incoming parameters are the root path (which can be thought of as a table name) and the query statement. The return value is a Recordset instance containing the results of the query. It must be emphasized that the MSDAIPP provider does not support specifying a user name and password when the connection is opened.
The above is a brief introduction to full-text search under Exchange. The main points of IFilter and storage engine development will be explained further when there is an opportunity.
Reference documentation
A, SQL Server architecture (http://msdn.microsoft.com/library/en-us/architec/8_ar_sa2_0ehx.asp)
B, Using Custom Filter with index service (http://msdn.microsoft.com/library/en-us/indexsrv/html/ixufilt_912d.asp)
C, Exchange Store SQL (http://msdn.microsoft.com/library/en-us/wss/wss/_exch2k_sql_web_storage_system_sql.asp)