Inside Microsoft's Site Search Engine

Search development lead Larry Jordan, developers Michael Ruggiero and Michael Stanton, and .NET Framework Project Manager Hari Sekhar built a new version of the Microsoft Web site search engine on .NET technology. Until now, only the external developers who attended a special session at the Professional Developers Conference (PDC) in Orlando in July of this year had seen it. Now the story can finally be made public.

If you visit the "Inside News" site regularly, you already know that the Microsoft Web group launched a new, improved version of its Search engine before the Professional Developers Conference held in July 2000. You also know that the release introduced advanced synonym matching, extended Best Bets logic that returns the most relevant results, and intelligent caching of the most common searches.

However, the inside story of this release goes well beyond what is visible on the surface.

The richer search features and improved results are certainly exciting, because they deliver a noticeably better search experience (see Inside Search 2.5 Technology). What most people do not realize, however, is that behind the scenes we were simultaneously porting the traditional ASP (Active Server Pages) code to the new Microsoft .NET Framework.

This is the most advanced development work the search group has done, because it takes us deep into the future of Internet services, which is exactly where we want to be. Let's go through the details.

Why port to .NET?

Obviously, we are entering the next stage of the Internet. We are moving beyond web pages in the usual sense toward powerful web services. At this stage, it is extremely important to make resources and information accessible, so that we can offer them as a service rather than leaving them in a messy data warehouse.

Extensible Markup Language (XML) is a means of transferring multiple data sets between highly distributed systems. It also lets developers aggregate and combine data from various sources in new, more valuable ways, so that users benefit directly.

For Search, we designed the core functions for finding information on Microsoft.com so that they could serve a variety of custom and localized Search versions. The challenge our group faced was how to make the data accessible. Before .NET, customers could not program against our functionality without using DCOM (Distributed Component Object Model) over a secure port; otherwise they had to install our various software versions on their own servers in order to access the code and COM objects.

We studied the upcoming .NET technology and recognized that all of these remote-access issues could be solved by porting the code to the .NET Framework. There was also an unexpected bonus: ubiquitous connectivity over HTTP and SOAP. To most people it makes no difference whether someone inside Microsoft or somewhere else in the world is using our web services to build applications for entirely different purposes. We support both cases, and we get the technical benefits for free.

The latest version, Search 2.5, runs on Site Server 3.0 and still uses COM to retrieve search results. The rest of the application is based on XML. Because XML is how we publish data (for example, Vocabulary and Best Bets) to the web servers, we can easily scale out our web farm.

We also cache the queries customers request most often, together with their results, on the web servers, which improves scalability and further improves performance. Because our core architecture is based on XML, moving to a model that uses .NET Framework web services was really simple. These .NET Framework web services are built on the new ASP+ technology (ASP+ Active Server Method (ASMX) pages).

The Search architecture consists of three components: Word Parsing and Vocabulary, Best Bets, and Search Results. The architecture is the same as in the ASP-based version (see Figure 1). Let us look at each component in depth.

Figure 1. After the user submits a query, (1) the query first goes to the parser for word parsing and vocabulary analysis, (2) the display terms are passed to Best Bets, (3) the preferred terms and any remaining terms that were not found are passed to Search Results, (4) the resulting XML document is transformed with an XSL style sheet, and (5) the HTML is delivered to the user's web browser.
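Steps (4) and (5) of Figure 1, turning an XML result document into HTML with an XSL style sheet, can be sketched as follows. This is only an illustration: the ResultRenderer class, the results/hit element names, and the style sheet are assumptions, and the sketch uses the XslCompiledTransform class from current .NET rather than the beta-era classes the team actually worked with.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ResultRenderer
{
    // Hypothetical style sheet: renders <results><hit url="..." title="..."/></results>
    // as an HTML list of links. The real Search style sheets are not shown in the article.
    private const string StyleSheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html'/>
  <xsl:template match='/results'>
    <ul>
      <xsl:for-each select='hit'>
        <li><a href='{@url}'><xsl:value-of select='@title'/></a></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the style sheet to an XML result document and returns the HTML text.
    public static string ToHtml(string resultsXml)
    {
        var xslt = new XslCompiledTransform();
        using (var sheetReader = XmlReader.Create(new StringReader(StyleSheet)))
        {
            xslt.Load(sheetReader);
        }

        using (var input = XmlReader.Create(new StringReader(resultsXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

A call such as ResultRenderer.ToHtml("<results><hit url='https://example.com/directx' title='DirectX Home'/></results>") returns an HTML fragment containing one link. Keeping the presentation in the style sheet means the same XML can be served unchanged to callers that want raw data.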

Word Parsing and Vocabulary - This is a Windows Script Component that wraps a C++ COM object, which exposes the word-breaking routines for all languages supported in Search. This design was necessary because the word breaker interface is not easily scriptable and typically requires a C++ wrapper before it can be called from script (although there is a way around this, which is explained later). While porting to the .NET Framework, we ran the Type Library Importer (TlbImp.exe) on the C++ object and called it through .NET interop, so existing COM objects can still be called.

The Vocabulary object runs XPath (the query language for XML documents) queries to map search terms to preferred terms. It also removes noise words and produces a formatted data structure suitable for consumption by the Best Bets and Search Results components. An important result is that although this fairly complex scriptlet was ported to C#, we can still call legacy objects from it. Here is a short code sample from the Vocabulary object:

    // We return an array of VocabularyObjects after parsing the user's search
    // text. This ability to create simple typed structures in C# vastly improves
    // our code modularity and self-documentation. Here is the definition of
    // VocabularyObject:
    public struct VocabularyObject
    {
        public string preferredterm;   // structure members
        public string displayterm;
        public string origphrase;
        public bool found;
        public bool multiterm;
        public bool multiword;

        // Constructor
        public VocabularyObject(string preferredterm, bool found, string origphrase,
                                bool multiterm, bool multiword, string displayterm)
        {
            this.preferredterm = preferredterm;
            this.found = found;
            this.origphrase = origphrase;
            this.multiterm = multiterm;
            this.multiword = multiword;
            this.displayterm = displayterm;
        }
    }

    // Example usage. Because the parameters to the object's constructor are
    // typed, we'll get a compiler error message if we passed an integer
    // where a string was expected, for example. This is a very nice feature
    // over traditional scripting environments!
    VocabularyObject vo = new VocabularyObject("Microsoft DirectX", true, "DX", false, false, "DirectX");

One advantage of the .NET environment is that you can create typed data structures that are used throughout the code. The last line above shows how a VocabularyObject is instantiated.
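The XPath mapping from search terms to preferred terms, together with noise-word removal, might look roughly like the sketch below. The article does not show the vocabulary schema, so the vocabulary/noise/term/synonym layout, the VocabularyLookup class, and the Map method are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class VocabularyLookup
{
    // Hypothetical vocabulary document; the real schema is not shown in the article.
    private const string VocabularyXml = @"<vocabulary>
  <noise>the</noise>
  <noise>for</noise>
  <term preferred='Microsoft DirectX' display='DirectX'>
    <synonym>dx</synonym>
    <synonym>directx</synonym>
  </term>
</vocabulary>";

    private static readonly XmlDocument Doc = Load();

    private static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(VocabularyXml);
        return doc;
    }

    // Drops noise words and maps each remaining word to its preferred term.
    // Words with no vocabulary entry are passed through unchanged.
    public static List<string> Map(string query)
    {
        var result = new List<string>();
        foreach (string raw in query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = raw.ToLowerInvariant();

            // A quote would break the XPath string literal, so such words are not looked up.
            if (word.Contains("'"))
            {
                result.Add(raw);
                continue;
            }

            if (Doc.SelectSingleNode("/vocabulary/noise[.='" + word + "']") != null)
            {
                continue; // noise word: removed from the query
            }

            XmlNode term = Doc.SelectSingleNode("/vocabulary/term[synonym='" + word + "']");
            result.Add(term != null ? term.Attributes["preferred"].Value : raw);
        }
        return result;
    }
}
```

With this sample data, VocabularyLookup.Map("the DX for games") drops "the" and "for", maps "DX" to "Microsoft DirectX", and passes "games" through. A fuller version would return VocabularyObject values rather than plain strings.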

Best Bets - This is a small scriptlet that runs XPath queries against localized XML documents and generates the URL links for the matching Best Bets. The XML document is loaded at application scope in each Search application instance, and the component works closely with the methods of the Vocabulary object. The scriptlet was converted 100% to the .NET Framework and takes advantage of the System.IO and XML DataNavigator classes (System.NewXml namespace).

This was the easiest component to port. It was almost a straight translation from JScript to C#. We changed the code in only a few places to use the new XML DataNavigator class, the part of the .NET common language runtime that queries and updates XML documents.
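As a rough sketch of a Best Bets lookup, the code below loads an XML document once and answers XPath queries for the links attached to a preferred term. The bestbets/bet layout, the BestBets class, and the UrlsFor method are hypothetical. The sketch also uses XPathNavigator from current .NET as a stand-in, since the beta DataNavigator class mentioned above is not available today.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class BestBets
{
    // Hypothetical Best Bets document; the real data format is not shown in the article.
    private const string BestBetsXml = @"<bestbets>
  <bet term='Microsoft DirectX' url='https://example.com/directx' title='DirectX Home'/>
  <bet term='Microsoft DirectX' url='https://example.com/directx/sdk' title='DirectX SDK'/>
  <bet term='Microsoft Windows' url='https://example.com/windows' title='Windows Home'/>
</bestbets>";

    // Loaded once and shared by all callers, similar in spirit to application-scope loading.
    private static readonly XPathDocument Doc =
        new XPathDocument(new StringReader(BestBetsXml));

    // Returns the Best Bets URLs for a preferred term, in document order.
    public static List<string> UrlsFor(string preferredTerm)
    {
        var urls = new List<string>();

        // A quote would break the XPath string literal, so such terms match nothing.
        if (preferredTerm.Contains("'"))
        {
            return urls;
        }

        XPathNavigator nav = Doc.CreateNavigator();
        XPathNodeIterator hits =
            nav.Select("/bestbets/bet[@term='" + preferredTerm + "']/@url");
        while (hits.MoveNext())
        {
            urls.Add(hits.Current.Value);
        }
        return urls;
    }
}
```

The input here would be a preferred term produced by the Vocabulary component, which is why the two components are described as closely coupled.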

Search Results - This complex component connects to Site Server 3.0 to retrieve the actual page descriptions and links that match the customer's search query. It also contains a sophisticated caching algorithm.
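The article does not describe the caching algorithm itself, so the following is only a minimal sketch of the general idea: keep the results of common queries on the web server and go to the back end only on a miss. The QueryCache class and its members are invented for this example, and the sketch deliberately omits eviction and expiry.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class QueryCache
{
    private readonly ConcurrentDictionary<string, string> _results =
        new ConcurrentDictionary<string, string>();
    private readonly Func<string, string> _search; // stand-in for the real back-end call
    private int _backendCalls;

    public QueryCache(Func<string, string> search)
    {
        _search = search;
    }

    // Number of times the back end was actually queried.
    public int BackendCalls
    {
        get { return _backendCalls; }
    }

    // Returns the cached result for a query, calling the back end only on a miss.
    public string Get(string query)
    {
        // Normalize so that "DirectX" and " directx " share one cache entry.
        string key = query.Trim().ToLowerInvariant();

        // Under contention the factory may run more than once for the same key;
        // only one result is stored.
        return _results.GetOrAdd(key, k =>
        {
            Interlocked.Increment(ref _backendCalls);
            return _search(k);
        });
    }
}
```

For example, after new QueryCache(q => "<results q='" + q + "'/>") receives Get("DirectX") and then Get(" directx "), the back end has been called once. A production cache would also need a size limit and a policy for refreshing stale entries.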

Building a parallel solution

While we were developing Search 2.5, we were also porting the entire Search application to the .NET Framework and ASP+. Because the application had to launch before the PDC and there was little time left to port it to .NET, we decided to develop the two versions in parallel and launch them at the same time. Obviously, this was a daunting task: we had to manage the new release, learn the features of the new .NET Framework and the new language metaphors, build servers running the various platform software, and so on.

There is an interesting story about how we ran this project. To make sure both versions (Search 2.5 and the .NET Framework version) launched simultaneously, we determined during project planning which components would remain unchanged, which would change the most, and which components were best suited to which technologies and languages.

We also set our goals early: to break the application down and port it the way customers might. Because we at Microsoft.com always keep customers' problems in mind when making technical decisions and research investments, we broke the port into many parts, each approached as closely as possible to the way a customer would do it. We wanted to cover every kind of job, from the easiest (porting a scriptlet to a JScript class) to the one with the greatest cost in time and the greatest technical benefit: a full port to the .NET Framework in the C# programming language (100% managed code). Here are the steps we took to meet this challenge:

First, we converted the main ASP pages to ASP+. Initially we called the scriptlets through .NET Reflection, which let us call classic COM objects by querying their type libraries at run time. Key learning: We moved away from the ASP programming model (where data, business logic, and presentation are all mixed together) to the fully object-oriented approach of ASP+, and finally to a separation of data, programming logic, and UI.

Second, we took the simplest scriptlet and ported it. Best Bets is the simplest component and does not depend on COM components. We decided to port it as a DLL using System.IO, the XML DataNavigator, and the C# programming language. We wanted to move this component completely into the managed environment and have it make full use of the XML DataNavigator. Key learning: We learned the NewXml namespace. We also removed .NET Reflection as we ported components, so that we could call them natively.

Then we handled the Vocabulary scriptlet in the same way. In terms of complexity and lines of code, this component sits in the middle of the application. It consists of a scriptlet that contains Search's business and text-parsing rules and that calls a C++ component. We created that C++ component to wrap the COM calls to the word breaker. Vocabulary gained the most from the move to managed space. This complex component was ported entirely to the .NET Framework and the C# programming language. That took some skill, because it contains more complex functional logic and needs to use a custom COM object, but it was not too difficult. The next step will be to drop the C++ wrapper and call the word breakers directly. Key learning: We changed functions and logic to benefit from key C# advantages such as type safety. In JScript, developers must remember the type (integer, string) of every variable. C# does this for you: every variable's type is fixed when it is declared, and the compiler checks your work to make sure types are not mixed up. This helps a great deal when working with complex code. Note: In the next version of JScript, programmers will be able to choose to type their variables.

Finally, we ported the last component: SearchResults. Initially we called this component through .NET Reflection, and that worked well. Because the code is large and quite complex, the port was still in progress when we launched Search 2.5. You will not find it in the .NET beta, but the work has made significant progress. This version will be released later in October.
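The late-bound calls through .NET Reflection mentioned in the steps above can be sketched as follows. The LegacyWordBreaker class is only a managed stand-in for a scriptlet or COM object, and both it and the LateBound helper are assumptions made for illustration. On Windows, a COM object's type could be obtained with Type.GetTypeFromProgID instead.

```csharp
using System;
using System.Reflection;

// Managed stand-in for a legacy component. In the article the target was
// a scriptlet or a COM object, not a C# class.
public class LegacyWordBreaker
{
    public string[] Break(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}

public static class LateBound
{
    // Looks up the method by name at run time and invokes it. Nothing about
    // the target type is known at compile time, so a misspelled method name
    // or a wrong argument type surfaces only as a run-time exception.
    public static object Call(object target, string methodName, params object[] args)
    {
        return target.GetType().InvokeMember(
            methodName,
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null,
            target,
            args);
    }
}
```

A late-bound call looks like (string[])LateBound.Call(new LegacyWordBreaker(), "Break", "Microsoft DirectX SDK"), while the natively bound equivalent is simply new LegacyWordBreaker().Break("Microsoft DirectX SDK"). The second form is checked by the compiler, which is the benefit the team gained by removing Reflection as each component was ported.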

In short, this architecture is a showcase. We have some true C# .NET components, and all of our pages are ASMX pages. We also demonstrate that you can call custom COM objects through interop and call scriptlets through .NET Reflection. Legacy objects (such as SearchResults) can consume data structures created by C# objects (such as Vocabulary), which is very nice. One thing is worth mentioning before you look at the .NET Search beta: it has no user interface. What you see is the default view of a web service. We originally planned to add a UI, but for now we have left it as it is, because we want you to see it.
