For a search engine, the most important thing is search speed. How do large search engines such as Baidu and Google find the desired result in 10000G of data? Consider a typical page of results:
============================================================
Ice Snow Personal Works Homepage "www.yooice.com" Copyright 2001-2004 Yooice.com All Rights Reserved. Copyright notice: except for the collection column, everything on this site is the original work of Ice Snow; do not reprint it without the author's permission. www.yooice.com/ 11K 2004-9-30 - Baidu snapshot - More results from www.yooice.com

www.kombispace.com [Kombi Personal Works Website] ... CD ... www.kombispace.com [kombi Personal Works Website] ... Vol. 04 [2004/06/14] Note: unless otherwise stated, the copyright of works published on this site belongs to [kombi]. Reprinting without permission is not allowed ... www.kombispace.com/ 20K 2004-8-3 - Baidu snapshot - More results from www.kombispace.com

Dapi ... high school classmates 20 do not confuse (novel serial) Zhongxiang red (love novel) Beijing Military set (essay) small peach wild collection (essay) small peach hidden collection calligraphy art annual original calligraphy joke calligraphy Works Appreciation New Works Appreciation VI Image Sign Design Works Website ... www.dayali.com/ 22K 2004-9-30 - Baidu snapshot
============================================================
Each record here averages about 300 characters. Ten records come to about 3000 characters, which is only a little over 2K.
When we search for the keyword "computer", the number of matching records is 6,490,000. Let's work out how many bytes that is: 6490000 * 2K = about 12G. That looks large, but in fact it is not very big. We also have to consider what actually happens in practice, which differs from this theoretical total.

1. Each request fetches only one page, so the actual data traffic per request is only about 2K. Handling 2K of data is no burden at all.
2. Visitors read only a tiny fraction of the results, on the order of 1/100000. Out of 100 pages perhaps only 10 are ever clicked, and almost nobody clicks on past page 1000. In other words, we do not actually need to produce 1000 pages, or even 100. Baidu's last page is page 76; there is no page 100 at all. The total amount of text served therefore adds up to only 76 * 2K = 152K, which is very small.
3. Let's look at the data structure of a record such as the following.
------------------------------------------------------------
... do not confuse (novel serial) Zhongxiang red (love novel) Beijing Military set (essay) small peach wild collection (essay) small peach hidden collection Calligraphy art annual table Original calligraphy Calligraphy Creation Appreciation Creative Works Appreciation New Works Appreciation VI Image Sign Design Works website ... www.dayali.com/ 22K 2004-9-30 - Baidu snapshot
------------------------------------------------------------
The record contains (title), (content), (size in K), (time), (title hyperlink), (Baidu snapshot hyperlink), only 8 items in all. This means that the database needs only these 8 fields. Some administrative and special features may occasionally need a few more, but not many.
4. When we send a request to the server, only one page is returned, and a page takes about 5 seconds to display, including the time IE needs to render it. This is acceptable as long as a page takes no more than 10 seconds. When we click through to another page, about 10 seconds pass between the two pages, so the server can do a lot of work in those 10 seconds.
5. Each time we view a page, the pager extends by only 5 more pages. When you are on the first page, you cannot jump straight to page 11, because only 10 page links are displayed. So on the first lookup the server does not have to find the results for page 11; it only needs to find the ones in front.
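The arithmetic in the points above can be checked with a short sketch. It assumes the rounded figures used in the text (2K per record and per page, 10 results per page, 76 pages at most); the constant and function names are illustrative only.

```python
KB = 1024

MATCHES = 6_490_000      # records reported for the keyword "computer"
RECORD_BYTES = 2 * KB    # rounded size used in the text
PAGE_BYTES = 2 * KB      # rounded size of one result page
PER_PAGE = 10            # results shown per page
MAX_PAGES = 76           # last page observed on Baidu

# Size of everything that matches versus what a visitor can actually reach.
total_gib = MATCHES * RECORD_BYTES / KB**3
served_kb = MAX_PAGES * PAGE_BYTES // KB


def page_slice(page, per_page=PER_PAGE, max_pages=MAX_PAGES):
    """Return the (start, end) offsets of the records shown on a page."""
    if not 1 <= page <= max_pages:
        raise ValueError("page out of range")
    start = (page - 1) * per_page
    return start, start + per_page


print(f"all matches: about {total_gib:.1f} GiB")    # about 12.4 GiB
print(f"reachable pages: {served_kb} KB in total")  # 152 KB
print(page_slice(76))                               # (750, 760)
```

Only the first 760 records can ever be displayed, so anything beyond them never needs to be fetched at query time.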
Based on the description above, let's analyze the techniques and methods that may have been adopted:
When we type the keyword "human and nature" and submit the page to the server, the results are as follows.
============================================================
The coordinated development of human and nature [Note] Chapter 2 Human and Nature Coordinated Development [Font: Small Large] Chapter 2 The harmonious development of man and nature. Author: teaching and research / Anonymous. Source: this site. Hits: ... Update time: 2004-6-15 Chapter ... humans and nature ... www.bdjks.cn/xkjy/chuzhong/shengwu/200406/918.html 51K 2004-6-15 - Baidu snapshot - More results from www.bdjks.cn

Thinking again about the relationship between humans and nature. Select your favorite background color: Thinking again about the relationship between humans and nature, February 06, 2004 10:14 Shenzhen News Network ... broke out in November 2003, thus renewing some awe of nature; think about human and nature, achieving harmony of nature, human and spirit. As a human ... www.sznews.com/n1/ca759490.htm 36K 2004-2-6 - Baidu snapshot - More results from www.sznews.com

Human and natural harmony. Human and natural harmony, wealth and civilization coexisting ... once "the sky is blue, the wilds are vast, the wind blows and the grass bends to show the cattle", now "trees are cracked, the river is broken, life is not" ... The Elf of the Plateau - Tibetan antelope ... www.wjsgjzx.com.cn/person/shenqingqing/2.htm 14K 2002-9-23 - Baidu snapshot

Human and nature are closely related, please don't destroy him. Human and nature are closely related, please don't destroy him, published by ...

Human and natural environment _ ecological travel _ very travel _ Yuanfei classroom _ Beijing ... travel: what is eco-tourism, ecological travel equipment knowledge, forest ecotourism reports, international ecotourism market analysis, humans and the natural environment ... The natural environment is the material foundation for human survival and prosperity; protection and ... www.yuanfeiniao.com.cn/class/lvxing/st/st5.htm 10K 2004-2-27 - Baidu snapshot
============================================================
From the results above, the server has hardly decomposed the phrase at all. How a phrase is handled may fall into the following cases.
1. The fuzzy lookup of the database itself. The structure of the data is as follows.

ID: 1234
TITLE: Human and nature are closely related, please don't destroy him
CONTEXT: ... travel: what is eco-tourism, ecological travel equipment knowledge, forest ecotourism, international ecotourism market analysis, humans and the natural environment ... the natural environment is the material foundation for human survival and prosperity; protection and ...
titleurl: www.boosit.cn
date: 2004-8-27

These are the items that have to be displayed.
We find that a query can match text in any of these fields. This shows that the engine cannot be using only single characters as keywords; it can also use common phrases as keywords, such as "computer", "mobile", and so on. For such keywords, we can do all the processing in the background in advance and store the matches together. If storage space is a problem, we can keep just one table that records each keyword together with the ID numbers of the related content, and fetch the content by ID number after reading it. If the content can also be stored alongside the keywords, the query becomes faster still. Someone will ask: there are so many Chinese phrases, won't this be very slow? In reality, the high-frequency words in common use are not many. All the phrases that people actually search for amount to only a few thousand in a day. For these keywords we can use a separate query.
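The keyword table described above can be sketched as a small in-memory inverted index. This is a minimal illustration, not the engine's actual design: the field names follow the record shown earlier, the sample records are shortened, and `build_keyword_index` and `lookup` are hypothetical helpers.

```python
from collections import defaultdict

# Sample records keyed by ID (contents shortened for illustration).
records = {
    1234: {"title": "Human and nature are closely related",
           "context": "the natural environment is the foundation of survival"},
    1235: {"title": "Computer tutorial",
           "context": "a computer course for beginners"},
}


def build_keyword_index(records, keywords):
    """Background job: map each known keyword to the IDs that contain it."""
    index = defaultdict(list)
    for rec_id, rec in sorted(records.items()):
        text = (rec["title"] + " " + rec["context"]).lower()
        for keyword in keywords:
            if keyword in text:
                index[keyword].append(rec_id)
    return index


def lookup(index, records, keyword):
    """Query time: read the ID list, then fetch each record by ID."""
    return [records[rec_id] for rec_id in index.get(keyword.lower(), [])]


index = build_keyword_index(records, ["computer", "nature", "mobile"])
print(index["computer"])                           # [1235]
print(lookup(index, records, "Nature")[0]["title"])
```

The slow scan happens once in `build_keyword_index`; a query only reads a short ID list, which is why the table stays small even when the content is large.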
So what should we do about changing the appearance of the queried words in the results, especially for words that have been decomposed? In fact, we can simply replace the target words in the text. Take "computer tutorial", which I decompose into "computer" and "tutorial": we can replace "computer tutorial" as a whole, or first replace "computer" and then replace "tutorial".

To see how phrases are decomposed, we test with several phrases:
1) For "Auditing Computer Teaching Procedure", the results show three keywords: "auditing", "computing" and "program", without "computer", "computer tutorial" or "tutorial".
2) For "Sky Computer Teaching Procedure", the results show only two keywords: "computer" and "program".

From this we may still not understand how the keywords are decomposed. There are two ways to do it. One is a phrase dictionary. I checked, and the commonly used Chinese phrases number only a little over 1,000. For example, to find out which phrases "auditing computer teaching procedure" contains, we compare it against those 1,000-odd phrases, such as "audit" and "computer", and any phrase it contains is extracted as a keyword. The other way applies to search engines: they can define phrases themselves from long-term query records. For example, when I search for "computer", the search engine records it, and phrases that the statistics show to be in heavy use over time become its own phrases.
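Both approaches to decomposition can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary is a tiny stand-in for the 1,000-odd phrases mentioned above, matching is plain substring search, and `extract_keywords` and `PhraseLearner` are hypothetical names.

```python
from collections import Counter


def extract_keywords(query, phrase_dict):
    """Return every dictionary phrase contained in the query, in query order."""
    query = query.lower()
    found = [(query.find(p), p) for p in phrase_dict if p in query]
    return [phrase for _, phrase in sorted(found)]


class PhraseLearner:
    """Count submitted queries and promote the frequent ones to phrases."""

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        self.counts[query.lower()] += 1

    def learned(self, min_count):
        return {q for q, n in self.counts.items() if n >= min_count}


phrase_dict = {"auditing", "computer", "program", "tutorial"}
print(extract_keywords("Auditing computer teaching program", phrase_dict))
# ['auditing', 'computer', 'program']

learner = PhraseLearner()
for q in ["computer", "Computer", "sky", "computer"]:
    learner.record(q)
print(learner.learned(min_count=3))   # {'computer'}
```

Substring matching suits Chinese text, which has no spaces between words; a production system would need a proper segmentation algorithm to resolve overlapping phrases.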
============================================================
2. What should the server do once it has the phrase to look up? Think about how long a server would need to search that much data; this depends on the algorithm and on the structure of the back-end database. With an ordinary lookup in a 100G database, such as: select * from hello where title = "loveyou", the query can take dozens of minutes, possibly longer. What can be done? We can think of the following methods:

1) I have not tested the actual speed on such data, so here we use an estimate. We can spread the 100G of data across 10 computers. The front end uses one server as middleware, which dispatches the query operations to the back end and processes the returned data.
2) The front-end server receives a query such as "LOVE" and obtains the keyword after comparing it against the phrases. It then issues the query command to the 10 database servers, each of which is asked for the keyword "love". After receiving the command, each server looks up "love".
3) Within the data itself we also do some processing to reduce the amount of data each query has to touch: the data is split into multiple tables (untested), and each table is named on the database server according to a fixed rule. You can start 20 or more processes for the lookup; the first checks only tables 1-20, the second only tables 21-40, and so on. The same applies to the other servers.
4) We can also record the IDs of all the content that contains a keyword, and save only these ID numbers in a table. This query is run by the administrator in the background and may take a few days. But that time does not affect our speed, because the lookup results are prepared by the server's background program.
5) We also saw that Baidu shows only 76 pages, so we can generate the results for all keywords in advance, over two, three or six days. That is, I take every phrase in my phrase dictionary and generate a new table for it, which is equivalent to a static page waiting for you to query. For example, finding all the content that contains "machine" may take a long time, but that has nothing to do with us as users. Then take out the first 76 pages, that is 760 records, to generate a new table. Update it once a week.
6) When returning data to IE, we send the first results from the database to the client right away. There is no need to wait until the whole search has finished. Obviously, for content that matches a keyword, we keep only 76 pages. If the keyword does not exist, nothing can be found at all.
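Methods 1), 2) and 5) can be sketched together: a front-end function sends the same keyword to every shard in parallel, merges the answers, and keeps only the first 760 results, and a background job precomputes that list for each known phrase. This is a minimal sketch, assuming each database server is simulated by a dictionary of `{id: text}`; `search_shard`, `search_all` and `precompute` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_RESULTS = 76 * 10   # 76 pages of 10 results each


def search_shard(shard, keyword):
    """Stand-in for one database server scanning only its own data."""
    return [rec_id for rec_id, text in shard.items() if keyword in text.lower()]


def search_all(shards, keyword, limit=MAX_RESULTS):
    """Middleware: send the keyword to every shard in parallel and merge."""
    keyword = keyword.lower()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        parts = pool.map(search_shard, shards, [keyword] * len(shards))
        merged = sorted(rec_id for part in parts for rec_id in part)
    return merged[:limit]


def precompute(shards, phrases):
    """Background job: build the ready-made result table for each phrase."""
    return {phrase: search_all(shards, phrase) for phrase in phrases}


shards = [
    {1: "I love you", 2: "hello world"},
    {3: "Love story", 4: "computer tutorial"},
]
print(search_all(shards, "LOVE"))               # [1, 3]
print(precompute(shards, ["love", "computer"]))
```

At query time the front end would read the precomputed table first and fall back to `search_all` only for phrases that are not in it.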