Query Optimization and Paging Algorithm Scheme for Massive Database

xiaoxiao2021-03-05 88

As the "Golden Shield Project" deepens and public security informatization develops rapidly, computer application systems are now widely used across police branches and departments. At the same time, the database at the core of each application system, where the system's data is stored, has grown sharply with use. In some large systems, such as the population system, the data volume exceeds 10 million records, which can fairly be called massive. How to query, analyze, compile statistics on, and extract data from such large databases has become a problem that system administrators and database administrators everywhere need to solve.

In this article I use an "office automation" system as the example and explore how to implement fast data extraction and data paging in an MS SQL Server database holding 10 million records. The following code shows part of the structure of the "red-header document" table in our example:

    CREATE TABLE [dbo].[TGongwen] (    -- TGongwen is the red-header document table
        [Gid] [int] IDENTITY (1, 1) NOT NULL,                         -- ID of the row, also the primary key
        [title] [varchar] (80) COLLATE Chinese_PRC_CI_AS NULL,        -- title of the red-header document
        [fariqi] [datetime] NULL,                                     -- release date
        [neibuYonghu] [varchar] (70) COLLATE Chinese_PRC_CI_AS NULL,  -- releasing user
        [reader] [varchar] (900) COLLATE Chinese_PRC_CI_AS NULL       -- users who need to read it, separated by ","
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

Next, we add 10 million rows to the database:

    -- First batch: 250,000 rows released by the Communication Section on 2004-2-5
    declare @i int
    set @i = 1
    while @i <= 250000
    begin
        insert into Tgongwen (fariqi, neibuyonghu, reader, title)
        values ('2004-2-5', 'Communication Section', 'Communication Section,Office,Director Wang,Director Liu,Director Zhang,admin,Criminal Investigation Detachment,Special Service Detachment,Traffic Patrol Detachment,Economic Investigation Detachment,Household Administration Section,Public Security Detachment,Foreign Affairs Section', 'These are the first 250,000 records')
        set @i = @i + 1
    end
    GO

    -- Second batch: 250,000 rows released by the Office on 2004-9-16
    declare @i int
    set @i = 1
    while @i <= 250000
    begin
        insert into Tgongwen (fariqi, neibuyonghu, reader, title)
        values ('2004-9-16', 'Office', 'Office,Communication Section,Director Wang,Director Liu,Director Zhang,admin,Criminal Investigation Detachment,Special Service Detachment,Traffic Patrol Detachment,Economic Investigation Detachment,Household Administration Section,Foreign Affairs Section', 'These are the middle 250,000 records')
        set @i = @i + 1
    end
    GO

    -- Third batch: 100 x 2 x 50 x 50 = 500,000 rows dated 2002 and 2003,
    -- same day but different minutes and seconds
    declare @h int, @i int, @j int, @k int
    set @h = 1
    while @h <= 100
    begin
        set @i = 2002
        while @i <= 2003
        begin
            set @j = 0
            while @j < 50
            begin
                set @k = 0
                while @k < 50
                begin
                    insert into Tgongwen (fariqi, neibuyonghu, reader, title)
                    values (cast(@i as varchar(4)) + '-8-15 3:' + cast(@j as varchar(2)) + ':' + cast(@j as varchar(2)), 'Communication Section', 'Office,Communication Section,Director Wang,Director Liu,Director Zhang,admin,Criminal Investigation Detachment,Special Service Detachment,Traffic Patrol Detachment,Economic Investigation Detachment,Household Administration Section,Foreign Affairs Section', 'These are the last 500,000 records')
                    set @k = @k + 1
                end
                set @j = @j + 1
            end
            set @i = @i + 1
        end
        set @h = @h + 1
    end
    GO

    -- Fourth batch: 9 million rows released by the Communication Section on 2004-5-5
    declare @i int
    set @i = 1
    while @i <= 9000000
    begin
        insert into Tgongwen (fariqi, neibuyonghu, reader, title)
        values ('2004-5-5', 'Communication Section', 'Communication Section,Office,Director Wang,Director Liu,Director Zhang,admin,Criminal Investigation Detachment,Special Service Detachment,Traffic Patrol Detachment,Economic Investigation Detachment,Household Administration Section,Public Security Detachment,Foreign Affairs Section', 'These are the 9 million records added last')
        set @i = @i + 1    -- increment by 1 so that 9 million rows are actually inserted
    end
    GO

With the statements above we have created 250,000 records released by the Communication Section on February 5, 2004; 250,000 records released by the Office on September 16, 2004; in each of 2002 and 2003, 100 groups of 2,500 records with the same date but different minutes and seconds, released by the Communication Section (500,000 in all); and 9 million records released by the Communication Section on May 5, 2004. That makes 10 million records in total.

I. Build "appropriate" indexes to suit the situation

Building "appropriate" indexes is the first prerequisite for query optimization. An index is another important user-defined data structure stored on the physical medium, alongside the table itself. When data is searched by the value of the index key, the index provides fast access to it. A database can in fact return the results of a SELECT statement without any index, but as a table grows, the benefit of "appropriate" indexes becomes more and more obvious. Note the word "appropriate": if an index is used without careful thought about how it is implemented, it can lower the database's performance just as easily as raise it.

(1) The index structure explained in plain terms

You can think of an index as a special kind of directory. Microsoft SQL Server provides two kinds of index: the clustered index and the nonclustered index. The following example shows the difference between them.

The body of a Chinese dictionary is itself a clustered index. Suppose we look up the character "An". We naturally open the first few pages, because its pinyin is "an" and a dictionary sorted by pinyin runs from the letter "a" to the letter "z", so "An" sits near the front. If we read through every entry beginning with "a" and still cannot find the character, the dictionary does not contain it. Likewise, to look up "Zhang" we turn to the back of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself a directory; you do not need to consult any other directory to find what you want. We call a body that is itself a directory arranged by a fixed rule a "clustered index".

If you know a character, you can find it quickly by its pronunciation. But you may also meet a character you do not know and cannot pronounce. Then the method above fails, and you have to find the character by its radical and turn straight to the page number listed next to it. The order of entries you get from the radical index and the lookup table, however, is not the order of the body. For example, when we look up "Zhang" by radical, the lookup table gives its page number as 672. The character listed above "Zhang" in the lookup table is "Chi", yet its page number is 63, and the character listed below "Zhang" is on page 390. Clearly these characters do not really sit directly before and after "Zhang" in the body. The three consecutive entries you see in the lookup table are in nonclustered index order, and each is a mapping onto the dictionary body. We can find what we need this way, but it takes two steps: first find the entry in the directory, then turn to the page it points to. We call this arrangement, in which the directory is purely a directory and the body is purely the body, a "nonclustered index".

From this example we can understand what clustered and nonclustered indexes are. By extension, it is easy to see that each table can have only one clustered index, because a body can be sorted in only one way.
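The dictionary analogy can be sketched in T-SQL. This is only an illustration: the table dbo.DictionaryDemo, its columns and the index names are hypothetical and do not appear in the office automation system.

```sql
-- Hypothetical table mirroring the dictionary analogy; all names are illustrative.
CREATE TABLE dbo.DictionaryDemo (
    pinyin      varchar(20) NOT NULL,  -- pronunciation: the order of the "body"
    radical     varchar(10) NOT NULL,  -- component used by the lookup table
    page_number int         NOT NULL
);
GO

-- The single clustered index: rows are physically kept in pinyin order.
CREATE CLUSTERED INDEX IX_DictionaryDemo_pinyin
    ON dbo.DictionaryDemo (pinyin);

-- A nonclustered index: a separate directory sorted by radical
-- whose entries point back to the rows.
CREATE NONCLUSTERED INDEX IX_DictionaryDemo_radical
    ON dbo.DictionaryDemo (radical);
GO

-- sp_helpindex lists each index and reports whether it is clustered or nonclustered.
EXEC sp_helpindex 'dbo.DictionaryDemo';
```

Trying to create a second clustered index on the same table fails, which is the "only one way to sort the body" rule in practice.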

(2) When to use a clustered index and when to use a nonclustered index

The following table summarizes when to use a clustered index or a nonclustered index (very important).

    Action description                      Clustered index    Nonclustered index
    Column is often grouped or sorted       Should             Should
    Returns data within a range             Should             Should not
    One or very few distinct values         Should not         Should not
    Small number of distinct values         Should             Should not
    Large number of distinct values         Should not         Should
    Frequently updated column               Should not         Should
    Foreign key column                      Should             Should
    Index column is frequently modified     Should not         Should

In fact, the table can be understood through the definitions of clustered and nonclustered indexes given earlier. Take "returns data within a range": if your table has a date column and the clustered index is built on it, a query for all data between January 1, 2004 and October 1, 2004 will be very fast. The body of your dictionary is sorted by date, so the clustered index only needs to find the first and last rows of the range. A nonclustered index, by contrast, must first look up the page number of each row in the directory and then fetch the content page by page.

(3) Misconceptions about index use, seen in practice

The purpose of theory is application. Although we have just listed when a clustered or nonclustered index should be used, in practice these rules are easily ignored or not weighed against the actual situation. Below we discuss the misconceptions met in practice, so that everyone can master the method of building indexes.

1. Misconception: the primary key is the clustered index

This idea is badly mistaken and wastes the clustered index, even though SQL Server by default builds the clustered index on the primary key.

Typically we create an ID column in every table to tell rows apart; it auto-increments, usually in steps of 1. The Gid column in our office automation example is one. If we make this column the primary key, SQL Server will by default make it the clustered index. The advantage is that your data is physically sorted by ID in the database, but I think this is of little use.

The advantages of a clustered index are obvious, and the rule that each table can have only one makes it all the more precious. From the definition of the clustered index we can see that its biggest benefit is to narrow the query range quickly according to the query conditions and avoid a full table scan. In practice, because the ID is generated automatically, we do not know each record's ID and so rarely query by it. That makes an ID clustered index a waste of a resource. Second, making a column whose every value is different the clustered index goes against the rule that a column with a large number of distinct values should not carry the clustered index. Of course, this only works against you when users frequently modify record content, especially indexed items; it has no effect on query speed.

In an office automation system, whether it is the documents and meetings on the home page that await the user's sign-off or a user's document search, the query always involves two fields: the "date" and the user's own "user name".

Typically the home page shows the documents or meetings that each user has not yet signed for. Although our WHERE clause can restrict the result to what the current user has not signed, if the system has been running for a long time and holds a lot of data, running a full table scan every time a user opens the home page makes little sense. The vast majority of users have already read anything more than a month old, so the scan only adds load to the database. We can instead have the database query only the documents the user has not read in the last 3 months when the home page opens, using the "date" field to limit the scan and speed up the query.
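A minimal sketch of such a home page query follows. It makes two simplifying assumptions that are not from the article: the current user is held in a hypothetical variable @currentUser, and "needs to read" is approximated by the user name appearing in the reader list, since the partial table listing has no "signed" column.

```sql
-- Hypothetical variable holding the logged-in user; 'Office' is only a sample value.
DECLARE @currentUser varchar(70);
SET @currentUser = 'Office';

SELECT gid, title, fariqi, neibuyonghu
FROM dbo.TGongwen
WHERE fariqi > DATEADD(month, -3, GETDATE())   -- date range limits the rows examined
  AND reader LIKE '%' + @currentUser + '%';    -- simplification: substring match on the reader list
```

The date predicate is the one that can use an index on fariqi; the LIKE with a leading wildcard is then applied only to the rows inside that range.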

If your office automation system has been running for 2 years, your home page will in theory display 8 times faster than before, or even faster. I say "in theory" because if your clustered index is blindly built on the ID primary key, your query will not be that fast, even if you have built a (nonclustered) index on the "date" field. Let us look at the speed of various queries (250,000 rows fall within the last 3 months):

(1) Clustered index on the primary key only, with no date restriction:

    select gid, fariqi, neibuyonghu, title from Tgongwen

    Time: 128470 ms (128 seconds)

(2) Clustered index on the primary key, nonclustered index on fariqi:

    select gid, fariqi, neibuyonghu, title from Tgongwen
    where fariqi > dateadd(day, -90, getdate())

    Time: 53763 ms (54 seconds)

(3) Clustered index on the date column (fariqi):

    select gid, fariqi, neibuyonghu, title from Tgongwen
    where fariqi > dateadd(day, -90, getdate())

    Time: 2423 ms (2 seconds)

Although every statement extracts 250,000 rows, the differences between the cases are huge, above all when the clustered index is on the date column. In fact, if your database really holds 10 million rows and the clustered index is on the ID primary key, as in cases 1 and 2 above, the web page simply times out and displays nothing. This is the most important reason I gave up using the ID column as the clustered index.
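Case (3) can be set up roughly as follows. This is a sketch under stated assumptions: the table has no primary key constraint or clustered index yet, and the names PK_TGongwen and IX_TGongwen_fariqi are hypothetical.

```sql
-- Keep Gid as the primary key, but back it with a nonclustered index
-- so that the single clustered index stays available for the date column.
ALTER TABLE dbo.TGongwen
    ADD CONSTRAINT PK_TGongwen PRIMARY KEY NONCLUSTERED (Gid);

-- Put the clustered index on the release date instead.
CREATE CLUSTERED INDEX IX_TGongwen_fariqi
    ON dbo.TGongwen (fariqi);
```

Building a clustered index physically reorders the table, so on 10 million rows it is best done outside peak hours.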
The timings above were obtained by placing before each SELECT statement:

    declare @d datetime
    set @d = getdate()

and after the SELECT statement:

    select [statement execution time (ms)] = datediff(ms, @d, getdate())

2. Misconception: building any index will significantly improve query speed

In the example above, statements 2 and 3 are identical and the indexed field is the same. The only difference is that the former has a nonclustered index on fariqi and the latter a clustered index, yet the query speeds are worlds apart. So simply building an index on some field does not necessarily improve query speed.

From the table-population statements we can see that fariqi has 5003 distinct values in a table of 10 million rows. Building the clustered index on this field is entirely appropriate. In reality we issue a few documents every day, all sharing the same release date, which fully matches the requirement for a clustered index: "neither the vast majority of values the same, nor only a tiny minority the same". Seen this way, building the "appropriate" clustered index matters a great deal for query speed.

3. Misconception: adding every field that needs faster queries to the clustered index will speed them up

As mentioned above, data queries always involve the "date" field and the user's own "user name". Since both fields are so important, we can combine them into a composite index.
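A composite clustered index with the date as the leading column, wrapped in the timing method described above, might look like this. The index name is hypothetical, and the sketch assumes the table has no clustered index yet (any existing one would have to be dropped first).

```sql
-- Leading column: fariqi; second column: neibuyonghu.
CREATE CLUSTERED INDEX IX_TGongwen_fariqi_neibuyonghu
    ON dbo.TGongwen (fariqi, neibuyonghu);

DECLARE @d datetime;
SET @d = GETDATE();

-- '20040505' is the unseparated date format, which SQL Server reads the same
-- way under every language setting.
SELECT gid, fariqi, neibuyonghu, title
FROM dbo.TGongwen
WHERE fariqi > '20040505'
  AND neibuyonghu = 'Office';

SELECT DATEDIFF(ms, @d, GETDATE()) AS elapsed_ms;
```

Rerunning the timed part with only one of the two predicates gives a direct comparison between leading-column and non-leading-column access.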

Many people think that adding any field to the clustered index will improve query speed, and others wonder: if the columns of a composite clustered index are queried separately, will the query slow down? With this question in mind, let us look at the following query speeds (each result set is 250,000 rows; the date column fariqi is the leading column of the composite clustered index and the user name neibuyonghu comes second):

(1) select gid, fariqi, neibuyonghu, title from Tgongwen where fariqi > '2004-5-5'

    Query time: 2513 ms

(2) select gid, fariqi, neibuyonghu, title from Tgongwen where fariqi > '2004-5-5' and neibuyonghu = 'Office'

    Query time: 2516 ms

(3) select gid, fariqi, neibuyonghu, title from Tgongwen where neibuyonghu = 'Office'

    Query time: 60280 ms

From these tests we can see that using only the leading column of the clustered index as the condition is almost as fast as using all the columns of the composite clustered index, and even slightly faster (when the result sets are the same size). Using only a non-leading column of the composite clustered index as the condition gets no benefit from the index at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows; if all columns of the composite index are used and the result set is small, an "index cover" forms and performance is optimal. Also remember: whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used one.

(4) Experience with indexes not found in other books

1. Using the clustered index is faster than using a primary key that is not the clustered index

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi = '2004-9-16'

    Time: 3326 ms

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where gid <= 250000

    Time: 4470 ms

Here the clustered index is nearly 1/4 faster than the primary key that is not the clustered index.

2. Using the clustered index for ORDER BY is faster than using an ordinary primary key, especially with small data volumes

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen order by fariqi

    Time: 12936

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen order by gid

    Time: 18843

Here ORDER BY on the clustered index is about 3/10 faster than ORDER BY on the ordinary primary key. In fact, with a small data volume, sorting on the clustered index is noticeably faster than sorting on a nonclustered index; with a large volume, say over 100,000 rows, the difference is not obvious.

3. With a time range on the clustered index, search time falls in proportion to the share of the table that is extracted, however many clustered index values are involved

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi > '2004-1-1'

    Time: 6343 ms (extracts 1 million rows)

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi > '2004-6-6'

    Time: 3170 ms (extracts 500,000 rows)

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi = '2004-9-16'

    Time: 3326 ms (practically the same as the previous statement: if the number of rows retrieved is the same, "greater than" and "equals" perform the same)

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi > '2004-1-1' and fariqi < '2004-6-6'

    Time: 3280 ms

4. A date column does not slow queries down because it stores minutes and seconds

In the example below there are 1 million rows in total: 500,000 after January 1, 2004, with only two distinct dates, accurate to the day; and 500,000 before it, with 5000 distinct dates, accurate to the second.

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi > '2004-1-1' order by fariqi

    Time: 6390 ms

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi < '2004-1-1' order by fariqi

    Time: 6453 ms

(5) Other considerations

"Water can carry a boat, and it can also capsize it." The same goes for indexes. Indexes help retrieval performance, but too many or ill-chosen indexes make the system inefficient, because every index a user adds to a table gives the database more work to do. Too many indexes can even cause index fragmentation.

So we need to build an "appropriate" index system, and take particular care over the clustered index, so that the database performs well. Of course, in practice, as a conscientious database administrator you should test several schemes and find which one is the most efficient and effective.

II. Improving SQL statements

Many people do not know how SQL statements are executed in SQL Server, and worry that the SQL they write will be misunderstood by SQL Server.

For example:

    select * from table1 where name = 'zhangsan' and tID > 10000

and:

    select * from table1 where tID > 10000 and name = 'zhangsan'

Some people do not know whether these two statements execute equally efficiently, because read literally they do differ. If tID is a clustered index, the second statement only searches the records after the first 10,000 rows, whereas the first statement searches the whole table for name = 'zhangsan' and then applies the restriction tID > 10000 to the result.

In fact, such worries are unnecessary. SQL Server has a "query analysis optimizer" that evaluates the search conditions in the WHERE clause and determines which index can narrow the search space of a table scan; in other words, it optimizes automatically. Still, although the query optimizer optimizes based on the WHERE clause, you need to understand how it works. Otherwise it will sometimes not run the fast query you intended.

In the query analysis phase, the optimizer looks at each stage of the query and decides whether it is useful for limiting the amount of data to be scanned. If a stage can be used as a search argument (SARG), it is called optimizable, and an index can be used to get the required data quickly.

Definition of SARG: an operation that restricts a search, because it usually refers to a specific match, a match within a range of values, or an AND connection of two or more conditions. It has the form:

    column name  operator  <constant or variable>

or

    <constant or variable>  operator  column name

The column name appears on one side of the operator and the constant or variable on the other. For example:

    Name = 'Zhang San'
    Price > 5000
    5000 < Price
    Name = 'Zhang San' and Price > 5000

If an expression does not fit the SARG form, it cannot limit the search range; SQL Server must then check every row against all the conditions in the WHERE clause. So an index is useless for expressions that are not in SARG form.

Having introduced SARG, here is a summary of experience with SARG, including conclusions from practice that differ from what some sources say:

1. Whether a LIKE clause is a SARG depends on the wildcard used

    Name like 'Zhang%'

is a SARG, whereas

    Name like '%Zhang'

is not. A leading % wildcard leaves the start of the string open, so the index cannot be used.

2. OR causes a full table scan

    Name = 'Zhang San' and Price > 5000

is a SARG, whereas

    Name = 'Zhang San' or Price > 5000

is not. Using OR causes a full table scan.

3. Negating operators and functions produce clauses that are not in SARG form

The most typical non-SARG cases are statements with negating operators, such as NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN and NOT LIKE, and statements with functions. Some examples that are not in SARG form:

    ABS(Price) < 5000
    Name like '%three'

Some expressions, such as:

    WHERE Price * 2 > 5000

are still treated as SARGs, because SQL Server converts them to:

    WHERE Price > 5000/2

We do not recommend relying on this, because SQL Server cannot always guarantee that the conversion is exactly equivalent to the original expression.

4. IN behaves like OR

The statements

    select * from table1 where tid in (2, 3)

and

    select * from table1 where tid = 2 or tid = 3

are the same; both cause a full table scan, and an index on tid will not be used.

5. Use NOT as little as possible

6. EXISTS and IN execute equally efficiently

Many sources say that EXISTS executes more efficiently than IN, and that NOT EXISTS should replace NOT IN wherever possible.
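As a brief aside on items 1 to 3, the pairs below show a predicate that is not in SARG form next to a rewrite that is. They are illustrative only: they reuse the TGongwen table from this article, but the specific filters are assumptions chosen for the example, not queries from the original tests.

```sql
-- Not a SARG: the column is wrapped in a function, so every row must be evaluated.
SELECT gid, title FROM dbo.TGongwen WHERE YEAR(fariqi) = 2004;

-- SARG form of the same filter: the bare column is compared with constants.
SELECT gid, title FROM dbo.TGongwen
WHERE fariqi >= '20040101' AND fariqi < '20050101';

-- Not a SARG: LEFT() hides the column from the optimizer.
SELECT gid, title FROM dbo.TGongwen WHERE LEFT(title, 4) = 'This';

-- SARG form: LIKE with only a trailing wildcard.
SELECT gid, title FROM dbo.TGongwen WHERE title LIKE 'This%';
```

Whether the rewritten form actually runs faster still depends on a suitable index existing on the column being compared.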
In practice, though, I tested this and found that with or without NOT, the two execute equally efficiently. Since subqueries are involved, this test uses the pubs database that ships with SQL Server. Before running, we can turn on SQL Server's STATISTICS I/O.

(1) select title, price from titles where title_id in (select title_id from sales where qty > 30)

The result of this statement is:

    Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.
    Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

(2) select title, price from titles where exists (select * from sales where sales.title_id = titles.title_id and qty > 30)

The result of the second statement is:

    Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.
    Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

From this we can see that EXISTS and IN execute equally efficiently.

7. CHARINDEX() is as efficient as LIKE with a leading % wildcard

Earlier we said that a leading wildcard in LIKE causes a full table scan and therefore executes inefficiently. Some sources say that replacing LIKE with the function CHARINDEX() brings a big speedup. After testing, I found this claim is also wrong:

    select gid, title, fariqi, reader from tgongwen
    where charindex('Criminal Investigation Detachment', reader) > 0 and fariqi > '2004-5-5'

    Time: 7 seconds. Scan count 4, logical reads 7155, physical reads 0, read-ahead reads 0.

    select gid, title, fariqi, reader from tgongwen
    where reader like '%' + 'Criminal Investigation Detachment' + '%' and fariqi > '2004-5-5'

    Time: 7 seconds. Scan count 4, logical reads 7155, physical reads 0, read-ahead reads 0.

8. UNION is not always more efficient than OR

We said earlier that OR in a WHERE clause causes a full table scan. The sources I have seen generally recommend replacing OR with UNION, and this advice turns out to hold in most cases.

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen
    where fariqi = '2004-9-16' or gid > 9990000

    Time: 68 seconds. Scan count 1, logical reads 404008, physical reads 283, read-ahead reads 392163.

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi = '2004-9-16'
    union
    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where gid > 9990000

    Time: 9 seconds. Scan count 8, logical reads 67489, physical reads 216, read-ahead reads 7499.

So UNION is usually much more efficient than OR. But after testing, I found that when the columns queried on both sides of the OR are the same, UNION is actually much slower than OR, even though UNION here scans the index and OR scans the whole table.
    select gid, fariqi, neibuyonghu, reader, title from Tgongwen
    where fariqi = '2004-9-16' or fariqi = '2004-2-5'

    Time: 6423 ms. Scan count 2, logical reads 14726, physical reads 1, read-ahead reads 7176.

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi = '2004-9-16'
    union
    select gid, fariqi, neibuyonghu, reader, title from Tgongwen where fariqi = '2004-2-5'

    Time: 11640 ms. Scan count 8, logical reads 14806, physical reads 108, read-ahead reads 1144.

9. Extract only the fields you need, and avoid "select *"

Let us run a test:

    select top 10000 gid, fariqi, reader, title from tgongwen order by gid desc

    Time: 4673 ms

    select top 10000 gid, fariqi, title from tgongwen order by gid desc

    Time: 1376 ms

    select top 10000 gid, fariqi from tgongwen order by gid desc

    Time: 80 ms

So each field we leave out brings a corresponding gain in extraction speed. How large the gain is depends on the size of the field dropped.

10. count(*) is no slower than count(field)

Some sources say that * counts all columns and is obviously less efficient than naming a single column. This claim is unfounded. Consider:

    select count(*) from Tgongwen

    Time: 1500 ms

    select count(gid) from Tgongwen

    Time: 1483 ms

    select count(fariqi) from Tgongwen

    Time: 3140 ms

    select count(title) from Tgongwen

    Time: 52050 ms

From the above, count(*) is about as fast as count(primary key), and faster than counting any field other than the primary key; the longer the field, the slower the count. I suppose that with count(*) SQL Server may automatically pick the smallest field to count. Of course, writing count(primary key) directly is more straightforward.

11. ORDER BY is most efficient when sorting on the clustered index column

Consider the following (gid is the primary key, fariqi is the clustered index column):

    select top 10000 gid, fariqi, reader, title from tgongwen

    Time: 196 ms. Scan count 1, logical reads 289, physical reads 1, read-ahead reads 1527.

    select top 10000 gid, fariqi, reader, title from tgongwen order by gid asc

    Time: 4720 ms. Scan count 1, logical reads 41956, physical reads 0, read-ahead reads 1287.

    select top 10000 gid, fariqi, reader, title from tgongwen order by gid desc

    Time: 4736 ms. Scan count 1, logical reads 55350, physical reads 10, read-ahead reads 775.

    select top 10000 gid, fariqi, reader, title from tgongwen order by fariqi asc

    Time: 173 ms. Scan count 1, logical reads 290, physical reads 0, read-ahead reads 0.

    select top 10000 gid, fariqi, reader, title from tgongwen order by fariqi desc

    Time: 156 ms. Scan count 1, logical reads 289, physical reads 0, read-ahead reads 0.

From the above, sorting on the clustered index column is as fast as not sorting at all, with about the same number of logical reads, and both are much faster than ORDER BY on a column that is not the clustered index. Also, when sorting on a given field, ascending and descending order run at basically the same speed.

12. Efficient TOP

When querying and extracting very large data sets, the biggest factor in database response time is not data lookup but physical I/O. For example:

    select top 10 * from (
        select top 10000 gid, fariqi, title from tgongwen
        where neibuyonghu = 'Office'
        order by gid desc) as a
    order by gid asc

In theory the whole statement should take longer than the subquery alone, but the opposite is true. The subquery run on its own returns 10,000 records, whereas the whole statement returns only 10, so the factor that dominates response time is physical I/O. One of the most effective ways to limit physical I/O here is the TOP keyword. TOP is a system-optimized keyword in SQL Server for extracting the first few rows or the first few percent of the data. In practice I have found TOP both easy to use and efficient. The keyword does not exist in Oracle, another large database, which is something of a pity, although Oracle offers other ways (such as ROWNUM) to achieve the same result. We will use TOP again in the later discussion of the "paging stored procedure for tens of millions of rows".

So far we have discussed how to get the data you need quickly from a large database. Of course, the methods introduced here are all "soft" methods.
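The scan counts, read counts and timings quoted throughout this section can be collected with SQL Server's own statistics switches. The sketch below wraps the nested TOP query from item 12; it is one way to take such measurements, not necessarily the exact procedure used for the figures in this article.

```sql
SET STATISTICS IO ON;    -- report scan count and logical/physical/read-ahead reads per table
SET STATISTICS TIME ON;  -- report parse, compile and execution times

SELECT TOP 10 *
FROM (
    SELECT TOP 10000 gid, fariqi, title
    FROM dbo.TGongwen
    WHERE neibuyonghu = 'Office'
    ORDER BY gid DESC
) AS a
ORDER BY gid ASC;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The figures appear in the messages output rather than in the result grid. Running the inner SELECT on its own under the same switches allows the comparison described in item 12.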
In practice we also have to consider all kinds of "hard" factors, such as network performance, server performance, operating system performance, and even network cards and switches.

III. A general-purpose paging stored procedure for small and massive data volumes

Paged browsing is an essential feature, and paging is a very common problem in database processing. The classic approach is ADO recordset paging, that is, using ADO's built-in paging function (which uses a cursor). This approach only suits small data volumes, because cursors have inherent drawbacks: a cursor is held in memory and is memory-hungry, and once a cursor is opened the related records are locked until the cursor is closed. A cursor provides a way to scan a specific set row by row; it is generally used to traverse data and perform different operations depending on the data in each row. Looping over a cursor defined on multiple tables or large tables (a large data set) can easily put the program into a long wait or even crash it.

More importantly, for a very large data model, loading the entire data source on every page retrieval in the traditional way is very wasteful. The popular approach today is to retrieve only a block of data of the page size, rather than retrieving all the data and then stepping to the current row.

The earliest well-implemented method of extracting data by page size and page number was probably the "Russian stored procedure".

This stored procedure uses a cursor, and because of the limitations of cursors it never gained universal acceptance. Later, someone online adapted it. The following is a paging stored procedure written for our office automation example:

    CREATE PROCEDURE Paging1
    (
        @PageSize  INT,   -- page size (records per page)
        @PageIndex INT    -- current page number
    )
    AS
    SET NOCOUNT ON
    BEGIN
        DECLARE @indextable TABLE (id INT IDENTITY(1,1), nid INT)  -- table variable
        DECLARE @PageLowerBound INT   -- lower bound of this page
        DECLARE @PageUpperBound INT   -- upper bound of this page
        SET @PageLowerBound = (@PageIndex - 1) * @PageSize
        SET @PageUpperBound = @PageLowerBound + @PageSize
        SET ROWCOUNT @PageUpperBound
        INSERT INTO @indextable (nid)
            SELECT gid FROM TGongwen
            WHERE fariqi > DATEADD(day, -365, GETDATE())
            ORDER BY fariqi DESC
        SELECT O.gid, O.mid, O.title, O.fadanwei, O.fariqi
        FROM TGongwen O, @indextable t
        WHERE O.gid = t.nid
          AND t.id > @PageLowerBound AND t.id <= @PageUpperBound
        ORDER BY t.id
    END
    SET NOCOUNT OFF

The stored procedure above uses what was then the latest feature of SQL Server: table variables. It should be said that this is also a very good paging stored procedure. Of course, you could write the table variable as a temporary table instead: CREATE TABLE #Temp. But in SQL Server, using a temporary table is clearly not as fast as using a table variable. So when the author first started using this stored procedure it felt very good, and faster than the original ADO approach. Later, however, the author found a better method. The author once saw a short article titled "How to retrieve records n through m from a table".
The full text reads: to retrieve records n through m from the publish table, use

    SELECT TOP m-n+1 * FROM publish WHERE (id NOT IN (SELECT TOP n-1 id FROM publish))

where id is the key of the publish table. When I first saw this article it was a real inspiration, and I thought the idea was very good. Later, while working on an office automation system (ASP.NET + C# + SQL Server), I suddenly remembered it and thought that, with some adaptation, this statement might make a very good paging stored procedure.
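The "records n through m" statement can be tried on a tiny data set. The following is a minimal sketch, not the original article's code: the table variable `@publish` and its contents are hypothetical, the syntax assumes SQL Server 2008 or later (multi-row VALUES, parenthesized TOP), and an ORDER BY is added to both queries on the assumption that "first n-1 rows" means the first rows in id order, since TOP without ORDER BY gives no guaranteed order.

```sql
-- Hypothetical demo data: ten rows with ids 1..10.
DECLARE @publish TABLE (id INT PRIMARY KEY, title VARCHAR(20));

INSERT INTO @publish (id, title)
VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'),
       (6, 'f'), (7, 'g'), (8, 'h'), (9, 'i'), (10, 'j');

-- Records n = 4 through m = 6:
--   outer TOP is m - n + 1 = 3 rows,
--   inner TOP skips the first n - 1 = 3 ids.
SELECT TOP (3) id, title
FROM @publish
WHERE id NOT IN (SELECT TOP (3) id FROM @publish ORDER BY id)
ORDER BY id;
-- returns the rows with id 4, 5 and 6
```

The inner query grows with n, so the NOT IN list becomes longer the deeper the requested range is. This is the cost the later discussion of NOT IN is concerned with.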

So I searched for the article online. I could not find it again, but I did find a paging stored procedure built on this statement. It is also one of the more popular paging stored procedures today, and I regretted not having turned the statement into a stored procedure myself sooner:

    CREATE PROCEDURE Paging2
    (
        @SQL         NVARCHAR(4000),  -- SQL statement
        @Page        INT,             -- page number
        @RecsPerPage INT,             -- number of records per page
        @ID          VARCHAR(255),    -- non-repeating ID column used for sorting
        @Sort        VARCHAR(255)     -- sort field and direction
    )
    AS
    DECLARE @Str NVARCHAR(4000)
    SET @Str = 'SELECT TOP ' + CAST(@RecsPerPage AS VARCHAR(20))
             + ' * FROM (' + @SQL + ') T WHERE T.' + @ID
             + ' NOT IN (SELECT TOP ' + CAST((@RecsPerPage * (@Page - 1)) AS VARCHAR(20))
             + ' ' + @ID + ' FROM (' + @SQL + ') T9 ORDER BY ' + @Sort
             + ') ORDER BY ' + @Sort
    PRINT @Str
    EXEC sp_executesql @Str
    GO

In fact, the statement above can be simplified to:

    SELECT TOP page_size * FROM table1
    WHERE (id NOT IN (SELECT TOP page_size*(page-1) id FROM table1 ORDER BY id))
    ORDER BY id

But this stored procedure has a fatal shortcoming: it contains NOT IN. It can be rewritten with NOT EXISTS in place of NOT IN:

    SELECT TOP page_size * FROM table1 a
    WHERE NOT EXISTS (
        SELECT * FROM (SELECT TOP page_size*(page-1) * FROM table1 ORDER BY id) b
        WHERE b.id = a.id)
    ORDER BY id

but, as we discussed earlier, the two do not actually differ in execution efficiency. Even so, combining TOP with NOT IN is still faster than using a cursor. Although NOT EXISTS cannot rescue the efficiency of this stored procedure, using SQL Server's TOP keyword is a very wise choice: the ultimate goal of paging optimization is to avoid generating oversized record sets, and, as mentioned earlier, TOP lets us control the amount of data returned. In the paging algorithm, two key factors affect query speed: TOP and NOT IN.
TOP speeds up the query, while NOT IN slows it down. To improve the paging algorithm as a whole, we must therefore replace NOT IN completely with another method. We know that for almost any field we can extract the maximum or minimum value via MAX(field) or MIN(field). So if a field contains no duplicate values, we can use its MAX or MIN as a watershed, a reference point that separates one page from the next in the paging algorithm.
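The watershed idea can be sketched on a small data set. This is an illustration under stated assumptions, not the author's stored procedure: the table variable `@t`, its 100 generated rows, and the variables `@pageSize` and `@pageIndex` are hypothetical, and the syntax (DECLARE with an initializer, TOP with a variable expression) assumes SQL Server 2008 or later.

```sql
-- Hypothetical demo data: ids 1..100, unique and ordered.
DECLARE @t TABLE (id INT PRIMARY KEY, title VARCHAR(30));

DECLARE @i INT = 1;
WHILE @i <= 100
BEGIN
    INSERT INTO @t (id, title) VALUES (@i, 'row ' + CAST(@i AS VARCHAR(10)));
    SET @i = @i + 1;
END;

DECLARE @pageSize INT = 10, @pageIndex INT = 3;

-- Watershed = the largest id among the rows on the preceding pages.
-- The requested page is then simply the next @pageSize rows above it.
SELECT TOP (@pageSize) id, title
FROM @t
WHERE id > (SELECT MAX(id)
            FROM (SELECT TOP ((@pageIndex - 1) * @pageSize) id
                  FROM @t
                  ORDER BY id) AS skipped)
ORDER BY id;
-- returns the rows with id 21 through 30
```

Note that for `@pageIndex = 1` the inner TOP selects zero rows, MAX returns NULL, and the comparison `id > NULL` matches nothing, so the first page needs its own plain `SELECT TOP` query. For descending order, the same pattern applies with MIN and `<` in place of MAX and `>`.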

Here we can use the operator ">" or "<" to accomplish this, which also keeps the query in SARG form. For example:

    SELECT TOP 10 * FROM table1 WHERE id > 200

This leads to the following paging scheme:

    SELECT TOP page_size * FROM table1
    WHERE id > (SELECT MAX(id) FROM
                   (SELECT TOP ((page-1)*page_size) id FROM table1 ORDER BY id) AS T)
    ORDER BY id

When choosing a column whose values do not repeat and whose ordering is easy to determine, we usually choose the primary key. The following table covers the office automation table with 10 million records, sorted by GID (GID is the primary key, but not the clustered index) and returning the gid, fariqi and title fields. It shows the execution time of the three paging schemes above for pages 1, 10, 100, 500, 10,000, 100,000, 250,000 and 500,000 (unit: milliseconds):

    Page       Scheme 1   Scheme 2   Scheme 3
    1          60         30         76
    10         46         16         63
    100        1076       720        130
    500        17110      470        250
    10,000     24796      4500       140
    100,000    38326      42283      1553
    250,000    28140      128720     2330
    500,000    121686     127846     7168

From the table we can see that all three stored procedures can be trusted for paging below 100 pages, where the speed is very good. But the first scheme slows down from around page 1,000 onward, and the second starts to slow down at around page 10,000. The third scheme never degrades significantly and keeps its staying power.

Having settled on the third paging scheme, we can write a stored procedure accordingly. As everyone knows, SQL Server stored procedures are compiled in advance, and they execute more efficiently than SQL statements sent from a web page. The following stored procedure not only implements the paging scheme but also decides, based on a parameter passed in from the page, whether to count the total number of records.

    -- Get the data of the specified page
    CREATE PROCEDURE Paging3
        @tblName      VARCHAR(255),        -- table name
        @strGetFields VARCHAR(1000) = '*', -- columns to return
        @fldName      VARCHAR(255) = '',   -- sort field name
        @PageSize     INT = 10,            -- page size
        @PageIndex    INT = 1,             -- page number
        @doCount      BIT = 0,             -- non-zero: return the total record count
        @OrderType    BIT = 0,             -- sort type; non-zero: descending
        @strWhere     VARCHAR(1500) = ''   -- query condition (note: do not add WHERE)
    AS
    DECLARE @strSQL   VARCHAR(5000)  -- main statement
    DECLARE @strTmp   VARCHAR(110)   -- temporary variable
    DECLARE @strOrder VARCHAR(400)   -- sort clause
    IF @doCount != 0
    BEGIN
        IF @strWhere != ''
            SET @strSQL = 'select count(*) as Total from [' + @tblName + '] where ' + @strWhere
        ELSE
            SET @strSQL = 'select count(*) as Total from [' + @tblName + ']'
    END
    -- The code above means: if @doCount is not 0, only the total count is computed.
    -- All of the code below handles the case where @doCount is 0.
    ELSE
    BEGIN
        IF @OrderType != 0
        BEGIN
            SET @strTmp = '<(select min'
            SET @strOrder = ' order by [' + @fldName + '] desc'
            -- If @OrderType is not 0, sort descending. This is very important!
        END
        ELSE
        BEGIN
            SET @strTmp = '>(select max'
            SET @strOrder = ' order by [' + @fldName + '] asc'
        END
        IF @PageIndex = 1
        BEGIN
            IF @strWhere != ''
                SET @strSQL = 'select top ' + STR(@PageSize) + ' ' + @strGetFields
                    + ' from [' + @tblName + '] where ' + @strWhere + ' ' + @strOrder
            ELSE
                SET @strSQL = 'select top ' + STR(@PageSize) + ' ' + @strGetFields
                    + ' from [' + @tblName + '] ' + @strOrder
            -- The first page takes this shortcut, which speeds up execution.
        END
        ELSE
        BEGIN
            -- The code below builds the SQL that is actually executed.
            SET @strSQL = 'select top ' + STR(@PageSize) + ' ' + @strGetFields
                + ' from [' + @tblName + '] where [' + @fldName + ']' + @strTmp
                + '([' + @fldName + ']) from (select top ' + STR((@PageIndex - 1) * @PageSize)
                + ' [' + @fldName + '] from [' + @tblName + ']' + @strOrder
                + ') as tblTmp)' + @strOrder
            IF @strWhere != ''
                SET @strSQL = 'select top ' + STR(@PageSize) + ' ' + @strGetFields
                    + ' from [' + @tblName + '] where [' + @fldName + ']' + @strTmp
                    + '([' + @fldName + ']) from (select top ' + STR((@PageIndex - 1) * @PageSize)
                    + ' [' + @fldName + '] from [' + @tblName + '] where ' + @strWhere + ' '
                    + @strOrder + ') as tblTmp) and ' + @strWhere + ' ' + @strOrder
        END
    END
    EXEC (@strSQL)
    GO

The stored procedure above is a general-purpose one, and its comments are written inline. With large data volumes, especially when querying the last few pages, the query time generally does not exceed 9 seconds, which the other stored procedures could not match in practice (see the timings above). This stored procedure is therefore very suitable for querying large-capacity databases. The author hopes that this analysis offers some inspiration and improves efficiency in your work, and that others will propose even better real-time data paging algorithms.

IV. The importance of the clustered index and how to choose it

In the title of the previous section, the author wrote: "a universal paging stored procedure for small and massive data volumes". This is because, when applying this stored procedure to the office automation system in practice, the author observed the following about the third stored procedure with small data volumes:

1. The paging speed generally stays between 1 and 3 seconds.
2. When querying the last page, the speed is generally 5 to 8 seconds, whether the total number of pages is only 3 or as many as 300,000.

Although this paging implementation is very fast at very large capacities, a speed of 1 to 3 seconds is slower than the first scheme, or even than an unoptimized paging method. To borrow a user's words, it is "not even as fast as an Access database", and that perception is enough to make users abandon the system you developed. The author analyzed this and found that the cause is very simple, yet very important: the sort field is not the clustered index!

The topic of this article is "Query Optimization and Paging Algorithm Scheme".
The author put the two loosely related topics of "query optimization" and "paging algorithms" together because both require one very important thing: the clustered index. In the earlier discussion we already mentioned that the clustered index has two major advantages:

1. It narrows the query range as fast as possible.
2. It sorts fields as fast as possible.

The first is mostly used in query optimization, the second in sorting data for paging. Only one clustered index can be built per table, which makes it all the more important. The choice of clustered index can be said to be the most critical factor in achieving both "query optimization" and "efficient paging". However, making the clustered index column satisfy both the needs of the query column and the needs of the sort column is usually a contradiction. In the author's earlier discussion of indexes, fariqi, the user's publication date, was used as the leading column of the clustered index, with a date precision of "day". The advantage of this was mentioned earlier: for quick queries over a time range, it is far better than using the ID primary key column. However, in paging, because this clustered index column contains duplicate values, MAX or MIN cannot be used as the paging reference, and so the most efficient sorting cannot be achieved.
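Why duplicate values defeat the MAX/MIN watershed can be seen with a few rows. The following is a minimal sketch with hypothetical data: the table variable `@doc` merely borrows the column names gid and fariqi for illustration, and the syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical demo data: three documents share the same day.
DECLARE @doc TABLE (gid INT PRIMARY KEY, fariqi DATETIME);

INSERT INTO @doc (gid, fariqi)
VALUES (1, '20040101'), (2, '20040101'), (3, '20040101'),
       (4, '20040102'), (5, '20040103');

-- Page 2 with a page size of 2, paging on the non-unique fariqi column.
-- Page 1 holds two of the three 2004-01-01 rows, so the watershed is
-- 2004-01-01, and "fariqi > watershed" skips ALL rows of that day.
SELECT TOP (2) gid, fariqi
FROM @doc
WHERE fariqi > (SELECT MAX(fariqi)
                FROM (SELECT TOP (2) fariqi
                      FROM @doc
                      ORDER BY fariqi) AS skipped)
ORDER BY fariqi;
-- returns gid 4 and gid 5; the third 2004-01-01 row appears on no page
```

Paging on a unique column such as the gid primary key avoids this loss, which is why the watershed scheme was tested on gid earlier, and why a clustered index on fariqi cannot serve it directly.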

When reprinting, please cite the original address: https://www.9cbs.com/read-38495.html
