Database Optimization Strategy (4)

xiaoxiao2021-03-06  50

(1) Understanding index structure in depth

An index can be thought of as a special kind of directory. Microsoft SQL Server provides two kinds of index: the clustered index and the nonclustered index. The difference between the two is illustrated below with an example:

In fact, the body of a Chinese dictionary is itself a clustered index. For example, to look up the character "An", we naturally open the first few pages of the dictionary, because the pinyin of "An" is "an" and a dictionary sorted by pinyin runs from the letter "a" to the letter "z", so "An" sits near the front. If you go through every entry starting with "a" and still cannot find the character, your dictionary does not contain it. Likewise, to look up "Zhang" you turn to the last part of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself a directory; you do not need to consult any other directory to find what you are looking for.

Body content that is itself a directory arranged according to a fixed rule is what we call a "clustered index".

If you know a character, you can find it quickly in the dictionary by its pronunciation. But you may also meet a character you do not recognize and cannot pronounce. The method above then no longer works: you have to look the character up by its radical and turn directly to the page number listed next to it. However, the order of the characters you find through the radical directory and the lookup table is not the order of the dictionary body. For example, if you look up "Zhang", the lookup table under its radical gives page 672 for "Zhang". The entry above "Zhang" in the lookup table is "Chi", but its page number is 63; the entry below "Zhang" is on page 390. Obviously these characters do not really sit directly above and below "Zhang" in the body. The three consecutive entries you see in the lookup table are ordered as they are in the nonclustered index; they are a mapping of characters in the dictionary body into the nonclustered index. You can find the character you need this way, but it takes two steps: first find the entry in the directory, then turn to the page it points to.

An arrangement in which the directory is purely a directory and the body is purely the body is what we call a "nonclustered index".

From the example above, we can understand what a "clustered index" and a "nonclustered index" are.

Taking this one step further, it is easy to see why each table can have only one clustered index: a directory of this kind can only be sorted in one way.
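The dictionary analogy can be sketched in T-SQL. The table, column, and index names below are hypothetical and were invented for this illustration; the column types are assumptions as well.

```sql
-- Hypothetical table modelling the dictionary analogy; names and types are assumptions.
CREATE TABLE DictionaryEntry (
    pinyin     VARCHAR(20)   NOT NULL,  -- pronunciation, e.g. 'an' or 'zhang'
    radical    NVARCHAR(4)   NOT NULL,  -- radical used by the lookup table
    glyph      NVARCHAR(4)   NOT NULL,  -- the character itself
    definition NVARCHAR(400) NULL
);

-- The body of the dictionary: rows are kept in pinyin order.
CREATE CLUSTERED INDEX IX_DictionaryEntry_pinyin
    ON DictionaryEntry (pinyin);

-- The radical lookup table: a separate structure that points back to the rows.
CREATE NONCLUSTERED INDEX IX_DictionaryEntry_radical
    ON DictionaryEntry (radical);

-- A second clustered index on the same table would be rejected,
-- because the rows can only be kept in one order:
-- CREATE CLUSTERED INDEX IX_DictionaryEntry_radical2 ON DictionaryEntry (radical);
```

Looking a character up by pinyin uses the ordered body directly, while looking it up by radical goes through the second index first and then back to the row, which is the two-step process described above.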

(2) When to use a clustered or a nonclustered index

The following table summarizes when to use a clustered index or a nonclustered index (very important).

Action description                     | Use clustered index | Use nonclustered index
Column is often grouped or sorted      | Should              | Should
Return data within a range             | Should              | Should not
One or very few distinct values        | Should not          | Should not
Small number of distinct values        | Should              | Should not
Large number of distinct values        | Should not          | Should
Frequently updated column              | Should not          | Should
Foreign key column                     | Should              | Should
Primary key column                     | Should              | Should
Index column is frequently modified    | Should not          | Should

In fact, the table above follows from the definitions of clustered and nonclustered indexes given earlier. Take "return data within a range" as an example. Suppose your table has a date column and the clustered index is built on that column. A query for all data between January 1, 2004 and October 1, 2004 will then be very fast, because the "dictionary body" is sorted by date and the clustered index only needs to find the first and last rows of the range. A nonclustered index, by contrast, must first look up the page number of each row in the directory and then fetch the content by page number.

(3) Misconceptions about index use, drawn from practice

The purpose of theory is application. Although we have just listed when a clustered or nonclustered index should be used, in practice these rules are easily overlooked, or are not weighed against the actual situation. Below we discuss misconceptions about index use based on problems met in practice, so that readers can get a better grasp of how to build indexes.

1. The primary key is the clustered index

The author considers this idea extremely wrong and a waste of the clustered index, even though SQL Server creates the clustered index on the primary key by default.

Typically, we create an ID column in every table to distinguish the rows; this ID column increments automatically, usually with a step of 1. The Gid column in our office automation example is such a column. If we make this column the primary key, SQL Server makes it the clustered index by default. This has one benefit: the data is physically sorted by ID in the database. The author, however, thinks this is of little value.

The advantages of a clustered index are obvious, and the rule that each table can have only one makes the clustered index all the more precious.

From the definition of the clustered index discussed earlier, we can see that its greatest benefit is that it quickly narrows the query range according to the query conditions and avoids a full table scan. In practice, because the ID is generated automatically, we do not know the ID of each record, so we rarely query by ID. This makes using the ID primary key as the clustered index a waste of resources. Second, using a column in which every value is different as the clustered index goes against the rule that a clustered index should not be built on a column with a large number of distinct values. Admittedly, this only has a negative effect when users frequently modify records, especially the indexed columns; it has no effect on query speed.

In an office automation system, whether it is the home page showing the documents and meetings a user needs to sign for, or a user searching for documents, every data query depends on two fields: the "date" and the user's own "user name".

Typically, the home page of an office automation system shows the documents or meetings that each user has not yet signed for. Our where clause could simply restrict the query to items the current user has not yet signed for, but if the system has been running for a long time and holds a lot of data, every user then triggers a full table scan every time the home page is opened. There is little point in this: the vast majority of users have already read the documents from more than a month ago, so the scan only adds load to the database. In fact, when a user opens the home page we can have the database query only the documents that the user has not read in the last 3 months, using the "date" field to limit the table scan and improve query speed. If your office automation system has been running for 2 years, your home page will in theory display 8 times as fast as before, or even faster.

I say "in theory" because if your clustered index is still blindly built on the ID primary key, your queries will not be that fast, even if you build an index (a nonclustered one) on the "date" field. Let us look at how various queries perform on 10 million rows (250,000 of which fall within the last 3 months):

(1) Clustered index on the primary key only, with no restriction on the time period:

select Gid, fariqi, neibuyonghu, title from Tgongwen

Time: 128470 ms (i.e. 128 seconds)

(2) Clustered index on the primary key and a nonclustered index on fariqi:

select Gid, fariqi, neibuyonghu, title from Tgongwen

where fariqi > dateadd(day, -90, getdate())

Time: 53763 ms (54 seconds)

(3) Clustered index on the date column (fariqi):

select Gid, fariqi, neibuyonghu, title from Tgongwen

where fariqi > dateadd(day, -90, getdate())

Time: 2423 ms (2 seconds)

Although the target in each case is the same 250,000 rows, the differences between the setups are huge, especially once the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the clustered index is on the ID primary key, as in cases 1 and 2 above, the result on a web page is a timeout and nothing is displayed at all. This is the most important reason why I gave up using the ID column as the clustered index.
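Setup (3) does not require giving up Gid as the primary key. A minimal sketch, assuming the table does not yet have a primary key or a clustered index and that Gid is declared NOT NULL; the constraint and index names are hypothetical:

```sql
-- Put the clustered index on the date column that narrows most queries.
CREATE CLUSTERED INDEX IX_Tgongwen_fariqi
    ON Tgongwen (fariqi);

-- Keep Gid as the primary key, but state NONCLUSTERED explicitly so the
-- key does not claim the table's single clustered index.
ALTER TABLE Tgongwen
    ADD CONSTRAINT PK_Tgongwen PRIMARY KEY NONCLUSTERED (Gid);
```

Creating the clustered index first avoids rebuilding the nonclustered primary key index afterwards.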

The timings above were obtained as follows. Before each select statement:

declare @d datetime

set @d = getdate()

And after the select statement:

select [Statement execution time (ms)] = datediff(ms, @d, getdate())
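As an alternative to the manual timer, SQL Server can report timings itself through SET STATISTICS TIME. This is a different technique from the one used to obtain the figures in this article, shown here only as a sketch:

```sql
-- Ask SQL Server to report CPU time and elapsed time for each statement.
SET STATISTICS TIME ON;

SELECT Gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE());

-- Switch the reporting off again for the rest of the session.
SET STATISTICS TIME OFF;
```

The timings appear in the message output rather than in the result set, so the query result itself is unchanged.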

2. Creating an index is enough to significantly improve query speed

In the example above, statements 2 and 3 are identical, and the indexed field is the same too. The only difference is that the former has a nonclustered index on the fariqi field and the latter has a clustered index on it, yet the query speeds are worlds apart. So simply creating an index on any field does not necessarily improve query speed.

From the table-creation statements we can see that the fariqi field has 5003 distinct values in this table of 10 million rows. A clustered index on this field could not be more appropriate. In reality, we issue several documents every day, and they all carry the same issue date. This matches the requirement for a clustered index exactly: the values should be neither mostly identical nor almost all different. Seen this way, building an "appropriate" clustered index is very important for improving query speed.
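A quick way to check how many distinct values a candidate column holds is an aggregate query. The sketch below assumes fariqi stores dates without a time part; the column aliases are illustrative:

```sql
-- Compare the number of rows with the number of distinct issue dates.
SELECT COUNT(*)               AS total_rows,
       COUNT(DISTINCT fariqi) AS distinct_dates
FROM Tgongwen;
```

A distinct count far below the row count, but well above a handful, is the middle ground described above.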

3. Adding every field that needs faster queries to the clustered index improves query speed

As discussed above, data queries always depend on two fields: the "date" and the user's own "user name". Since both fields are so important, we can combine them and build a composite index.

Many people think that adding any field to the clustered index will improve query speed, and others wonder: if the columns of a composite clustered index are queried separately, will the query slow down? With this question in mind, let us look at the following query speeds (the result set is 250,000 rows in each case). The date column fariqi is the leading column of the composite clustered index, and the user name column neibuyonghu comes after it.

(1) select Gid, fariqi, neibuyonghu, title from Tgongwen where fariqi > '2004-5-5'

Query speed: 2513 ms

(2) select Gid, fariqi, neibuyonghu, title from Tgongwen where fariqi > '2004-5-5' and neibuyonghu = 'office'

Query speed: 2516 ms

(3) select Gid, fariqi, neibuyonghu, title from Tgongwen where neibuyonghu = 'office'

Query speed: 60280 ms

From the tests above we can see that a query using only the leading column of the clustered index as its condition is almost as fast as one using all columns of the composite clustered index, and even slightly faster (when the result sets are the same size). If only a non-leading column of the composite clustered index is used as the condition, however, the index has no effect at all. Of course, statements 1 and 2 are equally fast because they return the same number of rows; if all columns of the composite index are used and the result set is small, "index covering" can take effect and performance is at its best. Also remember: whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.
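The composite clustered index used in these tests could be declared roughly as follows. The index name is hypothetical, and the sketch assumes the table has no other clustered index:

```sql
-- Column order matters: fariqi is the leading column, neibuyonghu follows it.
CREATE CLUSTERED INDEX IX_Tgongwen_fariqi_neibuyonghu
    ON Tgongwen (fariqi, neibuyonghu);

-- Restricts the leading column, so the ordered rows can be used to jump
-- straight to the start of the range (statements 1 and 2).
SELECT Gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE fariqi > '2004-5-5' AND neibuyonghu = 'office';

-- Restricts only the second column, so the row order gives no starting
-- point and the whole table has to be read (statement 3).
SELECT Gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE neibuyonghu = 'office';
```

Swapping the column order in the index would reverse which of the two single-column conditions benefits, which is why the most frequently used column belongs in the leading position.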

Please credit the original address when reposting: https://www.9cbs.com/read-86933.html
