How to make your SQL run faster


People writing SQL often fall into the same trap: they care only about whether the result is correct and ignore the performance differences between different ways of expressing the same query. In large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS), these differences can be dramatic. In my work experience, poorly performing SQL usually comes from one of three sources: unreasonable index design, insufficient join conditions, and WHERE clauses that cannot be optimized. Once these are corrected, running speed improves noticeably. Below I summarize each of the three in turn.

To make the explanation more concrete, the SQL run time was measured in every example; anything that finished in under one second is shown as (<1 second).

Test environment:

Host: HP LH II

CPU clock speed: 330 MHz

Memory: 128 MB

Operating system: OpenServer 5.0.4

Database: Sybase 11.0.3

First, unreasonable index design

Example: table record has 620,000 rows. The same three SQL statements were run under different index designs:
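For reference, here is a minimal sketch of how the table and the four index layouts compared below might be created. The column datatypes and index names are assumptions made purely for illustration; only the table and column names come from the queries themselves.

create table record (
    card_no char(16) null,   -- assumed length
    date    datetime null,
    amount  money    null,
    place   char(2)  null    -- assumed length
)

-- Each scenario assumes the index from the previous scenario has been dropped first.
create index idx_rec_date on record (date)                  -- 1. non-clustered index on date
create clustered index idx_rec_date_c on record (date)      -- 2. clustered index on date
create index idx_rec_pda on record (place, date, amount)    -- 3. composite index, place leading
create index idx_rec_dpa on record (date, place, amount)    -- 4. composite index, date leading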

1. A non-clustered index on date

select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(25 seconds)

select date, sum(amount) from record group by date (55 seconds)

select count(*) from record where date > '19990901' and place in ('bj', 'sh') (27 seconds)

Analysis: date contains a large number of duplicate values. Under a non-clustered index the rows are stored on the data pages in physically random order, so a range lookup has to perform a table scan to find every row inside the range.

2. A clustered index on date

select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(14 seconds)

select date, sum(amount) from record group by date (28 seconds)

select count(*) from record where date > '19990901' and place in ('bj', 'sh') (14 seconds)

Analysis: Under a clustered index the data is physically ordered on the data pages and duplicate values are stored together, so a range lookup can first locate the start and end points of the range and scan only the data pages inside it. This avoids a large scan and speeds up the query.

3. A composite index on place, date, amount

select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(26 seconds)

select date, sum(amount) from record group by date (27 seconds)

select count(*) from record where date > '19990901' and place in ('bj', 'sh')

Analysis: This is an unreasonable composite index. Its leading column is place, and the first and second SQL statements do not reference place at all, so they cannot use the index. The third statement does use place, and every column it references is contained in the composite index, so the index covers the query and it runs very fast.

4. A composite index on date, place, amount

select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(<1 second)

select date, sum(amount) from record group by date (11 seconds)

select count(*) from record where date > '19990901' and place in ('bj', 'sh') (<1 second)

Analysis: This is a reasonable composite index. With date as the leading column, every one of the three SQL statements can use the index, and the first and third statements are additionally covered by the index, so performance is optimal.

5. Summary

The index created by default is non-clustered, which is not always the best choice. Reasonable index design has to be built on analysis and prediction of the actual queries. In general:

1. A column with many duplicate values that is frequently searched by range (between, >, <, >=, <=) or appears in order by and group by clauses is a candidate for a clustered index.
2. Columns that are frequently accessed together, each containing duplicate values, can be combined into a composite index.
3. A composite index should try to cover the critical queries (index covering), and its leading column must be the most frequently used one.
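When applying these rules to an existing table, it helps to first see which indexes are already defined on it. In Sybase this can be done with the sp_helpindex system procedure; a minimal sketch against the example table:

sp_helpindex record   -- lists the indexes currently defined on record
go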

Second, insufficient join conditions

Example: table card has 7,896 rows with a non-clustered index on card_no; table account has 191,122 rows with a non-clustered index on account_no. Look at how the following SQL performs under different join conditions:

select sum(a.amount) from account a, card b
where a.card_no = b.card_no (20 seconds)

Change the SQL to:

select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no (<1 second)

Analysis: Under the first join condition, the best query plan is to make account the outer table and card the inner table, using the index on card. The number of I/Os can be estimated by the formula:

22,541 pages of the outer table account + (191,122 rows of the outer table account * 3 pages read on the inner table card for each outer row) = 595,907 I/Os

Under the second join condition, the best query plan is to make card the outer table and account the inner table, using the index on account. The number of I/Os can be estimated by the formula:

1,944 pages of the outer table card + (7,896 rows of the outer table card * 4 pages read on the inner table account for each outer row) = 33,528 I/Os

Only with a sufficient join condition is the truly best plan executed.

Summary:

1. Before a multi-table query is actually executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system cost. The join conditions should take full account of the tables that carry indexes and the tables with many rows; the choice of inner and outer table can be determined by the formula: number of matching rows in the outer table * number of lookups per row in the inner table, and the smallest product is the best plan.
2. To view the execution plan, use set showplan on to turn on the showplan option; it shows the join order and which indexes are used. For more detailed information, run dbcc with trace flags 3604, 310 and 302, which requires the sa role.
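As a quick illustration of point 2, a minimal isql session for inspecting the plan chosen for the second join might look like this (a sketch; the dbcc traceon form of enabling trace flags 3604, 310 and 302 is assumed, and it requires the sa role):

set showplan on
go
dbcc traceon(3604, 310, 302)   -- optional: detailed optimizer diagnostics
go
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
go
set showplan off
go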

Third, non-optimizable WHERE clauses

1. Example: the columns referenced in the WHERE clauses of the following statements all have suitable indexes, yet the statements run very slowly:

select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)

select * from record where amount / 30 < 1000 (11 seconds)

select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)

Analysis: Any expression applied to a column in the WHERE clause has to be evaluated row by row at run time, which forces a table scan and prevents the index on that column from being used. If the expression can instead be evaluated when the query is compiled, the SQL optimizer can use the index and avoid the table scan, so the statements are rewritten as follows:

select * from record where card_no like '5378%' (<1 second)

select * from record where amount < 1000 * 30 (<1 second)

select * from record where date = '1999/12/01' (<1 second)

You will find that the rewritten SQL is obviously faster!

2. Example: table stuff has 200,000 rows and a non-clustered index on id_no. Consider the following SQL:

select count(*) from stuff where id_no in ('0', '1') (23 seconds)

Analysis: in in a WHERE clause is logically equivalent to or, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1'. We would expect it to search on each or branch separately and then add up the results, which could use the index on id_no. In fact (according to showplan) it adopts the "OR strategy": it fetches the rows satisfying each or branch into a worktable in the temporary database, builds a unique index on the worktable to remove duplicates, and finally computes the result from that temporary table. So the actual access does not use the index on id_no, and the completion time is also affected by the performance of the tempdb database. Practice shows that the more rows the table has, the worse the worktable performs; when stuff has 620,000 rows, the execution time reaches 220 seconds! It is better to split the or clauses:

select count(*) from stuff where id_no = '0'

select count(*) from stuff where id_no = '1'

This yields two results, which are then added together. Because each statement uses the index, the execution time is only 3 seconds, and only about 4 seconds even at 620,000 rows. Or, better still, wrap it in a simple stored procedure:

create proc count_stuff as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
    -- count each id_no value separately so each select can use the index
    select @a = count(*) from stuff where id_no = '0'
    select @b = count(*) from stuff where id_no = '1'
end
-- add the two partial counts and print the total
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
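The procedure is then invoked with a single call (a sketch; the total is written to the client by the print statement):

exec count_stuff
go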

It returns the result directly, and the execution time is just as fast as above!

Summary: as these examples show, "optimized" means the WHERE clause uses an index, while "non-optimizable" means a table scan or extra overhead occurs.

1. Any operation applied to an indexed column forces a table scan, including database functions and computed expressions; when writing the query, move the computation to the right-hand side of the comparison wherever possible.
2. in and or clauses often use worktables, defeating the index; if they do not produce a large number of duplicate values, consider splitting them into separate clauses, and the split clauses should be able to use an index.
3. Learn to use stored procedures, which make SQL more flexible and efficient.

As can be seen from these examples, the essence of SQL optimization is to use statements the optimizer can recognize, make full use of indexes, reduce the number of table-scan I/Os, and avoid table searches as far as possible. In fact, SQL performance optimization is a complex process; the points above are only its embodiment at the application level. Deeper study would also involve resource configuration at the database layer, traffic control at the network layer, and the overall design of the operating-system layer.

