Let SQL fly together

xiaoxiao2021-03-06 55

Let SQL fly together to often fall into a misunderstanding when using SQL, that is, it is too concerned that the result of the result is correct, and ignoring the possible performance differences between different implementations, this performance difference is large or complex In the database environment (such as online transaction processing OLTP or decision support system DSS) is particularly obvious. The author found in work practice, poor SQL often comes from inappropriate index design, insufficient connection conditions and inelastic WHERE clauses. After proper optimization of them, its running speed has improved significantly! Below I will summarize from these three aspects. In order to more intuitively explain the problem, the SQL runtime in all instances is tested, and each of the no more than 1 second is expressed as (1 second). Test Environment Host: HP LH II: 330MHz Memory: 128MB Operating System: Oper Server 5.0.4 Database: Sybase 11.0.3 First, Unreasonable Index Design Example: Table RECORD has 620000 rows, try to look at different indexes, The following SQL operations: 1. Construction of a non-clustered index on Date Select count (*) from record where date'19991201 'and age'19991214' and Amount2000 - (25 seconds) Select Date, Sum (Amount ) from record group by date - (55 seconds) Select count (*) from record where date'19990901 'and place in (' bj ',' sh ') - (27 second) analysis: Date has a lot of repetition Value, under the non-clustered index, data is physically randomly stored on the data page, and the table scan must be executed in the range to find all the rows within this range. 2. A cluster index on Date Select Count (*) from record where date'19991201 'and age'19991214' and Amount2000 - (14 second) Select Date, SUM (Amount) from Record Group by Date - (28 Seconds) SELECT Count (*) from Record Where Date'19990901 'and Place in (' BJ ',' SH ') - (14 seconds) Analysis: Under the cluster index, data is physically ranked on the data page The repetition value is also arranged together, and thus, when searching, you can find the range of the range of this range, and only scan the data pages within this range, avoid a wide range of scans, improve the query speed.

3. In Place, Date, Amount SELECT Count (*) from Record Where Date'19991201 'And Date'1991214' and Amount2000 - (26 second) Select Date, SUM (Amount) from Record Group by Date - - (27 seconds) Select count (*) from record where date'19990901 'and place in (' bj ',' sh ') - (1 second) analysis: This is an unseasonful combination index because it's The front lead list is Place, the first and second SQLs do not reference the Place, so there is no use of the index; the third SQL uses the Place, and all columns referenced are included in the combined index, and the index coverage is formed, so it The speed is very fast. 4. In Date, Place, Amount, SELECT Count (*) from Record Where Date'19991201 'and Date'19991214' And Amount2000 - (1 second) Select, SUM (Amount) from Record Group by Date - - (11 seconds) Select count (*) from record where date'19990901 'and place in (' bj ',' sh ') - (1 second) analysis: This is a reasonable combination index. It uses DATE as the leader, allowing each SQL to utilize indexes, and forms an index coverage in the first and third SQLs, and thus performance has achieved optimal. 5. Summary The index established by default is a non-clustered index, but sometimes it is not optimal; reasonable index design is based on the analysis and prediction of various queries. In general, there is a large number of repetitive values and often have a range of queries (Between, =, =), and orderby, groupby, which can consider establish a cluster index. Always access multiple columns simultaneously, and each column contains a repetition value to consider establish a combined index. The combined index should try to make a key query form an index coverage, and the leading list must be the most frequent column.

Second, inadequate connection condition: Table Card has 7896 lines, there is a non-aggregated index on Card_no, and the account is 191122, there is a non-aggregated index on Account_no, trying to see the different table connection conditions, two SQL execution: Select SUM (A.Amount) from Account A, Card B WHERE A.CARD_NO = B.Card_no - (20 seconds) Change SQL to: SELECT SUM (A.Amount) from Account A, Card B WHERE A.CARD_NO = B.Card_no and a.account_no = b.account_no - (1 second) Analysis: Under the first connection condition, the best query scheme is to make an access to an external table, a Card as an inner table, With the index on the Card, the number of I / O can be estimated by: 22541 pages on the outer table Account (191122 line × in the outer table ACCOUNT) to find the first line of the outer table 3 pages) = 595907 Under the second connection conditions, the best query scheme is to make a CARD out of the table, an Account as the inner table, and use the index on Account. The number of I / O can be estimated by the following formula. To: 1944 page of the outer table Card (Outer Table Card's 7896 Ring × Inner Layer Table Account) 4 pages to be found in each line of the outer table) = 33528 times I / O visible, only sufficient connection conditions The true best solution will be executed. Summary: The multi-table operation is pre-executed before being actually executed, and the query optimizer lists several groups of possible connection schemes and finds the best solution for system overhead based on the connection conditions. The connection condition should be fully considered with the table with indexes, the number of rows of rows; the selection of the inner and outer tables can be determined by the formula: the number of matches in the outer table × the number of times in the inner layer table is determined, the minimum is the best Program. View the method of the execution plan - Use SetShowPlanon to open the showplan option, you can see the connection order, use the index information; if you want to see more detailed information, you need to perform DBCC (3604, 310, 302) with the SA role (3604, 310, 302).

Third, unmoderable WHERE clause: The columns in the following SQL condition statements have an appropriate index, but the execution speed is very slow: select * from record where substring (CARD_NO, 1, 4) = '5378' - (13 seconds) Select * from Record Where Amount / 301000 - (11 seconds) Select * from record where convert (char (10), date, 112) = '19991201' - (10 seconds) analysis: WHERE clause Any of the listed results is calculated one by-quarter by SQL runtime, so it has to perform a table search without using the index above the column; if these results can be obtained, then it can be SQL optimizer optimization, use index, avoid table search, so rewritten SQL into the following: Select * from record where card_no like '5378%' - (1 second) Select * from record where Amount1000 * 30 - (1 second ) Select * from record wheredate = '1999/12/01' - (1 second) You will find SQL significantly faster! Example: Table stuff has 200,000 lines, there is a non-clustered index on id_no, please see the following SQL: Select Count (*) from stuff where id_no in ('0', '1') - (23 seconds) analysis: WHERE condition The 'IN' is logically equivalent to 'or', so the syntax analyzer converts IN ('0', '1') to ID_NO = '0' OR ID_NO = '1'. We expect it to find separately according to each OR clause, then add the result, which can take the index on ID_no; but in fact (according to Showplan), it adopts "OR Strategy", that is, take out each The line of the OR clause, in the worksheet of the temporary database, then establish a unique index to remove the repetition, and finally calculate the results from this temporary table. Therefore, the actual process does not use ID_no to index, and the completion time is also affected by the performance of the Tempdb database. Practice has proved that the more the number of rows, the worse the performance of the worksheet, when STUFF has 62 million lines, the implementation time is 220 seconds! It is better to separate the OR clause: select count (*) from stuff where id_no = '0'select count (*) from stuff where id_no =' 1 'gets two results, then make an additional calculation. Because each sentence uses an index, the execution time is only 3 seconds, and the time is only 4 seconds at 620000.

转载请注明原文地址:https://www.9cbs.com/read-39433.html

9cbs

New Post(0)