Make Your SQL Run Faster

People tend to fall into a trap when using SQL: they focus so much on whether the result is correct that they ignore the performance differences between different ways of writing the same query. These differences are especially pronounced in large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS). The author has found in practice that poor SQL usually comes from inappropriate index design, insufficient join conditions, and WHERE clauses that cannot be optimized. Once these are properly optimized, running speed improves significantly. Below I summarize each of these three aspects.

To make the explanation more intuitive, the running time of every SQL statement in the examples was measured. Times of no more than 1 second are shown as (<1 second).

Test environment
    Host: HP LH II
    CPU frequency: 330 MHz
    Memory: 128 MB
    Operating system: OperServer5.0.4
    Database: Sybase11.0.3

First, unreasonable index design

Example: table record has 620,000 rows. Consider how the following SQL statements run under different indexes.

1. A non-clustered index on date

    select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000    (25 seconds)
    select date, sum(amount) from record group by date    (55 seconds)
    select count(*) from record where date > '19990901' and place in ('BJ', 'SH')    (27 seconds)

Analysis: date has a large number of duplicate values. Under a non-clustered index the rows are not physically stored in index order on the data pages, so a range lookup must perform a table scan to find all the rows within the range.

2. A clustered index on date

    select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000    (14 seconds)
    select date, sum(amount) from record group by date    (28 seconds)
    select count(*) from record where date > '19990901' and place in ('BJ', 'SH')    (14 seconds)

Analysis: Under a clustered index the data is physically stored in order on the data pages, and duplicate values are placed together. A range lookup can therefore first find the start and end points of the range and scan only the data pages within it, which avoids a large scan and speeds up the query.
3. A composite index on (place, date, amount)

    select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000    (26 seconds)
    select date, sum(amount) from record group by date    (27 seconds)
    select count(*) from record where date > '19990901' and place in ('BJ', 'SH')

Analysis: This is a poorly designed composite index. Its leading column is place, and the first and second statements do not reference place, so they cannot use the index. The third statement does use place, and every column it references is contained in the composite index, so the index covers the query and it runs very fast.

4. A composite index on (date, place, amount)

    select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000    (<1 second)
    select date, sum(amount) from record group by date    (11 seconds)
    select count(*) from record where date > '19990901' and place in ('BJ', 'SH')    (<1 second)

Analysis: This is a reasonable composite index. With date as the leading column, every statement can use the index, and the first and third statements are covered by the index, so performance is optimal.

5. Summary

The index created by default is non-clustered, but it is not always the best choice. Sound index design is based on analysis and prediction of the queries that will be run. In general:

    1. For columns that have many duplicate values and are often used in range queries (between, >, <, >=, <=) or in group by, consider a clustered index.
    2. If several columns are often accessed together and each of them contains duplicate values, consider a composite index.
    3. A composite index should, as far as possible, cover the critical queries, and its leading column must be a column that those queries actually use.
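The contrast between the non-clustered and clustered cases in this section can be illustrated with a toy model. The sketch below is not how Sybase stores or reads data. It only assumes that a data page holds a fixed number of rows (`ROWS_PER_PAGE`, a made-up value), generates synthetic dates with many duplicates, and counts how many pages contain at least one row in a date range when rows are stored in arrival order versus date order. All names and sizes are hypothetical.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, purely illustrative


def pages_touched(dates_in_storage_order, low, high):
    """Count distinct data pages holding at least one row with low < date < high."""
    pages = set()
    for position, date in enumerate(dates_in_storage_order):
        if low < date < high:
            pages.add(position // ROWS_PER_PAGE)
    return len(pages)


rng = random.Random(0)
# 10,000 synthetic rows spread over 100 distinct dates, so each date repeats often.
dates = [rng.randrange(100) for _ in range(10_000)]

heap_order = dates               # rows stored in arrival order (non-clustered case)
clustered_order = sorted(dates)  # rows stored in date order (clustered case)

total_pages = len(dates) // ROWS_PER_PAGE
heap_pages = pages_touched(heap_order, 20, 30)
clustered_pages = pages_touched(clustered_order, 20, 30)

print("total pages:", total_pages)
print("pages holding matching rows, arrival order:", heap_pages)
print("pages holding matching rows, date order:", clustered_pages)
```

In this model the matching rows are scattered over nearly every page when rows are in arrival order, while in date order they sit on a short run of adjacent pages, which is the effect the analysis attributes to the clustered index.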
Second, insufficient join conditions

Example: table card has 7,896 rows and a non-clustered index on card_no. Table account has 191,122 rows and a non-clustered index on account_no. Consider how two SQL statements run under different join conditions.

    select sum(a.amount) from account a, card b where a.card_no = b.card_no    (20 seconds)

Change the SQL to:

    select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no    (<1 second)

Analysis: Under the first join condition, the best query plan makes account the outer table and card the inner table, using the index on card. The number of I/Os can be estimated with the following formula:

    22541 pages of the outer table account + (191122 rows of the outer table account * 3 pages looked up in the inner table card for each outer row) = 595907 I/Os

Under the second join condition, the best query plan makes card the outer table and account the inner table, using the index on account. The number of I/Os can be estimated with the following formula:

    1944 pages of the outer table card + (7896 rows of the outer table card * 4 pages looked up in the inner table account for each outer row) = 33528 I/Os

As can be seen, the truly best plan is executed only when the join condition is sufficient.

Summary:

    1. Before a multi-table operation is actually executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system overhead. Join conditions should take into account which tables have indexes and which tables have many rows. The choice of inner and outer table can be judged with the formula: number of matching rows in the outer table * number of lookups per row in the inner table. The plan with the smallest product is the best.
    2. Viewing the execution plan: use set showplan on to turn on the showplan option, which shows the join order and which indexes are used. For more detailed information, run dbcc traceon(3604, 310, 302) with the sa role.
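The two I/O estimates in this section follow the same pattern: pages of the outer table plus, for every outer row, the pages read in the inner table. A minimal sketch of that arithmetic is below. The function name and parameter names are hypothetical, and the page and row counts are the ones quoted in the example (using the 191122-row size given for account).

```python
def nested_loop_io(outer_pages, outer_rows, inner_pages_per_lookup):
    """Estimate I/Os as: outer table pages + outer rows * inner pages read per outer row."""
    return outer_pages + outer_rows * inner_pages_per_lookup


# First join condition: account is the outer table, card is the inner table.
io_account_outer = nested_loop_io(outer_pages=22541, outer_rows=191122, inner_pages_per_lookup=3)

# Second join condition: card is the outer table, account is the inner table.
io_card_outer = nested_loop_io(outer_pages=1944, outer_rows=7896, inner_pages_per_lookup=4)

print(io_account_outer)  # 595907
print(io_card_outer)     # 33528
```

The second plan needs roughly one eighteenth of the estimated I/Os, which is consistent with the drop from 20 seconds to under 1 second reported for the two statements.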
Third, WHERE clauses that cannot be optimized

1. Example: the columns in the conditions of the following SQL statements all have appropriate indexes, yet execution is very slow:

    select * from record where substring(card_no, 1, 4) = '5378'    (13 seconds)
    select * from record where amount / 30 < 1000    (11 seconds)
    select * from record where convert(char(10), date, 112) = '19991201'    (10 seconds)

Analysis: The result of any operation on a column in the WHERE clause is computed row by row at run time, so the query has to scan the table and cannot use the index on that column. If the result can be obtained when the query is compiled, the SQL optimizer can optimize it, use the index, and avoid the table scan. So rewrite the SQL as follows:

    select * from record where card_no like '5378%'    (<1 second)
    select * from record where amount < 1000 * 30    (<1 second)
    select * from record where date = '1999/12/01'    (<1 second)

You will find the SQL is noticeably faster!

2. Example: table stuff has 200,000 rows and a non-clustered index on id_no. Look at the following SQL:

    select count(*) from stuff where id_no in ('0', '1')    (23 seconds)

Analysis: An in in the WHERE condition is logically equivalent to or, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' before executing it. We would expect it to look up each or clause separately and then add the results, which would let it use the index on id_no. In fact (according to showplan) it adopts the "OR strategy": it first fetches the rows that satisfy each or clause and stores them in a worktable in the temporary database, then builds a unique index to remove duplicate rows, and finally computes the result from this temporary table. The actual process therefore does not use the index on id_no, and the completion time also depends on the performance of the tempdb database.

Practice shows that the more rows the table has, the worse the worktable performs. When stuff has 620,000 rows, the execution time reaches 220 seconds! It is better to separate the or clauses:

    select count(*) from stuff where id_no = '0'
    select count(*) from stuff where id_no = '1'

This gives two results, which are then added together. Because each statement uses the index, the execution time is only 3 seconds, and only 4 seconds at 620,000 rows.
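Before applying rewrites like the ones above, it is worth checking that the rewritten statement returns the same rows as the original. The sketch below does that on a few made-up rows, using Python's built-in sqlite3 module as a stand-in for Sybase (an assumption made so the example is runnable). SQLite spells the function `substr` rather than `substring`, and the `convert(char(10), date, 112)` case is left out because that conversion is Sybase-specific. The sketch checks equivalence only; it says nothing about timing or about which plan Sybase would choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table record (card_no text, amount integer)")
conn.execute("create table stuff (id_no text)")
# Hypothetical sample rows, chosen to sit on both sides of each condition.
conn.executemany(
    "insert into record values (?, ?)",
    [("53781234", 12000), ("53789999", 45000), ("62250001", 29999), ("41110002", 30000)],
)
conn.executemany(
    "insert into stuff values (?)",
    [("0",), ("1",), ("1",), ("2",), ("0",), ("3",)],
)


def rows(sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


# Function applied to the column versus a pattern on the bare column.
prefix_original = rows("select * from record where substr(card_no, 1, 4) = '5378'")
prefix_rewritten = rows("select * from record where card_no like '5378%'")

# Arithmetic on the column versus arithmetic moved to the constant side.
amount_original = rows("select * from record where amount / 30 < 1000")
amount_rewritten = rows("select * from record where amount < 1000 * 30")

# One IN list versus one count per value, added together afterwards.
in_count = conn.execute(
    "select count(*) from stuff where id_no in ('0', '1')"
).fetchone()[0]
split_count = sum(
    conn.execute("select count(*) from stuff where id_no = ?", (value,)).fetchone()[0]
    for value in ("0", "1")
)

print(prefix_original == prefix_rewritten)
print(amount_original == amount_rewritten)
print(in_count == split_count)
```

Adding the per-value counts is only valid because the values in the list are distinct, so no row can satisfy more than one of the split conditions.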