Thoughts on a T-SQL statement

xiaoxiao2021-03-06  73

The problem is this: there is a normal index on the table stress_test (ID int, key char (2)) ID; there is a cluster index on the key; ID has a limited amount of repetition; key has unlimited repetition;

Now I need to use the ID = 'AZ' and key = 'bw' and key = 'cv' by logic and query table.

Ask for the most effective query statement

Test Environment: Hardware: P4 2.6 512M 80G Software: Windows Server 2003 (Enterprise Edition) SQLServer 2000 SP3A

First we create a test-tested data, in order to make the data as much as possible, we randomly generate two random numbers to a string through Rand (), first inserted, the data is 1,000,000 records, and then inserted in the loop into 58,000,000 records. Because it is randomized data, if you test the data set and I test, it will be the same for the optimization and operation of the index. The following "- // Test Script" is a script that generates test data, you can modify @maxgroup, @maxloop value as needed, such as testing 1 million records:

SELECT @ maxGroup = 1000 select @ maxloop = 1000

If you want to test 50 million:

SELECT @ maxGroup = 5000 select @ maxloop = 10000

So if your Server or PC is slower, please wait patiently ...... (speed running on my PC is 1.14m inserting 1 million), inserting 5 thousand and eight million times 19.41m, the time to re-establish Index is 34.36M)

As a general developer, it is easy to think of the statement:

- Statement 1

Select a. [id] from stress_test where [key] = 'az') a, (select Distinct [iD] from stress_test where [key] = 'bw') b, (Select Distinct [ID) ] from stress_test where [key] = 'cv') c Where a.id = B.ID and A.ID = C.ID

- Statement 2

SELECT [ID] from stress_test where [key] = 'az' or [key] = 'bw' or [key] = 'cv' group by id having (count (distance (distance) = 3)

- Statement 5

Select Distinct a. [ID] from stress_test as a, stress_test as b, stress_test as c where a. [Key] = 'az' and b. [Key] = 'bw' and c. [Key] = 'cv' and a. [id] = b. [id] and a. [id] = c. [ID]

But as the so-called "master" "master" as T-SQL may think that this way of writing is very "soil", it also appears to have no level, so some subquery and external connection are selected, and the efficiency of the regular query is relatively high: - Statement 3

Select Distinct [ID] from stress_test a where not exists (SELECT 1 from (SELECT 'AZ' AS K Union All Select 'BW' Union All Select 'CV') B LEFT JOIN STRESS_TEST C on C.ID = A.ID and B [k] = c. [key] where c.id is null)

- Statement 4

Select Distinct A.id from stress_test a where not exists (Select * from keytb c where not exists (select * from stress_test b where b .id = a.id and c.kf1 = b. [key]))

Let's analyze these statements first (analyze 5,800,000 data):

Please pay special attention to the value of Estimated Row Count.

Statement 1: From the execution plan, we can see that the index optimization selected by the MSSQLServer is very regular, first pass the clustered index to filter the ID of [Key] = 'az' condition, then perform the Hash Match, find the ID equal The sequential push final retrieved to meet the records of all conditions. The value of the Estimated Row Count is not large.

Statement 2: From the execution plan, we can see that it is the ID of [key] = 'az' or [key] = 'cv' in accordance with the Clustered Index filter, then in line with all the conditions Id, then Group 2 Hash Match all IDs. We can see that the value of Estimated Row Count is less and less, only 402 from the initial 369, 262 to the final sort.

Statement 3: From the execution plan, we can see is very complicated, and you can retrieve the records of A.ID = C.ID and [key] = '**' by 3 groups through the consTANT Scan and Non-Clustered Index. Group, then packets the foreign key match, then consolidate the data of the 3 groups, and then make foreign key matching with a recordset that Non-Clustered Index retrieved, we can see that MSSQLServer will record all records (5,000 10,000) records, the value of Estimated Row Count is: 58, 720,000, so this T-SQL bottleneck is packet 5 million records.

Statement 4: From the execution plan, we can see the statement 3 has a similarity, you must group all records (5 million) records, so this is the retrieved bottleneck, and the index used is non-clustered INDEX.

Statement 5: From the execution plan, we can see that the recording set that meets [key] = 'az' is retrieved by Clustered Index, and then Hash Match and Sorts, because the quantity is very small, in and through Non -Clustered Index Retrieves [Key] = 'BW' records for Inner Join, combined with the record of [Key] = 'CV' retrieval through Clustered Index, and finally group retrieval of 4 million data, if it is 6 Column, we can see that the value of Estimated Row Count is incremented, getting bigger and bigger, the last packet retrieved the value of Estimated Row Count is 3.46e 15, which has formed a huge bottleneck. We can test a small amount of data first (50000); everyone can test the script below: select @ maxGroup = 500 select @ maxloop = 100 ------------------- -------------------------------------------------- - | ------------------ Statement 1 ---- Statement 2 ---- Statement 3 ---- Statement 4 ---- Statement 5 ---- | | 50,000 (3 columns) 5MS 19MS 37MS 59MS 0ms | 50,000 (6 columns) 1ms 26ms 36ms 36ms 1ms 36ms 36ms 1MS From the data of the test, the efficiency of the statement 5 is the highest, there is almost no time, and the statement 2 Efficiency can only be said to be general. If the test is over, we can choose the statement 5 :-(, continue the following test .....

We test millions of records: 1. Test 1 million records first (select 3 columns) 2. First test 1 million records (select 6 columns) 3. Test 5 million data (Select 3 columns) 4. Test 5 million data (selected 6 columns)

Statistics Table 1: ---------------------------------------------- ---------------------------------- Statement 1 ---- Statement 2 ---- Statement 3 ---- Statement 4 ---- Statement 5 ---- | | 1 million (3 columns) 0.77% 0.41% 49.30% 48.99% 0.52% | 1 million (6 columns) 1.61% 0.81% 48.99% 47.44% 1.14% | 50 million (3 columns) 0.14% 0.18% 48.88% 48.86% 1.93% | 50 million (6 columns) 0.00% 0.00% 0.00% 0.1% 100.00% Statistics Table 2: -------------------------------------------------- -------------------------------------- Statements 1 ---- Statement 2 --- - Statement 3 ---- Statement 4 ---- Statement 5 ------------------------------------- (3 columns) 575MS 262MS 110117MS 110601MS 12533MS | 50 million (6 columns) 1070ms 576ms 107988ms 109704ms 10M or more test summary: (We can pay more attention: statement 2 and statement 5) 1. In the case of 1 million records, The statement 5 is the fastest, but it is the slowest in 5 million records. This shows that in the case of index optimization, the amount of data is different, and the efficiency of the retrieval is also different. We usually use the use of Index when writing T-SQL, as long as we write T-SQL uses Clustered Index, we think it is optimized, in fact, this is a misunderstanding, we also pay attention to Estimated Row Count Value, a large number of I / O operations should we pay attention, so we should select the corresponding T-SQL statement according to the different data volume, do not think that under small data is the highest amount of data in large data, perhaps the most slow:-(.

2. The fastest in the implementation of the plan is not the fastest, we can see 1 million (6 columns) In this line, the proportion of statement 2 and statement 5 is 0.81%: 1.14%, but actual Operation efficiency is 38ms: 11ms. So, we choose T-SQL to consider the speed of local I / O, so we must not only look at the execution plan while optimizing the statement, but also calculates the specific efficiency.

Add:

Set statistics Time ON / Off Set Statistics Io ON / OFF is a good debugging method.

3. Comprehensive evaluation, the efficiency of statement 2 is the highest, and the performance efficiency is not very different from the amount of data volume.

4. The more simple statement (statement 1), the higher the comprehensive efficiency, the lower the statement (statement 3, statement 4).

5. When writing the T-SQL statement usually, be sure to test according to different data volumes, although all use Clustered Index, but the efficiency of retrieval is large. - // Test script Use Northwindgoif Exists (Select * from sysobjects where name = N'Stress_test 'and type =' u ') DROP TABLE Stress_TestGo - // Defines Table Stress_test to store all test data CREATE TABLE stress_test ( [ID] int, [key] char (2))

GO - // Insert Test Data SET NOCOUNT ON - // Variable Definition Declare @ID INT - // Stress_Test ID Value Declare @key char (2) - // stress_test [key] value declare @maxgroup int - / / group maximum loop number declare @maxloop int - // ID The largest loop number declare @tempGroup int - // Temporary variable declare @temploop int - // Temporary Variable Declare @ Tempint1 Int - // Temporary Variable Declare @ Tempint2 INT - // Temporary Variable Declare @RowCount Int - // Record the number of rows submitted by transactions

- // Initialization variable Select @ id = 1select @ maxGroup = 1000select @ MaxLoop = 1Select @ TempGroup = 1select @ Temploop = 1select @key = '' select @ rowcount = 0

While @temploop <@ @ maxloopbegin while @TempGroup <@ @ maxGroup becom selection @ Tempint1 = 65 Convert (int, rand () * 50) SELECT @ Tempint2 = 65 Convert (int, rand () * 100) IF (@ @ r () * 100) IF (@ tempint1> = 122 or @ tempint2> = 122) begin select @ tempint1 = @ tempint1-100 select @ tempint2 = @ tempint2-100 if (@ tempint1 <= 65 or @ tempint2 <= 65) begin select @ tempint1 = @ tempint1 57 SELECT @ Tempint2 = @ Tempint2 57 End SELECT @ Key = Char (@ Tempint1) Char (@ Tempint2) if @ Rowcount = 0 Begin TRAN INS INSERT INTO stress_test ([ID], [KEY]) VALUES (@ID) , @ Key) SELECT @ ROWCOUNT = @ RowCount 1 if @Rowcount> 3000 - // Judgment When the number of rows reaches 3000, began to submit a transaction Begin Commit TRAN INS SELECT @ rowcount = 0 end select @ TempGroup = @ TempGroup 1 end if @rowcount> 0 begin commit tran ins select @ rowcount = 0 endselect @ tempGroup = 1 select @ id = @ id 1 select @ tempLoop = @ tempLoop 1endGO - // delete the record is NULL KEY delete stress_test where [key] is nullgo - // Establish a cluster index PK_STRESSCREATE CLUSTERED INDEX PK_STREATE CLUSTERED INDEX PK_STRESS ON Stress_TEST ([Key]) - // Establish a non-clustered index ni_stress_idc reate NonClustered index NI_stress_id on stress_test ([id]) GO - // test defined in Table keytbif exists (select * from sysobjects where name = N'keytb 'and type =' U ') Drop table keytbGOcreate table keytb ---- - // Store the table you need to match (Kf1 varchar (20))

- // Store you need to match the value, temporarily secure to three INSERT INTO KeyTB (KF1) VALUES ('AZ'); Insert Into Keytb (KF1) VALUES ('BW'); Insert Into Keytb (KF1) Values 'CV');

- Incert Into Keytb (KF1) VALUES ('du'); - Insert Into Keytb (KF1) Values ​​('ex'); - Insert Into Keytb (KF1) Values ​​('fy'); Go Next We start Test several T-SQL INDEX optimization issues:

- Test 1 million / 100 million records first (select 3 columns) T-SQL:

Print 'The first statement:' set statistics Time ON Set Statistics Io On SELECT A. [Id] from (Select Distinct [Key] from stress_test where [key] = 'AZ') A, (Select Distinct [ID] from stress_test WHERE [key] = 'bw') b, (Select Distinct [ID] from stress_test where [key] = 'cv') c where a.id = B.ID and A.ID = C.IDGOPRINT 'second statement : 'SELECT [ID] from stress_test where [key] =' az 'or [key] =' bw 'or [key] =' cv 'group by id having (count (distance (distance) = 3) GOPRINT' Three statements: 'Select Distinct [ID] from stress_test a where not exists (SELECT 1 from (SELECT' AZ 'AS K Union All Select' BW 'Union All SELECT' CV ') B Left Join Stress_Test C on C.ID = A.ID and B. [K] = c. [Key] where c.id is null) GOPRINT 'fourth statement:' Select Distinct A.id from stress_test a where not exists (SELECT * from Keytb C Where not exists (Select * from stress_test b where b. ID = a.id and c.kf1 = b. [key])) GOPRINT 'fifth statement:' SELECT DISTINCT A. [ID] from stress_test as a, stress_test as b, Stress_test as cwhere a. [key] = 'ac' and b. [key] = 'b B 'and c. [key] =' ca 'and a. [id] = b. [id] and a. [id] = c. [ID]

Goset Statistics Time Offset Statistics IO OFF

- Test 1 million / 100 million records (select 6 columns) T-SQL: Print 'The first statement:' set statistics Time ON Set Statistics Io on Select a. [ID] from (SELECT) Distinct [id] from stress_test where [key] = 'az') a, (Select Distinct [i] from stress_test where [key] = 'bw') b, (Select Distinct [i] from stress_test where [key] = ' CV ') C, (Select Distinct [iD] from stress_test where [key] =' du ') d, (Select Distinct [Key] from stress_test where [key] =' ex ') E, (Select Distinct [ID] from Stress_test where [key] = 'fy') f Where a. [ID] = b. [id] and a. [id] = c. [id] and a. [id] = d. [id] and a. [ID] = e. [id] and a. [id] = f. [id] goprint 'second statement:' select where [key] = 'az' or [key] = 'BW 'or [key] =' cv 'or [key] =' du'or [key] = 'ex'or [key] =' fy'group by id having (count (distance (distance) = 6) goprint ' Third statement: 'Select Distinct [ID] from stress_test a where not exists (SELECT 1 from (SELECT' AZ 'AS K Union All Select' BW 'Union All Select' CV'Union All Select 'Du'Union All Select' Ex'Union All Select 'Fy') B Left Join Stress_Test C on C. ID = A.ID and B. [K] = c. [Key] where c.id is null) GOPRINT 'fourth statement:' Select Distinct A.id from stress_test a where not exists (SELECT * from Keytb C Where NOT EXISTS (Select * from stress_test b where b. ID = a.id and c.kf1 = b. [key])) GOPRINT 'fifth statement:' SELECT DISTINCT A. [ID] from stress_test as a, stress_test as B, Stress_Test As C, Stress_Test AS D, Stress_Test AS E, Stress_Test As Fwhere A. [Key] = '

AZ 'and b. [Key] =' bw 'and c. [Key] =' CV 'and d. [Key] =' du 'and e. [Key] =' ex 'and f. [Key] =' Fy 'and a. [Id] = b. [Id] and a. [Id] = c. [Id] and a. [Id] = d. [Id] and a. [ID] = E. [ID] And a. [id] = f. [ID] Goset Statistics Time Offset Statistics Io OFF

Go

转载请注明原文地址:https://www.9cbs.com/read-69701.html

New Post(0)