Five ways to improve SQL performance
Release Date: 4/1/2004
| Update Date: 4/1/2004
Johnny Papa
Data Points Archive
Sometimes all it takes is a small tweak here or there to make an application run much faster. Ah, but the key is figuring out what to tweak! Sooner or later you will face a situation where a SQL query in your application does not respond the way you intended. Either it does not return the data you want, or it takes far too long to run. If it slows down a report or your business application, users will be very unhappy about having to wait. Just as your parents didn't want to hear your reasons for coming home late, users don't want to hear why your query takes so long. ("Sorry, Mom, I used too many LEFT JOINs.") Users expect applications to respond quickly and reports to return analytical data in an instant. As for me, I get impatient if a Web page takes more than ten seconds to load (OK, five seconds is more like it).
To resolve these issues, it is important to get to the root of the problem. So where do you start? The root cause usually lies in the database design and in the queries that access it. This month I'll cover four techniques that can be used to improve the performance or scalability of SQL Server® applications. I'll look closely at the use of LEFT JOINs and CROSS JOINs and at retrieving an IDENTITY value. Keep in mind that there is no magic solution. Tuning a database and its queries takes time, analysis, and a lot of testing. The techniques here have proven effective, but some will suit your application better than others.
Returning an IDENTITY from an INSERT
I decided to start with a topic that generates many questions: how to retrieve the IDENTITY value after executing a SQL INSERT. Typically, the problem is not how to write the query that retrieves the value, but where and when to retrieve it. In SQL Server, the following statement retrieves the IDENTITY value created by the most recent SQL statement run on the active database connection:
SELECT @@IDENTITY
This SQL statement is not complicated, but keep in mind that if the most recent SQL statement was not an INSERT, or if you run this statement on a different connection from the one that ran the INSERT, you will not get the value you expect. You must run it immediately after the INSERT and on the same connection, as shown below:
INSERT INTO Products (ProductName) VALUES ('Chalk')
SELECT @@IDENTITY
Running these queries on a single connection to the Northwind database returns the IDENTITY value of the new product named Chalk. So, in a Visual Basic® application that uses ADO, you can run the following statement:
Set oRs = oCn.Execute("SET NOCOUNT ON; INSERT INTO Products " & _
    "(ProductName) VALUES ('Chalk'); SELECT @@IDENTITY")
lProductID = oRs(0)
This code tells SQL Server not to return a row count for the query, then executes the INSERT statement and returns the IDENTITY value just created for the new row. The SET NOCOUNT ON statement means that the returned recordset has one row and one column, which contains the new IDENTITY value. Without this statement, an empty recordset is returned first (because the INSERT statement returns no data), followed by a second recordset that contains the IDENTITY value. This can be confusing, especially since you would never expect an INSERT to return a recordset. It happens because SQL Server sees the row count (that is, one row affected) and interprets it as representing a recordset, so the real data is pushed back to the second recordset. You can reach that second recordset with the NextRecordset method in ADO, but it is easier and more efficient if the value always comes back in the first and only recordset.
Although this method works, it requires extra code in the SQL statement. Another way to get the same result is to use SET NOCOUNT ON before the INSERT and place the SELECT @@IDENTITY statement in a FOR INSERT trigger on the table, as shown in the following code snippet. That way, any INSERT statement into the table automatically returns the IDENTITY value.
CREATE TRIGGER trProducts_Insert ON Products FOR INSERT AS
    SELECT @@IDENTITY
GO
The trigger fires only when an INSERT occurs on the Products table, so it always returns an IDENTITY value after a successful INSERT. With this technique, you can retrieve the IDENTITY value in the same way throughout your application.
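The rule described above (read the generated key right after the INSERT, on the same connection) can be sketched outside SQL Server as well. The following is only an illustrative analogue in Python using the standard sqlite3 module: SQLite stands in for SQL Server, its last_insert_rowid() function plays the role of @@IDENTITY, and the cut-down Products table and the insert_product helper are assumptions made for this sketch.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ProductID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "ProductName TEXT NOT NULL)"
)


def insert_product(connection, name):
    """Hypothetical helper: insert a product, return the generated key."""
    cur = connection.execute(
        "INSERT INTO Products (ProductName) VALUES (?)", (name,)
    )
    # Read the key immediately, from the cursor that ran the INSERT.
    return cur.lastrowid


chalk_id = insert_product(conn, "Chalk")

# SQLite's per-connection counterpart of SELECT @@IDENTITY.
same_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

# A separate connection (here to its own in-memory database) has not
# inserted anything, so it has no generated key to report.
other = sqlite3.connect(":memory:")
other_id = other.execute("SELECT last_insert_rowid()").fetchone()[0]

print(chalk_id, same_id, other_id)
```

The second connection reports no key at all, which mirrors the warning above about asking for the value on the wrong connection.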
Inline Views and Temporary Tables
Sometimes a query needs to join to data that can only be gathered by running a GROUP BY query and then a standard query. For example, if you want information about the five most recent orders, you first need to know which orders those are. They can be retrieved with a SQL query that returns the order IDs. A common technique is to store this data in a temporary table and then join it to the Products table to return the quantity of each product sold on those orders:
CREATE TABLE #Temp1 (OrderID INT NOT NULL,
                     OrderDate DATETIME NOT NULL)
INSERT INTO #Temp1 (OrderID, OrderDate)
SELECT TOP 5 o.OrderID, o.OrderDate
FROM Orders o ORDER BY o.OrderDate DESC
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM #Temp1 t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
DROP TABLE #Temp1
These SQL statements create a temporary table, insert the data into it, join other data to it, and then drop the temporary table. That causes a lot of I/O for this query, so you can rewrite it to use an inline view in place of the temporary table. An inline view is simply a query that can be joined to in the FROM clause. Instead of spending a lot of I/O and disk access in tempdb on a temporary table, you can use an inline view to get the same result:
SELECT p.ProductName,
    SUM(od.Quantity) AS ProductQuantity
FROM (
    SELECT TOP 5 o.OrderID, o.OrderDate
    FROM Orders o
    ORDER BY o.OrderDate DESC
    ) t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY
    p.ProductName
ORDER BY
    p.ProductName
This query is not only more efficient than the previous one, it is also shorter. Temporary tables consume a lot of resources. If you only need the data to join to another query, try an inline view to conserve resources.
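To check that the two forms really are interchangeable, both can be run side by side on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server, so LIMIT replaces TOP and the tables are tiny, simplified assumptions rather than the real Northwind schema. It shows only that the results match; the I/O difference has to be measured on your own server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);
CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER,
                           Quantity INTEGER);
INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chalk'), (3, 'Tofu');
""")
# Seven orders on consecutive days; only the latest five should count.
for order_id in range(1, 8):
    conn.execute("INSERT INTO Orders VALUES (?, ?)",
                 (order_id, f"2004-03-{order_id:02d}"))
    # One detail line per order: products cycle 1, 2, 3; quantity = order id.
    conn.execute("INSERT INTO OrderDetails VALUES (?, ?, ?)",
                 (order_id, (order_id - 1) % 3 + 1, order_id))

JOIN_TAIL = """
    INNER JOIN OrderDetails od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
    GROUP BY p.ProductName
    ORDER BY p.ProductName
"""
LATEST_FIVE = ("SELECT OrderID, OrderDate FROM Orders "
               "ORDER BY OrderDate DESC LIMIT 5")


def via_temp_table(connection):
    """Temporary-table version: create, fill, join, drop."""
    connection.execute("CREATE TEMP TABLE Temp1 "
                       "(OrderID INTEGER NOT NULL, OrderDate TEXT NOT NULL)")
    connection.execute("INSERT INTO Temp1 " + LATEST_FIVE)
    rows = connection.execute(
        "SELECT p.ProductName, SUM(od.Quantity) FROM Temp1 t" + JOIN_TAIL
    ).fetchall()
    connection.execute("DROP TABLE Temp1")
    return rows


def via_inline_view(connection):
    """Inline-view version: the subquery sits directly in the FROM clause."""
    return connection.execute(
        "SELECT p.ProductName, SUM(od.Quantity) FROM ("
        + LATEST_FIVE + ") t" + JOIN_TAIL
    ).fetchall()


temp_rows = via_temp_table(conn)
inline_rows = via_inline_view(conn)
print(temp_rows == inline_rows, inline_rows)
```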
Avoid LEFT JOINs and NULLs
Of course, there are many times when you need to perform a LEFT JOIN and use NULL values. But they are not the right answer for every situation. Changing the way you structure your SQL queries can mean the difference between a report that takes minutes to run and one that takes only seconds. Sometimes you have to reshape the data in a query to suit the way your application displays it. While the TABLE data type reduces resource usage considerably, there are still many areas in a query that can be optimized. One valuable, commonly used feature of SQL is the LEFT JOIN. It retrieves all of the rows from the first table and all matching rows from the second table; rows in the first table that have no match in the second table are still returned, with NULLs in the second table's columns. For example, if you want to return every customer and their orders, a LEFT JOIN shows both the customers who have orders and those who do not.
This tool can be overused. LEFT JOINs are costly because they involve matching data against NULL (nonexistent) data. In some cases this is unavoidable, but the cost can be very high. A LEFT JOIN consumes more resources than an INNER JOIN, so if you can rewrite a query so that it does not use any LEFT JOINs, the payoff can be considerable (see Figure 1).
Figure 1 query
One technique for speeding up a query that uses a LEFT JOIN involves creating a TABLE data type, inserting all of the rows from the first table (the one on the left side of the LEFT JOIN), and then updating the TABLE data type with the values from the second table. This is a two-step process, but it can save a lot of time compared to a standard LEFT JOIN. A good rule is to try several different techniques and time each one until you find the one that performs best for your application.
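The two-step approach can be sketched as follows. This is not the code from Figure 1; it is a minimal illustration in Python with sqlite3, where a temporary table named CustomerSales stands in for the TABLE data type and the Customers and Orders tables are simplified assumptions. The sketch confirms only that both approaches return the same rows; which one is faster is something to time on your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT,
                     Total REAL);
INSERT INTO Customers VALUES
    ('ALFKI', 'Alfreds'), ('ANATR', 'Ana Trujillo'), ('PARIS', 'Paris');
INSERT INTO Orders VALUES
    (1, 'ALFKI', 10.0), (2, 'ALFKI', 15.0), (3, 'ANATR', 7.5);
""")

# Baseline: a LEFT JOIN keeps customers that have no orders.
left_join_rows = conn.execute("""
    SELECT c.CustomerID, COALESCE(SUM(o.Total), 0) AS Sales
    FROM Customers c
        LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()

# Two-step alternative: step 1 loads every row from the left-hand table,
# step 2 updates only the rows that have matches in the second table.
conn.executescript("""
CREATE TEMP TABLE CustomerSales (
    CustomerID TEXT PRIMARY KEY,
    Sales REAL NOT NULL DEFAULT 0);
INSERT INTO CustomerSales (CustomerID)
    SELECT CustomerID FROM Customers;
UPDATE CustomerSales
SET Sales = (SELECT SUM(o.Total) FROM Orders o
             WHERE o.CustomerID = CustomerSales.CustomerID)
WHERE CustomerID IN (SELECT CustomerID FROM Orders);
""")
two_step_rows = conn.execute(
    "SELECT CustomerID, Sales FROM CustomerSales ORDER BY CustomerID"
).fetchall()

print(left_join_rows == two_step_rows, two_step_rows)
```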
When you test the speed of a query, run it several times and take the average. Because the query (or stored procedure) may be held in the procedure cache in SQL Server memory, the first attempt is likely to take slightly longer and all subsequent attempts take less time. In addition, other queries may be running against the same tables while yours runs. When those queries lock and unlock the tables, your query may have to wait its turn. For example, if someone is updating data in a table while you query it, your query may take longer to run while the update commits.
The easiest way to avoid the slowdown of LEFT JOINs is to design the database around them as much as possible. For example, suppose a product may or may not have a category. If the Products table stores the ID of its category and a particular product has no category, you could store a NULL in that field. You would then have to perform a LEFT JOIN to get all of the products and their categories. Instead, you can create a category with the value "No Category" and specify that the foreign key relationship does not allow NULLs. Having done that, you can retrieve all products and their categories with an INNER JOIN. While this may look like a workaround that adds extra data, it can be a very valuable technique because it eliminates costly LEFT JOINs from your SQL batches. Applying this concept across the database can save a lot of processing time. Remember that even a few seconds matter to your users, and those seconds really add up when many users are accessing an online database application at the same time.
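The "No Category" design can be illustrated with a small sketch, again using Python's sqlite3 module as a stand-in for SQL Server. The two-row Products table and the choice of 0 as the placeholder category ID are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL);
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    CategoryID INTEGER REFERENCES Categories (CategoryID));
INSERT INTO Categories VALUES (1, 'Beverages');
INSERT INTO Products VALUES (1, 'Chai', 1), (2, 'Chalk', NULL);
""")

INNER_JOIN_SQL = """
    SELECT p.ProductName, c.CategoryName
    FROM Products p
        INNER JOIN Categories c ON c.CategoryID = p.CategoryID
    ORDER BY p.ProductID
"""

# With a NULL category, the INNER JOIN silently drops Chalk, which is
# why the original design would need a LEFT JOIN.
before = conn.execute(INNER_JOIN_SQL).fetchall()

# Redesign: add a placeholder category and point uncategorized
# products at it, so the column never holds NULL.
conn.execute("INSERT INTO Categories VALUES (0, 'No Category')")
conn.execute("UPDATE Products SET CategoryID = 0 WHERE CategoryID IS NULL")

# The same INNER JOIN now returns every product.
after = conn.execute(INNER_JOIN_SQL).fetchall()

print(before)
print(after)
```

In a real schema the column would also be declared NOT NULL once the placeholder row exists, so that new products cannot reintroduce the problem.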
Flexible Use of Cartesian Products
For this tip, I'll go into some detail and advocate the use of Cartesian products in certain situations. For some reason, Cartesian products (CROSS JOINs) get a bad rap, and developers are often cautioned not to use them at all. In many cases they consume too many resources to be used efficiently. But like any tool in SQL, they can be valuable when used properly. For example, if you want to run a query that returns data for every month, even for months in which a particular customer placed no orders, a Cartesian product comes in handy. The SQL in Figure 2 does just that.
While that may not seem like magic, consider that a standard INNER JOIN from customers to orders, grouped by month, returns only the months in which a customer placed an order. You therefore get no 0 value for the months in which a customer ordered nothing. If you want to plot a graph for each customer showing every month and its sales, you probably want the graph to include the months with sales of 0 so that you can spot them visually. If you use the SQL in Figure 2, the data skips over the months with $0 in sales, because the orders table contains no rows for months without sales (on the assumption that you store only events that actually occurred).
Although the code in Figure 3 is longer, it achieves the goal of getting all of the sales data, even for months without sales. First, it builds a list of all of the months in the past year and puts them in the first TABLE data type table (@tblMonths). Next, the code gets a list of all customer company names that had sales during that period and puts them in another TABLE data type table (@tblCustomers). These two tables hold all of the basic data needed to build the result set, except for the actual sales figures. The first table lists all of the months (12 rows), and the second lists all of the customers who had sales in that period (81 rows). Not every customer bought a product in each of the past 12 months, so an INNER JOIN or LEFT JOIN will not return every customer for every month; those operations return only the customers and months in which a purchase was made. A Cartesian product can return all customers for all months. It essentially multiplies the first table by the second, producing a rowset whose row count is the number of rows in the first table times the number of rows in the second. The Cartesian product therefore returns 12 x 81 = 972 rows into the table @tblFinal. The final steps are to update @tblFinal with each customer's monthly sales totals for the date range and to select the final rowset.
Use CROSS JOINs with caution if you do not need a true Cartesian product, because they can consume a great deal of resources. For example, if you CROSS JOIN products and categories and then use a WHERE clause, DISTINCT, or GROUP BY to filter out most of the rows, you could get the same result far more efficiently with an INNER JOIN. Cartesian products are very useful when you need data returned for all possibilities, such as when you want to populate a chart with monthly sales dates. But you should not use them for other purposes, because INNER JOINs are far more efficient in most scenarios.
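The months-times-customers idea can be sketched on a much smaller scale. The code below is not the Figure 3 listing; it is a minimal analogue in Python with sqlite3, using three months and two customers, temporary tables in place of the TABLE data types, and a simplified Orders table. All of these are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (CustomerID TEXT, OrderMonth TEXT, Amount REAL);
INSERT INTO Orders VALUES
    ('ALFKI', '2004-01', 100.0),
    ('ALFKI', '2004-03', 50.0),
    ('ALFKI', '2004-03', 25.0),
    ('ANATR', '2004-02', 40.0);

-- Every month in the reporting period (3 rows).
CREATE TEMP TABLE tblMonths (OrderMonth TEXT PRIMARY KEY);
INSERT INTO tblMonths VALUES ('2004-01'), ('2004-02'), ('2004-03');

-- Every customer with sales in the period (2 rows).
CREATE TEMP TABLE tblCustomers (CustomerID TEXT PRIMARY KEY);
INSERT INTO tblCustomers SELECT DISTINCT CustomerID FROM Orders;

-- Cartesian product: one row per customer per month (3 x 2 = 6 rows).
CREATE TEMP TABLE tblFinal (
    OrderMonth TEXT, CustomerID TEXT, Sales REAL NOT NULL DEFAULT 0);
INSERT INTO tblFinal (OrderMonth, CustomerID)
    SELECT m.OrderMonth, c.CustomerID
    FROM tblMonths m CROSS JOIN tblCustomers c;

-- Fill in actual sales; months without orders stay at 0.
UPDATE tblFinal
SET Sales = (SELECT COALESCE(SUM(o.Amount), 0) FROM Orders o
             WHERE o.CustomerID = tblFinal.CustomerID
               AND o.OrderMonth = tblFinal.OrderMonth);
""")

# Grouping the orders directly skips the months with no sales.
grouped_rows = conn.execute("""
    SELECT CustomerID, OrderMonth, SUM(Amount)
    FROM Orders
    GROUP BY CustomerID, OrderMonth
    ORDER BY CustomerID, OrderMonth
""").fetchall()

# The Cartesian-product table has a row for every customer and month.
final_rows = conn.execute("""
    SELECT CustomerID, OrderMonth, Sales
    FROM tblFinal
    ORDER BY CustomerID, OrderMonth
""").fetchall()

print(len(grouped_rows), len(final_rows))
```

The grouped query returns three rows, while the Cartesian-product table returns all six, including the zero-sales months a chart would need.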
Odds and Ends
Here are a few other common techniques that can help improve the efficiency of your SQL queries. Suppose you want to group all of your salespeople by region and subtotal their sales, but you want only those salespeople who are marked as active in the database. You could group the salespeople by region and eliminate the inactive ones with a HAVING clause, or you could do this in the WHERE clause. Doing it in the WHERE clause reduces the number of rows that have to be grouped, so it is more efficient than doing it in the HAVING clause. Filtering on row-based criteria in the HAVING clause forces the query to group data that would have been eliminated in the WHERE clause.
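The two placements of the filter can be compared directly. The sketch below uses Python's sqlite3 module with a small, made-up SalesPeople table (the table, its columns, and the IsActive flag are assumptions). It shows that both queries return the same subtotals, and how many rows each one has to feed into the grouping step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesPeople (
    Name TEXT, Region TEXT, IsActive INTEGER, Sales REAL);
INSERT INTO SalesPeople VALUES
    ('Ann', 'East', 1, 100.0),
    ('Bob', 'East', 0, 40.0),
    ('Cy', 'West', 1, 70.0),
    ('Di', 'West', 1, 30.0),
    ('Ed', 'North', 0, 90.0);
""")

# Filter first, then group: only active rows reach the GROUP BY.
where_rows = conn.execute("""
    SELECT Region, SUM(Sales)
    FROM SalesPeople
    WHERE IsActive = 1
    GROUP BY Region
    ORDER BY Region
""").fetchall()

# Group first, then filter: every row is grouped, and the inactive
# groups are thrown away afterward.
having_rows = conn.execute("""
    SELECT Region, SUM(Sales)
    FROM SalesPeople
    GROUP BY Region, IsActive
    HAVING IsActive = 1
    ORDER BY Region
""").fetchall()

# Rows each version has to group.
rows_grouped_where = conn.execute(
    "SELECT COUNT(*) FROM SalesPeople WHERE IsActive = 1").fetchone()[0]
rows_grouped_having = conn.execute(
    "SELECT COUNT(*) FROM SalesPeople").fetchone()[0]

print(where_rows, having_rows)
print(rows_grouped_where, rows_grouped_having)
```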
Another efficiency trick is to use the DISTINCT keyword, rather than a GROUP BY clause, to get a distinct list of data rows. In this case, SQL that uses the DISTINCT keyword is more efficient. Reserve GROUP BY for cases where you need aggregate functions (SUM, COUNT, MAX, and so on). And if your query always returns unique rows by itself, do not use the DISTINCT keyword; in that case it only adds overhead.
As you have seen, there are many techniques for optimizing queries and implementing specific business rules; the trick is to try a few and compare their performance. Most important of all is to test, test, and test again. In future installments of this column, I will continue to dig into SQL Server concepts, including database design, good indexing practices, and SQL Server security examples.
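As a small illustration of when each form applies, the sketch below (Python with sqlite3 standing in for SQL Server, and a made-up SalesPeople table) lists the regions both ways and then uses GROUP BY where an aggregate is actually needed. It demonstrates only that the two listings agree; any speed difference should be measured on your own server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesPeople (Name TEXT, Region TEXT);
INSERT INTO SalesPeople VALUES
    ('Ann', 'East'), ('Bob', 'East'),
    ('Cy', 'West'), ('Di', 'West'), ('Ed', 'North');
""")

# A plain list of unique values: DISTINCT states the intent directly.
regions_distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT Region FROM SalesPeople ORDER BY Region")]

# GROUP BY without an aggregate returns the same list.
regions_grouped = [row[0] for row in conn.execute(
    "SELECT Region FROM SalesPeople GROUP BY Region ORDER BY Region")]

# GROUP BY is the right tool once an aggregate such as COUNT is needed.
headcount = conn.execute("""
    SELECT Region, COUNT(*)
    FROM SalesPeople
    GROUP BY Region
    ORDER BY Region
""").fetchall()

print(regions_distinct, regions_grouped, headcount)
```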