Oracle Optimization Series, Part II: Database Query Optimization Techniques

xiaoxiao · 2021-03-05

The database system is the core of a management information system, and database-based online transaction processing (OLTP) and online analytical processing (OLAP) are among the most important computer applications in banks, enterprises, government departments, and elsewhere. Looking at application examples from most systems, query operations take the largest share of all database operations, and the SELECT statement, on which queries are based, is the most expensive SQL statement. For example, once the amount of data accumulates to a certain extent, say a bank's account table grows to hundreds of thousands or even millions of records, a full table scan often takes tens of minutes or even hours. A query strategy better than the full table scan can often cut query time down to minutes, which shows the importance of query optimization technology.

In the course of implementing application projects, the author found that many programmers, when developing database applications with front-end tools such as PowerBuilder or Delphi, pay attention only to gorgeous user interfaces and ignore the efficiency of their query statements, so the application systems they develop perform poorly and waste resources. Designing efficient, reasonable query statements is therefore very important. Based on application examples and database theory, this article introduces the use of query optimization techniques in real systems.

Analyzing the problem

Many programmers believe that query optimization is the job of the DBMS (database management system) and that the SQL written by the programmer has little impact. This is wrong. A good query plan can often improve program performance by a factor of dozens. A query plan is the set of statements that the DBMS generates, after optimization processing, from the SQL submitted by the user. The DBMS processes a query roughly as follows: after lexical analysis and syntax checking, the statement is handed to the DBMS's query optimizer; after algebraic optimization and access-path optimization, a precompilation module processes the statement and generates the query plan, which is then submitted to the system for execution at a suitable time, and the result is finally returned to the user. The higher versions of real database products such as Oracle and Sybase all adopt cost-based optimization, which estimates the cost of different candidate plans from information in the system dictionary tables and then selects a better one. While today's database products keep getting better at query optimization, the SQL submitted by the user remains the basis of system optimization: it is hard to imagine a bad query plan becoming efficient after the system optimizes it, so the quality of what the user writes is critical. We will not discuss the optimization the system itself performs; the following focuses on how to improve the user's query plans.

Solving the problem

The following uses the relational database system Informix as an example to show ways of improving the user's query plans.

1. Use indexes wisely

Indexes are an important data structure in databases; their fundamental purpose is to improve query efficiency. Most database products today use the ISAM index structure first proposed by IBM. Indexes should be used judiciously, following these principles:

● Build indexes on columns that are frequently joined but not specified as foreign keys; for fields that are joined only infrequently, the optimizer generates indexes automatically.

● Build an index on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).

● Build indexes on columns often used in conditional expressions; do not build indexes on columns with few distinct values. For example, the "sex" column of an employee table has only two values, "male" and "female", so there is no point in indexing it. Such an index would not improve query efficiency, but it would seriously reduce update speed.

● If there are multiple columns to be sorted, a composite index can be established on these columns.

● Use system tools. For example, the Informix database provides a tbcheck tool that can check suspicious indexes. On some database servers an index may become invalid, or reading it may become inefficient because of frequent operations. If a query that uses an index becomes inexplicably slow, try verifying the index's integrity with tbcheck, and repair it if necessary. In addition, after a large volume of updates to a table, dropping and rebuilding its indexes can increase query speed.

2. Avoid or simplify sorting

Repeated sorting of large tables should be simplified or avoided. When the optimizer can use an index to produce output in the desired order, it skips the sorting step. Sorting cannot be avoided in the following cases:

● The index does not include one or more of the columns to be sorted;

● The order of the columns in the GROUP BY or ORDER BY clause differs from the order of the indexed columns;

● Sort columns come from different tables.

To avoid unnecessary sorting, build the right indexes and consolidate database tables sensibly (although this may sometimes affect table normalization, it is worth it for the efficiency gained). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
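The article's examples are Informix SQL, but the effect of an index on sorting can be observed in any DBMS. Below is a minimal sketch using SQLite from Python (the table and column names are hypothetical, and SQLite's planner differs from Informix's): the plan line USE TEMP B-TREE FOR ORDER BY marks an explicit sort step, which disappears once the sort column is indexed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_num INTEGER, customer_num INTEGER)")
con.execute("CREATE INDEX idx_order_num ON orders(order_num)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# Sorting on an unindexed column forces an explicit sort step.
unindexed = plan("SELECT customer_num FROM orders ORDER BY customer_num")
# Sorting on the indexed column lets the optimizer skip the sort.
indexed = plan("SELECT order_num FROM orders ORDER BY order_num")

print(unindexed)  # includes a 'USE TEMP B-TREE FOR ORDER BY' step
print(indexed)    # index scan, no temp B-tree
```

The same check works for GROUP BY: if the plan shows a temp B-tree, the grouping or sorting is not being satisfied by an index.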

3. Eliminate sequential access to large table rows

In nested queries, sequential access to a table can be fatal to query efficiency. For example, with a sequential-access strategy, a query nested three levels deep, each level scanning 1,000 rows, examines one billion rows in total. The main way to avoid this is to index the join columns. For example, take two tables: a student table (student_id, name, age, ...) and a course-selection table (student_id, course_id, grade). If the two tables are to be joined, an index should be built on the join field "student_id".
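The student/course-selection case can be sketched in SQLite from Python (schema names are illustrative, not from any real system): indexing the join column turns the inner table's row-by-row scan into an index search. The PRAGMA merely disables SQLite's automatic transient indexes so the unindexed plan is visible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA automatic_index = OFF")  # show the raw nested-loop plan
con.execute("CREATE TABLE student (student_id INTEGER, name TEXT, age INTEGER)")
con.execute("CREATE TABLE selection (student_id INTEGER, course_id INTEGER, grade INTEGER)")

join_sql = ("SELECT s.name, c.course_id, c.grade "
            "FROM student s, selection c WHERE s.student_id = c.student_id")

def plan(sql):
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(join_sql)  # inner table is scanned for every outer row
con.execute("CREATE INDEX idx_sel_sid ON selection(student_id)")
after = plan(join_sql)   # inner table is probed through the index

print(before)
print(after)
```

In the second plan the inner table appears as a SEARCH using idx_sel_sid, i.e. one index probe per outer row instead of a full scan per outer row.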

A union of sets can also be used to avoid sequential access. Even though indexes exist on all the columns being checked, some forms of WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the ORDERS table:

SELECT * FROM orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008

Although indexes exist on customer_num and order_num, the optimizer still scans the entire table along a sequential access path in the statement above. Because the statement retrieves a union of separate sets of rows, it should be changed to the following:

SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001

UNION

SELECT * FROM orders WHERE order_num = 1008

This allows the query to be processed along the index path.
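The rewrite can be checked for equivalence with a small sketch (SQLite from Python; the data are made up). Note that modern optimizers, including SQLite's OR optimization, can sometimes handle the OR form well on their own, so it is worth measuring before rewriting:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_num INTEGER, order_num INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(104, 900), (104, 1002), (200, 1008), (300, 1010)])

# The OR form from the article.
or_form = con.execute(
    "SELECT * FROM orders "
    "WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008"
).fetchall()

# The UNION rewrite (UNION also removes duplicate rows).
union_form = con.execute(
    "SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001 "
    "UNION "
    "SELECT * FROM orders WHERE order_num = 1008"
).fetchall()

print(sorted(or_form))    # [(104, 1002), (200, 1008)]
print(sorted(union_form)) # same rows
```

Because UNION eliminates duplicates, the rewrite is only a drop-in replacement when the two branches select disjoint rows, as in this example.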

4. Avoid correlated subqueries

If a column label appears both in the main query and in the WHERE clause of a subquery, it is likely that the subquery must be re-executed every time the column's value in the main query changes. The deeper the nesting, the lower the efficiency, so subqueries should be avoided as much as possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
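One common way to eliminate a correlated subquery is to rewrite it as a join. A sketch in SQLite from Python (the customer/receivables tables are hypothetical, echoing the ones used later in the article) shows the two forms return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cust (customer_id INTEGER, name TEXT)")
con.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance INTEGER)")
con.executemany("INSERT INTO cust VALUES (?, ?)",
                [(1, "Ann"), (2, "Bob"), (3, "Cid")])
con.executemany("INSERT INTO rcvbles VALUES (?, ?)",
                [(1, 100), (1, 0), (2, 0), (3, 50)])

# Correlated form: the subquery is logically re-evaluated per outer row.
correlated = con.execute(
    "SELECT name FROM cust WHERE EXISTS "
    "(SELECT 1 FROM rcvbles WHERE rcvbles.customer_id = cust.customer_id "
    " AND rcvbles.balance > 0)"
).fetchall()

# Join form: one pass over both tables, duplicates removed with DISTINCT.
joined = con.execute(
    "SELECT DISTINCT cust.name FROM cust, rcvbles "
    "WHERE cust.customer_id = rcvbles.customer_id AND rcvbles.balance > 0"
).fetchall()

print(sorted(correlated))  # [('Ann',), ('Cid',)]
```

The DISTINCT is what preserves the semantics: a customer with several positive receivables must still appear only once.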

5. Avoid expensive regular expressions

The MATCHES and LIKE keywords support wildcard matching, technically known as regular expressions, but such matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98___"

Even if an index has been built on the zipcode field, sequential scanning may still be used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode > "98000", the index will be used when the query executes, obviously improving the speed greatly.
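The difference is visible in SQLite's query plans (a sketch from Python; the table is hypothetical and the zipcode is simplified to TEXT). A leading-wildcard LIKE forces a full scan of the table, while the range predicate can search through the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (zipcode TEXT, name TEXT)")
con.execute("CREATE INDEX idx_zip ON customer(zipcode)")

def plan(sql):
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

wildcard = plan("SELECT * FROM customer WHERE zipcode LIKE '%980%'")  # full scan
ranged = plan("SELECT * FROM customer WHERE zipcode > '98000'")       # index search

print(wildcard)
print(ranged)
```

Whether a prefix pattern such as LIKE '98%' can use the index depends on the DBMS and its collation settings, which is why the article's range rewrite is the safer habit.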

In addition, avoid non-initial substrings. For example, the statement SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a non-initial substring in its WHERE clause, so it cannot use an index.

6. Use temporary tables to accelerate queries

Sorting a subset of a table and creating a temporary table can sometimes speed up queries. It helps avoid repeated sorting operations and, in other ways, simplifies the optimizer's work. For example:

SELECT cust.name, rcvbles.balance, ...other columns

FROM cust, rcvbles

WHERE cust.customer_id = rcvbles.customer_id

AND rcvbles.balance > 0

AND cust.postcode > "98000"

ORDER BY cust.name

If this query is to be executed more than once, all the customers with unpaid balances can be found and placed in a temporary table, sorted by customer name:

SELECT cust.name, rcvbles.balance, ...other columns

FROM cust, rcvbles

WHERE cust.customer_id = rcvbles.customer_id

AND rcvbles.balance > 0

ORDER BY cust.name

INTO TEMP cust_with_balance

Then query the temporary table as follows:

SELECT * FROM cust_with_balance

WHERE postcode > "98000"

The temporary table holds fewer rows than the main table, and its physical order is the required order, so disk I/O is reduced and the query workload drops substantially.

Note: once the temporary table is created, it does not reflect subsequent modifications to the main table. When data in the main table is modified frequently, be careful not to lose data.
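The pattern above translates directly to other systems; here is a sketch in SQLite from Python (the data are made up, and CREATE TEMP TABLE ... AS is SQLite's spelling of Informix's INTO TEMP). As the note warns, the temp table is a snapshot and will not see later changes to the main tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cust (customer_id INTEGER, name TEXT, postcode TEXT)")
con.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance INTEGER)")
con.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                [(1, "Ann", "98011"), (2, "Bob", "97002")])
con.executemany("INSERT INTO rcvbles VALUES (?, ?)", [(1, 500), (2, 300)])

# Build the sorted snapshot once.
con.execute(
    "CREATE TEMP TABLE cust_with_balance AS "
    "SELECT cust.name, cust.postcode, rcvbles.balance "
    "FROM cust, rcvbles "
    "WHERE cust.customer_id = rcvbles.customer_id AND rcvbles.balance > 0 "
    "ORDER BY cust.name"
)

# Repeated reports now hit the smaller, pre-sorted table.
rows = con.execute(
    "SELECT name, balance FROM cust_with_balance WHERE postcode > '98000'"
).fetchall()
print(rows)  # [('Ann', 500)]
```

Each subsequent report reads only the filtered, pre-sorted subset instead of re-joining and re-sorting the full tables.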

7. Use sorting to replace non-sequential access

Non-sequential disk access is the slowest operation, showing up as back-and-forth movement of the disk access arm. The SQL language hides this situation, making it easy, while writing an application, to produce a query that accesses a large number of non-sequential pages.

Sometimes the database's sorting capability can be used in place of non-sequential access to improve the query.
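A toy cost model in pure Python shows why sorted access wins (the numbers are made up to match the PARVEN example that follows: 15,000 rows stored 300 per page, i.e. 50 pages). Counting a page fetch each time the next row lives on a different page, sorted access touches each of the 50 pages exactly once, while random-order access re-fetches a page for almost every row:

```python
import random

ROWS_PER_PAGE = 300   # rows stored per disk page (assumed)
N_ROWS = 15_000       # 50 pages in total

def page_fetches(access_order):
    # Count a fetch whenever the next row is on a different page than
    # the previous one (a 1-page cache, the pessimistic case).
    fetches, current = 0, None
    for key in access_order:
        page = key // ROWS_PER_PAGE
        if page != current:
            fetches += 1
            current = page
    return fetches

keys = list(range(N_ROWS))
random.shuffle(keys)                      # non-sequential access pattern

random_cost = page_fetches(keys)          # close to 15,000 fetches
sorted_cost = page_fetches(sorted(keys))  # exactly 50 fetches
print(random_cost, sorted_cost)
```

Real buffer caches hold more than one page, but the gap this model exposes, row-count fetches versus page-count fetches, is the same effect the three-step plan below exploits.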

Example analysis

Below, an example from a manufacturing company is used to illustrate how query optimization is carried out. The three tables in the manufacturing company's database are shown below:

1. PART table

Part number   Part description          Other columns
(part_num)    (part_desc)
102,032       Seagate 30G disk          ...
500,049       Novell 10M network card   ...
...

2. VENDOR table

Vendor number   Vendor name     Other columns
(vendor_num)    (vendor_name)
...             Seagate Corp    ...
523,045         ...             ...
...

3. PARVEN table

Part number   Vendor number   Part amount
(part_num)    (vendor_num)    (part_amount)
102,032       ...             ...
...

The following query will run regularly on these tables and generate reports on all parts:

SELECT part_desc, vendor_name, part_amount

FROM part, vendor, parven

WHERE part.part_num = parven.part_num

AND parven.vendor_num = vendor.vendor_num

ORDER BY part.part_num

Without indexes, the overhead of the above query would be enormous. We therefore build indexes on the part number and the vendor number; building the indexes avoids repeated scans inside the nesting. Statistics on the tables and indexes are as follows:

Table     Row size   Rows     Rows/page   Data pages
part      ...        10,000   ...         400
vendor    ...        ...      ...         40
parven    ...        15,000   ...         50

Index     Key size   Leaf pages
part      ...        ...
vendor    ...        2
parven    ...        ...

This looks like a relatively simple three-table join, but its query cost is very large. The system tables show clustered indexes on part_num and vendor_num, so those indexes are stored in physical order; the PARVEN table has no particular storage order. Given the size of these tables, the success rate of non-sequential access to the buffer pages is low. The optimizer's plan for this statement is: first read the 400 pages of PART sequentially, then access PARVEN non-sequentially 10,000 times, 2 pages each time (one index page, one data page), 20,000 disk pages in total; finally access VENDOR non-sequentially 15,000 times, another 30,000 disk pages. In total, this query costs 50,400 disk page accesses.

In fact, we can improve query efficiency with a three-step approach using temporary tables:

1. Read the data from the PARVEN table in vendor_num order:

SELECT part_num, vendor_num, price

FROM parven

ORDER BY vendor_num

INTO TEMP pv_by_vn

This statement reads PARVEN sequentially (50 pages), writes it to a temporary table (50 pages), and sorts it. Assuming the cost of the sort is 200 pages, the total is 300 pages.

2. Join the temporary table with the VENDOR table, outputting the result to a temporary table sorted by part_num:

SELECT pv_by_vn.*, vendor.vendor_name

FROM pv_by_vn, vendor

WHERE pv_by_vn.vendor_num = vendor.vendor_num

ORDER BY pv_by_vn.part_num

INTO TEMP pvvn_by_pn

DROP TABLE pv_by_vn

This query reads PV_BY_VN sequentially (50 pages) and accesses the VENDOR table through its index; but since the rows arrive in vendor_num order, the VENDOR table is actually read in index order (40 + 2 = 42 pages). The output table holds about 95 rows per page, 160 pages in total; writing and accessing these pages triggers 5 * 160 = 800 reads and writes, so this step reads and writes 892 pages in total.

3. Join the output with the PART table to get the final result:

SELECT pvvn_by_pn.*, part.part_desc

FROM pvvn_by_pn, part

WHERE pvvn_by_pn.part_num = part.part_num

DROP TABLE pvvn_by_pn

In this way, the query reads PVVN_BY_PN sequentially (160 pages) and reads the PART table 15,000 times through its index; because PVVN_BY_PN is sorted by part_num, those reads also proceed in index order. Thanks to the indexes, the whole procedure performs only 1,772 disk reads and writes, an optimization ratio of about 30:1. The author ran the same experiment on Informix Dynamic Server and found the optimization ratio in elapsed time to be 5:1 (the ratio may grow as the data volume increases).
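The page counts above can be tallied directly from the article's figures. Note that the 400 + 20 split in step 3 is my reading of how the 1,772 total is reached (PART's 400 data pages plus roughly 20 index leaf pages, read in index order); the article states only the total.

```python
# Original plan: sequential PART scan, then non-sequential probes.
naive = 400 + 10_000 * 2 + 15_000 * 2   # = 50,400 disk pages

# Three-step temporary-table plan.
step1 = 50 + 50 + 200        # read PARVEN, write temp table, assumed sort cost
step2 = 50 + 42 + 5 * 160    # read temp, VENDOR in index order, temp-table I/O
step3 = 160 + 400 + 20       # read temp, PART data pages + index leaves in order
optimized = step1 + step2 + step3  # = 1,772

print(naive, optimized, round(naive / optimized))  # 50400 1772 28  (~30:1)
```

The arithmetic confirms the article's totals: 50,400 pages for the single join versus 1,772 for the staged plan, roughly the quoted 30:1 ratio.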

Summary

Twenty percent of the code takes up 80% of the running time: this famous law of programming holds in database applications as well. Optimization should seize the key issue; for database applications, that is the execution efficiency of the SQL they submit. The key to query optimization is to make the database server read less data from disk, and to read pages sequentially rather than non-sequentially.

Please credit the original source when reprinting: https://www.9cbs.com/read-33863.html
