Database query optimization technology

xiaoxiao 2021-03-06

The database system is the core of a management information system. Database-based online transaction processing (OLTP) and online analytical processing (OLAP) are among the most important computer applications in banks, enterprises, government and other organizations. In most application systems, queries account for the largest share of database operations, and the SELECT statement on which queries are based is the most expensive SQL statement. For example, once a table such as a bank's account table has accumulated hundreds of thousands of records or more, a full table scan can take ten minutes or even hours. A query strategy better than a full table scan can often cut the query time to a few minutes, which shows how important query optimization is.

In implementing application projects, the author has found that many programmers who build database applications with front-end development tools (such as PowerBuilder and Delphi) care only about an attractive interface and pay no attention to the efficiency of their query statements. The resulting application systems are inefficient and waste resources badly. Designing efficient, sensible query statements is therefore very important. Using application examples together with database theory, this article describes how query optimization techniques are applied in real systems.

Analysis of the problem

Many programmers believe that query optimization is the job of the DBMS (database management system) and has little to do with the SQL statements they write. This is wrong. A good query plan can often improve program performance by dozens of times. Here the query plan is the set of SQL statements submitted by the user, and the query planning is the set of statements produced after optimization.
The DBMS processes a query plan as follows. After lexical and syntax checking, the query statement is submitted to the DBMS's query optimizer. Once algebraic optimization and access-path optimization are complete, the precompilation module processes the statement and generates the query planning, which is submitted to the system for execution at a suitable time; finally the result is returned to the user. Recent versions of real database products such as Oracle and Sybase all use cost-based optimization. This approach estimates the cost of the alternative query plannings from information in the system dictionary tables and then chooses a better one. Although database products keep getting better at query optimization, the SQL statement submitted by the user is the basis on which the system optimizes. It is hard to imagine a poorly written query plan becoming efficient after system optimization, so the quality of the statements the user writes is critical. The optimization done by the system is not discussed further here; the rest of this article concentrates on ways to improve the user's query plan.

Solutions

The following uses the relational database system Informix as an example to introduce ways of improving the user's query plan.

1. Use indexes sensibly

The index is an important data structure in a database; its fundamental purpose is to improve query efficiency. Most database products now use the ISAM index structure first proposed by IBM. Indexes should be used in moderation, according to the following principles:

● Create indexes on columns that are frequently joined but are not declared as foreign keys; for columns that are joined infrequently, the optimizer generates indexes automatically.
● Create indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).
● Create indexes on columns that are often used in conditional expressions and have many distinct values; do not create indexes on columns with few distinct values. For example, the "sex" column of an employee table has only two distinct values, "male" and "female", so there is no need to index it. An index there would not improve query efficiency and would seriously slow down updates.
● If several columns are to be sorted together, a composite index can be created on those columns.
● Use system tools. For example, the Informix database has a tbcheck tool that can check suspicious indexes.
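As a rough illustration of the principles on sorted columns and composite indexes, the sketch below compares query plans before and after creating a composite index. It uses Python's built-in sqlite3 module as a stand-in for Informix, which is an assumption made only so the example can run anywhere; the employee table, its columns and the index name are hypothetical, and other optimizers may report their plans differently.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Hypothetical employee table, filled with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER, dept TEXT, name TEXT, sex TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (i, "dept%02d" % (i % 20), "name%04d" % i, "male" if i % 2 else "female")
        for i in range(1000)
    ],
)

sorted_query = "SELECT dept, name FROM employee ORDER BY dept, name"

# Without an index, SQLite has to sort the rows in a temporary structure.
plan_without_index = query_plan(conn, sorted_query)

# A composite index on the sort columns already holds the rows in order.
conn.execute("CREATE INDEX idx_employee_dept_name ON employee (dept, name)")
plan_with_index = query_plan(conn, sorted_query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

In SQLite the first plan includes a temporary B-tree step for the ORDER BY, while the second reads the rows directly from the index. The low-selectivity "sex" column is deliberately left unindexed, in line with the third principle.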

On some database servers an index may become invalid, or its read efficiency may degrade because of frequent operations. If a query that uses an index is inexplicably slow, check the integrity of the index with the tbcheck tool and repair it if necessary. In addition, after a large amount of data in a table has been updated, dropping and rebuilding the index can speed up queries.

2. Avoid or simplify sorting

Repeated sorting of large tables should be simplified or avoided. When an index can be used to produce output in the proper order, the optimizer skips the sort step. Several factors influence whether this is possible. To avoid unnecessary sorting, build indexes correctly and merge database tables where reasonable (this may sometimes affect the normalization of the tables, but the gain in efficiency is worth it). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.

3. Eliminate sequential access to rows of large tables

In nested queries, sequential access to tables can be fatal to query efficiency. For example, with a sequential access strategy, a query nested three levels deep in which each level examines 1000 rows examines 1 billion rows in total. The main way to avoid this is to index the join columns. For example, take two tables: a student table (student number, name, age, ...) and a course-selection table (student number, course number, grade). If the two tables are to be joined, an index should be created on the join field, the student number.

A union can also be used to avoid sequential access. Even when indexes exist on all the columns being tested, some forms of the WHERE clause force the optimizer to use sequential access.
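The advice on indexing join columns can be sketched as follows, again using Python's sqlite3 module as an assumed stand-in for Informix. The student and enrollment tables and the index names are hypothetical renderings of the student and course-selection tables described above. SQLite's automatic indexing is switched off so that the unindexed plan shows plain sequential scans.

```python
import sqlite3


def rows_examined_sequentially(levels, rows_per_level):
    """Rows touched when every nesting level is scanned sequentially."""
    return rows_per_level ** levels


def join_plan(with_index):
    """Return SQLite's plan for joining student and enrollment."""
    conn = sqlite3.connect(":memory:")
    # Stop SQLite from building a transient index on its own.
    conn.execute("PRAGMA automatic_index = OFF")
    conn.execute(
        "CREATE TABLE student (student_no INTEGER, name TEXT, age INTEGER)"
    )
    conn.execute(
        "CREATE TABLE enrollment "
        "(student_no INTEGER, course_no INTEGER, grade INTEGER)"
    )
    if with_index:
        # Index the join column on both tables.
        conn.execute("CREATE INDEX idx_student_no ON student (student_no)")
        conn.execute(
            "CREATE INDEX idx_enrollment_student_no ON enrollment (student_no)"
        )
    sql = (
        "SELECT s.name, e.course_no, e.grade "
        "FROM student s, enrollment e "
        "WHERE s.student_no = e.student_no"
    )
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    conn.close()
    return plan


print(rows_examined_sequentially(3, 1000))
print("no index:  ", join_plan(False))
print("with index:", join_plan(True))
```

Without the indexes both tables are scanned, so the inner table is read once per outer row; with them, the inner table is reached through an index search instead.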
The following query forces a sequential scan of the orders table:

    SELECT * FROM orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008

Although indexes exist on customer_num and order_num, the optimizer still uses a sequential access path to scan the whole table for this statement. Because the statement retrieves a union of separate sets of rows, it should be rewritten as:

    SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001
    UNION
    SELECT * FROM orders WHERE order_num = 1008

The query can then be processed through the index paths.

4. Avoid correlated subqueries

If a column appears both in the main query and in the WHERE clause of a subquery, the subquery will very likely have to be re-executed each time the column value in the main query changes. The more levels of nesting, the lower the efficiency, so subqueries should be avoided where possible. If a subquery is unavoidable, filter out as many rows as possible inside it.

5. Avoid difficult regular expressions

The MATCHES and LIKE keywords support wildcard matching, which is technically called regular expressions. This kind of matching is particularly time-consuming. For example:

    SELECT * FROM customer WHERE zipcode LIKE "98___"

Even if an index exists on the zipcode column, a sequential scan is still used in this case. If the statement is changed to

    SELECT * FROM customer WHERE zipcode > "98000"

the index is used when the query executes, which obviously improves the speed greatly. (To return exactly the rows matched by the LIKE pattern, the range would need to be zipcode >= "98000" AND zipcode < "99000".) Non-initial substrings should also be avoided.
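The rewrite of a wildcard match as a range condition can be sketched as below. This is a minimal example that assumes Python's sqlite3 module in place of Informix; the customer table contents and the index name are hypothetical. The range is written with both a lower and an upper bound so that it selects the same rows as the LIKE pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num INTEGER, zipcode TEXT)")
# Synthetic five-digit zip codes between 97000 and 99996.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(i, "%05d" % z) for i, z in enumerate(range(97000, 100000, 7))],
)
conn.execute("CREATE INDEX idx_customer_zipcode ON customer (zipcode)")

like_sql = "SELECT customer_num FROM customer WHERE zipcode LIKE '98___'"
range_sql = (
    "SELECT customer_num FROM customer "
    "WHERE zipcode >= '98000' AND zipcode < '99000'"
)

# Both forms should select the same customers.
like_rows = sorted(conn.execute(like_sql).fetchall())
range_rows = sorted(conn.execute(range_sql).fetchall())

# Ask SQLite how it would execute each form.
like_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + like_sql)]
range_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + range_sql)]

print("LIKE plan: ", like_plan)
print("range plan:", range_plan)
```

The range form gives the optimizer explicit bounds it can look up in the index on zipcode. Whether the LIKE form can also use an index depends on the database product and its settings, so it is worth checking the plan on the target system.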

For example, the statement

    SELECT * FROM customer WHERE zipcode[2,3] > "80"

uses a non-initial substring in the WHERE clause, so it cannot use an index.

6. Use temporary tables to speed up queries

Sorting a subset of a table and storing it in a temporary table can sometimes speed up queries. It helps avoid repeated sort operations and otherwise simplifies the optimizer's work. For example:

    SELECT cust.name, rcvbles.balance, ...other columns
    FROM cust, rcvbles
    WHERE cust.customer_id = rcvbles.customer_id
      AND rcvbles.balance > 0
      AND cust.postcode > "98000"
    ORDER BY cust.name

If this query is to be run more than once, all customers with unpaid balances can be collected in a temporary table sorted by customer name:

    SELECT cust.name, rcvbles.balance, ...other columns
    FROM cust, rcvbles
    WHERE cust.customer_id = rcvbles.customer_id
      AND rcvbles.balance > 0
    ORDER BY cust.name
    INTO TEMP cust_with_balance

The temporary table is then queried as follows:

    SELECT * FROM cust_with_balance WHERE postcode > "98000"

The temporary table has fewer rows than the main table, and its physical order is the required order, which reduces disk I/O, so the query workload can be cut substantially. Note: changes made to the main table after the temporary table is created are not reflected in the temporary table. When data in the main table is modified frequently, take care not to lose data.

7. Use sorting to replace non-sequential access

Non-sequential disk access is the slowest operation, showing up as back-and-forth movement of the disk arm. SQL hides this, so it is easy to write a query that accesses a large number of non-sequential pages when writing an application. Sometimes the database's sorting capability can be used in place of non-sequential access to improve a query.
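The temporary-table technique from point 6 can be sketched as follows, assuming Python's sqlite3 module in place of Informix. SQLite spells the operation CREATE TEMP TABLE ... AS SELECT rather than Informix's INTO TEMP, and the sample rows are invented for illustration. The query against the temporary table keeps an ORDER BY because SQL does not guarantee row order without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (customer_id INTEGER, name TEXT, postcode TEXT)")
conn.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance REAL)")
# Invented sample data: four customers, one of them with a zero balance.
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?)",
    [
        (1, "Baker", "98120"),
        (2, "Adams", "97001"),
        (3, "Clark", "98550"),
        (4, "Davis", "98001"),
    ],
)
conn.executemany(
    "INSERT INTO rcvbles VALUES (?, ?)",
    [(1, 250.0), (2, 80.0), (3, 0.0), (4, 40.0)],
)

# The original query: join, filter and sort every time it runs.
direct = conn.execute(
    "SELECT cust.name, rcvbles.balance, cust.postcode "
    "FROM cust, rcvbles "
    "WHERE cust.customer_id = rcvbles.customer_id "
    "AND rcvbles.balance > 0 AND cust.postcode > '98000' "
    "ORDER BY cust.name"
).fetchall()

# Do the join and the sort once, keeping only customers with a balance.
conn.execute(
    "CREATE TEMP TABLE cust_with_balance AS "
    "SELECT cust.name AS name, rcvbles.balance AS balance, "
    "cust.postcode AS postcode "
    "FROM cust, rcvbles "
    "WHERE cust.customer_id = rcvbles.customer_id AND rcvbles.balance > 0 "
    "ORDER BY cust.name"
)

# Later queries only filter the smaller temporary table.
from_temp = conn.execute(
    "SELECT name, balance, postcode FROM cust_with_balance "
    "WHERE postcode > '98000' ORDER BY name"
).fetchall()

print(from_temp)
```

Both queries return the same rows. As the note in point 6 warns, the temporary table is a snapshot, so rows added to cust or rcvbles afterwards do not appear in it.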

Example analysis

The following example from a manufacturing company illustrates how to optimize a query. The company's database contains three tables, with the following schemas:

1. part table

part number    part description           other columns
(part_num)     (part_desc)                (other column)
102,032        Seagate 30G disk           ...
500,049        Novell 10M network card    ...
...

2. vendor table

vendor number  vendor name     other columns
(vendor_num)   (vendor_name)   (other column)
910,257        Seagate Corp    ...
523,045        ...             ...
...

3. parven table

part number    vendor number   part amount
(part_num)     (vendor_num)    (part_amount)
102,032        910,257         3,450,000
234,423        ...             ...
...

The query to be optimized joins the three tables and sorts the result by part number:

    ... part_num = parven.part_num AND parven.vendor_num = vendor.vendor_num ORDER BY part.part_num

Without indexes, the cost of this query would be enormous. For this reason, indexes are built on the part number and the vendor number. The indexes avoid repeated scans inside the nested loops. The statistics on the tables and indexes are as follows:

Table      ...
part       ... 10,000 ...
vendor     ...
...

Index      key size     keys per page   leaf pages
(Indexes)  (key Size)   (keys / page)   (Leaf pages)
part       4            500             20
vendor     4            500             2
parven     8            250             60

This looks like a relatively simple three-table join, but its query cost is very high. The system tables show that there are cluster indexes on part_num and vendor_num, so those indexes are stored in physical order. The parven table has no particular storage order.
The sizes of these tables mean that non-sequential accesses will rarely be satisfied from the buffer pages.

When reposting, please cite the original address: https://www.9cbs.com/read-124057.html
