Abstract
From the perspective of improving the performance of the database, this paper introduces several commonly used anti-conventional methods in database design, and has made more in-depth discussion on these methods and the precautions in use.
Key words: database design reversal improvement performance
2. Database design brief description
The database design is the process of converting the real world business model with the needs of the needs into a database of a database, which is a core issue of establishing a database application system. The key to the design is how to make the design database can reasonably store the user's data, so that the user performs data processing.
The database design is completely human problem, not the database management system. No designs are good or bad, still running. Database design should work with users with the database administrator and system analyst to understand the requirements of each user, and jointly make an appropriate, complete design for the entire database.
Both the performance and tuning of the database and its application are based on a good database design. The data of the database is the basis of all operations. If the database design is not good, all other tuning methods improve the performance of the database performance is limited. of.
Normalization of data
2.1. Paradigm Overview
The standardization theory is to study how to transform a bad relationship model into a good relationship model, the standardization theory is established around paradigm. Standardization Theory believes that all relations in a relational database should meet certain specifications (constraints). The standardization theory divides the norms required by the relationship into several levels, satisfying the minimum requirements, called the first paradigm (1nf), proposed a second paradigm (2nF), in the second paradigm The third paradigm (3NF) was proposed, and the BCNF paradigm, 4nf, 5nf were proposed later. The higher the level of the paradigm, the more strict constraints that should be met. Each level of the specification relies on its previous level, for example, if a relationship mode satisfies 2nF, it must satisfy 1NF. Below we will only introduce 1NF, 2NF, 3NF paradigm.
2.2. 1nf
1NF is the minimum requirement of the relationship model, and its rules are:
Each column must be atomic and cannot be divided into multiple sub-columns.
The position of each row and column can only have a value.
Can't have a variety of columns.
Example: If a student is required, a student can choose a multi-door class, then the "student" table below is not satisfied with 1NF: Student (S-NO, S-Name, Class-no)
Among them: S-NO is a student number, S-name is the name of the student, Class-NO is the course number. Because a student can choose a multi-course, there are multiple values, so the empty does not meet 1NF.
Standardization is to divide it into two tables: "Student" table and "selection" table, these two tables have satisfied 1NF.
Student (S-NO, S-NAME)
STU-Class (S-NO, Class-No)
2.3. 2nf
For tables that meet 2nf, in addition to satisfying 1NF, the non-primary column must depend on all principles instead of a part of the combination master code. If there is only one column that satisfies 1NF table, it automatically satisfies 2nf. Example: The following "selection" table does not match 2nf.
Stu-Class (S-NO, Class-NO, Class-Name)
Where: class-name is the course name. Since the main code of the word table is: (S-NO, Class-NO), the non-primary code column Class-Name depends on a part of the Class-NO of the combined master code, so it does not conform to 2nF.
The standardization of the table is also broken down into two tables: "Elective" table and "course" table, they all meet 2nF. STU-Class (S-NO, Class-No)
Class (Class-NO, Class-Name)
2.4. 3nf
The rules of 3NF are to satisfy the 2NF, any non-master code cannot rely on other non-homologues. Example: The following "Course" table does not match 3NF.
Class (Class-NO, Class-Name, Teacher-no, Teacher-Name)
Among them: Teacher-No is a teacher number, Teacher-Name is the name of the teacher. Because non-master-name TEACHER-NAME depends on another non-primary code column Teacher-NO, it does not conform to 3NF. Its solution also breaks down into two tables: "Course" table and "teacher" table, they are all satisfied with 3nf.
Class (Class-NO, Class-Name, Teacher-No)
Teacher (Teacher-No, Teacher-Name)
2.5. Small knot
When a table is specified, its non-regular column depends on the primary code column. From the perspective of a relational model, the table satisfies 3NF is the most standardized standard, so that it is easy to maintain. A completely standardized design does not always generate optimal performance, so it is usually designed to be designed in 3nf, if there is performance problem, then resolved by reverse specifications.
The advantage of data specification in the database is to reduce data redundancy, save storage space, corresponding logic and physical I / O number of I / O times, and speed up, delete, change the speed, but for full-specific database queries, usually More connection operations are required to affect the speed of the query. Therefore, in order to improve the performance of certain queries or applications, it will destroy the specification rules, which is reverse specification.
3. Data anti-specification
3.1. Benefits
Is it possible to normalize the level, better? This should be decided as needed, the deeper "separation", the more the relationship is generated, the relationship is too much, the more frequent connection operation, and the connection operation is the most time, especially The query-based database application, frequent connections affect the query speed. Therefore, the relationship is sometimes intentionally retains non-standardized, or after normalization, it is usually in order to improve performance. For example, "Account" table B-TB01 in the account system, its column Busi-Balance (the total balance of the enterprise account) violates the specification, where the value can be obtained by the following query:
Select Busi-Code, SUM (Acc-Balance)
From b-tb06
Group by busi-code
If there is no column in B-TB01, if you want to get the BUSI-NAME (enterprise name), the total balance of the enterprise account, you need to do connection operations:
Select Busi-Name, SUM (Acc-Balance)
From B-TB01, B-TB06
Where b-tb01.busi-code = b-TB06.BUSI-CODE
Group by busi-code
If this query is often made, it is necessary to add column busi-balance in B-TB01. The corresponding price must be created, deleted, and changed on Table B-TB06 to maintain the B-TB01 table. The value of the busi-balance column. Similar situations often occur in decision support systems. The benefit of the anti-specification is to reduce the demand of the connection operation, reduce the number of external code and index, and may reduce the number of tables, and the corresponding problems may have data integrity issues. Accelerate the query speed, but will reduce the speed of modification. Therefore, when deciding to do anti-specification, we must weigh the spade, carefully analyze the application's data access needs and actual performance characteristics, and good indexes and other methods can often solve performance problems without having to use reverse specifications.
3.2. Commonly used anti-conventional technology
Before performing the reverse modulation operation, we must fully consider the access requirements of the data, the size of the commonly used table, some special calculations (eg, total), physical storage locations of the data, and the like. Commonly used reverse techniques have increased redundant columns, increasing the registry, re-group and division tables.
3.2.1. Adding redundant columns
Increasing redundant columns are the same columns in multiple tables, which are often used to avoid connection operations when queries. For example, in the previous example, if you often retrieve a class teacher name, you need to do connection inquiry in Class and Teacher Table:
Select Class-Name, Teacher-Name
From Class, Teacher
Where class.teacher-no = teacher.teacher-no
In this case, you can add a column of Teacher-Name in the Class table that does not need to be connected.
Increasing redundant columns can avoid connection operations when queries, but it requires more disk space while increasing the workload of the table maintenance.
3.2.2. Increase the column
Increasing the increased columns from other tables from other tables, which are calculated by them. Its role is to reduce connection operations when queries, avoid using set functions. For example, the column BUSI-Balance in Table B-TB01 in the account system mentioned earlier is a derived column. The derived column also has the same shortcomings as redundant columns.
3.2.3. Resets
Resettings Find finger If many users need to view the result data connected to the two tables, the two tables re-constitute a table to reduce the connection and improve performance. For example, users often need to view the course number, the course name, the teacher number, the teacher's name, and the table Class (Class-NO, Class-Name, Teacher-NO) and Table Teacher (Teacher-No, Teacher-Name) ) Combined into a table Class (Class-NO, Class-Name, Teacher-No, Teacher-Name). This improves performance, but requires more disk space, while also losing the independence of data in concept.
3.2.4. Segmentation Table
Sometimes segmentation of the table can improve performance. There are two ways to split:
1 Horizontal segmentation: Place the data line in two separate tables according to the value of one or more column data. Horizontal segmentation is usually used in the case: a table is very large, and the number of pages that need to be read when the query needs to be read, and the number of layers of the index is also reduced, and the query speed can be improved. The data in the B table has independence, such as data or data of data or different periods of each region, particularly some data, and other data are not commonly used in the table. C The data needs to be stored on multiple media. For example, a regulatory table LAW can be divided into two tables Active-Law and Inactive-Law. The contents of the Activea-Authors table are regulations that are effective, and the INACTIVE-LAW is not often inquired. Horizontal segmentation increases complexity to the application, which usually requires multiple table names when queries, and query all data needs UNION operations. In many database applications, this complexity will exceed the advantages it brings, as only the index key is not large, the table increases two to three times the amount of data in the table when the index is used for queries, and it will increase the reading when the query is added. An index layer of disk number. 2 Vertical segmentation: Put the master and some columns to a table, then put the master code and another column to another table. If some listed in a table, and other columns are not common, vertical segmentation can be used, and the vertical segmentation allows the data line to be smaller, and a data page can store more data, and reduce I when the query is / O number of times. Its disadvantage is that you need to manage redundant columns, query all data requires JOIN operation.
4. Reverse specification technology requires maintenance of data integrity
Regardless of the anti-specification technology, certain management needs to maintain data integrity, and the common method is batch maintenance, application logic and trigger. Batch maintenance refers to a certain time for the modification of the copy column or the column, run a batch of processing jobs or stored procedures to modify replication or derived columns, which can only be used in the case where real-time requirements are not high. The integrity of data can also be implemented by application logic, which requires an increase, deletion, and change of all tables involved in the same transaction. Using the application logic to achieve the integrity risk of data, because the same logic must be used and maintained in all applications, it is easy to miss, especially when demand changes, it is not easy to maintain. Another way is to use triggers, and immediately trigger the corresponding modification of the replica or derived column for any modification of the data. The trigger is real-time, and the corresponding processing logic only appears in one place, easy to maintain. In general, it is the best way to solve such problems.
5. Conclusion
The rectification design of the database can improve query performance. Commonly used reverse techniques have increased redundant columns, increasing the registry, re-group and division tables. However, anti-specification technology requires maintenance of data integrity. Therefore, when doing reverse specifications, we must weigh the outcome, carefully analyze the data access requirements and the actual performance characteristics of the application.