============================================================================================================================================================================================================= =========
Author: xsb (http://xsb.itpub.net)
Published in: 2005.11.17 14:41
Category: Oracle
Source: http://xsb.itpub.net/post/419/46079
-------------------------------------------------- -------------
Talking about the reverse specification in database design: [ZT]
The first paradigm: For each row in the table, there must be only a unique value. Each column in one line is only a unique value and has atomicity.
Second paradigm: The second paradigm requires non-main key columns to be a subset of primary keys, and non-primary key column must completely rely on the entire primary key. The primary key must have a unique element, and a primary key can consist of one or more columns that make up a unique value. Once created, the primary key cannot be changed, the foreign key is associated with the primary key of a table. The primary key association means a pair of relationships.
Third paradigm: The third paradigm requires non-main key columns to not depend on each other.
Fourth paradigm: The fourth paradigm is prohibited from main key columns and non-main key columns, unconstrained
Fifth paradigm: The fifth paradigm divides the table into small pieces as small as possible, in order to exclude all redundancy in the table.
This paper introduces several commonly used anti-conventional methods in database design, and has been more in-depth discussion on the advantages and disadvantages of these methods and the precautions for the advantages and disadvantages of these methods. Keywords: Database design reversal improvement performance 2. Database design briefing database design is the process of converting real-world commercial models and demand into a database model, which is a core issue of establishing a database application system. The key to the design is how to make the design database can reasonably store the user's data, so that the user performs data processing. The database design is completely human problem, not the database management system. No designs are good or bad, still running. Database design should work with users with the database administrator and system analyst to understand the requirements of each user, and jointly make an appropriate, complete design for the entire database. Both the performance and tuning of the database and its application are based on a good database design. The data of the database is the basis of all operations. If the database design is not good, all other tuning methods improve the performance of the database performance is limited. of. Standardization of data 2.1. Paradigm Overview Normalization Theory is to study how to transform a bad relational pattern into a good relationship model, standardized theory is built around paradigm. Standardization Theory believes that all relations in a relational database should meet certain specifications (constraints). The standardization theory divides the norms required by the relationship into several levels, satisfying the minimum requirements, called the first paradigm (1nf), proposed a second paradigm (2nF), in the second paradigm The third paradigm (3NF) was proposed, and the BCNF paradigm, 4nf, 5nf were proposed later. The higher the level of the paradigm, the more strict constraints that should be met. Each level of the specification relies on its previous level, for example, if a relationship mode satisfies 2nF, it must satisfy 1NF. Below we will only introduce 1NF, 2NF, 3NF paradigm. 2.2. 1nf 1nf is the minimum requirement of the relationship model. Its rules are: Each column must be atom, and cannot be divided into multiple sublining. The position of each row and column can only have a value. Can't have a variety of columns. Example: If a student is required, a student can choose a multi-door class, then the "student" table below does not satisfy 1NF: Student (S-NO, S-Name, Class-no) where: S-NO is a student number S-name is the courses for the student name, class-no as the course number. Because a student can choose a multi-course, there are multiple values, so the empty does not meet 1NF. Standardization is to divide it into two tables: "Student" table and "selection" table, these two tables have satisfied 1NF. Student (S-NO, S-NAME) stu-class (s-no, class-no) 2.3. 2nf For tables that meet 2nf, in addition to satisfying 1NF, the non-primary column must depend on all principles, and Not part of the combination master code. If there is only one column that satisfies 1NF table, it automatically satisfies 2nf. Example: The following "selection" table does not match 2nf. STU-Class (S-NO, Class-NO, Class-Name) where: Class-Name is the course name. Since the main code of the word table is: (S-NO, Class-NO), the non-primary code column Class-Name depends on a part of the Class-NO of the combined master code, so it does not conform to 2nF.
The standardization of the table is also broken down into two tables: "Elective" table and "course" table, they all meet 2nF. Stu-Class (S-NO, Class-NO) Class (Class-NO, Class-Name) 2.4. 3NF 3NF rule is that any non-master code cannot rely on other non-homologues in addition to 2NF. Example: The following "Course" table does not match 3NF. Class (Class-No, Class-Name, Teacher-no, Teacher-Name) where Teacher-No is a teacher number, Teacher-name is the name of the teacher. Because non-master-name TEACHER-NAME depends on another non-primary code column Teacher-NO, it does not conform to 3NF. Its solution also breaks down into two tables: "Course" table and "teacher" table, they are all satisfied with 3nf. Class (Class-NO, Class-Name, Teacher-No) Teacher (Teacher-no, Teacher-Name) 2.5. Small junction When a table is specified, its non-regular column depends on the primary code column. From the perspective of a relational model, the table satisfies 3NF is the most standardized standard, so that it is easy to maintain. A completely standardized design does not always generate optimal performance, so it is usually designed to be designed in 3nf, if there is performance problem, then resolved by reverse specifications. The advantage of data specification in the database is to reduce data redundancy, save storage space, corresponding logic and physical I / O number of I / O times, and speed up, delete, change the speed, but for full-specific database queries, usually More connection operations are required to affect the speed of the query. Therefore, in order to improve the performance of certain queries or applications, it will destroy the specification rules, which is reverse specification.
What is the third paradigm is the basic theory of database logic model design, and a relational model can be non-destructive from the first paradigm to the fifth paradigm, which is also called normalize. In the model design of the data warehouse, the third paradigm is generally generally used, which has a very strict mathematical definition. If a relationship that meets the third paradigm must have the following three conditions: 1. The value of each attribute is unique, not polymity; 2. Each non-primary attribute must be completely dependent on the whole Main key, not part of the main key; 3. Each non-primary property cannot rely on the properties in other relationships, because this property should be returned to other relationships. We can see that the definition of the third paradigm is basically made around the relationship between primary keys and non-primary properties. If only the first condition is only satisfied, the first paradigm is called; if the front two conditions are met, the second paradigm is called the second paradigm, and so on. Therefore, the paradigm of each stage is backward compatible. 3. Data reverse specification 3.1 If the benefits of the anti-specification are standardized, the better it is, better as needed, because "separation", the more the relationship, the relationship is too much, the more frequent connection operations, the more frequent connection operations The connection operation is the most time, especially for database applications based on query, frequent connections affect the query speed. Therefore, the relationship is sometimes intentionally retains non-standardized, or after normalization, it is usually in order to improve performance. For example, "Account" table B-TB01 in the account system, its column busi-balance (total balance of enterprise accounts) violates the specification, where the value can be obtained by the following query: SELECT Busi-Code, SUM (ACC-BALANCE From b-tb06 group by busi-code If there is no column in B-TB01, if you want to get the total balance of busi-name, enterprise account, you need to make a connection operation: SELECT Busi-name, SUM (ACC -Balance) from B-TB01, B-TB06 WHERE B-TB01.BUSI-CODE = B-TB06.Busi-code group by busi-code If this kind of query is often used, it is necessary to join the column in B-TB01. Busi-balance, the corresponding price is the need to create an increase, delete, and changed trigger on Table B-TB06 to maintain the value of the BUSI-Balance column on the B-TB01 table. Similar situations often occur in decision support systems. The benefit of the anti-specification is to reduce the demand of the connection operation, reduce the number of external code and index, and may reduce the number of tables, and the corresponding problems may have data integrity issues. Accelerate the query speed, but will reduce the speed of modification. Therefore, when deciding to do anti-specification, we must weigh the spade, carefully analyze the application's data access needs and actual performance characteristics, and good indexes and other methods can often solve performance problems without having to use reverse specifications. 3.2. Commonly used reflection techniques Before performing anti-conventional operation, the commonly considered data access needs, the size of the commonly used tables, some special calculations (such as total), physical storage locations of the data, and the like. Commonly used reverse techniques have increased redundant columns, increasing the registry, re-group and division tables. 3.2.1. Adding redundant columns to increase redundancy columns are the same columns in multiple tables, which are often used to avoid connection operations when queries.
For example, in the previous example, if you often retrieve a class teacher name, you need to make a Class and Teacher table connection inquiry: Select Class-name, Teacher-name from class, teacher where class.teacher-no = teacher.teacher- No such thing in the Class table, adding a column of Teacher-Name without connection operation. Increasing redundant columns can avoid connection operations when queries, but it requires more disk space while increasing the workload of the table maintenance. 3.2.2. Increasing the gathering of the acquired column from the increased column from other tables, calculates the generated by them. Its role is to reduce connection operations when queries, avoid using set functions. For example, the column BUSI-Balance in Table B-TB01 in the account system mentioned earlier is a derived column. The derived column also has the same shortcomings as redundant columns. 3.2.3. Resets Reset Reset File If many users need to view the result data connected to the two tables, the two tables re-constitute a table to reduce the connection and improve performance. For example, users often need to view the course number, the course name, the teacher number, the teacher's name, and the table Class (Class-NO, Class-Name, Teacher-NO) and Table Teacher (Teacher-No, Teacher-Name) ) Combined into a table Class (Class-NO, Class-Name, Teacher-No, Teacher-Name). This improves performance, but requires more disk space, while also losing the independence of data in concept. 3.2.4. Segmentation table Sometimes segmentation of the table can improve performance. There are two ways to split: 1 Horizontal segmentation: Put the data line in two separate tables according to the value of one or more column data. Horizontal segmentation is usually used in the case: a table is very large, and the number of pages that need to be read when the query needs to be read, and the number of layers of the index is also reduced, and the query speed can be improved. The data in the B table has independence, such as data or data of data or different periods of each region, particularly some data, and other data are not commonly used in the table. C The data needs to be stored on multiple media. For example, a regulatory table LAW can be divided into two tables Active-Law and Inactive-Law. The contents of the Activea-Authors table are regulations that are effective, and the INACTIVE-LAW is not often inquired. Horizontal segmentation increases complexity to the application, which usually requires multiple table names when queries, and query all data needs UNION operations. In many database applications, this complexity will exceed the advantages it brings, as only the index key is not large, the table increases two to three times the amount of data in the table when the index is used for queries, and it will increase the reading when the query is added. An index layer of disk number. 2 Vertical segmentation: Put the master and some columns to a table, then put the master code and another column to another table. If some listed in a table, and other columns are not common, vertical segmentation can be used, and the vertical segmentation allows the data line to be smaller, and a data page can store more data, and reduce I when the query is / O number of times. Its disadvantage is that you need to manage redundant columns, query all data requires JOIN operation. 4. Reverse specification technology requires maintenance of data, regardless of the anti-specification technique of use, requires certain management to maintain data integrity, and common methods are batch maintenance, application logic and triggers.