Database Design: Lessons from Experience

zhaozj 2021-02-16 67

A successful management system is 50% business and 50% software, and the software half is in turn 25% database and 25% program, so the quality of the database design is a key factor. If a company's data is like the blood that life depends on, then database design is the most important part of the application. There is a wealth of material on database design, and university degree courses cover it as well. However, as we have repeatedly emphasized, even the best teacher cannot match the lessons of experience. So I have summarized the detours and lessons of past years and sought out professionals with deep knowledge of database design to pass on their skills and experience. The 60 best tips were selected and written up in this article. For easy reference, the content is divided into 5 parts:

Part 1 - Before designing the database: basic rules to settle first, including naming conventions and clarifying business needs.
Part 2 - Designing database tables: 24 guideline tips in total, covering field design and common problems to avoid.
Part 3 - Selecting keys: how should keys be chosen? These 10 tips cover the correct use of system-generated primary keys, and when and how to index fields.
Part 4 - Ensuring data integrity: how to keep the database clean and robust, and how to reduce harmful data to a minimum.
Part 5 - Miscellaneous tips: assorted tips not covered in the four parts above; with them, your database development work will hopefully be a little easier.

Part 1 - Before designing the database

Examine the existing environment
When designing a new database, you should not only study the business needs carefully but also examine the existing systems. Most database projects are not built from scratch; there is usually an existing system (possibly not automated) that serves specific needs. Obviously the existing system is not perfect, otherwise you would not be building a new one. But studying the old system lets you spot subtle problems you might otherwise overlook. In general, examining existing systems is always worth the effort.

Define standard object naming conventions
Be sure to define naming conventions for database objects. For tables, decide at the very start of the project whether table names are plural or singular. Also define a simple rule for table aliases. For example: if the table name is one word, the alias is the first 4 letters of that word; if the name is two words, take the first two letters of each word to form a 4-letter alias; if the name consists of 3 words, take one letter from each of the first two words and two letters from the last word, which again gives a 4-letter alias; and so on. For work tables, the name can carry the prefix WORK_ followed by the name of the application that uses the table.

Columns [fields] should follow a complete set of design rules for keys. For example, if a key is numeric, you can use _N as a suffix; if it is a character type, use a _C suffix. Column [field] names should use standard prefixes and suffixes. As another example, if your tables contain many "money" fields, you might add an _M suffix to each such column [field]. Likewise, date columns [fields] are best named with a D_ prefix.
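The alias rule described above is mechanical enough to sketch in code. The following is only an illustration: the function name `table_alias` and the assumption that words in a table name are separated by underscores are mine, not part of the original tips, and names longer than three words are left for the reader to extend ("and so on").

```python
def table_alias(table_name: str) -> str:
    """Build a 4-letter alias from an underscore-separated table name.

    Hypothetical helper following the rule in the text:
    1 word  -> first 4 letters
    2 words -> first 2 letters of each word
    3 words -> 1 letter from each of the first two words, 2 from the last
    """
    words = [w for w in table_name.lower().split("_") if w]
    if len(words) == 1:
        return words[0][:4]
    if len(words) == 2:
        return words[0][:2] + words[1][:2]
    if len(words) == 3:
        return words[0][:1] + words[1][:1] + words[2][:2]
    # The text only says "and so on" for longer names, so no rule is guessed here.
    raise ValueError("no alias rule defined for this table name")


print(table_alias("employees"))           # empl
print(table_alias("customer_order"))      # cuor
print(table_alias("sales_order_detail"))  # sode
```

Generating aliases from one function rather than by hand keeps them consistent across a team, which is the point of agreeing on a rule in the first place.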

Check the naming conventions among table names, report names, and query names
You can quickly become confused by the names of these different database elements. If you insist on naming these different parts of the database uniformly, you should at least distinguish the objects with a prefix such as Table, Query, or Report.

If you use Microsoft Access, you can identify objects with markers such as qry, rpt, tbl, and mod (for example, tbl_Employees). I have also used tbl to label tables when working with SQL Server, but I use sp_company (now sp_feft_) to identify stored procedures, because when I find a better way of handling something I often keep several copies. When working with SQL Server 2000, I identify the functions I write with udf_ (or a similar tag).

To do good work, first get good tools
Use a good database design tool, such as Sybase's PowerDesigner. It supports PB, VB, Delphi, and other languages, and through ODBC it can connect to more than 30 databases, including dBase, FoxPro, VFP, and SQL Server. I hope to cover the use of PowerDesigner in more detail in the future.

Get the Data Model Resource Book
If you are looking for sample schemas, read The Data Model Resource Book by Len Silverston, W. H. Inmon, and Kent Graziano, a data modeling book well worth owning. Its chapters cover a variety of data domains, such as people, organizations, and work effort. You can also refer to:
[1] Sa Shixuan and Wang Shan, Introduction to Database Systems (2nd edition), Higher Education Press, 1991.
[2] Steven M. Bobrowski (US), Oracle 7 and Client/Server Computing, translated by Jingjing Yuan et al., 1996.
[3] Zhou Zhongyuan, Information System Modeling Methods (Part 2), Electronics and Informatization, 1999, No. 3.

Imagine the future, but do not forget the lessons of the past
I have found it very useful to ask users how they see future requirements changing. This serves two purposes: first, you learn clearly where the design should be more flexible and how to avoid performance bottlenecks; second, you know that users will be as surprised as you are when a requirement changes that nobody identified in advance.

Be sure to remember the lessons of the past! We developers should also help one another by sharing what we have learned. Even if users think they no longer need any support, we should still educate them on this point; we have all faced the moment of "if only we had done it that way back then".

Do the logical design before the physical design
Carry out the logical design before going deep into the physical design. With the large number of CASE tools now available, your design can reach a fairly high logical level, and you can usually gain a better overall understanding of everything the database design requires.

Understand your business
Do not add even one table to your ER (entity-relationship) schema until you are sure, from the customer's point of view, that the system meets their needs (what, you have no schema yet? Then see tip 9). Understanding the business saves a great deal of time in later development phases. Once you have identified the business needs, you can make many decisions yourself.

Once you think you have understood the business, hold a systematic discussion with the customer. Use the customer's terminology and explain to them what you have concluded and what you have heard. Use words such as may, will, and must to express the cardinality of the relationships in the system. This lets the customer correct your understanding before you move on to the ER design.

Create a data dictionary and an ER diagram
Be sure to take the time to create an ER diagram and a data dictionary. They should include at least the data type of every field and the primary and foreign keys of every table. Creating the ER diagram and data dictionary does take time, but it is essential if other developers are to understand the whole design. The earlier you create them, the more confusion you avoid later, and anyone who knows the database can see how to get data out of it.

The importance of up-to-date documentation such as an ER diagram cannot be overstated. The diagram shows the relationships between tables, while the data dictionary explains the purpose of each field and any aliases that may exist. This is essential for documenting SQL expressions.

Create a schema
A picture is worth a thousand words: developers not only read and implement it, they also use it in conversations with users. A schema improves collaboration, so that major problems are much less likely in the initial database design. The schema does not have to be complicated; it can even be a hand-drawn sketch on a sheet of paper. Just make sure the logical relationships on it will pay off later.

Start from inputs and outputs
When defining table and field requirements (inputs), first examine the existing or planned reports, queries, and views (outputs) to decide which tables and fields are needed to support them. A simple example: if the customer needs a report that is sorted, grouped, and totaled by postal code, make sure there is a separate postal code field rather than folding the postal code into the address field.

Report tips
Find out how users typically report data: in batches or online? Daily, weekly, monthly, quarterly, or yearly? Consider creating summary tables if needed. System-generated primary keys are hard to manage in reports. Users who search by a secondary key in a table with a system-generated primary key often get back many duplicate rows. Such searches perform poorly and easily cause confusion.

Understand customer needs
This may seem obvious, but requirements come from customers (consider both internal and external customers). Do not rely on the requirements that were written down; the real requirements are in the customer's head. Have customers explain their needs, and as development continues, keep asking them to confirm that their needs are still what is being built. One unchanging truth is that "I'll only know what I want when I see it" inevitably causes a lot of rework, because the database fails to meet requirements the customer never wrote down. Worse still, your interpretation of their needs is yours alone, and it may be completely wrong.

Part 2 - Designing tables and fields

Check for changes
When designing a database, I consider which data fields may change in the future. Surnames are an example (Western surnames in particular, such as a woman taking her husband's name after marriage). So when building a system that stores customer information, I tend to store the surname in a separate table, with start date and end date fields attached, so that changes to this data item can be tracked.

Use meaningful field names
I once worked on a project that included code inherited from another programmer. That programmer liked to name fields after the labels displayed on the screen, which is not bad in itself, but unfortunately she also liked an odd scheme that combined Hungarian notation with control sequence numbers, such as cbo1, txt2, txt2_b, and the like. Unless you are using a system that only works with abbreviated field names, describe your fields as clearly as possible. Of course, do not overdo it: Customer_Shipping_Address_Street_Line_1 is very descriptive, but nobody wants to type a name that long. The right balance is up to you.

Use prefix naming
If several tables contain many fields of the same kind (such as FirstName), you may want to use a table-specific prefix (such as CusLastName) to help identify the fields.

Time-sensitive data should include a "last updated date/time" field. Timestamps are especially useful for finding the cause of data problems, for reprocessing or reloading data by date, and for clearing out old data.

Normalization and data-driven design
Normalizing data makes things easier not only for you but for others as well. For example, if your user interface accesses external data sources (files, XML documents, other databases, and so on), you can store the connection and path information in a user interface support table. Likewise, if the user interface performs workflow tasks (sending mail, printing letters, changing record status, and so on), the data that drives the workflow can also be stored in the database. Arranging this in advance takes effort, but if these processes are data-driven rather than hard-coded, policy changes and maintenance become much easier. In fact, if a process is data-driven, you can hand considerable responsibility to users and let them maintain their own workflow.

Do not over-normalize
For those unfamiliar with the term, normalization ensures that the fields in a table are all elementary, which helps eliminate redundancy in the database. Normalization comes in several forms, but Third Normal Form (3NF) is generally considered the best balance of performance, scalability, and data integrity. Put simply, 3NF requires that:
* Each value in a table is expressed only once.
* Each row in a table is uniquely identified (has a unique key).
* A table does not store non-key information that depends on other keys.

A database that follows 3NF has a set of tables that store related data connected by keys. For example, a 3NF database that stores customers and their orders might have two tables: Customer and Order. The Order table contains no information about the customer associated with the order; instead it stores a key value that points to the row in the Customer table holding that customer's information.

Higher levels of normalization exist, but is more normalized always better? Not necessarily. In fact, for some projects even 3NF may introduce too much complexity into the database.
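The Customer and Order example above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under my own assumptions: the column names and sample rows are invented, and the order table is called `orders` because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
-- The order row stores only the customer's key, never the customer's name or city.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'London')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 2), (11, 1, 5)])

# Customer details are looked up through the key at query time.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.quantity
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Because the customer's name lives in exactly one row, correcting it is a single update, and every order picks up the change through the join.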

For the sake of efficiency, it is sometimes necessary not to normalize a table. In one job developing food and beverage analysis software, a denormalized table cut the average query time from 40 seconds to about two. Although I had to do this, I never treat denormalization as a default design principle. The denormalized table is only a derived copy, so it can always be regenerated if it runs into problems.

Microsoft Visual FoxPro report tips
If you are using Microsoft Visual FoxPro, you can use user-friendly field names in place of coded ones: for example, Customer Name instead of txtCNaM. Then, when you use the wizards to create forms and reports, the names will be easier for non-programmers to read.

Inactive or unused indicators
It is useful to add a field that indicates whether a record is no longer active in the business. Whether the record is a customer, an employee, or anyone else, this helps filter active from inactive records when running queries. It also removes some problems new users face when working with the data; for example, some records may no longer be of use to them, and the flag offers some protection when deleting.

Use role entities to define columns [fields] that belong to a category
When you need to define things that belong to a particular category or have a particular role, you can use a role entity to create a time-dependent relationship, which makes the design self-documenting. The idea is not to give the PERSON entity a Title field, but to describe people with a PERSON entity and a PERSON_TYPE entity. For example, when John Smith, Engineer is promoted to John Smith, Director and finally rises to John Smith, CIO, all you have to do is change the key value of the relationship between the two tables PERSON and PERSON_TYPE, and add a date/time field to record when the change occurred. Your PERSON_TYPE table then contains all possible types of PERSON, such as Associate, Engineer, Director, CIO, or CEO. The alternative is to change the PERSON record to reflect the new title, but then you cannot track when the person held each position.

Use common entity names to organize data
The simplest way to organize data is to use common names, such as PERSON, ORGANIZATION, ADDRESS, and PHONE. When you combine these general names or create specific sub-entities of them, you get your own specialized versions. The main reason to start with general terms is that every specific user can map them onto concrete things. With these abstractions in place, you can use your own specific names at the second level: PERSON may be Employee, Spouse, Patient, Client, Customer, Vendor, or Teacher. Similarly, ORGANIZATION may be MyCompany, MyDepartment, Competitor, Hospital, Warehouse, Government, and so on. Finally, ADDRESS can be specialized as Site, Location, Home, Work, Client, Vendor, Corporate, and FieldOffice. Using general abstract terms to identify categories of "things" gives you great flexibility in relating data to business requirements, and it can significantly reduce the redundancy in stored data.

Users come from all over the world
When designing a database that is used over the network or has other international characteristics, remember that most countries have different field formats, such as postal codes, and that some countries, such as New Zealand, have no postal codes at all.
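The role-entity tip from this part (PERSON plus PERSON_TYPE with a dated link) might look like the following sqlite3 sketch. The linking table `person_role`, its columns, and the dates are assumptions made for illustration; the text only says that the key between the two tables changes and that a date/time field records when.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person_type (person_type_id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
-- Hypothetical link table: one dated row per role a person has held.
CREATE TABLE person_role (
    person_role_id INTEGER PRIMARY KEY,
    person_id      INTEGER NOT NULL REFERENCES person(person_id),
    person_type_id INTEGER NOT NULL REFERENCES person_type(person_type_id),
    start_date     TEXT NOT NULL  -- ISO 8601 dates sort correctly as text
);
""")
conn.execute("INSERT INTO person VALUES (1, 'John Smith')")
conn.executemany("INSERT INTO person_type VALUES (?, ?)",
                 [(1, "Engineer"), (2, "Director"), (3, "CIO")])


def assign_role(conn, person_id, title, start_date):
    """Record a promotion as a new dated row instead of overwriting a title field."""
    with conn:
        conn.execute(
            "INSERT INTO person_role (person_id, person_type_id, start_date) "
            "SELECT ?, person_type_id, ? FROM person_type WHERE title = ?",
            (person_id, start_date, title),
        )


def title_history(conn, person_id):
    """Return (title, start_date) pairs for a person, oldest first."""
    return conn.execute(
        "SELECT t.title, r.start_date FROM person_role AS r "
        "JOIN person_type AS t ON t.person_type_id = r.person_type_id "
        "WHERE r.person_id = ? ORDER BY r.start_date",
        (person_id,),
    ).fetchall()


# Illustrative dates only.
assign_role(conn, 1, "Engineer", "2001-01-01")
assign_role(conn, 1, "Director", "2005-06-01")
assign_role(conn, 1, "CIO", "2010-03-15")

history = title_history(conn, 1)
current_title = history[-1][0]
print(history, current_title)
```

The person row itself never changes; the full career path remains queryable, which is what the single Title field cannot offer.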

Repeated data calls for a separate table
If you find yourself entering the same data repeatedly, create a new table and a new relationship.

3 useful fields that should be added to every table
* dRecordCreationDate, which defaults to Now() under VB and to GETDATE() under SQL Server.
* sRecordCreator, declared under SQL Server as NOT NULL DEFAULT USER.
* nRecordVersion, a version tag for the record; it helps explain exactly why null or missing data appears in a record.

Use multiple fields for addresses and phone numbers
A single line is not enough to describe a street address. Address_Line1, Address_Line2, and Address_Line3 provide greater flexibility. Phone numbers and email addresses are best kept in their own tables, with their own type and tag categories.

Be careful about over-normalizing, which can cause performance problems. Although separate address and phone tables are usually the best arrangement, if you need to access this information frequently it may be more appropriate to store the "preferred" information in the parent table (such as Customer). A compromise between denormalization and faster access makes sense here.

Use multiple name fields
I am surprised by how many people leave a single field for the name in their database. I thought only novice developers did this, but the practice is actually very common online. I suggest treating the surname and given name as two fields and combining them at query time.
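The three audit fields listed above can be tried out in SQLite as follows. This is a sketch under stated assumptions: SQLite has no DEFAULT USER, so the creator is passed in by the application here, and incrementing nRecordVersion on every update is one possible convention rather than something the text prescribes. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id         INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    dRecordCreationDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- filled in by the database
    sRecordCreator      TEXT NOT NULL,                            -- supplied by the application
    nRecordVersion      INTEGER NOT NULL DEFAULT 1
)
""")


def create_customer(conn, name, creator):
    """Insert a customer; creation date and version come from column defaults."""
    with conn:
        cur = conn.execute(
            "INSERT INTO customer (name, sRecordCreator) VALUES (?, ?)",
            (name, creator),
        )
    return cur.lastrowid


def rename_customer(conn, customer_id, new_name):
    """Change the name and bump the record version in the same statement."""
    with conn:
        conn.execute(
            "UPDATE customer SET name = ?, nRecordVersion = nRecordVersion + 1 "
            "WHERE customer_id = ?",
            (new_name, customer_id),
        )


cid = create_customer(conn, "Ada", "alice")
rename_customer(conn, cid, "Ada L.")
audit_row = conn.execute(
    "SELECT name, dRecordCreationDate, sRecordCreator, nRecordVersion "
    "FROM customer WHERE customer_id = ?",
    (cid,),
).fetchone()
print(audit_row)
```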

What I do most often is create a computed column [field] in the same table that automatically concatenates the normalized fields, so that it changes whenever the data changes. However, this takes some care when you use modeling software. In short, concatenated fields effectively isolate the user application from the developer interface.

Beware of mixed-case object names and special characters
One of the things that used to annoy me most was mixed-case object names in a database, such as CustomerData. The problem exists everywhere from Access to Oracle. I dislike this mixed-case naming, and I have ended up renaming objects by hand. Think about it: will such a database or application ever survive a move to a more powerful database? Names in all uppercase with underscores are more readable (CUSTOMER_DATA), and you should never leave spaces between the characters of an object name.

Watch out for reserved words
Make sure your field names do not conflict with reserved words, the database system, or common access methods. For example, an ODBC connection program I wrote recently used a table in which DESC was the name of the description field. You can imagine the consequences! DESC is the reserved word that abbreviates DESCENDING. A SELECT * statement on the table worked, but what I got back was a pile of useless information.

Keep field names and types consistent
Be consistent when naming fields and assigning their data types. If a field is called "agreement_number" in one table, do not rename it "ref1" in another. If its data type is integer in one table, do not turn it into a character type in another. Remember that after you have finished your job, other people still have to use your database.
Choose numeric types carefully
Be especially careful with the smallint and tinyint types in SQL. For example, if you want to see total monthly sales and your total field is a smallint, you cannot do the calculation once the total exceeds $32,767.

Delete flags
Include a "delete flag" field in the table so that rows can be marked as deleted. Do not delete individual rows in a relational database; it is better to use a data purge program and to maintain index integrity carefully.

Avoid triggers
What a trigger does can usually be achieved in other ways. Triggers can get in the way when you debug a program. If you really need a trigger, document it in one central place.

Include a version mechanism
I recommend building a version control mechanism into the database so that you can determine which version of the database is in use. You will need this sooner or later. Over time, user requirements always change, and eventually the database structure may have to be modified. Although you can work out the version of the structure by checking for new fields or indexes, I find it more convenient to store the version information directly in the database.

Leave room to spare in text fields
ID-type text fields, such as customer IDs or order numbers, should be made larger than you would normally think necessary, because before long you will be embarrassed by the need to add extra characters. For example, suppose your customer ID is 10 digits long. Then you should make the table field 12 or 13 characters long. Is this a waste of space? A little, but not as much as you might imagine: lengthening a field by 3 characters across 1 million records, plus a little for indexes, adds only about 3MB to the whole database. That extra space lets the database grow without restructuring the entire database later. The change of ID card numbers from 15 to 18 digits is the best and most painful example.
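The delete-flag tip above can be sketched with sqlite3 as follows. The column name `is_deleted` and the helper functions are hypothetical; the sketch only shows the general shape of marking rows first and purging them later in a separate step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    is_deleted  INTEGER NOT NULL DEFAULT 0 CHECK (is_deleted IN (0, 1))
)
""")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Ada",), ("Bob",)])


def soft_delete(conn, name):
    """Mark matching rows as deleted without removing them."""
    with conn:
        conn.execute("UPDATE customer SET is_deleted = 1 WHERE name = ?", (name,))


def active_names(conn):
    """Everyday queries filter on the flag."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE is_deleted = 0 ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def purge_deleted(conn):
    """Separate purge step that physically removes flagged rows; returns the count."""
    with conn:
        cur = conn.execute("DELETE FROM customer WHERE is_deleted = 1")
    return cur.rowcount


soft_delete(conn, "Bob")
remaining_active = active_names(conn)
total_before_purge = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
purged = purge_deleted(conn)
total_after_purge = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(remaining_active, total_before_purge, purged, total_after_purge)
```

Keeping the purge in its own routine means it can be scheduled for quiet periods, while a mistaken soft delete can still be undone by resetting the flag.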
Column [field] naming tips
We have found that using a uniform prefix for the column [field] names of each table greatly simplifies writing SQL expressions. It does have disadvantages, such as defeating automatic table join tools, which link tables on common column [field] names, but even those tools sometimes join incorrectly.

A simple example: assume there are two tables, Customer and Order. The prefix of the Customer table is cu_, so its fields are named cu_name_id, cu_surname, cu_initials, cu_address, and so on. The prefix of the Order table is or_, so its fields are named or_order_id, or_cust_name_id, or_quantity, or_description, and so on. A SQL statement that selects from the database is then written as:

SELECT * FROM Customer, Order WHERE cu_surname = "MyName" AND cu_name_id = or_cust_name_id AND or_quantity = 1

Without the prefixes, the columns have to be qualified to tell them apart:

SELECT * FROM Customer, Order WHERE Customer.surname = "MyName" AND Customer.name_id = Order.cust_name_id AND Order.quantity = 1

The first statement does not save many keystrokes. But once a query involves 5 tables and even more columns [fields], you will see how useful this technique is.

Part 3 - Selecting keys and indexes

Plan ahead for data mining
The marketing department I worked in once had to handle more than 80,000 contact records and fill in the necessary data for every customer (no small job). I also had to identify a group of customers as a marketing target. When I first designed the tables and fields, I tried not to put too many fields in the primary index in order to speed up the database. Then I realized that specific group queries and information mining were neither accurate nor fast. In the end I had to rebuild the primary index and merge data fields. I found that planning ahead is critical: when I wanted to build lookups by system type, why was I using a number as the primary index field? I could search by fax number, but that was far less important to me than the system type. With the latter as the primary field, re-indexing and retrieval after database updates became much faster.

Data indexing differs between the two environments of the operational data store (ODS) and the data warehouse (DW). In a DW environment, you have to consider how the sales department organizes its sales activities. They are not database administrators, but they determine the key information in the tables. The designer or database staff should analyze the database structure to find the best balance between performance and correct output.

Use system-generated primary keys
This is similar to tip 1, but I think it is worth repeating here. If you always use system-generated keys as primary keys when designing a database, you effectively control the index integrity of the database. The database, rather than a manual mechanism, then controls access to every row of stored data. There is another advantage to system-generated primary keys: with a consistent key structure, logical defects are easy to find.

Decompose fields for indexing
To separate named fields from containing fields in support of user-defined reports, consider decomposing other fields (even primary keys) into their components so that users can index on them. Indexes speed up SQL and report generator scripts. For example, I often have to build reports with SQL LIKE expressions because the case number field cannot be broken down into elements such as year, serial number, case type, and defendant code. Performance suffers as a result. If the year and type were split out as indexed fields, these reports would run much faster.

4 principles of key design
* Create foreign keys for related fields.
* All keys must be unique.
* Avoid compound keys.
* A foreign key always references a unique key field.

Do not forget indexes
Indexes are one of the most efficient ways to get data out of a database. 95% of database performance problems can be solved with indexing. As a rule, I use a unique clustered index for the logical primary key, a unique nonclustered index for the system key (as a stored procedure), and a nonclustered index for any foreign key column [field]. However, indexes are like salt: too much and the dish is ruined. You have to consider how much space the database has, how the tables are accessed, and whether that access is mainly reading or writing.

Most databases automatically index primary key fields, but do not forget to index foreign keys. They are used frequently too, for example in a query that displays a record of the main table together with all its related records. Also, do not index memo/note fields or large fields (with many characters), because the index would take up too much storage space.

Do not index frequently used small tables
Do not set any keys on small data tables, especially if they see frequent inserts and deletes. Index maintenance for those inserts and deletes may cost more time than scanning the table.

Do not choose a social security number (SSN) or ID number as a key
Never use an SSN or ID number as a database key. Apart from privacy concerns, governments are increasingly restricting the use of SSNs and ID numbers for other purposes, and these numbers have to be entered manually. Never use a manually entered key as a primary key, because once it is entered incorrectly, the only thing you can do is delete the entire record and start over.

When taking apart other people's programs, I have seen many that use the SSN or ID number as a serial number, even though doing so is illegal. People know it is illegal, but they are used to it. Later, as identity theft cases increased, my current colleagues found themselves painfully stripping SSNs and ID numbers out of a large pile of data.

Do not use user-editable keys
When deciding which field to use as a table's key, be careful about fields that users will edit. In general, do not choose a user-editable field as a key. Doing so forces you to take one of the following two measures:
* Restrict users from editing the field after the record has been created. If you do this, you may find that your application lacks flexibility when business needs suddenly change and users need to edit those non-editable fields. What will users think when they discover a mistake only after entering the data and saving the record? Delete and recreate it? And what if the record cannot be recreated?
* Provide some way to detect and correct key conflicts. This can usually be done with some effort, but it is relatively expensive in terms of performance. Correcting keys may also force you to break the isolation between your data and the business/user interface layers.
So the old saying bears repeating: your design should adapt to the user rather than making users adapt to your design.

The reason primary keys should not be updatable is that, in the relational model, the primary key implements the relationships between tables. For example, the Customer table has a primary key CustomerID, while the customer's orders are stored in another table. The primary key of the Order table may be OrderNo, or a combination of OrderNo, CustomerID, and date. Whichever you choose, you need to store CustomerID in the Order table so that you can find the orders belonging to a customer. If you modify CustomerID in the Customer table, you must find all the related records in the Order table and modify them too. Otherwise some orders will no longer belong to any customer, and the integrity of the database is gone. If index integrity rules are applied at the table level, it is almost impossible to change the key of a record and all its related records without writing a large amount of code and deleting additional records. This process is often error-prone, so it should be avoided as far as possible.

Candidate keys can sometimes serve as primary keys
Remember that it is people, not machines, who query the data. If you have a candidate key, you may go further and use it as the primary key. In that case you can build powerful indexes. This saves database users from having to join tables just to filter data properly. In databases with strictly controlled domain tables, this load is quite noticeable. If a candidate key is truly useful, it has reached the level of a primary key. My view is that if you have a candidate key, such as state_code in a country table, you should not create a subsequent key on top of an existing unique key that cannot change. All you would be doing is creating worthless data. If you build table relationships through excessive use of such subsequent [alias] keys, the operational load really does need to be considered.

Do not forget foreign keys
Most databases automatically index primary key fields. But do not forget to index foreign key fields, which are used every time you query a record in the main table together with its related records. Also, do not index memo/notes fields or large text fields (with many characters), which would make your indexes take up a lot of database space.
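The advice to index foreign key fields can be checked experimentally. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table names, the index name `idx_orders_customer_id`, and the query are assumptions for illustration, and other database systems expose their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,  -- primary key: indexed automatically
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    notes       TEXT                  -- large free-text field: deliberately left unindexed
);
""")

QUERY = "SELECT order_id FROM orders WHERE customer_id = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)  # the 4th column holds the plan text


plan_before = query_plan(conn)

# Index the foreign key column that joins and lookups will filter on.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it names the new index, showing that a lookup by foreign key no longer has to read every row.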
Part 4 - Ensuring data integrity

Enforce data integrity with constraints rather than business rules
If a requirement comes from a business rule, enforce it in the business layer or user interface: if the rule changes later, only that layer needs updating. If a requirement stems from the need to maintain data integrity, apply a constraint at the database level. If you do use constraints in the data layer, make sure there is a way to tell the user interface, in language the user understands, why an update failed the constraint check. Unless your field names are very verbose, the field name alone is not enough.
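One way to combine a database-level constraint with a message the user can understand is sketched below with sqlite3. The table, the constraint name `quantity_positive`, and the message table are hypothetical, and the sketch relies on SQLite including the constraint's name in its error text; other database systems report violations differently.

```python
import sqlite3

# Hypothetical mapping from constraint names to user-facing messages.
FRIENDLY_MESSAGES = {
    "quantity_positive": "Quantity must be greater than zero.",
}

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE order_line (
    line_id  INTEGER PRIMARY KEY,
    quantity INTEGER NOT NULL,
    CONSTRAINT quantity_positive CHECK (quantity > 0)
)
""")


def add_line(conn, quantity):
    """Insert a line; translate a constraint failure into a user-facing message."""
    try:
        with conn:
            conn.execute("INSERT INTO order_line (quantity) VALUES (?)", (quantity,))
        return "ok"
    except sqlite3.IntegrityError as exc:
        for constraint_name, message in FRIENDLY_MESSAGES.items():
            if constraint_name in str(exc):
                return message
        return "The record could not be saved."


result_valid = add_line(conn, 3)
result_invalid = add_line(conn, 0)
saved_lines = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
print(result_valid, "/", result_invalid, "/", saved_lines)
```

The database remains the final authority on what counts as valid data, while the wording shown to the user lives in one small table that can be maintained without touching the schema.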

Wherever possible, use the database system to enforce data integrity. This includes not only the integrity achieved through normalization but also the functional correctness of the data. You can also add triggers to ensure that data is correct when it is written. Do not rely on the business layer to guarantee data integrity; it cannot guarantee integrity between tables (foreign keys), so it cannot be placed above the other integrity rules.

Distributed data systems
For a distributed system, estimate the data volume for the next 5 or 10 years before deciding whether to replicate all data at every site or to keep the data in one place. When you transfer data to other sites, it is best to set some flags in database fields, and to update those flags once the destination site has received the data. To carry out such transfers, write your own batch job or scheduler that runs at fixed intervals rather than having users transfer the data after each day's work. Keep local copies of maintenance data, such as calculation constants and interest rates, and use version numbers to ensure the data is identical at every site.

Enforce referential integrity
There is no good way to get rid of harmful data once it is in the database, so reject it before it gets in. Activate the integrity features of the database system. This keeps the data clean, although it forces developers to spend more time handling error conditions.

Relationships
If there is a many-to-one relationship between two entities that might turn into a many-to-many relationship, you had better set it up as many-to-many from the start. Converting an existing many-to-one relationship to many-to-many is much harder than starting with many-to-many.
Use views
To provide another layer of abstraction between your database and your application code, you can create dedicated views for the application instead of having it access the tables directly. This also gives you more freedom when handling database changes.

Plan for data retention and recovery
Consider data retention policies and include them in the design process, and design your data recovery procedure in advance. A data dictionary that can be published to users and developers makes data easy to identify and also ensures that data sources are documented. Write online updates as "update queries" so that the updates can be reprocessed later if data is lost.

Use stored procedures to let the system do the heavy lifting
After going to a lot of trouble to produce a database solution with high integrity, I decided to encapsulate some functional groups of related tables and provide a set of standard stored procedures to access each group, which speeds up and simplifies the development of client program code. A database is not just a place to store data; it is also a place to simplify coding.

Use lookups
The best way to control data integrity is to limit the user's choices. Wherever possible, give users a clear list of values to choose from. This reduces typing errors and misunderstandings in entered codes and provides data consistency. Some common data is especially suited to lookups: country codes, status codes, and so on.

Part 5 - Miscellaneous tips

Documentation, documentation, documentation
Document all shortcuts, naming conventions, restrictions, and functions. Use database tools that let you annotate tables, columns [fields], and triggers. Yes, this takes some effort, but in the long run it is very useful for development, support, and tracking changes.
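The view tip from Part 4 can be illustrated with a small sqlite3 sketch. The table, the view name `app_customer`, and the sample data are assumptions; the example also combines separate name fields at query time, as suggested in Part 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL
);
-- The application reads from this view, never from the table directly.
CREATE VIEW app_customer AS
    SELECT customer_id,
           first_name || ' ' || last_name AS full_name
    FROM customer;
""")
conn.execute("INSERT INTO customer VALUES (1, 'John', 'Smith')")


def load_customer_names(conn):
    """Application-side query written against the view only."""
    rows = conn.execute(
        "SELECT full_name FROM app_customer ORDER BY customer_id"
    ).fetchall()
    return [full_name for (full_name,) in rows]


customer_names = load_customer_names(conn)
print(customer_names)
```

If the underlying table is later split or renamed, only the view definition has to change; `load_customer_names` keeps working as long as the view still exposes `customer_id` and `full_name`.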

When reposting, please cite the original address: https://www.9cbs.com/read-21451.html
