Database Normalization: A Worked Example of the Three Normal Forms

xiaoxiao2021-03-06  84

Why normalize? Many databases in use today were never normalized, for a variety of reasons. This article explains some of those reasons and applies the successive normal forms to the claims table of an insurance company. The changes made to the table along the way, and the additional tables introduced, make the data more accurate, less error-prone, and easier to maintain.

Database normalization is the practice of optimizing table structures and organizing data into tables so that the data is clearer. Normalization lets you change business rules, requirements, and data without rebuilding the entire system. By changing how data is stored in only one place, and changing the programs that access that information, you eliminate many opportunities for errors and junk data and reduce the work needed to update the information.

One real-world problem in companies can be summed up in a single sentence: "We have always done it this way." We have always stored data that way; we have always let people enter any information they like; we have always programmed that way. This habit is often harmful, especially for young companies that are still learning. When a new system and a better way of doing the job become available, "doing it that way works fine" may need to be re-examined and revised.

Normalizing data is one of the useful practices companies often adopt. Storing data in flat files for a COBOL program (the file layouts familiar to any COBOL programmer) looks very similar to storing it in a relational database, but the flat-file approach is not the required way of doing the job. It is a particularly poor choice when old concepts are simply carried into the present because the differences between the two are not understood or because change is feared.
Note: Dictionary.com defines normalization as "to make something standard, especially to cause it to conform to a standard or specification", or "to force acceptance". Webopedia describes normalization as "in relational database design, the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships." I prefer this definition.

Terminology

Before looking at a real-world insurance company example, you need to know the terminology used in the discussion. When working with databases, and especially with normalization problems, the following terms come up:

· Relation: Essentially, a relation is a two-dimensional table or array made up of rows and columns.
· Relationship: An association is a way of linking different tables to one another. The associations between the data items that make up different entities are also a core issue in database normalization. There are three basic types of association, and it is important to understand them:
  One-to-one (1:1): A one-to-one association means that any given instance of an entity corresponds to exactly one (not at most one) instance of another entity. Each person has exactly one unique set of fingerprints. Each phone number corresponds to exactly one paying private customer (not a company). Each person in the United States has exactly one social security number.
  One-to-many (1:m): A one-to-many association means that an instance of a given entity can be associated with zero, one, or many instances of another entity. A person may have no children, one child, or several children. A person may have no car, one car, or several cars.

  Many-to-many (m:n): A many-to-many association (zero, one, or many instances of a given entity associated with zero, one, or many instances of another entity) is a complex association that is hard to model directly. It is usually broken down into several 1:m associations. Because families blend, one or more children may have no parents (orphans), one parent (a single-parent family), or more than one parent (two parents who are still together, or divorced parents). A house or other property can be left to one or more people, and each of those people may be named in the will for one or more houses or properties.
· Attribute: An attribute is a modifiable characteristic or feature of some component in a program or database that can be set to different values; it is a column in a relation or table.
· Tuple: A tuple is a set of values or attribute values in a relational or non-relational database; it is one row in a relation.
· Deletion anomaly: A deletion anomaly is inconsistent data, or the unintended loss of data (information), caused by the deliberate deletion of other data.
· Insertion anomaly: An insertion anomaly is the inability to add information to the database because other data is missing.
· Update anomaly: An update anomaly is data inconsistency caused by data redundancy or by incomplete updates of redundant data.
· Decomposition: Decomposition splits one relation into several relations so that they satisfy a higher normal form.
· Data redundancy: Data redundancy is the unnecessary repetition of data in the database.
· Data integrity: Data integrity is the consistency of the data in the database. Ensuring data integrity is important: only then can users know that the data they depend on is correct and that the results of their queries and programs are accurate and as expected.
· Atomic value: An atomic value is a value that is neither a set of values that can be split further nor a repeating group. Each column holds one complete value and only one value; that value cannot be broken into parts that the database, or the users who access the database, would use separately.
· Referential integrity rule: The referential integrity rule states that a value stored in a non-null foreign key must be a key value in some relation.
· Foreign key: A foreign key is a set of attributes (one or more columns) in a relation that is also the primary key of some relation (the same one or another). It is the logical link between relations. A foreign key that references its own relation is called a recursive foreign key.
· Functional dependency: A functional dependency means that the value of one attribute is determined by the value of another attribute in the same row. It usually holds between the primary key (the unique piece of information that identifies a row) and the other information in the row. The combination of city and state depends on the ZIP (postal) code, even though many ZIP codes in a given state are associated with a single city. The identity of each legal resident of the United States depends on his or her social security number.
· Determinant: The determinant is the attribute on the left side of a functional dependency; it determines the values of the other attributes in the row (the ZIP code determines the city and state; the social security number determines the person's identity; the license plate number together with the state determines the owner of the car).
· Entity integrity rule: The entity integrity rule states that a key attribute may not be null (if there is a city, it has a ZIP code; if there is a car, it has a license plate number).
· Constraint: A constraint is a rule that restricts the values allowed in the database. A phone number must be numeric; a dollar amount must be numeric; State must be a valid state or province; Country must be a valid country; a date cannot be February 31.
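A functional dependency such as "ZIP code determines city and state" can be checked mechanically against sample rows. The following is a minimal Python sketch, not part of the original article: the function name `violations` and the row layout are assumptions made for illustration, and the sample rows use the ZIP codes that appear in the claimant data later in the article.

```python
from collections import defaultdict

def violations(rows, determinant, dependents):
    """Return determinant values that map to more than one dependent value.

    An empty result means the functional dependency
    determinant -> dependents holds for these rows.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in determinant)
        seen[key].add(tuple(row[c] for c in dependents))
    return {k: v for k, v in seen.items() if len(v) > 1}

# Sample rows (hypothetical layout; values follow the article's example data).
rows = [
    {"zip": "15201", "city": "Pittsburgh", "state": "PA"},
    {"zip": "15202", "city": "Pittsburgh", "state": "PA"},
    {"zip": "15330", "city": "Eighty Four", "state": "PA"},
    {"zip": "15510", "city": "Somerset", "state": "PA"},
]

# ZIP code determines city and state in this sample.
fd_holds = not violations(rows, ["zip"], ["city", "state"])

# The reverse does not hold: one city has several ZIP codes.
reverse_holds = not violations(rows, ["city", "state"], ["zip"])
```

In this sample `fd_holds` is true and `reverse_holds` is false, which is why the ZIP code, and not the city, plays the role of the determinant.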
Now that you know the relevant terminology, we can look at what normalization means in practice.
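Before turning to the example, here is a brief sketch of how a database engine can enforce the referential integrity rule and the constraints described in the terminology. It uses Python's built-in sqlite3 module; the table layouts, the list of allowed status values, and the adjuster name are assumptions made for illustration, not the article's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE adjuster (
    adjuster_cd   TEXT PRIMARY KEY NOT NULL,   -- entity integrity: key is never null
    adjuster_name TEXT NOT NULL
);
CREATE TABLE claim (
    claim_num    INTEGER PRIMARY KEY,
    claim_status TEXT NOT NULL
        CHECK (claim_status IN ('Open', 'Reviewed', 'Reopened', 'Closed')),
    adjuster_cd  TEXT REFERENCES adjuster (adjuster_cd)  -- foreign key
);
""")

# Hypothetical adjuster and a claim that references it.
conn.execute("INSERT INTO adjuster VALUES ('A01', 'Pat Example')")
conn.execute("INSERT INTO claim VALUES (123456789, 'Open', 'A01')")

def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# A non-null foreign key must match an existing adjuster.
orphan = try_insert("INSERT INTO claim VALUES (234567890, 'Open', 'ZZZ')")
# The CHECK constraint limits the values allowed in claim_status.
bad_status = try_insert("INSERT INTO claim VALUES (147258369, 'Maybe', 'A01')")
# A null foreign key is allowed: the rule only covers non-null values.
no_adjuster = try_insert("INSERT INTO claim VALUES (258369147, 'Closed', NULL)")
```

The first two inserts are rejected and the third is accepted, which matches the rule as stated: only a non-null foreign key has to match a key value in the referenced relation.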

The following example is not the typical employee-manager-department example, nor the student-professor-course example. Instead, I will use a hypothetical insurance company database. Real tables are much more complicated than this example, but they are similar in nature. Figure 1 shows the unnormalized definition of the claims (CLAIM) table. Although an insurance company's database holds many more tables than this, these tables provide some background through which we can see normalization and its consequences. Remember that the examples in each section show only some of the columns, which simplifies the examples and makes the changes easier to see.

CLAIM_NUM, OCCURANCE_NUM, CLAIM_STATUS, ACCDNT_YR, ACCDNT_DT, REPORTED_DT, ENTERED_DT,
CLAIM_DT1, CLAIM_DT2, CLAIM_DT3, CLAIM_DT4, CLAIM_DT5, CLAIM_DT6, CLAIM_DT7, CLAIM_DT8, CLAIM_DT9, CLAIM_DT10,
CLOSED_DT, DEATH_DT, ASSIGNED_DT, ADJSTER_CD, ADJUSTER_NAME, AGENT_CD, AWARD_CD, CAUSE_CD, CAUSE_DESC,
LOCATION, SITE, COVERAGE_CD, COVERAGE_DESC, DED_RECOV, DEDUCTIBLE_REMAIN,
PAID_1, RESERVED_1, PAID_2, RESERVED_2, PAID_3, RESERVED_3, PAID_4, RESERVED_4, PAID_5, RESERVED_5,
PAID_6, RESERVED_6, PAID_7, RESERVED_7, PAID_8, RESERVED_8, PAID_9, RESERVED_9, PAID_10, RESERVED_10,
LEGAL_FLG, KEY1, KEY2, KEY3, KEY4, KEY5, KEY6, KEY7, KEY8, KEY9, KEY10,
SEVERITY_CD, POLICY_NUM, PAYMENT_NUM, SSN, STATE, ACTVY_DT, ENTRY_DT, ADMIN_CD, ADMIN_DESC, REOPEN_DT,
INSURED_NAME, INSURED_ADDRESS, INSURED_PHONE, INSURED_CITY, INSURED_STATE, INSURED_ZIP,
CLAIMANT_NAME, CLAIMANT_ADDRESS, CLAIMANT_CITY, CLAIMANT_STATE, CLAIMANT_ZIP, CLAIMANT_PHONE,
SPECIAL_DT_1, SPECIAL_DT_2, SPECIAL_DT_3, SPECIAL_DT_4, SPECIAL_DT_5, SPECIAL_DT_6, SPECIAL_DT_7, SPECIAL_DT_8, SPECIAL_DT_9, SPECIAL_DT_10,
GROSS_PD, POLICY_ID

Figure 1: Columns of the unnormalized CLAIM table

First Normal Form (1NF)

The first step is to convert the database, or a table in it, to first normal form. First normal form requires eliminating repeating groups in the data, which is done by creating separate tables for related data. You decide which tables are needed by examining the data and the table structure. First normal form moves each repeating group into its own table and links that table to the original one through a one-to-many association. No repeating attributes and no repeating sets of values: this sounds simple enough. Sometimes, however, people see no other option and simply add another set of columns, and repeating groups are the result.

To bring the claims table to first normal form, we need to find all the attributes that truly belong to a claim. What makes up a claim?

· A claim must have a number.
· A claim must have someone who makes it.
· A claim must have a reported date.
· A claim must have a date of accident or illness.
· A claim must have some amount for the items resulting from the accident or illness.
· A claim is made against, or based on, some policy.
· A claim can be closed.
· A claim can be reopened.
· Does a claim have coverage? Or does the policy have coverage?
· Does a claim have a cause? Or does the accident or illness have a cause?
· Do you pay a claim? Or do you pay an invoice?
· Does a claim have a social security number? Or does the social security number belong to the person making the claim?
· The date of death is an interesting one. Does a claim die? No, but for life insurance the date may be relevant to the claim, so it should stay.
Keeping only the columns that relate directly to the claim gives the result shown in Figure 2:

CLAIM_NUM, CLAIM_STATUS, ACCIDENT_YR, ACCIDENT_DT, REPORTED_DT, ENTERED_DT, CLOSED_DT, DEATH_DT, ASSIGNED_DT,
ADJSTER_CD, ADJUSTER_NAME, AGENT_CD, AGENT_NAME, AWARD_CD, AWARD_DESC, PAYMENT_NUM, LOCATION, SITE,
DEDUCTIBLE_RECOVER, DEDUCTIBLE_REMAIN, POLICY_NO, POLICY_DESCRIPTION, STATE, RUN_DT, ACTIVITY_DT, ENTRY_DT, REOPEN_DT,
INSURED_NAME, INSURED_ADDRESS, INSURED_PHONE, INSURED_CITY, INSURED_STATE, INSURED_ZIP,
CLAIMANT_NAME, CLAIMANT_ADDRESS, CLAIMANT_CITY, CLAIMANT_STATE, CLAIMANT_ZIP, CLAIMANT_PHONE, GROSS_PD

Figure 2: The revised CLAIM table, in first normal form

The revised table contains the information related to the claim itself, without the information about payments or invoices, policies, or accidents.
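The removal of a repeating group can be sketched in a few lines of Python with the built-in sqlite3 module. This is an illustration under stated assumptions, not the article's actual migration: the PAID_n and RESERVED_n column names come from Figure 1, while the `payment` table layout, the `seq` column, and all amounts are invented for the example.

```python
import sqlite3

# One unnormalized row: PAID_n / RESERVED_n form a repeating group.
# The amounts are invented for illustration.
flat_claim = {
    "CLAIM_NUM": 123456789,
    "CLAIM_STATUS": "Open",
    "PAID_1": 250.0, "RESERVED_1": 1000.0,
    "PAID_2": 100.0, "RESERVED_2": 500.0,
    "PAID_3": None, "RESERVED_3": None,   # unused slot in the flat layout
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claim (
    claim_num    INTEGER PRIMARY KEY,
    claim_status TEXT NOT NULL
);
CREATE TABLE payment (                      -- hypothetical layout
    claim_num INTEGER NOT NULL REFERENCES claim (claim_num),
    seq       INTEGER NOT NULL,
    paid      REAL NOT NULL,
    reserved  REAL NOT NULL,
    PRIMARY KEY (claim_num, seq)
);
""")

conn.execute("INSERT INTO claim VALUES (?, ?)",
             (flat_claim["CLAIM_NUM"], flat_claim["CLAIM_STATUS"]))

# Turn each filled PAID_n / RESERVED_n pair into one row of the payment table.
seq = 1
while f"PAID_{seq}" in flat_claim:
    paid = flat_claim[f"PAID_{seq}"]
    reserved = flat_claim[f"RESERVED_{seq}"]
    if paid is not None:  # empty slots simply produce no row
        conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?)",
                     (flat_claim["CLAIM_NUM"], seq, paid, reserved))
    seq += 1

# A join brings the payments back together with their claim when needed.
totals = conn.execute("""
    SELECT c.claim_num, COUNT(p.seq), SUM(p.paid)
    FROM claim AS c
    JOIN payment AS p ON p.claim_num = c.claim_num
    GROUP BY c.claim_num
""").fetchall()
```

With this layout a claim can have any number of payments, not just ten, and unused slots no longer occupy columns in every row.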

Payment_num  Claim_status  Accident_dt  Accident_yr  Reported_dt  Entered_dt
123456789    Open          20-JUN-2000  2000         28-JUN-2000  29-JUN-2000
234567890    Reviewed      15-FEB-1984  1984         19-FEB-1984  20-FEB-1984
147258369    Reopened      08-APR-2003  2003         10-APR-2003  11-APR-2003
258369147    Closed        18-DEC-1980  1980         18-DEC-1980  19-DEC-1980

If you have a payment table, and the reserve amounts for a particular claim are paid out against different bills, why not store them in the payment table? In short, you already store some information in the payment table, so why not put these items there as well, rather than in the claims table? If the only reason for keeping this information in the claims table is that a user may need it when looking at a claim, then the claims and payment tables can be joined, and the information can be derived from all the payments made on a single claim. And since you have different types of insurance policies (and therefore different types of claims), why not store the payment information for all claim types in one table? Logically, all payment information belongs in the same table. Most of the information (attributes) associated with a payment is the same, whatever the type of payment or the type of claim. The accounting information for different types of claims, however, differs somewhat.

Second Normal Form (2NF)

Second normal form removes redundant data. It is usually violated when information in a table depends on columns that are not part of the table's primary key. Looking at the new first-normal-form claims table, the redundant data that is easiest to spot is the city and state of the insured and the city and state of the claimant. City and state depend directly on the ZIP code and not on anything related to the claim.
CLAIM_NUM, CLAIM_STATUS, ACCIDENT_YR, ACCIDENT_DT, REPORTED_DT, ENTERED_DT, CLOSED_DT, DEATH_DT, ASSIGNED_DT,
ADJSTER_CD, ADJUSTER_NAME, AGENT_CD, AGENT_NAME, AWARD_CD, AWARD_DESC, LOCATION, SITE,
DEDUCTIBLE_RECOVER, DEDUCTIBLE_REMAIN, POLICY_NO, POLICY_DESCRIPTION, STATE, RUN_DT, ACTIVITY_DT, ENTRY_DT, REOPEN_DT,
INSURED_NAME, INSURED_ADDRESS, INSURED_PHONE, INSURED_CITY, INSURED_STATE, INSURED_ZIP,
CLAIMANT_NAME, CLAIMANT_ADDRESS, CLAIMANT_CITY, CLAIMANT_STATE, CLAIMANT_ZIP

Figure 3: The CLAIM table in second normal form

Claim_num  Claimant_name       Claimant_address   Claimant_city  Claimant_state  Claimant_zip
123456789  Jennifer Smith      1234 Main          Pittsburgh     PA              15201
234567890  Bill Smith          7852 Eagle         Pittsburgh     PA              15202
147258369  John Jones          4562 Edge          Eighty Four    PA              15330
258369147  Eleanor Stillwater  7531 West Eastern  Somerset       PA              15510

Zip_Code  City         State
15330     Eighty Four  PA
15510     Somerset     PA
15201     Pittsburgh   PA
15202     Pittsburgh   PA
15203     Pittsburgh   PA
15204     Pittsburgh   PA
15205     Pittsburgh   PA
15206     Pittsburgh   PA
15207     Pittsburgh   PA
15208     Pittsburgh   PA
15209     Pittsburgh   PA
15210     Pittsburgh   PA

Because Pittsburgh, Eighty Four, and Somerset, PA do not depend on the claim but on the ZIP code, this information does not belong directly in the claims table. Although this is not the only problem with the table, moving it out removes the difficulties caused by city, state, and ZIP code.

Claim_num  Claimant_name       Claimant_address   Claimant_zip
123456789  Jennifer Smith      1234 Main          15201
234567890  Bill Smith          7852 Eagle         15202
147258369  John Jones          4562 Edge          15330
258369147  Eleanor Stillwater  7531 West Eastern  15510

Other information to move into separate tables, so that the claims table conforms to second normal form, includes the combination of award code and award description: only the award code needs to be stored in the claims table. With this approach, changing the description for a given code means changing one column of one row in the award table, which cannot cause an update anomaly. Updating a column across hundreds of rows of a table, by contrast, may well cause one. The same logic applies to adjusters and agents: move their information into their own tables and store only the code column in the claims table, from which the supporting information is easy to reach.
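The ZIP code split described above can be sketched with Python's built-in sqlite3 module. The rows are the article's sample data; the table name `zip_code`, the helper `claimant_location`, and the corrected spelling used in the update are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE zip_code (
    zip_code TEXT PRIMARY KEY,
    city     TEXT NOT NULL,
    state    TEXT NOT NULL
);
CREATE TABLE claim (
    claim_num        INTEGER PRIMARY KEY,
    claimant_name    TEXT NOT NULL,
    claimant_address TEXT NOT NULL,
    claimant_zip     TEXT NOT NULL REFERENCES zip_code (zip_code)
);
""")

# City and state are stored once per ZIP code.
conn.executemany("INSERT INTO zip_code VALUES (?, ?, ?)", [
    ("15201", "Pittsburgh", "PA"),
    ("15202", "Pittsburgh", "PA"),
    ("15330", "Eighty Four", "PA"),
    ("15510", "Somerset", "PA"),
])
# The claim rows keep only the ZIP code.
conn.executemany("INSERT INTO claim VALUES (?, ?, ?, ?)", [
    (123456789, "Jennifer Smith", "1234 Main", "15201"),
    (234567890, "Bill Smith", "7852 Eagle", "15202"),
    (147258369, "John Jones", "4562 Edge", "15330"),
    (258369147, "Eleanor Stillwater", "7531 West Eastern", "15510"),
])

def claimant_location(claim_num):
    """Look up a claimant's city and state through the ZIP code table."""
    return conn.execute("""
        SELECT c.claimant_name, z.city, z.state
        FROM claim AS c
        JOIN zip_code AS z ON z.zip_code = c.claimant_zip
        WHERE c.claim_num = ?
    """, (claim_num,)).fetchone()

before = claimant_location(147258369)

# A hypothetical spelling correction touches exactly one row,
# however many claims use that ZIP code.
rows_changed = conn.execute(
    "UPDATE zip_code SET city = 'Eighty-Four' WHERE zip_code = '15330'"
).rowcount

after = claimant_location(147258369)
```

Because the city is stored in one place, every claim that uses ZIP code 15330 sees the corrected value through the join, and no claim row has to be updated.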

Award_cd  Award_desc

Adjuster_cd  Adjuster_name

Agent_cd  Agent_name

Third Normal Form (3NF)

The third normal form rule looks for attributes that do not depend directly on the primary key of the tables produced by first and second normal form. We create a new table for all the information that is not directly associated with the table's primary key. Each new table holds the information taken from the source table together with the primary key it depends on.

Note: Third normal form is often summarized as "the key, the whole key, and nothing but the key".

CLAIM_NUM, CLAIM_STATUS, ACCIDENT_DT, REPORTED_DT, ENTERED_DT, CLOSED_DT, DEATH_DT, ASSIGNED_DT,
ADJSTER_CD, AGENT_CD, AWARD_CD, LOCATION, SITE, DEDUCTIBLE_RECOVER, DEDUCTIBLE_REMAIN, POLICY_NO, STATE,
RUN_DT, ACTIVITY_DT, ENTRY_DT, REOPEN_DT, INSURED_NAME, INSURED_ADDRESS, INSURED_PHONE, INSURED_ZIP,
CLAIMANT_NAME, CLAIMANT_ADDRESS, CLAIMANT_ZIP

Figure 4: The CLAIM table in third normal form

Third normal form brings further changes to the claims table: the insured's name, address, telephone number, and ZIP code depend on the policy rather than on the claim itself. We can therefore put the insured's information in the policy table. This leaves the remaining information in the claims table more directly related to the claim, with all other information placed in its own table and nothing lost (no omissions). A simple join of these tables reconstructs the information of the source table, which is also the goal of relational algebra and relational theory, the foundation on which relational databases rest.
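Moving the insured's details to the policy table and rebuilding the wider row with a join can be sketched as follows, again with Python's built-in sqlite3 module. The column subset, the policy number, and the insured's details are invented for illustration; only the claim numbers and statuses come from the article's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy (                        -- hypothetical column subset
    policy_no       TEXT PRIMARY KEY,
    insured_name    TEXT NOT NULL,
    insured_address TEXT NOT NULL,
    insured_phone   TEXT,
    insured_zip     TEXT NOT NULL
);
CREATE TABLE claim (
    claim_num    INTEGER PRIMARY KEY,
    claim_status TEXT NOT NULL,
    policy_no    TEXT NOT NULL REFERENCES policy (policy_no)
);
""")

# Invented policy: the insured's details are stored once, on the policy.
conn.execute(
    "INSERT INTO policy VALUES (?, ?, ?, ?, ?)",
    ("P-1001", "Example Insured", "1 Sample Street", None, "15201"),
)
# Two claims made against the same policy.
conn.executemany("INSERT INTO claim VALUES (?, ?, ?)", [
    (123456789, "Open", "P-1001"),
    (234567890, "Reviewed", "P-1001"),
])

# A simple join reconstructs the wider, pre-normalization row.
rebuilt = conn.execute("""
    SELECT c.claim_num, c.claim_status, c.policy_no,
           p.insured_name, p.insured_zip
    FROM claim AS c
    JOIN policy AS p ON p.policy_no = c.policy_no
    ORDER BY c.claim_num
""").fetchall()

# The insured's details exist in exactly one row, however many claims there are.
insured_copies = conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0]
```

Both claims come back with the insured's name and ZIP code attached, while the policy table still holds a single copy of those details.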

When reposting, please cite the original address: https://www.9cbs.com/read-106217.html
