1. What is a data warehouse?
The definition of the data warehouse is given by W. H. Inmon in his book "Building the Data Warehouse": "A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions." Put plainly: data warehouse technology is the familiar distributed database plus one restrictive condition, which together form a new way of storing and processing data.
And this constraint is the focus of the book.
2. Transition from database -> data warehouse
Why did people turn to data warehouse techniques when there were already so many database products, and why are the industry's major players so caught up in it? The answer lies in how computer applications developed. Database technology so far has grown up under the pressure of OLTP (On-Line Transaction Processing) application requirements, and the most urgent technical requirement of online transaction processing is fast response. Database technology, especially the technology based on the relational theory proposed by E. F. Codd, divides the data into entities with very little redundancy and then weaves them into one organic whole according to certain relationships. This meets the needs of OLTP almost perfectly. Each business transaction usually needs to touch only one entity. An ADD or UPDATE on that entity affects the smallest possible area of the storage medium, for example through record-level locking, while the consistency and integrity of related updates to other entities are preserved. Because the theory and technology met the real OLTP needs of the time so well, relational database products became popular. Oh, you don't follow? It doesn't matter, read on and you will. The idea is simply this: store data classified by how frequently it is used, so that different applications access different classes of data. Still don't understand? Then you really are slow!
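The OLTP pattern described above can be sketched in a few lines. This is only an illustration, not anything from the book: the customer and account tables, the column names and the deposit function are all made up for the example, and SQLite is just a convenient stand-in (it locks the whole database file rather than a single record, so it does not show record-level locking itself).

```python
import sqlite3

def make_oltp_db():
    """Build a tiny in-memory database with two low-redundancy entities."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE account (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            balance INTEGER NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Alice');
        INSERT INTO account VALUES (10, 1, 500);
    """)
    return conn

def deposit(conn, account_id, amount):
    """A typical OLTP transaction: one short UPDATE on one record."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown account")

conn = make_oltp_db()
deposit(conn, 10, 25)
```

The point of the sketch is the shape of the work: the transaction is short, touches one entity, and leaves the customer table alone.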
3. The data warehouse is an important part of a distributed system
This is a typical distributed database design:
[Figure missing. Surviving label: operational data]
Note that the data warehouse is not the distributed system itself, only a part of it. But once you understand the position the data warehouse holds, you will see why the whole thing is named after the data warehouse and not called a distributed application. The reason is very simple: the data warehouse is the core, and the other parts revolve around it as planets revolve around the sun, so the structure looks like a solar system.
Operational data is the wide variety of data we get from a wide variety of data sources. It is the most primitive state of the data in the whole system: for example, I can see the record of a call I made on November 11, or which bills I have on the 15th and the specific content of each bill, and I can even see takeoff and arrival times. If you understand what this is, you should be able to follow what comes next.
Features of operational data:
1. Real-time: the data is almost always the current value.
2. The data sources are extremely rich: data from outside the enterprise as well as internal data.
3. High requirements on response time. (Adding one billing record cannot take an hour.)
So you can see that designing an operational database is not very difficult :-) To state it up front, the design of operational data follows: requirements -> architecture -> write the code -> load the data.
The biggest feature of the data warehouse can be summed up in one word: "stable". Its data is extracted from the operational data, and its update cycle is at least 24 hours, so you can see that using its data cannot be a real-time affair. That's right: it does not exist for real-time work. It exists so that you can run analysis and statistics on the data extracted from the operational database. Got it? This is very important, and it is also the main job of every DSS analyst (DSS used to be called MIS, if you would rather not call it a decision analysis system :P). To see the benefit, listen to the experts: using the data in an online transaction processing system directly to support decision analysis is a lot of trouble, or even impossible. At that point people ask, "The data I need is right there in the system, so why can't I use it?" This is not to say that relational databases are bad; it is an old product meeting a new task. The E-R data structure handles online transaction processing perfectly, but it does not suit large-scale decision support analysis, especially at the enterprise level. Data warehouse technology was born to meet this need.
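To make "stable" a little more concrete, here is a minimal sketch of a once-a-day load. It assumes a snapshot style of loading, which is only one possible approach; the nightly_load function, the field names and the sample balances are invented for the illustration.

```python
from datetime import date

def nightly_load(warehouse, operational_rows, load_date):
    """Append a dated copy of today's operational rows.

    Earlier rows are never updated or deleted, so history stays stable.
    """
    for row in operational_rows:
        warehouse.append({**row, "load_date": load_date})
    return warehouse

warehouse = []
# Day 1: the operational system says the balance is 500.
nightly_load(warehouse, [{"account_id": 10, "balance": 500}], date(2000, 11, 28))
# Day 2: the operational value was overwritten with 525; the warehouse keeps both.
nightly_load(warehouse, [{"account_id": 10, "balance": 525}], date(2000, 11, 29))
```

The operational system only knows the current value, while the warehouse list keeps one row per load date, which is what makes analysis over time possible.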
The goal of the data warehouse is to provide supporting information for management decisions, which is significantly different from the needs an OLTP (online transaction processing) system responds to. Just as an enterprise reorganizes its business, supporting management decisions requires recombining the data in the OLTP system according to decision-making business subjects, and organizing it by the content to be analyzed so that it is easy to use for different decisions. From the user's point of view, this subject-oriented model is a multidimensional reorganization of the data.
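As a rough sketch of what reorganizing by subject might look like, the example below regroups individual call records around the subject "customer per month". The record layout, the field names and the summarize_calls_by_customer_month function are assumptions made up for this illustration.

```python
from collections import defaultdict

def summarize_calls_by_customer_month(call_records):
    """Regroup transaction-level call records by (customer, month)."""
    totals = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for rec in call_records:
        month = rec["started"][:7]  # 'YYYY-MM' prefix of an ISO timestamp
        key = (rec["customer_id"], month)
        totals[key]["calls"] += 1
        totals[key]["seconds"] += rec["seconds"]
    return dict(totals)

calls = [
    {"customer_id": 1, "started": "2000-11-11T09:30", "seconds": 120},
    {"customer_id": 1, "started": "2000-11-15T18:05", "seconds": 60},
    {"customer_id": 2, "started": "2000-11-11T10:00", "seconds": 30},
]
summary = summarize_calls_by_customer_month(calls)
```

The OLTP side stores one row per call because that is what billing needs; the analyst asks questions per customer and per month, so the data is regrouped along those lines.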
After the data structure has been reorganized and before the data is loaded into the data warehouse, the data must go through conversion, or "integration", processing. This process consists of several essential steps that make the data complete and unified, which guarantees the quality of the data used in the data warehouse; the details come later. In short, integration ensures that the data is accurate, in place, within the value range it should have, and free of duplicates.
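The checks named above (in place, within range, no duplicates) can be sketched as a simple filter. This is a minimal illustration under assumed rules, not the procedure from the book: the integrate function, the required fields and the amount range are all hypothetical.

```python
def integrate(rows, required=("id", "amount"), amount_range=(0, 1_000_000)):
    """Split rows into clean ones and rejected ones with a reason."""
    low, high = amount_range
    seen_ids = set()
    clean, rejected = [], []
    for row in rows:
        if any(row.get(field) is None for field in required):
            rejected.append((row, "missing field"))   # data not "in place"
        elif not low <= row["amount"] <= high:
            rejected.append((row, "out of range"))    # outside allowed values
        elif row["id"] in seen_ids:
            rejected.append((row, "duplicate"))       # repeated record
        else:
            seen_ids.add(row["id"])
            clean.append(row)
    return clean, rejected

rows = [
    {"id": 1, "amount": 100},
    {"id": 1, "amount": 100},   # repeat of the first row
    {"id": 2, "amount": -5},    # negative amount
    {"id": 3, "amount": None},  # value missing
    {"id": 4, "amount": 40},
]
clean, rejected = integrate(rows)
```

Keeping the rejected rows together with a reason, instead of silently dropping them, makes it possible to go back to the source system and fix the problem there.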
OK! Do you understand? Is it a bit complicated? Don't worry, just remember this and you will be fine: a data warehouse is data support designed specifically for statistical analysis and similar work. It is that simple. To sum up:
Data warehouse:
1. The data is not real-time, and the update cycle is long.
2. The data comes from the operational data, after being reorganized according to a certain model.
3. The time requirements during processing are relatively loose.
Its design is relatively complicated, but one thing is certain: data warehouse design follows data -> requirements. You can think of it this way: even a clever housewife cannot cook without rice. If all you have is radishes, all you can make is an "all-radish feast"! So the motto of DSS design is: give me the data first, and then I will tell you what I want. Sounds a little odd, doesn't it? :)
By now you should have a broad, if rough, understanding of the data warehouse. The next step is to study the great "Building the Data Warehouse" carefully.
Ma Lei Wednesday, NOVEMBER 29, 2000