Data Warehouse Guide

zhaozj 2021-02-17

Data warehouse study notes

One. Concepts

1. Data warehouse: a subject-oriented, integrated, time-variant, and stable (non-volatile) collection of data that supports decision making in business management. In a broad sense, a data warehouse is a database that stores a large amount of historical data, where each record represents the data at a particular point in time.

It is an information technology that turns collected data into commercial value and presents the collected information in reports. It covers collecting, filtering, and storing data, and then applying that data in applications such as analysis and reporting.

2. Data warehouse objectives: confirm the data structure, find trends, assist decisions, and provide decision-making information for business management.

3. DSS: Decision Support System.

4. Data warehouse components: data marts, relational databases, data sources, data preparation, and service tools.

5. Dimension:

6. Multi-dimensional:

7. Aggregation: obtaining and consolidating a grouped or summed structure. Aggregation is the concept of moving data up through a multi-level hierarchy.

9. Category: used to classify and distinguish specific data within a dimension; categories provide the definitions for a detailed classification system.

10. Detailed category: the lowest-level classification in a dimension.

11. Decomposition and synthesis:

12. Indicator quantity:

13. OLAP: online analytical processing.

14. OLTP: online transaction processing.
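To make the aggregation concept in item 7 concrete, here is a minimal sketch of rolling data up a multi-level hierarchy. The year > month > day hierarchy, the `daily_sales` records, and the `roll_up` helper are all assumptions made for illustration; they do not come from the original notes.

```python
from collections import defaultdict

# Hypothetical detail-level records: (year, month, day, sales amount).
daily_sales = [
    (2021, 1, 5, 100.0),
    (2021, 1, 20, 50.0),
    (2021, 2, 3, 75.0),
    (2022, 1, 9, 40.0),
]


def roll_up(records, depth):
    """Sum sales over the first `depth` levels of the year > month > day hierarchy."""
    totals = defaultdict(float)
    for *levels, amount in records:
        # Keep only the upper levels of the hierarchy as the grouping key.
        totals[tuple(levels[:depth])] += amount
    return dict(totals)


by_month = roll_up(daily_sales, 2)  # day level aggregated to month level
by_year = roll_up(daily_sales, 1)   # aggregated one level further, to year
```

Each call moves the same detail rows one level higher in the hierarchy, so the grand total stays the same while the number of rows shrinks.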

Two. Data model normalization

1. Concepts:

Normalization: a formal method that applies a set of rules to associate attributes with entities.

Entity: a major data object that matters to the user. It is usually a person, a place, a thing, or an event that will be recorded in the database.

Attributes: entities have attributes. An attribute is a feature, modifier, quality, quantity, or characteristic of an entity.

Normal form: normalization consists of several steps that reduce redundancy in order to arrive at a more satisfactory physical design; these steps are called normal forms.

First normal form: a table that contains no repeating columns is in first normal form.

Second normal form: if a table is in first normal form and every non-key column depends on the whole primary key, it is in second normal form.

Third normal form: if a table is in second normal form and every non-key column depends directly on the primary key, and not on another non-key column, it is in third normal form.
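The rules above for second and third normal form can be sketched with a small decomposition. The flat order table, its column names, and the `project` helper are hypothetical and exist only for this illustration; the flat table is assumed to be in first normal form already, with primary key (order_id, product_id).

```python
# Hypothetical flat table; its primary key is (order_id, product_id).
flat_rows = [
    {"order_id": 1, "product_id": "P1", "qty": 2, "product_name": "Pen",
     "customer_id": "C1", "customer_city": "Shenyang"},
    {"order_id": 1, "product_id": "P2", "qty": 1, "product_name": "Ink",
     "customer_id": "C1", "customer_city": "Shenyang"},
    {"order_id": 2, "product_id": "P1", "qty": 5, "product_name": "Pen",
     "customer_id": "C2", "customer_city": "Dalian"},
]


def project(rows, key_cols, value_cols):
    """Build a table keyed by key_cols, keeping one copy of each distinct key."""
    table = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        table[key] = {c: row[c] for c in value_cols}
    return table


# Second normal form: product_name depends on only part of the key (product_id),
# so it moves to its own table; qty depends on the whole key and stays.
products = project(flat_rows, ["product_id"], ["product_name"])
order_lines = project(flat_rows, ["order_id", "product_id"], ["qty"])

# Third normal form: customer_city depends on customer_id, a non-key column,
# so customers get their own table as well.
orders = project(flat_rows, ["order_id"], ["customer_id"])
customers = project(flat_rows, ["customer_id"], ["customer_city"])
```

After the split, each product name and each customer city is stored once instead of being repeated on every order line.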

Two. Information requirements modeling:

1. Top-down modeling: takes specific data elements and organizes them into dimensions and indicators.

2. Bottom-up modeling: starts from the user's point of view. Its advantage is that the designer can carry over a common subject or business area.

3. Combined development: a combination of the top-down and bottom-up methods.

4. Example: sales revenue should be presented as budget versus actual.

Indicators: actual revenue from product sales, budgeted revenue from product sales, and estimated revenue from product sales.

Dimensions: the products sold.
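The example above, with one dimension (product) and three indicators, might be laid out as follows. This is only a sketch: the product names, the figures, and the `variance_report` helper are invented for illustration.

```python
# Hypothetical figures for illustration only: product -> indicator values.
sales = {
    "Product A": {"budget": 1000.0, "actual": 1200.0, "estimate": 1100.0},
    "Product B": {"budget": 800.0, "actual": 600.0, "estimate": 700.0},
}


def variance_report(data):
    """Return actual-minus-budget revenue for each member of the product dimension."""
    return {
        product: indicators["actual"] - indicators["budget"]
        for product, indicators in data.items()
    }


report = variance_report(sales)
for product, variance in report.items():
    print(f"{product}: actual vs. budget = {variance:+.1f}")
```

The dimension supplies the rows of the report, and the indicators supply the numbers compared within each row.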

Three. Questions to ask users when designing a data warehouse

1. What are the tasks of the user's department?

2. What tasks does the user perform within the department?

3. Which reports are needed to complete those tasks?

4. Where does this information currently come from?

5. How is the information handled?

6. Is the information generated by the user, or does it come from a regular report?

7. Does the user enter the information into a worksheet? Can it be analyzed there?

8. How timely does the handling of this information need to be?

Packet preparation:

Packet: ________________________

Dimensions: __________________________________________

Categories:

Indicators (forecast sales, actual sales, forecast deviation)

Four. Building a multidimensional data model

To build a cube:

1. Choose the business process to be analyzed as the subject of the model.

Modeling subject: for example, if market strategy is developed by analyzing consumers by product line and region, the subject of the data model is "sales".

2. Determine the granularity of the fact table.

The granularity of the fact table usually corresponds to the lowest level of each associated dimension. Choosing "Day" as the grain means that each record in the "Time Dimension" represents one day.

3. Identify the dimension levels for each fact table.

The defined granularity is tied to the dimensions.

4. Identify the measures of the fact table.

Measures include not only the data itself but also new values derived from existing data. When designing the data model, a decision must be made: whether to store calculated results in the fact table or to compute them at run time. Ratios are one example.

5. Determine the attributes of each dimension table.

In general, the number of attributes defined for each dimension table should be kept to a minimum.

6. Have users validate the data model.
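The steps above can be sketched with Python's built-in sqlite3 module. Everything named here (the `dim_date`, `dim_product`, and `fact_sales` tables, their columns, and the sample rows) is an assumption made for illustration, not a schema from the original notes. The fact table uses a day grain, and the margin ratio is computed at run time rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables with a deliberately small set of attributes.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, line TEXT);
    -- Fact table at "day" grain: one row per product per day.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        revenue REAL,
        cost REAL,
        PRIMARY KEY (date_key, product_key)
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (1, "2021-01-05", "2021-01"),
    (2, "2021-01-06", "2021-01"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (10, "Pen", "Stationery"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 10, 100.0, 60.0),
    (2, 10, 300.0, 140.0),
])

# The margin ratio is derived at query time instead of being stored per row.
rows = conn.execute("""
    SELECT d.month, p.line,
           SUM(f.revenue) AS revenue,
           (SUM(f.revenue) - SUM(f.cost)) / SUM(f.revenue) AS margin_ratio
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.month, p.line
""").fetchall()
```

One reason to compute a ratio at run time is that ratios are not additive: adding up stored daily ratios does not give the monthly ratio, whereas the stored revenue and cost can be summed to any level and the ratio recomputed from the sums.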

You are welcome to email me so that we can make progress together.

Mailto: hxflx@sina.com hxflx@163.com Liung@neusoft.com

When reprinting, please cite the original address: https://www.9cbs.com/read-29750.html
