Construction and analysis of a SQL Server data warehouse

xiaoxiao2021-03-06  14


(1) Basic concepts:

1. Cube: The cube is the main object in online analytical processing (OLAP), a technology that provides fast access to data in a data warehouse. A cube is a data set, typically constructed from a subset of the data warehouse, organized and summarized into a multidimensional structure defined by a set of dimensions and measures.

2. Dimension: A dimension is a structural attribute of a cube. Dimensions are organized hierarchies of categories (levels) that describe the data in the fact table. These categories and levels describe similar sets of members, on which users base their analyses.

3. Measures: In a cube, a measure is a set of values, usually numeric, based on a column in the cube's fact table. Measures are the central values analyzed in the cube; that is, they are the numeric data end users see when browsing the cube. The measures you choose depend on the type of information end users request. Common measures include Sales, Cost, Expenditures, and Production Count.

4. Metadata: The structural model of data and applications across the different OLAP components. Metadata describes objects such as the tables in the OLTP database and the cubes in the data warehouse and data marts, and it also records which applications reference which blocks of data.

5. Level: A level is an element of a dimension hierarchy. Levels describe the hierarchy of the data, from the highest (most summarized) level down to the lowest (most detailed) level.

6. Data mining: Data mining lets you define models containing grouping and prediction rules that can be applied to data in a relational database or a multidimensional OLAP data set. These prediction models can then be used to perform complex data analysis automatically, finding trends that help identify new opportunities and choose those with a chance of success.

7. Multidimensional OLAP (MOLAP): The MOLAP storage mode stores both a copy of a partition's source data and its aggregations in a multidimensional structure on the Analysis server. Depending on the percentage and design of the partition's aggregations, MOLAP offers the potential for the fastest query response times. In short, MOLAP is best suited for frequently used cubes that require fast query response.

8. Relational OLAP (ROLAP): The ROLAP storage mode keeps a partition's aggregations in tables of a relational database (the one specified as the partition's data source). However, ROLAP can also be used for partition data without creating aggregations in the relational database.

9. Hybrid OLAP (HOLAP): The HOLAP storage mode combines characteristics of both MOLAP and ROLAP.

10. Granularity: The level of summarization or depth of detail of the data.

11. Aggregation: An aggregation is a pre-calculated data summary. Aggregations improve query response time because the answers are prepared before the questions are asked.
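The idea behind aggregations can be sketched in a few lines of Python (a language-neutral illustration with made-up sample data, not Analysis Services code): a summary table is computed once ahead of time, so a summary query never has to touch the detail rows.

```python
from collections import defaultdict

# Detail-level "fact" rows: (month, amount) -- hypothetical sample data.
fact_rows = [("2003-01", 100), ("2003-01", 250), ("2003-02", 80), ("2003-02", 40)]

# Build the aggregation once, ahead of time (what cube processing does).
monthly_totals = defaultdict(int)
for month, amount in fact_rows:
    monthly_totals[month] += amount

# At query time the answer is a single lookup, not a scan of all detail rows.
def total_for(month):
    return monthly_totals[month]

print(total_for("2003-01"))  # 350
```

The trade-off is the classic one: aggregations cost storage and processing time up front in exchange for fast answers later.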

12. Dice: A subset of cube data defined by members of more than one dimension is called a dice.

13. Slice: A subset of cube data defined by a member of a single dimension is called a slice.
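To make the distinction concrete, here is a minimal sketch in plain Python rather than MDX (the cube, its dimensions, and all values are invented for illustration): a slice fixes one member of one dimension, while a dice restricts several dimensions at once.

```python
# Each cell of a tiny cube: (year, province, product) -> sales value.
cube = {
    (2002, "Hubei", "A"): 10, (2002, "Hubei", "B"): 20,
    (2002, "Hunan", "A"): 30, (2003, "Hubei", "A"): 40,
    (2003, "Hunan", "B"): 50,
}

def slice_cube(cube, year):
    """Slice: fix a single member of one dimension (here: a year)."""
    return {k: v for k, v in cube.items() if k[0] == year}

def dice_cube(cube, years, provinces):
    """Dice: restrict several dimensions to chosen sets of members."""
    return {k: v for k, v in cube.items() if k[0] in years and k[1] in provinces}

print(len(slice_cube(cube, 2002)))                    # 3 cells
print(len(dice_cube(cube, {2002, 2003}, {"Hubei"})))  # 3 cells
```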

14. Drill-down: The end user selects a single cell from a regular cube, virtual cube, or linked cube and retrieves a result set from that cell's source data to obtain more detailed information. This process is drilling down.

15. Data mining model: Data mining lets you define models containing grouping and prediction rules that can be applied to data in a relational database or a multidimensional OLAP data set. These prediction models can then be used to perform complex data analysis automatically, finding trends that help identify new opportunities and choose those with a chance of success.

(2) Example construction process and analysis

1. We now explore the build process of an MS SQL Server data warehouse through a relatively simple example. In practice, building a data warehouse is quite complicated: it combines front-end technology with strong business requirements. This simple example only illustrates the approximate construction process.

2. Build the data warehouse model. This has two parts. The first is to consider what useful data the original data source can provide, that is, which data, after filtering, can be used in the data warehouse. The second is to determine what the company's business layer needs to analyze. This requires working closely with the company's senior decision makers and fully understanding their business needs, because the main users of the data warehouse are the company's senior decision makers.

At this stage you must do a lot of preparatory work, because the data in the original database may differ greatly, in both content and structure, from what the data warehouse needs. How do you extract useful warehouse data from the original data? The original database holds scattered, fragmentary transaction data, while the data warehouse holds transformed, refined statistics. For example, the original database may store every individual deposit and withdrawal record, but the data warehouse does not care about each record; it wants to answer, as quickly as possible, questions such as the total deposits and withdrawals for this month. Running such a query directly against the original database would defeat the purpose of the data warehouse: the sheer volume of data would make the query impractical. Instead, the data that makes this query meaningful must be transformed and loaded into the data warehouse. This is data cleansing, i.e., ETL. There are many ways to implement data cleansing, and many details to handle, such as matching data types, converting data formats, de-duplicating primary keys, and scheduling regular, timely loads into the warehouse. My example does not strictly go through this step, because I have neither a normalized source database nor a standardized business requirement. I simply use the star and snowflake models to create several typical data warehouse tables, as follows:
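The deposit-and-withdrawal example above can be sketched as a toy ETL step. This is a minimal illustration in Python with SQLite rather than SQL Server (the article's actual tooling), and all table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source (OLTP) side: every individual transaction record.
cur.execute("CREATE TABLE transactions (tx_date TEXT, kind TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("2003-01-05", "deposit", 100.0), ("2003-01-20", "withdraw", 30.0),
     ("2003-02-02", "deposit", 50.0), ("2003-02-11", "deposit", 70.0)],
)

# Warehouse side: one pre-summarized row per month and kind.
# Extract from the source, Transform (group and sum), Load into the summary.
cur.execute("CREATE TABLE monthly_summary (month TEXT, kind TEXT, total REAL)")
cur.execute("""
    INSERT INTO monthly_summary
    SELECT substr(tx_date, 1, 7) AS month, kind, SUM(amount)
    FROM transactions
    GROUP BY month, kind
""")

cur.execute("SELECT total FROM monthly_summary WHERE month='2003-01' AND kind='deposit'")
print(cur.fetchone()[0])  # 100.0
```

A real ETL pipeline would also handle type matching, format conversion, and de-duplication, and would run on a schedule; this sketch shows only the summarizing transform.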

The fact table references the Time, Address, and Detail dimension tables: Time is the time dimension, Address the address dimension, and Detail the detailed-address dimension, which is a child (sub-dimension) of Address. Together they form a snowflake model, and each table contains some data.
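The snowflake layout described above might look like the following SQLite sketch (the article does not show the actual DDL, so the column names and sample rows here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Dimension tables Time, Address, and Detail. Detail refines Address --
# that extra dimension-to-dimension hop is what makes this a snowflake
# rather than a star schema.
cur.executescript("""
    CREATE TABLE time_dim    (time_id INTEGER PRIMARY KEY, year INT, day INT);
    CREATE TABLE address_dim (addr_id INTEGER PRIMARY KEY, province TEXT, city TEXT);
    CREATE TABLE detail_dim  (detail_id INTEGER PRIMARY KEY, street TEXT, mark TEXT,
                              addr_id INTEGER REFERENCES address_dim(addr_id));
    CREATE TABLE fact        (time_id INTEGER REFERENCES time_dim(time_id),
                              detail_id INTEGER REFERENCES detail_dim(detail_id),
                              measure REAL);
""")
cur.execute("INSERT INTO address_dim VALUES (1, 'Hubei', 'Wuhan')")
cur.execute("INSERT INTO detail_dim VALUES (1, 'Some Street', 'M1', 1)")
cur.execute("INSERT INTO time_dim VALUES (1, 2003, 15)")
cur.execute("INSERT INTO fact VALUES (1, 1, 99.5)")

# Resolving a fact row walks the snowflake: fact -> detail -> address.
cur.execute("""
    SELECT a.province, d.street, f.measure
    FROM fact f
    JOIN detail_dim d  ON f.detail_id = d.detail_id
    JOIN address_dim a ON d.addr_id   = a.addr_id
""")
print(cur.fetchone())  # ('Hubei', 'Some Street', 99.5)
```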

3. Now that the data warehouse is established, the next step is to create a metadata database on the OLAP server. This database is different from the previous one: it stores metadata, such as the cubes, roles, data sources, shared dimensions, and mining models we create next. Then connect to the data source created earlier in the ODBC Data Source Administrator in order to reach the data warehouse.

I created the database MMM and data source TEST as follows:

Once this work is done, you can build shared dimensions from the dimension tables of the data warehouse, in this case the time dimension and the address dimension. The creation process is the same for both.

Create the time dimension (Time) at this point, then build the snowflake-model shared dimension from Address and Detail.

Click Next to create the Detail dimension. Process the dimensions after creation so that they take effect.

After the dimensions are created, create the cube. A cube is a data set based on the dimension and fact tables that provides fast access to the data warehouse. Our cube structure is as follows:

Detail (Street, Mark)

Address (Province, City)

Time (Year, Day)
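Seen as data, this cube is essentially a mapping from members of those dimensions to a measure value. A rough Python sketch (with invented sample values) shows how rolling the cube up to the Year level of the Time dimension works:

```python
# Cells of the cube keyed by (year, province, street) -> measure value.
study = {
    (2002, "Hubei", "Street1"): 5.0,
    (2002, "Hubei", "Street2"): 7.0,
    (2003, "Hunan", "Street3"): 9.0,
}

def rollup_by_year(cells):
    """Roll the cube up to the Year level, summing the measure."""
    totals = {}
    for (year, _province, _street), value in cells.items():
        totals[year] = totals.get(year, 0.0) + value
    return totals

print(rollup_by_year(study))  # {2002: 12.0, 2003: 9.0}
```

Drilling down is the inverse operation: starting from a Year total and retrieving the individual cells beneath it.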

The creation process for the cube (Study) is as follows:

Click Next to finish creating the cube (Study), then process it as follows:

Next, create the mining model.

After it is created, process it as follows:

Now a simple data warehouse architecture is in place. We use a front-end analysis tool to query the data warehouse we built and see whether our simple business requirements are met. First, use Excel as the query tool:

In addition to querying with Excel and English Query, we can use MDX functions to query the OLAP server directly.

So far, we have successfully created a simple data warehouse that can serve some simple business queries. This example mainly walks through the creation process of a data warehouse in order to deepen understanding of data warehouses and their basic concepts.

