Data Warehouse Basics

xiaoxiao2021-03-06  77

What is a data warehouse

A data warehouse is an environment rather than a product. It gives users current and historical data for decision support, data that is difficult or impossible to obtain from traditional operational databases. "Data warehouse technology" is the general name for the various technologies and modules that integrate operational data into a unified environment. The whole point is to let users query the information they need more easily and conveniently, and so to support decisions.

Data warehouse composition

[Diagram not preserved. Labels: Data Purification; Data Load; Information Publishing System; Operating Data and OF Data; Data State; Reports, Inquiry, EIS Tools; OLAP Tool; Data Mining Tools; Manipulation Platform; Metadata; Management Platform]

Data warehouse database

The data warehouse database is the core of the entire data warehouse environment. It is where the data is stored, and it supports data retrieval. Compared with an operational database, it is characterized by support for massive data volumes and fast retrieval techniques.

Data extraction tools

Data extraction tools pull data out of a wide variety of storage formats, apply the necessary transformation and cleanup, and store the result in the data warehouse. The ability to access many different kinds of data storage is the key requirement for such a tool: it should be able to generate COBOL programs, MVS job control language (JCL), UNIX scripts, SQL statements, and so on, in order to reach the different sources. Data conversion includes: deleting fields that have no meaning for decision-support applications; converting to unified data names and definitions; calculating statistics and derived data; assigning default values to missing values; and unifying different data definitions.

Metadata

Metadata is data that describes the structure of the data in the data warehouse. By use, it falls into two types: technical metadata and business metadata. Technical metadata is used by the designers and administrators of the data warehouse for development and day-to-day management.
Technical metadata includes: data source information; descriptions of data conversions; definitions of the objects and data structures in the data warehouse; rules for data cleaning and data updates; mappings from source data to target data; user access; and the history of data backups, data imports, and information releases.

Business metadata describes the data in the data warehouse from a business point of view. It includes descriptions of the business subjects and of the data, queries, and reports they contain.

Metadata provides an information directory for accessing the data warehouse. The directory fully describes what data is in the warehouse, how that data was obtained, and how to access it. It is the center of data warehouse operation and maintenance: the data warehouse server uses it to store and update data, and users understand and access data through it.

Access tools

Access tools give users the means to access the data warehouse. They include data query and reporting tools; application development tools; executive information system (EIS) tools; online analytical processing (OLAP) tools; and data mining tools.

Data marts

A data mart is a part of the data warehouse's data set aside for a specific application purpose or scope; it can also be called departmental data or subject data (subject area). When implementing a data warehouse it is often possible to start with a single departmental data mart and later assemble a complete data warehouse from several data marts. When implementing different data marts, make sure that fields with the same meaning have compatible definitions; otherwise combining them into a data warehouse later will cause a lot of trouble.

Data warehouse management: security and privilege management.
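
The data conversion operations described above for extraction tools (dropping fields with no decision value, unifying names, filling in defaults, computing derived data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the mapping tables, and the `transform_record` function are all hypothetical and do not come from any particular extraction tool.

```python
from typing import Any, Dict

# Hypothetical mapping from source-specific field names to unified warehouse names.
FIELD_NAME_MAP = {
    "cust_no": "customer_id",
    "CustomerID": "customer_id",
    "qty": "quantity",
    "price": "unit_price",
}

# Hypothetical fields that carry no meaning for decision support.
DROPPED_FIELDS = {"screen_id", "operator_terminal"}

# Hypothetical default values for fields that are missing or empty.
DEFAULTS = {"region": "UNKNOWN", "quantity": 0, "unit_price": 0.0}


def transform_record(source: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the conversion steps to one source record."""
    record: Dict[str, Any] = {}
    for name, value in source.items():
        if name in DROPPED_FIELDS:
            continue  # delete fields with no decision value
        record[FIELD_NAME_MAP.get(name, name)] = value  # unify field names
    for name, default in DEFAULTS.items():
        if record.get(name) is None:
            record[name] = default  # assign defaults to missing values
    # Derived data: the line amount is computed once, at load time.
    record["amount"] = record["quantity"] * record["unit_price"]
    return record


row = transform_record(
    {"cust_no": 42, "qty": 3, "price": 1.5, "screen_id": "A7", "region": None}
)
print(row)
```

Keeping the name map and the defaults in plain tables, rather than in code, mirrors the idea that conversion rules and source-to-target mappings are themselves technical metadata.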

Information publishing system: sends data from the data warehouse, or other related data, to different locations or users. A web-based information publishing system is the most effective way to handle multi-user access.

Why build a data warehouse

Business reasons: decisions need to be made quickly and correctly, using all available data; users are experts in their business area rather than computer professionals; enterprise data doubles every 18 months, so an effective way to access it is needed; competition over business intelligence and useful enterprise data has intensified.

Technical reasons: computing power is getting cheaper (the price per MIPS is falling); storage media are getting cheaper; network bandwidth is growing and transmission capacity is getting cheaper; the enterprise computing environment is increasingly complex, with application systems from different eras and different vendors existing side by side; new applications need to access data belonging to other applications.

Problems in implementing a data warehouse

Implementation approach from the business side (considering return on investment): top-down or bottom-up. Human resources: train existing staff or hire. Design: think big, but start small. Many types of data sources may be involved, historical data may be "old", and the database may become very large. A data warehouse is business-driven more than technology-driven: it requires ongoing communication with end users, and the process of building it may never end.

Important points:
1) The data warehouse should include detail data (cleaned).
2) Any data that the user can see should have a corresponding description in the metadata.
3) Consider how the data in the data warehouse is distributed across the servers, for instance whether it is split by time. These distribution strategies have a great impact on the performance of the entire data warehouse.
4) When selecting data warehouse design tools, check: is the metadata format supported by the tool compatible with the metadata format supported by the data warehouse? Can the different tools work with one another's metadata formats?
5) How end users use the data warehouse has a great impact on its performance. When designing the data warehouse model, take the users' usage patterns into account.

Nine steps in designing a data warehouse:
1) Choose the right subject (the problem domain to be addressed).
2) Clearly define the fact table.
3) Identify and conform the dimensions.
4) Choose the facts.
5) Calculate and store derived data in the fact table.
6) Round out the dimension tables.
7) Choose the duration of the database.
8) Track slowly changing dimensions as needed.
9) Decide the query priorities and query modes.

Technology

Hardware platform: the disk capacity of the data warehouse is usually 2 to 3 times that of the operational database. Mainframes generally offer more reliable performance and stability and integrate easily with legacy systems, while PC servers and UNIX servers are more flexible, easier to operate, and able to serve dynamically generated query requests. Questions to consider when choosing a hardware platform: does it provide parallel I/O throughput? How well does it support multiple CPUs?

Data warehouse DBMS: consider its storage capacity, its query performance, and its support for parallel processing.

Network structure: implementing the data warehouse generates a large amount of data traffic on some network segments, so the network structure may need to be improved.
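
Steps 2 to 5 of the nine design steps listed above (define the fact table, settle the dimensions, choose the facts, store derived data in the fact table) can be illustrated with a very small star schema. The following is a minimal sketch using Python's built-in `sqlite3` module; the table names (`dim_date`, `dim_product`, `fact_sales`), the columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create two dimension tables and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            category    TEXT NOT NULL
        );
        -- One row per product per day: this is the grain of the fact table.
        CREATE TABLE fact_sales (
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            quantity    INTEGER NOT NULL,
            unit_price  REAL NOT NULL,
            amount      REAL NOT NULL  -- derived: quantity * unit_price
        );
        """
    )


def load_sample_data(conn: sqlite3.Connection) -> None:
    """Insert a few made-up rows, computing the derived fact at load time."""
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20210301, 2021, 3), (20210401, 2021, 4)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Manual", "Books")],
    )
    sales = [(20210301, 1, 10, 2.5), (20210301, 2, 1, 20.0), (20210401, 1, 4, 2.5)]
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(d, p, qty, price, qty * price) for d, p, qty, price in sales],
    )


def sales_by_category(conn: sqlite3.Connection) -> list:
    """Aggregate the stored fact along one dimension attribute."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample_data(conn)
print(sales_by_category(conn))  # [('Books', 20.0), ('Hardware', 35.0)]
```

Storing `amount` in the fact table trades some disk space for simpler and faster queries, which is in line with the earlier remark that a warehouse typically needs several times the disk capacity of the operational database.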

Please credit the original address when reposting: https://www.9cbs.com/read-109304.html
