BI Architecture and Related Technologies (Part 2)

xiaoxiao2021-03-05  47

Most companies today run computerized business management systems in one or more departments and have accumulated a considerable amount of business data. Yet, as the industry saying goes, they are "data rich, information poor": the accumulated data is not being put to good use. Why? It is not that senior management never thought of it. The data comes from too many sources, its formats are not uniform, and a small fraction of records are malformed. The volume is also large: a few million records is only the starting point, and some large companies add a large volume of business records every day. Moreover, much of the detail does not matter to senior managers. What they need are statistical reports at the strategic level, delivered promptly, that support business decisions.

To reach this demanding goal, BI practitioners break the task into three sub-tasks: 1) To integrate data in various formats and remove erroneous records from the raw data, they introduced data preprocessing, known as ETL (extraction, transformation, loading). 2) The preprocessed data must be stored in a unified way, which gives rise to metadata and the data warehouse. 3) Finally, the large consolidated data set must be analyzed to uncover valuable new opportunities for business decisions, which is the job of OLAP (online analytical processing) and data mining.

The following introduces the techniques and supporting tools for each sub-task.

1) Data preprocessing (ETL: Extraction, Transformation, Loading)

Soon after the first large-scale online transaction processing (OLTP) systems appeared, simple "extraction" programs followed: they scanned entire files and databases, selected the required data according to some criteria, and copied it out for overall analysis. Because the analysis ran on the copy, it did not burden the live OLTP system or degrade its performance, and users could control the extracted data themselves. The situation has since changed greatly. An enterprise now runs several OLTP systems whose data definitions differ. Even different products from the same software vendor, or different versions of the same product, define their data slightly differently. We must therefore first define a unified data format, then convert the data from each source into that format, and finally load it into the data warehouse.
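
The define, convert, and load steps can be sketched roughly as follows. This is only a minimal illustration: the two source layouts, the field names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from datetime import datetime

# Two hypothetical OLTP sources that describe the same kind of sale differently.
SOURCE_A = [{"order_id": "A-1", "amount": "19.90", "date": "2004-03-01"}]
SOURCE_B = [{"id": 7, "total_cents": 2500, "sold_on": "01/03/2004"}]


def from_source_a(row):
    """Map a source-A row to the unified (order_id, amount, sold_on) format."""
    return (row["order_id"], float(row["amount"]), row["date"])


def from_source_b(row):
    """Map a source-B row: cents to currency units, DD/MM/YYYY to ISO date."""
    sold_on = datetime.strptime(row["sold_on"], "%d/%m/%Y").date().isoformat()
    return (f"B-{row['id']}", row["total_cents"] / 100, sold_on)


def run_etl(conn):
    """Extract from both sources, transform to the unified format, load."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id TEXT PRIMARY KEY, amount REAL, sold_on TEXT)"
    )
    unified = [from_source_a(r) for r in SOURCE_A]
    unified += [from_source_b(r) for r in SOURCE_B]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", unified)
    conn.commit()
    return conn.execute(
        "SELECT order_id, amount, sold_on FROM sales ORDER BY order_id"
    ).fetchall()


warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
rows = run_etl(warehouse)
```

Each source gets its own small mapping function, so adding another source or product version means adding one function rather than changing the unified table.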

Note in particular that not all data from every source can be brought into the new unified format, and we should not force all data from all sources into the warehouse. Why? There are several reasons. A small number of records may contain erroneous data; if such records cannot be corrected, they should be discarded. Some data is unstructured, such as binary data files and multimedia files: it is hard to convert into the unified format, and extracting information from it requires reading the entire file, which is very inefficient. If such data has no bearing on business decisions, it can be left out.
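
Discarding records that cannot be corrected might look like the sketch below. The validation rules (a numeric, non-negative `amount` and an ISO `date`) and the field names are assumptions chosen for illustration; a real project would derive its rules from the unified format it defined.

```python
from datetime import datetime


def split_valid_and_rejected(records):
    """Separate records that fit the unified format from those that do not.

    Rejected records are kept with the reason, so they can be reviewed
    instead of silently disappearing.
    """
    valid, rejected = [], []
    for rec in records:
        try:
            amount = float(rec["amount"])               # must be numeric
            datetime.strptime(rec["date"], "%Y-%m-%d")  # must be a real date
            if amount < 0:
                raise ValueError("negative amount")
        except (KeyError, TypeError, ValueError) as exc:
            rejected.append((rec, str(exc)))
        else:
            valid.append(rec)
    return valid, rejected


records = [
    {"amount": "10.5", "date": "2004-03-01"},  # fine
    {"amount": "abc", "date": "2004-03-01"},   # amount is not a number
    {"amount": "3", "date": "2004-13-40"},     # impossible date
    {"date": "2004-03-02"},                    # amount missing
]
valid, rejected = split_valid_and_rejected(records)
```

Only the first record reaches the load step; the other three are set aside together with the error that caused the rejection.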

Some software vendors have developed dedicated ETL tools, including:
· Ardent DataStage
· Evolutionary Technologies, Inc. (ETI) Extract
· Informatica PowerMart
· Sagent Solution
· SAS Institute
· Oracle Warehouse Builder
· Microsoft SQL Server 2000 DTS

2) Data warehouse

As mentioned above, a unified data format must be defined before ETL is performed. Does that format definition itself need to be saved so it can be used as the data warehouse evolves? Yes. As business models and rules change, the system has to be modified and upgraded, and that work cannot begin without knowing exactly what the previously defined data formats mean. We therefore need data that describes data. Early on this role was filled by the data dictionary, which generally records the definition, relationships, source, scope, format, and usage of data. Over time, however, more and more data warehouses were expected to absorb both structured and unstructured data in the newest formats easily, and the data dictionary of a traditional relational database cannot meet this goal.

With the arrival of XML, a self-describing, arbitrarily nestable and extensible, platform-independent text format, the data dictionary gained substantial technical support for its evolution, which led to the concept of XML-based metadata. Many software systems and data warehouses now use XML-based metadata, for example Microsoft's .NET and the P2P client eMule. Metadata is clearly not limited to data warehouses.

Because XML-based metadata is quite flexible, it can describe complex business content. Metadata in the data warehouse is accordingly divided into two types: technical metadata and business metadata. Technical metadata supports the enterprise's technical users and IT staff and is important for maintaining and improving the system. Business metadata supports the enterprise's business users and makes the information in statistical reports easier for them to understand.
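
To make the two kinds of metadata concrete, here is a small sketch of an XML description of one warehouse table, read with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`technical`, `business`, `source`, `transform`, and so on) are invented for this example and do not follow any particular vendor's metadata schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document: every column carries a technical part
# (for IT staff) and a business part (for report readers).
METADATA_XML = """
<table name="sales">
  <column name="amount">
    <technical type="REAL" source="orders.total_cents" transform="divide by 100"/>
    <business label="Sale amount" description="Value of one sale"/>
  </column>
  <column name="sold_on">
    <technical type="TEXT" source="orders.sold_on" transform="convert to ISO date"/>
    <business label="Sale date" description="Day on which the sale took place"/>
  </column>
</table>
"""


def load_metadata(xml_text):
    """Return {column name: {"technical": {...}, "business": {...}}}."""
    root = ET.fromstring(xml_text.strip())
    catalog = {}
    for col in root.findall("column"):
        catalog[col.get("name")] = {
            "technical": dict(col.find("technical").attrib),
            "business": dict(col.find("business").attrib),
        }
    return catalog


catalog = load_metadata(METADATA_XML)
print(catalog["amount"]["business"]["label"])
print(catalog["amount"]["technical"]["source"])
```

A reporting tool would show the business label to users, while a maintainer tracing a number back to its origin would read the technical part.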

Metadata tools fall into two categories: tools that consolidate metadata into centralized storage, and access tools for querying the metadata in the warehouse. Most vendors bundle the corresponding tools with their data warehouse and BI products. These include:
· Ardent MetaStage (Informix)
· IBM Information Catalog
· Brio Enterprise
· Business Objects
· Cognos Impromptu and PowerPlay
· Information Advantage Business Intelligence
· Microsoft OLAP Services ("Plato")
· MicroStrategy DSS Web and Server

The data warehouse is the foundation of BI, much as ingredients are to a chef. Data from each source is sent to the data warehouse after ETL preprocessing. The data warehouse has four important characteristics:

1 Subject-oriented: different kinds of companies organize their data around different sets of subjects.
2 Integrated: a data warehouse draws on a wide range of data sources, and its most important purpose is to integrate the data from these different sources.
3 Non-volatile: compared with traditional operational database systems, data warehouses are usually loaded and accessed in bulk. Records in the data warehouse are not updated or deleted in the usual sense. All historical data is retained, and normally we only import new data in batches.
4 Time-variant: operational database systems do not keep all data generated since the system went live; they generally retain only records from the last 60 to 90 days. Moreover, a business activity usually occupies only one record in the operational database, and when the business situation changes we simply update that record. In the data warehouse, by contrast, several records of the same business activity may coexist so that its evolution over time can be discovered; they differ in the contents of the affected fields and in their recorded times. Data in the data warehouse is thus a series of snapshots, each taken at a particular moment. This means that data in the warehouse is highly redundant, and necessarily so.

Moreover, because the warehouse serves different kinds of users, its design must consider the level of detail of its data units, that is, the granularity. The more detailed the data, the lower the granularity level, and vice versa. For example, a single transaction is at a low granularity level, while a monthly summary of all transactions is at a high level. Data analysts typically work with low-granularity data, while senior managers use high-granularity data. Granularity also determines the physical space the warehouse occupies: a transaction record may take only 200 bytes, but the 100,000 transaction records accumulated in a month take 20 MB. If all of that month's transactions are summarized, the resulting record may take only 500 bytes.
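
The storage figures above, and the step from detailed records to a monthly summary, can be checked with a short sketch. The byte counts are the ones quoted in the text; the sample transactions and the `summarize_by_month` helper are hypothetical.

```python
from collections import defaultdict

BYTES_PER_TRANSACTION = 200       # figure quoted in the text
TRANSACTIONS_PER_MONTH = 100_000  # figure quoted in the text
BYTES_PER_MONTHLY_SUMMARY = 500   # figure quoted in the text

# Detailed data: 200 bytes * 100,000 records = 20,000,000 bytes (20 MB).
detail_bytes = BYTES_PER_TRANSACTION * TRANSACTIONS_PER_MONTH
# How many times smaller the summarized month is than the detailed month.
reduction_factor = detail_bytes // BYTES_PER_MONTHLY_SUMMARY


def summarize_by_month(transactions):
    """Roll (ISO date, amount) pairs up to one summary entry per month."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for sold_on, amount in transactions:
        month = sold_on[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month]["count"] += 1
        totals[month]["amount"] += amount
    return dict(totals)


summary = summarize_by_month([
    ("2004-03-01", 10.0),
    ("2004-03-15", 5.0),
    ("2004-04-02", 7.5),
])
```

The summary answers "how much did we sell in March" from a single entry, but it can no longer answer questions about an individual transaction, which is the trade-off the choice of detail level has to settle.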

The usual activities in a data warehouse are bulk loading and query access; ordinary updates are not performed, and data redundancy is high. To improve query efficiency, we can use less conventional techniques such as storing the data in partitions. We also need to monitor the data in the warehouse more closely and effectively.
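
One way to picture partitioned storage is the toy store below, which keeps one partition per calendar month, accepts only bulk loads, and answers a query by reading just the partitions it needs. The class and its method names are hypothetical, and partitioning by month is an assumption; real warehouse products expose partitioning through their own storage options.

```python
from collections import defaultdict


class PartitionedFactStore:
    """Append-only store with one partition (a list of rows) per month."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def bulk_load(self, rows):
        """Append a batch of (ISO date, amount) rows; nothing is updated."""
        for sold_on, amount in rows:
            self._partitions[sold_on[:7]].append((sold_on, amount))

    def total_for_months(self, months):
        """Sum amounts for the given months.

        Returns (total, rows_scanned) so the effect of skipping the
        other partitions is visible.
        """
        total, scanned = 0.0, 0
        for month in months:
            partition = self._partitions.get(month, [])
            scanned += len(partition)
            total += sum(amount for _, amount in partition)
        return total, scanned


store = PartitionedFactStore()
store.bulk_load([
    ("2004-03-01", 10.0),
    ("2004-03-15", 5.0),
    ("2004-04-02", 7.5),
])
march_total, rows_scanned = store.total_for_months(["2004-03"])
```

The March query reads two rows instead of three; on a warehouse holding years of history, the same idea lets a query skip most of the stored data.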

Most vendors of data warehouse technology grew out of operational database systems. Their data warehouse offerings build on their own large-scale database products and bundle the corresponding ETL, metadata, OLAP, and reporting tools. Examples include IBM DB2, SAS, Sybase, Oracle, Informix, and Microsoft SQL Server.

Finally, this section turns to the data mart. If the data warehouse is built on an enterprise-level data model, then a data mart is a subset of the enterprise data warehouse: it mainly serves department-level business and addresses only one particular subject. Data marts can relieve the access bottleneck of the data warehouse to some extent. However, because data marts are independent of one another, they form new "information islands" and lead to duplicated investment. More and more data warehouse vendors have therefore begun to offer services that help enterprise users consolidate their existing data marts into a centralized data warehouse. In a real project, whether to choose a data warehouse or a data mart depends on the project's main business driver. If the company suffers from poor data management and inconsistent data and wants to lay a solid foundation for the future, a data warehouse is the better choice. If the company urgently needs to deliver information to users, it can build data marts first. Once the urgent information needs are met, it should plan the migration to a data architecture built around a standalone data warehouse.
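
The "subset of the warehouse" idea can be sketched with a database view: the department's mart is derived from the central table rather than maintained as a separate copy. The table, view, and department names are hypothetical, and an in-memory SQLite database again stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the enterprise warehouse
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(order_id TEXT, department TEXT, amount REAL, sold_on TEXT)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("A-1", "marketing", 10.0, "2004-03-01"),
        ("A-2", "finance", 99.0, "2004-03-01"),
        ("A-3", "marketing", 5.0, "2004-03-01"),
        ("A-4", "marketing", 7.5, "2004-03-02"),
    ],
)

# Hypothetical department-level mart: one subject (daily revenue) for one
# department, defined on top of the warehouse table.
conn.execute(
    """
    CREATE VIEW mart_marketing AS
    SELECT sold_on, SUM(amount) AS revenue
    FROM warehouse_sales
    WHERE department = 'marketing'
    GROUP BY sold_on
    """
)

mart_rows = conn.execute(
    "SELECT sold_on, revenue FROM mart_marketing ORDER BY sold_on"
).fetchall()
```

Because the mart is computed from the warehouse, every department sees figures drawn from the same integrated data, which is what independently built marts fail to guarantee.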

When reposting, please credit the original source: https://www.9cbs.com/read-31928.html
