How to build a bank data warehouse
Song Yumer, No. 11 Xinhua East Road, Dengzhou City, Henan Province
As a new technology in the field of data management, the data warehouse is in essence a comprehensive solution for online analytical processing (OLAP). It is primarily a concept, and a system is built under the guidance of that concept. There is no ready-made product that can be purchased and used directly, and there is no specific specification for analysis and implementation; in other words, there is not yet a mature, reliable, and widely accepted data warehouse standard. The design and implementation of traditional relational databases, by contrast, rests not only on detailed theoretical derivation but also on countless design examples: whichever vendor's database products and development tools are used, as long as designers follow the established practices, systems that meet the same business needs will turn out very similar. Existing data warehouse implementations, however, diverge into MOLAP and ROLAP approaches and depend on the data warehouse modeling tools and front-end presentation tools chosen, while the designer's personal experience and skill also play a very important role.
Data warehouse technology implementation
At present, data warehouse technology is mainly implemented in the following ways.
1. Create a data warehouse on a relational database (ROLAP)
2. Create a data warehouse on a multi-dimensional database (MOLAP)
The MOLAP approach organizes data multidimensionally. The ROLAP approach uses two-dimensional relational tables as the core for expressing multidimensional concepts: it divides the data structure into two types of tables, dimension tables and fact tables, so that the relational structure is better suited to representing and storing multidimensional data. For expressing a multidimensional data model, a multidimensional array is clearer and takes less space than relational tables, whereas ROLAP answers queries by joining relational tables, so system performance is its biggest problem. The MOLAP approach is concise, and indexes and data aggregation are managed automatically, but at the cost of some flexibility. The ROLAP approach is more complex to implement but more flexible: users can define statistics and calculation methods dynamically, and existing investment in relational databases is protected.
Since both approaches have advantages, MOLAP and ROLAP are often combined in practice, in what is called a hybrid model. Historical data, detail data, and non-numeric data are stored in the relational database to exploit the strengths of relational technology and reduce cost, while current data and frequently used statistics are stored in the multidimensional database to improve performance.
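A minimal sketch of the ROLAP side, assuming an in-memory SQLite database and hypothetical table names (`dim_branch`, `dim_date`, `fact_deposit`): the dimension tables hold the angles of analysis, the fact table holds the measure, and a multidimensional question is answered by joining them.

```python
import sqlite3

# Hypothetical star schema; a real bank model would have many more dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_branch (branch_id INTEGER PRIMARY KEY, branch_name TEXT, region TEXT);
CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE fact_deposit (
    branch_id INTEGER REFERENCES dim_branch(branch_id),
    date_id   INTEGER REFERENCES dim_date(date_id),
    balance   REAL              -- month-end deposit balance (the measure)
);
""")
conn.executemany("INSERT INTO dim_branch VALUES (?, ?, ?)",
                 [(1, "Branch A", "East"), (2, "Branch B", "West")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2024-01-31", "2024-01"), (2, "2024-02-29", "2024-02")])
conn.executemany("INSERT INTO fact_deposit VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 50.0), (1, 2, 120.0), (2, 2, 70.0)])

# "Deposit balance by region and month": join the fact table to its dimensions.
rows = conn.execute("""
SELECT b.region, d.month, SUM(f.balance)
FROM fact_deposit f
JOIN dim_branch b ON f.branch_id = b.branch_id
JOIN dim_date   d ON f.date_id   = d.date_id
GROUP BY b.region, d.month
ORDER BY b.region, d.month
""").fetchall()
for row in rows:
    print(row)
```

The join at query time is exactly where ROLAP pays its performance cost; a MOLAP engine would instead hold these totals in a precomputed array indexed by region and month.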
3. Create a logical data warehouse on the existing database
Because existing operational OLTP systems have accumulated massive amounts of data, extracting the useful information needed for decision-making has become users' most urgent need. A newly built data warehouse can provide a complete solution in terms of both functionality and performance, but it requires a large investment of manpower and resources, and both building the warehouse and accumulating data take time, so it cannot quickly meet users' urgent need for information analysis. Therefore, in the early stage of data warehouse construction, suitable front-end presentation tools can be used to build a logical data warehouse on top of the existing OLTP system. Because of the limitations of the original OLTP design, such a system may not support many analytical functions; however, its data structures are fixed and its information analysis requirements are relatively stable, so modeling the data warehouse is relatively easy and convenient. Such a system can also serve as the prototype for building the real data warehouse.
Relationship between information systems and data warehouses
Because of the large volume and diverse sources of data, a commercial bank building a management information system inevitably faces the problems of how to manage this vast amount of data and how to extract useful information from it. The greatest advantage of a data warehouse is that it can take business data scattered across different information islands in the enterprise network, store it in a single integrated database, and provide a variety of means for computing statistics on and analyzing that data. It can therefore be said that a bank has both a pressing need and the data foundation to build its management information system on a data warehouse; the two are inevitably linked and hard to separate. In commercial banks, data warehouse applications include deposit analysis, loan analysis, customer and market analysis, analysis and decision support for related financial businesses (securities, foreign exchange trading), risk forecasting, and profitability analysis.
When building a bank information system, there are two approaches, depending on historical circumstances and current requirements:
1. Building a new system
Because banks currently lack a good data collection mechanism for supervising internal operations, building a management information system involves two parts: data collection and entry, and data aggregation and analysis. Such a system does not need to process large volumes of historical data, and since collection may involve multiple data sources, the data warehouse can be built at the same time as the system, with the collected data integrated and loaded into the data warehouse.
2. Improving the existing system
Existing OLTP systems have accumulated large volumes of historical data. A logical data warehouse can be built on top of such a system, that is, a virtual multidimensional model over the relational model, with front-end presentation tools used for data analysis. Once the system has stabilized, a physical data warehouse can be built. This saves investment and shortens the development cycle.
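The idea of a virtual multidimensional model over existing relational tables can be sketched with a database view. This assumes SQLite and hypothetical OLTP tables `account` and `txn`; the view `v_fact_txn` (also hypothetical) exposes them as a fact table with branch and month dimensions without copying any data.

```python
import sqlite3

# Hypothetical OLTP tables as they might exist in a core banking system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_no TEXT PRIMARY KEY, branch_code TEXT);
CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, account_no TEXT,
                  txn_date TEXT, amount REAL);

-- The "logical warehouse": a view presenting OLTP rows as a fact table
-- with dimension columns (branch, month); no data is copied.
CREATE VIEW v_fact_txn AS
SELECT a.branch_code            AS branch,
       substr(t.txn_date, 1, 7) AS month,
       t.amount                 AS amount
FROM txn t JOIN account a ON t.account_no = a.account_no;
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A1", "B01"), ("A2", "B02")])
conn.executemany(
    "INSERT INTO txn (account_no, txn_date, amount) VALUES (?, ?, ?)",
    [("A1", "2024-01-05", 200.0), ("A1", "2024-01-20", -50.0),
     ("A2", "2024-01-11", 80.0), ("A2", "2024-02-02", 30.0)],
)

# A front-end tool would issue queries like this against the view.
summary = conn.execute(
    "SELECT branch, month, SUM(amount) FROM v_fact_txn "
    "GROUP BY branch, month ORDER BY branch, month"
).fetchall()
print(summary)
```

Because the view is just a stored query, later replacing it with a physical fact table of the same shape leaves the analysis queries unchanged.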
Issues requiring attention during implementation
I. Problems in model design
Model design (including logical and physical model design) is the foundation of the system and the key to its success or failure. In practice, the following issues need attention, depending on the approach used.
1. Building a data warehouse directly
When building a data warehouse directly, the data in the OLTP system must be recombined according to the requirements of business analysis and organized by subject, each subject covering a different aspect of the business, so that it is easy to use.
* Determining subjects
A subject is a logical concept that should completely and consistently describe the data involved in an analysis object and the relationships among them. Subjects are identified mainly from two sources: analysis of existing fixed reports and interviews with business staff. Existing fixed reports reflect past data analysis needs well, and their data definitions and formats are relatively mature and stable, so they should be referenced extensively in model design. However, merely replacing today's manually prepared reports should not be the goal of building a management information system; business interviews are also needed to uncover latent, deeper analysis requirements in daily work. Only in this way can the subject division required for the data warehouse model truly be understood.
* Refining the content
Subjects are directly related to the scope of the analysis. Once the subjects are divided, the next step is to refine the specific content of the analysis and determine where each item belongs in the data warehouse according to its nature. Dimension members usually correspond to the angles of analysis, and measures correspond to the specific indicators the analysis is concerned with. Whether an indicator becomes a dimension member, a dimension attribute, or a measure depends on the specific business needs, but practice suggests the following rule of thumb: dimension members and dimension attributes are usually discrete and take only a finite set of values, whereas measures are continuous data with unlimited possible values. If continuous data must be used as a dimension, its values must be divided into ranges, and the ranges used as the actual dimension members. Deciding whether an analysis indicator should be a dimension member or a dimension attribute requires weighing the storage it occupies against the frequency of related queries.
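Dividing a continuous indicator into ranges can be done with a sorted list of cut points. The band edges and labels below are hypothetical; in practice they would be agreed with business analysts. Each band label then serves as a dimension member.

```python
import bisect
from collections import Counter

# Hypothetical loan-amount bands in yuan; each band is lower-inclusive.
BAND_EDGES = [100_000, 1_000_000, 10_000_000]
BAND_LABELS = ["< 100K", "100K to < 1M", "1M to < 10M", ">= 10M"]

def amount_band(amount: float) -> str:
    """Map a continuous loan amount to one of a finite set of dimension members."""
    # bisect_right counts the edges <= amount, which is exactly the band index.
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, amount)]

loans = [50_000, 250_000, 999_999.99, 1_000_000, 25_000_000]
counts = Counter(amount_band(x) for x in loans)
print(counts)
```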
It must be emphasized that ambiguity in indicators must be resolved while refining the analysis content. When indicators with the same name appear in different reports and in business interviews, analysts must obtain accurate and clear answers from those familiar with the business: are they extracted or calculated under the same conditions and by the same method, and how do they relate to one another? Otherwise model design, data extraction, and data presentation will all be affected.
* Designing granularity
The granularity of the data stored in the data warehouse model affects many aspects of the information system. The lowest level of each dimension, taken as the finest granularity, determines whether the stored data can meet the functional requirements of information analysis, while the way granularity levels are divided and the granularity chosen for aggregate tables directly affect query response time.
If the same information system operates over a wide scope and at multiple levels, such as the department level and the enterprise level, data warehouses at different levels should be considered.
* Techniques in model design
When defining composite indicators, especially ratios, pay attention to whether the components should be summed first and then divided, or divided first and then summed. Counts such as the number of customers and the number of transactions often appear in analyses and reports; they do not need to be stored physically as separate indicators in the database, but they must be provided for when the analysis model is defined. Indicators also behave differently along the time dimension, and by this characteristic they can be divided into additive, semi-additive, and non-additive indicators.
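The following sketch, using hypothetical figures, shows why the order of operations matters for a ratio such as the non-performing loan (NPL) ratio, and how customer and transaction counts can be derived at query time instead of being stored.

```python
# Hypothetical transaction records: (customer_id, amount).
transactions = [("C1", 500.0), ("C2", 200.0), ("C1", 300.0)]

# Counts are derived when the model is queried rather than stored as columns.
transaction_count = len(transactions)                       # number of transactions
customer_count = len({cust for cust, _ in transactions})    # distinct customers

# Hypothetical month-end figures per branch, in yuan.
branches = {
    "B01": {"loans": 9_000_000.0, "npl": 90_000.0},   # branch NPL ratio 1%
    "B02": {"loans": 1_000_000.0, "npl": 50_000.0},   # branch NPL ratio 5%
}

def npl_ratio(rows):
    """Ratio for a group of rows: sum numerator and denominator, then divide."""
    rows = list(rows)
    return sum(r["npl"] for r in rows) / sum(r["loans"] for r in rows)

bank_ratio = npl_ratio(branches.values())   # 140,000 / 10,000,000 = 1.4%

# Dividing first and then averaging weights a small branch like a large one,
# giving 3% here and misstating the bank-wide ratio.
naive_average = sum(r["npl"] / r["loans"] for r in branches.values()) / len(branches)
print(transaction_count, customer_count, bank_ratio, naive_average)
```

A ratio is therefore non-additive: it is always recomputed from its additive components at whatever level the user aggregates to.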
2. Building a logical data warehouse on existing data
Performing data analysis directly on an OLTP system runs into many difficulties and is sometimes impossible. This is not because relational databases are poor, but because the OLTP design philosophy is not suited to large-scale data analysis. When using this approach, therefore, the following issues need attention:
* Different time units
This is the most frequently encountered problem during implementation and often the hardest. Time in an OLTP system tends to be stored in the same unit as the actual business: account data, for example, is recorded by day, while financial statements are monthly or semiannual. Analysis often requires data in different time units to be brought to a common unit, so an appropriate conversion mechanism must be implemented.
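One possible conversion mechanism, sketched under the assumption that the source supplies one closing balance per account per day. A balance is semi-additive over time, so a monthly figure is taken as the month-end value or the average daily balance, never the sum of daily values. The sample data is hypothetical and lists only a few days for brevity, so its averages cover those days only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily closing balances for one account (assumed complete in
# real use; abbreviated here).
daily = {
    date(2024, 1, 30): 100.0,
    date(2024, 1, 31): 300.0,
    date(2024, 2, 28): 300.0,
    date(2024, 2, 29): 500.0,
}

def to_monthly(daily_balances):
    """Convert daily balances into month-end and average daily balances."""
    by_month = defaultdict(list)
    for day in sorted(daily_balances):                # chronological order
        by_month[(day.year, day.month)].append(daily_balances[day])
    return {
        month: {
            "month_end": values[-1],                  # last day's balance
            "average_daily": sum(values) / len(values),
        }
        for month, values in by_month.items()
    }

monthly = to_monthly(daily)
print(monthly)
```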
* Redundant information
Redundant information refers to fields with the same meaning in different relational tables, where "the same meaning" covers not only identical definitions but also identical conditions of validity, such as the balance of the same loan at a given point in time. In an OLTP system such fields are often introduced for performance reasons, but to ensure the uniqueness and accuracy of the results, only one of them should be used to generate analysis results.
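A minimal sketch of using only one of several redundant fields, assuming hypothetical tables `loan_ledger` and `customer_summary` that both carry a loan balance for the same date. The model designates one source as authoritative and only reports where the other copy disagrees, so the discrepancy can be investigated rather than silently mixed into results.

```python
# Hypothetical mapping: analysis indicator -> the one source allowed to feed it.
AUTHORITATIVE_SOURCE = {"loan_balance": "loan_ledger"}

# Hypothetical redundant copies of the same loan balance (same as-of date).
loan_ledger = {"L001": 1_000.0, "L002": 2_500.0}
customer_summary = {"L001": 1_000.0, "L002": 2_400.0}   # e.g. a stale copy
sources = {"loan_ledger": loan_ledger, "customer_summary": customer_summary}

def extract_indicator(name):
    """Read an indicator from its designated source only, and list the loans
    where another copy of the same field disagrees."""
    primary_name = AUTHORITATIVE_SOURCE[name]
    primary = sources[primary_name]
    mismatches = [
        (loan, other_name)
        for other_name, other in sources.items() if other_name != primary_name
        for loan, value in other.items() if primary.get(loan) != value
    ]
    return dict(primary), mismatches

values, mismatches = extract_indicator("loan_balance")
print(values, mismatches)
```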
* Joins between tables
Because an OLTP system is designed to guarantee integrity and consistency while also meeting response-time requirements, its tables are both relatively independent and interdependent. When designing the logical model of the data warehouse, the joins between tables must be handled carefully: they must ensure that the analysis data can be retrieved or calculated, while avoiding loops that would make the analysis data ambiguous. In addition, different join paths have different query speeds, which affects the response performance of data analysis.
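Loops in the join graph can be detected mechanically. This sketch treats each join the logical model exposes as an undirected edge between hypothetical tables and uses union-find: an edge whose two tables are already connected closes a loop, meaning there are two join paths between them. In the sample, a loan can reach a branch either directly or through its customer's home branch, which may be a different branch.

```python
def find_loop_edges(edges):
    """Return the join edges that close a loop in an undirected join graph."""
    parent = {}

    def find(table):
        parent.setdefault(table, table)
        while parent[table] != table:
            parent[table] = parent[parent[table]]   # path halving
            table = parent[table]
        return table

    loops = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            loops.append((a, b))        # a and b already connected: second path
        else:
            parent[root_a] = root_b     # merge the two connected components
    return loops

# Hypothetical joins exposed by the logical model.
joins = [("loan", "customer"), ("loan", "branch"),
         ("customer", "branch"), ("loan", "product")]
print(find_loop_edges(joins))
```

Each reported edge is a candidate for removal, or for aliasing one table (for example, a separate "customer home branch" role) so that every analysis path is unambiguous.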
* Designing statistics tables
If these problems cannot be resolved well on the existing database, an expedient solution is to build statistics tables, that is, a simplified data warehouse: tables in a form similar to a data warehouse into which statistics are calculated and loaded on a schedule. Once the time-unit, redundancy, and join problems have been resolved in these tables, analysis becomes simple.
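A statistics table can be as simple as a pre-aggregated table rebuilt by a scheduled job. This sketch assumes SQLite and hypothetical tables; the refresh resolves the time unit (it takes the month-end balance), reads balances from a single source, and performs the join once, so analysis queries read one flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_no TEXT PRIMARY KEY, branch_code TEXT);
CREATE TABLE daily_balance (account_no TEXT, bal_date TEXT, balance REAL);
-- Hypothetical statistics table at (branch, month) granularity.
CREATE TABLE stat_branch_month (
    branch_code TEXT, month TEXT, total_balance REAL,
    PRIMARY KEY (branch_code, month)
);
""")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A1", "B01"), ("A2", "B01"), ("A3", "B02")])
conn.executemany("INSERT INTO daily_balance VALUES (?, ?, ?)",
                 [("A1", "2024-01-30", 90.0), ("A1", "2024-01-31", 100.0),
                  ("A2", "2024-01-31", 200.0), ("A3", "2024-01-31", 50.0)])

def refresh_statistics(conn, month, month_end):
    """Recompute one month's rows; safe to rerun for the same month."""
    with conn:  # single transaction: old rows are replaced atomically
        conn.execute("DELETE FROM stat_branch_month WHERE month = ?", (month,))
        conn.execute(
            """
            INSERT INTO stat_branch_month (branch_code, month, total_balance)
            SELECT a.branch_code, ?, SUM(d.balance)
            FROM daily_balance d JOIN account a ON d.account_no = a.account_no
            WHERE d.bal_date = ?
            GROUP BY a.branch_code
            """,
            (month, month_end),
        )

refresh_statistics(conn, "2024-01", "2024-01-31")
stats = conn.execute("SELECT * FROM stat_branch_month ORDER BY branch_code").fetchall()
print(stats)
```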
II. Problems in data extraction
Data extraction has little technical content but is very tedious work, and a specific person must be made responsible for it. When designing the extraction, pay attention to the following:
1. The data extraction rules should be recorded in detail as metadata: source fields, target tables, target fields, conversion rules, and the conditions applied during conversion. This not only serves programmers but also makes changes easier when the extraction rules or the logical model change.
2. Recording changes in the business database is an important part of data extraction. Because the data warehouse stores data over time, the differences in data between points in time become a key factor. It is usually possible to use the facilities of the database management system to generate data change logs at the database level; weighing performance, operability, and impact on the original business system, this is the more desirable method.
3. When data in a single data warehouse table comes from different tables in the original system, extraction must ensure that the data units are consistent and that all of the data satisfies the same time condition.
4. Data extraction involves not only extracting the data but also its schedule and execution method. Only a complete data extraction plan can guarantee that the extracted data is accurate and usable.
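Points 1 and 3 can be combined in a small metadata-driven extraction sketch. All table names, field names, and conversion rules here are hypothetical. Each rule records a source field, target table, target field, conversion rule, and condition; the extraction refuses to merge sources whose as-of dates differ and converts every amount to yuan. Change capture (point 2) is left to the database management system's own logging and is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical named conversion rules; the metadata stores only the name.
TRANSFORMS = {
    "fen_to_yuan": lambda v: v / 100,          # amounts held in fen (0.01 yuan)
    "wan_yuan_to_yuan": lambda v: v * 10_000,  # amounts held in 10,000-yuan units
}

@dataclass(frozen=True)
class ExtractionRule:
    """One metadata record describing how a warehouse field is populated."""
    source_table: str
    source_field: str
    target_table: str
    target_field: str
    transform: str     # name of a conversion rule in TRANSFORMS
    condition: tuple   # (field, required value) a source row must satisfy

RULES = [
    ExtractionRule("t_deposit_sum", "bal_minor", "fact_branch", "deposit_balance",
                   "fen_to_yuan", ("currency", "CNY")),
    ExtractionRule("t_loan_sum", "bal_wan", "fact_branch", "loan_balance",
                   "wan_yuan_to_yuan", ("currency", "CNY")),
]

def extract(rules, sources, as_of):
    """Apply rules to source extracts; sources maps table -> (as_of, rows).
    This sketch assumes every rule feeds one target table keyed by branch."""
    target = defaultdict(dict)
    for rule in rules:
        source_date, rows = sources[rule.source_table]
        if source_date != as_of:   # point 3: same time condition for all sources
            raise ValueError(f"{rule.source_table} is as of {source_date}, expected {as_of}")
        field, required = rule.condition
        for row in rows:
            if row.get(field) == required:
                value = TRANSFORMS[rule.transform](row[rule.source_field])
                target[row["branch"]][rule.target_field] = value
    return dict(target)

sources = {
    "t_deposit_sum": (date(2024, 1, 31), [
        {"branch": "B01", "currency": "CNY", "bal_minor": 500_000_000},
        {"branch": "B01", "currency": "USD", "bal_minor": 1_000},   # filtered out
    ]),
    "t_loan_sum": (date(2024, 1, 31), [
        {"branch": "B01", "currency": "CNY", "bal_wan": 320.0},
    ]),
}
result = extract(RULES, sources, date(2024, 1, 31))
print(result)
```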
III. Post-deployment maintenance and optimization
Building a data warehouse is a long-term effort that requires continuous adjustment and improvement as it runs alongside other systems. This covers two kinds of work:
1. Performance
A data warehouse involves querying and writing massive amounts of data, which places high demands on the database system, and those demands differ greatly from those of an OLTP system. Performance therefore cannot be ignored in the design, implementation, and maintenance of a data warehouse system. During operation in particular, closely monitor how applications use system resources and tune the system promptly, including adjusting database parameters, partitioning data, creating special-purpose indexes, and even upgrading the system configuration.
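The effect of a special-purpose index can be checked from the query plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` on a hypothetical `fact_txn` table with generated data; since the exact wording of the plan text varies between SQLite versions, it only checks whether the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_txn (branch_code TEXT, txn_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_txn VALUES (?, ?, ?)",
    [(f"B{i % 50:02d}", f"2024-01-{i % 28 + 1:02d}", float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM fact_txn WHERE branch_code = ? AND txn_date >= ?"
PARAMS = ("B07", "2024-01-15")

def query_plan(conn):
    """Return the plan text; the last column of each row describes one step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

before = query_plan(conn)                       # a full table scan
total_before = conn.execute(QUERY, PARAMS).fetchone()[0]

# Index matching the common filter: branch first (equality), then date (range).
conn.execute("CREATE INDEX idx_txn_branch_date ON fact_txn (branch_code, txn_date)")
after = query_plan(conn)                        # should now search via the index
total_after = conn.execute(QUERY, PARAMS).fetchone()[0]
print(before)
print(after)
```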
2. Model
Applications and requirements drive each other and develop continuously: as the information system is completed and users come to understand it, they place higher requirements on it. How to meet users' needs with minimal investment is also a question that deserves continuing attention. First, make full use of the potential of the existing system; second, consider adding a small number of indicators to the existing system and adjusting it appropriately; only as a last resort consider rebuilding the system. This helps reduce investment in system construction.
Deepening data warehouse applications
Applications implemented as described above mainly produce reports and support analysis of day-to-day business; they do not bring real benefits to the enterprise and fall far short of realizing the value of a data warehouse. As applications deepen, the enterprise's technical and business staff can work closely together to plan application models of real value to the enterprise and continuously adjust those models as the business develops, in order to discover the patterns in the enterprise's operations, that is, to perform data mining on the data warehouse and build decision support systems (DSS). Only then is the significance of building a data warehouse fully realized, so that it ultimately benefits the enterprise.
Although data warehouse technology still needs to mature and improve, as long as enterprises recognize the importance of information analysis and business and technical staff truly cooperate, I believe more practical results will be achieved in the near future.