From Call Center to CRM: The Data Warehouse
Duan Yunfeng, Yang Fengnian, Song Junde 2002/04/19
Once a call center service system has collected a large amount of data, that data must be put to use to give the CRM system a sound basis for decisions. The data warehouse is an indispensable element of this process. A data warehouse is a comprehensive technology and solution built around data management and utilization; it is set to become a new growth point in the database market and an important part of next-generation applications.
This article is divided into three major parts. It introduces the concepts of the data warehouse and the data mart; explains in some detail how to obtain high-quality information, how to design and implement a data warehouse, the three kinds of tools in the tool layer of a data warehouse system, and the evaluation indicators for data warehouse platforms; and analyzes the direction in which the data warehouse is developing.
What is a data warehouse?
1. Data Warehouse Concept Analysis
Faced with intense competition and fast-changing markets, and with information needs that differ at every level of the organization, how can a company put its information to use so that people at all levels quickly make correct decisions in business operations and management?
The data warehouse is a technical solution to this problem: it is the core of a decision support system environment built on large-scale databases. W. H. Inmon, the father of the data warehouse, defines it as follows: a data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decisions.
We often think of the data warehouse as a product, or a group of products, that helps us get answers to questions or improves our decision-making. In fact, the data warehouse is not that simple. It does help us get answers and make better decisions, but that is only one part of the overall process. Where does the data in the data warehouse come from? How does data enter the data warehouse? How is the data warehouse maintained? How is the data in it organized? These questions must be answered before a data warehouse is built. Establishing a data warehouse covers every activity involved in creating, managing, and maintaining it. The data warehouse is therefore not a product but a solution.
Data warehouses and databases are different concepts. A data warehouse is a comprehensive solution, whereas a database is just a ready-made product. A data warehouse needs a very powerful database engine to drive it. Unlike the relational database, the data warehouse has no strict mathematical foundation; it is more of an engineering discipline. Because of this engineering character, data warehouse technology can be divided, following the working process, into data extraction, data storage and management, data presentation, and technical consulting on data warehouse design.
2. The difference between a data warehouse and a data mart
Any discussion of the data warehouse inevitably turns to the data mart. Misled by vendors, many people confuse the two concepts. "Data mart" is also a very popular term, and a common misunderstanding is that it differs from the data warehouse only in the amount of data. In fact, a data warehouse is enterprise-level: it provides decision support for the operations of every department in the entire enterprise. A data mart is a miniature data warehouse that usually has less data, fewer subject areas, and less historical data. It is department-level and generally serves only people within a limited scope, which is why it is also known as a departmental data warehouse.
There are two types of data mart: independent data marts and dependent data marts. A dependent data mart takes its data directly from the central data warehouse, so this structure still preserves the consistency of the data. Dependent data marts are generally set up for key business units that access the data warehouse very frequently, because they improve query response times. An independent data mart takes its data directly from the various production systems. Many companies, weighing the investment when they plan a data warehouse, end up building an independent data mart of this kind to solve decision-making problems in individual departments. In this sense, apart from differences in data volume and in the users it serves, its logical structure is not much different from that of a data warehouse, which is the main reason the data mart is called a departmental data warehouse.
How to build a data warehouse?
1. Obtaining high-quality information
The data warehouse is planned and built as the data source for decision support system (DSS) and online analytical processing applications, which obtain their information from its database. Poor data quality is one of the most difficult problems to resolve when building a data warehouse, and there are several ways to improve the quality of the information in it. When a data quality problem is found in a source system, methods for improving the data must be worked out. One method is to improve data quality in the source system itself; another is to correct the data on its way from the legacy system into the data warehouse.
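The second method, correcting data on its way into the warehouse, can be illustrated with a minimal sketch. The record layout (`phone`, `called_at`, `duration_sec`) and the cleaning rules are assumptions made for illustration only; they do not come from any particular call center system.

```python
from datetime import datetime


def clean_record(raw):
    """Return a corrected copy of one raw record, or None if it cannot be repaired."""
    # Keep only the digits of the phone number; it acts as the customer key in this sketch.
    phone = "".join(ch for ch in str(raw.get("phone") or "") if ch.isdigit())
    if not phone:
        return None
    # The source system is assumed to write dates as YYYY/MM/DD.
    try:
        called_on = datetime.strptime(str(raw.get("called_at")).strip(), "%Y/%m/%d").date()
    except ValueError:
        return None
    # Durations may arrive as text; reject values that are missing, non-numeric, or negative.
    try:
        duration = int(raw.get("duration_sec"))
    except (TypeError, ValueError):
        return None
    if duration < 0:
        return None
    return {"phone": phone, "called_on": called_on.isoformat(), "duration_sec": duration}


def clean_batch(rows):
    """Split a batch into cleaned records and rejected originals."""
    cleaned, rejected = [], []
    for raw in rows:
        fixed = clean_record(raw)
        if fixed is None:
            rejected.append(raw)
        else:
            cleaned.append(fixed)
    return cleaned, rejected
```

Keeping the rejected originals, rather than silently dropping them, makes it possible to report the problems back to the source system, which is the first method described above.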
2. Data warehouse design and implementation
(1) Design and implementation process
Define the architecture of the warehouse, do capacity planning, and select the storage servers, database and OLAP servers, and other tools.
Integrate the servers, storage, and client tools.
Design the warehouse schema and views.
Define the physical organization of the warehouse, and determine data placement, partitioning, and access methods.
Connect the data sources using data gateways, ODBC drivers, or other wrappers.
Design and implement scripts for data extraction, cleaning, transformation, loading, and refresh.
Load the schema and view definitions, scripts, and other metadata into the warehouse repository.
Design and implement end-user applications.
Roll out the data warehouse and the applications built on it.
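The extraction, cleaning, transformation, and loading step in the process above can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module with in-memory databases; the table names `calls` and `fact_calls` and their columns are hypothetical.

```python
import sqlite3


def extract(source):
    """Read raw call rows from the (hypothetical) operational table."""
    return source.execute("SELECT agent, called_on, seconds FROM calls").fetchall()


def transform(rows):
    """Clean and convert: drop rows without an agent, normalize names, seconds to minutes."""
    return [
        (agent.strip().upper(), called_on, round(seconds / 60.0, 2))
        for agent, called_on, seconds in rows
        if agent and agent.strip()
    ]


def load(warehouse, rows):
    """Append the transformed rows to the warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_calls (agent TEXT, called_on TEXT, minutes REAL)"
    )
    warehouse.executemany("INSERT INTO fact_calls VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl(source, warehouse):
    """Run one extract-transform-load pass and return the number of rows loaded."""
    rows = transform(extract(source))
    load(warehouse, rows)
    return len(rows)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE calls (agent TEXT, called_on TEXT, seconds INTEGER)")
    src.execute("INSERT INTO calls VALUES (' li ', '2002-04-19', 90)")
    dwh = sqlite3.connect(":memory:")
    print(run_etl(src, dwh), "row(s) loaded")
```

In a real project each stage would be scheduled and monitored separately; the three small functions here only show how the stages hand data to one another.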
(2) Points requiring attention
The model design of the data warehouse (both logical and physical) is the foundation of the system and the key to its success or failure. The following issues deserve attention:
Determining the subjects: A subject is a logical concept that should fully describe the data involved in the objects of analysis and the relationships among them. Subjects are identified mainly from two sources: analysis of the existing fixed reports and interviews with business staff. The existing fixed reports reflect past demand for data analysis well, and their data definitions and formats are relatively mature and stable, so the model design should draw on them extensively.
Refining the analysis content: The subjects bear directly on the scope of the analysis. Once the subjects have been divided, the next step is to refine the specific content to be analyzed and to determine where each item belongs in the data warehouse according to its nature. Typically, dimension elements correspond to the angles of analysis, and measures correspond to the specific indicators being analyzed. Whether an indicator is a dimension element, a measure, or a dimension attribute depends on the specific business need, but practical experience suggests a rule of thumb: dimension elements and dimension attributes are usually discrete data that take only a finite set of values, whereas measures are continuous data whose values are unbounded.
Granularity design: The granularity of the data stored in the data warehouse model affects the information system in many ways. The lowest level of each dimension serves as the finest granularity and determines whether the stored data can meet the functional requirements of information analysis, while the division of granularity levels and the choice of granularity in the aggregate tables directly affect query response time.
In the process of data extraction, the following points deserve attention:
Extraction rules should be specified and managed as metadata: the source tables, source fields, destination tables, destination fields, conversion rules, and conversion conditions used during extraction are all recorded in detail. This not only serves programmers but also makes modification easier when the extraction rules or the logical model change.
How changes in the business database are recorded is an important part of data extraction. Because the data warehouse stores data by time, the differences in data between points in time are a key factor. It is usually possible to use facilities provided by the database management system to generate a data change log at the database level and then to carry out extraction according to that log. Weighing performance, operability, and the effect on the original business system, this is the more satisfactory method.
When the data in one table of the data warehouse comes from different tables in the original system, or even from different databases, the units of those data must be consistent and the data must satisfy the same time condition.
Data extraction involves not only the extraction of the data itself but also the scheduling and execution of the extraction. Only a complete data extraction plan of this kind can guarantee that the data is accurate and available.
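Two of the points above, extraction rules kept as metadata and extraction driven by a change log, can be sketched together. Everything named here (the `ExtractionRule` fields, the `calls` and `fact_calls` tables, the shape of a log entry with `seq`, `table`, and `row`) is a hypothetical illustration, not the format of any real database's change log.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExtractionRule:
    """One metadata record: where a value comes from, where it goes, how it is converted."""
    source_table: str
    source_field: str
    target_table: str
    target_field: str
    convert: Callable[[object], object]


# The rules live in data, not in code paths, so they can be changed without rewriting the extractor.
RULES = [
    ExtractionRule("calls", "agent", "fact_calls", "agent_code", lambda v: str(v).strip().upper()),
    ExtractionRule("calls", "seconds", "fact_calls", "minutes", lambda v: round(int(v) / 60, 2)),
]


def apply_rules(table, row, rules=RULES):
    """Build one target row by applying every rule registered for the source table."""
    return {r.target_field: r.convert(row[r.source_field]) for r in rules if r.source_table == table}


def extract_changes(change_log, last_seq, rules=RULES):
    """Process only log entries newer than last_seq; return (target rows, new last_seq)."""
    out = []
    for entry in sorted(change_log, key=lambda e: e["seq"]):
        if entry["seq"] <= last_seq:
            continue  # already extracted in an earlier run
        out.append(apply_rules(entry["table"], entry["row"], rules))
        last_seq = entry["seq"]
    return out, last_seq
```

Persisting the returned sequence number between runs is what lets each scheduled extraction pick up exactly where the previous one stopped.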
3. The three kinds of tools in the data warehouse tool layer
OLAP query and analysis tools, DSS analysis and forecasting tools, and data mining tools together make up the tool layer of the data warehouse system. Each has a different focus, a different range of application, and a different target user. Only with these three kinds of tools can the wealth of valuable information in a data warehouse system truly be put to use.
(1) Online analytical processing (OLAP)
Online analytical processing mainly analyzes, queries, and reports on data in a multidimensional manner. It differs from traditional online transaction processing (OLTP) applications. OLTP applications handle users' transactions, as in civil aviation booking systems and bank savings systems; they usually perform a large number of update operations, and their response-time requirements are relatively strict. OLAP applications analyze the user's current and historical data to assist management decisions. Typical applications include analyzing and predicting bank credit card risk and formulating a company's marketing strategy; they mainly perform large numbers of queries, and their time requirements are less strict.
At present the common forms of OLAP are MOLAP, based on a multidimensional database, and ROLAP, based on a relational database. In data warehouse applications, OLAP tools generally serve as the front end, and they can also be combined with data mining tools and statistical analysis tools to strengthen decision analysis.
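A ROLAP-style query can be sketched with Python's built-in `sqlite3` module: discrete dimensions (region, month) supply the angles of analysis, and a continuous measure (call minutes) is summed along whichever dimension is chosen. The schema and the sample figures are invented for illustration.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory star schema: one dimension table and one fact table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE fact_calls (region_id INTEGER, month TEXT, minutes REAL);
        INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
        INSERT INTO fact_calls VALUES
            (1, '2002-03', 10.0), (1, '2002-04', 20.0),
            (2, '2002-03', 5.0),  (2, '2002-04', 15.0);
    """)
    return db


def minutes_by(db, dimension):
    """Roll the measure up along one dimension ('region' or 'month')."""
    # Only whitelisted column expressions reach the SQL text.
    columns = {"region": "d.region", "month": "f.month"}
    col = columns[dimension]
    sql = (
        f"SELECT {col}, SUM(f.minutes) "
        "FROM fact_calls f JOIN dim_region d ON d.region_id = f.region_id "
        f"GROUP BY {col} ORDER BY {col}"
    )
    return db.execute(sql).fetchall()
```

The same fact rows answer both questions; only the grouping dimension changes, which is the essence of multidimensional analysis. A MOLAP product would precompute such aggregates in a multidimensional store instead of issuing SQL at query time.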
(2) Decision Support System (DSS)
The decision support system (DSS) and the data warehouse have the same target users, the middle and senior management of the business, and both work by supporting decision-making and trend analysis. Some DSS technologies can be integrated well into the data warehouse and make its analytical capabilities more powerful. For example, the traditional statistical analysis models in a DSS help users analyze the data in the data warehouse more effectively and in greater depth, and so grasp and use the information better. Some intelligent decision technologies, such as artificial neural networks, have proved powerful at discovering customer behavior patterns and predicting the behavior of financial markets. Applying these core DSS technologies in the data warehouse will not only improve its decision support capabilities but also broaden the range of DSS applications.
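As a small illustration of the kind of traditional statistical model mentioned above, the following sketch fits a straight-line trend to a series of period totals by ordinary least squares and extrapolates it. It is a deliberately simple stand-in chosen for this example, not the method of any particular DSS product.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods_ahead=1):
    """Extrapolate the fitted line periods_ahead steps past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)
```

Fed with monthly totals drawn from the warehouse, `forecast` gives a first rough estimate for the next month; real trend analysis would also account for seasonality and uncertainty.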
(3) Data mining
Data mining is a popular technology in the industry today and has produced large benefits in many applications. Data mining does not have to be based on a data warehouse, but combining the two simplifies some steps of the data mining process and greatly improves its efficiency. Because the data in the data warehouse comes from the entire enterprise, data mining is assured of a wide range of data sources. Data mining is an important and relatively independent part of data warehouse applications. The technology is still developing. It draws on a variety of fields such as mathematical statistics, fuzzy theory, neural networks, and artificial intelligence, so its technical content is high and it is relatively difficult to implement. In addition, data mining is being combined with visualization technology, geographic information systems, and statistical analysis systems, which enriches the functions and performance of data mining techniques and tools.
4. Evaluation indicators for data warehouse platforms
Because many database vendors are vigorously promoting their own data warehouse solutions, end users are often at a loss. Is there a third-party institution or organization that has developed relatively fair and authoritative evaluation criteria? The answer is yes.
At present, there are two main evaluation indicators specifically for data warehouse platforms:
(1) TPC-D
TPC is an international organization with 45 members, among them multinational companies such as IBM, Microsoft, NCR, NEC, HP, and Sun. TPC is responsible for developing unified, impartial test standards for open platforms in different types of applications.
For OLTP systems, the main indicator of database performance is TPC-C, which is not discussed here. For data warehouse systems, the main indicator of database performance is TPC-D. Three figures are considered:
QppD: describes the system's query processing power.
QthD: the result of the throughput test, describing the processing capability when multiple users submit queries at the same time. In other words, it also reflects the parallel processing power of the system.
Price per QphD: the price/performance ratio.
Obviously, the larger the first two figures and the smaller the last one, the better. Of course, the first consideration should be meeting the business needs.
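The relationship between the three figures can be shown with a short calculation. This sketch assumes that the composite QphD rating is the geometric mean of the power and throughput ratings and that the price/performance figure is the system price divided by that composite, which is how the TPC-D specification is generally described; check the specification itself before relying on it. The function names and any numbers passed in are hypothetical.

```python
from math import sqrt


def composite_qphd(qppd, qthd):
    """Composite rating, assumed here to be the geometric mean of power and throughput."""
    return sqrt(qppd * qthd)


def price_performance(system_price, qppd, qthd):
    """Price per unit of composite rating; lower is better."""
    return system_price / composite_qphd(qppd, qthd)
```

Under these assumptions a system priced at 1,000,000 with a power rating of 400 and a throughput rating of 100 has a composite rating of 200 and a price/performance figure of 5,000. Because the composite is a geometric mean, a weak throughput result pulls the overall rating down even when the power result is strong.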
The TPC-D results of each vendor and a detailed description of TPC-D can be found on the TPC home page on the Internet. TPC-D test results can also be found on the home pages of Microsoft, Ideas, and other companies.
The throughput test results need some explanation. Although they describe the system's ability to process concurrent query requests, not every vendor's throughput test is performed in a multi-user state. TPC-D gives the vendor a choice: either run the throughput test directly in a multi-user state, or first test in a single-user state and then use the measured power rating QppD and the throughput formula to calculate QthD.
How can these two kinds of test result be told apart? Simply download and print the TPC-D test summary, which shows the number of streams used in the throughput test. The number of streams represents the number of users submitting query requests at the same time. If the test was run in a single-user state, only one stream appears, namely Stream00.
(2) Data Challenge
Because TPC-D lays down very strict rules for the test database model, the data loading, and every query, each vendor taking part in the test can make many adjustments in advance to improve its results, so the results may differ considerably from the behavior of a data warehouse in actual use. TPC-D results are therefore mainly a preliminary reference for users selecting a data warehouse software and hardware platform.
In addition to TPC-D, there is a test standard called Data Challenge, announced in May 1998. Unlike TPC-D, it pays close attention to the system's ability to handle dynamic queries: none of the queries is disclosed, so vendors taking part in the test cannot tune for them in advance. Before the test, each vendor sets up its environment according to the rules, and Data Challenge's technical experts then carry out the various performance evaluations.
When a user is choosing a vendor to implement a data warehouse system, at least the following questions should be considered:
Which problems urgently need solving at present, and could they be solved with a traditional production system?
Does the vendor have experience implementing data warehouses in this industry, and how well did those projects work?
How much data is there now, what are the future expansion requirements, and can the system be upgraded online?
How strong is the system's parallel processing capability? This directly affects its ability to handle complex queries and dynamic queries.
Is the system complicated to manage? Does the database need to be reorganized? Complex management requires many database administrators, and that labor is very expensive.
How available and reliable is the system? How large is the impact on the business when the system fails, and how well does it tolerate such faults?
If these questions are considered carefully, the system that is actually invested in will generally achieve the expected results.
Where is the data warehouse heading?
1. Technical trends
The development of data warehouse technology covers data extraction, storage management, data presentation, and methodology.
In terms of data extraction, future development will focus on system integration. Interconnection, conversion, replication, scheduling, monitoring, and similar functions will be brought under standardized, unified management so that the system can adapt to changes in the data warehouse itself or in its data sources and is easier to manage and maintain.
In terms of data management, database vendors will explicitly launch data warehouse engines as server products alongside their database servers. Here, parallel relational databases with decision support extensions have the greatest potential.
In terms of data presentation, algorithms and functions from mathematical statistics will be commonly integrated into online analysis products, which will also be combined with Internet/Web technology to deliver data warehouse front ends that are suited to intranets and need no maintenance on the client side. In this area, front-end software already in use by data warehouse users in particular industries will be turned into products and become part of data warehouse solutions.
The methodology of the data warehouse implementation process will become more widespread, emerging as a distinct branch of database design and an essential part of management information system design.
Trends in computer applications are the driving force behind data warehouse development. Traditional online transaction processing systems were not designed with the data warehouse in mind, yet real applications demand the functions a data warehouse provides. As a result, many transaction systems have found themselves in a dilemma in recent years: on the one hand, the limited online analysis functions added to them, including complex reports and data summarization, seriously degrade online transaction performance; on the other hand, statistical analysis cannot be fully realized because of limits in the system structure. Application technology is therefore moving in a more fine-grained, more specialized direction.
In the new generation of applications, the data warehouse is part of the system design from the outset, and online analysis becomes commonplace alongside transaction processing systems. In terms of data management, online transaction processing and the data warehouse are relatively independent within an application, so the online transaction processing system itself becomes more concise and efficient, and statistical analysis becomes more convenient. Industry-specific mathematical statistics will develop into more general-purpose applications and be integrated into the data warehouse solutions of application systems, which will draw on the rich information in the data warehouse to better serve business decisions.
2. Market forecast
In the market, the development of data warehouses can be viewed from the side of both vendors and users. For vendors of data warehouse products and solutions, harsh market competition is a permanent theme, and vendors unable to provide a complete solution may be acquired by other companies. For example, software companies that provide specialized data extraction tools are likely to be absorbed by large database vendors. Two kinds of company can continue to develop: first, companies with a strong background in databases and data management; second, companies that specialize in technical consulting on implementing data warehouses for specific industries. From the user's perspective, in the traditional fields of data management such as finance, insurance, and telecommunications, data warehouse applications will go beyond credit analysis, risk analysis, and fraud detection, becoming more widespread and going deeper as modern business models change.
In recent years a revolution has been changing the way products are manufactured and services are provided: the digital customization economy. In this world, users can buy a computer configured to their own requirements, a pair of jeans cut to their own body shape, a health product made for their own needs, or a pair of glasses matched to their own face. Mass customization is not only a manufacturing process, a logistics system, or a sales strategy; it is likely to become an organizing principle of enterprise production. In the future, the data warehouse will become a key weapon for enterprises seeking competitive advantage.