Data migration concept
When a legacy system is replaced by a new system, the old system usually holds a large amount of valuable historical data, much of which the new system still needs. This historical data is also an important basis for decision analysis. Data migration is the process of cleaning and transforming this historical data and loading it into the new system. It mainly applies when one old system is switched to a new system, or when several old systems are switched to the same new system and the historical data in the old systems must be carried over. Banking, telecommunications, tax, industrial, insurance, and sales systems generally require data migration. A typical many-to-one case arises when, as information construction progressed, several different systems came to run in parallel without effective information sharing, so a single new system must absorb the data of several old ones.
Data migration is of great importance to the system switchover and to the new system itself. The quality of the migration is not only an important prerequisite for a successful launch of the new system but also a strong guarantee of its future stable operation. If the migration fails, the new system cannot go live normally; if the migration quality is poor and garbage data is not screened out, the new system is left with many hidden dangers. Once the new system accesses this garbage data, it may generate new erroneous data, which in severe cases causes system abnormalities.
Conversely, a successful data migration effectively guarantees the smooth operation of the new system and preserves valuable historical data, because for any company or department, historical data is undoubtedly a precious resource: a company's customer information, a bank's deposit records, a tax department's tax records, and so on.
Data migration features
Data migration at system switchover differs from the extraction of data from a production OLTP (On-line Transaction Processing) system into a data warehouse (DW). The latter mainly synchronizes to the data warehouse the data changed since the last extraction; this synchronization is performed every extraction cycle, generally daily. Data migration, by contrast, converts the needed historical data into the new production system. Its most important characteristic is that the extraction, cleaning, and loading of a large volume of data must be completed in a short time.
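To make the contrast concrete, a periodic OLTP-to-DW synchronization typically pulls only the rows changed since a stored watermark. The following is a minimal sketch of that incremental pull, using Python with an in-memory SQLite database; the orders table, updated_at column, and etl_watermark table are illustrative assumptions, not names from any particular system.

    import sqlite3

    # A minimal sketch of the periodic OLTP-to-DW synchronization described
    # above: only rows changed since the last extraction cycle are pulled.
    # Table and column names are illustrative assumptions.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT);
        CREATE TABLE etl_watermark (last_extracted TEXT);
        INSERT INTO etl_watermark VALUES ('2024-01-01');
        INSERT INTO orders VALUES (1, 10.0, '2023-12-31'), (2, 25.0, '2024-01-02');
    """)

    last = conn.execute("SELECT last_extracted FROM etl_watermark").fetchone()[0]
    # Incremental pull: everything newer than the previous cycle's watermark.
    changed = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?", (last,)
    ).fetchall()
    print(changed)  # only order 2 is extracted in this cycle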
Determining the content of the migration is the basis of the entire data migration and needs to be considered from the perspective of information system planning. When dividing the content, both a horizontal division by time and a vertical division by module can be taken into account.
Horizontal division
Horizontal division is based on the time at which data was generated, and the question is how far back historical data should be migrated. With the development of information technology, a new system often needs to store more information per day than the old one, and to avoid the performance bottleneck brought by a large data volume, the new system generally retains only a certain period of data, such as one year; data older than the retention period is moved to the data warehouse for decision analysis. For such a new system, the migration mainly covers the data within the last year, while the historical data older than one year requires separate consideration.
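As an illustration of this time-based split, the following minimal sketch routes each historical record either to the new system or to the data warehouse, assuming the one-year retention period mentioned above; the route_record helper is a hypothetical name introduced here.

    from datetime import date, timedelta

    # A minimal sketch of horizontal (time-based) division, assuming a
    # one-year retention period in the new system.
    CUTOFF = date.today() - timedelta(days=365)

    def route_record(created_on: date) -> str:
        """Route a historical record by its creation date."""
        if created_on >= CUTOFF:
            return "new_system"      # within the retention period: migrate
        return "data_warehouse"      # older data: kept for decision analysis

    print(route_record(date.today()))                        # new_system
    print(route_record(date.today() - timedelta(days=400)))  # data_warehouse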
Vertical division
Vertical division is based on the functional modules that process the data, and the question is how to handle data belonging to modules that the new system does not include. Such data generally cannot be mapped to the new system and therefore does not need to be migrated. However, the modules of an old system are often tightly coupled, so pay attention to data integrity when dividing vertically.
Three methods of data migration
Data migration can be carried out in different ways. In summary there are three main methods: migration by tools before the system switchover, manual entry before the system switchover, and generation through the new system after the system switchover.
Migration by tools before the system switchover
Before the system switchover, the historical data in the old system is extracted, converted, and loaded into the new system using an ETL (Extract, Transform, Load) tool. The ETL tool may be a purchased mature product or developed in-house. This is the main and most efficient migration method. Its precondition is that the historical data is available and can be mapped to the new system.
Manual entry before the system switchover
Before the system switchover, relevant personnel are organized to manually enter the required data into the new system. This method consumes a great deal of labor, and the error rate is relatively high. It is mainly used for data that cannot be converted into the new system, or data that the new system needs after going live but that the old system cannot provide, and it serves as a useful supplement to the first method.
Generation through the new system after the system switchover
After the system switchover, the required data is generated through the relevant functions of the new system, or through a supporting program developed specifically for this purpose, usually from data that has already been migrated into the new system. Its precondition is that the required data can be derived from other data.
Data migration strategy
The data migration strategy is the manner in which the migration is carried out. Combined with the different migration methods, the strategies can be divided into one-time migration, phased migration, entering first and migrating later, and migrating first and supplementing later.
One-time migration
One-time migration moves all the required historical data into the new system in a single pass using a data migration tool or migration program. Its advantages are a short migration process, fewer issues than phased migration, and relatively low risk. Its disadvantage is the high work intensity: the people implementing the migration must monitor the whole process, and if the migration takes a long time, the staff become fatigued. The precondition of one-time migration is that the data volume of the new and old system databases is not too large, so that the entire migration can be completed within the permissible downtime.
Phased migration
Phased migration moves the required historical data into the new system in several batches using a data migration tool or migration program. It splits the task and effectively resolves the contradiction between a large data volume and a short downtime. However, phased switchover leads to multiple merges, which increases the probability of error, and in order to keep the overall data consistent, data migrated in earlier batches must be synchronized during later migrations, which increases the complexity of the migration. Phased migration typically migrates static, infrequently changing data, such as code tables and user information, before the system switchover, and migrates dynamic data, such as transaction information, during the switchover. Changes to static data that occur after its migration can either be synchronized into the new system daily or synchronized once at the system switchover, as in the sketch below.
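The ordering described above can be sketched as follows; the table names and the migrate and sync_changes helpers are illustrative assumptions, not part of any real migration toolkit.

    # A minimal sketch of the phased-migration ordering described above:
    # static, rarely-changing tables go first, dynamic tables at switchover,
    # and changes to already-migrated static data are re-synchronized.
    STATIC_TABLES = ["code_table", "user_info"]         # before switchover
    DYNAMIC_TABLES = ["transactions", "account_moves"]  # during switchover

    def migrate(table: str) -> None:
        print(f"migrating {table}")

    def sync_changes(table: str, since: str) -> None:
        print(f"re-syncing rows of {table} changed since {since}")

    # Phase 1: before the switchover window.
    for t in STATIC_TABLES:
        migrate(t)

    # Phase 2: inside the switchover window.
    for t in DYNAMIC_TABLES:
        migrate(t)
    # Keep overall consistency: pick up static-data changes made after phase 1.
    for t in STATIC_TABLES:
        sync_changes(t, since="phase 1")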
Entering first, migrating later
Entering first and migrating later means that before the system switchover, some data is manually entered into the new system first, and the remaining historical data is migrated at the switchover. It is mainly aimed at cases where the data structures of the new and old systems differ significantly, that is, where data the new system requires when it goes live cannot be obtained from the existing historical data. This part of the data can be entered manually before the system switchover.
Migrating first, supplementing later
Migrating first and supplementing later means that the raw data is migrated into the new system before the system switchover, and the required result data is then generated from the migrated raw data through the relevant functions of the new system, or through a supporting program prepared specifically for this purpose. Migrating first and supplementing later reduces the amount of data to be migrated.
Data migration implementation
The implementation of data migration can be divided into three phases: preparation before the migration, implementation of the migration, and verification after the migration.
Because of the characteristics of data migration, a large amount of work must be completed during the preparation phase; sufficient and thorough preparation is the main basis for completing the migration successfully. Specifically, it includes: describing the data sources to be migrated in detail, including the storage mode, data volume, and time span of the data; sorting out the data dictionaries of the old and new system databases; analyzing the quality of the old system's historical data; analyzing the differences between the new and old system data structures; analyzing the differences between the new and old system code data; establishing the mapping relationships between the new and old database tables and formulating processing methods for fields that cannot be mapped; developing or purchasing ETL tools; writing test plans and verification programs for the data conversion; and formulating emergency measures for the data conversion. The implementation of the migration is the most important link among the three phases. It requires working out the detailed implementation steps of the data conversion; preparing the data migration environment; making business preparations, ending unprocessed business matters or bringing them to a stage; testing the technologies involved in the migration; and finally carrying out the data migration.
Verification after the migration is an inspection of the migration quality, and the results of the data checks are the basis for deciding whether the new system can formally go live. The checks can be performed with quality inspection tools or with purpose-written check programs, and the accuracy of the data can also be checked by trial-running the new system, especially its query and reporting functions.
Technical preparation for data migration
Data conversion and migration typically includes the following work: sorting out the old system data dictionary, analyzing the quality of the old system data, sorting out the new system data dictionary, analyzing the differences between the new and old system data, establishing the mapping between new and old system data, developing and deploying the data conversion and migration programs, working out emergency plans for the conversion and migration, carrying out the conversion and migration of old system data into the new system, and checking the integrity and correctness of the data after the conversion and migration.
The data conversion and migration program, that is, the ETL process, can be roughly divided into three steps: extraction, conversion, and loading. Data extraction and conversion are based on the mapping relationships between the new and old system databases, and data difference analysis is the precondition for establishing those mappings; this also includes the difference analysis of code data. The conversion step usually includes data cleaning, which targets the source database and performs the appropriate cleaning operations on data that is erroneous, duplicated, incomplete, or in violation of business or logic rules. Data quality analysis must be performed before cleaning to find the dirty data; otherwise data cleaning has nothing to work from. Data loading uses loading tools or self-written SQL programs to load the converted result data into the target database.
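The three ETL steps can be sketched end to end as follows, using in-memory SQLite databases as stand-ins for the old and new systems; the customer schema and the gender code mapping are illustrative assumptions.

    import sqlite3

    # A minimal sketch of the extract / clean-and-convert / load steps
    # described above. Schemas and the code mapping are illustrative.
    old_db = sqlite3.connect(":memory:")
    new_db = sqlite3.connect(":memory:")
    old_db.executescript("""
        CREATE TABLE customer (id INTEGER, name TEXT, gender TEXT);
        INSERT INTO customer VALUES (1, 'Alice', '01'), (2, '', '02'), (3, 'Carol', 'XX');
    """)
    new_db.execute("CREATE TABLE customer (id INTEGER, name TEXT, gender TEXT)")

    GENDER_MAP = {"01": "F", "02": "M"}  # old code values -> new code values

    # Extract.
    rows = old_db.execute("SELECT id, name, gender FROM customer").fetchall()

    # Clean (drop incomplete rows) and convert (remap the code field).
    loaded = []
    for rid, name, gender in rows:
        if not name or gender not in GENDER_MAP:
            continue  # garbage data found by quality analysis is filtered out
        loaded.append((rid, name, GENDER_MAP[gender]))

    # Load the converted result data into the target database.
    new_db.executemany("INSERT INTO customer VALUES (?, ?, ?)", loaded)
    print(new_db.execute("SELECT * FROM customer").fetchall())  # [(1, 'Alice', 'F')]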
Data check
Data format check: check whether the format of the data is consistent and usable, for example where the target data is required to be of a numeric type.
Data length check: check the valid length of the data. Fields converted from CHAR to VARCHAR types require special attention.
Interval range check: check whether the data lies within the defined maximum and minimum values; for example, an age of 300 or an entry date of 4000-1-1 would be flagged.
Null value and default check: check whether the null values and default values defined in the new and old systems are the same; different database systems may define them differently, which requires special attention.
Integrity check: check the referential integrity of the data, such as whether the code values that records reference actually exist. Note that some systems remove foreign key constraints after a period of use in order to improve efficiency.
Consistency check: check whether there is data that violates consistency, especially in systems where data was submitted separately. A sketch combining several of these checks follows the list.
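The following is a minimal sketch combining several of the checks above (format, length, interval range, and null/default); the field definitions are illustrative assumptions, and a real check program would read them from the new system's data dictionary.

    # A minimal sketch of a row-level check program for migrated data.
    def check_row(row: dict) -> list:
        problems = []
        # Data format check: the target field is required to be numeric.
        if not str(row["amount"]).replace(".", "", 1).isdigit():
            problems.append("amount is not numeric")
        # Data length check: e.g. a CHAR(10) field migrated to VARCHAR.
        if len(row["code"]) > 10:
            problems.append("code exceeds valid length 10")
        # Interval range check: an age of 300 would be flagged here.
        if not (0 <= row["age"] <= 150):
            problems.append("age out of range")
        # Null value / default check.
        if row["status"] is None:
            problems.append("status is null with no default")
        return problems

    print(check_row({"amount": "12.5", "code": "A001", "age": 300, "status": None}))
    # ['age out of range', 'status is null with no default']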
Data migration tool selection
There are two options for data migration programs: developing them in-house or purchasing a mature product. Each option has its own characteristics, and the choice should be made according to the specific situation. Looking at some large domestic projects, relatively mature ETL products are often used for data migration. These projects share some common features: a large volume of historical data at migration time, a short permissible downtime, a large number of customers or users, third-party system access, and a wide impact if the migration fails. At the same time, self-developed programs are also widely used.
Related ETL products
Currently, many database vendors provide data extraction tools, such as Informix's InfoMover, Microsoft SQL Server 7's DTS, and Oracle's Oracle Warehouse Builder, which solve data extraction and conversion within a certain range. However, these tools do not complete data extraction fully automatically; users still need to write appropriate conversion programs with them.
For example, Oracle's data extraction tool Oracle Warehouse Builder, OWB for short, provides features including model construction and design; data extraction, movement, and loading; and metadata management. But the process OWB prescribes is cumbersome, difficult to maintain, and not easy to use.
Among third-party products, Ascential Software's DataStage is a relatively complete one. DataStage can extract data from data sources on multiple platforms and load it into a variety of target systems, and every step can be done in its graphical tools. It can also flexibly invoke external systems, and it provides dedicated design tools for defining conversion rules and cleaning rules, implementing a variety of complex and practical features. Simple data conversions can be implemented by drag-and-drop operations on the interface and by calling some of DataStage's predefined conversion functions; complex conversions can be implemented by writing scripts or by interfacing with other languages. DataStage also provides a debugging environment, which can greatly improve the efficiency of developing and debugging the conversion process.
Preparation for data extraction and conversion
A great deal of preparation is needed before data extraction. The details are as follows:
1. For each data table in the target database, build an extraction function according to the conversion processing description recorded in the mapping relationships. The mapping relationships are the result of the earlier data difference analysis. The naming rule for extraction functions is: f_<target data table name>_e.
2. Optimize the SQL statements of the extraction functions. Optimization methods include adjusting parameter settings such as sort_area_size and hash_area_size, enabling parallel query, using hints to direct the optimizer, creating temporary tables, analyzing the source data tables, and adding indexes.
3. Establish the scheduling control tables, including the ETL function definition table (recording the names and parameters of the extraction, conversion, cleaning, and loading functions), the extraction schedule table (recording the extraction functions to be scheduled), the load schedule table (recording the load information to be scheduled), the extraction log table (recording the start and end time of each scheduled extraction function, together with its success or error message), and the load log table (recording the start and end time of each scheduled load process, together with its success or error message).
4. Establish the scheduling control program, which dynamically schedules the extraction functions according to the extraction schedule table and saves the extracted data into flat files. The naming rule for the flat files is: <target data table name>.txt. A minimal sketch of this control flow follows the list.
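The following is a minimal sketch of such a scheduling control program, with the schedule and log tables of step 3 represented as plain Python lists; the f_customer_e function and its data are illustrative assumptions, though the naming rules follow those given above.

    import csv
    from datetime import datetime

    # A minimal sketch of the scheduling control program in step 4.
    def f_customer_e():
        """Extraction function for target table 'customer'."""
        return [(1, "Alice"), (2, "Bob")]

    extract_schedule = [("customer", f_customer_e)]  # extraction schedule table
    extract_log = []                                 # extraction log table

    for table, func in extract_schedule:
        start = datetime.now()
        try:
            rows = func()
            # Save the extracted data into a flat file named <table>.txt.
            with open(f"{table}.txt", "w", newline="") as fh:
                csv.writer(fh).writerows(rows)
            extract_log.append((table, start, datetime.now(), "OK"))
        except Exception as exc:
            extract_log.append((table, start, datetime.now(), f"ERROR: {exc}"))

    print(extract_log)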
In the ETL process, data conversion is mainly reflected in the cleaning of source data and the conversion of code data. Data cleaning is mainly used to clean garbage data out of the source data, and it can be divided into cleaning before extraction, cleaning during extraction, and cleaning after extraction. This ETL design mainly cleans the source data before extraction. The conversion of code tables can be performed before extraction or during the extraction process. The details are as follows:
1. For the data tables in the source database involved in the ETL, establish cleaning functions to be run before data extraction, according to the results of the data quality analysis. The cleaning functions can be uniformly scheduled by the scheduling control program before data extraction, or dispersed into each extraction function. The naming rule for cleaning functions is: f_<source data table name>_t_c.
2. For the code data in the source database involved in the ETL, where the results of the code data difference analysis show that the coding rules are unchanged or that only the code values have changed, consider converting the codes referenced in the source data tables before extraction. Pre-extraction conversion requires establishing code conversion functions, which are uniformly scheduled by the scheduling control program before data extraction. The naming rule for code conversion functions is: f_<source data table name>_t_dm.
3. For codes whose coding rules differ between the new and old systems, consider converting them during the extraction process. According to the results of the code data difference analysis, the extraction functions involving code data are adjusted accordingly, as in the sketch below.
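A minimal sketch of conversion during extraction, assuming a hypothetical region-code mapping produced by the code data difference analysis:

    # Old coding rule -> new coding rule; the mapping itself is illustrative.
    REGION_CODE_MAP = {"BJ": "110000", "SH": "310000"}

    def f_branch_e(source_rows):
        """Extraction function adjusted to convert the region code field."""
        for branch_id, region in source_rows:
            yield branch_id, REGION_CODE_MAP.get(region, "UNKNOWN")

    print(list(f_branch_e([(1, "BJ"), (2, "SH")])))
    # [(1, '110000'), (2, '310000')]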
Data migration check
After the data migration is complete, the migrated data needs to be verified. Verification after the migration is an inspection of the migration quality, and the verification results are also an important basis for judging whether the new system can formally go live. The migrated data can be verified in two ways.
The first way is to analyze the quality of the migrated data with a data quality inspection tool or a purpose-written check program. Verification of data after migration differs from the quality analysis of the historical data before migration mainly in the inspection indicators. The indicators for checking migrated data cover five aspects: the integrity check, whether the referenced foreign keys exist; the consistency check, whether values with the same meaning are consistent in different places; the total-and-detail balance check, for example comparing the grand total of an indicator such as arrears against the sums at different granularities; the record count check, whether the numbers of records in the new and old databases correspond; and the special sample check, whether the same samples are consistent in the new and old databases.
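The record count check and the total-and-detail balance check lend themselves to a simple program; the following minimal sketch compares counts and an arrears total between two in-memory SQLite databases standing in for the old and new systems. The bill schema and the arrears indicator are illustrative assumptions.

    import sqlite3

    # A minimal sketch of the count and balance comparisons described above.
    old_db = sqlite3.connect(":memory:")
    new_db = sqlite3.connect(":memory:")
    for db in (old_db, new_db):
        db.execute("CREATE TABLE bill (id INTEGER, arrears REAL)")
        db.executemany("INSERT INTO bill VALUES (?, ?)", [(1, 5.0), (2, 7.5)])

    def snapshot(db):
        count = db.execute("SELECT COUNT(*) FROM bill").fetchone()[0]
        total = db.execute("SELECT SUM(arrears) FROM bill").fetchone()[0]
        return count, total

    # Record count check and total-sum balance check must both agree.
    assert snapshot(old_db) == snapshot(new_db), "old/new databases disagree"
    print("record counts and arrears totals match")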
The second way is a query comparison between the new and old systems: the same indicator is queried through the respective query tools of the two systems and the final results are compared. Alternatively, first restore the new system's data to the old system's pre-migration state, then re-enter into the new system all the business that occurred on the old system's last day, check for anomalies, and compare the results with those of the old system.