Foreword
Every database administrator will eventually face a data import task, whether while migrating between old and new databases, while recovering and rebuilding after a database crash, or while creating a simulated environment for a test database. In short, a qualified DBA should keep a technical toolbox ready for all kinds of import requests, while also meeting users' demands for import speed. This article discusses only the features and techniques the Oracle database provides for accelerating data imports, some of which carry over to other databases as well. It examines seven data import methods, explains which situations each is best suited to, and enumerates the factors that affect import speed. To compare the methods, I created a sample table and data set, imported the sample data with each method, and measured both the total import time and the CPU time consumed by the import process; the times quoted here are for reference only. Note that Oracle 9i Enterprise Edition is recommended, although you can also try most of the techniques on versions as old as Oracle 7.3 Standard Edition. The machine used here was configured with an Intel P4 CPU, 256 MB of memory, and Oracle 9i Enterprise Edition.
Example table structure and data set
To demonstrate and compare the various import methods, assume the task is to import external data into the Calls table of an Oracle database. The external data file contains 100,000 call-center records and is nearly 6 MB in size. A sample of the data follows:
8230284384, 2003-04-18:13:18:58, 5001, complaint, mobile phone three-guarantees repair quality
8230284385, 2003-04-18:13:18:59, 3352, consultation, water supply hotline number
8230284386, 2003-04-18:13:19:01, 3142, suggestion, setting up a bus line
The table that receives the imported data is named Calls; its structure is as follows:
Name        Null?      Type           Comment
---------   --------   ------------   ----------------
CALL_ID     NOT NULL   NUMBER         Primary key
CALL_DATE   NOT NULL   DATE           Non-unique index
EMP_ID      NOT NULL   NUMBER
CALL_TYPE   NOT NULL   VARCHAR2(12)
DETAILS                VARCHAR2(25)
Row-by-row INSERT
The easiest way to import data is to write INSERT statements and insert records one at a time. This method is only suitable for importing small amounts of data, such as a table's seed data loaded from a SQL*Plus script. Its biggest disadvantages are that the import is slow and consumes a great deal of CPU time, making it unsuitable for large volumes of data; its main advantages are that the concept is simple, the script is easy to modify, and no extra preparation is needed. If you have plenty of time to spare and nothing better to do with the database and the CPU, this method is for you. :)
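A minimal SQL*Plus sketch of this approach, using the sample records shown earlier (the date-format string is an assumption matching the sample file's layout):

```sql
-- Row-by-row INSERT: one statement per record, committed at the end.
INSERT INTO calls (call_id, call_date, emp_id, call_type, details)
VALUES (8230284384, TO_DATE('2003-04-18:13:18:58', 'YYYY-MM-DD:HH24:MI:SS'),
        5001, 'complaint', 'mobile phone three-guarantees repair quality');

INSERT INTO calls (call_id, call_date, emp_id, call_type, details)
VALUES (8230284385, TO_DATE('2003-04-18:13:18:59', 'YYYY-MM-DD:HH24:MI:SS'),
        3352, 'consultation', 'water supply hotline number');

COMMIT;
```

In practice a script would repeat this pattern 100,000 times, which is exactly why the statement parsing and index maintenance costs dominate.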
For comparison with the other methods, importing the 100,000 records into the Calls table this way took 172 seconds in total, of which the import process occupied 52 seconds of CPU time.
Row-by-row INSERT, no indexes on the table
Why does the previous method consume so much CPU time? The key is the indexes already created on the Calls table. Each time a row is inserted, Oracle must place the new value among the old ones and update every index on the table, and this repeated index maintenance costs a certain amount of time. One way to improve import speed, therefore, is to drop all indexes before importing (or create them only afterwards): insert the external file's rows one by one, then rebuild the table's indexes once the load is finished. This speeds up the import, and the resulting indexes are also very compact; the same principle applies to bitmap indexes. For primary key and unique constraints, you can temporarily disable or drop them to get the same effect, but beware that this can affect foreign key constraints on other existing tables, so think it through before dropping anything. Note that this method is not appropriate when the table already holds a great deal of data. For example, if the table contains 90 million rows and you need to add tens of millions more, the time saved on the actual inserts would be more than consumed by rebuilding indexes over 100-odd million rows, which is not the result we want. However, if the target table is empty, or the data being imported is much larger than what the table already holds, the time saved on inserts will far outweigh the cost of recreating the indexes, and the method is worth considering.
Accelerating index creation is another issue to consider. To reduce the sorting work during index creation, you can increase the SORT_AREA_SIZE parameter for the current session, which allows the session to perform more of the sort in memory while the index is built. You can also use the NOLOGGING keyword to reduce the redo log generated by index creation. The NOLOGGING keyword has significant implications for database recovery and for standby databases, however, so weigh it carefully before using it.
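These two tuning steps can be sketched as follows (the index name and the 100 MB sort area value are illustrative assumptions, not figures from the measurements in this article):

```sql
-- Allow larger in-memory sorts in this session while the index is built.
ALTER SESSION SET sort_area_size = 104857600;  -- 100 MB, illustrative

-- Rebuild the index with reduced redo generation.
CREATE UNIQUE INDEX calls_pk_idx ON calls (call_id) NOLOGGING;
```

Remember that an index built with NOLOGGING cannot be recovered from the redo stream; if the database must be restored, the index has to be recreated.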
Using this method, I first dropped the Calls table's primary key and its non-unique index, imported the data row by row, and then recreated the indexes (the table was empty before the import). The method took 130 seconds in total, including the time to rebuild the indexes, of which the import process occupied 35 seconds of CPU time.
The advantages of this method are a faster import and more compact indexes; the disadvantage is a lack of generality: whenever you add new schema elements (indexes, foreign keys, and so on), you must add code and modify the import procedure. In addition, dropping the table's indexes has a large performance impact on online users' queries, and you must also consider that dropping or disabling a primary key or unique constraint may affect the foreign keys that reference it.
Batch INSERT, no indexes on the table
The OCI programming interface gained an array interface in Oracle V6. With array operations, the import program reads and parses the external file data, then submits a single SQL statement to the database together with an array of rows to insert. Oracle needs to parse the SQL statement only once, and the bound data is processed in memory. Batch inserts are far more efficient than repeated single-row inserts because the statement is parsed once, the data-binding work is shared, and the number of round trips between the program and the database drops dramatically instead of being repeated for every row. The benefit is a marked reduction in the total import time and, especially, in the CPU time consumed by the process.
Be reminded that batch import operations are available through the OCI interface, but many tools and scripting languages do not expose this feature; if you want to use this method, check whether your development tool supports OCI batch operations. The import program requires relatively complex coding, carries some risk of error, and lacks flexibility. Using this approach, the program extracts the external data into an in-memory array and performs batch inserts (100 rows at a time), while retaining the previous method's drop/rebuild of the indexes. The total import time fell to 14 seconds, and the CPU time consumed by the process dropped to 7 seconds; the time spent actually inserting data fell by roughly 95%.
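The OCI array interface itself is C code and is not shown in this article; as a server-side analogue of the same idea, PL/SQL bulk binding also sends many rows to the SQL engine per statement execution. A sketch under that substitution (the collection type and the elided file-parsing step are illustrative assumptions):

```sql
DECLARE
  TYPE t_calls IS TABLE OF calls%ROWTYPE INDEX BY PLS_INTEGER;
  v_batch t_calls;
BEGIN
  -- In a real loader, v_batch would be filled by parsing a batch of
  -- lines (e.g. 100) from the external file; the fill step is elided.
  FORALL i IN 1 .. v_batch.COUNT
    INSERT INTO calls VALUES v_batch(i);
  COMMIT;
END;
/
```

Whether through OCI arrays or FORALL, the win is the same: one parse and one round trip per batch instead of per row.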
CREATE TABLE AS SELECT, using an Oracle9i External Table
A new feature of Oracle 9i is the external table. It behaves like an ordinary database table, with fields and data type constraints, and it can be queried, but its data is not stored inside the database; instead, the table is associated with an ordinary external file. When you query an external table, Oracle parses the file and returns the rows that satisfy the query conditions, just as if the data were stored in a database table.
Note that you can join an external table with other tables in a query, but you cannot create indexes on it, and you cannot insert, update, or delete its rows; after all, it is not a real database table. In addition, if the associated external file is changed or deleted, the results the external table returns change with it, so avoid modifying the file while the database is using it.
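Querying such a table looks exactly like querying a regular one. For instance, a summary over the sample call data might read (this particular query is an illustration, not from the original measurements):

```sql
-- External tables support ordinary SELECTs, aggregates, and joins,
-- even though the rows live in a flat file outside the database.
SELECT call_type, COUNT(*) AS calls_made
FROM   calls_external
GROUP  BY call_type;
```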
This method opens a new door for importing data. You can easily associate an external file with the database, create a corresponding external table, and query the data immediately, as if the external data had already been imported into a database table. The one shortcoming to be clear about is that the data is not really imported: if the external file is deleted or overwritten, the database can no longer access the external table's data, and since no index can be created, data access will be slow. Create calls_external (the external table) and associate it with the external data file:
CREATE TABLE calls_external
(
  call_id    NUMBER,
  call_date  DATE,
  emp_id     NUMBER,
  call_type  VARCHAR2(12),
  details    VARCHAR2(25)
)
ORGANIZATION EXTERNAL
(
  TYPE oracle_loader
  DEFAULT DIRECTORY extract_files_dir
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      call_id,
      call_date CHAR date_format DATE mask "YYYY-MM-DD:HH24:MI:SS",
      emp_id,
      call_type,
      details
    )
  )
  LOCATION ('calls.dat')
);
Then synchronize the external table with the table actually used, Calls, by dropping the Calls table and rebuilding it:
CREATE TABLE calls
(
  call_id    NUMBER NOT NULL,
  call_date  DATE NOT NULL,
  emp_id     NUMBER NOT NULL,
  call_type  VARCHAR2(12) NOT NULL,
  details    VARCHAR2(25)
)
TABLESPACE tbs1 NOLOGGING
AS
SELECT call_id, call_date, emp_id, call_type, details
FROM calls_external;
Because Calls is a real database table, you can create indexes on it to speed up access, and the data in the table is retained even if the external data file is later updated or deleted. The NOLOGGING keyword in the CREATE TABLE statement suppresses redo generation and speeds up the table creation.
Importing the data with this method took 15 seconds in total, with the process occupying 8 seconds of CPU time, slightly slower than the previous method. That does not make external tables a poor way to import data, however; it is still far faster than row-by-row insertion.
The advantage of this method is that good results are achieved with almost no code: there is no error-prone OCI batch programming, and you can even automate the import by scheduling the process with the DBMS_JOB package. Its disadvantage is that the target table must first be dropped and rebuilt; if users access the table while it is being rebuilt, they will encounter "table or view does not exist" errors, so the method is unsuitable in that situation. It also applies only to Oracle 9i and later databases.
INSERT /*+ APPEND */ AS SELECT, using an Oracle9i External Table
The previous method demonstrated how to create a database table whose data is mapped from an external data file; its drawback is that the table must first be dropped to stay consistent and synchronized with the file. For importing incremental data without deleting existing rows, Oracle provides the INSERT statement with an APPEND hint.
INSERT /*+ APPEND */ INTO calls
  (call_id, call_date, emp_id, call_type, details)
SELECT call_id, call_date, emp_id, call_type, details
FROM calls_external;
This statement reads the contents of calls_external, backed by the external data file, and appends them to the Calls table. The APPEND hint tells Oracle to use a fast direct-path mechanism that inserts the new rows above the table's high-water mark, and it can be combined with the NOLOGGING keyword on the table.
Predictably, this method takes about the same time as the previous one; after all, both are solutions that use the external table feature to import data, differing only in which phase the target table is built. If the target table is not empty, the load will take slightly longer, because the existing, larger indexes must be maintained, whereas the previous CREATE TABLE AS SELECT method creates its indexes from scratch as a whole.
The SQL*Loader utility
SQL*Loader is an import utility supplied by Oracle, designed specifically for loading large volumes of data from external files into database tables. The tool has a history of many years, and each version upgrade has made it more powerful, flexible, and fast; unfortunately, its syntax is arcane and unintuitive, and it can only be invoked from a command-line window.
Despite its drawbacks, it is the fastest and most effective import method. By default it loads data using the "conventional path" option, whose performance improvement is not obvious; I recommend the faster direct-path load instead, enabled by adding the direct=true option on the command line. With a direct-path load, the program writes the import data directly into new data blocks above the table's high-water mark, shortening the data insertion time, and it uses a very efficient merge to update the table's B*-tree indexes. Using the default conventional path, the total import time was 81 seconds and the process occupied about 12 seconds of CPU time, including the time to update the table's indexes. Using the direct-path option, the total import time was just 9 seconds and the CPU time just 3 seconds, again including index maintenance.
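A minimal sketch of a control file and invocation for the sample data (the control file name, user, and password are illustrative assumptions):

```sql
-- calls.ctl: SQL*Loader control file for the sample call data
LOAD DATA
INFILE 'calls.dat'
INTO TABLE calls
FIELDS TERMINATED BY ','
(
  call_id,
  call_date DATE "YYYY-MM-DD:HH24:MI:SS",
  emp_id,
  call_type,
  details
)
```

It would then be run from the command line, with direct=true selecting the direct path:

sqlldr userid=scott/tiger control=calls.ctl direct=true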
As you can see, even without dropping the table's indexes before the import, SQL*Loader's direct-path option is fast and effective. Of course, it has drawbacks too: as with the NOLOGGING keyword, this method generates no redo log data, so the load cannot be rolled back to the previous state if the import process fails midway; and the table's indexes do not function during the load, so users accessing the table will find it sluggish. It is best, of course, not to let users access the table while the import is running.
Partition Exchange
All of the import methods discussed above share a limitation: users can access the table only after the import has completed. Facing 7x24 uninterrupted service, this restriction hurts users' real-time access if we simply append the incremental data. Oracle's table partitioning feature addresses this, reducing the impact of import operations on users' real-time access to the data; the mode of operation resembles hot-swapping a hard disk, except that what is swapped here is a partition. One caveat: the Partitioning option is available only in the Enterprise Edition.
In a partitioned table, the table presented to the user is a collection of multiple partition segments. A partition can be added, unloaded, or dropped when needed, and a partition can be exchanged with an ordinary table in the database as long as their structures and field types are consistent; after the exchange, the partition and the table hold each other's data. Note that the exchange takes place purely at the level of the Oracle data dictionary, and no data is actually moved, so a partition exchange is extremely fast.
To set up the experiment, first assume the Calls table is a partitioned table with an empty partition PART_01012004, created to hold call data for January 1, 2004. Then create a temporary table calls_temp, which has the same fields and data types as the Calls table.
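The setup can be sketched as follows; since the article does not give the partitioned table's DDL, the range-partitioning key and the partition bound below are illustrative assumptions (a production table would carry one partition per day, not just this one):

```sql
-- Illustrative: a range-partitioned Calls table with an empty
-- partition for 2004-01-01, plus a matching staging table.
CREATE TABLE calls
( call_id   NUMBER NOT NULL,
  call_date DATE NOT NULL,
  emp_id    NUMBER NOT NULL,
  call_type VARCHAR2(12) NOT NULL,
  details   VARCHAR2(25) )
PARTITION BY RANGE (call_date)
( PARTITION part_01012004
    VALUES LESS THAN (TO_DATE('2004-01-02', 'YYYY-MM-DD')) );

CREATE TABLE calls_temp
( call_id   NUMBER NOT NULL,
  call_date DATE NOT NULL,
  emp_id    NUMBER NOT NULL,
  call_type VARCHAR2(12) NOT NULL,
  details   VARCHAR2(25) );
```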
Using any of the import methods introduced earlier, we import the 100,000 records into the calls_temp table instead. We can wait patiently for the load into calls_temp to finish and then create its indexes and related constraints; none of this affects users' real-time access to the Calls table, because we are operating only on the calls_temp temporary table. Once the import completes, calls_temp holds the call data for January 1, 2004. Using the empty partition PART_01012004 in the Calls table, execute the partition exchange with the following statement:
ALTER TABLE calls
  EXCHANGE PARTITION part_01012004 WITH TABLE calls_temp
  INCLUDING INDEXES WITHOUT VALIDATION;
The partition exchange very quickly updates only the data dictionary of the Calls table: partition PART_01012004 instantly holds all the data of calls_temp, and calls_temp becomes empty. This assumes the Calls table uses local rather than global indexes. INCLUDING INDEXES in the statement ensures the exchange preserves index availability, and WITHOUT VALIDATION skips checking that the incoming table's data matches the partition's bounds.
Conclusion
This article has explored a variety of data import methods for the Oracle database. Each has its own shortcomings and suitable environments, meeting different import requirements; you need to understand these methods and seek the optimal import scheme as a trade-off among speed, simplicity, flexibility, recoverability, and data availability.
To compare their effects, we built an example showing the import efficiency of each method, from which you can choose the one best suited to your future data import work. Remember, too, that this article does not cover every Oracle data import technique (parallel data import, for example); those remain for us to keep exploring and experimenting with.
Data import method                                          Total import time (s)   CPU time (s)
---------------------------------------------------------   ---------------------   ------------
Row-by-row INSERT                                           172                     52
Row-by-row INSERT, no indexes on the table                  130                     35
Batch INSERT, no indexes on the table                       14                      7
CREATE TABLE AS SELECT, using Oracle9i external table       15                      8
INSERT /*+ APPEND */ AS SELECT, using Oracle9i ext. table   15                      8
SQL*Loader, conventional path (default)                     81                      12
SQL*Loader, direct path                                     9                       3