Level: Introductory
Marty Lurie, Systems Engineer, IBM, Waltham, Massachusetts
August 01, 2003
Popular author and presenter Marty Lurie takes on the federation question, that is, data federation. Marty explains how IBM's federated database technology works, identifies the key technical success factors, and shows how to test those factors to prove the technology's effectiveness for yourself.
Introduction
If you have a variety of databases, chances are you need them to talk to each other. This article helps you do just that by tackling interoperability between databases. We will use IBM's strategic federated database technology, the core of the IBM® DB2® Information Integrator. We will look at how the technology works, identify the key technical success factors, and show how you can test those factors to prove their effectiveness for yourself. In Part 2 of this article we will also look at insert performance and at how the federated server can generate XML from data stored in multiple data sources.
Why do I need database federation, and what is it?
As a DBA, you are probably not eager to manage many different brands of database. I have yet to meet a DBA who says, "Yes, please, I'd like a wider variety of databases; I'm just not busy enough." If you could work with all your different databases, including selects, inserts, updates, and deletes, as if all the tables lived in a single database, you would be far more productive. That is exactly what database federation does: it makes all the tables look as if they are in the same database. The following sample SQL shows how powerful this technology is:
DB2 => INSERT INTO remote_informix
         SELECT a.customer, b.balance, c.limit
         FROM remote_db2 a, remote_oracle b, remote_sybase c
         WHERE c.limit > 10000
           AND a.c_key = b.c_key
           AND a.c_key = c.c_key;
Database federation can free us from building yet another data mart! Here is an example of using it as a performance-tuned alternative to a data mart: if the queries are not too demanding, and if summary tables can usually satisfy them, there is no need to stand up a new server and copy large amounts of data, which is a big productivity win. Of course, a data mart or data warehouse is still the preferred solution for a heavy query workload that needs access to the lowest level of detail.
How does database federation work?
It works really well! Oh, sorry, you wanted some technical content in this talk?
Consider the system diagram as shown below:
Figure 1. The federator and the federatees
"Federator" system "System" is operated by the table in the federatee ". The remote table appears as a virtual table in the "Federator" database. The client application can perform operations on the virtual table in the "Federator" database, but true persistence stores in the remote database. We will study a sample client program that inserts the program in the next part of this article.
Each "Federal" will see "Federalists" as another database client connection. "Federatee" is just a client request for database operation. "Federal" needs to use client software to access each remote database. To access each type of federal, you need to install IBM Informix®, Sybase, Oracle, etc.
The database federal application interface is SQL. This greatly improves work efficiency compared to the new interface that must be learned. Use the same syntax to select, insert, update, and delete the same syntax for the local table to access the remote table. It is not possible to perform all tables, but the information integrator in DB2 V8 has made great progress in this regard by providing insertion and update features. Back to top
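As a quick illustration of that point, here is a minimal sketch of ordinary DML run against a nickname. It assumes a nickname such as rc9_card1 (defined in Step 6 below) already exists; the statements are my own illustration, not part of the original setup script:
-- ordinary DML against a nickname; the federator forwards the work
-- to the remote Informix server that owns the underlying table
INSERT INTO rc9_card1 VALUES (0, 1, 1, 1, 1);
UPDATE rc9_card1 SET c3 = 2 WHERE superkey = 1;
SELECT COUNT(*) FROM rc9_card1;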
Installation and configuration
Here is a summary of the installation and configuration process, with the key points highlighted. Don't forget to read the manual. You will need to install DB2 V8.1 with the Information Integrator option. Choose Custom during the installation, then enable Informix data source support under the server support options. The Informix wrapper currently ships with the 8.1 server; the other wrappers are still in beta test and should be available soon. For tips and techniques on Linux kernel parameters suitable for V7 and V8, see my earlier article Simulating Massively Parallel Database Processing on Linux. For best practices on configuring the remote Informix server, see my article Winning Database Configurations: An IBM Informix Database Survey.
If you are the impatient type, skip ahead to Appendix A, which contains the complete SQL listing for setting up the federated environment.
The steps for setting up federation are as follows:
Step 1: Set up client connectivity between the federator and the remote database servers
Before attempting any federated configuration, you need to get the remote clients' software working. Test a simple client program (such as demo1.ec from Informix, or a similar client program from Oracle or Sybase) to verify connectivity. If the client software cannot reach the remote server, you cannot go any further.
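A minimal connectivity smoke test, assuming the Informix client SDK and its dbaccess utility are installed (this check is my own suggestion, not part of the original article), can be as simple as running one query against the remote database:
-- from the federator machine, through the Informix client, for example:
--   dbaccess stores_demo@fliif
-- then run a trivial query against the remote catalog
SELECT COUNT(*) FROM systables;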
Step 2: Create the db2dj.ini file containing the parameters the federator needs
Both DB2 V7 and DB2 V8 require the db2dj.ini file. This file provides parameters and paths for the remote databases. The sample shown here references two remote servers: one IBM Informix Extended Parallel Server™ (XPS) and one Informix Dynamic Server™ (IDS). Here is a sample from a Windows® 2000 machine:
C:\Program Files\IBM\SQLLIB\cfg>type db2dj.ini
INFORMIXDIR=C:\Progra~1\Informix\Client~1
INFORMIXSERVER=fliif
INFORMIXSQLHOSTS=C:\tmp\sqlhosts
INFORMIXSERVER=flxps
This example shows how to access two remote servers: one is IBM Informix V9 and the other is IBM Informix XPS.
Now tell DB2 where to find this file. Be sure to use a fully qualified path name; do not use a relative path name, or bad things will happen. For Windows 2000, use:
db2set DB2_DJ_INI=C:\Progra~1\IBM\SQLLIB\cfg\db2dj.ini
For UNIX® or Linux, use:
db2set DB2_DJ_INI=/home/db2inst1/sqllib/cfg/db2dj.ini
Step 3: Create "Package" for the remote database
"Packaging" defines a library file that knows how to communicate with the federal database. It uses the client connectivity software you set in step 1 to access the remote database.
There are two ways to define the wrapper: using SQL, or using a graphical user interface (GUI) in DB2 V8. Examples of using SQL Creating a wrapper are as follows:
CREATE WRAPPER "INFORMIX" LIBRARY 'db2informix.dll';
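If you want to confirm that the wrapper really was registered, a quick look at the catalog helps. This query is my own sketch against the standard DB2 catalog view, not part of the original setup script:
-- list the registered wrappers and the libraries they load
SELECT wrapname, library FROM syscat.wrappers;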
To start the GUI wizard from the Control Center, right-click the Federated Database Objects folder in the tree navigation pane of the database you want to configure, as shown in Figure 2. I recommend using the Show SQL buttons whenever you work in the GUI; they help you understand what is happening behind the scenes (see Figure 3 for an example). Think of the Show SQL button as your own private DBA tutor.
Figure 2. Creating a wrapper using the Control Center
Figure 3. Behind the scenes of the GUI
Step 4: Define the remote database instance as a server to the federator
Use the CREATE SERVER statement (or the Control Center) to define a remote instance. Note that it uses the wrapper created in the previous step. There are many options for specifying relative CPU and I/O speeds, network speed, and other parameters. IBM recommends accepting the default values for most of them; the parameters shown below are included only to illustrate how they are specified.
Here is the SQL:
Create Server "RCFLIIF"
Type Informix Version '9.3'
Wrapper "Informix" options (Node 'Fliif ",
DBNAME 'Stores_Demo'
, Add cpu_ratio '1'
Add IO_TIO '1'
, Add comm_rate '1'
, Add db2_maximal_pushdown 'y'
);
NODE names the remote database server instance; it is not a TCP/IP hostname. DBNAME identifies the remote database.
Pushdown, which is on by default, indicates that joins should be performed at the remote server whenever possible. We will test this and examine some optimizer explain plans in the next section. DB2_MAXIMAL_PUSHDOWN is a newer parameter (not in the documentation at the time of writing) that tells the federator to send the SQL and the join to the federatee even when the optimizer thinks that bringing the data back and doing the join itself would be better.
Other options you may want to use are FOLD_ID and FOLD_PW, both set to 'N'. This tells the federator to connect to the data source with the user ID and password exactly as they appear in the user mapping (see Step 5). Without these settings the federated server may try to connect as many as four times, the first attempt with the user ID and password folded to uppercase and the last attempt with both folded to lowercase. If the user ID and password are case sensitive (the usual situation on UNIX, Linux, and Windows systems), setting FOLD_ID and FOLD_PW to 'N' makes connecting to the data source a little faster.
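Here is a hedged sketch of how those options could be applied to the server defined above; ALTER SERVER is one way to add options after the fact, and whether 'N' is right for you depends on how the remote accounts are defined:
-- send the user ID and password exactly as stored in the user mapping
ALTER SERVER "RCFLIIF" OPTIONS (ADD FOLD_ID 'N', ADD FOLD_PW 'N');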
Step 5: Create a user mapping for authentication at the federatee
The federator wants data from the remote database, so it must authenticate just like any other client program. A user mapping provides the credentials under which the data requests are made.
A local DB2 user ID is mapped to a user ID on the remote server. This example maps the local ID LURIE to the remote Informix ID informix.
CREATE USER MAPPING FOR "LURIE" SERVER "RCFLIIF"
  OPTIONS (REMOTE_AUTHID 'informix', REMOTE_PASSWORD 'useyourown');
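To double-check which user mappings exist for a server, you can query the catalog; this is my own quick check, not part of the original script:
-- list the user mapping options recorded for the RCFLIIF server
SELECT * FROM syscat.useroptions WHERE servername = 'RCFLIIF';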
The DB2 Control Center also provides a GUI for defining user mappings, as shown in Figure 4:
Figure 4. Mapping User ID Using the Control Center
Step 6: Finally, the tables! Access the remote tables through nicknames
So far we have defined:
A wrapper: defines the server type and identifies the binary file used to access it.
A server: defines the location of the remote server and the database on that server.
A user mapping: defines the authentication to the remote server.
Now all we need are some tables!
Each remote table is defined to the federator through a nickname. Once a nickname is defined, we can use it just like a local table name. The following example defines the nickname rc9_card1 for the remote table card1, owned by user lurie, on the remote server RCFLIIF.
CREATE NICKNAME rc9_card1 FOR "RCFLIIF"."lurie"."card1";
Congratulations! With a nickname in place, we can now do exciting things like:
SELECT * from RC9_CARD1;
A good naming convention makes nicknames useful identifiers. The convention here is quite simple: "RC" is an abbreviation for Relational Connect, "9" indicates that the remote server is Informix V9, and "card1" is the remote table name. The table has several generated columns of different cardinalities; see Appendix B for the source code that creates it.
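If you lose track of which nicknames have been defined, the catalog can tell you; nicknames show up in SYSCAT.TABLES with a type of 'N'. This query is my own convenience, not part of the original article:
-- list all nicknames defined in this federated database
SELECT tabschema, tabname FROM syscat.tables WHERE type = 'N';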
I'm sure the GUI fans have been waiting for screen shots, and I won't disappoint. First is a sample of the filter dialog, which limits the tables that are candidates for nicknames (Figure 5):
Figure 5. The filter dialog
The list of tables that pass the filter is displayed next (Figure 6). In this example only one table meets the filter criteria.
Figure 6. Table that meets the filter conditions
Click OK, and Figure 7 shows a sample of the data from the table represented by the new nickname.
Figure 7. Sample data from the nicknamed table
Federated joins
With the servers configured, it is time to test. The environment I used is shown in Figure 8; the SQL that creates this environment is listed in Appendix A.
The first query we examine uses two tables on the remote IDS 9 server: RC9_CARD1 and RC9_CARD2. Then we will tune a query across four remote tables using a more advanced technique, a materialized data cache.
Figure 8. My test environment
Where is the join performed?
The federator's cost-based optimizer chooses where a table join is processed. This choice is one of the most important aspects of federated performance. The optimizer can choose to send SQL to the remote server and let the remote engine join the tables there; this is called pushdown. Alternatively, the federator can retrieve the individual rows from the remote servers and perform the join with the federator's own join engine.
Consider this query:
SELECT a.c3, a.c10, SUM(a.c100) AS sum_c100, SUM(a.c1000) AS sum_c1000
FROM rc9_card1 a, rc9_card2 b
WHERE a.superkey = b.superkey
GROUP BY a.c3, a.c10;
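If you want to reproduce the plans discussed below, one approach (the same one used in Appendix A) is to capture an explain snapshot around the query and then view it with Visual Explain from the Control Center. A minimal sketch:
-- capture an explain snapshot for the dynamic SQL that follows
SET CURRENT EXPLAIN SNAPSHOT = YES;
-- ... run the federated query shown above ...
SET CURRENT EXPLAIN SNAPSHOT = NO;
-- then open the snapshot in Visual Explain to inspect the plan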
Figure 9 shows the explain plan when each row is retrieved from the remote server and the join is processed in the federator's engine. The total cost of this join is over 45 million timerons.
Figure 9. Plan for a join performed at the federator
The optimizer decides where the join is best performed based on:
the relative CPU speed of the federator and the federatee
the relative I/O speed of the federator and the federatee
the network speed and throughput between the federator and the federated databases
The CREATE SERVER panel in the Control Center exposes a number of parameters that can be tuned to reflect the relative speeds of the servers (Figure 10).
Figure 10. Tuning a federatee's performance parameters using the Control Center
As a rule of thumb, pushdown is a good thing. When the federated databases can pull their own weight, push as much of the work down to them as possible. For weaker remote data sources (such as a collection of flat files), it is better (and sometimes necessary) to do the join by retrieving all the rows and joining them at the federated server.
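When you experiment with this trade-off, the server options can be changed after the fact. The statement below is a sketch of toggling the aggressive pushdown option defined earlier; SET replaces an existing option value:
-- temporarily stop forcing maximal pushdown on the Informix 9.3 server
ALTER SERVER "RCFLIIF" OPTIONS (SET DB2_MAXIMAL_PUSHDOWN 'N');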
With pushdown in effect the query plan looks much better, as shown in Figure 11. The total cost is greatly reduced compared with the plan that does not push the join down.
The optimizer really does consider where to perform the join. I have tried to keep this discussion as simple as possible; don't worry too much about distributed join theory, because there are so many interesting applications of federated technology that there is no room to cover the theory in detail. Keeping statistics current on both the federated server and the remote servers will improve the query plans.
Figure 11. The pushed-down join is much faster
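On the Informix federatees, statistics are kept current with Informix's own facilities. A minimal sketch, my suggestion rather than part of the original article, run on the remote server:
-- refresh the Informix optimizer statistics for the remote database
UPDATE STATISTICS MEDIUM;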
Faster federated joins: avoiding the join
The fastest way to join two tables is not to join them at all. The now-retired TPC-D benchmark from the Transaction Processing Performance Council (TPC) proved this conclusively: if the answer to a query has already been computed in a summary table, the query runs much, much faster.
How can this summary table acceleration technique be applied in a federated environment? To get real value from summary tables, they must be used whenever possible. A monthly summary of the data can satisfy queries at the month, quarter, or year level of aggregation, as the small example below illustrates. The optimizer, however, must be smart enough to rewrite a query to run against the summary table instead of chasing the detail data. We will test this capability.
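To make the roll-up idea concrete, here is a tiny hypothetical illustration; the table and column names are mine and are not part of this article's schema:
-- a monthly summary table can also answer quarterly and yearly questions
-- (hypothetical schema: monthly_sales(yr, mon, total))
SELECT yr, SUM(total) AS yearly_total
FROM monthly_sales
GROUP BY yr;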
Adding a summary table at the federator is very similar to adding one in a non-federated environment. In DB2 V8 these tables are called materialized query tables (MQTs). A powerful feature of the federator is the ability to create an MQT over nicknamed (remote) tables.
Adding a materialized query table
The MQT definition contains a SELECT statement that defines how the data is summarized. The SQL for the MQT is as follows:
-- create a materialized query table, or MQT
-- this will allow the optimizer to rewrite SQL
-- and not access the remote servers
DROP TABLE card_mqt;
CREATE SUMMARY TABLE card_mqt AS (
  SELECT a.c3, a.c10, SUM(a.c100) AS sum_c100, SUM(a.c1000)
    AS sum_c1000
  FROM rc9_card1 a, rc9_card2 b, rc8_card1 c, rc8_card2 d
  WHERE a.superkey = b.superkey
    AND b.superkey = c.superkey
    AND c.superkey = d.superkey
  GROUP BY a.c3, a.c10
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- populate the MQT with data
REFRESH TABLE card_mqt;

-- very important - tell the optimizer the MQT is
-- alive and well and open for business
SET CURRENT REFRESH AGE = ANY;
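A quick sanity check after the refresh, my own addition rather than part of the original listing, confirms that the MQT actually holds rows:
-- the MQT should now contain the pre-aggregated result rows
SELECT COUNT(*) FROM card_mqt;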
So how much of an improvement does the MQT provide?
First consider a four-table query that spans the two different servers, costed without the benefit of the MQT. The query SQL is:
SELECT a.c3, SUM(a.c100) AS sum_c100
FROM rc9_card1 a, rc9_card2 b, rc8_card1 c, rc8_card2 d
WHERE a.superkey = b.superkey
  AND b.superkey = c.superkey
  AND c.superkey = d.superkey
GROUP BY a.c3;
I chose this query because its level of aggregation is different from that of the MQT we just defined. This makes things harder for the optimizer; we want to know whether it is smart enough to rewrite the query to use the MQT instead of dragging all the data across the network.
The query plan without the MQT looks like Figure 12:
Figure 12. Query plan without the MQT
Now let's add the MQT. Sure enough, the optimizer is intelligent enough to rewrite the query to use the MQT instead of producing the answer by brute force. The cost drops sharply, to only 25 timerons (Figure 13).
Figure 13. Query performance improves with the MQT
Let's look at what happened when the MQT was used. The Visual Explain tool displays both the SQL submitted to the federator and the SQL that was actually executed.
The submitted query references the remote tables through their nicknames on the federator, and it groups on column c3. This challenges the optimizer because the MQT aggregates column c100 at the c3, c10 level.
Figure 14. The query submitted to the federator (before optimization)
Now look at Figure 15. After optimization there is no join at all! A single-table SELECT against the MQT satisfies the query. The optimizer used the system metadata to determine that a simple aggregation of the table CARD_MQT, rather than a trip to the remote tables, was all that was needed.
Figure 15. The query actually executed uses the MQT
Eliminating the data mart
Optimizer query rewrite combined with MQTs on the federated server is a very powerful pairing of features. Instead of doing all the work of building a data mart, an MQT offers an alternative for many queries. It is not a replacement for a data mart when queries use functions that disable the materialized cache or must touch large volumes of detail data.
Populating and refreshing the MQT should be scheduled so as to minimize the impact on performance and on communication traffic.
Conclusion
This has been an introduction to federation. In the next article we will look at other ways to apply federated technology, specifically:
federated inserts
XML generated from a federated join
XML and WebSphere® MQ: chocolate meets peanut butter
Appendix A
-- this SQL creates a federated environment
-- for two different
-- federatees

-- connect to the DB2 database
CONNECT TO sample;

DROP WRAPPER "INFORMIX";
CREATE WRAPPER "INFORMIX" LIBRARY 'db2informix.dll';

-- create a server for a remote instance of Informix 9.3
CREATE SERVER "RCFLIIF"
  TYPE INFORMIX VERSION '9.3'
  WRAPPER "INFORMIX" OPTIONS (NODE 'fliif',
    DBNAME 'stores_demo'
--  , ADD CPU_RATIO '0.0000001'
--  , ADD IO_RATIO '0.0000001'
    , ADD CPU_RATIO '1'
    , ADD IO_RATIO '1'
    , ADD COMM_RATE '1'
-- no, still not pushing down the join: ADD PUSHDOWN 'Y'
    , ADD DB2_MAXIMAL_PUSHDOWN 'Y'
--  , ADD PUSHDOWN 'Y'
  );

-- create a server for a remote XPS Informix V8.3 instance
CREATE SERVER "RC_XPS"
  TYPE INFORMIX VERSION '8.3'
  WRAPPER "INFORMIX" OPTIONS (NODE 'flxps',
    DBNAME 'stores_demo'
    , ADD CPU_RATIO '1.0'
    , ADD IO_RATIO '1.0'
    , ADD COMM_RATE '2'
    , ADD PUSHDOWN 'Y'
  );

--
CREATE USER MAPPING FOR "LURIE"
  SERVER "RCFLIIF" OPTIONS (REMOTE_AUTHID 'informix',
  REMOTE_PASSWORD 'useyourown');
CREATE USER MAPPING FOR "LURIE"
  SERVER "RC_XPS" OPTIONS (REMOTE_AUTHID 'informix',
  REMOTE_PASSWORD 'useyourown2');

--
CREATE NICKNAME rc9_card1 FOR "RCFLIIF"."lurie"."card1";
CREATE NICKNAME rc9_card2 FOR "RCFLIIF"."lurie"."card2";
CREATE NICKNAME rc8_card1 FOR "RC_XPS"."lurie"."card1";
CREATE NICKNAME rc8_card2 FOR "RC_XPS"."lurie"."card2";

-- create a materialized query table, or MQT
-- this will allow the optimizer to rewrite SQL
-- and not go to the remote servers
DROP TABLE card_mqt;
CREATE SUMMARY TABLE card_mqt AS (
  SELECT a.c3, a.c10, SUM(a.c100) AS sum_c100,
         SUM(a.c1000) AS sum_c1000
  FROM rc9_card1 a, rc9_card2 b, rc8_card1 c, rc8_card2 d
  WHERE a.superkey = b.superkey
    AND b.superkey = c.superkey
    AND c.superkey = d.superkey
  GROUP BY a.c3, a.c10
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

REFRESH TABLE card_mqt;
SET CURRENT REFRESH AGE = ANY;

-- run various queries and capture explain plans
SET CURRENT EXPLAIN SNAPSHOT = YES;

VALUES CURRENT TIMESTAMP;

SELECT a.c3, a.c10, SUM(a.c100) AS sum_c100,
       SUM(a.c1000) AS sum_c1000
FROM rc9_card1 a, rc9_card2 b
WHERE a.superkey = b.superkey
GROUP BY a.c3, a.c10;

VALUES CURRENT TIMESTAMP;

SELECT a.c3, a.c10, SUM(a.c100) AS sum_c100,
       SUM(a.c1000) AS sum_c1000
FROM rc9_card1 a, rc9_card2 b, rc8_card1 c, rc8_card2 d
WHERE a.superkey = b.superkey
  AND b.superkey = c.superkey
  AND c.superkey = d.superkey
GROUP BY a.c3, a.c10;

VALUES CURRENT TIMESTAMP;

SELECT a.c3, SUM(a.c100) AS sum_c100
FROM rc9_card1 a, rc9_card2 b, rc8_card1 c, rc8_card2 d
WHERE a.superkey = b.superkey
  AND b.superkey = c.superkey
  AND c.superkey = d.superkey
GROUP BY a.c3;

VALUES CURRENT TIMESTAMP;
SET CURRENT EXPLAIN SNAPSHOT = NO;
Appendix B
-- (c) Copyright 2003 Martin Lurie and IBM Corp.
DROP TABLE card1;
CREATE TABLE card1 (superkey SERIAL,
                    c3    INT,
                    c10   INT,
                    c100  INT,
                    c1000 INT)
  IN dbspc1;

DROP TABLE card2;
CREATE TABLE card2 (superkey INT,
                    c3    INT,
                    c10   INT,
                    c100  INT,
                    c1000 INT)
  IN dbspc1;

-- stored procedure to populate the table
DROP PROCEDURE pop_card;
CREATE PROCEDURE pop_card (tot_rows INT)
  DEFINE rows_in  INTEGER;
  DEFINE c3cnt    INTEGER;
  DEFINE c10cnt   INTEGER;
  DEFINE c100cnt  INTEGER;
  DEFINE c1000cnt INTEGER;

  LET rows_in = 0;
  FOR c1000cnt = 1 TO 1000 STEP 1
    FOR c100cnt = 1 TO 100 STEP 1
      FOR c10cnt = 1 TO 10 STEP 1
        FOR c3cnt = 1 TO 3 STEP 1
          -- 0 in the first column lets the SERIAL generate the next key
          INSERT INTO card1 VALUES (0,
                                    c3cnt,
                                    c10cnt,
                                    c100cnt,
                                    c1000cnt);
          LET rows_in = rows_in + 1;
          IF rows_in > tot_rows THEN
            EXIT FOR;
          END IF;
        END FOR;  -- c3cnt
        IF rows_in > tot_rows THEN
          EXIT FOR;
        END IF;
      END FOR;  -- c10cnt
      IF rows_in > tot_rows THEN
        EXIT FOR;
      END IF;
    END FOR;  -- c100cnt
    IF rows_in > tot_rows THEN
      EXIT FOR;
    END IF;
  END FOR;  -- c1000cnt
END PROCEDURE;

EXECUTE PROCEDURE pop_card (50000);
SELECT COUNT(*) FROM card1;
Related Information
Using materialized query tables to speed up queries in DB2 UDB
Getting Started on Integrating Your Information
Use a Flexible Infrastructure for Integrating Information in Web Applications
Information Integration Technology Preview
About the author
Marty Lurie started his computer career attempting to write Fortran on an IBM 1130 using punched paper tape. His day job is systems engineering for IBM's Data Management division, but if you ask him, he will admit he spends most of his time playing with computers. The program he is proudest of is the one that connects his NordicTrack to his laptop (the laptop lost two pounds and lowered its "cholesterol" by 20%). Marty is an IBM-certified DB2 DBA, an IBM-certified Business Intelligence Solutions Professional, and an Informix-certified Professional. You can reach him at lurie@us.ibm.com.