Marty Lurie
IBM IT Expert, Waltham, Ma
July 2003
This article focuses on a case study that uses IBM DB2® Information Integrator™ (DB2 II). The examples are meant to stimulate ideas, not to demonstrate a recommended production environment.
Introduction
By providing a single view of multiple servers, a federated database improves efficiency. As we saw in Part 1 of this article, federation not only simplifies data access; the federated server can also join and optimize queries across different databases very efficiently. To make Part 2 interesting, we will focus on a case study that uses IBM DB2® Information Integrator™ (DB2 II). The examples are meant to stimulate ideas, not to demonstrate a recommended production environment. I use a case study because I find that a case study sticks in the mind more easily than abstract concepts do.
Our puzzle is this: our data is spread across multiple servers, as shown in Figure 1:
Figure 1. Marketing campaign mailing list data spread across multiple data sources
Your mission, should you choose to accept it, is this:
Create a mailing list for a marketing campaign. The data for this campaign is spread across multiple data sources. Table 1 lists the required data sources:
Table 1. Data sources for the mailing list
Data                              Data source
"Do not mail" list                Oracle 9i
High-value customers              Excel® spreadsheet
Customer list, system of record   Informix® XPS
Customer credit rating            Informix IDS
The marketing campaign you are to create is driven by credit rating, with different offers based on the prior purchase history of "high value" customers and on purchased marketing data.
The mailing list must be delivered as an XML stream and sent to the campaign's production company through a persistent message queue.
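To make the selection logic of this task concrete before any DB2 setup, here is a minimal Python sketch. It is not the article's query: SQLite databases attached to one connection stand in for the four remote sources, and every table name, column name, and data value is a hypothetical chosen for illustration. It also assumes that only customers listed in the spreadsheet receive mail, which the task description does not state outright.

```python
import sqlite3


def build_mailing_list():
    """Join four stand-in sources and drop customers on the do-not-mail list."""
    conn = sqlite3.connect(":memory:")
    # Each attached in-memory database plays the role of one remote source.
    for source in ("oracle_src", "excel_src", "xps_src", "ids_src"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {source}")

    conn.executescript("""
        -- Hypothetical schemas and sample rows (not taken from the article).
        CREATE TABLE xps_src.customer (cust_no INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE ids_src.credit (cust_no INTEGER, rating TEXT);
        CREATE TABLE excel_src.premium (cust_no INTEGER, gift TEXT);
        CREATE TABLE oracle_src.do_not_mail (cust_no INTEGER);

        INSERT INTO xps_src.customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO ids_src.credit VALUES (1, 'A'), (2, 'B'), (3, 'A');
        INSERT INTO excel_src.premium VALUES (1, 'tote bag'), (2, 'mug');
        INSERT INTO oracle_src.do_not_mail VALUES (2);
    """)

    # One statement spans all four sources, as a federated query would.
    rows = conn.execute("""
        SELECT c.cust_no, c.name, r.rating, p.gift
        FROM xps_src.customer AS c
        JOIN ids_src.credit AS r ON r.cust_no = c.cust_no
        JOIN excel_src.premium AS p ON p.cust_no = c.cust_no
        WHERE c.cust_no NOT IN (SELECT cust_no FROM oracle_src.do_not_mail)
        ORDER BY c.cust_no
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in build_mailing_list():
        print(row)
```

With the sample rows above, customer 2 is excluded by the do-not-mail list and customer 3 is excluded because the spreadsheet has no entry for that customer, so only customer 1 remains.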
The solution to this case study involves the following steps:
Set up and test the data sources
Develop the SQL that implements the marketing campaign
Turn the query results into XML and put the XML stream on a queue so that any queue client can read it.
As promised in Part 1 of this article, the case study solution includes federated inserts, XML generated from a federated join, and XML combined with WebSphere® MQ.
When my editor saw this problem, she asked: "Do you want to put all of this in Part 2?" Well, now you are looking at it.
First, a word from our sponsor: DB2 Information Integrator is released
DB2 Information Integrator was released on May 20, 2003. In case you missed the announcement, here is an excerpt:
"IBM DB2 Information Integrator V8.1 represents the next generation of information integration software, providing a foundation for e-business on demand™. Responding quickly to change means that a business must be able to associate information quickly and easily, not only within the enterprise but also across its value chain. DB2 Information Integrator enables companies to integrate business information inside and outside the enterprise. Such information may reside in a variety of data source systems (such as Oracle databases, Microsoft® spreadsheets, and flat files) and be distributed across a variety of operating environments (such as Windows®, Linux, UNIX®, and z/OS®)."
Rather than spend a lot of time describing these features, let's put them to use.
Setting up the data sources for this case study
In Part 1 we set up the Informix data source, so I will not repeat that here; see Part 1 for the Informix setup instructions, including the command-line approach if you prefer it to a point-and-click interface. In this section, I will describe how to:
Set up and use an Oracle data source for the "Do not mail" list
Set up and use an Excel data source for the sales information
Bonus: set up and use an ODBC data source
Briefly, create the customer table and credit table in Informix (see Part 1 for more detail).
Setting up and using the Oracle data source
Recall from Part 1 that, to a remote data source, the DB2 Information Integrator server looks like an ordinary client running queries. The federated environment for this case study requires setting up a number of remote data sources. Don't let that intimidate you; a DB2 Information Integrator configuration can be as simple as a single remote data source, with DB2 Information Integrator acting as a data access gateway. The steps to set up the data source are as follows:
Establish a client connection.
Configure the db2dj.ini file and set the ORACLE_HOME environment variable.
Create a wrapper.
Create a server.
Create a user mapping.
Create a table nickname.
After building this data source you will see a pattern, based on the two examples in Part 1 and this one, and you should be able to configure federated access to any data source for which a wrapper is available.
This section is no substitute for reading the fine documentation, but I will do my best. There is now a Redbook on information integration; see Getting Started on Integrating Your Information.
Step 1: Establish a client connection
Installing the Oracle client SDK on the DB2 Information Integrator server is quite easy. The key step is to verify that the connection works! Figure 2 shows the configuration screen, which provides a simple test of the connection to the remote Oracle server.
The name "foo" highlighted in Figure 2 is the node name, which Information Integrator needs in order to access the remote Oracle server.
Figure 2. Testing the connection to the Oracle server
Step 2: Configure the db2dj.ini file
If you install DB2 Information Integrator after installing the Oracle client (I used the third and final beta release), the installer automatically detects the location of the client code and adds the required ORACLE_HOME entry to the db2dj.ini file. Listing 1 shows a Windows example of the db2dj.ini file:
Listing 1. The db2dj.ini file, containing the Oracle client location and the Informix parameters from Part 1
When DB2 II runs with DB2 V8 Fix Pack 2, the ORACLE_HOME environment variable must be set. The Win32 and UNIX commands for setting this variable are as follows:
Step 3: Configure the wrapper
From Part 1 we learned that a DB2 Information Integrator wrapper tells DB2 Information Integrator which library to use to access the remote data source. I will walk through the Oracle setup, because Part 1 covered only Informix remote servers. DB2 Information Integrator can be configured through either a graphical interface or a command-line interface. To use the graphical tools:
Run db2cc from the command line
On Windows, use Start -> Programs -> IBM DB2 -> General Administration Tools -> Control Center. Expand the tree on the left, click Federated Database Objects, and select the Create Wrapper option. See Figure 3.
Figure 3. Creating a wrapper from Control Center
The next dialog defines the type of wrapper. For the current releases of Oracle (V8 and V9), Net8 is the correct choice. See Figure 4.
Figure 4. Selecting Net8 as the Oracle wrapper
Your "private tutor" is always waiting behind the Show SQL button, so use it! Figure 5 shows an example of how to find out what is actually being run, so that you can script it for this or any other data source.
Figure 5. The "inside story" of creating the wrapper
Step 4: Create a server
Now we need to identify the IP address and port of the remote Oracle system. Many options are available when creating a server. My favorite is DB2_MAXIMAL_PUSHDOWN. Figure 6 shows how to do this from the Control Center.
Figure 6. Creating a remote server
Step 5: Create a user mapping
The user mapping provides authentication at the remote Oracle instance (Figure 7). To the remote database, the DB2 Information Integrator server is just like any other client. You can't expect the remote server to do your bidding without telling it who you are, can you? We will use the "scott" user ID, which comes with the standard Oracle installation. No, my "scott" password is not "tiger", and yours shouldn't be either! The user mapping GUI changed slightly in the GA code; the screen captures here come from the final beta.
Figure 7. Creating a user mapping
Step 6: Create a table nickname
For an existing table, a nickname is the local handle that the DB2 Information Integrator database uses to reference the remote table. For a new table, the process is simpler.
For our case study, we need a "Do not mail" list stored in a database table. We will create and populate the table on the remote Oracle instance from the db2 => command line.
First, let us create a table:
Now, from the DB2 command line, let's insert some values using the convenient DB2 syntax:
For fun, let's select the values back from the table:
If the table already exists on the remote server, the final task is to create a nickname so that we can reference the remote table as if it were on the DB2 Information Integrator server. This is not needed for our case study, but I want you to see it for future reference. Again, the graphical user interface makes setup easy (Figure 8), and saving the SQL spares you from clicking through it all again later.
Figure 8. Creating a nickname for an existing remote table
Figure 9 shows how to filter the names returned from the remote catalog. I retrieved the entire remote catalog for this example; if your database has many tables, consider using this filter dialog to reduce the number of tables returned.
Figure 9. Filtering to reduce the number of remote table names returned
Debugging tip: If you see this message:
Have your Oracle DBA run the lsnrctl start command. This message indicates that the Oracle TCP listener was not started automatically when the server booted. Put this little command in a startup script so you won't forget it.
Figure 10 shows the last two dialogs for creating a nickname:
Figure 10. Creating a nickname (continued)
As you may have guessed, I am a command-line person, so if you want to avoid the graphical user interface, see Listing 2.
Listing 2 provides steps 3 through 6 for those who prefer the command line:
Listing 2. Creating the wrapper, server, user mapping, and nickname from the command line
Setting up the Microsoft Excel data source
Many business users are happiest when their data is where they can see it, and that place is Excel. This example accesses Excel data from a local workbook.
Excel data can be made available over the network: a local DB2 Information Integrator server on the Windows PC lets the data be viewed as if it were a DB2 table. This may not always be necessary, but in some environments such access solves real problems. Here is a real example from a friend of mine:
You receive more than 200 spreadsheets containing market segmentation data. The business analysts are deeply proud of their masterpieces, and you need to put all of this data into one table. You could open each spreadsheet and do 200 cut-and-paste operations, or you could write a script that runs the DDL shown in Listing 3 and then performs a series of insert-selects or one big UNION select. I personally dislike typing, let alone cutting and pasting, so of course I would write the script.
Figure 11 shows our original spreadsheet, in which the customer number identifies the gift each customer should be given.
Figure 11. Microsoft Excel spreadsheet
This simple DDL provides SQL access to spreadsheet data:
Listing 3. DDL for accessing Excel data
Now let's select from the nickname:
Success, with no cutting and pasting!
Bonus: Setting up an ODBC data source
ODBC can always be used for remote servers that have no native DB2 Information Integrator driver. This example shows how to set up an ODBC connection to Informix; you can substitute the ODBC driver for whatever database you need to access.
Keep in mind that the ODBC driver runs on the DB2 Information Integrator server and connects to the remote database. I used a remote Informix data source because it was easy to get, but this is not a supported configuration; it appears here only as a configuration example.
Listing 4. SQL for accessing the ODBC data source
The pass-through (PASSTHRU) feature lets you run SQL on the remote server as if you were connected to it directly. This is handy for checking the expected results on the remote server while bypassing the local server. When you finish, don't forget to turn pass-through off.
Listing 5. Passthru
Setting up the customer table and credit table in Informix
While setting up the customer table and credit rating table for this case study, we will exercise more features of DB2 II. We will demonstrate the ability to create a table on a remote server and insert data into it.
Create a table and insert data
The DB2 Information Integrator server can create tables and insert, update, and delete data, all without leaving the db2 => prompt or the DB2 GUI of your choice, which is very convenient.
To illustrate, let's move data from Oracle to Informix. Forgive my bias, but I really do think that moving data out of Oracle and into Informix or DB2 UDB is a good idea.
Table creation
You can create a table on a remote server without leaving the DB2 Information Integrator environment. In this example, we create a table on IDS:
Populating the table with insert-select
Now we use insert-select to populate the remote table. The following SQL extracts two columns from the remote Oracle table and inserts them into the newly created Informix table.
Note that we got an error!
DB2 Information Integrator is guarding us against a failure partway through the insert. Because this insert is a very easily repeated transaction, we will use a server option to get past the error. You can set the option through the GUI's Add Server Options dialog or run the following command from the command line:
Now we run the insert-select again:
The data is as follows:
Create a customer table for this case study
To build the mailing list, the customer table is the last of the four tables we need to join. What does the customer table look like? I thought you'd never ask:
Figure 12. Customer table in XPS
This is a very simple nickname. Part 1 of this article provides more background on setting up Informix nicknames.
The mailing list join
Now for the "mega-query". Note that I have inserted comments in the SQL and marked them in bold.
The results are as follows:
DB2 Visual Explain shows a new operator, RPD, being used to access the Excel data, as shown in Figure 13:
Figure 13. Visual Explain for the mailing list join
Let's put this query into a view for future reference. IBM recommends creating views over remote tables rather than over remote views.
Again from the handy DB2 command prompt, we create the following view:
Handling large tables and dirty data
Federated design has two larger challenges: very large databases (VLDB) and dirty data. With good planning and architecture, both can be kept under control.
Large tables
In a VLDB environment, a shared-nothing architecture is essential for the scalable processing power needed to run "boil the ocean" queries. See my article on simulating MPP processing on Linux for more about shared-nothing architecture. If we try to join a very large database with a remote database, a flood of data will converge on our DB2 Information Integrator server. There is a simple workaround: create a table on the shared-nothing MPP server, then use insert-select to copy the remote data into the MPP environment. Run the join inside the MPP environment, then enjoy the fast response of the highly scalable shared-nothing server and accept the praise of your end users.
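The staging workaround can be sketched in a few lines of Python. This is an illustration only, under loud assumptions: an in-memory SQLite database plays the MPP server, an attached database named remote plays the remote source, and the tables (sales, credit, credit_stage) and their rows are hypothetical. The point is the order of operations: create the staging table, copy the remote rows once with insert-select, and then join against local tables only.

```python
import sqlite3


def stage_and_join():
    """Copy remote rows into a local staging table, then join locally."""
    conn = sqlite3.connect(":memory:")  # stands in for the MPP server
    conn.execute("ATTACH DATABASE ':memory:' AS remote")  # stands in for the remote source

    conn.executescript("""
        -- Hypothetical sample data on each side.
        CREATE TABLE remote.credit (cust_no INTEGER, rating TEXT);
        INSERT INTO remote.credit VALUES (1, 'A'), (2, 'B'), (3, 'A');
        CREATE TABLE main.sales (cust_no INTEGER, amount INTEGER);
        INSERT INTO main.sales VALUES (1, 100), (1, 50), (3, 70);

        -- Step 1: create the staging table next to the large table.
        CREATE TABLE main.credit_stage (cust_no INTEGER, rating TEXT);
        -- Step 2: a single insert-select moves the remote rows across once.
        INSERT INTO main.credit_stage
            SELECT cust_no, rating FROM remote.credit;
    """)

    # Step 3: the join now touches local tables only.
    rows = conn.execute("""
        SELECT s.cust_no, c.rating, SUM(s.amount)
        FROM main.sales AS s
        JOIN main.credit_stage AS c ON c.cust_no = s.cust_no
        GROUP BY s.cust_no, c.rating
        ORDER BY s.cust_no
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(stage_and_join())
```

The trade-off is freshness: the staged copy is a snapshot, so it has to be refreshed whenever the remote data changes.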
Dirty data
The second federation problem is "dirty data." It can be as simple as this: in system A, customer_number is a customer number, while in system B, Customer_Number means something entirely different, a Social Security number. What is a DBA to do?
The DBA really needs to understand the metadata and use the right tools for data scrubbing. Many extract, transform, and load (ETL) tools can help with this problem. We should also not overlook the functions that DB2 itself provides.
The SOUNDEX function is a good example of a simple tool that can be used to join on names. An attempt to join "Lurie" with "Laurie" and "Luria" will fail, even if the names are converted to uppercase. How does SOUNDEX help? It converts all characters to uppercase, drops the vowels after the first letter, and encodes the remaining letters according to how they sound.
Here is SOUNDEX() at work:
All three names have the same SOUNDEX() result, so our join succeeds. You can also write your own user-defined functions for data cleansing. A word of caution: if you need the highest possible match rate with the fewest false positives, you really should evaluate a commercial data cleansing product, unless you have too much free time and enjoy redoing what others have already done.
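To show why the three spellings collapse to one value, here is a small Python sketch of the classic American Soundex algorithm. It is an assumption that DB2's SOUNDEX() follows these same rules in every edge case; the sketch is only meant to make the idea concrete, and the function name soundex is my own.

```python
# Letter groups used by classic Soundex; vowels and H, W, Y carry no code.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """Return the four-character classic Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]                # the first letter is kept as is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:  # skip repeats of the same sound
            code += digit
        if ch not in "HW":           # H and W do not break a run of equal codes
            prev = digit
    return (code + "000")[:4]        # pad or truncate to four characters


if __name__ == "__main__":
    for name in ("Lurie", "Laurie", "Luria"):
        print(name, soundex(name))
```

Under these rules each of the three names reduces to the letter L followed by the code for R, padded with zeros, which is why a join on the Soundex value matches them even though a join on the raw strings does not.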
Creating an XML document from a federated join
XML has become the preferred format for communication between systems. What could be better for exchanging data than self-describing data? XML does suffer from a low signal-to-noise ratio, which makes it somewhat cumbersome, but it has gained universal acceptance.
DB2 has several built-in functions for converting data to XML. Let's look at the REC2XML() function.
REC2XML() is a fast and convenient way to convert table data to XML. Our marketing campaign is defined by the view we created. With REC2XML(), getting an XML stream takes just one step:
Here is some sample output:
This can also be wrapped in a view, as follows:
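As an illustration of the kind of row-per-element XML that such a conversion yields, the Python sketch below builds one element per row with one column element per value. It is not DB2 code: the function name rec2xml_like and the sample record are hypothetical, and the layout (a column element with a name attribute, plus null="true" for missing values) reflects my understanding of the COLATTVAL format of REC2XML(), which should be checked against the DB2 SQL Reference.

```python
from xml.sax.saxutils import escape, quoteattr


def rec2xml_like(row_tag, record):
    """Render one record (a dict of column name to value) as a single XML element."""
    parts = [f"<{row_tag}>"]
    for name, value in record.items():
        if value is None:
            # Missing values are flagged with an attribute instead of text.
            parts.append(f'<column name={quoteattr(name)} null="true"/>')
        else:
            # escape() protects &, < and > inside the column value.
            parts.append(
                f"<column name={quoteattr(name)}>{escape(str(value))}</column>"
            )
    parts.append(f"</{row_tag}>")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical mailing list row.
    sample = {"CUST_NO": 1, "NAME": "A & B", "GIFT": None}
    print(rec2xml_like("row", sample))
```

Keeping the column name in an attribute means the consumer needs no schema to interpret a row, which is the self-describing property the text refers to.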
Putting it all together: publishing federated XML through WebSphere® MQ
Now we are ready to put all of this together and send our XML marketing campaign to the production company. The communication vehicle is IBM WebSphere MQ, which is a persistent queue.
What is a persistent queue? Think of it as a UNIX® named pipe on steroids. The queue can hold many different messages, and it can be read without "destroying" the messages. It can be used for publish-subscribe or point-to-point messaging. It is a very powerful message delivery system. To publish XML to WebSphere MQ, we need to do some setup. We will take the simplest approach and do everything locally:
Install WebSphere MQ on the same server as DB2. See the IBM Web site or contact your IBM team to get evaluation software.
Locate the MQ functions that ship with DB2. On Windows, the file is MA0F_NT.ZIP. Unzip the binaries and install them.
Change to the DB2 cfg directory and run enable_mqfunctions as shown below:
If you need to manually start WebSphere MQ, the command is:
Now, the moment you have been waiting for: publishing the XML to WebSphere MQ. The command is so simple that it is a bit of an anticlimax.
What now? We can use the MQ API Exerciser to see the actual results; it is easy to find in the "MQ First Steps" launcher. Start the API Exerciser, connect to the DB2 queue, and issue an MQGET, as shown in Figure 14.
Figure 14. Issuing an MQGET against the DB2 queue
Figure 15 shows an actual message from the queue, which is exactly what we needed to complete this case study. We federated three different relational data sources and an Excel spreadsheet, converted the join output to XML, and published that XML to a persistent message queue so that others can access it. Wow, we did it.
Figure 15. XML message content showing the federated data
Conclusion
I hope these two articles have given you some ideas about how combining data from different servers can make a DBA's work easier.
We have covered multiple data sources, joins, inserts, optimization, and a case study. We looked at the implications of large data volumes and the need for data cleansing. We combined remote data into an XML stream and published it to a persistent message queue.
Please give it a try. It should keep you busy long enough for me to write the next article.
About the author
Marty Lurie started his computer career generating chad while trying to write a Fortran program on an IBM 1130. His day job is IT Specialist in IBM Data Management, but if pressed, he will admit that he mostly plays with computers. His favorite program is one he wrote to connect his NordicTrack to his laptop (the laptop lost two pounds and lowered its "cholesterol" by 20%). Marty is an IBM-certified DB2 DBA, an IBM-certified Business Intelligence Solutions Professional, and an Informix-certified Professional. He can be reached at lurie@us.ibm.com and m