Power in partitioning (translation)

zhaozj2021-02-17  43

The Power of Partitioning

Dwaine R. Snow and Paul C. Zikopoulos

Laughing paste

Original article: "DB2 Magazine" Quarter 2, 2003 · Vol. 8, Issue 2

English original (this translation was not authorized; please retain the original link when reprinting)

Partitioning is widely misunderstood. Experts from the Toronto Lab set the record straight on this useful feature.

DB2 database partitioning is often misunderstood. Changes made in DB2 UDB 8.1 for Linux, UNIX, and Windows help simplify DB2 partitioning. We explain these changes, clear up some partitioning myths, and show when and why you should consider partitioning.

DB2 UDB 8.1 for Linux, UNIX, and Windows merges DB2 UDB Enterprise Edition and Enterprise-Extended Edition into a single product. The new DB2 Enterprise Server Edition (ESE) includes the database partitioning function, which used to be a separate product, as a priced option now known as the Database Partitioning Feature (DPF). When DB2 users find that they need to start partitioning, they can do so without installing any additional code; they only need a DPF license.

The truth about DPF

Myths about database partitioning are widespread in the DB2 world (see Table 1). A quick overview of partitioning will help you separate fact from fiction and make sound partitioning decisions.

Table 1: Myths and facts about DB2 database partitioning

With or without DPF, DB2 supports parallel query processing. Figure 1 shows DB2 ESE installed on a four-way SMP (symmetric multiprocessing) server. In this scenario, a single query can automatically use all the CPUs and physical disks on the server. Subagents, each working on a subset of the required data, provide this intrapartition parallelism. DB2 feeds data from disk to the subagents using I/O prefetching. This parallelism is transparent to users, applications, and DBAs.

Figure 1: Parallelism in DB2 ESE without DPF

The DPF option adds the ability to partition a database across a set of machines, or logically within a single SMP server. With DPF, a single database image can span multiple machines (and their storage) while still appearing as one complete database image to users and applications.

Consider a group of four SMP servers. (In this article we use the term database partition group rather than cluster, because cluster usually refers to a high-availability failover configuration or to a group of partitions used to scale a system.) With DPF, the parallel operations discussed for Figure 1 can be extended across multiple SMP machines (see Figure 2). The benefit is two levels of parallelism: within each partition and across partitions. You can balance these parallel operations across multiple machines or logical database partitions. This kind of processing is called interpartition parallelism.

Figure 2: Parallelism with DPF across multiple machines

Interpartition parallelism usually runs across multiple servers, but more and more users now run it inside a single large SMP box (see Figure 3).

Figure 3: DB2 ESE with DPF in a single SMP server

When (and why) to partition

So, should you partition? Consider partitioning in the following situations:

· Your server has a lot of available memory. Even though DB2 V8 has 64-bit support, multiple partitions have been shown to use memory more efficiently and to scale more linearly than SMP parallelism alone.

· The system will include multiple servers, or more than 6 to 12 CPUs.

· The speed of the database utilities is critical to your business operations.

· The batch window available for loading data or for extract, transform, and load (ETL) processing is limited.

· Rolling-window data updates in the data warehouse call for parallel SQL processing and additional log space.

Partitioning makes particular sense in these situations, but it offers other benefits that make it even more attractive. These benefits are:

Query scalability

The most obvious reason to use DPF is that it can improve the performance of query workloads and of INSERT, UPDATE, and DELETE operations. Splitting a single large database into a number of smaller database partitions speeds up these operations because each database partition holds its own subset of the data.

For example, suppose you want to scan a table containing 100 million rows. With a single database partition, the scan requires one database manager to search all 100 million rows. If you partition your database system across 50 database partition servers and distribute the 100 million rows evenly among them, the database manager on each database partition only needs to scan 2 million rows. If every scan runs at the same time and at the same speed, the scan takes approximately 2% of the original time.
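The arithmetic in this example can be checked with a short Python sketch. The function names are made up for illustration, and the model assumes what the text assumes: rows spread evenly and every partition scanning at the same speed at the same time.

```python
def rows_per_partition(total_rows: int, partitions: int) -> int:
    """Rows each partition scans when rows are spread evenly (rounded up)."""
    return -(-total_rows // partitions)  # ceiling division


def ideal_scan_fraction(partitions: int) -> float:
    """Elapsed scan time relative to a single partition, assuming perfect parallelism."""
    return 1 / partitions


total_rows = 100_000_000
partitions = 50
print(rows_per_partition(total_rows, partitions))  # 2000000
print(f"{ideal_scan_fraction(partitions):.0%}")     # 2%
```

Real scans will rarely reach this ideal, since data skew and coordination overhead are ignored here.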

DPF provides near-linear scalability and lets you build out your infrastructure with the same tools. This allows you to increase capacity as needed without the need for new technology or separately

The DB2 optimizer is aware of parallelism. In other words, it keeps track of how the underlying data is partitioned across the system. With this information, the optimizer evaluates different query execution strategies and chooses a low-cost one. When comparing strategies, it takes into account the inherent parallelism of the different operations and the overhead of sending messages between database partitions.
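To make the trade-off concrete, here is a toy cost comparison in Python. It is only a sketch of the idea of weighing parallel work against messaging overhead; it is not DB2's cost model, and the Plan fields, the cost formula, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    parallel_work: float  # work units that can be split across partitions
    serial_work: float    # work units that must run on a single partition
    rows_shipped: int     # rows sent between partitions


def plan_cost(plan: Plan, partitions: int, cost_per_row_shipped: float) -> float:
    """Hypothetical cost: divided parallel work + serial work + messaging overhead."""
    return (plan.parallel_work / partitions
            + plan.serial_work
            + plan.rows_shipped * cost_per_row_shipped)


def cheapest(plans: list[Plan], partitions: int, cost_per_row_shipped: float) -> Plan:
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=lambda p: plan_cost(p, partitions, cost_per_row_shipped))


plans = [
    Plan("join locally on each partition", 1000.0, 0.0, 0),
    Plan("ship rows to one partition, then join", 200.0, 300.0, 50_000),
]
best = cheapest(plans, partitions=4, cost_per_row_shipped=0.01)
print(best.name)  # join locally on each partition
```

With these made-up numbers the second plan does less total work, but shipping 50,000 rows between partitions makes it the more expensive choice.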

As the volume of data or the number of processors and partitions grows, DB2 provides near-linear scalability. However, the benefit that partitioning can deliver depends on the workload, the size of the largest table, and other factors. In general, we recommend sizing the amount of raw data (table data only) per CPU according to the power of the CPUs in the servers you use. For example, for pSeries p690-class CPUs, we recommend 150-200GB per processor. For xSeries-class CPUs (or any Intel or AMD machine) running Linux or Windows, we recommend 75-150GB per processor. Remember that these are only general recommendations, and actual conditions will vary.
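As a rough illustration of how these ranges translate into a CPU count, the following Python sketch divides a raw data volume by the per-CPU figures quoted above. The 3,000GB workload and the function name are hypothetical, and the result is only a starting point for planning.

```python
import math

# Raw-data-per-CPU ranges quoted in the text, in GB: (low, high).
GB_PER_CPU = {
    "pSeries p690-class": (150, 200),
    "xSeries-class (Intel/AMD)": (75, 150),
}


def cpu_range(raw_data_gb: float, cpu_class: str) -> tuple[int, int]:
    """Return (fewest, most) CPUs suggested for the given raw table data."""
    low_gb, high_gb = GB_PER_CPU[cpu_class]
    fewest = math.ceil(raw_data_gb / high_gb)  # each CPU carries the most data
    most = math.ceil(raw_data_gb / low_gb)     # each CPU carries the least data
    return fewest, most


print(cpu_range(3000, "pSeries p690-class"))         # (15, 20)
print(cpu_range(3000, "xSeries-class (Intel/AMD)"))  # (20, 40)
```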

Architectural limits

DPF overcomes some of DB2's architectural limits. For example, the maximum size of a table in DB2 is 64GB with a 4KB page size (and up to 512GB with a 32KB page size). DB2's table and table space size limits apply per partition. Therefore, partitioning a database across multiple partitions lets you multiply the maximum table size by the number of partitions in your environment. For example, a database with four partitions can support a table of up to 2,048GB (4 x 512GB, with a 32KB page size).
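Because the limit applies per partition, the overall ceiling is a simple product. A minimal Python sketch, using only the two per-partition limits quoted above (the function name and the partition counts are illustrative):

```python
# Per-partition table size limits quoted in the text, keyed by page size in KB.
MAX_TABLE_GB_PER_PARTITION = {4: 64, 32: 512}


def max_table_gb(page_size_kb: int, partitions: int) -> int:
    """Largest table the database can hold when the limit applies per partition."""
    return MAX_TABLE_GB_PER_PARTITION[page_size_kb] * partitions


print(max_table_gb(32, 1))   # 512
print(max_table_gb(32, 10))  # 5120
print(max_table_gb(4, 10))   # 640
```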

In a non-partitioned environment, memory can become a bottleneck. The 32-bit version of DB2 ESE limits the shared memory available to each DB2 server. (This limit varies with the operating system on which DB2 runs, and some vendors also provide extensions to the basic memory model.) Shared memory holds memory-resident database resources such as buffer pools, caches, and heaps. In a DPF environment, each database partition manages and owns its own resources, so you can overcome these limits by partitioning your database. You can even run logical database partitions in a large SMP box to take advantage of the large memory resources of a single SMP server.

Data loading

In DB2 ESE V8.1, if the target table is partitioned, the load utility automatically splits and loads the data. It uses intelligent defaults to split and load the data, and it can also be tuned for particular environments.

In a partitioned database, you can load data into the appropriate database partitions with the load utility. Figure 4 shows the load utility splitting the data and loading it into the table in parallel using dedicated media readers. The load utility provides near-linear scalability for load time.

Figure 4: Accelerating table loads with DPF
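The idea of splitting input rows by target partition before loading can be sketched in a few lines of Python. This is an illustration only: it assumes rows are assigned by hashing a key column, uses CRC-32 as a stand-in hash rather than DB2's own algorithm, and the function names and sample rows are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition number with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def split_rows(rows, key_index: int, partitions: int):
    """Group rows by target partition so each group can be loaded independently."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[partition_for(str(row[key_index]), partitions)].append(row)
    return dict(buckets)


# Hypothetical input: 1,000 rows keyed by a customer number.
rows = [(customer_id, f"order-{customer_id}") for customer_id in range(1000)]
buckets = split_rows(rows, key_index=0, partitions=4)
for number in sorted(buckets):
    print(number, len(buckets[number]))
```

Once the rows are grouped this way, each group depends only on its own partition, which is what allows the groups to be loaded in parallel.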

The DB2 V8 load utility is faster and more available than in previous versions. When you run a load in DB2 V8, DB2 no longer locks the entire table space that contains the table being loaded (as DB2 V7 did). There is also a new feature called online load, which allows the table to be read while it is being loaded. For compatibility with previous DB2 versions, AutoLoader scripts are still supported.

Is load time critical while query performance is not? The two are not really mutually exclusive, but load time is an important factor in many business processes. In fact, it is sometimes the deciding factor, and in that case DPF is very useful. For example, an unusually large volume of data arriving within a specified time interval may be a sign of potential fraud. The database must detect these anomalies quickly, so the data needs to enter the data warehouse quickly and frequently. At one telecommunications company, this process is repeated every 10 minutes. Such requirements call for an infrastructure that is as scalable and efficient for loading as it is for queries.

Database maintenance

Partitioning a database can make maintenance significantly easier. Spreading a database across multiple database partitions speeds up maintenance, because each operation works only on the subset of data managed by its partition.

Although index creation in DB2 is already parallel, you can reduce the total time further by partitioning the database. Each partition builds the index over its own smaller data set, and all partitions can do so in parallel. Remember, DB2 hides this under the covers; the multiple partitions of a table are presented as a single table.

The RUNSTATS utility can also benefit from partitioning. This utility updates statistics about the physical characteristics of a table and its associated indexes. The optimizer uses these statistics when choosing data access paths. RUNSTATS is CPU-intensive and requires sorting and aggregating data. You can use the DPF option to reduce the time the utility takes. With DPF, RUNSTATS examines the data on a single database partition rather than the entire table. In DB2 V8, the RUNSTATS sampling option further reduces the utility's execution time (with or without DPF).

Similarly, spreading data across multiple partitions helps table reorganization (REORG), which is I/O-intensive and requires random fetches (especially when run offline). Each database partition can reorganize its own data at the same time, so more processors and drives are put to work on the reorganization. Running these operations in parallel shortens the overall time required. In DB2 V8, you can run REORG on every partition or on a subset of the database partitions. You can also stop, pause, check the status of, and resume a reorganization.

These are just some of the maintenance operations that can benefit from DPF.

Parallel insert/delete

Only in a partitioned database do these SQL statements run in parallel. If the database environment uses SQL for high-volume insert and delete processing, multiple database partitions can increase transaction throughput by processing the insert and delete statements in parallel across the database partitions. This benefit also applies to selecting data from one table and inserting it into another.

Rolling-window requirements

When the query window must stay open, partitioning is often used to meet rolling-window needs (inserting and deleting rows every day). Defining a database partition for each processor increases the throughput of inserting new data into a table. It is quite reasonable to direct the insert or delete streams at specific database partitions (for example, by key range). Multiple database partitions also provide enough memory for the LOCKLIST allocation of a database that handles a large number of insert and delete operations.

Database partitioning also spreads index maintenance (during inserts and deletes) across the database partitions.

Backup and recovery

Using multiple database servers can greatly reduce the time needed to back up the database. Depending on your environment, this may be an important factor when deciding whether to partition the database.

DB2 performs parallel backup and recovery by assigning a separate process or thread to each table space. In a partitioned database, each partition is backed up separately. Running these backups in parallel shortens the time needed to back up the entire database.
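A back-of-the-envelope Python sketch shows why parallel per-partition backups shorten the elapsed time. The partition sizes and the throughput figure are hypothetical, and the model assumes each partition has its own backup device so the backups do not compete with one another.

```python
def backup_minutes(size_gb: float, gb_per_minute: float) -> float:
    """Time to back up one partition at a fixed, hypothetical throughput."""
    return size_gb / gb_per_minute


def serial_elapsed(sizes_gb, gb_per_minute):
    """One partition after another: the times add up."""
    return sum(backup_minutes(s, gb_per_minute) for s in sizes_gb)


def parallel_elapsed(sizes_gb, gb_per_minute):
    """All partitions at once: the slowest partition sets the elapsed time."""
    return max(backup_minutes(s, gb_per_minute) for s in sizes_gb)


sizes_gb = [120, 100, 110, 130]  # hypothetical data per partition
print(serial_elapsed(sizes_gb, gb_per_minute=2))    # 230.0
print(parallel_elapsed(sizes_gb, gb_per_minute=2))  # 65.0
```

The parallel figure is bounded by the largest partition, which is one reason an even distribution of data across partitions matters.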

A partitioned database environment can also speed up roll-forward and restart (crash) recovery. With DPF, each database partition rolls forward only its own portion of the logs; this divide-and-conquer strategy, balanced across the database partition servers, speeds up the process. Also, if a particular database partition does not need to be rolled forward, it is skipped during crash recovery.

Logging considerations

In a highly active system, database logging performance is likely to limit the system's capacity. In a partitioned database environment, each partition has its own set of logs. When the system performs intensive insert, update, or delete activity, multiple database partitions may improve performance because the logs are written in parallel on each partition and each partition has fewer log records to write.

Making the choice

To partition or not to partition? Fundamentally, DB2 is DB2. Whether or not you partition your database makes no difference to the interfaces, features, tools, or skills you need. The decision really depends on your applications and your environment. Using our recommendations, you can find the answer to that question.

About the authors

Dwaine R. Snow is the product manager for DB2 UDB partitioned databases. He has worked with DB2 UDB in Canada for many years, providing database and application planning and design, project planning and implementation, complex online transaction processing and decision support system design, performance tuning, and on-site guidance for system integration. You can contact him at mrdb2@yahoo.com.

Paul C. Zikopoulos has seven years of DB2 experience with the IBM Data Management Software Group and has written many articles. He has coauthored several books, including DB2: The Complete Reference (Osborne McGraw-Hill, 2002) and DB2 Fundamentals Certification for Dummies (Wiley, 2002). Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). You can contact him at Paulz_ibm@msn.com.

When reprinting, please cite the original address: https://www.9cbs.com/read-29039.html
