Data Warehouse RDBMS Performance Optimization Guide

zhaozj2021-02-12  206

This performance tuning guide is intended to help database administrators and developers configure Microsoft® SQL Server® 2000 for the best performance, and to help diagnose the causes of poor performance in relational databases, including databases used for data warehousing. It also provides guidelines and best practices for loading data, building indexes, and writing queries against data stored in SQL Server, and it describes a variety of SQL Server tools that can be used to analyze performance characteristics.

Introduction

Microsoft SQL Server 7.0 introduced a major improvement: a database engine that is largely self-configuring, self-tuning, and self-managing. Before SQL Server 7.0, most database servers demanded a great deal of the administrator's time and effort, because the server configuration had to be tuned manually to get the best performance. In fact, many competing database products still require administrators to configure and tune their database servers by hand. This is a major reason why many customers choose SQL Server. SQL Server 2000 builds on the solid foundation laid by SQL Server 7.0. The goal of SQL Server is to let the database engine tune itself and to let the DBA automate administrative tasks, so that the DBA does not have to configure and continually tune the database server manually.

Although some sp_configure options can still be set manually, it is recommended that database administrators not do so, and instead let SQL Server configure and tune itself. SQL Server 7.0 has a widely recognized, proven track record in this kind of self-tuning, and SQL Server 2000 improves on it significantly. Changing conditions in the environment can negatively affect database performance; letting SQL Server tune itself allows the database server to adapt to those changes.

Basic principles of performance optimization

You can take many steps to manage the performance of the database. SQL Server 2000 provides several tools to help you complete these tasks.

Managing performance

Let SQL Server do most of the tuning. SQL Server 2000 has been significantly enhanced to create a database server that is largely self-configuring and self-tuning. Relying on SQL Server's automatic tuning settings helps SQL Server run at peak performance even as user load and queries change over time.

Manage RAM caching. A major part of any database server environment is the management of the random access memory (RAM) buffer cache. Accessing data in the RAM cache is much faster than accessing the same information on disk, but RAM is a limited resource. If database I/O can be reduced to the minimum required set of data and index pages, those pages stay in RAM longer. If unnecessary data and index information flows into the buffer cache, valuable pages are quickly pushed out. The main goal of performance tuning is to reduce I/O so that the buffer cache is used as effectively as possible.

Create and maintain good indexes. For all database queries, a key factor in keeping I/O to a minimum is ensuring that appropriate indexes are created and maintained.

Partition large data sets and indexes. To reduce overall I/O contention and improve parallel operations, consider partitioning table data and indexes. This chapter describes several ways to implement and manage partitions using SQL Server 2000.

Monitor disk I/O subsystem performance. The physical disk subsystem must provide enough I/O processing power for the database server to run without disk queuing. Disk queuing results in poor performance. This document describes how to detect disk I/O problems and how to resolve them.

Tune applications and queries. This tuning is especially important when a database server services requests from hundreds of connections through a given application. Because applications typically determine the SQL queries that are executed on the database server, application developers must understand the architectural basics of SQL Server and how to take full advantage of SQL Server indexes to minimize I/O.

Optimize active data. In many business intelligence databases, most database activity involves data from the most recent month or quarter (as much as 80 percent of database activity may be due to the most recently loaded data). To maintain good overall database performance, load, index, and partition this data in a way that provides optimal access performance for it.

Using SQL Server performance tools

SQL Profiler and Index Tuning Wizard. SQL Profiler can be used to monitor and log the workload of a SQL Server. The logged workload can then be submitted to the Index Tuning Wizard so that indexes can be changed where necessary. Together, SQL Profiler and the Index Tuning Wizard help administrators achieve optimal indexing. Using these tools periodically keeps SQL Server performing well, even as the query workload changes over time.

SQL Query Analyzer and graphical execution plan. In SQL Server 2000, SQL Query Analyzer provides a graphical execution plan, which makes it easy to analyze problematic SQL queries. Statistics I/O is another important feature of SQL Query Analyzer; it is described later in this chapter.

System Monitor objects. SQL Server provides a set of System Monitor objects and counters that supply information for monitoring and analyzing the operation of SQL Server. This chapter describes the key counters to watch.

Configuration options that affect performance

Max async IO

The max async IO option, which was configured manually in SQL Server 7.0, has been automated in SQL Server 2000. Previously, max async IO specified the number of disk I/O requests that SQL Server 7.0 could submit to Microsoft Windows NT® 4.0 or Windows 2000 during a checkpoint operation. Windows then submitted these requests to the physical disk subsystem. With this setting automated, SQL Server 2000 dynamically maintains optimal I/O throughput on its own. Note that Windows 98 does not support asynchronous I/O, so the max async IO option is not supported on that platform.

Database recovery model

SQL Server 2000 introduces the ability to configure transaction logging at the database level. The recovery model chosen can have a large impact on performance, especially during data loads. There are three recovery models: Full, Bulk-Logged, and Simple. A new database inherits its recovery model from the model database when it is created. You can change the recovery model after the database has been created.

Full Recovery provides the greatest flexibility for recovering a database to an earlier point in time. Bulk-Logged Recovery provides higher performance and uses less log space for certain large-scale operations (for example, creating indexes or bulk copy). Its drawback is reduced flexibility for point-in-time recovery. Simple Recovery provides the highest performance and uses the least log space, but carries a significantly higher risk of data loss after a system failure. With the Simple Recovery model, data can be restored only to the state of the most recent full database or differential backup. Because transactions are truncated from the log at each checkpoint in this model, transaction log backups cannot be used to recover transactions, which creates the potential for data loss. Log space is truncated and reused as soon as it is no longer needed to recover from a server failure (that is, once it no longer holds active transactions).

Experienced administrators can use the recovery model feature to significantly speed up data loads and bulk operations. However, the exposure to data loss varies with the model chosen.

Important: Before choosing a recovery model, carefully consider the risks you may encounter.

Each recovery model addresses different needs. Weigh the trade-offs of the model you choose: they involve performance, space utilization (disk or tape), and protection against data loss. When you choose a recovery model, you are deciding among the following considerations:

Performance of large-scale operations (for example, index creation or bulk loads)
Data loss exposure (for example, lost transactions)
Transaction log space consumption
Backup and recovery procedures

Depending on the operations you perform, one model may be more appropriate than another. Before selecting a recovery model, consider the impact it will have. The following table provides helpful information.

Recovery model: Simple
Benefits: Permits high-performance bulk copy operations. Reclaims log space to keep space requirements low.
Work loss exposure: Changes made since the most recent database or differential backup must be redone.
Recover to point in time? Can recover to the end of any backup. Changes made after that must be redone.

Recovery model: Full
Benefits: No work is lost because of a lost or damaged data file. Can recover to an arbitrary point in time (for example, the moment before an application or user error).
Work loss exposure: Normally none. If the log is damaged, changes made since the most recent log backup must be redone.
Recover to point in time? Can recover to any point in time.

Recovery model: Bulk-Logged
Benefits: Permits high-performance bulk copy operations. Bulk operations use minimal log space.
Work loss exposure: If the log is damaged, or if bulk operations have occurred since the most recent log backup, changes made since that last backup must be redone. Otherwise, no work is lost.
Recover to point in time? Can recover to the end of any backup. Changes made after that must be redone.

Multi-instance considerations

SQL Server 2000 also introduces the ability to run multiple instances of SQL Server on a single computer. By default, each instance dynamically acquires and frees memory to adjust to changes in its workload. When multiple instances of SQL Server 2000 each adjust their memory usage automatically, performance tuning becomes more complicated. Most high-end business intelligence customers install only one instance per computer, so this feature is usually not a concern for them. However, as computers keep getting larger (Windows 2000 Datacenter Server supports up to 64 GB of RAM and 32 CPUs), some production environments may need multiple instances. Instances that use extended memory support need special attention.

Extended memory support

In general, SQL Server 2000 dynamically acquires and frees memory as needed, so administrators usually do not need to specify how much memory to allocate to SQL Server. However, SQL Server 2000 Enterprise Edition and SQL Server 2000 Developer Edition introduce support for Microsoft Windows 2000 Address Windowing Extensions (AWE). This enables SQL Server 2000 to address significantly more memory (up to about 8 GB on Windows 2000 Advanced Server and up to about 64 GB on Windows 2000 Datacenter Server). When extended memory is used, each instance that accesses it must be configured to statically allocate the memory it will use.

Note: This feature is available only when running Windows 2000 Advanced Server or Windows 2000 Datacenter Server.

Windows 2000 usage considerations

To use AWE memory, you must run the SQL Server 2000 database engine under a Windows 2000 account that has been granted the Windows 2000 "lock pages in memory" privilege. SQL Server Setup automatically grants the MSSQLServer service account permission to use the lock pages in memory option. If you start SQL Server 2000 from the command prompt, that is, if SQL Server is not running as a service, you must use the Windows 2000 Group Policy utility to assign this permission manually to the interactive user's account; otherwise SQL Server will not be able to use AWE memory.

To enable the lock pages in memory option

1. On the Start menu, click Run, and then type gpedit.msc in the Open box.
2. In the Group Policy tree pane, expand Computer Configuration, and then expand Windows Settings.
3. Expand Security Settings, and then expand Local Policies.
4. Select the User Rights Assignment folder. The policies are displayed in the details pane.
5. In the details pane, double-click Lock pages in memory.
6. In the Local Security Policy Setting dialog box, click Add.
7. In the Select Users or Groups dialog box, add an account that has permission to run sqlservr.exe.

To enable Windows 2000 Advanced Server or Windows 2000 Datacenter Server to support more than 4 GB of physical memory, you must add the /PAE parameter to the boot.ini file.

For computers with 16 GB of physical memory or less, you can use the /3GB parameter in the boot.ini file. This lets Windows 2000 Advanced Server and Windows 2000 Datacenter Server give user applications 3 GB of virtual address space through which to address extended memory, and reserves 1 GB of virtual address space for the operating system itself.

If the computer has more than 16 GB of physical memory, the Windows 2000 operating system itself needs 2 GB of virtual address space for system purposes, and can therefore support only a 2 GB virtual address space for applications. On systems with more than 16 GB of physical memory, make sure the /3GB parameter is not used in the boot.ini file.

Note: If you accidentally use the /3GB parameter, Windows 2000 will be unable to address any memory above 16 GB.
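
The boot.ini guidance in this section can be condensed into a small decision sketch. The function name and its return format are hypothetical and exist only for illustration; the sketch encodes the rules stated here (/PAE above 4 GB, /3GB allowed only up to 16 GB) and does not read or write a real boot.ini file.

```python
def boot_ini_switches(physical_ram_gb):
    """Return the boot.ini switches suggested by the rules in this section.

    Hypothetical helper: it only encodes the guidance in the text.
    """
    switches = []
    if physical_ram_gb > 4:
        # More than 4 GB of physical memory requires /PAE.
        switches.append("/PAE")
    if physical_ram_gb <= 16:
        # Up to 16 GB, /3GB may (optionally) be used:
        # 3 GB of address space for applications, 1 GB for the OS.
        switches.append("/3GB")
    # Above 16 GB, /3GB is left out because the OS needs 2 GB itself.
    return switches


print(boot_ini_switches(8))    # ['/PAE', '/3GB']
print(boot_ini_switches(32))   # ['/PAE']
```

Note that the sketch always lists /3GB when it is permitted, although the text presents it as optional on such systems.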

SQL Server 2000 usage considerations

To enable an instance of SQL Server 2000 to use AWE memory, use sp_configure to set the awe enabled option, and then restart SQL Server to activate AWE. Because AWE support is enabled during SQL Server startup and remains enabled until SQL Server is shut down, SQL Server notifies users that AWE is in use by writing an "Address Windowing Extensions enabled" message to the SQL Server error log.

When AWE memory is enabled, instances of SQL Server 2000 do not dynamically manage the size of their address space. Therefore, when you enable AWE memory and start an instance of SQL Server 2000, one of the following occurs, depending on how max server memory is set.

If max server memory has been set and at least 3 GB of memory is available on the computer, the instance acquires the amount of memory specified in max server memory. If the amount of memory available on the computer is less than max server memory (but more than 3 GB), the instance acquires almost all of the available memory and may leave only up to 128 MB free.

If max server memory has not been set and at least 3 GB of memory is available on the computer, the instance acquires almost all of the available memory and may leave only up to 128 MB free.

If less than 3 GB of memory is available on the computer, memory is allocated dynamically and SQL Server runs in non-AWE mode, regardless of the awe enabled setting.
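
The startup cases above can be modeled roughly as follows. This is only a sketch of the rules as listed: the function name is hypothetical, and treating "almost all available memory" as exactly the available memory minus 128 MB is a simplifying assumption.

```python
RESERVE_MB = 128            # memory the text says may be left free
AWE_MINIMUM_MB = 3 * 1024   # below this, SQL Server runs in non-AWE mode


def awe_startup_memory(available_mb, max_server_memory_mb=None):
    """Estimate what an AWE-enabled instance acquires at startup.

    Returns (mode, megabytes). Hypothetical model of the listed cases;
    megabytes is None when memory is allocated dynamically.
    """
    if available_mb < AWE_MINIMUM_MB:
        # Less than 3 GB available: dynamic allocation, non-AWE mode.
        return ("non-AWE", None)
    nearly_all = available_mb - RESERVE_MB
    if max_server_memory_mb is None or max_server_memory_mb > available_mb:
        # Not set, or set higher than what is available.
        return ("AWE", nearly_all)
    return ("AWE", max_server_memory_mb)


print(awe_startup_memory(8192, 6144))   # ('AWE', 6144)
print(awe_startup_memory(8192))         # ('AWE', 8064)
print(awe_startup_memory(2048, 6144))   # ('non-AWE', None)
```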

When allocating SQL Server AWE memory on a 32 GB system, Windows 2000 may require at least 1 GB of available memory to manage AWE. Therefore, when starting an instance of SQL Server with AWE enabled, it is recommended that you do not use the default max server memory setting, but instead limit it to 31 GB or less.

Failover cluster and multi-instance considerations

If you are using a SQL Server 2000 failover cluster, or running multiple instances, while using AWE memory, you must ensure that the sum of the max server memory settings of all running instances is less than the available physical RAM. For failover, you must take into account the lowest amount of physical RAM on any candidate surviving node. If the failover node has less physical memory than the original node, the instances of SQL Server 2000 may fail to start, or may start with less memory than they had on the original node.
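
As a quick sanity check of this rule, the following sketch compares the combined max server memory of all instances with the smallest candidate node. The function name and the sample figures are made up for illustration.

```python
def max_server_memory_fits(instance_max_memory_mb, node_physical_ram_mb):
    """Check that the sum of max server memory across all instances is
    less than the physical RAM of the smallest candidate node.

    Hypothetical helper; both arguments are lists of sizes in MB.
    """
    total_requested = sum(instance_max_memory_mb)
    smallest_node = min(node_physical_ram_mb)
    return total_requested < smallest_node


# Two instances (6 GB and 4 GB) on a cluster with 16 GB and 8 GB nodes:
# the 8 GB node could not host both instances after a failover.
print(max_server_memory_fits([6144, 4096], [16384, 8192]))   # False
print(max_server_memory_fits([6144, 4096], [16384, 16384]))  # True
```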

sp_configure options

cost threshold for parallelism option

Use the cost threshold for parallelism option to specify the threshold at which SQL Server creates and runs parallel plans. SQL Server creates and runs a parallel plan for a query only when the estimated cost of running a serial plan for the same query is higher than the value set in cost threshold for parallelism. The cost refers to the estimated elapsed time, in seconds, required to run the serial plan on a specific hardware configuration. Set cost threshold for parallelism only on symmetric multiprocessor (SMP) computers.

Parallel plans generally benefit longer queries; the performance gain offsets the additional time required to initialize, synchronize, and terminate the plan. The cost threshold for parallelism option is typically used when there is a mix of short and long queries: short queries run serial plans, while long queries use parallel plans. The value of cost threshold for parallelism determines which queries are considered short and therefore run only serial plans.

In certain cases, a parallel plan may be chosen even though the query's cost is less than the current cost threshold for parallelism value. This is because the decision to use a parallel or a serial plan is based on a cost estimate made before full optimization is complete.

The cost threshold for parallelism option can be set to any value from 0 through 32767. The default value is 5. If the computer has only one processor, if only a single CPU is available to SQL Server, or if the max degree of parallelism option is set to 1, SQL Server ignores cost threshold for parallelism.

max degree of parallelism option

Use the max degree of parallelism option to limit the number of processors (up to 32) used in parallel plan execution. The default value is 0, which uses the actual number of available CPUs. Set max degree of parallelism to 1 to suppress parallel plan generation. Set the value to a number greater than 1 to restrict the maximum number of processors used by a single query execution. If a value greater than the number of available CPUs is specified, the actual number of available CPUs is used.
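
The interaction between the cost threshold and the maximum degree of parallelism can be sketched as a simplified model. Both function names are hypothetical, and the model ignores the case mentioned earlier in which the optimizer picks a parallel plan below the threshold because the decision is based on an early cost estimate.

```python
def effective_parallelism(max_dop, available_cpus):
    """CPUs a parallel plan may use for a given max degree of parallelism."""
    if not 0 <= max_dop <= 32:
        raise ValueError("max degree of parallelism must be 0 through 32")
    if max_dop == 0:
        return available_cpus              # default: all available CPUs
    return min(max_dop, available_cpus)    # never more than exist


def choose_plan(estimated_serial_cost, available_cpus,
                cost_threshold=5, max_dop=0):
    """Simplified serial-versus-parallel decision (illustrative only)."""
    if not 0 <= cost_threshold <= 32767:
        raise ValueError("cost threshold must be 0 through 32767")
    if effective_parallelism(max_dop, available_cpus) == 1:
        # Single usable CPU, or max_dop = 1: the threshold is ignored.
        return "serial"
    if estimated_serial_cost > cost_threshold:
        return "parallel"
    return "serial"


print(choose_plan(12, available_cpus=4))             # parallel
print(choose_plan(3, available_cpus=4))              # serial
print(choose_plan(12, available_cpus=4, max_dop=1))  # serial
print(effective_parallelism(16, available_cpus=8))   # 8
```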

Note: If the affinity mask option is not set to its default, the number of CPUs available to SQL Server on symmetric multiprocessor (SMP) systems may be restricted.

For servers running on SMP computers, max degree of parallelism rarely needs to be changed. If the computer has only one processor, the max degree of parallelism value is ignored.

priority boost option

Use the priority boost option to specify whether SQL Server should run at a higher scheduling priority than other processes on the same computer. If this option is set to 1, SQL Server runs at a priority base of 13 in the Windows scheduler. The default is 0, which is a priority base of 7. The priority boost option should be used only on a computer that is dedicated to SQL Server and has an SMP configuration.

Note: Boosting the priority too high may drain resources from essential operating system and network functions, resulting in problems shutting down SQL Server or using other Windows tasks on the server. In some circumstances, setting priority boost to anything other than the default can cause the following communication error to be logged in the SQL Server error log:

Error: 17824, Severity: 10, State: 0
Unable to write to ListenOn connection '', loginname '', hostname ''
OS Error: 64, The specified network name is no longer available.

Error 17824 indicates that SQL Server encountered connection problems while attempting to write to a client. These communication problems may be caused by network problems, for example if the client has stopped responding or has been restarted. However, error 17824 does not necessarily indicate a network problem; it may simply be a result of having the priority boost option set.

set working set size option

Use the set working set size option to reserve physical memory space for SQL Server equal to the server memory setting. The server memory setting is configured automatically by SQL Server based on workload and available resources; it varies dynamically between min server memory and max server memory. Setting set working set size means that the operating system does not attempt to swap out SQL Server pages, even if they could be used more readily by another process while SQL Server is idle.

Do not set set working set size if you are allowing SQL Server to use memory dynamically. Before setting set working set size to 1, set both min server memory and max server memory to the same value (the amount of memory you want SQL Server to use).

The lightweight pooling and affinity mask options are discussed in the "Key Performance Counters" section later in this chapter.

Optimizing disk I/O performance

When SQL Server is configured to hold only a few gigabytes of data and does not sustain heavy read or write activity, you need not be greatly concerned with disk I/O or with balancing SQL Server I/O activity across hard drives for maximum performance. However, to build larger SQL Server databases that will contain hundreds of gigabytes or even terabytes of data, and/or that must sustain heavy read/write activity, the configuration must balance the load across multiple hard drives in order to maximize SQL Server disk I/O performance.

Optimizing transfer rates

One of the most important aspects of database performance tuning is I/O performance tuning, and SQL Server is no exception. Unless SQL Server is running on a computer with enough RAM to hold the entire database, I/O performance is determined by how fast the disk I/O subsystem can process SQL Server reads and writes.

Because transfer rates, I/O throughput, and other factors that affect I/O performance are constantly improving, we do not give specific numbers for the speeds you should expect from your storage system. To better understand the performance you can expect, it is recommended that you work with your preferred hardware vendor to determine the optimal performance to expect.

What we do want to emphasize is the difference between sequential I/O operations (also commonly referred to as "serial" or "in disk order") and nonsequential I/O operations. We also want to draw attention to the dramatic effect that read-ahead processing can have on I/O operations.

Sequential and nonsequential disk I/O operations

It is worthwhile to explain what these terms mean in relation to a disk drive. Generally, a hard drive consists of a set of drive platters, each of which provides surfaces for read/write operations. A set of arms with read/write heads moves across the platters and reads data from, or writes data to, the platter surfaces. With respect to SQL Server, there are two important points to remember about hard drives. First, the read/write heads and the associated disk arms must move to locate the position on the platters of the data that SQL Server requests. If the data is not laid out sequentially on the platters, the drive spends much more time moving the disk arm (seek time) and waiting for the platter to rotate under the read/write head (rotational latency) to locate the data. This contrasts with the sequential case, in which all of the required data is located on physically contiguous sectors of the platters, so the disk arm and read/write heads move very little to perform the necessary disk I/O.

The difference between the nonsequential and sequential cases is significant: each nonsequential seek takes roughly 50 milliseconds, whereas a sequential seek takes roughly two to three milliseconds. Note that these times are rough estimates and vary with how far apart the nonsequential data is spread on the disk, how fast the platters rotate (RPM), and other physical attributes of the drive. The main point is that sequential I/O benefits performance, whereas nonsequential I/O degrades it.

Second, it is important to remember that it takes almost as much time to read or write 8 KB as it does to read or write 64 KB. Within the range of 8 KB to about 64 KB, the movement of the disk arm and read/write head (seek time and rotational latency) still accounts for the majority of the time spent on a single disk I/O transfer. So, mathematically speaking, when more than 64 KB of SQL Server data must be transferred, it is best to perform 64 KB transfers as much as possible: a 64 KB transfer takes essentially as long as an 8 KB transfer, but moves eight times as much SQL Server data. Keep in mind that the read-ahead manager performs its disk operations in 64 KB chunks (referred to as a SQL Server extent). The log manager also writes sequentially in larger I/O sizes. The main point to remember is that SQL Server performance improves when you make good use of the read-ahead manager and separate the SQL Server log files from other, nonsequentially accessed files.
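
The arithmetic behind this point is simple enough to check directly. The sketch below counts how many I/O requests are needed to move the same amount of data in 8 KB pages versus 64 KB extents; the 1 GB figure is an arbitrary example, and the helper name is hypothetical.

```python
import math


def io_requests(total_kb, io_size_kb):
    """Number of I/O requests needed to transfer total_kb of data."""
    return math.ceil(total_kb / io_size_kb)


total_kb = 1024 * 1024                   # 1 GB of data, chosen arbitrarily
page_ios = io_requests(total_kb, 8)      # 8 KB transfers
extent_ios = io_requests(total_kb, 64)   # 64 KB transfers

print(page_ios)                # 131072
print(extent_ios)              # 16384
print(page_ios // extent_ios)  # 8
```

If each request costs about the same time regardless of size in this range, as the text states, the 64 KB case finishes the same transfer with one eighth of the requests.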

As a rule of thumb, most hard drives can process about twice as many sequential I/O operations per second as nonsequential ones. In other words, a nonsequential I/O operation takes about twice as long as a sequential one. It is therefore important to avoid, where possible, situations that cause random I/O within the database. Although the goal should always be to perform I/O sequentially, conditions such as page splits or out-of-sequence data can cause nonsequential I/O.

To encourage sequential I/O, it is important to avoid conditions that cause page splitting. It also helps to devise a well-ordered data loading strategy. You can encourage data to be laid out sequentially on disk by using a partitioning strategy that separates data and indexes. Be sure to set up jobs that periodically check for fragmentation in data and indexes, and, when data becomes too fragmented, use the utilities provided with SQL Server to resequence the data. More information about performing these operations appears later in this chapter.

Note: Because transaction log data is always written to the log file sequentially, in sizes of up to 32 KB, logs are generally not a major consideration.

RAID

RAID (redundant array of inexpensive disks) is a storage technology often used for databases larger than a few gigabytes. RAID can provide both performance and fault tolerance benefits. A variety of RAID controllers and disk configurations offer trade-offs among cost, performance, and fault tolerance. This topic gives a brief introduction to using RAID technology with SQL Server databases and discusses the various configurations and their trade-offs.

Performance. A hardware RAID controller divides all data read and written by Windows NT 4.0 and Windows 2000 and by applications (such as SQL Server) into slices (usually 16-128 KB), which are then spread across all disks participating in the RAID array. Splitting data across physical drives in this way distributes the read/write I/O workload evenly across all physical hard drives in the array. This increases disk I/O performance because the hard disks in the array are, as a whole, kept equally busy, instead of some disks becoming a bottleneck because of an uneven distribution of I/O requests.

Fault tolerance. RAID also protects against hard disk failure and the resulting data loss by using two methods: mirroring and parity.

Mirroring is implemented by writing information to a second (mirrored) set of drives. If a drive is lost while mirroring is in place, the data on the lost drive can be rebuilt by replacing the failed drive and rebuilding the mirror set. Most RAID controllers can replace a failed drive and re-mirror while Windows NT 4.0 or Windows 2000 and SQL Server remain online. Such RAID systems are commonly described as having "hot-plug" capable drives.

One advantage of mirroring is that it offers the best performance among the RAID options when fault tolerance is required. Bear in mind that each SQL Server write to the mirror set results in two disk I/O operations, one to each side of the mirror set. Another advantage is that mirroring provides more fault tolerance than parity RAID. Mirroring keeps the system running after at least one drive failure, and may be able to support the system through the failure of many drives in the mirror set, without forcing the system administrator to shut down the server and recover from file backup.

The disadvantage of mirroring is cost. The disk cost of mirroring is one extra drive for each drive's worth of data. This effectively doubles the storage cost, and for a data warehouse, storage is often one of the most expensive components required. RAID 1 and its hybrid, RAID 0+1 (sometimes referred to as RAID 10 or 0/1), are implemented through mirroring.

Parity is implemented by calculating recovery information about the data written to disk and writing this parity information to the other drives that form the RAID array. If a drive fails, a new drive is inserted into the RAID array, and the data on the failed drive is regenerated from the recovery information (parity) written on the other drives. RAID 5 and its hybrids are implemented through parity.

The advantage of parity is low cost. To protect any number of drives with RAID 5, only one additional drive is required. Parity information is distributed evenly among all drives participating in the RAID 5 array. The disadvantages of parity are reduced performance and reduced fault tolerance. Because of the added cost of calculating and writing parity, RAID 5 requires four disk I/O operations for each write, compared with two disk I/O operations for mirroring. Read I/O costs are the same for mirroring and parity. However, a parity array can usually tolerate the failure of only one drive before the array must be taken offline and the data recovered from backup media.
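
The cost and write trade-off between mirroring and parity can be illustrated with a little arithmetic. In the sketch below the function names and the sample workload are hypothetical; the write costs (two physical I/Os for mirroring, four for RAID 5) come from the text, and counting each read as a single physical I/O is a simplifying assumption.

```python
# Physical disk I/Os generated by one logical write, per the text.
WRITE_COST = {"mirror": 2, "raid5": 4}


def physical_ios(reads, writes, level):
    """Physical I/Os for a workload; each read is assumed to cost one."""
    return reads + writes * WRITE_COST[level]


def drives_needed(data_drives, level):
    """Total drives needed to protect data_drives worth of data."""
    if level == "mirror":
        return data_drives * 2     # one extra drive per data drive
    if level == "raid5":
        return data_drives + 1     # one extra drive for parity
    raise ValueError("unknown RAID level: " + level)


# Example workload: 600 reads and 400 writes, 8 drives of data.
print(physical_ios(600, 400, "mirror"))  # 1400
print(physical_ios(600, 400, "raid5"))   # 2200
print(drives_needed(8, "mirror"))        # 16
print(drives_needed(8, "raid5"))         # 9
```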

General rule of thumb: Be sure to stripe across as many disks as necessary to achieve solid disk I/O performance. System Monitor will indicate whether there is a disk I/O bottleneck on a particular RAID array. Be prepared to add disks and redistribute data across RAID arrays and/or small computer system interface (SCSI) channels as necessary to balance disk I/O and maximize performance.

Onboard cache on hardware RAID controllers

Many hardware RAID controllers have some form of read and/or write caching. For SQL Server, this caching capability can significantly enhance the effective I/O handling capacity of the disk subsystem. These controller-based caching mechanisms work by gathering smaller and potentially nonsequential I/O requests coming in from the host server (SQL Server) and trying to batch them together with other I/O requests within a few milliseconds, so that the batched I/Os can form larger (32-128 KB) and possibly sequential I/O requests to send to the hard drives.

In keeping with the principle that sequential and larger I/O requests are good for performance, this helps produce more disk I/O throughput given the fixed number of I/Os that the hard drives can provide to the RAID controller. It is not that the RAID controller's caching magically allows the hard drives to process more I/Os per second; rather, the RAID controller cache organizes incoming I/O requests so as to make the best possible use of the fixed I/O processing capacity of the underlying hard drives.

These RAID controllers usually protect their caching mechanisms with some form of backup power. The backup power can preserve the data in the cache for some period of time (perhaps days) in case of a power outage. If the database server is also supported by an uninterruptible power supply (UPS), the RAID controller has more time and opportunity to flush data to disk in the event of a power disruption. Although a UPS for the server does not directly affect performance, it does protect the performance improvement supplied by RAID controller caching.

RAID levels

As mentioned above, among the RAID levels, RAID 1 and RAID 0+1 offer the best data protection and the best performance, but they cost more in terms of the disks required. When the cost of hard disks is not a limiting factor, RAID 1 or RAID 0+1 is the best choice for both performance and fault tolerance.

RAID 5 costs less than RAID 1 or RAID 0+1, but it provides less fault tolerance and lower write performance. The write performance of RAID 5 is only about half that of RAID 1 or RAID 0+1 because of the additional I/O needed to read and write parity information.

The best disk I/O performance is achieved with RAID 0 (disk striping with no fault tolerance). Because RAID 0 provides no fault tolerance protection, it should never be used in a production environment, and it is not recommended for development environments. RAID 0 is typically used only for benchmarking or testing.

Many RAID array controllers provide the option of RAID 0+1 (also known as RAID 1/0 and RAID 10) over physical hard drives. RAID 0+1 is a hybrid RAID solution. At the lower level, the controller mirrors all data just like normal RAID 1. At the upper level, it stripes the data across all of the drives (like RAID 0). Therefore, RAID 0+1 provides maximum protection (mirroring) with high performance (striping). Because these striping and mirroring operations are managed by the RAID controller, they are transparent to Windows and SQL Server. The difference between RAID 1 and RAID 0+1 is at the hardware controller level. For a given amount of storage, RAID 1 and RAID 0+1 require the same number of drives. For more information about the RAID 0+1 implementation of a specific RAID controller, contact the hardware vendor that produced the controller. The following illustration shows the differences between RAID 0, RAID 1, RAID 5, and RAID 0+1.

[Illustration: RAID 0, RAID 1, RAID 5, and RAID 0+1 compared (raid04.gif)]

Note that in the above illustration, to hold the equivalent of four disks of data, RAID 1 (and RAID 0+1) requires eight disks, whereas RAID 5 requires only five. Be sure to consult your storage vendor to learn more about their specific RAID implementation.
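The disk counts in the note above follow from simple arithmetic. The sketch below is only an illustration of that arithmetic, not part of any vendor tool; the function name disks_required and the level labels are made up for this example, and it assumes all drives are the same size.

```python
def disks_required(raid_level: str, data_disks: int) -> int:
    """Physical disks needed to hold `data_disks` disks' worth of data.

    Hypothetical helper for illustration; assumes equally sized drives.
    """
    if data_disks < 1:
        raise ValueError("data_disks must be at least 1")
    if raid_level == "0":
        # Striping only: no redundancy, so no extra disks.
        return data_disks
    if raid_level in ("1", "0+1"):
        # Mirroring: every data disk has a mirror copy.
        return 2 * data_disks
    if raid_level == "5":
        # Striping with parity: one extra disk's worth of parity.
        return data_disks + 1
    raise ValueError(f"unsupported RAID level: {raid_level}")


if __name__ == "__main__":
    for level in ("0", "1", "0+1", "5"):
        print(f"RAID {level}: {disks_required(level, 4)} disks for 4 disks of data")
```

For four disks of data this gives eight disks for RAID 1 and RAID 0+1 and five disks for RAID 5, matching the illustration described in the text.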

Level 0

This level is also known as disk striping because it uses a disk file system called a stripe set. Data is divided into blocks and spread in a fixed order across all disks in the array. RAID 0 improves read/write performance by spreading operations across multiple disks so that they can be performed independently and simultaneously. RAID 0 is similar to RAID 5, except that RAID 5 also provides fault tolerance.

The following illustration shows RAID 0.

Level 1

This level is also known as disk mirroring because it uses a disk file system called a mirror set. Disk mirroring provides a redundant, identical copy of a selected disk. All data written to the primary disk is also written to the mirror disk. RAID 1 provides fault tolerance and generally improves read performance (but may degrade write performance). The following illustration shows RAID 1.

Level 2

This level adds redundancy by using an error correction method that spreads parity across all disks. It also uses a disk striping strategy that breaks a file into bytes and spreads it across multiple disks. Compared to mirroring (RAID 1), this strategy offers only a marginal improvement in disk utilization and read/write performance. RAID 2 is not as efficient as other RAID levels and is not generally used.

Level 3

This level uses the same striping method as RAID 2, but the error correction method requires only one disk for parity data. Use of disk space varies with the number of data disks. RAID 3 provides some improvement in read/write performance. RAID 3 is also rarely used.

Level 4

This level stripes data in much larger blocks or segments than RAID 2 or RAID 3. Like RAID 3, the error correction method requires only one disk for parity data. It keeps user data separate from error correction data. RAID 4 is not as efficient as other RAID levels and is not generally used.

Level 5

This level is also known as striping with parity, and it is the most popular strategy in new designs. Like RAID 4, it stripes data in large blocks across the disks in the array. It differs in that it writes the parity across all of the disks. Data redundancy is provided by the parity information. The data and parity information are arranged on the disk array so that the two are always on different disks. Striping with parity offers better performance than disk mirroring (RAID 1). However, when a stripe member is missing (for example, when a disk fails), read performance degrades. RAID 5 is one of the most commonly used RAID configurations. The following illustration shows RAID 5.

Level 10 (1+0)

This level is also known as mirroring with striping. It uses a striped array of disks, which is then mirrored to another identical set of striped disks. For example, a striped array can be created using four disks. The striped array of disks is then mirrored using another set of striped disks. RAID 10 provides the performance benefits of disk striping with the disk redundancy of mirroring. RAID 10 provides the highest read/write performance of any of the RAID levels, at the expense of using twice as many disks. The following illustration shows RAID 10.

[Illustration: raid5b.gif]

Online RAID expansion

This feature allows disks to be added dynamically to a physical RAID array while SQL Server remains online. The added disk drives are automatically integrated into the RAID storage. Disk drives are added by installing them into physical positions called hot plug drive slots, or hot plug slots. Many hardware vendors offer hardware RAID controllers that can provide this functionality. Data is automatically re-striped evenly across all drives, including the newly added drives, and there is no need to shut down SQL Server or Windows.

You can take advantage of this feature by leaving hot plug slots free in the disk array cabinet. If SQL Server regularly overloads a RAID array with I/O requests (this is indicated by the disk queue length for the Windows logical drive letter associated with that RAID array), you can install one or more new hard drives into the hot plug slots while SQL Server is still running. The RAID controller moves some existing SQL Server data to the new drives so that the data is evenly distributed across all drives in the RAID array. The I/O processing capacity of the new drives (75 nonsequential or 150 sequential I/Os per second per drive) is then added to the overall I/O processing capacity of the RAID array.

System Monitor and RAID

In System Monitor (Performance Monitor in Microsoft Windows NT® 4.0), information can be obtained for both logical and physical disk drives. The difference is that a logical disk in System Monitor is associated with what Windows reads as a logical drive letter, whereas a physical disk is associated with what Windows reads as a single physical hard disk.

In Windows NT 4.0, all disk counters for Performance Monitor are turned off by default because they can have a minor impact on performance. In Windows 2000, the physical disk counters are turned on by default and the logical disk counters are turned off by default. Diskperf.exe is the Windows command that controls the types of counters that can be viewed in System Monitor. In Windows 2000, to obtain performance counter data for logical drives or storage volumes, type diskperf -yv at the command prompt and then press ENTER. This causes the disk performance statistics driver used for collecting disk performance data to report data for logical drives or storage volumes. By default, the operating system uses the diskperf -yd command to obtain physical drive data.

In Windows 2000, the syntax of Diskperf.exe is as follows:

diskperf [-y[d|v] | -n[d|v]] [\\computername]

Parameters

(none)

Reports whether disk performance counters are enabled and identifies the counters that are enabled.

-y

Sets the system to start all disk performance counters when the computer is restarted.

-yd

Enables the disk performance counters for physical drives when the computer is restarted.

-yv

Enables the disk performance counters for logical drives or storage volumes when the computer is restarted.

-n

Sets the system to disable all disk performance counters when the computer is restarted.

-nd

Disables the disk performance counters for physical drives.

-nv

Disables the disk performance counters for logical drives.

\\computername

Specifies the computer on which to view or set the disk performance counters.

In Windows NT 4.0 and earlier, diskperf -y is used to monitor hard drives, or sets of hard drives and RAID controllers, that do not use Windows NT software RAID. When using Windows software RAID, use diskperf -ye so that System Monitor correctly reports physical counters across the Windows NT stripe sets. When diskperf -ye is used with Windows NT stripe sets, the logical counters do not report correct information and should be disregarded. If logical disk counter information is required with Windows NT stripe sets, use diskperf -y instead.

With diskperf -y, the logical disk counters are reported correctly for Windows NT stripe sets, but the physical disk counters do not report correct information and should be disregarded.

Note that the diskperf command does not take effect until Windows has been restarted (this applies to Windows 2000 and to Windows NT 4.0 and earlier).

Precautions for monitoring hardware RAID

Because a RAID controller presents multiple physical hard drives to Windows as a single RAID mirror set or stripe set, Windows reads the grouping as though it were a single physical disk. This abstracted view of the actual underlying hard drive activity can cause the performance counters to report misleading information.

From a performance tuning perspective, it is important to know how many physical hard drives are associated with a RAID array. This information is needed to determine the number of disk I/O requests that Windows and SQL Server send to each physical hard drive. Take the number of disk I/O requests that System Monitor reports for a hard drive and divide it by the number of physical hard drives known to be in that RAID array.

To get a rough estimate of the I/O activity for each hard drive in a RAID array, you must also multiply the number of disk write I/Os reported by System Monitor by 2 (RAID 1 and 0+1) or 4 (RAID 5). This gives a more accurate count of the I/O requests actually sent to the physical hard drives, because it is at this physical level that the I/O capacity figures for hard drives apply. However, when the hardware RAID controller uses caching, this method cannot calculate the hard drive I/O exactly, because caching significantly changes the direct I/O to the hard drives. When monitoring disk activity, it is best to concentrate on disk queuing rather than on the actual I/O of each disk. Disk I/O speed depends on the transfer rate of the drives, which cannot be adjusted. Because there is little you can do other than buy faster drives or more drives, there is little reason to be concerned with the amount of I/O that is actually occurring. However, you do want to avoid too much disk queuing. Significant disk queuing indicates that you have an I/O problem. Because Windows cannot see the number of physical drives in a RAID array, it is difficult to assess the disk queue for each physical disk accurately. A rough approximation can be obtained by dividing the disk queue length by the number of physical drives in the hardware RAID disk array behind the logical drive being observed. For hard drives that hold SQL Server files, it is best to keep the disk queue number below two.
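The estimate described above can be written down as a short calculation. This is a rough sketch under the same assumptions as the text (no controller caching, I/O spread evenly over the drives); the function names and the sample counter values are invented for illustration, and the factor of 1 for RAID 0 is an added assumption that is not stated in the text.

```python
# Physical writes generated per logical write, per the multipliers in the text.
# The RAID 0 entry (no redundancy, one physical write) is an assumption added here.
WRITE_MULTIPLIER = {"0": 1, "1": 2, "0+1": 2, "5": 4}


def per_drive_io(reads_per_sec: float, writes_per_sec: float,
                 raid_level: str, drive_count: int) -> float:
    """Rough physical I/Os per second per drive (ignores controller caching)."""
    physical_io = reads_per_sec + writes_per_sec * WRITE_MULTIPLIER[raid_level]
    return physical_io / drive_count


def per_drive_queue(disk_queue_length: float, drive_count: int) -> float:
    """Approximate queue length per physical drive behind one logical drive."""
    return disk_queue_length / drive_count


if __name__ == "__main__":
    # Hypothetical System Monitor readings for a 5-drive RAID 5 array.
    io = per_drive_io(reads_per_sec=300, writes_per_sec=100,
                      raid_level="5", drive_count=5)
    queue = per_drive_queue(disk_queue_length=8, drive_count=5)
    print(io)     # 140.0 physical I/Os per second per drive
    print(queue)  # 1.6, below the suggested limit of two
```

The per-drive I/O figure can be compared with the per-drive capacity quoted earlier (about 75 nonsequential or 150 sequential I/Os per second), while the per-drive queue figure is the one to compare against the threshold of two.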

Software RAID

Windows 2000 supports software RAID. When a hardware RAID controller is not used, software RAID addresses fault tolerance by providing mirror sets and stripe sets (with or without fault tolerance) through the operating system. You can set up RAID 0, RAID 1, or RAID 5 functionality by using operating system procedures. Most large data warehouses use hardware RAID, but if your installation is relatively small, or if you choose not to implement hardware RAID, software RAID can provide some data access and fault tolerance benefits.

Software RAID does use some CPU resources, because Windows must manage the RAID operations that a hardware RAID controller would otherwise manage. Therefore, with the same number of disk drives, Windows software RAID performs somewhat worse than hardware RAID, especially when the system processors are close to 100 percent utilized for other purposes. By reducing the potential for I/O bottlenecks, Windows software RAID generally helps a set of drives service SQL Server I/O better than the same drives used without software RAID. Software RAID should also allow SQL Server to make better use of the CPUs, because the server spends less time waiting for I/O requests to complete.

Disk I / O parallelism

An effective way to improve the performance of large SQL Server databases stored on multiple disk drives is to create disk I/O parallelism, that is, to read from and write to multiple disk drives simultaneously. RAID implements disk I/O parallelism through hardware and software. The next topic discusses using partitioning to organize SQL Server data to increase disk I/O parallelism further.

Use partitions to improve performance

For SQL Server databases stored on multiple disk drives, performance can be improved by partitioning the data to increase disk I/O parallelism.

Partitioning can be done using a variety of techniques. Methods for creating and managing partitions include configuring the storage subsystem (disk, RAID partitioning) and applying various data configuration mechanisms in SQL Server (for example, files, filegroups, tables, and views). Although this section focuses on some of the partitioning capabilities that relate to performance, Chapter 18, "Using Partitions in a SQL Server 2000 Data Warehouse," also covers the subject of partitioning.

The simplest way to create disk I/O parallelism is to use hardware partitioning and create a single "pool of drives" that serves all SQL Server database files except transaction log files, which should always be stored on physically separate disk drives dedicated to log files. The pool can be a single RAID array that is presented to Windows as a single physical drive. Larger pools can be set up using multiple RAID arrays and SQL Server files and filegroups. A SQL Server file can be associated with each RAID array, and the files can be combined into a SQL Server filegroup. A database can then be built on the filegroup so that the data is spread evenly across all of the drives and RAID controllers. The drive pool method relies on RAID to divide the data across all physical drives, which helps ensure parallel access to that data during database server operations. The drive pool method simplifies SQL Server I/O performance tuning because database administrators know that there is only one physical location in which to create database objects. The single pool of drives can be monitored for disk queuing and, if necessary, more hard drives can be added to the pool to prevent disk queuing. This method helps optimize performance in the common case where it is not known which parts of the databases will be used most. It is better not to set aside part of the total available I/O capacity on a separate disk partition just because SQL Server might perform I/O to it 5 percent of the time. The "single pool of drives" method helps make all available I/O capacity "always" available for SQL Server operations. It also allows I/O operations to be spread across the maximum number of available disks.

SQL Server log files should always be physically separated onto different hard drives from all other SQL Server database files. For very busy SQL Servers that manage multiple busy databases, the transaction log files for each database should be physically separated from each other to reduce contention.

Because transaction logging is primarily sequential write I/O, separating the log files tends to yield a significant I/O performance benefit. The disk drives that contain the log files can perform these sequential write operations very efficiently, provided that they are not interrupted by other I/O requests. At times, the transaction log must be read as part of SQL Server operations (for example, replication, rollbacks, and deferred updates). Some implementations use replication as a front end to their data transformation utility so that new data is loaded into the data warehouse in near real time. Administrators of SQL Servers that participate in replication need to make sure that all disks used for transaction log files have sufficient I/O processing capacity to handle the reads that occur in addition to the normal log transaction writes.

Physically separating files and filegroups requires additional administration. This additional effort can prove worthwhile in order to isolate and improve access to very active tables or indexes. Some of the benefits are listed below:

A more accurate assessment can be made of the I/O requirements for specific objects, which is not as easy to do if all database objects are placed in one large drive pool.

Partitioning data and indexes by using files and filegroups enhances the administrator's ability to create a more granular backup and restore strategy.

Files and filegroups can be used to maintain the sequential placement of data on disk, thereby reducing or eliminating nonsequential I/O activity. This is especially important if data loads into the data warehouse must run in parallel to meet a deadline.

Physically separating files and filegroups may be appropriate during database development and benchmarking, so that database I/O information can be gathered and applied to capacity planning for the production database server environment.

Considerations for object partitioning

The following SQL Server activities can be separated across different hard drives, RAID controllers, and PCI channels (or combinations of the three):

Transaction log

tempdb

Database

Tables

Nonclustered indexes

Note: In SQL Server 2000, Microsoft enhanced distributed partitioned views, which make it possible to create federated databases (commonly called scale-out) that spread resource load and I/O activity across multiple servers. Federated databases are appropriate for some high-end online transaction processing (OLTP) applications, but this approach is not recommended for addressing the needs of a data warehouse.

Physically separating SQL Server I/O activity is easy to do with hardware RAID controllers, RAID hot plug drives, and online RAID expansion. The most flexible approach is to arrange the RAID controllers so that a separate RAID channel is associated with each of the activities listed above. Each RAID channel should also be attached to a separate RAID hot plug cabinet to take advantage of online RAID expansion (if it is available through the RAID controller). Windows logical drive letters are then associated with each RAID array, and SQL Server files are separated between the RAID arrays based on known I/O usage patterns.

With this configuration, the disk queuing associated with each activity can be traced back to a distinct RAID channel and its drive cabinet. If a RAID controller and its drive array cabinet both support online RAID expansion, and slots for hot plug hard drives are available in the cabinet, disk queuing problems on that RAID array can be resolved simply by adding more drives to the array until System Monitor reports that disk queuing for the array has reached an acceptable level (ideally less than two for SQL Server files). This can be done while SQL Server is online.

Separating the transaction log

Transaction log files should be maintained on a storage device that is physically separate from the devices that hold the data files. Depending on your database recovery model setting, most update activity generates both data device activity and log activity. If the two share the same device, the operations to be performed compete for the same limited resources. Most installations benefit from separating these competing I/O activities.

Separating tempdb

SQL Server creates a database called tempdb on every server instance as a shared working area for a variety of activities, including temporary tables, sorting, processing subqueries, building aggregates to support GROUP BY or ORDER BY clauses, queries that use DISTINCT (temporary worktables must be created to remove duplicate rows), cursors, and hash joins. Placing tempdb on its own RAID channel allows tempdb I/O operations to occur in parallel with the I/O operations of their related transactions. Because tempdb is essentially a scratch area and is very update intensive, RAID 5 is not a good choice for tempdb; RAID 1 or 0+1 is better. Although RAID 0 provides no fault tolerance, it can be considered for tempdb because tempdb is rebuilt every time the database server is restarted. RAID 0 gives tempdb the best RAID performance with the fewest physical drives, but the main concern about using RAID 0 for tempdb in a production environment is that SQL Server availability might be affected if any physical drive fails, including a drive used for tempdb. This can be avoided by placing tempdb on a RAID configuration that provides fault tolerance.

To move the tempdb database, use the ALTER DATABASE command to change the physical file location of the SQL Server logical file names associated with tempdb. For example, to move tempdb and its associated log to the new file locations E:\MSSQL7 and C:\TEMP, use the following commands:

ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev', FILENAME = 'e:\mssql7\tempnew_location.mdf')
ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog', FILENAME = 'c:\temp\tempnew_loglocation.mdf')

The master, msdb, and model databases are used very little in production compared to user databases, so it is usually not necessary to consider them when tuning I/O performance. The master database is normally used only for adding new logins, databases, devices, and other system objects.

Database partitioning

Databases can be partitioned using files and/or filegroups. A filegroup is simply a named collection of individual files grouped together for administration purposes. A file cannot be a member of more than one filegroup. Tables, indexes, text, ntext, and image data can all be associated with a specific filegroup, which means that all of their pages are allocated from the files in that filegroup. The three types of filegroups are described below.

Primary filegroup

This filegroup contains the primary data file and any other files that are not placed in another filegroup. All pages for the system tables are allocated from the primary filegroup.

User-defined filegroup

This is any filegroup specified with the FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement, or on the Properties dialog box in SQL Server Enterprise Manager.

Default filegroup

The default filegroup contains the pages for all tables and indexes that do not have a filegroup specified when they are created. In each database, only one filegroup at a time can be the default filegroup. If no default filegroup is specified, the primary filegroup is the default.

Files and filegroups are useful for controlling the placement of data and indexes and for eliminating device contention. Quite a few installations also use files and filegroups as a mechanism that is more granular than the database in order to exercise more control over their database backup and recovery strategy.

Horizontal partitioning (table)

Horizontal partitioning divides a table into multiple tables, each containing the same number of columns but fewer rows. How to partition a table horizontally depends on how the data is analyzed. As a general rule, tables should be partitioned so that queries reference as few tables as possible. Otherwise, excessive UNION queries, which are used to merge the tables logically at query time, can hurt performance.

For example, assume that business requirements dictate that we store a rolling ten years of transaction data in the central fact table of our data warehouse. Ten years of transaction data for our company amounts to more than one billion rows. A billion of anything is difficult to manage. Now consider that every year we must remove the tenth year of data and load the latest year.

A common approach is to create ten separate but identically structured tables, each holding one year of data. The administrator then defines a union view on top of the ten tables so that end users see all of the data as if it were stored in a single table, although in fact it is not. Any query against the view is optimized to search only the specified years (and the corresponding tables). The administrator, however, gains manageability and can now manage each year of data independently. Each year of data can be loaded, indexed, or maintained on its own. Adding a new year is simple: drop the view, drop the table that holds the tenth year of data, load the table for the new year, and then redefine the view to include the new year.
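The yearly rollover described above can be sketched as a small script that generates the T-SQL for the member tables and the union view. This is a minimal illustration, not the method prescribed by the guide: the names sales_all, sales_<year>, and the three columns are hypothetical, and it assumes each member table carries a CHECK constraint on the year column so that the optimizer can skip tables that cannot match a query.

```python
def yearly_table_ddl(prefix: str, year: int) -> str:
    """CREATE TABLE text for one year's member table (hypothetical columns)."""
    return (
        f"CREATE TABLE {prefix}_{year} (\n"
        f"    sale_id INT NOT NULL,\n"
        f"    sale_year INT NOT NULL CHECK (sale_year = {year}),\n"
        f"    amount MONEY NOT NULL,\n"
        f"    PRIMARY KEY (sale_id, sale_year)\n"
        f")"
    )


def union_view_ddl(view_name: str, prefix: str, years: list) -> str:
    """CREATE VIEW text that unions all member tables into one logical table."""
    selects = [f"SELECT * FROM {prefix}_{year}" for year in years]
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


def roll_forward(years: list, new_year: int) -> list:
    """Drop the oldest year and append the new one, keeping the window size."""
    return sorted(years)[1:] + [new_year]


if __name__ == "__main__":
    years = list(range(1991, 2001))      # ten member tables, 1991 through 2000
    years = roll_forward(years, 2001)    # retire 1991, add 2001
    print(f"DROP VIEW sales_all")
    print(f"DROP TABLE sales_1991")
    print(yearly_table_ddl("sales", 2001))
    print(union_view_ddl("sales_all", "sales", years))
```

Generating the statements rather than typing them by hand keeps the view definition and the set of member tables in step each year; the generated text would still need to be reviewed and run against the server separately.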

When you partition data across multiple tables or multiple servers, queries that access only part of the data can run faster because there is less data to scan. If the tables are on different servers, or on a computer with multiple processors, each table involved in the query can also be scanned in parallel, which improves query performance. In addition, maintenance tasks (for example, rebuilding an index or backing up a table) can run more quickly.

By using a partitioned view, the data still appears as a single table and can be queried as such without referencing the correct underlying table manually. A partitioned view is updatable if either of the following conditions is met. For more information about partitioned views and their restrictions, see SQL Server Books Online.

An INSTEAD OF trigger is defined on the view with logic to support INSERT, UPDATE, and DELETE statements.

The view and the INSERT, UPDATE, and DELETE statements follow the rules defined for updatable partitioned views.

Separating nonclustered indexes

Indexes reside in B-tree structures, which can be separated from their related database tables (except for clustered indexes) by using the ALTER DATABASE command to set up a distinct filegroup. In the example below, the first ALTER DATABASE creates a filegroup. The second ALTER DATABASE adds a file to the newly created filegroup.

ALTER DATABASE testdb ADD FILEGROUP testgroup1
ALTER DATABASE testdb ADD FILE (NAME = 'testfile', FILENAME = 'e:\mssql7\test1.ndf') TO FILEGROUP testgroup1

After a filegroup and its associated files have been created, the filegroup can be used to store an index by specifying the filegroup when the index is created.

CREATE TABLE test1 (col1 CHAR(8))
CREATE INDEX index1 ON test1 (col1) ON testgroup1

sp_helpfile returns information about the files and filegroups in a given database. The output of sp_help for a table includes a section that provides information about the table's indexes and their filegroup relationships.

sp_helpfile
sp_help test1

Parallel data retrieval

SQL Server can scan data in parallel when running on a computer that has multiple processors. Multiple parallel scans can be performed on a single table if the table is in a filegroup that contains multiple files. Whenever a table is accessed sequentially, a separate thread is created to read each file in parallel. For example, a full scan of a table created on a filegroup that contains four files uses four separate threads to read the data in parallel. Therefore, creating multiple files for each filegroup can help improve performance, because a separate thread is used to scan each file in parallel. Similarly, when a query joins tables on different filegroups, each table can be read in parallel, which improves query performance.

In addition, any text, ntext, or image columns in a table can be created on a filegroup other than the one that contains the base table.

Eventually a saturation point is reached, when there are too many parallel threads and the disk I/O subsystem becomes a bottleneck. These bottlenecks can be identified by using System Monitor to monitor the PhysicalDisk object and the Disk Queue Length counter. If the Disk Queue Length counter is greater than three, consider reducing the number of files.

To improve throughput through parallel access to multiple files, it is advantageous to spread as much data as possible across as many physical drives as possible. To spread data evenly across all disks, first set up hardware-based disk striping, and then use filegroups to spread the data across multiple hardware stripe sets if needed.

Parallel inquiry suggestion

SQL Server automatically performs queries in parallel. This will optimize the query on the multiprocessor computer. Work will be subdivided into multiple threads (affected by threads and memory availability), rather than a query to use an operating system thread, which will be faster and more efficient when completing complex queries. The optimizer in SQL Server will generate a plan for the query and determine when the query will be executed in parallel. The following conditions are determined:

Does the computer have multiple processors? Is there enough memory to perform a query in parallel? What is the CPU load on the server? Which type of query is being running?

If SQL Server is allowed to run parallel operations in parallel (eg DBCC and create indexes), the pressure of the server resources will be changed, and when you perform a heavy parallel operation task, you may see the warning message. If a warning message for resource insufficient resources often occurs in the server error log, consider using the system monitor to investigate which resources (for example, memory, CPU usage and I / O usage) are available.

Do not run a lot of queries in parallel when there is an active user on the server. Try to perform maintenance jobs (for example, DBCC and create indexes) in a time period without a load. These homework can be performed in parallel. Monitor disk I / O performance. Observe the disk queue length in the System Monitor (in Windows NT 4.0 for Performance Monitor), determine whether to upgrade the hard disk or reset the database to different disks. If the CPU has very high usage, upgrade or add more processors.

The following server configuration options may affect the parallel execution of the query:

cost threshold for parallelism
max degree of parallelism
max worker threads
query governor cost limit

Optimizing data loads

Several tips and techniques can speed up data loading. They vary depending on whether you are performing an initial data load or an incremental one. In general, incremental loads are more complex and more restrictive. The approach you choose may also depend on factors outside your control: processing window requirements, the chosen storage configuration, limitations of the server hardware, and so on all affect the options available to you.

Some points apply to both initial and incremental data loads. The following topics are discussed in detail below:

Choosing an appropriate database recovery model
Using bcp, BULK INSERT, or the bulk copy API
Controlling locking behavior
Loading data in parallel
Miscellaneous, including:
    Bypassing referential integrity checks (constraints and triggers)
    Loading presorted data
    Effects of removing indexes

Choosing an appropriate database recovery model

Database recovery models were discussed in the "Influence Configuration Options" section. Remember that the recovery model you choose can have a significant impact on the time a data load takes. The recovery model mainly controls how much data is written to the transaction log. This matters because every row written to the transaction log is write work on top of writing the data itself.

Logged and minimally logged bulk copy operations

Under the Full recovery model, all row inserts performed by one of the bulk data load mechanisms (discussed below) are written to the transaction log. For large data loads, this can fill the transaction log rapidly. To help prevent the transaction log from running out of space, you can perform a minimally logged bulk copy operation. Whether a bulk copy runs as logged or nonlogged is not specified as part of the bulk copy operation itself; it depends on the state of the database and of the table involved. A nonlogged bulk copy occurs if all of the following conditions are met:

The recovery model is Simple or Bulk-Logged, or the database option select into/bulkcopy is set to true.
The target table is not being replicated.
The target table has no indexes, or, if it has indexes, it is empty when the bulk copy starts.
The TABLOCK hint is specified, using bcp_control with eOption set to BCPHINTS.

Any bulk copy into an instance of SQL Server that does not meet all of these conditions is fully logged.

When performing an initial data load, always operate under the Bulk-Logged or Simple recovery model. For incremental data loads, consider the Bulk-Logged model as long as the potential for data loss is low. Because many data warehouses are essentially read-only or see little write activity, setting the database recovery model to Bulk-Logged generally poses no problem.

Using bcp, BULK INSERT, or the bulk copy API

SQL Server provides two mechanisms for moving data in bulk: the bcp utility and the BULK INSERT statement. bcp is a command prompt utility that copies data into or out of SQL Server. In SQL Server 2000, the bcp utility was rewritten to use the ODBC bulk copy application programming interface (API). Earlier versions of bcp were written using the DB-Library bulk copy API.

BULK INSERT is a Transact-SQL statement included with SQL Server that can be executed from within the database environment. Unlike bcp, BULK INSERT can only pull data into SQL Server; it cannot push data out. An advantage of BULK INSERT is that you can copy data into an instance of SQL Server with a Transact-SQL statement, without leaving the query environment for the command prompt.

The third option is the bulk copy API, which is usually of most interest to programmers. These APIs let programmers move data into or out of SQL Server using ODBC, OLE DB, SQL-DMO, or even DB-Library.

All of these options let you control the batch size. Unless you are working with small volumes of data, get into the habit of specifying a batch size for recoverability reasons. If no batch size is specified, SQL Server commits all rows to be loaded as a single batch. For example, suppose you try to load 1,000,000 rows of new data into a table and the server suddenly fails after processing row 999,999. When the server recovers, those 999,999 rows must be rolled back out of the database before you can attempt to reload the data. By specifying a batch size of 10,000 you save recovery time: rows 1 through 990,000 have already been committed, so only 9,999 rows need to be rolled back instead of 999,999. You also save reload time. Without a batch size, you must restart the load from row 1. With a batch size of 10,000 rows, you restart the load at row 990,001, bypassing the 990,000 rows already committed.
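The arithmetic in this example can be checked with a small model. The sketch below is plain Python, not SQL Server code; the function name and the returned field names are hypothetical, and it assumes that every full batch completed before the failure was committed.

```python
from typing import Optional


def recovery_after_failure(rows_processed: int, batch_size: Optional[int]) -> dict:
    """Model what survives when a load fails after `rows_processed` rows."""
    if batch_size is None:
        # No batch size: the whole load is one batch, so nothing is committed yet.
        committed = 0
    else:
        # Only complete batches have been committed.
        committed = (rows_processed // batch_size) * batch_size
    return {
        "committed": committed,
        "rolled_back": rows_processed - committed,
        "restart_at_row": committed + 1,
    }


no_batch = recovery_after_failure(999_999, None)
with_batch = recovery_after_failure(999_999, 10_000)
print(no_batch)    # everything processed so far is rolled back
print(with_batch)  # only the incomplete final batch is rolled back
```

With a batch size of 10,000 the model reports 990,000 committed rows, 9,999 rows to roll back, and a restart at row 990,001, which matches the figures in the text.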

Controlling locking behavior

The bcp utility and the BULK INSERT statement accept the TABLOCK hint, which lets the user specify the locking behavior to use. TABLOCK specifies that a bulk update table-level lock is taken for the duration of the bulk copy operation. Using TABLOCK reduces lock contention on the table and so can improve the performance of the bulk copy operation. This setting is especially significant when a single table is loaded in parallel (discussed in the next section).

For example, to bulk copy the data from the authors.txt data file into the authors2 table in the pubs database with a table-level lock, execute the following from the command prompt:

bcp pubs..authors2 in authors.txt -c -t, -Sservername -Usa -Ppassword -h "TABLOCK"

Alternatively, you can use the BULK INSERT statement from a query tool such as SQL Query Analyzer to bulk copy data, as shown in the following example:

BULK INSERT pubs..authors2 FROM 'c:\authors.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR = ',', TABLOCK)

If TABLOCK is not specified, the default locking uses row-level locks, unless the table lock on bulk load option is set to on for the table. Setting the table lock on bulk load option with the sp_tableoption stored procedure is another way to set the locking behavior of a table during a bulk load.

Table lock on bulk load    Table locking behavior
Off                        Row-level locks are used
On                         Table-level lock is used

Note If the TABLOCK hint is specified, it overrides the setting declared with sp_tableoption for the duration of the bulk load.

Loading data in parallel

Parallel load: nonpartitioned table

Any of the bulk data load mechanisms in SQL Server can load data into a single nonpartitioned table in parallel. You do this by running multiple data loads simultaneously. Before starting, split the data to be loaded into multiple separate files (the data sources for the bulk insert). All of the separate load operations can then be started at the same time so that the data loads in parallel.

For example, assume you need to load a consolidated database for a service company that operates in four global regions, each of which reports the hours billed to clients each month. For a large service organization, this can represent a large amount of transactional data to consolidate. If each of the four reporting regions provides a separate file, the four files can be loaded simultaneously into a single table using the method described above.

Note The number of parallel threads (loads) should not exceed the number of processors available to SQL Server.
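The pattern described above (one source file per region, all loads started together, and no more load streams than processors) can be sketched as a small Python model. This is an illustration only: the region names, the in-memory "files", and the load_file function are hypothetical stand-ins for real data files and for bcp or BULK INSERT sessions.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# Hypothetical stand-ins for four regional data files, one list of rows each.
region_files = {
    region: [(region, hour) for hour in range(1000)]
    for region in ("north", "south", "east", "west")
}

table = []           # stands in for the single target table
table_lock = Lock()  # stands in for the bulk update table-level lock


def load_file(rows):
    """One load stream: append every row of one source file to the table."""
    with table_lock:
        table.extend(rows)
    return len(rows)


# Never start more load streams than there are processors available.
max_streams = min(len(region_files), os.cpu_count() or 1)

with ThreadPoolExecutor(max_workers=max_streams) as pool:
    loaded = list(pool.map(load_file, region_files.values()))

print(max_streams, sum(loaded), len(table))
```

If fewer processors than files are available, the pool simply queues the remaining files until a stream is free, which keeps the number of concurrent loads within the limit given in the note.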

The following illustration shows parallel loading of a nonpartitioned table.

[Figure: dbfile01.gif, parallel load into a nonpartitioned table]

Parallel load: horizontally partitioned table

This section focuses on how horizontally partitioned tables can speed up data loads. The previous section discussed loading data from multiple files into a single (nonpartitioned) table. Partitioning the table can reduce device contention and keep the data contiguous, which speeds up the load. Although the figure above shows the data being loaded into different sections of the table, that depiction may not be accurate. If all three threads in the load above run simultaneously, the extents allocated to the table are likely to end up intermingled. Intermingled data can lead to less than optimal performance when the data is retrieved, because the data is not stored in physically contiguous order and the system may need nonsequential I/O to access it. Building a clustered index on the table would solve this problem, because the data is read in, sorted in key order, and written back in contiguous order. However, reading, sorting, deleting the old data, and writing the newly sorted data back can be a time-consuming task (see "Loading presorted data" below). To avoid the intermingling, consider using filegroups to reserve multiple contiguous chunks of space where you can store a large table. Many installations also use filegroups to separate index data from table data.

To illustrate, assume a data warehouse allocated on one large physical partition. Any load operations that run in parallel against that database are likely to leave the affected data and index pages stored in a noncontiguous (intermingled) state. Which operations have this effect? Any operation that modifies data causes it to become noncontiguous. To meet processing window requirements, users may attempt to run initial data loads, incremental data loads, index creation, index maintenance, inserts, updates, deletes, and so on in parallel.

The following illustration shows a table partitioned across multiple filegroups.

[Figure: dbfile02.gif, table partitioned across multiple filegroups]

Loading presorted data

Earlier versions of SQL Server provided a SORTED_DATA option that could be specified when creating an index. This option has been eliminated in SQL Server 2000. In earlier versions, the advantage of specifying it as part of the CREATE INDEX statement was that it let you avoid a sort step during index creation. By default, SQL Server sorts the data in the table while the index is being created. To get the same effect in SQL Server 2000, consider creating the clustered index before bulk loading the data. Bulk operations in SQL Server 2000 use enhanced index maintenance strategies that improve data import performance on tables with an existing clustered index and eliminate the need to re-sort the data after the import.

Impact of FILLFACTOR and PAD_INDEX on data loads

FILLFACTOR and PAD_INDEX are explained more fully in the section titled "Index and index maintenance". The key thing to remember about them is that leaving them at their default settings when you create an index may cause SQL Server to perform more write and read I/O operations than are necessary to store the data. This is especially the case in data warehouses that have little write activity but heavy read activity. To make SQL Server write more data to each data or index page, specify a particular FILLFACTOR when creating the index. It is a good idea to specify PAD_INDEX whenever you provide a FILLFACTOR value.
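To see why the fill factor affects read I/O, a rough page-count estimate helps. The sketch below is a simplified Python model, not a SQL Server formula: it assumes about 8,096 usable bytes per 8-KB page, fixed-size rows, and no per-row overhead, and the function name leaf_pages is hypothetical.

```python
import math

# Assumption: usable bytes on an 8 KB page after the page header.
PAGE_DATA_BYTES = 8096


def leaf_pages(row_count: int, row_bytes: int, fillfactor: int) -> int:
    """Estimate the pages needed when each page is filled to `fillfactor` percent."""
    usable = PAGE_DATA_BYTES * fillfactor // 100
    rows_per_page = max(1, usable // row_bytes)
    return math.ceil(row_count / rows_per_page)


pages_full = leaf_pages(1_000_000, 100, 100)  # pages packed completely
pages_half = leaf_pages(1_000_000, 100, 50)   # pages left half empty

print(pages_full, pages_half)
```

Under these assumptions, halving the fill factor doubles the number of pages a scan must read, which is why a read-mostly data warehouse usually benefits from fuller pages.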

General guidelines for initial data loads

While loading data

Remove indexes (the one exception may be when loading presorted data; see above).
Use BULK INSERT, bcp, or the bulk copy API.
Load in parallel from partitioned data files into partitioned tables.
Run one load stream for each available CPU.
Set the Bulk-Logged or Simple recovery model.
Use the TABLOCK option.

After loading data

Create indexes.
Switch to the appropriate recovery model.
Perform a backup.

General guidelines for incremental data loading

Load the data with the indexes in place.
Determine the locking granularity (sp_indexoption) based on performance and concurrency requirements.
Change the recovery model from Full to Bulk-Logged unless there is an overriding need to preserve point-in-time recovery (for example, online users modify the database during the bulk load).
Read operations should not affect the bulk load.

Indexes and index maintenance

The I/O characteristics of the server hardware were discussed earlier. This section discusses how SQL Server data and index structures are physically placed on disk drives. If you want to improve performance after the design is complete, index placement is likely to be the single biggest factor affecting your data warehouse.

Types of indexes in SQL Server

Although SQL Server 2000 introduces several new index types, all of them are based on two core forms: clustered and nonclustered. The two primary index types available to database designers in SQL Server are:

Clustered indexes
Nonclustered indexes

Variants of these two primary types include:

Unique indexes
Indexes on computed columns
Indexed views
Full-text indexes

The following sections describe each of these index types except full-text indexes. Full-text indexes are a special case, unlike other database indexes, and are not covered here. The indexed view is a new type of index in SQL Server 2000 that should be of particular interest to data warehouse users. Another new feature in SQL Server 2000 is the ability to create indexes in ascending or descending order.

How indexes work

An index in a database is similar to an index in a book. In a book, an index lets you find information quickly without reading the whole book. In a database, an index lets the database program find data in a table without scanning the entire table. An index in a book is a list of words with the page numbers that contain each word. An index in a database is a list of values in a table with the storage locations of the rows that contain each value.

Indexes can be created on a single column or a combination of columns in a table and are implemented as B-trees. An index contains an entry with one or more columns (the search key) from each row in the table. A B-tree is stored in ascending or descending order of the search key (depending on the option selected when the index was created) and can be searched efficiently on any leading subset of that search key. For example, an index on columns A, B, and C can be searched efficiently on A; on A and B; and on A, B, and C.
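The leading-subset rule can be illustrated with a sorted list standing in for the B-tree. This is a minimal Python sketch rather than SQL Server's actual structure; the sample keys, the row identifiers, and the seek_prefix helper are all hypothetical.

```python
from bisect import bisect_left

# Index entries sorted by the search key (a, b, c); the last element is a row id.
index = sorted([
    (1, "x", 10, "r1"),
    (1, "y", 20, "r2"),
    (2, "x", 10, "r3"),
    (2, "x", 30, "r4"),
    (3, "z", 10, "r5"),
])


def seek_prefix(entries, prefix):
    """Binary-search to the first entry matching a leading subset of the key."""
    width = len(prefix)
    start = bisect_left(entries, prefix)
    rows = []
    for entry in entries[start:]:
        if entry[:width] != prefix:
            break  # entries are sorted, so no later entry can match
        rows.append(entry[-1])
    return rows


on_a = seek_prefix(index, (2,))            # search on A
on_a_b = seek_prefix(index, (2, "x"))      # search on A and B
on_a_b_c = seek_prefix(index, (2, "x", 30))  # search on A, B, and C

# A condition on B alone is not a leading subset: every entry must be examined.
on_b_only = [entry[-1] for entry in index if entry[1] == "x"]

print(on_a, on_a_b, on_a_b_c, on_b_only)
```

The three prefix searches jump straight to the matching entries because the list is ordered by A first, while the search on B alone has to walk the whole list.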

When you create a database and tune it for performance, create indexes on the columns that queries use to find data. In the pubs sample database provided with SQL Server, the employee table has an index on the emp_id column. When a user executes a statement that finds data in the employee table based on a specified emp_id value, the SQL Server query processor recognizes the index on the emp_id column and uses it to find the data. The following illustration shows how the index stores each emp_id value and points to the rows of data in the table with the corresponding value.

[Figure: fund01.gif, index on the emp_id column pointing to rows of the employee table]

However, tables with indexes require more storage space in the database. Also, commands that insert, update, or delete data take longer and require more processing time, because the indexes must be maintained. When you design and create indexes, make sure that the performance benefits outweigh the extra cost in storage space and processing resources.

Index intersection

A unique feature of the SQL Server query processor is the ability to perform index intersection. It is a special form of index covering, which is explained in detail later, but it is worth mentioning now for two reasons. First, it is a technique that may influence your index design strategy. Second, it can reduce the number of indexes you need, which saves disk space for large databases.

Index intersection allows the query processor to use multiple indexes to resolve a query. Most database query processors use only one index when attempting to resolve a query. SQL Server can combine multiple indexes from a given table or view, build a hash table based on those indexes, and use the hash table to reduce I/O for a given query. In effect, the hash table that results from index intersection becomes a covering index and provides the same I/O performance benefits that a covering index does. Index intersection offers greater flexibility in database user environments where it is difficult to predetermine all of the queries that will be run against the database. A good strategy in this case is to define nonclustered indexes on all the columns that are frequently queried and let index intersection handle the situations where a covering index is needed.

The following example uses an index intersection:

CREATE INDEX IndexName1 ON Table1 (col2)
CREATE INDEX IndexName2 ON Table1 (col3)
SELECT col3 FROM Table1 WHERE col2 = 'value'

When the above query is executed, the two indexes can be combined to resolve the query quickly and efficiently.
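The idea behind this example can be modeled in a few lines of Python. The sketch is only an illustration of joining two indexes on the row identifier through a hash table; it does not reproduce how SQL Server implements index intersection, and the sample rows and function name are hypothetical.

```python
# Hypothetical rows of Table1 keyed by row id: (col1, col2, col3).
table1 = {
    1: ("a", "value", 10),
    2: ("b", "other", 20),
    3: ("c", "value", 30),
    4: ("d", "value", 10),
}

# IndexName1 on col2: key value -> row ids.
index_col2 = {}
for rid, row in table1.items():
    index_col2.setdefault(row[1], []).append(rid)

# IndexName2 on col3: (key value, row id) pairs in key order.
index_col3 = sorted((row[2], rid) for rid, row in table1.items())


def select_col3_where_col2(value):
    """Answer the query from the two indexes alone, never reading table1."""
    wanted = set(index_col2.get(value, []))          # seek on IndexName1
    by_rid = {rid: key for key, rid in index_col3}   # hash table from IndexName2
    return sorted(by_rid[rid] for rid in wanted)     # join on the row id


result = select_col3_where_col2("value")
print(result)
```

Because col2 is found through one index and col3 through the other, the pair behaves like a single covering index for this query, even though neither index covers it alone.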

Index structure in SQL Server

All indexes in SQL Server are physically built on a B-tree structure stored on 8-KB index pages. Each index page has a page header followed by index rows. Each index row contains a key value and a pointer to either a lower-level index page or an actual data row. Each page in an index is also referred to as an index node. The top node of the B-tree is called the root node. The bottom-level nodes in the index are called leaf nodes. Any index levels between the root and the leaves are collectively known as intermediate levels or nodes. The pages in each level of the index are linked together in a doubly linked list.

SQL Server data pages and index pages are both 8 KB in size. A data page contains all the data associated with the rows of a table, except text and image data. By default, a data page holding a row with a text or image column contains a pointer to a B-tree structure of one or more 8-KB pages that hold the text or image data. A new feature in SQL Server 2000 is the ability to store small text and image values in the row, meaning that small text or image columns are stored on the data page itself. This can reduce I/O, because the additional I/O needed to fetch the corresponding text or image data is avoided. For information about how to set a table to store text or image data in the row, see SQL Server Books Online.

Clustered indexes

Clustered indexes are very useful for retrieving ranges of data values from a table. Nonclustered indexes are best suited to retrieving a specific row, whereas clustered indexes are best suited to retrieving a range of rows. However, because only one clustered index is allowed per table, this simple logic cannot always decide which type of index to create. There is a simple physical reason for the restriction. While the upper levels of the clustered index B-tree are organized like those of a nonclustered index, the bottom level of a clustered index is made up of the actual 8-KB data pages of the table. The exception is a clustered index created on a view. Because indexed views are explained below, this discussion covers clustered indexes created on actual tables. When a clustered index is created on a table, the data associated with the table is read, sorted in the order of the index search key, and physically stored back to the database in that order. Because the table's data can be persisted to storage in only one order without duplicating it, a table can have only one clustered index. The following figure shows the storage of a clustered index.

[Figure: bokind2.gif, storage of a clustered index]

Clustered indexes and performance

Clustered indexes have some inherent characteristics that affect performance.

When SQL Server retrieves data through a clustered index based on the search key, no pointer jump (with a possible nonsequential change of location on the hard disk) is required to reach the associated data page. This is because the leaf level of the clustered index is, in fact, the associated data page.

As mentioned earlier, the leaf level (and consequently the data of the table or indexed view) is physically sorted and stored in the same order as the search key. Because the leaf level of the clustered index contains the actual 8-KB data pages of the table, the row data of the entire table is physically arranged on the disk drive in the order determined by the clustered index. This provides a potential I/O performance advantage when fetching a large number of rows from the table based on the value of the clustered index, because sequential disk I/O is used (unless page splitting is occurring on the table, which is discussed in the section "FILLFACTOR and PAD_INDEX"). For this reason, choose the clustered index on a table based on the column that will be used for range scans that retrieve a large number of rows.

The rows associated with a clustered index must be sorted and stored in the same order as the index search key, which has the following implications:

When you create a clustered index, the table is copied, the data in the table is sorted, and then the original table is deleted. Therefore, the database must have enough free space to hold a copy of the data.
By default, the data in the table is sorted when the index is created. However, if the data is already sorted in the correct order, the sort operation is skipped automatically. This can speed up index creation dramatically.
Whenever possible, load the data into the table in the same order as the search key you intend to use for the clustered index. On large tables, such as those that often characterize data warehouses, this approach greatly speeds up index creation and so shortens the time needed for the initial data load. The same approach can be used when dropping and rebuilding a clustered index, as long as the rows of the table remain in sorted order while the clustered index is not in place. If any rows are not correctly sorted, the operation is canceled, an appropriate error message is returned, and the index is not created.
Building a clustered index on sorted data also requires much less I/O, because the data does not have to be copied, sorted, stored back into the database, and the old table data then deleted. Instead, the data is left in the extents where it was originally allocated. Index extents are simply added to the database to store the top-level and intermediate nodes.

Note The preferred way to build indexes on large tables is to build the clustered index first and then the nonclustered indexes. In this way, no nonclustered index needs to be rebuilt because the data has moved. When dropping all indexes, drop the nonclustered indexes first and the clustered index last. That way, no indexes need to be rebuilt.

Nonclustered indexes

Nonclustered indexes are most useful for fetching a few rows, with good selectivity, from a large SQL Server table based on a specific key value. As mentioned earlier, a nonclustered index is a B-tree formed out of 8-KB index pages. The bottom, or leaf, level of the B-tree contains all the data from the columns that make up the index. When a nonclustered index is used to retrieve information from a table based on a match with the key value, the index B-tree is traversed until a key match is found at the leaf level of the index. If columns are required from the table that are not part of the index, a pointer jump is made. That pointer jump is likely to require nonsequential I/O on the disk, and may even require reading data from another disk, especially if the table and its accompanying index B-trees are large. If multiple pointers lead to the same 8-KB data page, the I/O penalty is smaller, because the page needs to be read into the data cache only once. For each row returned by an SQL query that searches with a nonclustered index, at least one pointer jump is required.

Note Because of the overhead of each pointer jump, nonclustered indexes are better suited to queries that return only one or a few rows from a table. Clustered indexes are better suited to queries that require a range of rows.

The following figure shows the storage of a nonclustered index. Notice the added leaf level that points to the corresponding data pages. This is where the extra pointer jump to the data occurs when a nonclustered index is used rather than a clustered index. For more information about nonclustered indexes, see SQL Server Books Online.

[Figure: bokind1.gif, storage of a nonclustered index]

Unique indexes

Both clustered and nonclustered indexes can be used to enforce uniqueness within a table by specifying the UNIQUE keyword when creating an index on an existing table. A UNIQUE constraint is another way to ensure uniqueness within a table. Like unique indexes, UNIQUE constraints enforce the uniqueness of each value in a column. In fact, assigning a UNIQUE constraint automatically creates an underlying unique index to enforce the constraint. Because uniqueness can then be defined and documented as part of the CREATE TABLE statement, a UNIQUE constraint is often preferred over creating a separate unique index.

Indexes on computed columns

SQL Server 2000 introduces the ability to create indexes on computed columns. This is convenient when queries are commonly submitted against a computed column but the administrator would prefer not to persist the data in an actual table column simply to allow an index to be created. In this case, the index can be created by referencing the computed column, as long as the column satisfies all the conditions required for indexing. Among other restrictions, the computed column expression must be deterministic and precise, and must not evaluate to the text, ntext, or image data types.

Deterministic

A view or computed column cannot invoke a nondeterministic user-defined function if you want to create an index on that view or computed column. All functions are either deterministic or nondeterministic:

Deterministic functions always return the same result any time they are called with a specific set of input values.
Nondeterministic functions may return different results each time they are called with a specific set of input values.

For example, the DATEADD built-in function is deterministic because it always returns the same result for any given set of argument values for its three parameters. GETDATE is not deterministic: it is always invoked with the same arguments, yet the value it returns changes each time it is executed.

Precise

A computed column expression is precise if both of the following conditions are met:

It is not an expression of the float data type.
It does not use a float data type in its definition. For example, in the following statement, column y is int and deterministic, but not precise:

CREATE TABLE t2 (a int, b int, c int, x float,
    y AS CASE x WHEN 0 THEN a WHEN 1 THEN b ELSE c END)

The IsPrecise property of the COLUMNPROPERTY function reports whether a computed_column_expression is precise.

Note Any float expression is considered nonprecise and cannot be the key of an index; a float expression can be used in an indexed view, but not as a key. The same rule applies to computed columns. Any function, expression, user-defined function, or view definition is considered nondeterministic if it contains any float expression, including logical expressions (comparisons).
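The guide does not explain why float expressions are treated as nonprecise. One relevant background fact is that float is an approximate numeric type, so equality comparisons on float values are unreliable. The short Python sketch below illustrates that general floating-point behavior; it is not SQL Server code, and the variable names are hypothetical.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
float_equal = float_total == 0.3

# An exact decimal type gives the result most people expect.
decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_equal = decimal_total == Decimal("0.3")

print(float_equal, decimal_equal)  # prints "False True"
```

A key that cannot be compared reliably for equality is a poor basis for an index, which is consistent with the restriction described in the note above.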

Creating an index on a computed column or view can cause an INSERT or UPDATE operation that previously worked to fail. Such a failure can occur when the computed column results in an arithmetic error. For example, although computed column c in the following table results in an arithmetic error, the INSERT statement works:

CREATE TABLE t1 (a int, b int, c AS a/b)
GO
INSERT INTO t1 VALUES ('1', '0')
GO

If, instead, you create an index on computed column c after creating the table, the same INSERT statement fails.

CREATE TABLE t1 (a int, b int, c AS a/b)
GO
CREATE UNIQUE CLUSTERED INDEX Idx1 ON t1 (c)
GO
INSERT INTO t1 VALUES ('1', '0')
GO

Indexed views

An indexed view is a view whose result set is persisted in the database and indexed for fast access. Like any other view, an indexed view depends on base tables for its data. This dependency means that an indexed view can become invalid if a base table that supplies its data is changed. For example, renaming a column that contributes to a view invalidates the view. To prevent such problems, SQL Server supports creating views with schema binding. Schema binding prohibits any table or column modification that would invalidate the view. Any indexed view you create with the View Designer automatically gets schema binding, because SQL Server requires that indexed views have schema binding. Schema binding does not mean that you cannot modify the view; it means that you cannot modify the underlying tables or views in ways that would change the view's result set. In addition, like indexes on computed columns, indexed views must be deterministic and precise, and must not contain text, ntext, or image columns.

Indexed views work best when the underlying data is infrequently updated. Maintaining an indexed view can cost more than maintaining a table index. If the underlying data is updated frequently, the cost of maintaining the indexed view data may outweigh the performance benefits of using the indexed view.

Indexed views improve the performance of the following types of queries:

Handle multi-line coupled and aggregation. Many queries often perform joints and aggregation operations. For example, in an OLTP database of a record list, many queries are expected to join the Parts, Partsupplier, and Suppliers tables. Although each query that implements this connection does not necessarily handle many rows, the joint processing of thousands of queries is still very large. Because these relationships are often updated, the overall performance of the entire system can be improved by defining the index view of the storage link result. Decision supports workload. The analysis system is characterized by storing summary data, aggregating data. Many decisions support queries are characterized by further aggregation data and many rows.

Indexing views usually do not improve the performance of the following queries:

The OLTP system that is often written. A database that is often updated. Does not involve aggregation or joint queries. A data aggregation of a high degree of bond. The degree of base is high means that the key contains many different values. The only key has the highest level of the base, because the value of each key is different. The index view improves performance by reducing the number of rows that the query must access. If the number of rows of view results set is almost the same as the number of rows of the base table, there is almost no performance income without any performance income. For example, for tables with 1,000 rows, use this query: SELECT PRIKEY, SUM (Salescol) from exampleTableGroup By Prikey If the base of the table key is 100, the index view generated by using the result of this query is only 100 lines. The average number of queries required to use this view is one tenth of the number of reads. If the key is a unique key, the base of the key is 1000, and the view result set will return 1000 lines. Use this index view without directly reading the base table, the query will not bring any performance improvements. Expand the join, these coupling is a view of the result set greater than the original data in the base table.
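Before building an indexed view over an aggregation, it can help to compare the number of distinct grouping-key values with the number of rows in the base table. The following is a minimal sketch that reuses the ExampleTable and PriKey names from the cardinality example above; the column aliases are illustrative only.

```sql
-- Compare the row count with the cardinality of the grouping key.
-- If distinct_keys is close to total_rows, an indexed view grouped
-- on PriKey would be almost as large as the base table and would
-- save few reads.
SELECT COUNT(*)               AS total_rows,
       COUNT(DISTINCT PriKey) AS distinct_keys
FROM ExampleTable
```

A small ratio of distinct_keys to total_rows suggests that the aggregated view would be much smaller than the table, which is the situation in which an indexed view tends to pay off.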

Design indexed views that can satisfy multiple operations. Because the optimizer can use an indexed view even when the view is not specified in the FROM clause, a well-designed indexed view can speed the processing of many queries. For example, consider creating an index on the following view:

CREATE VIEW ExampleView (PriKey, SumColx, CountColx)
AS
SELECT PriKey, SUM(Colx), COUNT_BIG(Colx)
FROM MyTable
GROUP BY PriKey

Not only can this view satisfy queries that reference the view columns directly, it can also be used to satisfy queries against the underlying base table that contain expressions such as SUM(Colx), COUNT_BIG(Colx), COUNT(Colx), and AVG(Colx). All such queries run faster because they only have to retrieve the small number of rows in the view rather than reading the full number of rows from the base table.

The first index created on a view must be a unique clustered index. After the unique clustered index has been created, you can create additional nonclustered indexes. The naming conventions for indexes on views are the same as for indexes on tables. The only difference is that the table name is replaced with the view name.

If a view is dropped, all indexes on the view are also dropped. If the clustered index is dropped, all nonclustered indexes on the view are also dropped. Nonclustered indexes can be dropped individually. Dropping the clustered index on the view removes the stored result set, and the optimizer returns to processing the view like a standard view.

Although only the columns that make up the clustered index key are specified in the CREATE UNIQUE CLUSTERED INDEX statement, the complete result set of the view is stored in the database. As in a clustered index on a base table, the B-tree structure of the clustered index contains only the key columns, but the data rows contain all of the columns in the view result set.
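As a sketch of the full sequence, the statements below create a schema-bound variant of the ExampleView shown earlier and then index it. Several details are assumptions rather than part of the original example: the dbo owner, the index names IV_ExampleView and IV_ExampleView_Sum, a Colx column that does not allow NULLs, and the COUNT_BIG(*) column, which SQL Server requires in an indexed view that uses GROUP BY.

```sql
-- Session settings required when creating an index on a view
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO

-- SCHEMABINDING requires two-part (owner.object) names for base tables
CREATE VIEW dbo.ExampleView (PriKey, SumColx, CountAll)
WITH SCHEMABINDING
AS
SELECT PriKey, SUM(Colx), COUNT_BIG(*)
FROM dbo.MyTable
GROUP BY PriKey
GO

-- The first index on a view must be unique and clustered;
-- creating it stores the complete result set of the view
CREATE UNIQUE CLUSTERED INDEX IV_ExampleView
ON dbo.ExampleView (PriKey)
GO

-- Additional nonclustered indexes are optional
CREATE NONCLUSTERED INDEX IV_ExampleView_Sum
ON dbo.ExampleView (SumColx)
GO
```

The clustered index key here is the GROUP BY column, which is unique in the view result set by construction.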

Note You can create indexed views in any edition of SQL Server 2000. In SQL Server 2000 Enterprise Edition, the query optimizer considers indexed views automatically. To use an indexed view in all other editions, you must use the NOEXPAND hint.
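The hint is written as a table hint on the view reference. The following minimal sketch assumes that the ExampleView from the earlier example has already been given a unique clustered index.

```sql
-- WITH (NOEXPAND) tells the optimizer to read the stored result set
-- of the indexed view instead of expanding the view definition
-- into a query against the base table.
SELECT PriKey, SumColx
FROM ExampleView WITH (NOEXPAND)
```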

Covering indexes

A covering index is a nonclustered index built on all of the columns required to satisfy a SQL query, both in the selection criteria and in the WHERE predicate. Covering indexes can save a large amount of I/O and therefore greatly improve query performance. However, you need to balance the cost of creating a new index (and of maintaining its B-tree structure) against the I/O performance gain that the covering index brings. If a covering index greatly benefits a query or a set of queries that run often on SQL Server, creating the covering index is worthwhile.

The following example shows how to use a covering index:

CREATE INDEX IndexName1 ON Table1 (Col2, Col1, Col3)
SELECT Col3 FROM Table1 WHERE Col2 = 'Value'

When this query is executed, only a relatively small number of index pages need to be read to retrieve the requested values. The underlying table does not have to be accessed at all, so the query is resolved very efficiently. In general, a covering index makes sense if it is small (in terms of the number of bytes in the index compared with the number of bytes in a single row of the table) and the queries that use it are executed frequently.
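One way to check whether a query is actually being covered is to compare the logical reads it reports before and after the index is created. The sketch below reuses Table1 and the query from the example above; SET STATISTICS IO is a standard session option, and the read counts it reports depend entirely on your data.

```sql
-- Report the pages read by each statement in this session
SET STATISTICS IO ON
GO

-- With the covering index in place, the logical reads reported for
-- Table1 should reflect index pages only, not the full data rows.
SELECT Col3 FROM Table1 WHERE Col2 = 'Value'
GO

SET STATISTICS IO OFF
GO
```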

Index selection

The choice of indexes significantly affects the amount of disk I/O generated and, as a result, performance. Nonclustered indexes are good for retrieving a small number of rows, and clustered indexes are good for range scans. The following guidelines can help you choose which type of index to use:

Try to keep indexes as compact as possible (the fewest key columns and bytes). This is especially true for the clustered index, because nonclustered indexes use the clustered index as their method of locating row data.
For nonclustered indexes, selectivity is important. If a nonclustered index is created on a large table with only a few unique values, using that index does not save much I/O during data retrieval. In fact, using the index is likely to cause much more I/O than a sequential table scan. Good candidates for a nonclustered index include invoice numbers, unique customer numbers, social security numbers, and telephone numbers.
Clustered indexes are much better than nonclustered indexes for queries that involve range scans, or when a column is frequently used to join with other tables. The reason is that the clustered index physically orders the table data, which allows sequential 64-KB I/O on the key values. Good candidates for a clustered index include states or provinces, company branches, dates of sale, postal codes, and customer districts.
Only one clustered index can be created for a table. If typical queries frequently fetch large sequential ranges from one column of the table and other columns contain unique values, use the clustered index on the first column and nonclustered indexes on the columns with unique values.
The key question to ask when choosing the best column for the clustered index on each table is: "Will there be a lot of queries that need to fetch a large number of rows based on the order of this column?" The answer is specific to each user environment. One company may run many queries based on ranges of dates, whereas another may run many queries based on ranges of bank branches.

Index creation and parallel operations

In SQL Server 2000 Enterprise Edition and Developer Edition, the query plans built for index creation allow parallel, multithreaded index creation operations on computers with multiple microprocessors.

SQL Server uses the same algorithms to determine the degree of parallelism (the total number of separate threads to run) for index creation operations as it does for other Transact-SQL statements. The only difference is that the CREATE INDEX, CREATE TABLE, and ALTER TABLE statements that create indexes do not support the MAXDOP query hint. The maximum degree of parallelism for an index creation is subject to the max degree of parallelism server configuration option, but you cannot set a different MAXDOP value for individual index creation operations.

When SQL Server builds the query plan for creating an index, the number of parallel operations is set to the lowest of the following values:

The number of microprocessors, or CPUs, in the computer.
The number specified in the max degree of parallelism server configuration option.
The number of CPUs that are not already over a threshold of work performed for SQL Server threads.

For example, on a computer that has eight CPUs but where the max degree of parallelism option is set to 6, no more than six parallel threads are generated for an index creation. If five of the CPUs in the computer exceed the threshold of SQL Server work when the execution plan is built, the execution plan specifies only three parallel threads.
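The server-wide limit used in this example is set through sp_configure. The following is a minimal sketch of setting the option to 6, matching the scenario above; the value is illustrative only, and the guide's general recommendation to let SQL Server configure itself still applies unless you have a specific reason to cap parallelism.

```sql
-- max degree of parallelism is an advanced option, so advanced
-- options must be made visible first
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO

-- Limit parallel plans, including index creation plans, to six
-- threads (a value of 0 lets SQL Server use all available CPUs)
EXEC sp_configure 'max degree of parallelism', 6
RECONFIGURE
GO
```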

The main stages of parallel index creation include:

A coordinating thread quickly and randomly scans the table to estimate the distribution of the index keys. The coordinating thread establishes the key boundaries that provide a number of key ranges equal to the degree of parallel operations, where each key range is estimated to cover a similar number of rows. For example, if there are four million rows in the table and the max degree of parallelism option is set to 4, the coordinating thread determines the key values that delimit four sets of rows with one million rows in each set.
The coordinating thread dispatches a number of threads equal to the degree of parallel operations and waits for these threads to complete their work. Each thread scans the base table using a filter that retrieves only rows with key values within the range assigned to that thread. Each thread builds an index structure for the rows in its key range.

After all of the parallel threads have completed, the coordinating thread connects the index subunits into a single index. An individual CREATE TABLE or ALTER TABLE statement can have multiple constraints that require the creation of an index. These multiple index creation operations are performed in series, although each individual index creation operation may be performed in parallel on a computer with multiple CPUs.

Index maintenance

When you create indexes in the database, the index information used by queries is stored in index pages. Sequential index pages are chained together by pointers from one page to the next. When changes are made to the data that affect the index, the index information can become scattered in the database. Rebuilding an index reorganizes the storage of the index data (and of the table data, in the case of a clustered index) to remove fragmentation. This can improve disk performance by reducing the number of page reads required to obtain the requested data.

Fragmentation occurs when large amounts of insert activity, or updates that modify the search key value of the clustered index, are performed. For this reason, it is important to try to maintain some open space on index and data pages to prevent pages from splitting. A page split occurs when an index page or data page can no longer hold any new rows, but a row needs to be inserted into that page because of the logical ordering of the data defined on it. When this happens, SQL Server divides the data on the full page and moves approximately half of it to a new page, so that both the old and the new page have some open space. Because this consumes system resources and time, page splits should be avoided where possible.

When an index is built initially, SQL Server attempts to place the index B-tree structure on pages that are physically contiguous; this allows optimal I/O performance when the index is scanned with sequential I/O. When a page split occurs and a new page needs to be inserted into the logical B-tree structure of the index, SQL Server must allocate a new 8-KB index page. If that page is located somewhere else on the hard disk, the physically sequential nature of the index pages is broken. This causes I/O operations to switch from sequential to nonsequential, which can dramatically reduce performance. Excessive page splitting should be resolved by rebuilding the index to restore the physically sequential order of the index pages. The same behavior can occur on the leaf level of the clustered index, thereby affecting the data pages of the table.

In System Monitor, pay particular attention to "SQL Server: Access Methods - Page Splits/sec". Nonzero values for this counter indicate that page splitting is occurring and that further analysis should be done with DBCC SHOWCONTIG.

The DBCC SHOWCONTIG command can also be used to reveal whether excessive page splitting has occurred on a table. Scan Density is the key indicator that DBCC SHOWCONTIG provides. This value should be as close to 100 percent as possible. If the value is significantly below 100 percent, consider running maintenance on the affected indexes.
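A minimal sketch of running the command is shown below. The table name Orders is hypothetical; substitute a table in the current database.

```sql
-- Report fragmentation, including Scan Density, for every index
-- on the hypothetical Orders table
DBCC SHOWCONTIG (Orders) WITH ALL_INDEXES
GO

-- Return the same information as a rowset, which is easier to
-- capture into a table and compare over time
DBCC SHOWCONTIG (Orders) WITH TABLERESULTS, ALL_INDEXES
GO
```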

DBCC INDEXDEFRAG

One index maintenance option is to use a statement that is new in SQL Server 2000: DBCC INDEXDEFRAG. DBCC INDEXDEFRAG can defragment clustered and nonclustered indexes on tables and views. DBCC INDEXDEFRAG defragments the leaf level of an index so that the physical order of the pages matches the left-to-right logical order of the leaf nodes, which improves index scanning performance.

DBCC INDEXDEFRAG also compacts the pages of an index, taking into account the FILLFACTOR specified when the index was created. Any empty pages created as a result of this compaction are removed.

If an index spans more than one file, DBCC INDEXDEFRAG defragments one file at a time. Pages do not migrate between files. Every five minutes, DBCC INDEXDEFRAG reports an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and any completed work is retained.

Unlike DBCC DBREINDEX (or index building in general), DBCC INDEXDEFRAG is an online operation. It does not hold locks for long periods, so it does not block running queries or updates. A relatively unfragmented index can be defragmented faster than a new index can be built, because the time needed to defragment is related to the amount of fragmentation. A very fragmented index may take considerably longer to defragment than to rebuild. In addition, the defragmentation is always fully logged, regardless of the database recovery model setting (see ALTER DATABASE). Defragmenting a very fragmented index can generate even more log than a fully logged index creation. However, because the defragmentation is performed as a series of short transactions, it does not require a large log if log backups are taken frequently or if the recovery model is set to SIMPLE.

Also, DBCC INDEXDEFRAG does not help if two indexes are interleaved on the disk, because INDEXDEFRAG shuffles the pages in place. To improve the clustering of pages, rebuild the index. For the same reason, DBCC INDEXDEFRAG cannot correct page splits. It essentially reorders index pages that are already allocated into a sequential order that reflects the search key. Index pages can get out of order for a variety of reasons, including unordered data loads and excessive insert, update, and delete activity.
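The statement takes the database, the table, and the index to defragment. The following sketch uses hypothetical names (SalesDB, Orders, IX_Orders_CustomerID) purely for illustration.

```sql
-- Defragment the leaf level of one index while the table stays online.
-- Arguments: database name, table name, index name.
DBCC INDEXDEFRAG (SalesDB, Orders, IX_Orders_CustomerID)
GO
```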

SQL Server Books Online includes a sample piece of code that, with a few modifications, you can use to automate a variety of index maintenance tasks. The example shows a simple way to defragment all indexes in a database that are fragmented above a declared threshold. For more information, see the topic "DBCC SHOWCONTIG" in SQL Server Books Online.

DBCC DBREINDEX

Depending on the syntax used, DBCC DBREINDEX can rebuild a single specified index for a table or all indexes for a table. Although similar in approach to dropping and re-creating individual indexes, the DBCC DBREINDEX statement has the advantage of being able to rebuild all of the indexes for a table in one statement. This is easier than coding individual DROP INDEX and CREATE INDEX statements, and the indexes of a table can be rebuilt without knowledge of the table structure or any assigned constraints. In addition, the DBCC DBREINDEX statement is inherently atomic. To achieve the same atomicity with separate DROP INDEX and CREATE INDEX statements, you must wrap the multiple separate commands in a transaction.

DBCC DBREINDEX automatically takes advantage of more optimizations than individual DROP INDEX and CREATE INDEX statements do, especially when multiple nonclustered indexes reference a table that has a clustered index. DBCC DBREINDEX is also useful for rebuilding indexes that enforce PRIMARY KEY or UNIQUE constraints without having to delete and re-create the constraints (an index created to enforce a PRIMARY KEY or UNIQUE constraint cannot be deleted without deleting the constraint first). For example, you may want to rebuild the index on a PRIMARY KEY constraint to reestablish a given fill factor for the index.
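The two forms described above can be sketched as follows. The database, table, and index names (SalesDB, Orders, PK_Orders) are hypothetical.

```sql
-- Rebuild only the index that enforces the PRIMARY KEY constraint,
-- reestablishing a fill factor of 80, without dropping the constraint
DBCC DBREINDEX ('SalesDB.dbo.Orders', PK_Orders, 80)
GO

-- Rebuild every index on the table in one atomic statement.
-- An empty index name means all indexes; a fill factor of 0 reuses
-- the fill factor each index was originally created with.
DBCC DBREINDEX ('SalesDB.dbo.Orders', '', 0)
GO
```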

DROP_EXISTING

Another way to rebuild or defragment an index is to drop and re-create it. Rebuilding a clustered index by deleting the old index and then re-creating the same index is expensive, because all secondary indexes depend on the clustering key that points to the data rows. If you simply delete the clustered index and then re-create it, you may inadvertently cause all referencing nonclustered indexes to be deleted and re-created twice. The first drop and re-create occurs when you drop the clustered index. The second occurs when you re-create the clustered index.

To avoid this expense, use the DROP_EXISTING clause of CREATE INDEX so that the re-creation is performed in one step. Re-creating the index in a single step tells SQL Server that you are reorganizing an existing index, and avoids the unnecessary work of deleting and re-creating the associated nonclustered indexes. This method also has the significant advantage of using the presorted data from the existing index, which avoids the need to sort the data. This can significantly reduce the time and cost of re-creating the clustered index.

DROP INDEX / CREATE INDEX

The final way to perform index maintenance is simply to drop the index and then re-create it. This option is still widely practiced and may be preferable for people who are familiar with it and who have a processing window large enough to accommodate a full re-creation of all indexes on a table. The drawback of this approach is that you must manually control the events so that they happen in the proper sequence. When you drop and re-create indexes manually, be sure to drop all nonclustered indexes before dropping and re-creating the clustered index. Otherwise, all nonclustered indexes are automatically re-created when you create the clustered index.

One advantage of creating nonclustered indexes manually is that individual nonclustered indexes can be re-created in parallel. However, your partitioning strategy may affect the resulting physical layout of the indexes. If two nonclustered indexes are rebuilt at the same time on the same file (filegroup), the pages of the two indexes may be interleaved on disk. This can cause the data to be stored in a nonsequential order. If you have multiple files (filegroups) located on different disks, you can specify a separate file (filegroup) to hold each index when it is created, thereby maintaining the sequential contiguity of the index pages.

The issues mentioned earlier regarding indexes on presorted data apply here as well. A clustered index built on sorted data does not have to perform an additional sort step, which can significantly reduce the time and processing resources needed to build the index.
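To contrast the one-step DROP_EXISTING rebuild with the manual drop and re-create sequence, the following sketch shows both as alternatives. The Orders table, its columns, and the index names are hypothetical.

```sql
-- Alternative 1: rebuild the clustered index in one step.
-- DROP_EXISTING reuses the existing sort order and avoids
-- rebuilding the nonclustered indexes twice.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate)
WITH DROP_EXISTING
GO

-- Alternative 2: manual maintenance. Drop the nonclustered index
-- first, then the clustered index ...
DROP INDEX Orders.IX_Orders_CustomerID
DROP INDEX Orders.IX_Orders_OrderDate
GO

-- ... and re-create the clustered index before the nonclustered one,
-- so that the nonclustered index is built only once.
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders (OrderDate)
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID)
GO
```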

FILLFACTOR and PAD_INDEX

The FILLFACTOR option specifies how full each index and data page is made when an index is created, and therefore how much open space is left on the page. The PAD_INDEX option of CREATE INDEX applies the FILLFACTOR setting to the nonleaf-level index pages. Without the PAD_INDEX option, FILLFACTOR mainly affects the leaf-level index pages of the clustered index. It is a good idea to use the PAD_INDEX option together with FILLFACTOR.

PAD_INDEX and FILLFACTOR are used to control page splitting. The optimal value to specify for FILLFACTOR depends on how much new data is expected to be added to an 8-KB index or data page within a given time frame. It is important to keep in mind that SQL Server index pages typically contain many more rows than data pages do, because an index page contains only the data for the columns associated with that index, whereas a data page holds the data for the entire row.

Also bear in mind how often there will be a maintenance window that permits rebuilding the indexes to correct the page splits that are bound to occur. Try to rebuild the indexes only when the majority of the index and data pages have become filled with data. If the clustered index for a table is well chosen, the indexes rarely need to be rebuilt. If the clustered index distributes data evenly, so that new rows inserted into the table are spread across all of the data pages associated with the table, the data pages fill evenly. Overall, this provides more time before page splitting starts to occur and rebuilding the clustered index becomes necessary.

Determining the proper values for PAD_INDEX and FILLFACTOR requires a judgment call. The decision should be based on the performance tradeoff between leaving a lot of open space on pages, on the one hand, and the amount of page splitting that may occur, on the other. If a small percentage is specified for FILLFACTOR, a large amount of open space is left on the index and data pages, which forces SQL Server to read a large number of partially filled pages to answer queries. For large read operations, SQL Server performs faster when more data is packed onto the index and data pages. Specifying a FILLFACTOR that is too high leaves too little open space on each page, so pages overflow too quickly and page splits result.

Before settling on a FILLFACTOR or PAD_INDEX value, remember that in many data warehouse environments, reads tend to far outnumber writes. Periodic data loads, however, may invalidate this assumption. Many data warehouse administrators attempt to partition and structure their tables and indexes to accommodate the periodic data loads they anticipate.

As a general rule of thumb, if writes are expected to be a substantial fraction of reads, the best approach is to specify as high a FILLFACTOR as feasible while still leaving enough free space on each 8-KB page to avoid frequent page splitting, at least until SQL Server reaches the next available window of time for re-creating indexes. This strategy balances I/O performance (keeping pages as full as possible) against page-splitting avoidance (not letting pages overflow). If there is no write activity in the SQL Server database, FILLFACTOR should be set to 100 percent so that all index and data pages are filled completely for maximum I/O performance.
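Both options are specified in the WITH clause of CREATE INDEX. In the following sketch, the table, column, and index names are hypothetical, and the value 80 is only an illustration of leaving roughly 20 percent of each page free for new rows.

```sql
-- Fill leaf pages to about 80 percent and, because PAD_INDEX is
-- specified, apply the same fill factor to the nonleaf index pages
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders (CustomerID)
WITH PAD_INDEX, FILLFACTOR = 80
GO
```

For a table that receives no writes between loads, the same statement with FILLFACTOR = 100 packs the pages completely.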

SQL Server tools for analysis and optimization

This section provides sample code for loading data into a table. The table is then used to illustrate how to use SQL Profiler and SQL Query Analyzer to analyze and tune performance.

Sample data and workload

The following example illustrates the use of the SQL Server performance tools. First, create the following table:

CREATE TABLE testtable (nkey1 INT IDENTITY, col2 CHAR(300) DEFAULT 'ABC', ckey1 CHAR(1))

Next, load the table with 20,000 rows of test data. The data loaded into the nkey1 column lends itself to a nonclustered index. The data in the ckey1 column lends itself to a clustered index, and the data in col2 is merely filler that increases the size of each row by 300 bytes.

DECLARE @counter INT
SET @counter = 1
WHILE (@counter <= 4000)
BEGIN
    INSERT testtable (ckey1) VALUES ('a')
    INSERT testtable (ckey1) VALUES ('b')
    INSERT testtable (ckey1) VALUES ('c')
    INSERT testtable (ckey1) VALUES ('d')
    INSERT testtable (ckey1) VALUES ('e')
    SET @counter = @counter + 1
END

The following queries make up the database server workload:

SELECT ckey1 FROM testtable WHERE ckey1 = 'a'
SELECT nkey1 FROM testtable WHERE nkey1 = 5000
SELECT ckey1, col2 FROM testtable WHERE ckey1 = 'a'
SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000

SQL Profiler

A common approach to performance tuning is often called mark and measure. To verify that changes made to improve performance actually do improve it, you first need to establish a baseline, or mark, of the existing poor performance. Measure refers to establishing quantifiable ways to demonstrate that performance is improving.

SQL Profiler is a handy tool for marking and measuring. Not only can it capture the activity taking place on the server for performance analysis, it can also play that activity back at a later time. The playback capability in SQL Server provides a useful regression testing tool. With playback, you can easily determine whether the actions being taken to improve performance are having the desired effect.

Playback can also simulate load or stress testing. Multiple Profiler client sessions can be set up to play back simultaneously. For example, an administrator can capture the activity of five concurrent users and then start ten simultaneous playbacks to simulate the system performance with 50 concurrent users. You can also trace database activity and then play it back against a database modification under development, or against a new hardware configuration being tested.

Remember that SQL Profiler lets you record the activity that occurs on a SQL Server database. SQL Profiler can be set up to watch and record one or many users executing queries on SQL Server. In addition to the SQL statements, a wide variety of performance information can be captured with the tool. The performance information that SQL Profiler can record includes I/O statistics, CPU statistics, locking requests, Transact-SQL and RPC statistics, index and table scans, warnings and errors raised, database object creation and removal, connections and disconnections, stored procedure operations, cursor operations, and more.

Capturing Profiler information to use with the Index Tuning Wizard

Used together, SQL Profiler and the Index Tuning Wizard form a powerful combination that helps database administrators make sure the proper indexes are placed on tables and views. SQL Profiler records the resource consumption of queries to one of three places: a .trc file, a SQL Server table, or the monitor. The Index Tuning Wizard can then read the captured data from either the .trc file or the SQL Server table. The Index Tuning Wizard analyzes the information in the captured workload together with information about the table structures, and then recommends which indexes should be created to improve performance. With the Index Tuning Wizard, you can automatically create the proper indexes for the database, schedule the index creation for a later time, or generate a Transact-SQL script that can be reviewed and executed manually.

Analyzing a query load involves the following steps:

Set up SQL Profiler

Start SQL Profiler from SQL Server Enterprise Manager by selecting SQL Profiler on the Tools menu.
Press CTRL+N to create a new Profiler trace.
In the Connect to SQL Server dialog box, select the server you want to connect to.
Select the SQLProfilerTuning template from the drop-down list box.
Select either the Save to file or the Save to table check box. The Save to table option opens the Connect dialog box, in which you can save the trace information to a server other than the one whose queries are being profiled. Select both check boxes if you want to save the traced activity to both a file and a table.
If you are saving to a .trc file, point to a valid directory and file name. If you have already run the trace and are now running it again, point to the existing trace table; if this is the first time you are capturing trace activity to a table, you can provide a new table name.
Click OK.
Click Run.

Run the workload several times (3-4)

Start SQL Query Analyzer, either from SQL Server Enterprise Manager or from the Start menu.
Connect to SQL Server and set the current database to the one in which you created the test table.
Type the following queries into the query window of SQL Query Analyzer:
SELECT ckey1 FROM testtable WHERE ckey1 = 'a'
SELECT nkey1 FROM testtable WHERE nkey1 = 5000
SELECT ckey1, col2 FROM testtable WHERE ckey1 = 'a'
SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000
Press CTRL+E to execute the queries. Repeat this step three or four times to generate a sample workload.

Stop SQL Profiler

In the SQL Profiler window, click the red square to stop the Profiler trace.

Load the trace file or table into the Index Tuning Wizard

In SQL Profiler, start the Index Tuning Wizard by selecting Index Tuning Wizard on the Tools menu. Click Next.
Select the database to be analyzed. Click Next.
Choose whether to keep the existing indexes and whether to add indexed views.
Choose a tuning mode (Fast, Medium, or Thorough). In Fast mode, the Index Tuning Wizard needs less time but performs a less complete analysis; Thorough mode produces the most thorough analysis but takes the longest.
To locate the trace file or table that was created with SQL Profiler, select My workload file or SQL Server Trace Table. Click Next.
In the Select Tables to Tune dialog box, select the tables you want to analyze, and then click Next.
The Index Tuning Wizard analyzes the traced workload and the table structures, and then presents the indexes that should be created in the Index Recommendations dialog box. Click Next.
The wizard offers several options: create the indexes immediately, schedule the index creation for a later time (as an automated task), or create a Transact-SQL script that contains the commands to create the indexes. Select the preferred option, and then click Next.
Click Finish.

Transact-SQL generated by the Index Tuning Wizard for the sample database and workload

/* Created by: Index Tuning Wizard */
/* Date: 9/6/2000 */
/* Time: 4:44:34 PM */
/* Server name: jhmiller-as2 */
/* Database name: TraceDB */
/* Workload file name: C:/Documents and Settings/jhmiller/My Documents/trace.trc */
USE [TraceDB]
go
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF
go
DECLARE @bErrors as bit
BEGIN TRANSACTION
SET @bErrors = 0
CREATE CLUSTERED INDEX [testtable1] ON [dbo].[testtable] ([ckey1] ASC)
IF (@@error <> 0) SET @bErrors = 1
CREATE NONCLUSTERED INDEX [testtable2] ON [dbo].[testtable] ([nkey1] ASC)
IF (@@error <> 0) SET @bErrors = 1
IF (@bErrors = 0)
    COMMIT TRANSACTION
ELSE
    ROLLBACK TRANSACTION

The indexes that the Index Tuning Wizard recommends for the sample table and data are what we would expect: a clustered index on ckey1 and a nonclustered index on nkey1. There are only five unique values for ckey1, with 4,000 rows for each value. Given that one of the sample queries (SELECT ckey1, col2 FROM testtable WHERE ckey1 = 'a') retrieves rows from the table based on one of the values in ckey1, it makes sense to create a clustered index on the ckey1 column. The second query (SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000) fetches a single row based on the value of the nkey1 column. Because nkey1 is unique and there are 20,000 rows, it makes sense to create a nonclustered index on this column.

The combination of SQL Profiler and the Index Tuning Wizard becomes very powerful in real database server environments, where many tables and many queries are involved. Use SQL Profiler to record a .trc file or trace table while the database server is handling a representative set of queries. Then load the trace into the Index Tuning Wizard to determine the proper indexes to build. Follow the prompts in the Index Tuning Wizard to create the indexes automatically, or to schedule an index creation job that runs at an off-peak time. You may want to run SQL Profiler and the Index Tuning Wizard together regularly (perhaps weekly or monthly) to see whether the queries executed on the database server have changed enough to call for different indexes. Regular combined use of SQL Profiler and the Index Tuning Wizard helps database administrators keep SQL Server running at its best as query workloads change and databases grow.

Using SQL Query Analyzer to analyze the information recorded by SQL Profiler

After the information has been recorded in a SQL Server table, you can use SQL Query Analyzer to determine which queries on the system consume the most resources. In this way, database administrators can concentrate on improving the queries that need the most help. Storing the trace data in a table makes it easy to select and filter subsets of the trace data in order to identify the worst-performing queries for tuning. For example, the Duration column, which is captured automatically when you use the SQLProfilerTuning template, identifies the queries that took the longest to execute (in milliseconds). To find the longest-running 10 percent of queries, you can run a query like the following:

SELECT TOP 10 PERCENT * FROM [TraceDB].[dbo].[Trace] ORDER BY Duration DESC

To find the five longest-running queries, you can run a query like the following:

SELECT TOP 5 * FROM [TraceDB].[dbo].[Trace] ORDER BY Duration DESC

To place only the rows you want to use for tuning in a separate table, consider using the following SELECT/INTO statement:

SELECT TOP 10 PERCENT * INTO TuningTable FROM [TraceDB].[dbo].[Trace] ORDER BY Duration DESC

The SQLProfilerTuning template mentioned earlier is simply a suggested set of preselected columns and filter settings recommended for tuning. You may find that you need to capture more information. You can create your own custom tuning template by opening one of the supplied templates and saving it under a different name. Many events can be captured, including I/O statistics, locking information, and more.

SQL Query Analyzer

You can use SQL Query Analyzer to tune queries. The tool provides several mechanisms, such as Statistics I/O and the execution plan display, that help you troubleshoot problem queries.

Statistics I/O

SQL Query Analyzer provides an option that reports the I/O consumed by a query you execute in SQL Query Analyzer. To set this option, on the Query menu in SQL Query Analyzer, click Current Connection Properties to display the Current Connection Properties dialog box. Select the Set statistics I/O check box and close the dialog box. Then execute a query and select the Messages tab in the results pane to view the I/O statistics.

For example, with the Set statistics I/O option selected, the following query against the sample data created earlier in the SQL Profiler section returns this I/O information on the Messages tab:

SELECT ckey1, col2 FROM testtable WHERE ckey1 = 'a'

Table 'testtable'. Scan count 1, logical reads 800, physical reads 62, read-ahead reads 760.

Statistics I/O is a good way to monitor the effect of query tuning. For example, create the indexes that the Index Tuning Wizard suggested for the sample data, and then run the query again:

SELECT ckey1, col2 FROM testtable WHERE ckey1 = 'a'

Table 'testtable'. Scan count 1, logical reads 164, physical reads 4, read-ahead reads 162.

Note that the numbers of logical reads and physical reads are significantly lower when an index is available.
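As a rough illustration of reading these figures, the following Python sketch parses the two Statistics I/O message lines quoted above and reports how much each read counter dropped. The regular expression and the helper names parse_stats_io and percent_reduction are assumptions made for this example; they are not part of any SQL Server tool.

```python
import re

# Pattern for one Statistics I/O message line as it appears on the Messages tab.
STATS_IO = re.compile(
    r"Table '(?P<table>[^']+)'\.\s*Scan count (?P<scan>\d+),\s*"
    r"logical reads (?P<logical>\d+),\s*physical reads (?P<physical>\d+),\s*"
    r"read-ahead reads (?P<readahead>\d+)",
    re.IGNORECASE,
)


def parse_stats_io(line):
    """Return (table_name, counters) for one Statistics I/O message line."""
    match = STATS_IO.search(line)
    if match is None:
        raise ValueError("not a Statistics I/O line: " + line)
    counters = match.groupdict()
    table = counters.pop("table")
    return table, {name: int(value) for name, value in counters.items()}


def percent_reduction(before, after):
    """Percentage by which a counter dropped (positive means fewer reads)."""
    return 100.0 * (before - after) / before


# The two message lines from the sample query, before and after indexing.
before_line = ("Table 'testtable'. Scan count 1, logical reads 800, "
               "physical reads 62, read-ahead reads 760.")
after_line = ("Table 'testtable'. Scan count 1, logical reads 164, "
              "physical reads 4, read-ahead reads 162.")

_, before = parse_stats_io(before_line)
_, after = parse_stats_io(after_line)
for name in ("logical", "physical", "readahead"):
    print(name, round(percent_reduction(before[name], after[name]), 1))
```

Comparing percentages rather than raw counts makes it easier to judge an index change across queries of very different sizes.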

Execution plan

Graphical execution plans show detailed information about what the query optimizer is doing, which lets you focus on problematic SQL queries. To display the estimated execution plan of a query in the Results pane of SQL Query Analyzer, press CTRL+L with the query in the window, or click Display Estimated Execution Plan on the Query menu. Each icon represents an operation that the query optimizer would perform if the query were executed. Each arrow shows the direction of data flow for the query. Hover the mouse pointer over an operation icon to display more information about that operation. Each operation icon is also labeled with the approximate cost of that step, so you can quickly see which operation in the query is the most expensive.

You can also view the actual execution plan of a query by clicking Show Execution Plan on the Query menu and then executing the query. In contrast to Display Estimated Execution Plan, Show Execution Plan executes the query first and then displays the execution plan that was actually used.

To create a text version of an execution plan, click Current Connection Properties on the Query menu, and then select the Set showplan_text option in the dialog box. When you execute the query, the execution plan is displayed as text on the results tab.

You can also set the execution plan options within a query by executing either of the following commands:

SET SHOWPLAN_ALL ON
GO

SET SHOWPLAN_TEXT ON
GO

SET SHOWPLAN_ALL is intended for applications written to process its output. Use SET SHOWPLAN_TEXT to return readable output for Microsoft MS-DOS® applications, such as the osql utility.

SET SHOWPLAN_TEXT and SET SHOWPLAN_ALL return information as a set of text rows that form a hierarchical tree representing the steps the SQL Server query processor takes as it executes each statement. Each statement reflected in the output is shown as one row containing the statement text, followed by several rows describing the details of the execution steps.

Example of execution plan output

These results come from running the sample queries defined earlier in SQL Query Analyzer after executing SET SHOWPLAN_TEXT ON.

Query 1

SELECT ckey1, col2 FROM testtable WHERE ckey1 = 'a'

Text-based execution plan output

|--Clustered Index Seek(OBJECT:([testtable].[testtable1]), SEEK:([testtable].[ckey1]='a') ORDERED FORWARD)

Equivalent graphical execution plan output

The figure below shows the graphical execution plan of query 1.

[Figure: graphical execution plan of query 1]

The execution plan uses the clustered index on the ckey1 column to resolve the query, as indicated by the Clustered Index Seek operation.

If the clustered index is removed from the table and the same query is executed again, the query reverts to a table scan. The following execution plan shows this change in behavior.

Text-based execution plan output

|--Table Scan(OBJECT:([TraceDB].[dbo].[testtable]), WHERE:([testtable].[ckey1]=[@1]))

Equivalent graphical execution plan output

The figure below shows the graphical execution plan of query 1.

[Figure: graphical execution plan of query 1 using a table scan]

This execution plan uses a table scan to resolve query 1. A table scan is the most efficient way to retrieve information from a small table. On a large table, however, a table scan in the execution plan is a warning that the table may need better indexes or that the statistics of the existing indexes need to be updated. You can update statistics on a table or an index with the UPDATE STATISTICS command. SQL Server automatically updates statistics when they drift too far out of sync with the underlying index values. This would matter, for example, if you deleted all rows with a ckey1 value of 'b' from testtable and then ran queries without first updating the statistics. It is best to let SQL Server maintain index statistics automatically, because this helps ensure that queries always have good index statistics to work with. If you use the ALTER DATABASE statement to set the AUTO_UPDATE_STATISTICS database option to OFF, SQL Server does not update statistics automatically.

Query 2

SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000

Text-based execution plan output

|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([testtable]))
     |--Index Seek(OBJECT:([TraceDB].[dbo].[testtable].[testtable2]), SEEK:([testtable].[nkey1]=Convert([@1])) ORDERED FORWARD)

Equivalent graphical execution plan output

The following two figures show the graphical execution plan of query 2.

[Figure: graphical execution plan of query 2, first of two]

[Figure: graphical execution plan of query 2, second of two]

The execution plan of query 2 uses the nonclustered index on the nkey1 column, as indicated by the Index Seek operation on nkey1. The Bookmark Lookup operation indicates that SQL Server had to perform a pointer jump from the index page to the data page of the table to retrieve the requested data. The pointer jump is required because the query asks for the col2 column, which is not part of the nonclustered index.

Query 3

SELECT nkey1 FROM testtable WHERE nkey1 = 5000

Text-based execution plan output

|--Index Seek(OBJECT:([TraceDB].[dbo].[testtable].[testtable2]), SEEK:([testtable].[nkey1]=Convert([@1])) ORDERED FORWARD)

Equivalent graphical execution plan output

The figure below shows the graphical execution plan of query 3.

[Figure: graphical execution plan of query 3]

The execution plan of query 3 uses the nonclustered index on nkey1 as a covering index. Note that this query needs no Bookmark Lookup operation, because all of the information the query requires (in both the SELECT and WHERE clauses) is provided by the nonclustered index. In other words, no pointer jumps from the nonclustered index pages to the data pages are needed. I/O is lower than in the case that required a bookmark lookup.

System Monitor

System Monitor provides a wealth of information about the Windows and SQL Server operations that take place while the database server is running.

In System Monitor graph mode, pay attention to the maximum and minimum values. Do not put too much emphasis on the average, because extreme data points can distort it. Study the shape of the graph and compare it with the minimum and maximum values to get an accurate understanding of the behavior. Use the BACKSPACE key to highlight a counter with a white line.

You can use System Monitor to log all available Windows and SQL Server System Monitor objects and counters to a log file while viewing System Monitor interactively (chart mode). The sampling interval determines how quickly the log file grows. Log files can become large very quickly (for example, with all counters turned on and a sampling interval of 15 seconds, a log file can reach 100 megabytes within one hour). Ideally the test server has a few gigabytes free to store these files. If conserving space is important, use a longer logging interval so that System Monitor samples the system less often. Try 30 or 60 seconds. All counters are still sampled with reasonable frequency, but the log file stays smaller.
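To make the trade-off between sampling interval and log size concrete, here is a minimal Python sketch. It assumes that every sample writes roughly the same number of bytes, so the growth rate scales inversely with the interval; the 100 MB per hour at 15 seconds figure is the one quoted above, and the function name is hypothetical.

```python
def estimated_log_mb_per_hour(interval_seconds,
                              reference_mb_per_hour=100.0,
                              reference_interval_seconds=15.0):
    """Scale the reference log growth rate to a different sampling interval.

    Assumption: each sample adds about the same number of bytes, so the
    growth rate is inversely proportional to the sampling interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return reference_mb_per_hour * reference_interval_seconds / interval_seconds


# Compare the intervals mentioned in the text.
for interval in (15, 30, 60):
    print(interval, "seconds ->", estimated_log_mb_per_hour(interval), "MB per hour")
```

Actual growth depends on how many counters and instances are logged, so treat the result as an order-of-magnitude estimate when sizing free disk space.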

System Monitor itself consumes a small amount of CPU and disk I/O resources. If the system has little disk I/O or CPU capacity to spare, consider running System Monitor from another computer and monitoring SQL Server over the network. When monitoring over the network, use graph mode only, because logging performance information locally on the SQL Server computer tends to be more efficient than sending it across a local area network. If you must log over the network, log only the most important counters.

It is good practice to log all available counters to a file during performance test runs for later analysis. That way, any counter can be examined further in the future. You can configure System Monitor to log all counters to a log file while monitoring the most interesting counters in another mode, such as graph mode. All of the information is then recorded during the performance run, while the counters you care about most are shown in a clear, uncluttered System Monitor graph.

Setting up a System Monitor session for logging

1. From the Windows 2000 Start menu, point to Programs, point to Administrative Tools, and then click Performance to open System Monitor.
2. Double-click Performance Logs and Alerts, and then click Counter Logs. Existing logs are listed in the details pane. A green icon indicates that a log is running; a red icon indicates that a log has been stopped.
3. Right-click a blank area of the details pane, and then click New Log Settings.
4. In Name, type the name of the log, and then click OK.
5. On the General tab, click Add. Select the counters to log. This is where you decide which SQL Server counters to monitor during the session.
6. To change the default file information, make the changes on the Log Files tab.
7. A logging session can be set to run automatically during a predefined period. To do this, modify the schedule information on the Schedule tab.

Note: To save the counter settings of a log file, right-click the file in the details pane, and then click Save Settings As. Then specify the .htm file in which to save the settings. To reuse the saved settings in a new log, right-click the details pane, and then click New Log Settings From.

Starting a logged monitoring session

1. From the Windows 2000 Start menu, point to Programs, point to Administrative Tools, and then click Performance to open System Monitor.
2. Double-click Performance Logs and Alerts, and then click Counter Logs.
3. Right-click the counter log you want to run, and then click Start. Existing logs are listed in the details pane. A green icon indicates that a log is running; a red icon indicates that a log has been stopped.

Stopping a logged monitoring session

1. From the Windows 2000 Start menu, point to Programs, point to Administrative Tools, and then click Performance to open System Monitor.
2. Double-click Performance Logs and Alerts, and then click Counter Logs.
3. Right-click the counter log you want to stop, and then click Stop.

Loading data from a logged monitoring session into System Monitor

1. From the Windows 2000 Start menu, point to Programs, point to Administrative Tools, and then click Performance to open System Monitor.
2. Click System Monitor.
3. Right-click the System Monitor details pane, and then click Properties.
4. Click the Source tab.
5. Under Data Source, click Log File, and then type the path of the file or click Browse to find the log file you want.
6. Click Time Range. To specify the time range in the log file that you want to view, drag the bar or its handles to the appropriate start and end times.
7. Click the Data tab, and then click Add to open the Add Counters dialog box. The counters you selected during log configuration are displayed. You can include all or some of them in the graph.

Relating System Monitor logged events to a point in time

From within the System Monitor session, right-click the System Monitor details pane, and then click Properties. The Time Range control and its slider bar let you set the begin, current, and end times that you want to view in the graph.

Key performance counters that need to be monitored

Several performance counters provide information about the following important aspects: memory, paging, processor, I / O, and disk activity.

Monitoring memory

By default, SQL Server changes its memory requirements dynamically based on available system resources. If SQL Server needs more memory, it queries the operating system to determine whether free physical memory is available and uses it. If SQL Server does not need the memory currently allocated to it, it releases the memory to the operating system. However, dynamic memory use can be overridden with the server configuration options min server memory, max server memory, and set working set size. For more information, see SQL Server Books Online.

To monitor the amount of memory SQL Server is using, examine the following performance counters:

Process: Working Set
SQL Server: Buffer Manager: Buffer Cache Hit Ratio
SQL Server: Buffer Manager: Total Pages
SQL Server: Memory Manager: Total Server Memory (KB)

The Working Set counter shows the amount of memory used by a process. If this number is consistently below the amount of memory SQL Server is configured to use (set by the min server memory and max server memory server options), SQL Server is configured for more memory than it actually needs. Otherwise, use the set working set size server option to fix the size of the working set.

The Buffer Cache Hit Ratio counter is application specific; however, a ratio of 90 percent or higher is desirable. Add memory until the value is consistently above 90 percent, which indicates that the data cache satisfies more than 90 percent of all data requests.

If the value of the Total Server Memory (KB) counter is consistently high compared with the amount of physical memory in the computer, more memory may be required.

Hard paging

If Memory: Pages/sec is greater than zero or Memory: Page Reads/sec is greater than five, Windows is using disk to resolve memory references (hard page faults). This costs both disk I/O and CPU resources. Memory: Pages/sec is a good indicator of how much paging Windows is performing and of whether the current RAM configuration of the database server is adequate. A subset of the hard paging information in System Monitor is the number of times per second that Windows had to read from the paging file to resolve memory references, which is represented by Memory: Page Reads/sec. A Memory: Page Reads/sec value greater than five is a bad sign.

To avoid paging, automatic SQL Server memory tuning tries to adjust SQL Server memory use dynamically. A small number of page reads per second is normal, but excessive paging requires corrective action.

If SQL Server is tuning memory automatically, you can add more RAM or remove other applications from the database server to help bring Memory: Pages/sec to a reasonable level.

If SQL Server memory has been configured manually on the database server, you may need to reduce the memory assigned to SQL Server, remove other applications from the database server, or add more RAM to the database server.

Keeping Memory: Pages/sec at or close to zero helps database server performance. It means that Windows and all of its applications (including SQL Server) are not going to the paging file to satisfy requests for data in memory, so the RAM on the server is sufficient. A Pages/sec value slightly above zero is acceptable, but remember that a relatively high performance penalty (disk I/O) is paid every time data is retrieved from the paging file rather than from RAM.

It is useful to compare Memory: Pages Input/sec with Logical Disk: Disk Reads/sec, and Memory: Pages Output/sec with Logical Disk: Disk Writes/sec, across all drives associated with the Windows paging file, because the comparison shows how much disk I/O is strictly related to paging rather than to other applications (that is, SQL Server). Another easy way to isolate paging file I/O is to make sure that the paging file is not on the same set of drives as any of the SQL Server files. Separating the paging file from the SQL Server files can also help disk I/O performance, because it allows disk I/O associated with paging to proceed in parallel with disk I/O associated with SQL Server.

Soft paging

If Memory: Page Faults/sec is greater than zero, Windows is paging, but the counter includes both hard and soft paging. Hard paging was discussed in the previous section. Soft paging means that applications on the database server are requesting memory pages that are still in RAM but outside the Windows working set. Memory: Page Faults/sec is helpful for deriving the amount of soft paging that is occurring. There is no "Soft Faults/sec" counter. Instead, use this formula to calculate the number of soft faults per second:

Memory: Page Faults/sec - Memory: Pages Input/sec = soft page faults per second

To determine whether SQL Server, rather than another process, is causing excessive paging, monitor the Process: Page Faults/sec counter for the SQL Server process, and note whether the number of page faults per second for the Sqlservr.exe instance is close to the value of Memory: Pages/sec.
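The soft fault calculation can be sketched in a few lines of Python. This assumes the two terms of the formula are the Memory: Page Faults/sec and Memory: Pages Input/sec counters; the function name and the sample readings are hypothetical, and the result is an approximation because one counter counts faults and the other counts pages.

```python
def soft_faults_per_sec(page_faults_per_sec, pages_input_per_sec):
    """Approximate soft page faults per second.

    page_faults_per_sec: reading of Memory: Page Faults/sec (hard + soft).
    pages_input_per_sec: reading of Memory: Pages Input/sec (hard paging),
                         assumed here to be the hard-fault term.
    """
    return page_faults_per_sec - pages_input_per_sec


# Hypothetical readings taken at the same sampling instant.
total_faults = 120.0
hard_paging = 5.0
print("soft faults/sec:", soft_faults_per_sec(total_faults, hard_paging))
```

Taking both readings from the same sample matters; subtracting values from different intervals can give misleading or even negative results.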

Soft faults generally hurt performance less than hard faults, because soft faults consume only CPU resources whereas hard faults consume disk I/O resources. The best environment for performance is one with no faulting of any kind.

Note that until SQL Server has accessed all of its data cache pages for the first time, the first access to each page causes a soft fault. Do not be concerned about the initial soft faulting that occurs when SQL Server first starts and the data cache is first being used.

Monitoring processors

Your goal should be to keep all of the processors allocated to the server busy enough to maximize performance, but not so busy that processor bottlenecks occur. The challenge of performance tuning is that if the CPU is not the bottleneck, something else is (most likely the disk subsystem), so CPU capacity is being wasted. The CPU is usually the hardest resource to expand (beyond a configuration-specific limit, such as four or eight CPUs on many current systems), so CPU utilization above 95 percent on a busy system can be taken as a sign that the system is working well. At the same time, monitor transaction response times to make sure they are reasonable; if they are not, CPU utilization above 95 percent may simply mean that the workload is too heavy, and you must either add CPU resources or reduce or tune the workload.

Check the System Monitor counter Processor: % Processor Time to make sure that utilization on each CPU is consistently below 95 percent. System: Processor Queue Length is the processor queue for all CPUs on a Windows system. If System: Processor Queue Length is greater than two per CPU, a CPU bottleneck has occurred. When you detect a CPU bottleneck, you need to add processors to the server or reduce the workload on the system. You can reduce the workload by tuning queries or improving indexes, which reduces I/O and, in turn, CPU usage.

Another System Monitor counter to watch when you suspect a CPU bottleneck is System: Context Switches/sec, because it indicates how many times per second Windows and SQL Server had to switch from executing one thread to executing another. This costs CPU resources. Context switching is a normal part of a multithreaded, multiprocessor environment, but excessive context switching degrades system performance. The recommended approach is to worry about context switching only when there is processor queuing.

If you observe processor queuing, use the level of context switching as a gauge when tuning SQL Server performance. If context switching appears to be a contributing factor, there are two approaches to consider: using the affinity mask option and using fiber-based scheduling.

Use the affinity mask option to improve performance on symmetric multiprocessor (SMP) systems (with more than four microprocessors) operating under heavy load. You can associate threads with specific processors and specify which processors SQL Server will use. You can also use the affinity mask option to keep SQL Server activity off certain processors. Before you change the affinity mask setting, keep in mind that Windows assigns the deferred procedure call (DPC) activity associated with network interface cards (NICs) to the highest-numbered processor in the system. In systems with more than one NIC installed and active, the activity of each additional card is assigned to the next highest-numbered processor. For example, an eight-processor system with two NICs has the DPCs for the NICs assigned to processor 7 and processor 6 (counting from 0).

When you use the lightweight pooling option, SQL Server switches to a fiber-based scheduling model instead of the default thread-based scheduling model. Fibers are essentially lightweight threads. Use the command sp_configure 'lightweight pooling', 1 to enable fiber-based scheduling.

Watch processor queuing and context switching to monitor the effect of the values you set for affinity mask and lightweight pooling. In some situations these settings make performance worse rather than better. They also generally bring little benefit unless the system has four or more processors. DBCC SQLPERF (THREADS) provides more information about I/O, memory, and CPU usage. Execute the following SQL query to survey the current top consumers of CPU time:

SELECT * FROM master..sysprocesses ORDER BY cpu DESC
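The affinity mask discussed above is a bitmask in which bit n corresponds to processor n. As a minimal sketch, the following Python helper (a hypothetical name, not a SQL Server utility) computes the mask value for the eight-processor example, leaving processors 6 and 7 free for NIC DPC activity.

```python
def affinity_mask(total_cpus, excluded_cpus=()):
    """Return a bitmask with bit n set for each CPU n that SQL Server may use.

    CPUs are numbered from 0, matching the counting used in the text.
    """
    mask = 0
    for cpu in range(total_cpus):
        if cpu not in excluded_cpus:
            mask |= 1 << cpu
    return mask


# Eight-processor system with two NICs: keep SQL Server off CPUs 6 and 7.
mask = affinity_mask(8, excluded_cpus={6, 7})
print(mask, bin(mask))
```

The resulting integer is the value that would be supplied to the affinity mask option through sp_configure; test any such change under load, since the text notes that these settings can reduce performance in some cases.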

Monitoring processor queue length

If System: Processor Queue Length is greater than two, the server's processors are receiving more work requests than they can handle collectively, so Windows has to place these requests in a queue.

Some processor queuing is actually an indicator of good overall SQL Server I/O performance. If there is no processor queuing and CPU utilization is low, there may be a performance bottleneck somewhere else in the system, most likely in the disk subsystem. A reasonable amount of work in the processor queue means that the CPUs are not idle and that the rest of the system is keeping pace with them.

As a general rule of thumb, a good processor queue length is about twice the number of CPUs in the database server.

Processor queuing significantly above this value may indicate that the server has a CPU bottleneck, and you should investigate. Excessive processor queuing costs query execution time. Several different activities can contribute to processor queuing. Eliminating hard and soft paging helps save CPU resources. Other ways to reduce processor queuing include tuning SQL queries, choosing better indexes to reduce disk I/O (and therefore CPU usage), and adding more CPUs (processors) to the system.
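The queue length check can be expressed as a small Python sketch. It takes the figure of two queued requests per CPU from the processor monitoring discussion as an adjustable default; the function names and sample readings are hypothetical.

```python
def processor_queue_threshold(cpu_count, per_cpu_limit=2):
    """Queue length above which a CPU bottleneck is suspected."""
    return cpu_count * per_cpu_limit


def is_cpu_bottleneck_suspected(queue_length, cpu_count, per_cpu_limit=2):
    """True when System: Processor Queue Length exceeds the threshold."""
    return queue_length > processor_queue_threshold(cpu_count, per_cpu_limit)


# Hypothetical readings on a four-CPU database server.
for reading in (3, 8, 14):
    print(reading, is_cpu_bottleneck_suspected(reading, cpu_count=4))
```

A single sample above the threshold is not conclusive; sustained readings over the sampling period are what warrant investigation.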

Monitoring I/O

The Disk Write Bytes/sec and Disk Read Bytes/sec counters indicate the data throughput of a disk in bytes per second. Weigh these numbers carefully against Disk Reads/sec and Disk Writes/sec. Do not let a low number of bytes per second lead you to believe that the disk I/O subsystem is not busy.

Monitor the disk queue length of all drives associated with SQL Server files, and determine which files are associated with excessive disk queuing.

If System Monitor shows that some drives are less busy than others, you can move SQL Server files from the bottlenecked drives to drives that are less busy. This helps spread disk I/O activity more evenly across the hard drives. If one large drive pool is used for SQL Server files, the solution to disk queuing is to add more physical drives to the pool, which increases the I/O capacity of the pool.

Disk queuing may indicate that a SCSI channel is saturated with I/O requests. System Monitor cannot detect this directly. Storage vendors generally provide tools that help monitor the amount of I/O serviced by a RAID controller and whether the controller is queuing I/O requests. Saturation is more likely when many disk drives (ten or more) are attached to the SCSI channel and all of them are performing I/O at full speed. In that case, the solution is to connect half of the disk drives to another SCSI channel or RAID controller to balance the I/O. Rebalancing drives across SCSI channels typically requires rebuilding the RAID arrays and performing a full backup and restore of the SQL Server database files.

Percentage of disk time

In System Monitor, the PhysicalDisk: % Disk Time and LogicalDisk: % Disk Time counters monitor the percentage of time that a disk is busy with read/write activity. If the % Disk Time counter is high (more than 90 percent), check the Current Disk Queue Length counter to see how many system requests are waiting for disk access. The number of waiting I/O requests should stay at no more than 1.5 to 2 times the number of spindles that make up the physical disk. Most disks have a single spindle, but redundant array of inexpensive disks (RAID) devices usually have more. A hardware RAID device appears as one physical disk in System Monitor; RAID devices created through software appear as multiple instances.

Disk queue length

Disk queues for monitoring too long are an important task.

To monitor disk queue lengths, you need to observe multiple system monitor disk counters. To enable these counters, run the DiskPerf -y command from the Windows 2000 or Windows NT command window, and then restart your computer.

The physical hard drive of the disk queue will block the disk I / O request while compensating for the I / O process. The SQL Server response time on these drives is not as good as before. This action consumes the query execution time.

If you use RAID, in order to calculate the disk queue of each physical drive, you need to know how many physical hard drives are regarded as a drive array of individual physical drives as a single physical drive. In order to understand the specific way each physical drive saved SQL Server data, as well as SQL Server data distributed to each SCSI channel, consult hardware experts to explain SCSI channels and physical drives.

View the selection of disk queues through the System Monitor. The logical disk counter is associated with a logical drive letter allocated by the Disk Manager, and the physical disk counter is considered to be a physical disk device content related to the disk manager. Note that the disk manager is considered a physical device drive that may be a hard drive, or a RAID array containing multiple hard drives. The current disk queue length is an immediate metric for disk queues, and the average disk queue length is the average of the disk queue metrics during the sampling period. If you indicate any of the following conditions, please pay attention:

Logical Disk: Average Disk Queue Length > 2
Physical Disk: Average Disk Queue Length > 2
Logical Disk: Current Disk Queue Length > 2
Physical Disk: Current Disk Queue Length > 2

These recommended thresholds apply to each physical hard drive. If a RAID array is associated with a disk queue measurement, divide the measurement by the number of physical hard drives in the RAID array to determine the disk queuing for each physical hard drive.
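The per-drive arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the sample readings are hypothetical, and the number of drives behind each counter instance has to come from your own hardware inventory, because System Monitor does not report it.

```python
def per_drive_queue(reported_queue_length: float, drives_in_array: int = 1) -> float:
    """Normalize a System Monitor disk queue reading to one physical drive."""
    if drives_in_array < 1:
        raise ValueError("drives_in_array must be at least 1")
    return reported_queue_length / drives_in_array


def needs_attention(reported_queue_length: float, drives_in_array: int = 1,
                    threshold: float = 2.0) -> bool:
    """Return True when the per-drive queue exceeds the threshold of 2 used in this guide."""
    return per_drive_queue(reported_queue_length, drives_in_array) > threshold


# Hypothetical readings: (counter instance, reported queue length, drives behind it)
readings = [
    ("C: single drive", 3.0, 1),
    ("E: 8-drive RAID array", 12.0, 8),
]

for name, queue, drives in readings:
    print(name, per_drive_queue(queue, drives), needs_attention(queue, drives))
```

In this made-up sample, the single drive with a queue of 3 is over the threshold, while the array reporting 12 works out to 1.5 per drive and is not.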

Note that disk queuing is not a useful measure on physical hard drives or RAID arrays that hold SQL Server log files, because the log manager does not queue more than a single I/O request to SQL Server log files.

Understanding SQL Server internals

Understanding some of the internals of SQL Server 2000 can help you manage the performance of your databases.

Worker threads

SQL Server maintains a pool of Windows threads that service the batches of SQL Server commands submitted to the database server. The total number of these threads (called worker threads in SQL Server terminology) available to service all incoming command batches is set by the sp_configure option max worker threads. If the number of connections actively submitting batches is greater than the number specified for max worker threads, the worker threads are shared among those connections. The default of 255 works well for many installations. Note that most connections spend most of their time waiting for batches to be received from the client.

Worker threads take on most of the responsibility for writing dirty 8 KB pages out of the SQL Server buffer cache. For maximum performance, worker threads schedule their I/O operations asynchronously.

Lazy writer

The lazy writer is a SQL Server system process that runs within the buffer manager. The lazy writer flushes batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes. This activity helps to produce and maintain available free buffers, which are 8 KB data cache pages that contain no data and are available for reuse. As the lazy writer flushes each 8 KB buffer to disk, the identity of the cache page is initialized so that other data can be written into the free buffer. The lazy writer minimizes the impact of this activity on other SQL Server operations by working during periods of low disk I/O.

SQL Server automatically configures and manages the level of free buffers. The performance counter SQL Server: Buffer Manager: Lazy Writes/sec indicates the number of 8 KB pages being physically written to disk. Monitor SQL Server: Buffer Manager: Free Pages to see whether this value dips. Optimally, the lazy writer keeps this counter level throughout SQL Server operations, which means that the lazy writer is keeping up with user demand for free buffers. If the value of the System Monitor object SQL Server: Buffer Manager: Free Pages reaches zero, there were times when the user load demanded a higher level of free buffers than the lazy writer was able to provide.

If the lazy writer has trouble keeping the free buffer level steady, or at least above zero, the disk subsystem may not be providing sufficient disk I/O performance. To determine whether this is the case, compare drops in the free buffer level with disk queuing. The solution is to add more physical disk drives to the database server disk subsystem to provide more disk I/O processing capacity.

Monitor the current level of disk queuing in System Monitor by looking at the Average Disk Queue Length or Current Disk Queue Length performance counters for logical or physical disks, and make sure that the disk queue is less than 2 for each physical drive associated with any SQL Server activity. For database servers that use hardware RAID controllers and disk arrays, remember to divide the number reported by the logical or physical disk counters by the number of actual hard drives associated with that logical drive letter or physical hard drive number (as reported by Disk Administrator), because Windows and SQL Server are unaware of the actual number of physical hard drives attached to a RAID controller. To interpret the disk queue numbers that System Monitor reports correctly, you must know the number of drives associated with the RAID array controller.

For more information, see SQL Server Books Online.

Checkpoint

Periodically, each instance of SQL Server makes sure that all dirty log and data pages are flushed to disk. This is called a checkpoint. Checkpoints reduce the time and resources needed to recover from a failure when an instance of SQL Server is restarted. During a checkpoint, dirty pages (buffer cache pages that have been modified since they were brought into the buffer cache) are written to the SQL Server data files. A buffer written to disk at a checkpoint still contains the data page, and users can read or update the page without rereading it from disk, which is not the case for the free buffers created by the lazy writer. Checkpoint logic attempts to let worker threads and the lazy writer do most of the work of writing out dirty pages. To do this, checkpoint logic tries to wait for one extra checkpoint before writing out a dirty page, if possible. This gives the worker threads and the lazy writer more time to write out the dirty pages. The conditions under which this extra wait occurs are described in the topic "Configuration of Checkpoints and Logs" in SQL Server Books Online. The main point to remember is that checkpoint logic uses this extra checkpoint wait to even out SQL Server disk I/O activity over a longer period of time.

To make checkpoint operations more efficient when there are a large number of pages to flush from the cache, SQL Server sorts the data pages to be flushed in the order in which the pages appear on disk. This helps minimize disk arm movement during the cache flush and takes advantage of sequential disk I/O where possible. The checkpoint process also submits 8 KB disk I/O requests asynchronously to the disk subsystem. This allows SQL Server to finish submitting the required disk I/O requests faster, because the checkpoint process does not wait for the disk subsystem to report back that the data has actually been written to disk.

It is important to monitor disk queuing on the hard drives associated with SQL Server data files to determine whether SQL Server is sending more disk I/O requests than the disks can handle. If it is, more disk I/O capacity must be added to the disk subsystem so that it can handle the load.

Log Manager

Like all other major RDBMS products, SQL Server ensures that all write activity (inserts, updates, and deletes) performed on the database is not lost if something interrupts the online status of SQL Server (for example, a power failure, a disk drive failure, or a fire in the data center). The SQL Server logging process helps guarantee recoverability. Before any implicit transaction (a single SQL query) or explicit transaction (a transaction defined by a BEGIN TRAN/COMMIT or ROLLBACK command sequence) can be completed, the log manager must receive a signal from the disk subsystem that all data changes associated with that transaction have been written successfully to the associated log file. This rule guarantees that if SQL Server is shut down abruptly for any reason, and the transactions written to the data cache have not yet been flushed to the data files by the checkpoint and the lazy writer, the transaction log can be read and reapplied when SQL Server starts up. Reading the transaction log and applying the transactions to SQL Server data after a server stoppage is called recovery.

Because SQL Server must wait for the disk subsystem to complete I/O to the SQL Server log files as each transaction is completed, it is important that the disks containing the SQL Server log files have enough disk I/O handling capacity for the expected transaction load.

The method of monitoring disk queuing associated with SQL Server log files is different from the method used for SQL Server database files. Use the System Monitor counters SQL Server: Databases: Log Flush Wait Time and SQL Server: Databases: Log Flush Waits/sec for the database in question to see whether log writer requests are waiting on the disk subsystem for completion. A caching controller provides the highest performance, but it should not be used for disks that contain log files unless the controller guarantees that the data entrusted to it will eventually be written to disk, even if the power fails. For more information about caching controllers, see "Effect of Hardware RAID Controller Onboard Cache" in this chapter.

Read-ahead management

SQL Server 2000 automatically manages large sequential reads for activities such as table scans. Read-ahead management is completely self-configuring and self-tuning, and it is tightly integrated with the operations of the SQL Server query processor. Read-ahead management is used for large table scans, large index range scans, probes into clustered and nonclustered index B-trees, and other situations. The reason is that read-aheads are performed as 64 KB I/Os, which allow the disk subsystem to reach a higher disk throughput than 8 KB I/Os do. When a large amount of data must be retrieved, SQL Server uses read-ahead to maximize throughput.

SQL Server uses a simple and efficient Index Allocation Map (IAM) storage structure that supports read-ahead management. The IAM is the mechanism for recording the location of extents; each 64 KB extent contains eight pages of data or index information. Each IAM page is an 8 KB page that contains tightly packed (bitmapped) information about which extents contain the required data. The compact nature of IAM pages makes them fast to read, and frequently used IAM pages can be kept in the buffer cache.

Read-ahead management can construct multiple sequential read requests by combining query information from the query processor with information from the IAM pages about the location of all extents that need to be read. Sequential 64 KB disk reads provide very good disk I/O performance. The SQL Server: Buffer Manager: Readahead Pages/sec performance counter provides information about the effectiveness and efficiency of read-ahead management.
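The idea of turning an allocation bitmap into a small number of sequential reads can be sketched as follows. This is a toy model, not the actual IAM page format: the bit layout (bit 0 of byte 0 is extent 0), the function names, and the sample bitmap are all assumptions made for illustration. Only the extent size (64 KB, eight 8 KB pages) comes from the text.

```python
from typing import List, Tuple

PAGES_PER_EXTENT = 8       # from the text: eight 8 KB pages per extent
EXTENT_BYTES = 64 * 1024   # from the text: 64 KB per extent


def allocated_extents(bitmap: bytes) -> List[int]:
    """List the extent numbers whose bit is set (assumed layout: bit 0 of byte 0 = extent 0)."""
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (1 << bit)
    ]


def sequential_runs(extents: List[int]) -> List[Tuple[int, int]]:
    """Merge extent numbers into (first_extent, extent_count) runs of adjacent extents."""
    runs: List[Tuple[int, int]] = []
    for extent in sorted(extents):
        if runs and runs[-1][0] + runs[-1][1] == extent:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)   # extend the current run
        else:
            runs.append((extent, 1))        # start a new run
    return runs


# Hypothetical bitmap: extents 0-3 and 9-10 hold pages of the object being scanned.
bitmap = bytes([0b00001111, 0b00000110])
for first, count in sequential_runs(allocated_extents(bitmap)):
    print(f"read {count} extent(s) starting at extent {first}: "
          f"{count * EXTENT_BYTES // 1024} KB, {count * PAGES_PER_EXTENT} pages")
```

Six scattered-looking extents collapse into two sequential requests here, which is the property that lets read-ahead favor large sequential I/O.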

SQL Server 2000 Enterprise Edition dynamically adjusts the maximum number of read-ahead pages based on the amount of memory present. This value is fixed in all other editions of SQL Server 2000. Another improvement in SQL Server 2000 Enterprise Edition is what is commonly called a "merry-go-round scan," which allows multiple tasks to share full table scans. If the execution plan of an SQL statement calls for a scan of the data pages in a table, and the relational database engine detects that the table is already being scanned for another execution plan, the database engine joins the second scan to the first at the current position of the in-progress scan. The database engine reads each page once and passes the rows from each page to both execution plans. This continues until the end of the table is reached. At that point, the first execution plan has the complete results of a scan, but the second execution plan must still retrieve the data pages that occur before the point at which it joined the in-progress scan. The scan for the second execution plan then wraps back to the first data page of the table and scans forward to the point at which it joined the first scan. Any number of scans can be combined in this way; the database engine keeps looping through the data pages until all of the scans are complete.
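A small simulation can make the sharing behavior concrete. This is a simplified model under stated assumptions, not the engine's implementation: one page is read per tick, the shared read position wraps around at the end of the table, and the function name and the join times are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate_shared_scan(page_count: int,
                         join_ticks: Dict[str, int]) -> Tuple[Dict[str, List[int]], int]:
    """Simulate scans that share one circular pass over pages 0..page_count-1.

    join_ticks maps a scan name to the tick at which it joins (tick 0 reads page 0).
    Returns the page order each scan receives and the number of physical page reads.
    """
    received: Dict[str, List[int]] = {name: [] for name in join_ticks}
    physical_reads = 0
    tick = 0
    while any(len(pages) < page_count for pages in received.values()):
        page = tick % page_count
        # A scan consumes the page if it has joined and has not yet seen every page.
        active = [name for name, joined_at in join_ticks.items()
                  if tick >= joined_at and len(received[name]) < page_count]
        if active:
            physical_reads += 1              # one read serves every active scan
            for name in active:
                received[name].append(page)
        tick += 1
    return received, physical_reads


# Hypothetical case: scan B joins while scan A is on page 3 of a 6-page table.
orders, reads = simulate_shared_scan(6, {"A": 0, "B": 3})
print(orders["A"])
print(orders["B"])
print(reads, "physical reads instead of", 2 * 6)
```

In this model the second scan receives pages 3 to 5 together with the first scan and then wraps around for pages 0 to 2, so the table costs 9 page reads rather than the 12 that two independent scans would need.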

One caveat about read-ahead management is that too much read-ahead can hurt overall performance, because it can fill the cache with pages that are not needed and use I/O and CPU that could have served other purposes. The solution is a general performance tuning goal: make sure that all SQL queries are tuned so that a minimal number of pages are brought into the buffer cache. This includes making sure that the right indexes exist and are used. Use clustered indexes for efficient range scans, and define nonclustered indexes to help quickly locate single rows or smaller rowsets. For example, if you plan to create only one index on a table, and that index will be used to fetch single rows or smaller rowsets, make the index clustered. Clustered indexes are nominally faster than nonclustered indexes.

Miscellaneous performance topics

Database design using star and snowflake schemas

Data warehouses use dimensional modeling to organize data for analysis. Dimensional modeling produces star and snowflake schemas, which also provide performance efficiencies for the massive data read operations that are frequently performed in a data warehouse. A high volume of data (usually thousands of rows) is stored in the fact table, and each row in the table is short, which minimizes storage requirements and query time. Attributes of the business facts are denormalized into dimension tables to minimize the number of table joins when retrieving data.

For a discussion of database design for data warehouses, see Chapter 17, "Data Warehouse Design Considerations."

Using equality operators in Transact-SQL queries

Using inequality operators in SQL queries forces the database to use table scans to evaluate the inequalities. This generates high I/O if these queries are run regularly against very large tables. WHERE clauses that contain the "NOT" operators (!=, <>, !<, !>), such as WHERE column_name != some_value, generate high I/O.

If you need to run queries of this kind, try to restructure them to eliminate the NOT keyword. For example:

Instead of:

SELECT * FROM tableA WHERE col1 != 'value'

Try to use:

SELECT * FROM tableA WHERE col1 < 'value' OR col1 > 'value'
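When a not-equal predicate is rewritten as two range comparisons, the comparisons must be joined with OR to return the same rows; joined with AND they can never both be true. The check below is a sketch that uses an in-memory SQLite database as a stand-in, so it only demonstrates the logical equivalence and says nothing about which plan SQL Server would choose. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (col1 TEXT)")
conn.executemany("INSERT INTO tableA (col1) VALUES (?)",
                 [("apple",), ("value",), ("zebra",), ("value",)])

# Original form: not-equal predicate.
not_equal = [row[0] for row in conn.execute(
    "SELECT col1 FROM tableA WHERE col1 != ? ORDER BY col1", ("value",))]

# Rewritten form: two range predicates joined with OR.
range_or = [row[0] for row in conn.execute(
    "SELECT col1 FROM tableA WHERE col1 < ? OR col1 > ? ORDER BY col1",
    ("value", "value"))]

# Joining the same predicates with AND matches nothing.
range_and = [row[0] for row in conn.execute(
    "SELECT col1 FROM tableA WHERE col1 < ? AND col1 > ? ORDER BY col1",
    ("value", "value"))]

print(not_equal)
print(range_or)
print(range_and)
```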

Reduce rowset size and communications overhead

Database programmers who work with SQL through easy-to-use interfaces such as the Microsoft ActiveX® Data Objects (ADO), Remote Data Objects (RDO), and Data Access Objects (DAO) database APIs need to consider the result sets that they generate.

ADO, RDO, and DAO give programmers excellent database development interfaces that provide rich SQL rowset functionality without requiring much SQL programming experience. Programmers can avoid performance problems if they carefully consider the amount of data that their application returns to the client, and keep track of where the SQL Server indexes are placed and how the SQL Server data is arranged. SQL Profiler, the Index Tuning Wizard, and graphical execution plans are very useful tools that help programmers pinpoint and fix problem queries.

When using cursor logic, choose the cursor that is best suited to the type of processing you intend to do. Different types of cursors carry different costs. Understand what types of operations you intend to perform (read-only, forward-only processing, and so on), and then choose the cursor type accordingly.

Look for opportunities to reduce the size of the result set being returned, for example by eliminating columns from the select list that do not need to be returned and by returning only the required rows. This helps reduce I/O and CPU consumption.

Use multiple statements

By performing processing on the database server, you can reduce the size of the result set and avoid unnecessary network communication between the client and the database server. To perform processing that cannot be done with a single Transact-SQL statement, SQL Server allows you to group Transact-SQL statements together in the following ways.

Grouping method: Description

Batches: A batch is a group of one or more Transact-SQL statements sent from an application to the server as one unit. SQL Server executes each batch as a single executable unit.

Stored procedures: A stored procedure is a group of Transact-SQL statements that are predefined and precompiled on the server. A stored procedure can accept parameters, and it can return result sets, return codes, and output parameters to the calling application.

Triggers: A trigger is a special type of stored procedure. It is not called directly by applications; instead, it is executed whenever a user performs a specified modification (INSERT, UPDATE, or DELETE).

Scripts: A script is a series of Transact-SQL statements stored in a file. The file can be used as input to the osql utility or SQL Query Analyzer. These utilities execute the Transact-SQL statements stored in the file.

The following SQL Server features allow you to control the use of multiple Transact-SQL statements at one time.

Feature: Description

Control-of-flow statements: Allow you to include conditional logic. For example, if the country is Canada, execute one series of Transact-SQL statements; if the country is the U.K., execute another series of Transact-SQL statements.

Variables: Allow you to store data for use as input in a later Transact-SQL statement. For example, you can write a query that needs a different data value in the WHERE clause each time the query is executed. You can write the query to use variables in the WHERE clause, and write logic to fill the variables with the proper data. The parameters of stored procedures are a special class of variables.

Error handling: Allows you to customize the way SQL Server responds to problems. You can specify the appropriate action to take when an error occurs, or display customized error messages that are more informative to the user than a generic SQL Server error message.

Reuse execution plans

Performance improves when SQL Server can reuse an existing execution plan from a previous query. Developers can do a number of things to encourage SQL Server to reuse execution plans. Transact-SQL statements should be written according to the following guidelines.

Use fully qualified names of objects, such as tables and views. For example, do not write a SELECT statement like this:

SELECT * FROM Shippers WHERE ShipperID = 3

Instead, use the fully qualified name:

SELECT * FROM Northwind.dbo.Shippers WHERE ShipperID = 3

Use parameterized queries, and supply the parameter values instead of specifying stored procedure parameter values or search condition values directly. Use either the parameter substitution in sp_executesql or the parameter binding of the ADO, OLE DB, ODBC, and DB-Library APIs. For example, do not write a SELECT statement like this:

SELECT * FROM Northwind.dbo.Shippers WHERE ShipperID = 3

Instead, using ODBC as an example, use the SQLBindParameter ODBC function to bind the parameter marker (?) to a program variable, and write the SELECT statement like this:

SELECT * FROM Northwind.dbo.Shippers WHERE ShipperID = ?

In a Transact-SQL script, stored procedure, or trigger, use sp_executesql to execute the SELECT statement:

DECLARE @IntVariable INT
DECLARE @SQLString NVARCHAR(500)
DECLARE @ParmDefinition NVARCHAR(500)
/* Build the SQL string. */
SET @SQLString = N'SELECT * FROM Northwind.dbo.Shippers WHERE ShipperID = @ShipID'
/* Specify the parameter format once. */
SET @ParmDefinition = N'@ShipID INT'
/* Execute the string. */
SET @IntVariable = 3
EXECUTE sp_executesql @SQLString, @ParmDefinition, @ShipID = @IntVariable

Use sp_executesql when you want to avoid the overhead of creating and maintaining a separate stored procedure.

Reuse across multiple batches

If multiple concurrent applications will execute the same batch with a known set of parameters, implement the batch as a stored procedure that the applications call.

When an ADO, OLE DB, or ODBC application will execute the same batch multiple times, use the PREPARE/EXECUTE model of executing the batch. Use parameter markers bound to program variables to supply all of the needed input values, such as the expressions used in an UPDATE VALUES clause or the predicates in a search condition.
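The effect of parameter markers on statement text can be sketched with Python's built-in sqlite3 module, which also uses ? as its parameter marker. This is a stand-in chosen so that the example runs anywhere: it is not ODBC or SQL Server, it does not model the SQL Server plan cache, the table is a local copy with sample rows, and the lookup helper is hypothetical. What it does show is that a parameterized statement keeps one constant text for every value, whereas embedding literals produces a different text each time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, CompanyName TEXT)")
conn.executemany("INSERT INTO Shippers VALUES (?, ?)",
                 [(1, "Speedy Express"), (2, "United Package"), (3, "Federal Shipping")])

# One statement text; the value is bound to the parameter marker at execution time.
PARAMETERIZED_SQL = "SELECT CompanyName FROM Shippers WHERE ShipperID = ?"


def lookup(shipper_id: int):
    """Run the parameterized statement for one shipper ID (hypothetical helper)."""
    row = conn.execute(PARAMETERIZED_SQL, (shipper_id,)).fetchone()
    return row[0] if row else None


ids = (1, 2, 3)
# Embedding literals yields a distinct statement text per value...
literal_texts = {f"SELECT CompanyName FROM Shippers WHERE ShipperID = {i}" for i in ids}
# ...while the parameterized form stays identical across executions.
parameterized_texts = {PARAMETERIZED_SQL for _ in ids}

print([lookup(i) for i in ids])
print(len(literal_texts), "literal texts versus", len(parameterized_texts), "parameterized text")
```

A server that matches incoming statements against cached plans has only one text to recognize in the parameterized case, which is the property the guidelines above rely on.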

Maintaining statistics on columns

SQL Server allows you to create statistical information about the distribution of values in a column, even if the column is not part of an index. The query processor can use this statistical information to determine the best strategy for evaluating a query. When you create an index, SQL Server automatically stores statistical information about the distribution of values in the indexed columns. In addition to indexed columns, if the AUTO_CREATE_STATISTICS database option is set to ON (the default), SQL Server automatically creates statistics for any column that is used in a predicate. As the data in a column changes, index and column statistics can become out of date and cause the query optimizer to make less than optimal decisions about how to process a query. SQL Server automatically updates this statistical information as the data in a table changes. The sampling is random across data pages and is taken from the table or from the smallest nonclustered index on the columns needed by the statistics. After a data page has been read from disk, all of the rows on the data page are used to update the statistical information. The frequency at which the statistical information is updated is determined by the volume of data in the column or index and by the amount of data that has changed.

For example, the statistics for a table containing 10,000 rows may need to be updated when 1,000 index values have changed, because 1,000 values may represent a significant percentage of the data in the table. However, for a table containing 10 million index entries, 1,000 changed index values are less significant, so the statistics may not be updated automatically. SQL Server always makes sure that a minimum number of rows are sampled; tables smaller than 8 MB are always fully scanned to gather statistics.

Note: When the execution plan of a query is displayed graphically in SQL Query Analyzer, outdated or missing statistics are indicated as warnings (the table name is shown in red text). In addition, monitoring the Missing Column Statistics event class with SQL Profiler indicates when statistics are missing.

By using the sp_createstats system stored procedure, you can create statistics on all eligible columns in all user tables in the current database with a single statement. Columns that are not eligible for statistics include nondeterministic or nonprecise computed columns, and columns of the image, text, and ntext data types.

Creating statistics manually allows you to create statistics that contain multiple column densities (the average number of duplicates for the combination of columns). For example, a query contains the following clause:

WHERE a = 7 AND b = 9

Creating manual statistics on both columns together (a, b) can allow SQL Server to make a better estimate for the query, because the statistics also contain the average number of distinct values for the combination of columns a and b. This allows SQL Server to make use of the index built on col1 (preferably a clustered index) rather than resort to a table scan. For information about how to create column statistics, see the topic "Creating Statistics" in SQL Server Books Online.
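Why a combined density can help is easiest to see with correlated data. The sketch below assumes that density means 1 divided by the number of distinct values (or distinct combinations), and it compares a naive estimate that treats the two columns as independent with an estimate based on the combined density. The function names and sample rows are hypothetical, and the arithmetic is a simplification for illustration, not a description of how the SQL Server optimizer computes its estimates.

```python
from typing import Sequence, Tuple


def density(rows: Sequence[Tuple], columns: Sequence[int]) -> float:
    """Density = 1 / number of distinct value combinations in the given column positions."""
    distinct = {tuple(row[c] for c in columns) for row in rows}
    if not distinct:
        raise ValueError("rows must not be empty")
    return 1.0 / len(distinct)


# Hypothetical table with columns (a, b) that are perfectly correlated.
rows = [(7, 9)] * 4 + [(2, 1)] * 4

density_a = density(rows, [0])
density_b = density(rows, [1])
density_ab = density(rows, [0, 1])

# Estimated rows for WHERE a = 7 AND b = 9 under each approach.
independent_estimate = len(rows) * density_a * density_b
combined_estimate = len(rows) * density_ab
actual = sum(1 for a, b in rows if a == 7 and b == 9)

print(independent_estimate, combined_estimate, actual)
```

With these rows the independence assumption predicts 2 matching rows, while the combined density predicts 4, which matches the actual count.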

Find more information

SQL Server Books Online provides information about SQL Server architecture and database tuning, along with complete documentation on command syntax and administration. SQL Server Books Online can be installed from the SQL Server installation media on any SQL Server client or server computer. For the latest information about Microsoft SQL Server, including technical papers on SQL Server, visit the Microsoft SQL Server Web site.

When reposting, please cite the original address: https://www.9cbs.com/read-7465.html
