A Detailed Introduction to RAID Disk Array Technology



With the growing popularity of computers, users have come to demand ever higher speed and performance. In a complete computer system the CPU and memory certainly play important roles, but the performance of the data storage devices increasingly affects the performance of the system as a whole. The RAID technology explained in this article was initially applied in the high-end server market, but as the personal-computer market has matured and developed, it has steadily moved toward the low end as well, offering users a good solution that both improves hard disk speed and protects data security. This article introduces RAID technology and will, we hope, be helpful to readers.

Getting Started

RAID is an abbreviation of Redundant Array of Inexpensive Disks, usually referred to simply as a disk array. As the full name suggests, RAID is a redundant array composed of multiple inexpensive disks. Although a RAID contains multiple disks, it appears to the operating system as a single large storage device. RAID technology is divided into several different levels, which offer different trade-offs among speed, safety, and cost. When RAID was first developed, it was based mainly on the assumption that several small-capacity hard disks would cost less than one large-capacity disk. That assumption never quite came true, and RAID's savings in cost are not very significant, but RAID can fully exploit the advantages of multiple hard drives, achieving speed and throughput far beyond those of any single drive. Besides the improvement in performance, RAID can also provide good fault tolerance: the array can keep working when a hard disk develops a problem and is not brought down by damage to a single disk.

RAID 0

As mentioned above, RAID is divided into several different levels, of which RAID 0 is the simplest form. RAID 0 joins multiple hard disks together to form a single larger storage device.
The simplest RAID 0 configuration merely provides more disk space, but with appropriate settings RAID 0 can also improve disk performance and throughput. RAID 0 has no redundancy or error-repair capability, but its implementation cost is the lowest. The simplest implementation of RAID 0 joins several hard disks together into one large volume set. The disks can be connected in hardware, or in software by the disk driver in the operating system. The illustration is as follows:

In the configuration above, four disks are combined into one logical drive whose capacity is four times that of any single disk. As the colored areas in the figure show, data is written to each disk in turn: when the space on one disk is exhausted, data is automatically written to the next. This arrangement has only one advantage, namely increased disk capacity. As for speed, it is the same as that of any single disk, because at any given moment I/O can be performed on only one of the disks. And if any one disk fails, the whole volume is destroyed and unusable. In this sense, the reliability of pure RAID 0 is only one quarter of that of a single hard disk (since this example uses four drives). Although we cannot change RAID 0's reliability problem, we can improve system performance by changing the configuration. Unlike the sequential arrangement described above, we can write data to multiple disks at the same time by creating a stripe set, as shown:

In the figure above, the system divides the logical device into four stripes, each corresponding to one hard disk. We can clearly see that data which would otherwise have been written in sequence is spread across all four drives by establishing the stripe set. The parallel operation of the four drives increases the disk's read/write speed fourfold in the same amount of time. Choosing a reasonable stripe size is very important when creating a stripe set. If the stripe is too large, most I/O operations may fall on a single disk, so reads and writes remain confined to one or two drives and the advantage of parallel operation cannot be fully exploited. On the other hand, if the stripe is too small, any I/O request may cause a large number of read and write operations, occupying too much of the controller bus. Therefore, when creating a stripe set, the stripe size should be chosen carefully according to the needs of the actual application. We already know that a stripe set distributes reads and writes across all disks, but connecting all the drives to a single controller can bring a potential hazard: frequent reads and writes can easily overload the controller or bus. To avoid this problem, it is recommended that users employ multiple disk controllers. The schematic is as follows: This reduces the data traffic on the original controller bus. Ideally, each hard disk would have its own dedicated disk controller.
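The two layouts discussed above, simple concatenation and the stripe set, can be sketched as address mappings from a logical block number to a (disk, offset) pair. This is a minimal illustrative sketch in Python; the disk and block counts are hypothetical, not from the article:

```python
def span_locate(logical_block, blocks_per_disk, num_disks=4):
    """Capacity-only layout from the first figure: disk 0 fills up
    completely before any data lands on disk 1, so only one disk
    is busy at a time."""
    if logical_block >= blocks_per_disk * num_disks:
        raise ValueError("logical block beyond volume capacity")
    return logical_block // blocks_per_disk, logical_block % blocks_per_disk

def stripe_locate(logical_block, num_disks=4):
    """Stripe-set layout from the second figure: consecutive blocks
    round-robin across the disks, so up to num_disks transfers can
    proceed in parallel."""
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks of 1000 blocks each, logical block 2500 lands deep in
# one disk when spanned, but on disk 0 of stripe row 625 when striped.
print(span_locate(2500, 1000))   # -> (2, 500)
print(stripe_locate(2500))       # -> (0, 625)
```

Notice that eight consecutive blocks visit disks 0, 1, 2, 3, 0, 1, 2, 3 under striping, which is exactly what lets four transfers overlap.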

RAID 1

Although RAID 0 provides more space and better performance, the whole system is very unreliable, and if a failure occurs, nothing can be done to remedy it. Therefore RAID 0 is generally used only where data security requirements are not high. RAID 1 is quite different from RAID 0; its technical focus is on maximizing the reliability and repairability of the system without affecting performance. RAID 1 has the highest cost of all the RAID levels. Even so, people still choose RAID 1 to protect critically important data. RAID 1 is also known as disk mirroring: each disk has a corresponding mirror disk. Data written to any disk is also replicated on its mirror disk, and the system can read data from either disk of a mirrored pair. Obviously, disk mirroring raises the system cost, because the usable space is only half the total disk capacity. The figure below shows a disk mirror consisting of four drives, of which only two drives' worth of space can be used for storage (the hatched areas are the mirror portions).

With RAID 1, the failure of any single hard disk does not affect the normal operation of the system, and as long as at least one disk in each mirrored pair remains usable, RAID 1 can even run uninterrupted with half of its disks failed. When a disk fails, the system ignores it and uses the remaining mirror disk for data transfers instead. A RAID system in which a disk fault has appeared is usually said to be running in degraded mode. Although the saved data can still be used, the RAID system is no longer reliable: if the remaining mirror disk also fails, the whole system will crash. Therefore a damaged drive should be replaced promptly to avoid new problems. After the new disk is installed, the data on the surviving disk must be copied to it; this operation is called synchronizing the mirror. Synchronization usually takes a long time, especially when the damaged disk was large. During synchronization, external access to the data is not interrupted, but because the copying occupies part of the bandwidth, the performance of the whole system may drop. Because RAID 1 performs every write twice, the load on the disk controller is also quite heavy, especially in environments with frequent writes. To avoid this performance bottleneck, multiple disk controllers can be used. The figure below shows a disk mirror using two controllers.

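The mirrored write, the degraded-mode read, and the loss of data when both halves of a pair fail can be sketched in a few lines. This is an illustrative sketch only; the class name and block counts are hypothetical, and a real array learns of failures from the disk controller rather than a flag:

```python
class MirroredPair:
    """Minimal RAID 1 sketch: every write is duplicated on both disks,
    and a read is served by any surviving disk."""

    def __init__(self, blocks):
        self.disks = [[None] * blocks, [None] * blocks]
        self.failed = [False, False]

    def write(self, block, data):
        for d in (0, 1):
            if not self.failed[d]:
                self.disks[d][block] = data   # the duplicated write

    def read(self, block):
        for d in (0, 1):
            if not self.failed[d]:            # degraded mode skips the bad disk
                return self.disks[d][block]
        raise IOError("both mirrors failed: data lost")

pair = MirroredPair(blocks=8)
pair.write(3, b"payload")
pair.failed[0] = True                  # one disk fails -> degraded mode
print(pair.read(3))                    # -> b'payload', served by the mirror
```

Synchronizing a replacement disk would amount to copying every block from the survivor to the new disk before clearing its `failed` flag.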
Using two disk controllers not only improves performance but also improves data safety and availability. We already know that RAID 1 allows half of its drives to fail, so with the arrangement above (each half of the mirror on a separate controller), the system can continue working on the other disk controller even if one controller fails. In this way, the damage from an unexpected failure can be reduced to a minimum.

RAID 0+1

Using RAID 1 alone has a problem similar to using RAID 0 alone: only one disk is written at a time, so the resources of all the drives cannot be fully exploited. To solve this problem, we can build a stripe set on top of the disk mirror. Because this configuration combines the advantages of stripe sets and mirroring, it is called RAID 0+1.

Hot swapping

Some disk mirroring systems for high-end applications provide hot-swap capability for their disks. The so-called hot-swap function allows the user to remove and replace a damaged hard disk without shutting down the system or turning off the power. Without hot swapping, even though disk damage does not cause data loss, the user must still temporarily shut down the system so the disk can be replaced. With hot-swap technology, simply releasing the connection latch or turning a handle lets you pull the disk straight out while the system keeps running normally.

Parity

RAID 3 and RAID 5 both provide fault tolerance through the concept of parity. Simply put, you can think of parity as an extra binary digit that can tell you whether all the other bits are correct. In data communications, parity is used to determine whether data was transmitted correctly. For example, for each byte we can count the number of 1 bits and append an extra parity bit to the byte.
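The parity-bit idea just described can be sketched directly. This is an illustrative sketch; the function names are hypothetical:

```python
def parity_bit(byte, even=True):
    """Parity bit for one byte: with even parity the bit is chosen so
    that the total number of 1s (data bits plus parity bit) is even."""
    ones = bin(byte).count("1")
    return ones % 2 if even else (ones % 2) ^ 1

def check(byte, bit, even=True):
    """A mismatch reveals that some bit flipped in transit, but not
    which one -- detection without correction."""
    return parity_bit(byte, even) == bit

b = 0b10110010               # four 1 bits, so the even-parity bit is 0
p = parity_bit(b)
print(p)                     # -> 0
print(check(b, p))           # -> True
print(check(b ^ 0b1000, p))  # one flipped bit -> False
```

Note that flipping any single bit is detected, but flipping two bits would cancel out, which is one reason parity alone is a weak error check.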
When the data is received, if the number of 1 bits is odd and we are using odd parity, the byte is considered correct; the same holds for even parity with an even count. If the count of 1 bits is inconsistent with the parity bit, however, an error occurred during transmission. RAID systems use a similar method, creating a parity block in the disk system; each bit in the parity block checks all the corresponding bits in the other associated blocks. In data communications, parity can tell us whether a byte is correct, but it cannot tell us which bit is wrong. That is, we can detect errors but cannot correct them. For RAID, this is far from enough: easy error detection is important, but if we cannot repair an error we cannot improve the reliability of the whole system. For example, suppose we find that bit 5 of byte 10 in the parity block is wrong. If this parity block covers eight other data blocks, which data block is the culprit? You might think of building a parity block for every data block, but that approach is impractical. In fact, RAID mainly relies on the error reports of the disk controller to locate an error and then repair it. If the disk controller raises no "complaint" while reading data, the system treats the data as correct and continues to use it.

RAID 3

RAID 3 uses a comparatively simple parity implementation: a dedicated disk stores all the parity data, while the remaining disks hold the data, striped to spread read and write operations. For example, in a RAID 3 system composed of four drives, three drives hold data and the fourth is dedicated to parity. This configuration can be expressed as 3+1, as shown in the figure:

In the figure above, the same color marks all data blocks belonging to the same parity block, and the hatched portion is the parity block. A parity block together with all of its corresponding data blocks forms a stripe. Every parity block on the fourth disk contains the parity information for the corresponding data blocks on the other three disks. RAID 3's success lies in providing fault tolerance like RAID 1, but with much lower overhead: 25% in a 3+1 configuration instead of RAID 1's 50%. As the number of disks increases, the cost overhead gets smaller and smaller; with seven data drives and one parity drive, for example, the overhead falls to 12.5% (1/8). The complexity of RAID 3 read and write operations differs by situation. The simplest case is reading from a healthy RAID 3 system: the system merely locates the relevant block on the data disks, adding no extra overhead. Writing to RAID 3 is more complicated. Even when writing a single data block to one disk, the system must compute the parity of all data blocks in the same stripe and rewrite the new value into the parity block. For example, when writing to the green data block in the figure, the system must recompute the parity across all three green data blocks, then rewrite the green parity block on the fourth disk. Thus a single write operation actually comprises four steps: reading the associated data blocks in the stripe, computing the parity, writing the data block, and writing the parity block. System overhead increases greatly. We can simplify the RAID system's work by setting the stripe size appropriately. If the length of a write equals the size of a full stripe (a full-stripe write), there is no need to read the associated data blocks in the stripe to compute the parity.
We only need to compute the parity of the whole stripe, then write the data and parity information to the data disks and the parity disk. So far we have examined reads and writes only when all disks are healthy. Now consider what happens when a hard disk fails and the RAID system runs in degraded mode. Although RAID 3 is fault tolerant, the system's performance is affected. When a disk fails, every data block on that disk must be reconstructed using the parity information. Reading a data block from a healthy disk proceeds as usual, but if the block to be read happens to lie on the failed disk, all the other data blocks in the same stripe must be read at the same time and the lost data rebuilt from the parity value. When the damaged disk is replaced, the system must restore the failed disk's data to the new one, block by block. The whole process, which includes reading a stripe, computing the lost data block, and writing the new block to the new disk, runs automatically in the background. Rebuilding is best performed while the RAID system is idle; otherwise the performance of the whole system suffers severely.

RAID 3 performance issues

Beyond the issues discussed above, other performance problems arise in the course of using RAID 3. The biggest shortcoming of RAID 3, and the reason it is rarely adopted, is that the parity disk easily becomes the bottleneck of the whole system. We already know that RAID 3 spreads write operations across multiple disks, yet no matter which data disk is written, the related information on the parity disk must also be rewritten. Therefore, for applications that perform a large number of writes, the parity disk's load is enormous and cannot keep up with the program, causing the performance of the whole RAID system to decline.
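The stripe parity and the degraded-mode reconstruction described above both come down to a byte-wise XOR. This is a minimal sketch; the two-byte block contents are hypothetical:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks -- the parity used by
    RAID 3 (and RAID 5)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A 3+1 stripe: three data blocks plus the parity block they share.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(data)

# Degraded mode: disk 1 fails, so its block is rebuilt by XOR-ing the
# surviving data blocks with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])   # -> True
```

The same property explains the full-stripe write optimization: when every data block of the stripe is being rewritten anyway, the parity can be computed from the new data alone, with no preliminary reads.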
Because of this parity-disk bottleneck, RAID 3 is better suited to applications with few writes and many reads, such as databases and web servers.

RAID 5

The parity-disk performance problem of RAID 3 has led almost all RAID systems to turn to RAID 5.

In its operating mechanism, RAID 5 is identical to RAID 3: several data blocks in the same stripe share one parity block. The biggest difference between RAID 5 and RAID 3 is that RAID 5 does not store the parity on a dedicated parity disk; instead, the parity blocks are dispersed across all of the data disks. RAID 5 uses a special algorithm to compute the storage location of any stripe's parity block. As shown in the figure, the parity blocks are spread across different disks, so that reads and writes of parity are balanced across all the RAID disks, eliminating the bottleneck that a dedicated parity disk could create.
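The placement algorithm can be as simple as rotating the parity block by one disk per stripe. The following is an illustrative sketch of one common rotation scheme (often called "left-asymmetric"); real controllers use several variants, and this particular formula is an assumption, not the article's:

```python
def raid5_parity_disk(stripe, num_disks):
    """Rotate the parity block back one disk for each successive
    stripe instead of pinning it to a dedicated parity disk."""
    return (num_disks - 1 - stripe) % num_disks

# On 4 disks the parity block cycles through disks 3, 2, 1, 0, 3, 2, ...
print([raid5_parity_disk(s, 4) for s in range(6)])  # -> [3, 2, 1, 0, 3, 2]
```

Because every disk takes a turn holding parity, parity updates are spread evenly, which is exactly the balancing effect the figure illustrates.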

