When I studied databases a while back, I often heard the term RAID. It feels both familiar (I am fairly sure I was supposed to learn it in a computer architecture course) and strange (I have no real impression of it). I have been running into it frequently these days, so I decided to get it straight.
RAID stands for Redundant Array of Independent Disks. The concept was proposed in 1987 under the name Redundant Array of Inexpensive Disks, and as a high-performance storage approach it has come into increasingly wide use. RAID levels were introduced together with the RAID concept. Multiple levels have been developed since; the clearly standardized levels are 0, 1, 2, 3, 4, and 5, of which 0, 1, 3, and 5 are the most commonly used. Other levels include 6, 7, 10, 30, and 50. RAID lowers cost for the user, improves performance, and makes system operation more stable.
A standard RAID write that includes the parity calculation required by RAID4 or RAID5 takes the following steps: (1) read the old data from the target data disk; (2) read the old parity from the parity disk; (3) compute the new parity from the old data, the old parity, and the new data; (4) write the new data to the target data disk; (5) write the new parity to the parity disk.
When the host sends data to a RAID group in the array, the array controller saves the data in its cache and immediately reports to the host that the write is complete. The actual write to the array's hard disks is carried out later by the controller. The data can stay in the cache until the cache is full and must make room for new data, or until the controller flushes it on a timer; at that point the data is written from the cache to the array's disks. This write-back cache technique means the host does not have to wait for the RAID parity calculation to finish before it moves on to the next read or write task, so host I/O efficiency improves. When a host command writes data destined for the disk, the controller places the data in the cache first; in write-back cache mode, only the most recent data is then written out to the hard disk.
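The parity update for a small write can be sketched in a few lines of Python. This is only an illustration, assuming XOR parity and modelling each disk as a list of byte blocks; the names xor_bytes and small_write are made up for this example and do not come from any real controller.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(disks, data_idx, parity_idx, block_no, new_data):
    """Read-modify-write update of one block on a parity-protected stripe.

    `disks` is a list of disks, each a list of byte blocks (an assumption
    made for this sketch, not a real device interface).
    """
    old_data = disks[data_idx][block_no]        # read old data
    old_parity = disks[parity_idx][block_no]    # read old parity
    # New parity = old parity XOR old data XOR new data
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    disks[data_idx][block_no] = new_data        # write new data
    disks[parity_idx][block_no] = new_parity    # write new parity

# Three data disks plus one parity disk, one 4-byte block each.
disks = [[b"\x01\x02\x03\x04"], [b"\x10\x20\x30\x40"],
         [b"\xaa\xbb\xcc\xdd"], [None]]
disks[3][0] = xor_bytes(xor_bytes(disks[0][0], disks[1][0]), disks[2][0])

small_write(disks, data_idx=1, parity_idx=3, block_no=0,
            new_data=b"\xff\x00\xff\x00")
```

Note that only two disks are touched (the target data disk and the parity disk); the other data disks do not have to be read to keep the parity consistent.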
RAID levels:
NRAID: Disk spanning. NRAID means Non-RAID, that is, the RAID function is not used. It uses the total capacity of the hard disks (without block striping); in other words, the capacity of the logical disk it produces is the sum of the capacities of the physical disks. NRAID does not provide data redundancy.
JBOD: JBOD (Just a Bunch Of Disks) means the controller treats each hard drive in the machine as a standalone disk, so each drive is presented as a single, independent logical disk. JBOD does not provide data redundancy either.
RAID0: RAID 0 - Disk Striping without Parity, also known as data blocking. The data is divided into small fixed-size blocks that are written to different hard drives in the array. This is called "striping": the data is distributed over multiple disks, which operate in parallel. In theory, both the capacity and the data transfer rate are N times those of a single disk, where N is the total number of drives in the RAID0 array. If the array controller has multiple drive channels and the drives sit on different channels, I/O performance is higher still. RAID0 is therefore often used for images, video, and similar fields. The RAID0 transfer rate is high, but its mean time to failure (MTTF) is only 1/N of that of a single disk, so RAID0 has the worst reliability.
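As a rough illustration of striping, the following Python sketch maps a logical block number to a disk and an offset on that disk. It assumes a stripe unit of exactly one block and plain round-robin placement; the function name raid0_locate is hypothetical, and real controllers usually use larger, configurable stripe units.

```python
def raid0_locate(logical_block, num_disks):
    """Return (disk index, block offset on that disk) for a logical block.

    Assumes one block per stripe unit and round-robin placement.
    """
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, consecutive logical blocks land on consecutive disks,
# so a long sequential transfer keeps all 4 disks busy at once.
placement = [raid0_locate(block, 4) for block in range(8)]
```

Because every disk holds part of the data and there is no parity, losing any one disk loses the whole array, which is where the 1/N MTTF estimate comes from.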
RAID1: RAID 1 - Disk Mirroring (quite common). Every working disk has a mirror disk, and each write must go to the mirror disk at the same time. Data is read only from the working disk; once the working disk fails, reads switch over to the mirror disk. After the failed disk is replaced, its data can be rebuilt and normal operation resumes. The reliability of this array is high, but the effective capacity drops to half of the total capacity, so RAID1 is typically used where fault-tolerance requirements are very strict, such as finance and accounting.
RAID (0+1): Combines RAID 0 and RAID 1: reads and writes are striped while mirroring is applied at the same time. RAID (0+1) can survive the failure of more than one drive, as long as both copies of the same data are not lost, because it uses full mirroring for redundancy. On some controllers, when more than two drives are used, RAID (0+1) is applied automatically.
RAID2: Also known as bit interleaving. It uses Hamming code for error checking and correction and interleaves data across the disks bit by bit. It was applied to reading and writing large volumes of data, but the redundancy overhead is too large (multiple check disks are needed), so it has been phased out.
RAID3: RAID 3 - Parallel Disk Array, a parallel-transfer array that tolerates a single disk failure. Striping is used to split the data into blocks, the blocks are XORed to produce parity, and the parity is written to the last hard disk. Its distinguishing feature is that data is interleaved across the disks at the bit or byte level (a record is spread over the same sector of each disk). When one disk fails, writes continue on the remaining data disks and the parity disk, skipping the failed disk. Reads reconstruct the failed disk's data by XORing the remaining data disks with the parity disk. Advantages of RAID3: parallel I/O transfer, single-disk fault tolerance, and high reliability. Disadvantage: every read or write involves the whole group, so only one I/O can be completed at a time.
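The XOR parity and the reconstruction of a failed disk can be shown with a small Python sketch. It is a toy model: each "disk" is just a short byte string, and the helper name xor_blocks is an assumption for this example.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR any number of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks and one parity disk (toy 4-byte blocks).
data = [b"RAID", b"disk", b"blok"]
parity = xor_blocks(*data)

# Suppose disk 1 fails: XOR the surviving data disks with the parity
# disk to recover its contents.
rebuilt = xor_blocks(data[0], data[2], parity)
```

The same calculation works no matter which single disk is lost, including the parity disk itself, but it cannot recover from two failures in the same group.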
RAID4: Similar to RAID3. The difference is that RAID3 interleaves by bit or byte, while RAID4 interleaves by block (sector), so the disks can be accessed independently. Unlike RAID3, where even a small I/O operation involves the whole group, a small I/O in RAID4 involves only two drives in the group (one data disk and the parity disk), which speeds up small I/O. Disadvantage: for randomly scattered small I/O, such as transaction processing, the fixed parity disk becomes the I/O bottleneck. For example, take two small writes, one to stripe 1 on drive 2 and the other to stripe 2 on drive 3: both must also write to the parity disk, so they contend for it.
RAID5: RAID 5 - Striping with Floating Parity Drive, an independent-access array with rotating parity. Unlike RAID3 and RAID4, it has no fixed parity disk; instead, the parity information is distributed evenly across the disks of the array according to a fixed rule, so every disk holds both data and parity. This change removes the contention for the parity disk, so that multiple writes can be handled concurrently within the same group. RAID5 is therefore suitable for large data operations and also for all kinds of transaction processing; it is a fast, large-capacity disk array with sensibly distributed fault tolerance. In an array of N disks, the usable space is the capacity of N-1 disks.
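One possible rotation rule and the N-1 capacity formula can be sketched in Python. The rule used here (parity moves one disk to the left on each successive stripe) is only an assumption for illustration; actual implementations use several different layouts, and the function names are hypothetical.

```python
def raid5_parity_disk(stripe, num_disks):
    """Disk index holding parity for a stripe (assumed rotation rule)."""
    return (num_disks - 1 - stripe) % num_disks

def raid5_layout(num_stripes, num_disks):
    """Return one row per stripe, marking parity (P) and data (D) disks."""
    rows = []
    for stripe in range(num_stripes):
        p = raid5_parity_disk(stripe, num_disks)
        rows.append(["P" if d == p else "D" for d in range(num_disks)])
    return rows

def raid5_usable_capacity(num_disks, disk_capacity):
    """Usable space: one disk's worth of capacity goes to parity."""
    return (num_disks - 1) * disk_capacity

layout = raid5_layout(4, 4)
usable = raid5_usable_capacity(5, 2)  # five 2 TB disks
```

Since the parity block sits on a different disk for each stripe, two small writes to different stripes usually update different parity disks and can proceed in parallel.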
With RAID3 and RAID5, after one hard disk fails the RAID group changes from Online to Degraded, but reads and writes continue to be served while the fault is being repaired. However, if a second disk fails while the group is in the Degraded state, the data of the entire RAID group is lost.
Applications of RAID technology:
DAS - Direct Access Storage Device. DASD is a term for disk storage devices that was previously used on mainframes and midrange machines. RAID is the latest form of DAS built from hard disks. "Direct access" means that the time to access any piece of data is the same.
NAS - Network Attached Storage. A special-purpose server with an embedded software system that provides file-sharing services across system platforms.
SAN - Storage Area Network. A dedicated network used to establish direct connections between servers, disk arrays, and tape libraries. It is positioned as an extended storage bus and is interconnected through dedicated hubs, switches, and gateways or bridges. A SAN often uses Fibre Channel. A SAN can be local or remote, and shared or dedicated. A SAN breaks the binding between storage and server, which lets you choose the best storage or the best server independently and improves scalability and flexibility.
Why do you need a disk array?
How to increase disk access speed, how to prevent data loss caused by disk failure, and how to use disk space effectively have long troubled computer professionals and users; large-capacity disks are also very expensive, which puts a heavy burden on users. Disk array technology emerged to address these problems.
Over the past decade, CPU processing speed has increased more than fifty-fold and memory access speed has also risen sharply, while data storage devices, mainly disks (hard disks), have become only three to four times faster. The disk has become the bottleneck of the computer system and drags down overall system throughput. If disk access speed cannot be improved effectively, the imbalance between CPU, memory, and disk will waste the improvements made to CPU and memory.
There are two main ways to improve disk access speed. The first is a disk cache controller, which keeps data read from the disk in cache memory to reduce the number of disk accesses. Reads and writes take place in the cache memory, which greatly increases access speed; the disk is accessed only when the requested data is not in the cache or when data has to be written out to disk. In a single-tasking environment such as DOS, this approach performs well for large data accesses (though not for small, frequent ones). In a multitasking environment (because of constant swapping) or for database access (because each record is small), it shows little benefit. This approach also provides no data protection.
The second is disk array technology. A disk array combines multiple disks into an array that is used as a single disk. It stores data in stripes across different disks; when data is accessed, the relevant disks in the array work together, which sharply reduces data access time and also gives better space utilization.
The different techniques used by disk arrays are called RAID levels; different levels target different systems and applications in order to address data safety. High-performance disk arrays are generally implemented in hardware, and go a step further by combining disk cache control and the disk array in a single controller (RAID controller) or controller card, to meet the four main requirements that different users place on a disk I/O system: (1) increase access speed; (2) fault tolerance, that is, data safety; (3) use disk space effectively; (4) balance the performance differences between CPU, memory, and disk as far as possible, improving the overall performance of the computer.