In today's Internet era, people expect information to be available immediately, and users expect applications to be available continuously. Downtime can hurt you badly, both in tangible ways (such as lost sales) and in intangible ways (such as damage to your company's image). In this article, I describe best practices for building highly available applications, along with items that need special attention in .NET.
First, I want to define what high availability means and how you can measure the availability of your application. Availability is usually discussed in terms of the number of "nines." For example, if your application achieves 99.9 percent availability, it has three nines. The more nines the better, because each one brings you closer to 100 percent availability. The industry benchmark is five nines, or 99.999 percent availability, which is equivalent to about 5 minutes of downtime per year (see Table 1).
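The relationship between nines and yearly downtime can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a 365-day year; the function name `downtime_minutes_per_year` is only illustrative.

```python
# Assumption: a 365-day year (leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print the downtime budget for two through five nines.
for label, pct in [("two nines", 99.0), ("three nines", 99.9),
                   ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label:12s} {pct:7.3f}%  {minutes:10.2f} minutes per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why each one costs so much more to achieve than the last.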
In order to estimate your system availability, you can use this formula:
Availability = (MTBF / (MTBF + MTTR)) x 100
Mean Time Between Failures (MTBF) is the average time the application runs before it fails. You can calculate it by dividing total uptime by the number of application failures. Mean Time to Recovery (MTTR) is the average time it takes to get the application running again; it is usually what we mean by downtime. You can calculate it by dividing the total hours spent restoring the application by the number of application failures. The formula also illustrates a point: availability depends not only on minimizing the number of failures, but equally on minimizing the time required to restore the application.
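As a worked example of the formula, here is a minimal sketch. The function name and the sample figures (8,750 hours of uptime, 4 failures, 10 or 5 hours of total repair time) are hypothetical and chosen only for illustration.

```python
def availability_percent(total_uptime_hours: float,
                         total_repair_hours: float,
                         failure_count: int) -> float:
    """Compute availability from MTBF and MTTR as a percentage."""
    if failure_count <= 0:
        raise ValueError("failure_count must be at least 1")
    mtbf = total_uptime_hours / failure_count   # mean time between failures
    mttr = total_repair_hours / failure_count   # mean time to recovery
    return mtbf / (mtbf + mttr) * 100


# Hypothetical year: 4 failures, 8,750 hours of uptime, 10 hours of repair.
baseline = availability_percent(8750, 10, 4)
# Same number of failures, but each one is repaired twice as fast.
faster_recovery = availability_percent(8750, 5, 4)

print(f"baseline:        {baseline:.2f}%")
print(f"faster recovery: {faster_recovery:.2f}%")
```

The second call keeps the number of failures constant and changes only the recovery time, which shows that shortening MTTR raises availability even when MTBF stays the same.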
You can limit downtime in several ways. First, however, consider how much availability your application needs and how much you can invest to get it. Different types of applications require different degrees of availability at different times. Not every application must run 24 hours a day, 7 days a week, 365 days a year; in fact, most do not. Financial applications, such as those for banks or stock brokers, may need that kind of uptime, but many internal business applications need to be available only from 9 to 5 each day, that is, when employees are at work and need them.
Some applications run only for short periods but absolutely must not fail during that time. Data acquisition is a good example. Suppose your application stores data from a CAT scanner. It might need only 20 minutes of uptime during a scan, but you don't want it to fail during that time, or the patient will have to be scanned again.
Study your application's availability requirements carefully, then determine how much downtime you can tolerate during the periods when the application must be available. Also consider the cost of the availability you want. Each additional nine costs more in redundant hardware and software, and requires additional, highly trained IT personnel. Next, I describe ways to get more nines, if that is your goal.
You can take several measures to help eliminate downtime. Here are some best practices:
Invest in a good backup and recovery plan. My article "Avoid disasters" (in a previous issue of .NET Magazine) describes how to establish such a plan. Some downtime is unavoidable, especially when it is caused by external factors such as security breaches or natural disasters. Fortunately, a carefully considered, thoroughly tested backup and recovery plan can minimize it. When downtime is unavoidable, you need to be prepared to return the application to normal operation quickly. You should also synchronize the clocks on your servers by using the Network Time Protocol (NTP) or a similar approach (see Resources). This matters for a good backup and recovery plan because it helps you investigate and trace problems.
Use redundant hardware. First, provide redundant hard disk storage. Because hard disks contain moving parts, they are among the hardware components with the highest failure rates. For this reason, use RAID 1 or a higher RAID level to provide redundant disk storage. Hot-swappable drives also reduce downtime; they let you replace a failed drive quickly without stopping the server. You should also connect multiple disk controllers to the disk array. You usually do this with a cluster of two or more servers connected to a shared disk array. Likewise, if your disk controller uses write caching, it should have a battery backup to ensure that all cached writes eventually reach the disk.
Next, use fault-tolerant RAM in your servers. Error Correcting Code (ECC) memory automatically corrects errors in a memory module, which prevents data corruption and possible system crashes. In addition, some motherboards support hot-spare memory slots. These hold extra memory modules that provide instant failover if an active memory module fails.
You also need redundant network connections. With multiple network interface cards (NICs), you can connect to redundant, physically separate networks to protect against the failure of a network path, or you can team the cards into a single logical NIC with a single MAC address to protect against the failure of a NIC, a network cable, or a hub port.
It is also important to provide redundant power for your devices. Give each server two or more redundant, hot-swappable power supplies and, if possible, put each power supply on a separate power grid. Provide battery backup power for servers and network devices, or a backup generator. Make sure each server is configured to shut down automatically when it is running on battery backup and to restart once power is restored.
Finally, provide a good operating environment for your hardware. Your data center should have appropriate temperature and humidity controls, raised flooring to protect against flooding and electrostatic discharge, and a fire suppression system that is safe for electrical equipment, such as one that uses Halon.
Use software that supports failover. Use clustering for your operating system, server, and application software whenever possible. A cluster is a group of identically configured machines that work together to act as a single application server. If one server goes down, the application can fail over to another server. Clustering can also provide scalability for your application. You can cluster front-end servers, such as Web servers, and back-end servers, such as database servers, to avoid any single point of failure (for an alternative, see the sidebar, "Study a Clustering Method"). If your development team writes software applications in-house, make sure they support clustering or implement a custom failover method. Messaging and transaction services can also help your application fail over, because they can maintain state information across multiple servers.
Use Network Load Balancing (NLB). NLB automatically distributes client traffic across multiple servers. You generally use it together with clustering. If a node in the cluster fails, NLB automatically detects the failure and redirects client traffic to the other nodes in the cluster.
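The redirect behavior described above can be illustrated with a toy round-robin balancer. This is only a sketch of the idea, not how any NLB product is implemented; the class name, method names, and node names are hypothetical, and the health state is set by hand rather than detected over the network.

```python
class RoundRobinBalancer:
    """Toy balancer that rotates across nodes and skips any marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # index of the next node to consider

    def mark_down(self, node):
        """Record that a node has failed (a real balancer detects this itself)."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Return a repaired node to the rotation."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node in rotation."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
before_failure = [balancer.pick() for _ in range(3)]
balancer.mark_down("web2")  # simulate a failed cluster node
after_failure = [balancer.pick() for _ in range(4)]
print(before_failure)
print(after_failure)
```

After `web2` is marked down, requests continue to flow to the remaining nodes, which is the property that lets a cluster survive the loss of a single member.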
Make security a top priority. Security is more than making sure you have a good firewall and the latest security patches. It also means providing physical security for your data center and giving employees access only to the services they need to perform their assigned tasks.
Use event logging and monitoring extensively. When you build this much redundant hardware into your system, monitoring applications become essential: they notify the right people when something fails. You want to know as soon as one drive fails in your RAID 5 array, before another drive fails and you lose all the data on the logical drive. Likewise, if a node in a cluster fails, you need to diagnose the problem and restore the node as soon as possible. It is important to monitor all systems to confirm that they are operating normally. You may also want to know whether the system is under unusual load or is running out of storage space.
Here is a list of items you should monitor, and for which you should raise an alert when something abnormal happens:
· Disk space
· Memory
· CPU usage
· Network load
· Application queues
· Hardware failures
· Software application errors
· Database transaction times
· Security logs
Good monitoring alerts also tell you when you need to add capacity, and they can help you resolve system bottlenecks or potential failures.
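A threshold check over a few of the items listed above might look like the following. This is a minimal sketch: the metric names and limits are hypothetical, only disk usage is actually measured (with the standard library's `shutil.disk_usage`), and a real deployment would rely on a dedicated monitoring product.

```python
import shutil

# Hypothetical alert limits, expressed as percentages.
THRESHOLDS = {"disk_used_pct": 90.0, "memory_used_pct": 85.0, "cpu_used_pct": 95.0}


def disk_used_percent(path: str = ".") -> float:
    """Return the percentage of disk space in use on the volume holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


def check_metrics(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one alert message per metric that is at or above its limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value >= limit:
            alerts.append(f"ALERT {name}={value:.1f} is at or above limit {limit:.1f}")
    return alerts


# Check the real disk reading; other metrics would come from other collectors.
print(check_metrics({"disk_used_pct": disk_used_percent(".")}))
```

Setting the limits below 100 percent is what gives you warning of a capacity problem before it becomes an outage.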
Set up a support infrastructure. Not all types of failures show up in automated monitoring or event logs. Users of the application should have a way to contact a support team that can track a problem from the initial report until it is resolved. The support team should have appropriate procedures for escalating a problem and for giving IT staff or software developers the information they need to solve it. A support team also helps you track problems so that they can be avoided in the future.
Test your application's failover capability. Set up a test environment that lets you simulate various types of hardware and software failures. This environment is also critical for ensuring that your procedures for patching and upgrading the application are reliable.
Consider outsourcing. When you evaluate the infrastructure needed to implement a highly available application, you will find it very expensive. You need redundant hardware and additional software licenses for the redundant servers. You need clustering software, NLB hardware and software, and multiple redundant connections to the Internet through Network Access Points (NAPs). You also need a top-notch IT team and support staff to keep the application running smoothly. For these reasons, consider outsourcing your hosting to vendors that specialize in this area. They can usually give you a much higher degree of availability than you could achieve yourself at the same cost.
Now that you understand what high availability means and the methods you can use to achieve it, let's look at what all this means for .NET applications. When people think of .NET applications, they think of Web services. Because the trend in .NET applications is to use Web services, .NET applications are more distributed than other applications. This distributed nature is a double-edged sword when you build a highly available application.
To explain further, assume you have an e-commerce application or online store (see Figure 1). For simplicity, the sample application uses two Web services from different external vendors to process credit card transactions. (In practice, an application might use several internal and external Web services to perform tasks such as tracking inventory and tracking shipments.)
Figure 1. Create a highly available e-commerce .NET application.
First, consider what Web services offer a highly available application. Web services give you a way to split up application processing and distribute it across multiple redundant servers. You can place the servers that host a Web service in separate physical locations, far from one another. In this example, two different companies in different locations provide the two credit card processing services. This reduces the chance that a natural disaster or local network outage will take down the application. In addition, each vendor can run multiple copies of its Web service for redundancy. Separating processing logic into Web services also makes application failover simple. Your application tries one Web service location; if it does not respond or returns an error, the application keeps running and tries another Web service that performs the same function.
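The try-one-then-the-next pattern described above can be sketched as follows. The two provider functions are hypothetical stand-ins for the external credit card Web services in the example; they make no network calls, and the first is hard-coded to fail so the failover path can be seen.

```python
from typing import Callable, Dict, List, Sequence

Provider = Callable[[Dict], Dict]


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def call_with_failover(providers: Sequence[Provider], request: Dict) -> Dict:
    """Try each provider in order and return the first successful response."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider(request)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(exc)
    raise AllProvidersFailed(errors)


def provider_a(request: Dict) -> Dict:
    """Hypothetical first vendor; simulates a service that does not respond."""
    raise ConnectionError("provider A did not respond")


def provider_b(request: Dict) -> Dict:
    """Hypothetical second vendor; simulates a successful authorization."""
    return {"approved": True, "provider": "B", "amount": request["amount"]}


result = call_with_failover([provider_a, provider_b], {"amount": 25.00})
print(result)
```

One design point to keep in mind for payment calls: if the first service processed the request but its response was lost, retrying with a second service could charge the customer twice, so a real implementation needs some way to detect or reverse duplicates.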
However, distributed Web services also bring challenges when you build a highly available application. The network transport used to reach a Web service may be unreliable. This is especially true if your traffic crosses the Internet, as shown in Figure 1. Connecting to your Web services over the Internet, rather than over a local network or a private WAN, therefore increases the probability of failure. ISPs usually do not offer any service guarantee, so network quality of service cannot be guaranteed.
To ensure the availability of your application, you need at least two separate Internet connections through different ISPs, preferably connected to different NAPs. When you connect to a Web service over the Internet or a WAN, you also usually need to account for higher latency and lower bandwidth.
You need to take these factors into account when you design your application. They may determine which functions you can implement as Web services, or the placement of your Web services and the connection speed between them. As with the two credit card processing companies in the example, you may decide to use a Web service provided by an external vendor. In that case, your application's availability depends on the availability of that vendor's Web service. I recommend that you get a service level agreement (SLA) from the vendor that guarantees the vendor will meet your application's availability target. You can also improve your application's availability by using more than one Web service provider.
When you design a highly available .NET application, remember that you need redundant hardware, software, and network connections to fail over. You should also carefully consider how the distributed nature of your application affects its availability.
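The effect of depending on one external Web service, compared with using more than one provider, can be estimated with simple probability. This is a rough sketch that assumes failures are independent of one another, which is not always true in practice; the function names and availability figures are hypothetical.

```python
import math


def serial_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up (fractions 0 to 1)."""
    return math.prod(parts)


def redundant_availability(*alternatives: float) -> float:
    """Availability when any one of several interchangeable alternatives suffices."""
    return 1 - math.prod(1 - a for a in alternatives)


app = 0.999        # hypothetical availability of your own application
provider = 0.995   # hypothetical availability of each credit card service

one_provider = serial_availability(app, provider)
two_providers = serial_availability(app, redundant_availability(provider, provider))

print(f"one provider:  {one_provider * 100:.4f}%")
print(f"two providers: {two_providers * 100:.4f}%")
```

With a single provider, the combined figure is lower than either component on its own. With two interchangeable providers, the payment step stops being the weakest link and overall availability approaches that of your own application.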
About the author: Todd Walker is CTO of Hunter Stone Inc., located in Columbia, S.C., and holds MCSE and MCSD certifications. Hunter Stone is a Microsoft Certified Partner that provides custom software applications for medium-size and large companies. You can contact Todd at twalker@hunterstone.com.