Scalability, wonderful scalability
This article was originally published in the Diving Into Data Access column (http://msdn.microsoft.com/voices/data.asp) of MSDN Online Voices.

Many years ago, when I was still a shy junior programmer, every chance to get close to where project management decisions were made felt like a gift of luck, whether I was officially invited to sit in on a high-level meeting or just lurking unnoticed near the table. For years the word "scalability" buzzed in my ears like a pest: "How do we improve scalability?", "Is it scalable enough?", "We need more scalability in the middle tier." Scalability, wonderful scalability. But what is scalability? Whatever it was, I decided it was my duty to give the scalability problem the attention it seemed to deserve; you never know where things may turn in the field of project management.

I still remember running into the word in plenty of academic books that claimed to cover the principles and techniques of distributed systems: "In a good design, scalability is always a decisive factor." Scalability, wonderful scalability. But what is scalability? More recently I have noticed that "scalable" has become the favorite yardstick for measuring every piece of software technology aimed at distributed environments. So I got the feeling that investing a little more in my own scalability would make my career more prosperous. And the question comes back: what does scalability mean in the final analysis? What does it mean with today's database servers and hardware?

Abstract as the concept is, I think the important property we call scalability can be pursued in two basic ways. One is to optimize the system theoretically, during the design phase, abstracting away from specific software tools and underlying components, and then carry the design into practice. In practical terms this means discovering the natural parallelism of the task and minimizing, as much as possible, the impact of bottlenecks and critical shared resources. The other approach is just the opposite: you control only some of the features built into the hardware and software, and the room left for your own solutions and creativity is relatively small. Essentially, you hand the responsibility for scalability and interoperability over to the tools you select for the job.

Incidentally, in the first Internet era scalability was largely ignored in favor of performance and reliability, and the first approach was the rule for several years. (Before widespread Internet connectivity it was, indeed, a rather relative problem.) A few years ago, for lack of capable and affordable integrated solutions, people concentrated mainly on design issues, paying close attention to the database structure and to the resources consumed by each operation. Today the arrival of cheaper and more powerful hardware has pushed optimization and design into the background of project management, as if their effectiveness could be retired along with old hardware. Yet an important rule of computational complexity says that only the fastest algorithm can take full advantage of faster hardware. Keep that in mind when you think about scalability.

In general, scalability means that a system maintains (without improving) its average performance as the number of clients grows. Put that way, scalability is a concise concept. It can also, however, be called an abstract one.
It is abstract because it is not a system property that can be turned on and off by programming, or directly controlled in some other way. Rather, it is a systemic characteristic, the combined result of all the other features: the overall design, the implementation, and the interaction model you choose.
The level of scalability inherent in a distributed system is not easy to detect with monitoring and analysis tools. On the other hand, many aspects of an implementation can limit it: hunger for critical resources, design bottlenecks, lengthy but necessary tasks, excessive serialization. And until the system has been tested in a realistic production environment, there is no way to tell whether a particular system is scalable enough. Scalability is related to performance to some extent, but if the system is well built and follows a sound and complete plan, that relationship will not hurt you. You would be surprised how often an invisible system characteristic like scalability turns out to be the major reason a system suddenly degrades. However the system evolves, as long as you keep an eye on the resources each operation consumes, you remain in a strong position.

Scalability becomes a factor whenever a system has to grow. Summing up, a system that needs to grow is a system whose current performance does not satisfy the expected number of users. How do you improve such a system structurally? Simply put, you install more powerful hardware, or a better combination of hardware and software. Today these two options go by fancier, more market-driven names: growing through hardware is called scaling up; growing through a combination of hardware and software, more cleverly, is called scaling out. To govern the growth of a system, then, you scale up or scale out. Remember, though, that scaling up or out only maintains the performance of the system. Scalability, wonderful scalability. Is this the most appropriate definition?

A scaled-up system is basically the same system moved onto newer, more powerful hardware, much as an umbrella with a new canopy and a new handle is still an umbrella. Once the new machine is ready, you back up your tables and applications, restore them there, and formally go live. The impact on existing code and on the organization of the system is minimal. The road to scaling up is not strewn with roses, however; it has its shortcomings and points that deserve attention. First, and most important, a scaled-up system carries a defect that can eventually defeat it: sooner or later it runs into a hardware limit. Scaling up increases the server's processing power by using a more capable computer, and the growth in processing power of a single machine has a physical ceiling. We may not be able to foresee it, but one day that value will be reached. Second, getting close to that ceiling costs a great deal: time (beyond a certain point it can take years for technology to deliver the next performance step), money, and, last but not least, power consumption and office space. That said, because its impact on the existing structure is so limited, scaling up is a relatively reasonable first choice.

Scaling out means increasing the overall processing power of the system, as opposed to increasing the power of the individual components of a server. A scaled-out system is essentially modular and consists of a cluster of computers; growing such a system means adding one or more additional machines to it. In a scaled-out, highly partitioned environment, the processing power you rely on should be thought of more abstractly, without depending on particular hardware.
The total processing power is the sum of the actual speed of each computer, and each computer is tuned by partitioning data and applications within its node. Seen this way, the growth of the system appears to have no limit, which is undoubtedly a good starting point. However, scaling out involves a great deal of redesign and re-implementation. A system built on the assumption of a single server must be rethought and reorganized to meet scale-out requirements. You must decide how to partition the data across multiple DBMS servers. The paths that lead applications to the data must be carefully optimized through appropriate execution plans. Throughout the system's life cycle, user activity should be analyzed regularly, which is critical to keeping the system tuned. Once all of this is done, you have a virtually unlimited system to which you can keep adding processing resources, in both the middle tier and the data tier, to meet growing numbers of users and a growing workload.
Note that for a scalable system the key question is not how large the number of users is expected to get. What really counts is how fast that number is expected to grow; relative growth matters much more than the absolute quantity. Building a system for one hundred users whose numbers will keep increasing over time is more complicated than building one for a relatively stable billion users. That is why the demand for scalability weighs so heavily on e-commerce and Web applications.

SQL Server 2000 can be scaled up relatively easily, and at relatively low cost when the data and the number of users are modest. From a software point of view, though, scaling out is more interesting and more challenging. You cannot scale out sensibly without help from existing software on the back end. The cluster model you find in COM+ and Windows® 2000 is a scale-out model. Every server in the business tier holds an identical copy of the COM+ components, and the Windows 2000 load-balancing service running in the background is responsible for dispatching each new request to a component according to the current workload. From the application's point of view you see a single entity, a set of COM+ components. You code against these components without worrying about how many different servers run them, and you can ignore the role of the underlying load balancer. Scaling out here is as simple as adding a newly configured Windows 2000 server to the cluster. In this cluster model two distinct entities work together: the application components and the system's load balancer.

This model is not easy to apply to the data tier. There, in fact, you have only one software entity: the DBMS. SQL Server 2000 supports another cluster model, known as federated servers. A federation lets applications see a group of servers, each running its own instance of SQL Server. All these SQL Server instances are administered independently, with different tables and even different configuration settings. The main workload of a DBMS is the data itself, and it is primarily the application's responsibility to partition that data across the servers. SQL Server 2000 provides built-in features that support updatable views distributed over multiple servers. You decide how to spread a table across several SQL Server instances on the network, and you bind the data back together whenever you need it. You accomplish this through partitioned views, which enjoy special support in the SQL Server 2000 run time.

Partitioned views

A partitioned view is applied to a federated table, that is, a key table spread across two or more servers. The federated table is created by a process called horizontal partitioning, which splits a given table into a number of smaller member tables. From the application's perspective, the federated table looks like a single view. To balance the workload, the member tables are placed on different machines. Their format is the same as the original table's, but each one contains only a portion of the rows. The member tables can take any name, but to increase location transparency it is recommended that they be given the name of the original table. The member tables are typically designed so that each holds a portion of the data of roughly the same size, and the split is driven by the values of a partitioning column. To guarantee integrity and consistency you must make sure that no record can be duplicated across member tables; to enforce this, it is recommended that you define a CHECK constraint on each member table.
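To make horizontal partitioning concrete, here is a minimal sketch of what two member tables might look like. The column names, the key ranges, and the placement on Server1 and Server2 are hypothetical illustrations rather than details from the article; what matters is the pattern: one identically structured table per server, with a CHECK constraint on the partitioning column so that no key value can live on more than one server.

-- On Server1: member table holding the first range of the partitioning column
-- (columns and ranges are hypothetical, for illustration only)
CREATE TABLE MyTable
(
    CustomerID   INT          NOT NULL
                 CHECK (CustomerID BETWEEN 1 AND 32999),
    CompanyName  NVARCHAR(40) NOT NULL,
    City         NVARCHAR(30) NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY (CustomerID)
)

-- On Server2: identical structure, next range of the partitioning column
CREATE TABLE MyTable
(
    CustomerID   INT          NOT NULL
                 CHECK (CustomerID BETWEEN 33000 AND 65999),
    CompanyName  NVARCHAR(40) NOT NULL,
    City         NVARCHAR(30) NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY (CustomerID)
)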
The partitioning column cannot allow nulls and cannot be a computed column. The partitioned view itself is an ordinary view made of one SELECT statement per server, each against the same table, with the data glued back together through UNION ALL clauses:

CREATE VIEW MyView
AS
    SELECT * FROM Server1.Database.Owner.MyTable
    UNION ALL
    SELECT * FROM Server2.Database.Owner.MyTable
    UNION ALL
    SELECT * FROM Server3.Database.Owner.MyTable
    UNION ALL
    SELECT * FROM Server4.Database.Owner.MyTable
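Once the view is in place, applications query MyView as if it were the original, undivided table. Continuing the hypothetical CustomerID example from the member-table sketch above (the column name and value are illustrative), the CHECK constraints let SQL Server 2000 route a request like this to the single member server whose range covers the key:

SELECT CompanyName, City
FROM MyView
WHERE CustomerID = 41234   -- only the member table whose CHECK range covers 41234 needs to be read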
The partitioned view must be created on each of the servers involved, and each server must be visible to the others as a linked server. It is also advisable to set the lazy schema validation option of the linked servers to true. This option determines whether the schema of the remote linked tables is checked before a query runs; when it is set to true, SQL Server skips the check. Skipping the check improves performance, and in this particular scenario being lazy has no noticeable side effects.

Partitioned views were originally introduced with SQL Server 7.0. With SQL Server 2000, however, they received some important improvements that make them a key tool for scaling out a system. In SQL Server 2000 partitioned views are updatable and are treated specially by the query optimizer, which minimizes the need for cross-server processing. The main advantage of federated tables is that they balance the workload among the servers, which is undoubtedly an advantage as long as every server completes the tasks assigned to it.
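The linked-server plumbing described above can be scripted with the sp_addlinkedserver and sp_serveroption system procedures. The sketch below shows the idea from one member server's point of view; Server2 is a hypothetical member server name, and the same setup would be repeated on every server in the federation so that each one can see the others.

-- Run on Server1: make the (hypothetical) member server Server2 visible as a linked server
EXEC sp_addlinkedserver
    @server = 'Server2',
    @srvproduct = 'SQL Server'

-- Skip remote schema checking on queries that travel over this link
EXEC sp_serveroption 'Server2', 'lazy schema validation', 'true'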
A partitioned view is automatically updatable when the following conditions are met. The view is the result of individual SELECT statements merged with UNION ALL clauses. Each SELECT statement works on a single table (that is, no joins are allowed), and that table can be either a local table or a linked table. The member tables do not contain timestamp columns. A linked table must be referenced in one of these ways: by its fully qualified name (server, database, owner, table name), or through the OPENROWSET or OPENDATASOURCE functions.

INSERT, UPDATE, and DELETE statements issued against an updatable partitioned view must in turn satisfy a series of restrictions in order to work. For example, a new row cannot be inserted if a member table contains an identity column, and you must specify values for all columns, including columns with DEFAULT constraints. Updates and deletions are not allowed if the view is self-joined or joined with any of its member tables.

Scaling out in practice

The cluster model delivered with SQL Server 2000 is not for everybody. It is designed for high-end OLTP enterprise systems and for particularly busy Web applications. To be effective it requires that the data be partitioned, and the partitioning must follow the logical architecture of the data: all related data must sit on the same server, and the data must lend itself to being split logically. A deep understanding of the data is absolutely necessary. Furthermore, the shape of the data should not change much over time; if you know it will change, you should anticipate its future shape and take it into account when planning the partitions. Storing related data on the same node must be feasible, otherwise you quickly lose whatever a carefully designed load-balancing strategy gained you.

Even if you are very skilled and very lucky, once the data has been partitioned you are only halfway there. You still have to actually move the data onto the chosen cluster and to arrange backup and monitoring solutions. Compared with scaling up, scaling out is harder to implement and more troublesome to run. The design issues are compounded by some very practical obstacles, such as the lack of specific tools for administering a scaled-out cluster as a single entity. Some of these tools are expected to become available in the next version of SQL Server, code-named Yukon. Scaling out looks very exciting, but in most cases staying on a single server is still the safest bet.

Scale up or scale out?

Third-generation Web services place strict requirements on hardware, and the question you have to answer, in principle, is: scale up or scale out? Scaling up rides the growing power of a single hardware system; scaling out turns your system into a growing collection of smaller systems connected to one another. A scaled-up system is more exposed to failure, is intrinsically less reliable, cannot be upgraded beyond a certain threshold, and is expensive in terms of power consumption, space, and price. On the other hand, growing a scaled-up system is as simple as a backup and a restore. Managing tens or hundreds of machines is always more work than managing one, but a scaled-out system is intrinsically more reliable and more scalable, and its overall cost is lower. The pros and cons of the two approaches are relative, though, and depend entirely on the specific project: whether to scale up or scale out depends on the intrinsic characteristics of the system. Scaling up is the preferred solution as long as all access to your data can go through a single SQL Server instance (possibly running on its own multiprocessor machine).
Incidentally, this is the scalability model used on the Web. The Web's model, however, addresses horizontal scalability from only one side. If you need to handle large volumes of concurrent transactional data (OLTP systems, for example), you may need several servers working together, and in that case the data must be reorganized appropriately. Such a huge amount of work cannot be completed in a short time. Before starting such a project, make sure your system really has to cope with that kind of massive workload.
That is, make sure it really is going to serve as an OLTP system. As a general rule, even though scaling out looks more promising, always consider scaling up first and turn to scaling out only when the reasons for it are compelling.

Scalability is here to stay, whatever the theory and the technology of scaling up and scaling out, and one point deserves attention: scalability is related to computational complexity, and only indirectly to performance. Only once the computational complexity of the task and its implementation have been tuned as well as they can be does it make sense to grow through hardware or through special database services. In my experience, however, while you improve your algorithms, optimize query execution plans, and find and eliminate bottlenecks, faster hardware is always on its way, and it too will help you finish your task on time.

Dialogue: Arguing around ADO

Someone once asked me what the difference really is between ADO and ADO.NET. I told him that ADO.NET is another data access layer, one whose advantages he would not fully appreciate until he had used it. He later let me know that his ADO code kept working normally under .NET, and that his confidence in rock-solid ADO was intact.

I admit that, at first glance, it is easy to think of ADO.NET as "just another data access layer". Look at it carefully, though, and it is not. ADO.NET is simply the recommended way to handle data in applications running under .NET. In recent years I have watched people move to ADO reluctantly, giving up optimized and carefully tuned RDO code. Apart from exceptional results, they got every other feature they wanted, and yet poor ADO took all the blame. In the same way, I dare say, poor ADO.NET will be blamed for many a badly run migration project. No code is perfect, but system code is almost always better than your own. That said, an upgrade from RDO to ADO could legitimately be questioned: why trade good working code for little more than a move from ODBC to OLE DB and a more powerful object model?

With ADO.NET things are completely different. If you choose .NET as your server platform, ADO.NET is the natural choice: it provides the classes you are meant to use to work with data. Technically you can stick with ADO, but sooner or later you will pay a high price for crossing the boundary between .NET and COM. The key with ADO.NET is to start really coding with it; you can learn it and try it out in a controlled environment first. How many people did that before upgrading to ADO? For today's Web-driven world, ADO.NET is a more suitable model than ADO. At the same time, the ADO.NET object model preserves as many ADO concepts as possible, wherever they were appropriate and expressive enough. Move to ADO.NET now and you will find the learning process surprisingly simple and fast. Developers do not have to absorb many new concepts before they can use ADO.NET, but they do have to reset their ADO frame of mind and start thinking in terms of a different model. ADO.NET is anything but "just another data access layer". You do not really have to choose between ADO and ADO.NET; the main choice is deciding whether or not to adopt the .NET platform.