【Summary】
A large or medium-sized library information system involves many technologies and approaches; this paper focuses on those related to web server performance.
I was fortunate to take part in the design of a digital information system for a large library and in the development of its Web-based applications. Because most of the content flowing through the system consists of digital indexes, abstracts, full text, images, and audio or video, it places high demands on web server performance.
Drawing on practical engineering experience, this paper discusses how to improve web server performance from two angles: hardware measures (cache servers, load-balancing equipment, dual-machine web mirroring, CPU and network card upgrades, and network bandwidth expansion) and software measures (three-tier C/S software architecture design and application deployment), so that users can use the application system more quickly, efficiently, and securely.
【Main text】
With the development of Internet information technology, digitizing the library has become imperative if it is to better fulfill its functions of book circulation, data retrieval, and academic exchange. The library discussed here has launched part of its digital library project in order to join the ranks of the world's advanced libraries as soon as possible.
The digital library project mainly comprises an external information Web publishing system, an interactive retrieval network, a back-end collection information management system, multimedia data acquisition and production, and a video-on-demand (VOD) system. I was fortunate to be one of the project leaders: I took part in the overall design of the entire digital information system and in the development of several Web-based applications (such as the external information publishing system, the image/full-text hybrid retrieval system, and the VOD system).
From a network perspective, the library's digital information system is divided into several network segments: (1) the Internet access segment, which uses a 2 Mbit/s DDN leased line; (2) the public network segment (the demilitarized zone, DMZ), which mainly contains the front-end publishing database server, web server, e-mail/FTP/DNS servers, retrieval server, and SAN (storage area network) storage devices; (3) the internal local area network, which includes an intranet web server, the back-end library database server, an OA server, and so on; and (4) the VOD network, which includes the audio and video on-demand servers. Strict network-level and application-level access policies were defined, and access is effectively controlled through high-performance switches with Layer 3 switching capability and a security authorization and authentication system, which safeguards data security and integrity. With product quality and future maintenance and management in mind, the operating system is Windows NT, the servers are from Dell's high-end series, and the database is IBM DB2. The backbone is fast switched Gigabit Ethernet, the local area network provides 100 Mbit/s to the desktop, and the VOD is overcast.
In this network environment, the applications fall into three groups: (1) the external Web publishing system and the external book search assistance system; (2) the back-end collection information management system and the image/full-text hybrid retrieval system; and (3) the VOD system. Because the vast majority of these use a Browser/Server architecture, end users only need a web browser such as IE or Netscape installed locally; they request and access all application services through web pages, supported by the back-end database servers. In addition, because the content flowing through the library information system is multimedia information such as indexes, abstracts, full text, and audio or video, the system places high demands on web server performance and network bandwidth.
Through continued testing and practice, we found that web server performance can be improved fairly effectively in the following ways:
(1) Using cache servers and load-balancing devices can relieve access bottlenecks, make better use of network bandwidth, and balance load.
A cache server (Cache server) stores static content such as web pages, multimedia on-demand resources, and live conference streams (compressed and in a required format). In addition, the current US-made Cashflow cache server can also cache dynamic content such as database results and ASP pages. The cache server is typically placed outside the firewall, in front of the external web server, so that when Internet users click on a web page they no longer access the site's web server directly but reach the cache server instead. Because the cache server has multiple CPUs, high-speed, high-capacity I/O channels, and its own dedicated operating system, it can greatly relieve the Internet access bottleneck and also offers some ability to withstand attacks.
The library currently uses this approach: large numbers of images, on-demand resources, virtual 3D applications, and similar content are placed on the cache server, so that even with only 2 Mbit/s of Internet access bandwidth, the playback speed and quality of these applications still satisfy users.
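The idea of answering repeated requests from a cache placed in front of the web server can be illustrated with a minimal Python sketch. It is not the cache appliance described above; the class name `CachingFrontEnd`, the `fetch_from_origin` callback, and the 60-second time-to-live are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingFrontEnd:
    """Toy in-memory cache standing in front of an origin web server.

    Hypothetical illustration only: a real cache server also handles
    HTTP cache headers, eviction, and storage on disk.
    """

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin  # called only on a cache miss
        self._ttl = ttl_seconds          # how long an entry stays fresh
        self._clock = clock              # injectable so tests can control time
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh copy available: the origin server is not contacted.
            self.hits += 1
            return entry[1]
        # Missing or expired: fetch from the origin and remember the result.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = (now, body)
        return body
```

In this sketch, each request served as a hit never reaches the origin web server or the link behind the cache, which is the effect the cache server relies on for static images and on-demand resources.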
The other approach relies on load-balancing devices or dual-machine web mirroring, and aims to achieve the best Web access performance by balancing load. Dual-machine web mirroring is the earlier approach. Although it improves system reliability, the two machines constantly poll each other, which costs some access performance. A load-balancing device is hardware independent of the web servers; it is connected to the same switch as the web servers and the other servers. Through its load scheduling program, it makes full use of resources and improves access performance. However, because the library currently publishes a relatively small volume of resources and uses only three web servers, the load-balancing equipment does not yet make a significant difference.
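The text does not say which scheduling policy the load-balancing device uses. As one common possibility, the following Python sketch shows a least-connections scheduler over three web servers; the names `Backend` and `LeastConnectionsScheduler` and the server names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """One web server behind the load balancer (hypothetical model)."""
    name: str
    active: int = 0  # requests currently being served


class LeastConnectionsScheduler:
    """Sends each new request to the backend with the fewest active requests."""

    def __init__(self, backends: List[Backend]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = backends

    def acquire(self) -> Backend:
        # min() returns the first backend with the lowest count,
        # so ties are resolved in list order.
        chosen = min(self._backends, key=lambda b: b.active)
        chosen.active += 1
        return chosen

    def release(self, backend: Backend) -> None:
        # Call when the request finishes so the count stays accurate.
        if backend.active > 0:
            backend.active -= 1


if __name__ == "__main__":
    servers = [Backend("web1"), Backend("web2"), Backend("web3")]
    scheduler = LeastConnectionsScheduler(servers)
    for _ in range(4):
        print(scheduler.acquire().name)
```

With only three lightly loaded servers, as in the project described, any reasonable policy spreads requests almost evenly, which is consistent with the observation that the equipment does not yet make a large difference.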
(2) The configuration of the web server also affects its performance: the hardware of the web server itself, such as the number of CPUs and the number of network cards, and the placement of the web server relative to the firewall.
In terms of the web server hardware itself, adding CPUs, adding network cards, and expanding the I/O channels will undoubtedly improve web server performance. In addition, because Gigabit firewalls are still uncommon and expensive, placing the web server behind a firewall would greatly reduce Internet access performance. The library therefore adopts a layered security model, IDS (intrusion detection) → web server (server firewall, lower end, no traffic) → application server → database server (firewall, high end), which both safeguards system security and improves network access performance.
In addition, the library uses SAN (storage area network) storage to improve server access speed.
(3) A three-tier C/S software architecture and appropriate deployment of applications will also improve web server performance.
Separating the general access interface, the business logic, and the data, placing them on the web server, the application server, and the database server respectively, and deploying program functions and logic sensibly can also greatly improve web server performance.
The general principle is that the web server should only accept HTTP requests from the Internet and carry as little work as possible: it hands the actual processing to the appropriate application server process and then returns the result to the browser. The library developed its search engine application server and hybrid retrieval application in this way and achieved good results.
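The principle of a thin web tier can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's actual code: `WebTier`, `search_app_server`, and the `/search` path are hypothetical, and the application server is simulated by a local function instead of a separate process.

```python
from typing import Callable, Dict, Tuple

# An application-server handler takes request parameters and returns a body.
Handler = Callable[[Dict[str, str]], str]


class WebTier:
    """Thin web layer: it only routes requests and relays the results."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, path: str, handler: Handler) -> None:
        # Associate a URL path with the application server that owns the logic.
        self._routes[path] = handler

    def handle(self, path: str, params: Dict[str, str]) -> Tuple[int, str]:
        handler = self._routes.get(path)
        if handler is None:
            return 404, "not found"
        # No business logic here: the work is delegated to the application tier.
        return 200, handler(params)


def search_app_server(params: Dict[str, str]) -> str:
    """Stand-in for a search engine application server (hypothetical).

    In a real deployment this would run on a separate machine and query
    the database tier; here it only formats a placeholder response.
    """
    keyword = params.get("q", "")
    return f"results for '{keyword}'"


if __name__ == "__main__":
    web = WebTier()
    web.register("/search", search_app_server)
    print(web.handle("/search", {"q": "history"}))
```

Keeping the routing layer free of business logic means that the heavy retrieval work can be moved to or scaled on the application servers without changing the web server.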
In fact, there are many other ways to improve web server performance, such as the relationship between CPU and storage, and Web switches, which need further practice, analysis, and discussion. (This article mainly draws on the papers of Shanghai Tongyin et al.)
Comment: The theme is clear and the structure is well organized. However, the techniques discussed should be integrated more closely with examples from the project.