Reducing Network Traffic with Web Caching
Server Web caches speed access to Web pages and ease network traffic
Rawn Shah, independent technologist and freelance journalist
Find out three ways to use Web caching to route Web traffic to your site more efficiently. Whether you run an extranet, intranet, or Internet site, Web caching can give you more control of your resources. Learn what hardware you need and what caching software to consider. For details on configuring the Squid cache proxy, with code and parameter examples, see the companion article "Setting Up a Cache Server."
Like the mass transit systems that move groups of people between popular destinations, Web caching systems do the same for URL requests for popular Web sites. You can use Web caches to put users on the express track to their destinations. Web caching stores local copies of popular Web pages so users can access them faster. A cache aggregates all the individual requests for a Web page and sends a single request as their proxy to the origin site, as the requested Web site is called. (But do not confuse a Web cache with a proxy server. The latter serves as an intermediary to place a firewall between network users and the outside world. A proxy server makes your outgoing network connection more secure, but it does little to reduce network traffic.) When the cache receives its copy of the contents, it makes further copies and passes them on to the requesting users.

Caching can help reduce the load on a Web server by reducing the number of incoming requests; browsers retrieve portions of data from the cache rather than directly from the server. However, most Web content providers cannot know or control which users, or how many, arrive at their site. The cache server therefore needs to sit near the user's end, rather than near the Web servers. (Web load-balancing schemes distribute the incoming load across multiple servers at the Web content provider's end, but that's a whole other story.)

The most obvious beneficiary of Web caching is the user, who avoids some traffic snarls when browsing. The network administrator and the remote Web site also reap benefits. According to the National Laboratory for Applied Network Research (NLANR), large caches with lots of clients may field as many as 50% of the hits that would otherwise travel through the network individually to the origin site. A typical cache easily fields about 30% of the intended hits, according to the NLANR's 1996 research. Thus, statistically speaking, a Web cache could eliminate at least 30% of the Web traffic that would normally go out over a WAN line. If you're paying dollars for megabytes, Web caching can save you considerable sums in a relatively short time. Even if you have a flat-rate WAN connection, caching improves customer satisfaction, because it speeds access for all. PCs have memory caches for code that's called often, and most browser programs have local caches that store recently surfed Web pages either in memory or on disk. A Web cache also stores frequently accessed Web pages, but it operates on a grander scale.

Caching on a grand scale

Global cache projects aim to reduce Internet traffic jams. On the global scale of the Internet, Web caches can lighten the overall burden of traffic through the numerous high-speed links between top-tier Internet service providers. By providing a cache hierarchy that maps to the provider's network topology, ISPs can create regional cache areas supported by groups of cache servers. Each region's cache contains data from Web sites in other regions. Thus, a programmer in California who wants to look up a page located in France would automatically be directed to the cache server in California that holds a copy of the needed page, rather than pulling it straight down from halfway across the world.

Because of the public benefit of reducing Internet traffic, the National Science Foundation (NSF) has supported research projects that enable large-scale caching systems. One such project from the NLANR is currently investigating a multilevel cache system based on the national supercomputing centers located in public institutions and universities around the United States. It began when a study back in 1993 suggested that several strategically placed FTP servers could reduce the overall traffic on the then-NSFNet backbone by 44 percent. Internet topology has changed significantly since 1993, but the basic precepts still hold true. In fact, the IRCACHE project from the NLANR has cache servers located near or at the recommended central exchange points. In the United Kingdom, a similar project is part of the Joint Academic Network (JANET), a national academic network service, and the next-generation SuperJANET system. This national cache service is also available for public use or cache peering arrangements. Both the IRCACHE and JANET caches offer open participation, letting the public combine their cache servers with the projects' distributed systems. This gives you the benefit of a global cache system that augments your own, which in turn speeds access for your users. For more details on participating, visit the sites listed in Resources.
Webrouting with cache servers

In addition to reducing outgoing traffic by bundling duplicate requests from browsers, Web caches act like custom-dispatched express trains to solve the problem of Webrouting: how to send Web traffic efficiently over a network. While Internet Protocol routing handles low-level traffic direction of individual IP packets irrespective of the data contents, Webrouting directs application-specific HTTP traffic across the network. Because Web traffic constitutes most of all Internet traffic, improving Webrouting can improve the overall performance of the Internet. Webrouting depends upon IP routing because Web traffic flows only along the paths defined as IP routes. However, a single Web flow can change from server to server as it is redirected by different Web routers. A Web server can use the Redirect command of the HTTP protocol to send Web requests to other servers for processing. Web caches themselves redirect client and server traffic locally or to other caches to provide faster access to pages. Finally, load-balancing devices for Web servers can redirect incoming client requests to a group of servers in the same location or in other network locations to distribute the incoming requests evenly among the servers. You can think of all these devices as Webrouters directing HTTP traffic. The process of Webrouting with cache servers begins after the Web request leaves the client browser workstation:
The cache server receives the request in one of three ways: the request can be sent directly to the server; the server can actively monitor network traffic and pick out requests from the flow; or other network devices can pick out the traffic and send it to the cache server. Then the cache resolves the Web request: it has to determine whether the requested page is stored within its cache database. If not, it checks its partner cache servers, if any, for the requested data. Finally, the cache server returns the data to the client, either from its own database, from a partner's database, or from the original Web server. Just as public transit systems use buses, trains, trolleys, shuttles, taxis, and ferries, this three-step receive-process-return process has been implemented in various forms.

Receiving the Web request

The most basic method for diverting requests to a cache is to configure the browser to point to the cache as its proxy server, an option on most popular browsers. The client browser then sends a request for a URL directly to the cache server to retrieve a document. This method ensures that the cache does the greatest possible amount of request processing: every request goes through the cache server. One downside of this method is that you cannot always control whether the browser uses a proxy; thus, clever users who understand that this is a typical configuration option may try to bypass the proxy. Another downside: when you have hundreds or thousands of desktops and Web browsers to configure, this method can turn into a management headache.

Transparent proxy caching also diverts all traffic to the cache server. A cache server sits directly on the data path between the clients and the remote Web sites and intercepts all outgoing requests. The cache examines every packet of data to look for Web requests, so in essence it serves as an advanced form of packet filter. External packet filters and IP Layer-4 and Layer-7 switches can also handle and route client requests. These devices examine the packets that are going out of the network to identify Web requests and redirect them to the cache server. A packet filter can examine any or all of the contents of a packet and, based upon some predefined policy, redirect the traffic appropriately. A Layer-4 switch redirects TCP or UDP traffic to an appropriate destination; because all HTTP traffic is TCP-based, most such traffic is passed on first. At the application layer of the ISO stack, a Layer-7 switch looks only for application-specific protocols, such as HTTP, to direct to appropriate destinations.
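The difference between the two interception styles can be sketched with plain data. The snippet below is only an illustration, using tuples of (destination port, payload) as stand-in "packets" rather than a real packet filter:

```python
# Stand-in redirect decisions for a Layer-4 switch and a Layer-7 switch.
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ")

def layer4_redirect(dst_port, payload):
    # A Layer-4 device sees only TCP/UDP headers: it redirects all
    # port-80 traffic to the cache, whatever the payload contains.
    return dst_port == 80

def layer7_redirect(dst_port, payload):
    # A Layer-7 device inspects the application data and redirects only
    # traffic that actually looks like an HTTP request, on any port.
    return payload.startswith(HTTP_METHODS)
```

Note the tradeoff this implies: the Layer-4 rule is cheap but redirects non-HTTP traffic that happens to use port 80, while the Layer-7 rule must touch every payload, which is exactly why Layer-7 devices need more processing power.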
Comparing methods for handling requests

Configuring every client Web browser can be a tedious task; transparent proxy caches are more practical for deployment on large networks or in organizations without strict control of the network. For example, an ISP can use transparent proxy caches for its dial-up modem clients without their ever knowing about it. Such a cache server would have to sit closest to the outgoing WAN connection to provide the maximum benefit. Transparent proxy caches work much more slowly, however, because the cache server has to process every single IP packet that goes through the network to look for Web packets. Thus transparent proxy caches require the fastest processors and fast dual network links. Using external packet filters or layer-specific switches optimizes the function of each device. In fact, some implementations have their own protocols that monitor the activity of multiple caches for load-balancing purposes.
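The explicitly configured proxy client mentioned above is visible in the HTTP request line itself: a browser pointed at a proxy cache sends the absolute URI, so the cache knows which origin site to contact, while a direct request to the origin server carries only the path. A minimal sketch (the helper name is illustrative, not from the article):

```python
from urllib.parse import urlsplit

def build_request_line(url, via_proxy):
    # A proxy-configured browser sends "GET http://host/path ...";
    # a browser talking straight to the origin sends "GET /path ...".
    parts = urlsplit(url)
    target = url if via_proxy else (parts.path or "/")
    return "GET %s HTTP/1.0" % target
```

This is also why a transparent cache has more work to do: an intercepted request carries only the path, so the cache must recover the origin host from elsewhere (in HTTP/1.0 deployments, typically from the destination IP address of the intercepted packet).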
Processing the Web request

Once the cache server receives a Web request, it checks its database to see if it has the contents of the requested page stored somewhere. Web caching originally began as a single-server system that contained all the data of the cache. Although that's effective, cache servers tend to grow large. A single server runs out of disk space to store the requested pages or cannot process the incoming requests fast enough. Eventually single-server schemes gave way to distributed cache servers working hierarchically, in parallel, or both. These servers balance among themselves the amount of cached information they contain, placing the most commonly requested data at the top of their hierarchy for the most people to see and the least commonly requested data at the bottom, closer to the specific users who need it.

Some cache server software works as extensions to existing Web server products. In such a case, there's no point in logging Web access entries, so the administrator should either disable logging or limit it to the server log file. A cache contains continuously changing information, and unless you know what each cached entry contains (which actual Web site it maps to), you will not know where the client was going. Cache logs may also get fairly large, because all your users will be contributing to them. A cache log can consume disk space as quickly as third-graders do candy.

Single-level caching

A cache server is essentially a proxy Web client that stores a lot of pages locally. The server responds to requests by sending along the requested Web page if it's available. A successful retrieval from the local cache is called a cache hit, and an unsuccessful one is called a cache miss. On a miss, the server begins its own access to the requested URL. Such a first-time access to a page forces the cache server to contact the origin Web server that hosts the page. The cache server checks to see if the page can be cached, retrieves the data to cache locally, and, at the same time, passes the contents through to the client. The user may never realize that the cache sits between the client and server except in special circumstances. A single cache server is the cheapest solution for improving Webrouting, but its effectiveness is limited by the capacity of the server. By combining a firewall, an IP router, and a cache together, vendors have created single-box solutions that work well for small office intranets. To go even cheaper, you can build a device with similar capabilities using a PC, the Linux operating system, and publicly available open-source software.

Parallel and load-balanced caching

A single cache server can handle only so many requests at a time, and even pumping up the machine with memory, disk space, and processors takes its capacity only so far. A better way to handle high-volume requests is to keep several cache servers running in parallel, handling requests from the same clients or different groups of clients. These parallel cache servers usually contain identical data and communicate changes among themselves. An enhancement to the parallel-server method involves adding a load-balancing device that distributes incoming requests among the parallel caches.
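The hit/miss behavior of a single-level cache described above can be sketched in a few lines. This is only a sketch, assuming a caller-supplied fetch_from_origin callable standing in for the real network access:

```python
# Minimal fetch-through cache: serve hits locally, go to the origin
# site on a miss and keep a copy for the next requester.
class SingleLevelCache:
    def __init__(self, fetch_from_origin):
        self.store = {}                  # URL -> page contents
        self.hits = self.misses = 0
        self.fetch_from_origin = fetch_from_origin

    def get(self, url):
        if url in self.store:            # cache hit
            self.hits += 1
            return self.store[url]
        self.misses += 1                 # cache miss: contact origin
        page = self.fetch_from_origin(url)
        self.store[url] = page           # cache locally, pass through
        return page
```

A real cache server must also honor cachability (some pages cannot be stored) and expire stale entries, which this sketch omits.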
Multilevel caching

A multilevel cache spreads the cached data contents across several servers in the network. The top-level caching server holds the most commonly accessed pages, and the lowest-level caching server holds the least commonly accessed pages. The various levels combine in a network of cache servers called a Web caching mesh. The caches communicate among themselves, using HTTP and special cache-coordination protocols, to divide the contents appropriately and maintain consistency among the servers. Multilevel caching works almost the same as caching with single cache servers. However, if there is a cache miss at one server level, the request is propagated up to the next higher level to see if that cache contains the data. Only when the request reaches the top level and still encounters a cache miss will the cache server go directly to the origin Web site to retrieve the data. (You can customize this configuration of multilevel caching. Typically a request looks at the nearest cache server before going up the chain to the top-level server, which might be several hops away.) Multilevel cache systems work very well for a very large number of clients (in the 10,000s or 100,000s) accessing the system. Furthermore, if your many clients are spread widely across a WAN or the Internet, it's an even better solution.

Returning the Web request

Returning the results of a cache is currently still a simple process. Basically, the cache that contains the requested data examines the request packet and sends the data back to the requesting client.
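The miss-propagation behavior of a multilevel cache described above can be sketched as a walk up the levels. Again only an illustration, with each level modeled as a dict and the origin as a callable:

```python
# Resolve a URL against a chain of cache levels, nearest first.
def resolve(url, levels, fetch_from_origin):
    """levels[0] is the nearest cache; levels[-1] is the top level."""
    for cache in levels:
        if url in cache:             # hit at some level: serve from it
            return cache[url]
    page = fetch_from_origin(url)    # miss everywhere: go to the origin
    levels[0][url] = page            # keep a copy near the requester
    return page
```

Real meshes add refinements this sketch skips, such as copying hot pages toward the top level and querying sibling caches (via ICP or HTCP) before climbing the hierarchy.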
Choosing protocols and options for multiple servers

Coordinating the contents of a cache among multiple servers is a challenge. As soon as you add a second cache server to the system, you encounter this problem: how do you maintain consistency among the multiple servers that should contain identical data? If you add multiple levels of cache servers, you have to ask two other questions: how do you know what the other caches contain, and how do you redirect the request to the appropriate cache? This is where cache protocols come in. There are three main types:
Query protocols send messages to other caches in a multilevel system to discover if they contain the needed data.

Redirect protocols forward the client request to the cache server in the multilevel system that contains the needed data.

Multicast protocols combine query and redirect protocols using multicast network communications.

Multicast cache protocols work in concert with all cache servers at the same time. Multicasting is the ability to create a virtual network of computers in which every member can communicate directly with every other member at the same time. Multicasting is a function of the IP network protocol, with the help of special multicast routers and protocol stacks. With such cache protocols, a cache server can query all the other servers at the same time to find out if they contain the needed data. In addition, a client request sent to such a multicast group is automatically sent to all members for redirection. Within the group, one of the cache servers recognizes the requested URL as within its domain of responsibility and sends the data appropriately. The problem with multicast protocols is that they are still not very popular. What's more, multicasting over the current Internet Protocol is not really efficient, because the Internet is connected by a mass of single point-to-point, or unicast, links, which defeats the purpose of multicasting. Still, the software methods exist, and within intranets it is possible to set them up. The next generation of the Internet Protocol, called IPv6, allows real multicasting to take place, but it will be some time before it's widely implemented.

Setting protocol options for cache servers

There are four options for caching protocols:
The Internet Cache Protocol (ICP) is the first cache query protocol documented as an informational standard by the Internet Engineering Task Force. It was developed during research conducted in 1996 by the Harvest project, one of the early Web-caching projects. In a multilevel cache, ICP sends queries between the cache servers to check for specific URLs in other caches in the mesh. Unfortunately, ICP becomes inefficient beyond a certain number of distributed cache servers. If you are setting up one or two caches, this limitation of ICP does not pose a problem. On the other hand, if you're setting up a large multilevel cache with more than ten servers, ICP caches will spend too much of their time propagating changes and thus reduce efficiency. ICP also contains no real security to protect the communications between the cache servers.

The Hypertext Caching Protocol (HTCP) is a better query protocol that is used to discover cache servers on local networks and to inquire whether URLs are contained on those servers. It includes the HTTP headers from the original client request so that the cache server may process them, if necessary, as part of the request.

The Cache Array Routing Protocol (CARP) is a redirect protocol for a multilevel cache system. Each cache is programmed with a list of all the other cache servers in the system. The cache server uses a hash function that maps the URL to a given cache server. It then sends a CARP message to that cache server containing the original HTTP request to fulfill. Microsoft's Proxy Server implements CARP.

Cisco's proprietary Web Cache Control Protocol (WCCP) handles request redirection to a cache mesh from a router. One of the cache servers can send a WCCP message to the router to define the mapping between URLs and cache servers. The router processes outgoing packets and looks for HTTP traffic; it then uses a hash function to determine which cache server should process the URL in each request and redirects the traffic to that server with WCCP.

Selecting hardware for cache servers

Essentially, a cache server is a heavy-duty network file server. Unlike a proxy or firewall server, which can run on fairly low-powered machines (even 486 machines can work well as firewalls), a cache server needs processing power and speed. To be most effective, a cache server needs fast network connections to the internal LAN and the external WAN. Typically, plan for a cache storage capacity of several gigabytes on disk, as well as at least 128 MB of RAM, preferably gigabytes of RAM. By increasing the RAM storage, you directly increase the performance of the system, because direct accesses to physical memory work much faster than accesses to disk-stored caches. Also, a multiprocessor system, even one with slower CPUs, can perform better by handling more requests simultaneously. Cache server administrators recognize that RAM and disk storage are the most important performance factors. A Linux-based cache server running on a dual-processor 350 MHz Pentium II system with 512 MB of RAM, 25 GB of SCSI disk space, and dual 100 Mbps Ethernet connections (an estimated price between $2,500 and $5,000) should be able to handle one to two million requests a day, serving between 1,000 and 10,000 users. Typically, the cache server does not need any intervention from a sysadmin, so the best choice is a fairly stable, reliable platform that can run unattended. Both commercial and freeware cache software is available.

Cache server software products
Software | Vendor/Developer | Caching type | Platform
Apache Web Server caching module * | Apache Information Services | Single | AIX, BSD/OS, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NextStep, SunOS, Solaris, SCO Unix, Windows NT
BorderManager FastCache | Novell | Single, multilevel | NetWare
Cache Engine | Cisco | Single, multilevel, load-balancing | Custom hardware
CacheFlow Series | CacheFlow | Single, multilevel, load-balancing | Custom hardware
CacheRaq 2 | Cobalt Networks | Single | Custom hardware appliance
DeleGate * | MITI ETL | Single, multilevel, load-balancing | AIX, EWS4800, HP-UX, HI-UX, IRIX, NextStep, NEWS-OS, Digital UNIX, Solaris, SunOS, BSD/OS, FreeBSD, Linux, NetBSD, OpenBSD, Windows 95/NT, OS/2
HTTPD Proxy Cache * | CERN, World Wide Web Consortium | Single | AIX, BSD/OS, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NextStep, SunOS, Solaris, UnixWare
Internet Caching System | Novell | Single, multilevel, load-balancing | Custom hardware
Jigsaw caching proxy module * | World Wide Web Consortium | Single | Java
NetCache | Network Appliance | Single, multilevel, load-balancing | Custom hardware appliance
Netra Proxy Server | Sun Microsystems | Single, multilevel | Solaris, Custom hardware
Proxy Server | AOL/Netscape | Single, multilevel, load-balancing | AIX, HP-UX, IRIX, Solaris, Windows NT
Proxy Server | Microsoft | Single, multilevel, load-balancing | Windows NT
Squid * | NLANR | Single, multilevel, load-balancing | AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NextStep, SunOS, Solaris, SCO Unix, OS/2
Traffic Server | Inktomi | Single, multilevel, load-balancing | Digital UNIX, FreeBSD, IRIX, Solaris, Windows NT
WebSphere Performance Pack Cache Manager or Web Traffic Express | IBM | Single, multilevel, load-balancing | AIX, Linux, OS/400, Solaris, Windows NT

* Freeware or open source software
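The hash-based request routing that CARP and WCCP use, described in the protocols section, can be sketched deterministically: every member combines the URL with each cache's name and picks the highest score, so all members compute the same URL-to-server mapping with no query traffic. This only illustrates the idea; the actual CARP hash function is defined by its specification and differs from the MD5 stand-in below:

```python
import hashlib

def pick_cache(url, caches):
    # Highest-score-wins selection: deterministic, order-independent,
    # and stable for a given set of cache names.
    best, best_score = None, -1
    for cache in caches:
        digest = hashlib.md5((cache + url).encode()).hexdigest()
        score = int(digest, 16)
        if score > best_score:
            best, best_score = cache, score
    return best
```

A design note: because each URL's score is computed per cache, adding or removing one cache server remaps only the URLs that scored highest on that server, rather than reshuffling the whole array as a simple modulo hash would.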
Resources

Find more network statistics at the National Laboratory for Applied Network Research: http://www.nlanr.net

For info on the legal ramifications of using caches to access sites, read this synopsis of the Digital Millennium Copyright Act: http://www.arl.org/info/frn/copy/band.html

To learn how to set up your own cache server, read "Setting Up a Cache Server" here on developerWorks.
Download Squid: http://squid.nlanr.net

Explore the Squid FAQ: http://squid.nlanr.net/squid/faq/

Download the Jigsaw caching proxy module: http://www.w3.org/jigsaw/

Download DeleGate: http://wall.etl.go.jp/delegate/

Download HTTPD Proxy Cache from the World Wide Web Consortium CERN httpd server: http://www.w3.org/daemon/

Find out more about how to install Linux programs for turning Squid implementations into transparent proxy caches:
Transparent proxy caching with Squid: http://squid.nlanr.net/squid/faq/faq-17.html

The IP Filter package: http://cheops.anu.edu.au/~avalon/ip-filter.html

Two HOWTO documents on packet filtering on Linux: the HOWTO for ipchains: http://www.rustcorp.com/linux/ipchains/howto.html and the HOWTO for firewalls: http://penguin.spd.louisville.edu/ldp/howto/FireWall-Howto.html

If you're interested in participating in global caching systems, take a look at the following two links:
The NLANR IRCACHE project: http://www.ircache.net

The UK Joint Academic Network cache project: http://wwwcache.ja.net

For details about the cache server products described in this article, visit the links in the cache server software products table.
Rawn Shah is an independent technologist and freelance journalist based in Tucson, Arizona, covering topics of networking and cross-platform integration since 1993. He can be reached at rawn@rtd.com.
Reducing network traffic using web cache
Server web cache can accelerate access to web pages and reduce network traffic
Rawn Shah Independent Technical Experts and Freedom Journalists
Find three ways to use Web caches more efficiently send web traffic. Whether you run an external site point, the internal site or the Internet site, the web cache can make you better control resources. Understand which hardware you need and which cache software to consider. Please check the details of the configuration Squid cache proxy in the Sister "Setting Up A Cache Server", with code and parameter examples.
Just like the Volkswagen transportation system that ships passengers between popular destinations, the Web Cache system also processes the URL requests of popular websites. You can use the web cache to send users to their destination through the fast lane. The web cache stores local copies of popular web pages, so users can access these pages faster. When someone visits the requested website, the cache collects all individual requests for a web page and sends a single request to the starting site as these requests. (But don't confuse the Web cache and proxy server. The latter is used to place an anti-firewall between the network users and the outside world. The proxy server makes your export network connection more secure, but hardly reduce network traffic.) When the cache receives the content replica, it will further create a copy and pass them to the requested user. The cache web cache helps reduce the load of the web server because it reduces the number of input requests; the browser retrieves data from the cache, not directly from the server. However, most web content providers cannot know or control which users or how many users access their websites. The cache server needs to be close to the user, not close to the web server. (Web load balancing scheme will enter multiple servers on one end of the web content provider, but it is completely another code.) The most obvious beneficiary of the web cache is a user, because when he browses it, Some traffic chaos. Network administrators and remote websites can also benefit from it. According to National Laboratory For Applied Network Research (NLANR), a large-capacity cache with a large number of clients can override up to 50% clicks. If there is no cache, these click will access the starting site over the network. NLANR pointed out in the 1996 research report that a general cache is easily covered by about 30%. 
Thus, from the statistical perspective, the web cache can at least eliminate 30% of the web traffic, and if there is no cache, these web traffic will enter a wide area network. If you need to pay by bytes, the Web cache can save you a considerable fee in a relatively short period of time. Even wide-area network connections for the package price, the cache can improve customer satisfaction because it increases everyone's access speed. Personal Computers set up a memory super cache for frequently called, most browser programs have local caches, which stores recently viewed web pages in memory or disk. The web cache also stores frequent web pages, but it runs with a larger scale.
Large Scale Cache Global Cycling Projects are designed to reduce Internet traffic blocking from the global scale of the Internet, the Web cache can reduce the total traffic burden on a large number of high-speed links between the top Internet service providers. By providing a cache hierarchy of map vendor network tops, ISP can create regional cache supported by the Cache Server Group. Each area cache contains data from websites from other regions. In this way, if the programmer in California If you want to find a web page in France, he will be automatically oriented to a cache server in California (the server saves a copy of the web page) instead of directly from the Earth. Extract this page at one end. Since the public interest involving the reduction of Internet traffic, the National Science Foundation (NSF) has supported research projects that support large-scale cache systems. One project in this area is currently studying a multi-layer cache system, which is based on the Super Computing Center of the United States and universities. This project began in 1993. A study at the time showed that several FTP servers placed in critical places allowed the total flow on the THEN-NSFNET main line by 44%. Since 1993, the Internet Top Topology has undergone tremendous changes, but the basic concepts are still applicable. NLANR's Ircache project actually places a cache server in the proposed central interchange point or it. In the UK, a similar project is part of Joint Academic Network (Janet), Janet is a national research network service, which is the next generation superjanet system. This state-owned cache service can also be used in public use or cache. Ircache and Janet caches provide an opportunity to participate in the public, and the public can combine their cache servers with the distributed system of the project. This allows the Global Cache system to take advantage of your cache server, which in turn accelerates access to your users. 
For more information on participation, please visit the website listed in "Reference". Using the web of cache servers In addition to reducing export traffic by bundled browsers, the WEB cache is like a special train that is scheduled to solve the Web sending: How to efficiently Send web traffic. Although the Internet protocol sends a handling of a single IP packet low-layer traffic direction independent of the data content, the Web sends the direction to be oriented to the application specific HTTP traffic on the network. Because web traffic is the most important component in all Internet traffic, it improves the overall performance of the Internet to improve the integration of the Internet. The web sector depends on the IP route selection, because the web traffic is only along the path defined along the IP routing. However, when a single web stream is reordered by different web routers, it can turn from one server to another server. The web server can send the web request to other servers using the Redirect command of the HTTP protocol. The web cache itself redirects the client and server traffic to a local or other cache to provide faster access to the web page. Finally, the load balancing device of the web server can redirect the input client to a set of servers to the same location or other network location to even distribute the input request between these servers. You can view all of these devices as a Web router that is directed to HTTP traffic. The process of using the web that uses the cache server is started after the web request leaves the client browser workstation:
The cache server receives the following three methods: send the request directly to the server; the server actively monitors network traffic and picks the request from the stream; other network devices select traffic and send it to the cache server. Then the cache parses the web request. It must determine if the requested web page is stored in its cache database. If not, it will check its cache server partner (if any) to obtain the requested data. Finally, the cache server returns the data to the client, and the data may be both its own database, or the database of a partner server, may also come from the starting web server. Just as public transport systems use buses, trains, trams, taxi and ferriers, this "search-processing-return" three-step process is implemented in various forms. The most basic method for receiving the web request to the request to turn to the cache is to configure the browser to point to the cache server, which is an option for most common browsers as its proxy servers. The client browser then sends a request for a URL to the cache server to retrieve a document. This method guarantees that the cache completes as many requests as possible: each request passes the cache server. One disadvantage of this method is that you can't control whether the browser uses a proxy; so, knowing this is a smart user of a typical configuration option, you might try to bypass the agent. Another disadvantage is that when you have hundreds or thousands of desktops and web browsers to configure, this method may become a headache. Transparent agent cache also allows all traffic to the cache server. The cache server is directly on the data path between the client and the remote website, and all outsourced requests are intercepted. The cache checks each packet to find a web request, so it is inherent to a high-level form of database filtering. The transition between the external packet filter and the 4th and 7th layers of IP can also handle and send client requests. 
These devices examine outgoing packets to identify Web requests and redirect them to the cache server. A packet filter can examine any or all of the contents of a packet according to predefined policies and redirect traffic appropriately. At the transport layer, a Layer 4 switch redirects TCP or UDP streams to the appropriate destination; because all HTTP traffic runs over TCP, most of this traffic passes through it first. At the application layer of the ISO reference model, a Layer 7 switch looks only for application-specific protocols, such as HTTP, to direct to the appropriate destination.

Comparing the request-handling approaches
Configuring each client's Web browser can be a tedious task; a transparent proxy cache is more practical to deploy on a large network, or in an organization that does not keep strict control over its network. For example, an ISP can apply a transparent proxy cache to its cable modem customers without the users ever knowing. The cache server must sit as close as possible to the outgoing WAN link to provide the greatest benefit. However, a transparent proxy cache runs comparatively slowly, because the cache server must process every IP packet crossing the network in order to find the Web packets. A transparent proxy cache therefore requires the fastest processors and fast dual network links. Using an external packet filter or a Layer 4 or Layer 7 switch lets each device be optimized for its own function. In fact, some implementations have their own protocols to monitor the activity of multiple caches for load balancing.
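The difference between Layer 4 and Layer 7 redirection can be sketched in a few lines of Python. This is only an illustration of the decision each device makes (real switches operate on raw packets in hardware, and the function names here are invented for the example): a Layer 4 device looks only at transport-layer information such as the TCP destination port, while a Layer 7 device parses the application payload itself.

```python
# Illustrative sketch, not a real switch: Layer 4 vs. Layer 7
# classification of outgoing traffic destined for a Web cache.

def is_web_traffic_l4(dst_port: int) -> bool:
    """Layer 4: decide from transport-layer info alone (TCP port 80)."""
    return dst_port == 80

def is_web_traffic_l7(payload: bytes) -> bool:
    """Layer 7: parse the payload for an HTTP request line."""
    methods = (b"GET ", b"POST ", b"HEAD ")
    first_line = payload.split(b"\r\n", 1)[0]
    return payload.startswith(methods) and b" HTTP/" in first_line

# A Layer 4 switch would redirect this stream on the port number alone;
# a Layer 7 switch confirms it really is HTTP before redirecting.
dst_port, payload = 80, b"GET /index.html HTTP/1.0\r\nHost: example.com\r\n\r\n"
print(is_web_traffic_l4(dst_port))   # prints True
print(is_web_traffic_l7(payload))    # prints True
```

The Layer 4 check is cheap but blind to non-HTTP traffic on port 80; the Layer 7 check is accurate but must touch the payload, which is why Layer 7 devices need more processing power.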
Processing the Web request
Once the cache receives a Web request, it checks its own database to see whether it has the contents of the requested Web page stored somewhere. A Web cache usually starts out as a single-server system containing all of the cached data. Although this works, a cache server is destined to grow: a single server eventually runs out of disk space as it keeps storing requested pages, or can no longer answer incoming requests quickly enough. Single-server schemes therefore evolve into distributed cache servers, running hierarchically, in parallel, or both at once. These servers balance the amount of cached information each one holds, placing the most frequently requested data at the top of the hierarchy for the majority of users, and the least requested data in the lower layers, closer to the particular users who need it. Some cache server software is actually an extension of an existing Web server product. In that case, logging Web accesses is fairly meaningless, so the administrator should disable or restrict cache entries in the server's log file. The cache holds constantly changing information, so unless you know what each cache entry contains (which actual Web site it maps to), the log will not tell you where your clients have been going. Cache logs can also grow quite large, because all of your users contribute to them; they consume disk space about as fast as a third-grader eats candy.

Single-layer caches
A cache server is essentially a proxy Web client that stores a large number of Web pages locally. The server responds by sending the requested Web page, if it is available. A successful retrieval from the local cache is called a cache hit; an unsuccessful lookup is called a cache miss. On a miss, the server itself goes out to the requested URL. This first access to the page contacts the origin Web server that hosts it.
The cache server checks whether the page is cacheable, then retrieves the data into its local cache while passing the content along to the client. Except in unusual circumstances, users never notice that a cache sits between their clients and the servers. A single cache server is the cheapest way to improve Web delivery, but its effectiveness is limited by the server's capacity. By combining a firewall, an IP router, and a cache, vendors have created all-in-one solutions suitable for Internet access from a small office. More cheaply still, you can build a box with similar functions from a personal computer, the Linux operating system, and freely available open source software.

Parallel and load-balancing caches
A single cache server can handle only so many requests at a time; adding memory, disk space, and processors to the machine takes you only so far. A better way to handle a large number of requests is to run several cache servers in parallel, processing requests from the same clients or from different groups of clients. These parallel cache servers typically contain exactly the same data and communicate with one another. An enhancement of the parallel-server approach is to create a load-balancing system for the parallel servers: all the servers handle the same group of clients and balance the request load among themselves.

Multi-layer caches
A multi-layer cache distributes the cached data among several different servers in the network. The top-level cache servers hold the most frequently accessed Web pages, and the lowest-level cache servers hold the least accessed ones. The layers of cache servers are collectively called a Web cache network. The caches communicate with one another over HTTP and dedicated cache-coordination protocols to distribute content appropriately and maintain consistency among the servers. A multi-layer cache does almost exactly the same work as a single cache server.
However, on a cache miss at one layer of servers, the request propagates up to the next layer to see whether that cache holds the data. Only when the request reaches the top layer and the cache still misses does the cache server go directly to the origin site.
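The lookup order described above, local store first, then the parent layer, and the origin server only as a last resort, can be sketched as follows. This is a minimal illustration in Python; the class and function names are invented for the example, and a real cache would also honor HTTP cacheability headers, expiry, and storage limits.

```python
# Minimal sketch of hierarchical cache lookup: local store, then parent
# layer, then origin. Names are illustrative, not from any real product.

def fetch_from_origin(url: str) -> str:
    """Stand-in for contacting the origin Web server."""
    return f"<html>content of {url}</html>"

class CacheServer:
    def __init__(self, parent=None):
        self.store = {}        # URL -> cached page
        self.parent = parent   # next layer up, or None at the top
        self.hits = self.misses = 0

    def get(self, url: str) -> str:
        if url in self.store:          # cache hit: serve locally
            self.hits += 1
            return self.store[url]
        self.misses += 1               # cache miss: propagate up a layer
        if self.parent is not None:
            page = self.parent.get(url)
        else:                          # top of the hierarchy: origin fetch
            page = fetch_from_origin(url)
        self.store[url] = page         # keep a copy for the next request
        return page

top = CacheServer()                 # top-level cache
edge = CacheServer(parent=top)      # cache nearest the clients

edge.get("http://example.com/")     # miss at both layers, origin fetch
edge.get("http://example.com/")     # hit at the edge this time
print(edge.hits, edge.misses)       # prints: 1 1
```

A parallel or load-balanced arrangement would instead run several such servers side by side with identical stores, rather than chaining them through `parent`.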
Typically, a client queries the nearest cache server first, before the request climbs the chain toward the top. Multi-layer caches are ideal for systems serving very large numbers of clients (10,000 to 100,000). They are also the better solution if your clients are scattered across a WAN or across the Internet.

Returning the Web request
Returning the results from a cache is still a comparatively simple process. Essentially, the cache takes the packet containing the requested data, extracts the source IP address, and sends the data to the client on behalf of the origin Web server. Coordinating the contents of the cache among multiple servers is the real challenge, and there are several protocols and options to choose from. As soon as you add a second cache server to your system, you face this question: if multiple servers are supposed to contain the same data, how do you keep them consistent? If you add multi-layer cache servers, you must answer two more: how does each cache know what the other caches contain, and how are requests redirected to the appropriate cache? This is where cache protocols come in. There are three main types: query protocols send messages to the other caches in a multi-layer system to see whether they contain the needed data; redirect protocols forward the client request to the cache server in the multi-layer system that holds the needed data; and multicast protocols use multicast network communication to combine querying and redirection, working with all the cache servers at once. Multicasting is the ability to create a virtual network of computers in which each member can communicate directly with all the others. It is a feature of the IP network protocol, implemented in multicast-capable routers and protocol stacks.
With multicast cache protocols, a cache server can query all the other servers at once to see whether they hold the needed data. Moreover, a client request sent to the multicast group automatically reaches every member, avoiding any redirection: within the group, the one cache server that recognizes the requested URL as its responsibility simply sends the data. The problem with multicast protocols is that they are not widely deployed. Multicasting over today's Internet protocols is not very practical, because the Internet is stitched together from a large number of point-to-point (or unicast) links, over which true multicasting cannot be implemented. Even so, software methods exist for setting up multicast within an internal network. The next-generation Internet protocol, IPv6, allows true multicasting, but its widespread deployment is still some time away.

Cache protocol options
Cache servers have four cache protocol options to choose from:
The Internet Cache Protocol (ICP) was the first cache query protocol promulgated by the Internet Engineering Task Force as an informational standard. It was developed in 1996 by the Harvest project, an early Web caching project. In a multi-layer cache, ICP carries queries between cache servers to locate a specific URL elsewhere in the network. Unfortunately, ICP becomes quite inefficient once the number of distributed cache servers grows beyond a certain size. With one or two caches, this limitation of ICP is not a problem; in a large multi-layer cache with more than ten servers, however, ICP spends too much time propagating changes, which degrades performance. ICP also includes no real security to protect the communication between cache servers.

The Hypertext Caching Protocol (HTCP) is an improved query protocol for discovering cache servers on a local network and asking whether they hold a needed URL. It includes the HTTP headers of the original client request, so that the cache servers can take them into account when processing the request.

The Cache Array Routing Protocol (CARP) is a redirect protocol for multi-layer cache systems. Each cache holds a list of all the other cache servers in the system and uses a hash function to map each URL to a particular cache server. It then sends that server a CARP message containing the original HTTP request to be processed. Microsoft Proxy Server implements CARP.

Cisco's proprietary Web Cache Control Protocol (WCCP) handles the redirection of requests from a router into a cache network. A cache server can send WCCP messages to the router defining the mapping between URLs and cache servers. The router inspects outgoing packets for HTTP traffic, uses a hash function to determine which cache server should process the URL in each request, and redirects that traffic to the server with WCCP.
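The hash-based mapping that CARP relies on can be sketched as follows. This is a simplified illustration of the idea, not the actual hash function from the CARP specification (real CARP combines a URL hash with a per-server hash and load factors); the server names are invented for the example.

```python
# Simplified sketch of CARP-style hash routing: deterministically map
# each URL to one cache server in the array, with no shared state.
# This is NOT the actual CARP hash from the specification.
import hashlib

SERVERS = ["cache1.example.com", "cache2.example.com", "cache3.example.com"]

def server_for_url(url: str, servers=SERVERS) -> str:
    """Score every (url, server) pair and pick the highest, so each
    member of the array computes the same answer independently."""
    def score(server):
        digest = hashlib.md5((url + server).encode()).hexdigest()
        return int(digest, 16)
    return max(servers, key=score)

# Every cache in the array agrees on which server owns this URL.
print(server_for_url("http://example.com/index.html"))
```

One useful property of this "highest score wins" scheme: when a server leaves the array, only the URLs that mapped to it are reassigned, whereas a simple modulo-style hash would reshuffle most URLs across the remaining servers.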
Selecting cache server hardware
A cache server is essentially a network file server for Web pages. Unlike a proxy server or a firewall (which can run on fairly low-end machines, even a 486), a cache server needs processing power and speed. For best results, the cache server needs fast network connections to both the internal LAN and the external WAN. Plan on several gigabytes of cache storage on disk, and at least 128 MB of memory, preferably much more. Adding memory directly improves the performance of the system, because access to physical memory is far faster than access to cache contents stored on disk. A fast processor helps too, but a multiprocessor system (even with somewhat slower CPUs) will perform better, because it can work on more requests at once. Cache server administrators agree that memory and disk storage are the most important performance factors. A Linux-based cache server with two 350 MHz Pentium II processors, 512 MB of memory, 25 GB of SCSI disk space, and dual 100 Mbps Ethernet connections, priced between $2,500 and $5,000, should be able to handle one to two million requests per day from 1,000 to 10,000 users. A cache server normally runs without any intervention from the system administrator, so the best choice is a stable, reliable platform that can run unattended. Both commercial and free cache software is available.

Cache server software products
Software | Manufacturer/developer | Cache types | Platforms
Apache Web Server cache module * | Apache | single | AIX, BSD/OS, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NeXTStep, SunOS, Solaris, SCO Unix, Windows NT
BorderManager FastCache | Novell | single, multi-layer | NetWare
Cache Engine | Cisco | single, multi-layer, load balancing | custom hardware
CacheFlow Series | CacheFlow | single, multi-layer, load balancing | custom hardware
CacheRaQ 2 | Cobalt Networks | single | custom hardware
DeleGate * | MITI Electrotechnical Laboratory (ETL) | single, multi-layer, load balancing | AIX, EWS4800, HP-UX, Hi-UX, IRIX, NeXTStep, NEWS-OS, Digital UNIX, Solaris, SunOS, BSD/OS, FreeBSD, Linux, NetBSD, OpenBSD, Windows 95/NT, OS/2
HTTPD Proxy Cache * | CERN, World Wide Web Consortium | single | AIX, BSD/OS, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NeXTStep, SunOS, Solaris, UnixWare
Internet Caching System | Novell | single, multi-layer, load balancing | custom hardware
Jigsaw cache proxy module * | World Wide Web Consortium | single | Java
NetCache | Network Appliance | single, multi-layer, load balancing | custom hardware
Netra Proxy Server | Sun Microsystems | single, multi-layer | Solaris, custom hardware
Proxy Server | AOL/Netscape | single, multi-layer, load balancing | AIX, HP-UX, IRIX, Solaris, Windows NT
Proxy Server | Microsoft | single, multi-layer, load balancing | Windows NT
Squid * | NLANR | single, multi-layer, load balancing | AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NeXTStep, SunOS, Solaris, SCO Unix, OS/2
Traffic Server | Inktomi | single, multi-layer, load balancing | Digital UNIX, FreeBSD, IRIX, Solaris, Windows NT
WebSphere Performance Pack Cache Manager (Web Traffic Express) | IBM | single, multi-layer, load balancing | AIX, Linux, OS/400, Solaris, Windows NT
* Free software or open source software

References
For more network statistics, visit the (US) National Laboratory for Applied Network Research: http://www.nlanr.net
For information on the legal ramifications of caching access to sites, read this discussion of the Digital Millennium Copyright Act: http://www.arl.org/info/frn/copy/band.html
To learn how to set up your own cache server, read the developerWorks article "Setting up a cache server".
Download Squid: http://squid.nlanr.net
Browse the Squid FAQ: http://squid.nlanr.net/squid/faq/
Download the Jigsaw cache proxy module: http://www.w3.org/Jigsaw/
Download DeleGate: http://wall.etl.go.jp/delegate/
Download the HTTPD proxy cache: http://www.w3.org/daemon/
Find out how to configure Squid as a transparent proxy cache under Linux: http://squid.nlanr.net/squid/faq/faq-17.html
The IP Filter package: http://cheops.anu.edu.au/~avalon/ip-filter.html
Two HOWTO documents on packet filtering under Linux: the IPCHAINS HOWTO: http://www.rustcorp.com/linux/ipchains/howto.html and the Firewall HOWTO: http://penguin.spd.louisville.edu/ldp/howto/firewall-howto.html
If you are interested in participating in a global cache system, see these two links:
The NLANR IRCACHE project: http://www.ircache.net
The UK Joint Academic Network cache project: http://wwwcache.ja.net
For more information on the cache server products mentioned in this article, see the table of commercial and free cache software above.
Rawn Shah is an independent technologist and freelance journalist based in Tucson, Arizona. Since 1993 he has covered topics in all aspects of networking and cross-platform integration. You can reach him at Rawn@rtd.com.