For a website with millions of visits a day, speed quickly becomes a bottleneck. Besides optimizing the content publishing application itself, converting the output of dynamic pages that do not need real-time updates into static pages brings a significant speedup, because a dynamic page is often 2 to 10 times slower than a static one. If the static content can also be cached in memory, access can be 2 to 3 orders of magnitude faster than the original dynamic page.
Comparison of dynamic cache and static cache
Site planning based on reverse proxy acceleration
Reverse proxy acceleration based on Apache mod_proxy
Squid-based reverse proxy acceleration implementation
Cacheable page design
Cache compatibility design for applications: HTTP_HOST / SERVER_NAME and REMOTE_ADDR / REMOTE_HOST need to be replaced by HTTP_X_FORWARDED_HOST / HTTP_X_FORWARDED_SERVER and HTTP_X_FORWARDED_FOR
If the page output of the back-end content management system follows a cacheable design, performance problems can be handed over to the front-end cache server, which greatly reduces the complexity of the CMS itself.
Comparison of static cache and dynamic cache
A cache of static pages can take two forms. The main difference is whether the CMS itself is responsible for managing cache updates of the related content.
1. Static cache: the static page is generated at the moment the content is published. For example, on March 22, 2003, as soon as the administrator enters an article through the back-end content management interface, the system generates the static page http://www.chedong.com/tech/2003/03/22/001.html and updates the links on the related index pages at the same time.
2. Dynamic cache: after new content is published, no static page is generated in advance. Only when a request for the content arrives and the front-end cache server finds no cached copy does the back-end system generate the static page. The first visit to a page may be slower, but later visits are served directly from the cache. On foreign sites such as ZDNet, which use a Vignette-based content management system, you will find page names like 0,22342566,300458.html. Here 0,22342566,300458 is a set of parameters separated by commas. When the page is not found on first access, the server in effect runs a query such as doc_type=0&doc_id=22342566&doc_template=300458 and stores the result as the cached static page 0,22342566,300458.html.
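The mapping from a comma-separated page name to query parameters can be sketched in a few lines of Python. This is only an illustration of the idea, not Vignette's actual implementation; the function names are hypothetical, and the parameter names follow the doc_type / doc_id / doc_template example above.

```python
def parse_cache_name(filename):
    """Split a comma-separated static page name into query parameters."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext != "html":
        raise ValueError("not a cached page name: %r" % filename)
    parts = stem.split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated fields")
    doc_type, doc_id, doc_template = parts
    return {"doc_type": doc_type, "doc_id": doc_id, "doc_template": doc_template}


def to_query_string(params):
    """Build the equivalent dynamic query for the back-end system."""
    keys = ("doc_type", "doc_id", "doc_template")
    return "&".join("%s=%s" % (key, params[key]) for key in keys)


if __name__ == "__main__":
    params = parse_cache_name("0,22342566,300458.html")
    print(to_query_string(params))
```

On a cache miss, the back end would run this query once and write the result under the original file name, so that later requests never reach the application.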
Disadvantages of static cache:
1. Complex update triggers: both mechanisms work well when the content management system is simple. For a more complex website, however, the logical reference relationships between pages become very complicated. The most typical example is a news item that must appear on the news home page and in three related news topics. In static cache mode, every newly published article requires the system to regenerate not only the article page itself but also several related static pages through triggers, and these triggers often become one of the most complex parts of the content management system.
2. Batch updates of old content: pages published through static caching are hard to modify once generated, so a new template does not take effect when users visit old pages.
In dynamic cache mode, each dynamic page only needs to take care of itself, and the other related pages update automatically, which greatly reduces the need to design update triggers for related pages.
I used a similar approach in earlier small applications: after the first access, the database query result was saved as a local file, and each later request first checked the local cache directory for a cached file, which reduced access to the back-end database. Although this can carry a fairly large load, such a system mixes content management with cache management and is hard to separate, and data integrity is not well preserved: when content is updated, the application has to delete the corresponding cache file. When there are many cached files, the cache directory also needs to be spread over subdirectories; otherwise, once a directory holds more than about 3000 files, rm * fails. At this point the system needs to be divided again, splitting a complex content management system into two relatively simple systems: content input and caching.
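One common way to spread cached files over subdirectories is to derive the directory names from a hash of the cache key. The following Python sketch assumes a two-level layout with two hexadecimal characters per level (up to 256 x 256 buckets); the function name, the ".cache" suffix and the layout are illustrative choices, not something prescribed by the approach described above.

```python
import hashlib
import os


def cache_path(cache_root, key, levels=2, width=2):
    """Return a file path for `key`, spread over hashed subdirectories."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Take `width` hex characters per directory level, e.g. "ab/cd".
    subdirs = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(cache_root, *subdirs, digest + ".cache")


if __name__ == "__main__":
    print(cache_path("cache", "/news/2003/001.html"))
```

Because the path depends only on the key, the application can find or delete the cached copy of a page without scanning any directory.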
Back end: the content management system, focused on content publishing, such as complex workflow management and complex template rules.
Front end: cache management, which can be implemented with a cache system.
 ______________________         ___________________
 |Squid Software Cache|         |F5 Hardware Cache|
 ----------------------         -------------------
             \                     /
              \ _________________ /
               | ASP | JSP | PHP |
            content management system
               -------------------
With this division of labor, content management and cache management become two separate parts, and either one has a wide range of options: software (for example, Squid on front-end port 80 caching the content publishing system on back-end port 8080), cache hardware, or even handing the job to a professional service provider such as Akamai.
Cache-oriented site planning
An example scenario: using Squid HTTP acceleration to speed up several sites.
The original sites might look like this:
200.200.200.207 www.chedong.com
200.200.200.208 news.chedong.com
200.200.200.209 bbs.chedong.com
200.200.200.205 images.chedong.com
Design for the cache server: through external DNS, all sites point to the same IPs, 200.200.200.200/201 (two machines for redundancy).
How it works:
When an external request arrives, the cache server resolves and forwards it according to its configuration file, so the request can be passed on to the internal address we specify.
For forwarding to multiple virtual hosts, mod_proxy is simpler than Squid: it can forward different services to different ports on multiple back-end IPs.
Squid can only do this by disabling its internal DNS resolution and forwarding according to the local /etc/hosts file, and all back-end servers must use the same port.
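The forwarding decision itself is a simple lookup from the requested host name to an internal address. The Python sketch below only illustrates that idea; it is not how Squid or mod_proxy is implemented. The table contents are example values, and the names BACKENDS and resolve_backend are hypothetical.

```python
# Illustrative mapping of public host names to internal back-end addresses.
BACKENDS = {
    "www.chedong.com": "192.168.0.4",
    "news.chedong.com": "192.168.0.4",
    "bbs.chedong.com": "192.168.0.3",
}


def resolve_backend(host_header, port=80):
    """Return (internal_ip, port) for the Host header of a request."""
    # Drop an optional ":port" suffix and normalize the case.
    host = host_header.split(":", 1)[0].strip().lower()
    try:
        return (BACKENDS[host], port)
    except KeyError:
        raise LookupError("no back-end configured for host %r" % host)


if __name__ == "__main__":
    print(resolve_backend("BBS.chedong.com:80"))
```

A single shared port in this sketch mirrors the Squid restriction described above; a mod_proxy-style setup would store a port per host instead.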
Reverse proxy acceleration brings not only better performance but also additional security and flexibility:
Configuration flexibility: DNS resolution of the back-end servers is controlled on the internal side. When services are migrated between servers, the external DNS configuration does not need to change; adjusting the internal DNS is enough.
Better data security: all back-end servers can easily be protected behind the firewall.
Simpler back-end application design: originally a dedicated image server, images.chedong.com, had to be separated from the heavily loaded application server bbs.chedong.com. In reverse proxy mode all front-end requests pass through the cache server and are in effect served as static pages, so the application design no longer has to consider separating images from the application itself, which also simplifies maintenance and management of the file systems.
Reverse proxy acceleration based on Apache mod_proxy
Apache includes the mod_proxy module, which can be used to implement a proxy server and to provide reverse acceleration for back-end servers.
Install Apache 1.3.x, compiling with:
--enable-shared=max --enable-module=most
Note: in Apache 2.x, mod_proxy has been split into mod_proxy and mod_cache, and mod_cache has separate file-based and memory-based implementations.
Create /var/www/proxy and make it writable by the user that the Apache service runs as.
mod_proxy configuration sample: reverse proxy plus cache
Set up the front-end www.example.com as a reverse proxy for the service on port 8080 of www.backend.com.
Modify httpd.conf:
ServerName www.example.com
ServerAdmin admin@example.com
# reverse proxy setting
ProxyPass / http://www.backend.com:8080/
ProxyPassReverse / http://www.backend.com:8080/
# cache dir root
CacheRoot "/var/www/proxy"
# max cache storage
CacheSize 50000000
# hour: every 4 hours
CacheGcInterval 4
# max page expire time: hour
CacheMaxExpire 240
# Expire time = (now - last_modified) * CacheLastModifiedFactor
CacheLastModifiedFactor 0.1
# default expire tag: hour
CacheDefaultExpire 1
# force complete after percent of content retrieved: 60-90%
CacheForceCompletion 80
CustomLog /usr/local/apache/logs/dev_access_log combined
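The expiry formula in the configuration comments can be worked through with a small Python sketch. It assumes the estimate applies to a page that carries Last-Modified but no explicit expiry, and that the result is capped by CacheMaxExpire; the function and parameter names are hypothetical, and the defaults are the values from the sample configuration.

```python
def estimated_expiry_hours(hours_since_modified, factor=0.1, max_expire_hours=240):
    """Estimate how long a page without an explicit expiry stays cached."""
    if hours_since_modified < 0:
        raise ValueError("Last-Modified lies in the future")
    # Expire time = (now - last_modified) * CacheLastModifiedFactor,
    # never longer than CacheMaxExpire.
    return min(hours_since_modified * factor, max_expire_hours)


if __name__ == "__main__":
    # A page modified 10 hours ago is kept for about 1 hour.
    print(estimated_expiry_hours(10))
    # A page modified a year ago hits the 240 hour cap.
    print(estimated_expiry_hours(24 * 365))
```

The effect is that pages which changed recently are revalidated soon, while pages that have been stable for a long time are kept longer, up to the configured maximum.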
Squid-based reverse proxy acceleration implementation
Squid is a more specialized proxy server, and its performance and efficiency are much higher than those of Apache's mod_proxy.
If you need Combined format logs, apply this patch: http://www.squid-cache.org/mail-archive/squid-dev/200301/0164.html
Compiling Squid:
./configure --enable-useragent-log --enable-referer-log --enable-default-err-language=Simplify_Chinese \
--enable-err-languages="Simplify_Chinese English" --disable-internal-dns
make
# make install
# cd /usr/local/squid
mkdir cache
chown squid.squid *
vi /usr/local/squid/squid.conf
In /etc/hosts, add the internal DNS resolution, for example:
192.168.0.4 www.chedong.com
192.168.0.4 news.chedong.com
192.168.0.3 bbs.chedong.com
-------------------- cut here --------------------
# visible name.example.com
# cache config: space use 1G and memory use 256M
cache_dir ufs /usr/local/squid/cache 1024 16 256
cache_mem 256 MB
cache_effective_user squid
cache_effective_group squid
http_port 80
httpd_accel_host virtual
httpd_accel_single_host off
httpd_accel_port 80
httpd_accel_uses_host_header on
httpd_accel_with_proxy on
# accelerate my domains only
acl acceleratedHostA dstdomain .example1.com
acl acceleratedHostB dstdomain .example2.com
acl acceleratedHostC dstdomain .example3.com
# accelerate http protocol on port 80
acl acceleratedProtocol protocol HTTP
acl acceleratedPort port 80
# access acl
acl all src 0.0.0.0/0.0.0.0
# Allow requests when they are to the accelerated machine AND to the
# right port with right protocol
http_access allow acceleratedProtocol acceleratedPort acceleratedHostA
http_access allow acceleratedProtocol acceleratedPort acceleratedHostB
http_access allow acceleratedProtocol acceleratedPort acceleratedHostC
# logging
emulate_httpd_log on
cache_store_log none
# manager
acl manager proto cache_object
http_access allow manager all
cachemgr_passwd pass all
-------------------- cut here --------------------
Create the cache directory: /usr/local/squid/sbin/squid -z
Start Squid: /usr/local/squid/sbin/squid
Stop Squid: /usr/local/squid/sbin/squid -k shutdown
Apply a new configuration: /usr/local/squid/sbin/squid -k reconfig
Truncate/rotate the logs daily at midnight with crontab:
0 0 * * * (/usr/local/squid/sbin/squid -k rotate)
Cacheable dynamic page design
What kind of page can be cached well by a cache server? If the HTTP response headers include Last-Modified and Expires, for example:
Last-Modified: Wed, 14 May 2003 13:06:17 GMT
Expires: Mon, 16 Jun 2003 13:06:17 GMT
the front-end cache server keeps the generated page locally, on disk or in memory, until the page expires. A cacheable page therefore has two requirements.
The page must contain a Last-Modified header. Pure static pages normally carry Last-Modified information already; dynamic pages need to set it explicitly through a function, for example in PHP:
// always modified now
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
The page must also have an Expires or Cache-Control: max-age header that sets its expiration time. For static pages, Apache (mod_expires) sets the cache period according to the MIME type of the page; for example, images default to 1 month and HTML pages to 2 days.
ExpiresActive On
ExpiresByType image/gif "access plus 1 month"
ExpiresByType text/css "now plus 2 day"
ExpiresDefault "now plus 1 day"
For dynamic pages, the expiration time can be set directly in the HTTP response headers; for example, the news home page index.php might expire after 20 minutes, while an individual news page might expire after 1 day. For example, in PHP, to expire one month later:
// Expires one month later
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600 * 24 * 30) . " GMT");
If the server side uses HTTP-based authentication, the response must also carry a Cache-Control: public header so that the front-end cache is allowed to store it.
Making ASP applications cacheable: first add the following common functions to a shared include file (such as include.asp):
<%
' Set Expires header in minutes
Function SetExpiresHeader(ByVal minutes)
    ' Set page Last-Modified header:
    ' converts date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
    Response.AddHeader "Last-Modified", DateToHTTPDate(Now())

    ' The page expires in minutes
    Response.Expires = minutes

    ' Set cache control for external applications
    Response.CacheControl = "public"
End Function

' Converts date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
Function DateToHTTPDate(ByVal OleDATE)
    Const GMTdiff = #08:00:00#
    OleDATE = OleDATE - GMTdiff
    DateToHTTPDate = engWeekDayName(OleDATE) & _
        ", " & Right("0" & Day(OleDATE), 2) & " " & engMonthName(OleDATE) & _
        " " & Year(OleDATE) & " " & Right("0" & Hour(OleDATE), 2) & _
        ":" & Right("0" & Minute(OleDATE), 2) & ":" & Right("0" & Second(OleDATE), 2) & " GMT"
End Function

Function engWeekDayName(dt)
    Dim Out
    Select Case Weekday(dt, 1)
        Case 1: Out = "Sun"
        Case 2: Out = "Mon"
        Case 3: Out = "Tue"
        Case 4: Out = "Wed"
        Case 5: Out = "Thu"
        Case 6: Out = "Fri"
        Case 7: Out = "Sat"
    End Select
    engWeekDayName = Out
End Function

Function engMonthName(dt)
    Dim Out
    Select Case Month(dt)
        Case 1: Out = "Jan"
        Case 2: Out = "Feb"
        Case 3: Out = "Mar"
        Case 4: Out = "Apr"
        Case 5: Out = "May"
        Case 6: Out = "Jun"
        Case 7: Out = "Jul"
        Case 8: Out = "Aug"
        Case 9: Out = "Sep"
        Case 10: Out = "Oct"
        Case 11: Out = "Nov"
        Case 12: Out = "Dec"
    End Select
    engMonthName = Out
End Function
%>
Then, in specific pages such as index.asp and news.asp, add the following at the top of the page, before any output, to set the HTTP headers:
<%
' The page is set to expire after 20 minutes
SetExpiresHeader(20)
%>
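For comparison, the same three headers can be produced in a few lines of Python using only the standard library. This is a minimal sketch rather than part of the original ASP or PHP code; the function name cache_headers and its parameters are hypothetical.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build Last-Modified, Expires and Cache-Control headers for a page."""
    if now is None:
        now = time.time()
    return {
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(now, usegmt=True),
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
    }


if __name__ == "__main__":
    # A page that expires after 20 minutes.
    for name, value in cache_headers(20 * 60).items():
        print("%s: %s" % (name, value))
```

Working from a UTC timestamp avoids the fixed 8 hour offset that the ASP version has to subtract by hand.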
Cache compatibility design
Once a proxy sits in front, an intermediate layer is added between the client and the service: the server can no longer obtain the client's IP address directly, and a server-side application cannot reply to the client using the address on which the forwarded request arrived. However, the proxy adds HTTP_X_FORWARDED_* information to the HTTP headers of the forwarded request, which records the original client IP address and the server address that the client originally requested. The following 2 examples illustrate the design principles for cache-compatible applications:
' For an application that needs the server host name: do not reference HTTP_HOST directly; prefer HTTP_X_FORWARDED_HOST when it is present
function getHostName ()
    dim hostName as String = ""
    hostName = Request.ServerVariables("HTTP_HOST")
    if not isDBNull(Request.ServerVariables("HTTP_X_FORWARDED_HOST")) then
        if len(trim(Request.ServerVariables("HTTP_X_FORWARDED_HOST"))) > 0 then
            hostName = Request.ServerVariables("HTTP_X_FORWARDED_HOST")
        end if
    end if
    return hostName
end function
// For a PHP application that needs to record the client IP: do not reference REMOTE_ADDR directly; use HTTP_X_FORWARDED_FOR when it is present
function getUserIP() {
    $user_ip = $_SERVER["REMOTE_ADDR"];
    if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
        $user_ip = $_SERVER["HTTP_X_FORWARDED_FOR"];
    }
    return $user_ip;
}
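When a request passes through more than one proxy, the forwarded header can hold several comma-separated entries, some of which may not be addresses at all. The Python sketch below shows one possible way to pick a single usable address; it assumes a CGI/WSGI-style environ dictionary, the function name get_user_ip is hypothetical, and the header should only be trusted when it was set by your own front-end proxy.

```python
import ipaddress


def get_user_ip(environ):
    """Return the first valid address from X-Forwarded-For, else REMOTE_ADDR."""
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    for candidate in forwarded.split(","):
        candidate = candidate.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            # Skip empty entries and placeholders such as "unknown".
            continue
        return candidate
    return environ.get("REMOTE_ADDR", "")


if __name__ == "__main__":
    env = {
        "REMOTE_ADDR": "192.168.0.1",
        "HTTP_X_FORWARDED_FOR": "unknown, 200.28.7.155, 219.101.137.3",
    }
    print(get_user_ip(env))
```

Storing one validated address instead of the raw header also keeps the stored value short and predictable.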
Note: if the request has passed through several intermediate proxy servers, HTTP_X_FORWARDED_FOR may be a comma-separated list of addresses, such as:
200.28.7.155, 200.10.25.77 unknown, 219.101.137.3
Many older database designs (BBS systems, for example) set the field that records the client address to 20 bytes, so I often see the following error message:
Microsoft JET Database Engine error '80040e57'
The field is too small to accept the amount of data you attempted to add. Try inserting or pasting less data.
/inc/char.asp, line 236
The error above occurs because the address field is too short. When designing how client addresses are stored, the user IP field is best sized at about 50 bytes; requests that pass through 3 or more proxies are rare anyway.
How to check the cacheability of the pages on the current site? You can refer to the tools on the following 2 sites:
http://www.ircache.net/cgi-bin/cacheability.py
http://www.web-caching.com/cacheability.html
Squid performance test
phpman.php is a PHP-based man page server. Each man page requires calling the man command and several page formatting tools in the background, so the system load is relatively high. It provides cache-friendly URLs. The following is performance test data for the same page:
Test environment: RedHat 8 on Cyrix 266 / 192M Mem
Test program: Apache ab (Apache Benchmark)
Test conditions: 50 requests, 50 concurrent connections
Test items: direct access through Apache 1.3 (port 80) vs Squid 2.5 (port 8000, accelerating port 80)
Test 1: dynamic output on port 80 without cache
ab -n 100 -c 10 http://www.chedong.com:81/phpman.php/man/kill/1
This is ApacheBench, Version 1.3d <$Revision: 1.2 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2001 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient).....done
Server Software:        Apache/1.3.23
Server Hostname:        localhost
Server Port:            80

Document Path:          /phpman.php/man/kill/1
Document Length:        4655 bytes

Concurrency Level:      5
Time taken for tests:   63.164 seconds
Complete requests:      50
Failed requests:        0
Broken pipe errors:     0
Total transferred:      245900 bytes
HTML transferred:       232750 bytes
Requests per second:    0.79 [#/sec] (mean)
Time per request:       6316.40 [ms] (mean)
Time per request:       1263.28 [ms] (mean, across all concurrent requests)
Transfer rate:          3.89 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    29  106.1      0   553
Processing:  2942  6016 1845.4   6227 10796
Waiting:     2941  5999 1850.7   6226 10795
Total:       2942  6045 1825.9   6227 10796

Percentage of the requests served within a certain time (ms)
  50%   6227
  66%   7069
  75%   7190
  80%   7474
  90%   8195
  95%   8898
  98%   9721
  99%  10796
 100%  10796 (last request)
Test 2: Squid cache output
/home/apache/bin/ab -n50 -c5 "http://localhost:8000/phpman.php/man/kill/1"
This is ApacheBench, Version 1.3d <$Revision: 1.2 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2001 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient).....done
Server Software:        Apache/1.3.23
Server Hostname:        localhost
Server Port:            8000

Document Path:          /phpman.php/man/kill/1
Document Length:        4655 bytes

Concurrency Level:      5
Time taken for tests:   4.265 seconds
Complete requests:      50
Failed requests:        0
Broken pipe errors:     0
Total transferred:      248043 bytes
HTML transferred:       232750 bytes
Requests per second:    11.72 [#/sec] (mean)
Time per request:       426.50 [ms] (mean)
Time per request:       85.30 [ms] (mean, across all concurrent requests)
Transfer rate:          58.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     1    9.5      0    68
Processing:     7    83  537.4      7  3808
Waiting:        5    81  529.1      6  3748
Total:          7    84  547.0      7  3876

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      7
  75%      7
  80%      7
  90%      7
  95%      7
  98%      8
  99%   3876
 100%   3876 (last request)
Conclusion: no cache / cache = 6045 / 84 ≈ 72
For pages that can be cached, the server can be close to two orders of magnitude faster, because Squid keeps the cached page in memory, so there is almost no disk I/O.
Summary:
Large, high-traffic websites should publish dynamic pages as cacheable pages wherever possible; even for dynamic applications such as search engines, a caching mechanism is very important.
In dynamic pages, use HTTP headers to define the cache update policy.
Using a cache server also brings additional configuration flexibility and security.
Logs are very important: Squid logs do not support the Combined format by default, so for those who need the Referer log this patch is very important: http://www.squid-cache.org/mail-archive/squid-dev/200301/0164.html
References:
HTTP proxy cache: http://vancouver-webpages.com/proxy.html
Cacheable page design: http://linux.oreillynet.com/pub/a/linux/2002/02/28/cachefriendly.html
ASP.NET output caching for dynamic pages (Developers, ZDNet China): http://www.zdnet.com.cn/developer/tech/story/0,2000081602,39110239-2,00.htm
Related RFC documents:
RFC 2616:
section 13 (Caching)
section 14.9 (Cache-Control header)
section 14.21 (Expires header)
section 14.32 (Pragma: no-cache) is important if you are interacting with HTTP/1.0 caches
section 14.29 (Last-Modified) is the most common validation method
section 3.11 (Entity Tags) covers the extra validation method
Cacheability check: http://www.web-caching.com/cacheability.html
Cache design elements: http://vancouver-webpages.com/cachenow/detail.html
Several documents on using Apache mod_proxy and mod_gzip to accelerate Zope:
http://www.zope.org/Members/anser/apache_zserver/
http://www.zope.org/Members/softsign/ZServer_and_Apache_mod_gzip
http://www.zope.org/Members/rbeer/caching