Web cache acceleration based on reverse proxy: designing a cacheable CMS

xiaoxiao2021-03-05  32

Source: chedong.com

For a website with millions of visits a day, speed becomes a bottleneck. Besides optimizing the content publishing application itself, converting the output of dynamic pages that do not need real-time updates into static pages brings a significant speedup, because a dynamic page is often 2 to 10 times slower than a static one. If the static content can also be cached in memory, access can be 2 to 3 orders of magnitude faster than the original dynamic page.

Comparison of static cache and dynamic cache
Site planning based on reverse proxy acceleration
Reverse proxy acceleration based on Apache mod_proxy
Reverse proxy acceleration based on Squid
Cache-friendly page design

If the page output of the back-end content management system follows a cache-friendly design, performance problems can be handed over to the front-end cache server, which greatly simplifies the CMS itself.

Comparison of static cache and dynamic cache

Static pages can be cached in two ways. The main difference is whether the CMS itself is responsible for managing cache updates of related content.

Static cache: the static page is generated at the same time the content is published. For example, on March 22, 2003, as soon as the administrator enters an article through the back-end content management interface, the system generates the static page http://www.chedong.com/tech/2003/03/22/001.html and updates the links on the related index pages.

Dynamic cache: after new content is published, no static page is generated in advance. Only when a request for that content arrives and the front-end cache server finds no cached copy does it forward the request to the back-end system, which then generates the static page. The first visit to the page may be slow, but every later visit is served directly from the cache. If you visit ZDNet and other foreign websites, you will find that the Vignette content management system they use produces page names such as 0,22342566,300458.html. Here 0,22342566,300458 is really a set of parameters separated by commas: when the first request finds no cached page, the server effectively runs a query like doc_type=0&doc_id=22342566&doc_template=300458, and the query result is stored as the cached static page 0,22342566,300458.html.
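The comma-separated naming scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not Vignette's actual implementation: the function name parse_cache_name is hypothetical, and the parameter names are simply the ones used in the example above.

```python
from urllib.parse import urlencode


def parse_cache_name(filename: str) -> dict:
    """Split a name like '0,22342566,300458.html' into its three parameters."""
    stem, _, extension = filename.rpartition(".")
    if extension != "html":
        raise ValueError(f"not a cached page name: {filename!r}")
    parts = stem.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values: {filename!r}")
    doc_type, doc_id, doc_template = parts
    return {"doc_type": doc_type, "doc_id": doc_id, "doc_template": doc_template}


params = parse_cache_name("0,22342566,300458.html")
# The query the back end would run on a cache miss (parameter order is assumed).
query_string = urlencode(params)
print(query_string)
```

Because the file name carries every parameter, the cache server can treat the page as an ordinary static file while the back end can still rebuild it on demand.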

Disadvantages of static cache:

Complex update triggers: both mechanisms work well when the content management system is relatively simple. For a more complex site, however, the reference relationships between pages become very complicated. The typical example is a news article that must appear both on the news home page and in three related news topics. In static cache mode, every newly published article requires the system to regenerate several related static pages in addition to the article page itself, and these triggers often become one of the most complex parts of the content management system.

Batch updates of old content: pages published through static caching are hard to modify once they have been generated, so users visiting old pages never see a new template take effect.

In dynamic cache mode, each dynamic page only has to take care of itself, and the related pages are refreshed automatically. This greatly reduces the need to design update triggers between related pages.

I used a similar approach in earlier small applications: after the first access, the result of the database query is saved to a local file, and each later request first checks the local cache directory for a cached file, which reduces access to the back-end database. Although this can carry a fairly heavy load, a system that integrates content management and cache management is hard to separate, and data integrity is hard to maintain: whenever content is updated, the application has to delete the corresponding cache file. When there are many cached files, the cache directory also has to be spread over subdirectories, otherwise once a single directory holds more than about 3000 files, rm * fails. At that point the work needs to be divided again, splitting the complex content management system into two relatively simple systems: content input and caching.
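A minimal sketch of such an application-level file cache, assuming a hash-based directory layout to keep any single directory small. The function names and the two-level layout are illustrative choices, not taken from the original application.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_path(cache_root: Path, key: str) -> Path:
    """Map a key to root/ab/cd/<digest>, spreading files over many directories."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return cache_root / digest[:2] / digest[2:4] / digest


def get_or_generate(cache_root: Path, key: str, generate) -> str:
    """Return the cached content for key, calling generate() only on a miss."""
    path = cache_path(cache_root, key)
    if path.exists():
        return path.read_text(encoding="utf-8")
    content = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return content


def invalidate(cache_root: Path, key: str) -> None:
    """Delete the cached file when the underlying content is updated."""
    cache_path(cache_root, key).unlink(missing_ok=True)


calls = []


def render_article() -> str:
    calls.append(1)  # stands in for a database query
    return "<html>article 001</html>"


root = Path(tempfile.mkdtemp())
first = get_or_generate(root, "/tech/2003/03/22/001.html", render_article)
second = get_or_generate(root, "/tech/2003/03/22/001.html", render_article)
print(first == second, len(calls))
```

The invalidate step is exactly the coupling described above: the publishing code has to know where the cache lives, which is what moving the cache to a separate front-end server avoids.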

Back end: the content management system, which concentrates on publishing content, for example complex workflow management and complex template rules.

Front end: page cache management, which can be implemented with a cache system.

With this division of labor between content management and cache management, each side has a wide range of options: software (for example, Squid on front-end port 80 caching the content publishing system on back-end port 8080), cache hardware, or even handing caching over to a professional service provider such as Akamai.

Site planning based on reverse proxy acceleration

An HTTP acceleration scheme for multiple sites using Squid:

The original site layout might look like this:

200.200.200.207 www.chedong.com

200.200.200.208 news.chedong.com

200.200.200.209 bbs.chedong.com

200.200.200.205 images.chedong.com

With a cache server in front, all sites point to the same IP addresses through external DNS: 200.200.200.200/201 (two machines for redundancy).

How it works:

When an external request arrives, the cache server forwards it according to its configuration file. In this way, requests can be forwarded to the internal addresses we specify.

For forwarding to multiple virtual hosts, mod_proxy is simpler than Squid: it can forward different services to different ports on multiple back-end IPs.

Squid can only do this by disabling its internal DNS resolver and forwarding according to the local /etc/hosts file, and all back-end servers must use the same port.
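The forwarding decision amounts to a lookup keyed on the request's Host header. The sketch below assumes a hypothetical internal layout (private addresses and a shared port 8080); it is not how Squid or mod_proxy is implemented internally, only the mapping they are configured to perform.

```python
# Hypothetical internal layout: public host name -> (internal IP, port).
# All back ends share one port, matching the Squid constraint described above.
BACKENDS = {
    "www.chedong.com": ("192.168.0.4", 8080),
    "news.chedong.com": ("192.168.0.4", 8080),
    "bbs.chedong.com": ("192.168.0.3", 8080),
}


def resolve_backend(host_header: str) -> tuple:
    """Pick the internal server for a request based on its Host header."""
    host = host_header.split(":", 1)[0].strip().lower()  # drop any ":port" suffix
    try:
        return BACKENDS[host]
    except KeyError:
        raise LookupError(f"no backend configured for host {host!r}") from None


print(resolve_backend("bbs.chedong.com:80"))
```

Moving a service to another machine then only means changing this internal table; the external DNS records keep pointing at the cache servers.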

With reverse proxy acceleration we get not only better performance but also extra security and configuration flexibility:

Configuration flexibility: name resolution for the back-end servers is controlled on the internal network. When services need to be moved between servers, there is no need to change the external DNS configuration; adjusting the internal DNS is enough.

Better data security: all back-end servers can easily be protected behind the firewall.

Simpler back-end application design: originally a dedicated image server, images.chedong.com, had to be separated from the more heavily loaded application server bbs.chedong.com. In reverse proxy mode, all front-end requests go through the cache server and are in effect served as static pages, so there is no need to separate images from the application itself. This greatly reduces the design complexity of the back-end content publishing system and makes the data and application file systems easier to maintain and manage.

Reverse proxy acceleration based on Apache mod_proxy

Apache includes the mod_proxy module, which can be used to implement a proxy server and to accelerate back-end servers as a reverse proxy.

Install Apache 1.3.x, compiled with: --enable-shared=max --enable-module=most

Note: in Apache 2.x, mod_proxy has been split into mod_proxy and mod_cache, and mod_cache has separate file-based and memory-based implementations.

Create /var/www/proxy and make it writable by the user the Apache service runs as.

mod_proxy configuration example: reverse proxy with caching

Set up the front-end www.example.com as a reverse proxy for the service on port 8080 of the back-end www.backend.com.

Modify httpd.conf:

ServerName www.example.com
ServerAdmin admin@example.com

# reverse proxy setting
ProxyPass / http://www.backend.com:8080/
ProxyPassReverse / http://www.backend.com:8080/

# cache dir root
CacheRoot "/var/www/proxy"
# max cache storage
CacheSize 50000000
# hour: every 4 hour
CacheGcInterval 4
# max page expire time: hour
CacheMaxExpire 240
# Expire time = (now - last_modified) * CacheLastModifiedFactor
CacheLastModifiedFactor 0.1
# default expire tag: hour
CacheDefaultExpire 1
# force complete after percent of content retrieved: 60-90%
CacheForceCompletion 80

CustomLog /usr/local/apache/logs/dev_access_log combined
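The expiry formula in the configuration comment can be worked through as follows. This sketch assumes the estimate is used only when the back end sends no explicit expiry time, and that the result is capped by CacheMaxExpire; the function name estimate_expiry is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def estimate_expiry(now, last_modified, factor=0.1, max_expire_hours=240):
    """Expiry = now + (now - last_modified) * factor, capped at max_expire_hours."""
    age = now - last_modified
    lifetime = min(age * factor, timedelta(hours=max_expire_hours))
    return now + lifetime


now = datetime(2003, 5, 14, 13, 0, tzinfo=timezone.utc)
modified_10h_ago = now - timedelta(hours=10)
expires = estimate_expiry(now, modified_10h_ago)
print(expires - now)
```

With a factor of 0.1, a page last changed 10 hours ago is kept for one more hour, while a page untouched for years is kept for at most the 240-hour maximum.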

Reverse proxy acceleration based on Squid

Squid is a more specialized proxy server, and its performance and efficiency are much higher than those of Apache's mod_proxy.

If you need combined-format logs, apply this patch:

http://www.squid-cache.org/mail-archive/squid-dev/200301/0164.html

Compilation of Squid:

./configure --enable-useragent-log --enable-referer-log --enable-default-err-language=Simplify_Chinese --enable-err-languages="Simplify_Chinese English" --disable-internal-dns
make
#make install
#cd /usr/local/squid
mkdir cache
chown squid.squid *
vi /usr/local/squid/etc/squid.conf

In /etc/hosts, add the internal name resolution, for example:

192.168.0.4 www.chedong.com
192.168.0.4 news.chedong.com
192.168.0.3 bbs.chedong.com

-------------------- cut here --------------------

# visible name
visible_hostname cache.example.com

# cache config: space use 1G and memory use 256M
cache_dir ufs /usr/local/squid/cache 1024 16 256
cache_mem 256 MB
cache_effective_user squid
cache_effective_group squid

http_port 80
httpd_accel_host virtual
httpd_accel_single_host off
httpd_accel_port 80
httpd_accel_uses_host_header on
httpd_accel_with_proxy on

# accelerate my domains only
acl acceleratedHostA dstdomain .example1.com
acl acceleratedHostB dstdomain .example2.com
acl acceleratedHostC dstdomain .example3.com
# accelerate http protocol on port 80
acl acceleratedProtocol proto HTTP
acl acceleratedPort port 80
# access acl
acl all src 0.0.0.0/0.0.0.0

# Allow requests when they are to the accelerated machine AND to the
# right port with right protocol
http_access allow acceleratedProtocol acceleratedPort acceleratedHostA
http_access allow acceleratedProtocol acceleratedPort acceleratedHostB
http_access allow acceleratedProtocol acceleratedPort acceleratedHostC

# logging
emulate_httpd_log on
cache_store_log none

# manager
acl manager proto cache_object
http_access allow manager all
cachemgr_passwd pass all

-------------------- cut here --------------------
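The access rules above can be read as: a request is accepted when its protocol, port and destination domain all match. The following is a simplified model of that logic, not Squid's code; it ignores the manager rule and any default policy, and the function names are hypothetical.

```python
def dstdomain_matches(pattern: str, host: str) -> bool:
    """A pattern with a leading dot matches the domain itself and any subdomain."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


ACCELERATED_DOMAINS = [".example1.com", ".example2.com", ".example3.com"]


def is_allowed(protocol: str, port: int, host: str) -> bool:
    """All ACLs named on one http_access line must match (logical AND)."""
    return (
        protocol.upper() == "HTTP"
        and port == 80
        and any(dstdomain_matches(d, host) for d in ACCELERATED_DOMAINS)
    )


print(is_allowed("HTTP", 80, "www.example1.com"))
print(is_allowed("HTTP", 80, "www.example4.com"))
```

Listing the three domains on separate http_access lines gives an OR between lines, which the any() call stands in for here.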

Create a cache directory:

/usr/local/squid/sbin/squid -z

Start Squid:

/usr/local/squid/sbin/squid

Stop Squid:

/usr/local/squid/sbin/squid -k shutdown

Enable new configuration:

/usr/local/squid/sbin/squid -k reconfigure

Rotate the logs once a day through crontab:

0 0 * * * (/usr/local/squid/sbin/squid -k rotate)

Cache-friendly page design

What kind of page is cached well by a cache server? One whose HTTP headers include Last-Modified and Expires, for example:

Last-Modified: Wed, 14 May 2003 13:06:17 GMT
Expires: Fri, 16 Jun 2003 13:06:17 GMT

The front-end cache server stores such a page locally, on disk or in memory, until it expires.
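What "store until it expires" means for the cache server can be sketched with a tiny in-memory store. This is an illustration of the rule only, not Squid's behavior in detail; the class name TinyCache and the example header values are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


class TinyCache:
    """In-memory store that serves a page until its Expires time has passed."""

    def __init__(self):
        self._entries = {}  # url -> (expires_at, body)

    def store(self, url, headers, body):
        expires = headers.get("Expires")
        if "Last-Modified" not in headers or expires is None:
            return False  # treated as not cacheable in this simplified model
        self._entries[url] = (parsedate_to_datetime(expires), body)
        return True

    def lookup(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if now >= expires_at:
            del self._entries[url]  # stale: the next request goes to the back end
            return None
        return body


cache = TinyCache()
example_headers = {
    "Last-Modified": "Wed, 14 May 2003 13:06:17 GMT",
    "Expires": "Fri, 16 May 2003 13:06:17 GMT",  # example value, two days later
}
cache.store("/index.php", example_headers, "<html>news</html>")
print(cache.lookup("/index.php", datetime(2003, 5, 15, tzinfo=timezone.utc)))
```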

Therefore, a cacheable page:

Must contain a Last-Modified header. Plain static pages generally carry Last-Modified information already, while dynamic pages have to set it explicitly, for example in PHP:

// always modified now
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");

Must have an Expires or Cache-Control: max-age header that sets the page's expiration time. For static pages, the cache period can be set by MIME type with Apache's mod_expires, for example 1 month for images and 2 days for HTML pages by default:

ExpiresActive on
ExpiresByType image/gif "access plus 1 month"
ExpiresByType text/css "now plus 2 day"
ExpiresDefault "now plus 1 day"

For dynamic pages, the expiration time can be written directly into the returned HTTP headers. For example, the news home page index.php might expire after 20 minutes, while a specific news page might expire after 1 day. In PHP, to expire one month from now:

// Expires one month later
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600 * 24 * 30) . " GMT");

If the server uses HTTP-based authentication, the response must also carry Cache-Control: public so that front-end caches are allowed to store it.

Cache adaptation for ASP applications

First, add the following public functions to a shared include file (such as include.asp):

<%
' Set Expires header in minutes
Function SetExpiresHeader(ByVal minutes)
    ' Set page Last-Modified header:
    ' converts date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
    Response.AddHeader "Last-Modified", DateToHTTPDate(Now())
    ' The page expires in minutes
    Response.Expires = minutes
    ' Set cache control to external applications
    Response.CacheControl = "public"
End Function

' Converts date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
Function DateToHTTPDate(ByVal OleDATE)
    Const GMTdiff = #08:00:00#
    OleDATE = OleDATE - GMTdiff
    DateToHTTPDate = EngWeekDayName(OleDATE) & _
        ", " & Right("0" & Day(OleDATE), 2) & " " & EngMonthName(OleDATE) & _
        " " & Year(OleDATE) & " " & Right("0" & Hour(OleDATE), 2) & _
        ":" & Right("0" & Minute(OleDATE), 2) & ":" & Right("0" & Second(OleDATE), 2) & " GMT"
End Function

Function EngWeekDayName(dt)
    Dim Out
    Select Case WeekDay(dt, 1)
        Case 1: Out = "Sun"
        Case 2: Out = "Mon"
        Case 3: Out = "Tue"
        Case 4: Out = "Wed"
        Case 5: Out = "Thu"
        Case 6: Out = "Fri"
        Case 7: Out = "Sat"
    End Select
    EngWeekDayName = Out
End Function

Function EngMonthName(dt)
    Dim Out
    Select Case Month(dt)
        Case 1: Out = "Jan"
        Case 2: Out = "Feb"
        Case 3: Out = "Mar"
        Case 4: Out = "Apr"
        Case 5: Out = "May"
        Case 6: Out = "Jun"
        Case 7: Out = "Jul"
        Case 8: Out = "Aug"
        Case 9: Out = "Sep"
        Case 10: Out = "Oct"
        Case 11: Out = "Nov"
        Case 12: Out = "Dec"
    End Select
    EngMonthName = Out
End Function
%>

Then, at the very top of specific pages such as index.asp and news.asp, add the following code to set the HTTP headers:

<%
' Page will be set to expire in 20 minutes
SetExpiresHeader(20)
%>
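For comparison, the same three headers can be produced in Python with the standard library's HTTP date formatter. This is a sketch under the assumption that the application can set response headers freely; the function name cache_headers is hypothetical.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build Last-Modified, Expires and Cache-Control headers for a dynamic page."""
    if now is None:
        now = time.time()
    return {
        "Last-Modified": formatdate(now, usegmt=True),
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }


# 20 minutes, as in the ASP example; the fixed timestamp makes the demo repeatable.
headers = cache_headers(20 * 60, now=1052917577)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Unlike the ASP version, no fixed GMT offset constant is needed here, because formatdate works from a UTC timestamp.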

How can you check the cacheability of the pages on your site? You can use tools such as:

http://www.ircache.net/cgi-bin/cacheability.py

Appendix: Squid performance test

phpMan.php is a PHP-based man page server. Each man page requires calling the man command in the background plus a lot of page formatting, so the system load is fairly high. It provides cache-friendly URLs. Below are the performance test results for the same page:

Test environment: Red Hat 8 on a Cyrix 266 with 192 MB of memory

Test tool: Apache's ab (Apache Benchmark)

Test conditions: 50 requests, 5 concurrent connections

Test items: directly through Apache 1.3 (port 80) vs. Squid 2.5 (port 8000, accelerating port 80)

Test 1: dynamic output on port 80 without cache:

ab -n 100 -c 10 http://www.chedong.com:81/phpMan.php/man/kill/1
This is ApacheBench, Version 1.3d <$Revision: 1.1 $> apache-1.3
Copyright © 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright © 1998-2001 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient).....done
Server Software:        Apache/1.3.23
Server Hostname:        localhost
Server Port:            80

Document Path:          /phpMan.php/man/kill/1
Document Length:        4655 bytes

Concurrency Level:      5
Time taken for tests:   63.164 seconds
Complete requests:      50
Failed requests:        0
Broken pipe errors:     0
Total transferred:      245900 bytes
HTML transferred:       232750 bytes
Requests per second:    0.79 [#/sec]
Time per request:       6316.40 [ms] (mean)
Time per request:       1263.28 [ms]
Transfer rate:          3.89 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    29  106.1      0   553
Processing:  2942  6016 1845.4   6227 10796
Waiting:     2941  5999 1850.7   6226 10795
Total:       2942  6045 1825.9   6227 10796

Percentage of the requests served within a certain time (ms)
  50%   6227
  66%   7069
  75%   7190
  80%   7474
  90%   8195
  95%   8898
  98%   9721
  99%  10796
 100%  10796 (last request)

Test 2: Squid cache output:

/home/apache/bin/ab -n50 -c5 "http://localhost:8000/phpMan.php/man/kill/1"
This is ApacheBench, Version 1.3d <$Revision: 1.1 $> apache-1.3
Copyright © 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright © 1998-2001 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient).....done
Server Software:        Apache/1.3.23
Server Hostname:        localhost
Server Port:            8000

Document Path:          /phpMan.php/man/kill/1
Document Length:        4655 bytes

Concurrency Level:      5
Time taken for tests:   4.265 seconds
Complete requests:      50
Failed requests:        0
Broken pipe errors:     0
Total transferred:      248043 bytes
HTML transferred:       232750 bytes
Requests per second:    11.72 [#/sec] (mean)
Time per request:       426.50 [ms] (mean)
Time per request:       85.30 [ms] (mean, across all concurrent requests)
Transfer rate:          58.16 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     1    9.5      0    68
Processing:     7    83  537.4      7  3808
Waiting:        5    81  529.1      6  3748
Total:          7    84  547.0      7  3876

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      7
  75%      7
  80%      7
  90%      7
  95%      7
  98%      8
  99%   3876
 100%   3876 (last request)

Conclusion: no cache / cache = 6045 / 84 ≈ 72

For pages that can be cached, the server can be nearly two orders of magnitude faster, because Squid keeps cached pages in memory (so there is almost no disk I/O).
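The headline figures can be rechecked directly from the two ab reports. The sketch below only redoes the arithmetic on numbers quoted above; the variable names are illustrative.

```python
def requests_per_second(complete_requests: int, seconds: float) -> float:
    """Throughput as ab reports it: completed requests divided by wall time."""
    return complete_requests / seconds


uncached_rps = requests_per_second(50, 63.164)  # Test 1: Apache, no cache
cached_rps = requests_per_second(50, 4.265)     # Test 2: through Squid

mean_total_uncached_ms = 6045  # "Total" mean from Test 1
mean_total_cached_ms = 84      # "Total" mean from Test 2
latency_ratio = mean_total_uncached_ms / mean_total_cached_ms

print(f"throughput: {uncached_rps:.2f} -> {cached_rps:.2f} requests/s")
print(f"mean latency ratio: {latency_ratio:.0f}x")
```

The cached mean of 84 ms is dominated by the single slow first request (3876 ms, the cache miss); the median drops from 6227 ms to 7 ms.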

Summary:

Large, high-traffic websites should publish dynamic pages in a cacheable form wherever possible; even for dynamic applications such as search engines, a caching mechanism is very important.

Use HTTP headers in dynamic pages to define the cache update policy.

Using a cache server also brings additional configuration flexibility and security.

Logs: Squid does not support combined-format logs by default, so for sites that need Referer logs this patch is very important: http://www.squid-cache.org/mail-archive/squid-dev/200301/0164.html

Reference:

HTTP proxy cache http://vancouver-webpages.com/proxy.html

Cacheable page design http://linux.oreillynet.com/pub/a/linux/2002/02/28/cachefriendly.html

Related RFC documents:

RFC 2616:

section 13 (Caching)
section 14.9 (Cache-Control header)
section 14.21 (Expires header)
section 14.32 (Pragma: no-cache) is important if you are interacting with HTTP/1.0 caches
section 14.29 (Last-Modified) is the most common validation method
section 3.11 (Entity Tags) covers the extra validation method

Cacheability check: http://www.web-caching.com/cacheability.html

Cache design elements:

http://vancouver-webpages.com/cachenow/detail.html

Several documents on acceleration with Apache mod_proxy and mod_gzip:

http://www.zope.org/members/anser/apache_zserver/

http://www.zope.org/members/SoftSign/ZSERVER_AND_APACHE_MOD_GZIP

http://www.zope.org/members/rbeer/caching

