Architecture Analysis of China's Top Portals

xiaoxiao, 2021-03-05

First, a disclaimer: what follows is my own conjecture, based on what a few tools reveal. I can't guarantee it matches the architecture actually in use, but I'm fairly sure it is close to the mark. Sina and Sohu are unrivalled in popularity in China, with more than 10,000 hits per day. Serving that much traffic with limited resources, while still giving users the fastest possible response, is the challenge Sina and Sohu face. After all, the Internet industry has left its money-burning phase and entered a period of healthy growth, in which every yuan spent must bring some return. At the same time, the engineers have to rack their brains to make sure users are never locked out and access never becomes painfully slow. Otherwise, however good the editors and however strong the sales team, advertising would be hard to sell and closure would follow. That has not happened, because their engineers make full use of existing resources and push them to the limit. Put simply, they use Squid as a web cache server, with Apache behind Squid providing the real web service. This architecture only works if most of the site consists of static pages, so programmers must convert pages into static pages before they are delivered to clients.

That is the basic architecture. Now let me explain how I arrived at it, and the specifics.

Tool 1: nslookup

In practice:

    nslookup www.sina.com.cn
    Server:  ns-px.online.sh.cn
    Address:  202.96.209.5

    Non-authoritative answer:
    Name:    taurus.sina.com.cn
    Addresses:  61.172.201.230, 61.172.201.231, 61.172.201.232, 61.172.201.233
              61.172.201.221, 61.172.201.222, 61.172.201.223, 61.172.201.224, 61.172.201.225
              61.172.201.226, 61.172.201.227, 61.172.201.228, 61.172.201.229
    Aliases:  www.sina.com.cn

You can see that Sina uses 13 IP addresses for its home page alone. At first glance, some might think Sina is being wasteful.
But keep looking:

    nslookup news.sina.com.cn
    Server:  ns-px.online.sh.cn
    Address:  202.96.209.5

    Non-authoritative answer:
    Name:    taurus.sina.com.cn
    Addresses:  61.172.201.228, 61.172.201.229, 61.172.201.230, 61.172.201.231
              61.172.201.232, 61.172.201.233, 61.172.201.221, 61.172.201.222, 61.172.201.223
              61.172.201.224, 61.172.201.225, 61.172.201.226, 61.172.201.227
    Aliases:  news.sina.com.cn, jupiter.sina.com.cn

Look carefully: the news channel returns the same number of IPs as the home page, and exactly the same IPs, only in a different order. In other words, in Sina's DNS these IPs are all A records of the name taurus.sina.com.cn, while news, sports, jczs.news and the other channel names are CNAME records pointing to it. DNS round-robin then spreads requests across the addresses automatically.
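To make the inferred DNS setup concrete, here is a toy model in Python: one canonical name owns many A records, channel names are CNAME aliases of it, and every answer rotates the address list, which is why each nslookup above starts at a different IP. This is a hypothetical sketch, not Sina's zone file or how BIND implements round-robin. The class name is made up, and the news.sina.com.cn to jupiter.sina.com.cn to taurus.sina.com.cn chain is an assumption read off the "Aliases" line.

```python
from collections import deque


class RoundRobinZone:
    """Toy DNS zone: channel names are CNAME aliases of one name that owns
    many A records, and every answer rotates the A-record list by one place.

    Illustrative model only; it is not how BIND or any real server is set up.
    """

    def __init__(self):
        self.a_records = {}  # canonical name -> deque of IPv4 address strings
        self.cnames = {}     # alias -> target name

    def add_a(self, name, ips):
        self.a_records[name] = deque(ips)

    def add_cname(self, alias, target):
        self.cnames[alias] = target

    def resolve(self, name):
        """Follow the CNAME chain, then return (canonical, aliases, addresses)."""
        aliases = []
        while name in self.cnames:
            if name in aliases:
                raise ValueError(f"CNAME loop at {name}")
            aliases.append(name)
            name = self.cnames[name]
        records = self.a_records[name]
        answer = list(records)
        records.rotate(-1)  # the next query starts one address further on
        return name, aliases, answer


# Hypothetical zone data modelled on the nslookup output above.
zone = RoundRobinZone()
zone.add_a("taurus.sina.com.cn", [f"61.172.201.{n}" for n in range(221, 234)])
zone.add_cname("www.sina.com.cn", "taurus.sina.com.cn")
zone.add_cname("sports.sina.com.cn", "taurus.sina.com.cn")
zone.add_cname("jupiter.sina.com.cn", "taurus.sina.com.cn")
zone.add_cname("news.sina.com.cn", "jupiter.sina.com.cn")  # assumed chain

_, _, home = zone.resolve("www.sina.com.cn")
canonical, aliases, news = zone.resolve("news.sina.com.cn")
print(canonical, aliases)      # taurus.sina.com.cn ['news.sina.com.cn', 'jupiter.sina.com.cn']
print(set(home) == set(news))  # True: the same pool of addresses
print(home[0], news[0])        # 61.172.201.221 61.172.201.222
```

Rotating on every answer is the simplest round-robin policy; clients usually try the first address, so successive lookups land on different front-end IPs without any load balancer in the path.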

Still not convinced? Try the sports channel:

    nslookup sports.sina.com.cn
    Server:  ns-px.online.sh.cn
    Address:  202.96.209.5

    Non-authoritative answer:
    Name:    taurus.sina.com.cn
    Addresses:  61.172.201.222, 61.172.201.223, 61.172.201.224, 61.172.201.225
              61.172.201.226, 61.172.201.227, 61.172.201.228, 61.172.201.229, 61.172.201.230
              61.172.201.231, 61.172.201.232, 61.172.201.233, 61.172.201.221
    Aliases:  sports.sina.com.cn

You can try the other channels yourself. Now let's look at Sohu:

    nslookup www.sohu.com
    Server:  ns-px.online.sh.cn
    Address:  202.96.209.5

    Non-authoritative answer:
    Name:    pagegrp1.sohu.com
    Addresses:  61.135.132.172, 61.135.132.173, 61.135.132.176, 61.135.133.109
              61.135.145.47, 61.135.150.65, 61.135.150.67, 61.135.150.69, 61.135.150.74
              61.135.150.75, 61.135.150.145, 61.135.131.73, 61.135.131.91, 61.135.131.180
              61.135.131.182, 61.135.131.183, 61.135.132.65, 61.135.132.80
    Aliases:  www.sohu.com

And Sohu's news channel:

    nslookup news.sohu.com
    Server:  ns-px.online.sh.cn
    Address:  202.96.209.5

    Non-authoritative answer:
    Name:    pagegrp1.sohu.com
    Addresses:  61.135.150.145, 61.135.131.73, 61.135.131.91, 61.135.131.180
              61.135.131.182, 61.135.131.183, 61.135.132.65, 61.135.132.80, 61.135.132.172
              61.135.132.173, 61.135.132.176, 61.135.133.109, 61.135.145.47, 61.135.150.65
              61.135.150.67, 61.135.150.69, 61.135.150.74, 61.135.150.75
    Aliases:  news.sohu.com

The pattern is the same as Sina's. On the surface Sohu has more IPs than Sina (18 versus 13), so does Sohu use more servers for its channels? I can't say that, because one server can be bound to several IPs, so the number of IPs does not tell us how many servers are in use. What these experiments do show is that Sina and Sohu use the same technique for their channels: Squid listens on port 80 of these IPs, and the real web server listens on another port. Users notice no difference, and compared with having the web server face clients directly, this approach saves a great deal of bandwidth and server capacity, and users experience faster access.

Can the IPs of a few domain names prove that they use Squid? No; so far it is only speculation. What follows is real confirmation of the guess above. First, nslookup Sina's sports channel again, this time through a different DNS server:

    nslookup sports.sina.com.cn
    Server:  ns1.china.com
    Address:  61.151.243.136

    Non-authoritative answer:
    Name:    taurus.sina.com.cn
    Addresses:  61.172.201.231, 61.172.201.232, 61.172.201.233, 61.172.201.9
              61.172.201.10, 61.172.201.11, 61.172.201.12, 61.172.201.13, 61.172.201.14
              61.172.201.15, 61.172.201.16, 61.172.201.17, 61.172.201.227, 61.172.201.228
              61.172.201.229, 61.172.201.230
    Aliases:  sports.sina.com.cn, jupiter.sina.com

Then access any of these IPs directly. The result should look like the figure below:

[Figure: the Squid error page returned when one of these IPs is accessed directly]

This proves that Sina has set up many IPs in DNS pointing to the domain name sqsh-19.sina.com.cn, while the other channels of the same kind are merely aliases of sqsh-19.sina.com.cn, specified with CNAME records. That is how the DNS should be set up; on the servers, Squid 2.5.STABLE5 (the latest stable release is STABLE6) listens on port 80. All of this is based on analysis of the available information and should be basically correct.
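To illustrate why putting a cache on port 80 saves bandwidth and server capacity, here is a toy accelerator in Python. It is a hypothetical model, not Squid's code or configuration: AcceleratorCache, origin_fetch and the fake clock are names made up for this sketch, and the 60-second lifetime is an arbitrary choice. The point it shows is that the origin (standing in for Apache serving a static page) is contacted once per URL per freshness window, however many users ask.

```python
import time


class AcceleratorCache:
    """Toy HTTP accelerator (reverse-proxy cache) in front of an origin server.

    A simplified, hypothetical model of Squid sitting on port 80 in front of
    Apache: a URL is fetched from the origin once, then served from memory
    until its copy is max_age seconds old. Real Squid also handles request
    headers, revalidation, memory and disk limits, and much more.
    """

    def __init__(self, origin_fetch, max_age=60, clock=time.monotonic):
        self.origin_fetch = origin_fetch  # callable: url -> response body
        self.max_age = max_age            # freshness lifetime in seconds
        self.clock = clock                # injectable so the demo can fake time
        self.store = {}                   # url -> (body, time stored)
        self.origin_requests = 0

    def get(self, url):
        """Return (body, "HIT") from cache or (body, "MISS") after an origin fetch."""
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and now - cached[1] < self.max_age:
            return cached[0], "HIT"
        body = self.origin_fetch(url)
        self.origin_requests += 1
        self.store[url] = (body, now)
        return body, "MISS"


# Demo with a fake clock and a stand-in for Apache serving a static page.
fake_now = [0.0]


def apache(url):
    return f"<html>static page for {url}</html>"


cache = AcceleratorCache(apache, max_age=60, clock=lambda: fake_now[0])
statuses = []
for t in (0, 10, 37, 59, 60, 61):
    fake_now[0] = t
    statuses.append(cache.get("/index.html")[1])
print(statuses)               # ['MISS', 'HIT', 'HIT', 'HIT', 'MISS', 'HIT']
print(cache.origin_requests)  # 2
```

Six requests reach the cache but only two reach the origin; with thousands of readers hitting the same static home page inside one window, the ratio becomes far more lopsided, which is the saving the architecture relies on.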

What follows is partly my own guess. Its real web server also listens on port 80, because the Squid configuration file contains the directive

    httpd_accel_port 80

If this were set to another port number (such as 88), the error message in the figure would instead read "While trying to retrieve the URL: http://61.172.201.19:88".

Tool 2: nmap

The nmap port scanner can be used to check which ports a server has open. Here I use it to scan one of Sina's IPs, 61.172.201.19:

    bash-2.05$ nmap 61.172.201.19
    Starting nmap 3.50 (http://www.insecure.org/nmap/) at 2004-07-30 13:31 GMT
    Interesting ports on 61.172.201.19:
    (The 1657 ports scanned but not shown below are in state: filtered)
    PORT   STATE SERVICE
    22/tcp open  ssh
    80/tcp open  http

    Nmap run completed -- 1 IP address (1 host up) scanned in 73.191 seconds

You can see that only two ports are open. Port 80 is the Squid we discussed, which confirms it. Port 22 is SSH for remote login; the system administrators use it to operate the server remotely, and it is very secure.

Tool 3: Lynx, or any other tool or small program that can read HTTP headers

An example makes this easiest to understand:

    HTTP/1.0 200 OK
    Date: Fri, 30 Jul 2004 05:49:47 GMT
    Server: Apache/2.0.49 (Unix)
    Last-Modified: Fri, 30 Jul 2004 05:48:16 GMT
    Accept-Ranges: bytes
    Vary: Accept-Encoding
    Cache-Control: max-age=60
    Expires: Fri, 30 Jul 2004 05:50:47 GMT
    Content-Length: 180747
    Content-Type: text/html
    Age: 37
    X-Cache: HIT from sqsh-230.sina.com.cn
    Connection: close

This is the HTTP response header returned by Sina, and there is a lot of valuable information in it :) For example, the Apache behind Squid is version 2.0.49, and responses carry an expiration time of 60 seconds (Cache-Control: max-age=60, with Expires exactly one minute after Date) as well as a Last-Modified header. These headers come from modules built in when compiling Apache; Last-Modified in particular needs a small modification, at least that is how I do it.

In summary, Sina's architecture should have Squid at the front. With today's 2U servers and 2 GB of memory, each server can usually run at least four Squid 2.5.STABLE5 instances.
That way, four servers cover the 16 IPs. The layer behind them is Apache 2.0.49, probably on two machines. Both probably have private IPs, which the front-end Squid servers resolve through entries in their hosts files. I'll write up the specific implementation method I use next time :) Apache's htdocs are probably stored on one or two disk arrays exported over NFS. Apache should mount the NFS server read-only, and there should also be a separate server through which editors update articles; that server needs write access to the NFS server.
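The headers shown under Tool 3 can also be checked mechanically. The sketch below uses only Python's standard library (email.utils.parsedate_to_datetime) to parse that response head and work out how long the cached copy stays fresh. It is a simplification of the HTTP caching rules, assuming that a present max-age takes precedence over Expires minus Date and that remaining freshness is simply the lifetime minus the Age header; the function names are hypothetical.

```python
from email.utils import parsedate_to_datetime

# Response head as captured under Tool 3.
RAW = """HTTP/1.0 200 OK
Date: Fri, 30 Jul 2004 05:49:47 GMT
Server: Apache/2.0.49 (Unix)
Last-Modified: Fri, 30 Jul 2004 05:48:16 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding
Cache-Control: max-age=60
Expires: Fri, 30 Jul 2004 05:50:47 GMT
Content-Length: 180747
Content-Type: text/html
Age: 37
X-Cache: HIT from sqsh-230.sina.com.cn
Connection: close"""


def parse_response_head(raw):
    """Split a raw response head into (status line, {lower-cased name: value})."""
    status, *lines = raw.strip().splitlines()
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return status, headers


def max_age(headers):
    """Return the Cache-Control max-age in seconds, or None if absent."""
    for directive in headers.get("cache-control", "").split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def freshness_report(headers):
    """Simplified freshness check: max-age wins over Expires - Date, minus Age."""
    date = parsedate_to_datetime(headers["date"])
    expires = parsedate_to_datetime(headers["expires"])
    expires_window = int((expires - date).total_seconds())
    lifetime = max_age(headers)
    if lifetime is None:
        lifetime = expires_window
    age = int(headers.get("age", "0"))
    return {
        "max_age": max_age(headers),
        "expires_minus_date": expires_window,
        "age": age,
        "seconds_left": lifetime - age,
        "cache": headers.get("x-cache", "unknown"),
    }


status, headers = parse_response_head(RAW)
print(status, "|", headers["server"])  # HTTP/1.0 200 OK | Apache/2.0.49 (Unix)
print(freshness_report(headers))
```

For Sina's response this gives a 60-second lifetime from both max-age and Expires minus Date, an Age of 37 seconds, and therefore 23 seconds before Squid must go back to Apache; the X-Cache value confirms the copy came from a Squid cache hit.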

When reposting, please credit the original URL: https://www.9cbs.com/read-32212.html
