Architecture analysis of China's top portals, part 1
The day before yesterday I described my most basic speculation; this time I go a little deeper :)
1. Can the IPs behind several domain names prove that they use Squid? Of course, that was only speculation. Below is the real confirmation of that guess. First, run nslookup on one of Sina's channels, the sports channel.
nslookup sports.sina.com.cn
Server: ns1.china.com
Address: 61.151.243.136
Non-authoritative answer:
Name: taurus.sina.com.cn
Addresses: 61.172.201.231, 61.172.201.232, 61.172.201.233, 61.172.201.9
          61.172.201.10, 61.172.201.11, 61.172.201.12, 61.172.201.13, 61.172.201.14
          61.172 .
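The same check can be repeated for several channels at once. Here is a minimal sketch, assuming Python's standard socket.gethostbyname_ex is good enough as a resolver; the function name group_by_canonical_name is my own, and any channel name other than sports.sina.com.cn that you feed it is your own guess, not something taken from the lookup above.

```python
import socket
from collections import defaultdict


def group_by_canonical_name(hostnames, resolve=socket.gethostbyname_ex):
    """Group hostnames by the canonical name their DNS lookup ends at.

    `resolve` must behave like socket.gethostbyname_ex and return
    (canonical_name, alias_list, address_list).
    """
    groups = defaultdict(lambda: {"aliases": set(), "addresses": set()})
    for host in hostnames:
        try:
            canonical, _aliases, addresses = resolve(host)
        except OSError:
            # The name did not resolve; skip it.
            continue
        groups[canonical]["aliases"].add(host)
        groups[canonical]["addresses"].update(addresses)
    return dict(groups)
```

If several channels end up under one canonical name with the same pool of addresses, that supports the idea that they are CNAME aliases of a single Squid farm. Calling group_by_canonical_name(["sports.sina.com.cn"]) performs a live lookup, so what it returns today will not match the 2004 output.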
Then access any one of these IPs directly; the result should look like the following:
This shows that Sina has set up many IPs in DNS pointing to the name sqsh-19.sina.com.cn, while the other channels of the same kind are just aliases of sqsh-19.sina.com.cn, specified with CNAME records. That is roughly how the DNS is configured. On the servers, Squid 2.5.STABLE5 (the latest stable release is STABLE6) listens on port 80. The above is based on analysis of the available information and should be basically correct. Some of what follows is my personal guess:
The real web server behind it also listens on port 80, because the Squid configuration file contains this item:
httpd_accel_port 80
If it were set to another port number (such as 88), the error message above would become
While trying to retrieve the URL: http://61.172.201.19:88
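To read such an error page without eyeballing it, a small parser helps. This is a rough sketch that relies on the footer Squid puts on its default error pages, of the form "Generated <date> by <host> (squid/<version>)", and on the "While trying to retrieve the URL:" line quoted above. The function name squid_signature is my own, and sites that customise their error pages will not match.

```python
import re

# Footer of a default Squid error page: "... by <host> (squid/<version>)"
FOOTER_RE = re.compile(r"by\s+(\S+)\s+\(squid/([^)\s]+)\)", re.IGNORECASE)
# The line naming the URL Squid tried to fetch from the backend.
URL_RE = re.compile(
    r"While trying to retrieve the URL:\s*(?:<A[^>]*>)?\s*(http://[^\s<\"]+)",
    re.IGNORECASE,
)


def squid_signature(page_text):
    """Return cache host, Squid version and backend URL found in an error page.

    Any field that cannot be found is returned as None.
    """
    footer = FOOTER_RE.search(page_text)
    url = URL_RE.search(page_text)
    return {
        "cache_host": footer.group(1) if footer else None,
        "squid_version": footer.group(2) if footer else None,
        "backend_url": url.group(1) if url else None,
    }
```

The port in backend_url is the interesting part: it is the port Squid was told to use when talking to the real web server.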
Tool 2: the nmap scanner, which can be used to check which ports are open.
I will now use nmap to scan one of Sina's IPs, 61.172.201.19.
bash-2.05$ nmap 61.172.201.19
Starting nmap 3.50 ( http://www.insecure.org/nmap/ ) at 2004-07-30 13:31 GMT
Interesting ports on 61.172.201.19:
(The 1657 ports scanned but not shown below are in state: filtered)
PORT   STATE SERVICE
22/tcp open  ssh
80/tcp open  http
Nmap run completed -- 1 IP address (1 host up) scanned in 73.191 seconds
You can see that it has only 2 ports open. Port 80 is the Squid we talked about, which is now verified. Port 22 is SSH for remote connections, used mainly by the sysadmins (SA) to operate the server remotely, and its security is very high.
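If nmap is not at hand, a plain TCP connect test answers the narrower question of whether a given port accepts connections. This is only a sketch and not a replacement for nmap: it cannot tell a filtered port from a closed one, and the function name open_tcp_ports and the 3-second timeout are my own choices.

```python
import socket


def open_tcp_ports(host, ports, timeout=3.0):
    """Return the ports from `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise,
            # so refused or timed-out connections do not raise here.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

For the host above one would call open_tcp_ports("61.172.201.19", [22, 80]). Only scan machines you are allowed to probe.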
Tool 3: Lynx, or any other tool or small program that can read HTTP headers. An example makes this easier to understand :)
HTTP/1.0 200 OK
Date: Fri, 30 Jul 2004 05:49:47 GMT
Server: Apache/2.0.49 (Unix)
Last-Modified: Fri, 30 Jul 2004 05:48:16 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding
Cache-Control: max-age=60
Expires: Fri, 30 Jul 2004 05:50:47 GMT
Content-Length: 180747
Content-Type: text/html
Age: 37
X-Cache: HIT from sqsh-230.sina.com.cn
Connection: close
These are the HTTP headers Sina sends back. There is a lot of valuable information in them :) For example, the Apache behind Squid is 2.0.49, and the response carries an expiration time of 1 minute (max-age=60) as well as a Last-Modified header. These features are all built in when Apache is compiled. Last-Modified in particular needs a small change, or at least that is how I do it.
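The interesting fields can be pulled out of such a response mechanically. Below is a minimal sketch that works on the header text itself, so it needs no network access. The sample repeats the header values quoted above; the function names parse_headers and cache_summary are my own.

```python
SAMPLE = """\
Server: Apache/2.0.49 (Unix)
Last-Modified: Fri, 30 Jul 2004 05:48:16 GMT
Cache-Control: max-age=60
Expires: Fri, 30 Jul 2004 05:50:47 GMT
Age: 37
X-Cache: HIT from sqsh-230.sina.com.cn
"""


def parse_headers(raw):
    """Return {lowercased header name: value}; lines without a colon are skipped."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_summary(headers):
    """Summarise what the headers reveal about the cache and the origin server."""
    max_age = None
    for directive in headers.get("cache-control", "").split(","):
        key, _, val = directive.strip().partition("=")
        if key.strip().lower() == "max-age" and val.strip().isdigit():
            max_age = int(val.strip())
    age_text = headers.get("age", "")
    age = int(age_text) if age_text.isdigit() else None
    x_cache = headers.get("x-cache", "")
    return {
        "origin_server": headers.get("server"),
        "cache_hit": x_cache.upper().startswith("HIT"),
        "cache_host": x_cache.split()[-1] if x_cache else None,
        "max_age": max_age,
        "age": age,
        # Seconds the cached copy stays fresh, when both values are known.
        "fresh_for": max_age - age if max_age is not None and age is not None else None,
    }
```

For the sample, cache_summary(parse_headers(SAMPLE)) reports Apache/2.0.49 as the origin server, a cache hit served by sqsh-230.sina.com.cn, and a copy that is 37 seconds old with 23 seconds of freshness left.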
In summary
Sina's architecture should have Squid at the front. With a typical current server (2U, 2 GB of memory), each machine can usually run at least 4 instances of Squid 2.5.STABLE5, so Sina would be using four servers for the 16 IPs. The layer behind is Apache 2.0.49, probably on 2 machines. These two probably have only private IPs, which the front Squid servers specify in their hosts file. I will describe the specific implementation in the documentation I write next time :) Apache's htdocs probably sits on one or 2 disk arrays served over NFS. Apache should mount the NFS server read-only, and there should also be a server that the editorial staff use to update articles. That server needs write access to the NFS server.
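The server count is simple arithmetic, sketched here so the numbers can be varied. The figures of 16 front IPs and 4 Squid instances per machine are the guesses from the paragraph above; the labels front-ip-1 to front-ip-16 are placeholders, not real addresses.

```python
import math


def servers_needed(ip_count, instances_per_server):
    """Number of machines needed if each Squid instance serves one IP."""
    return math.ceil(ip_count / instances_per_server)


def assign_ips(ips, instances_per_server):
    """Split the IP list into consecutive groups, one group per machine."""
    return [
        ips[i:i + instances_per_server]
        for i in range(0, len(ips), instances_per_server)
    ]


# Placeholder labels standing in for the 16 front-end IPs.
front_ips = [f"front-ip-{n}" for n in range(1, 17)]
layout = assign_ips(front_ips, 4)
```

With these guesses, servers_needed(16, 4) gives 4 and layout holds four groups of four IPs each.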
---- That is the complete scheme for Sina. Of course, much of it is guesswork. I have had no contact with Sina's technical staff (I don't know any of them), otherwise I would not have written this. Other portals such as Sohu and 163 probably have a similar architecture.
Finally, this is only the architecture behind the channels made up of static pages. Sina has many other servers, for downloads, online updates and so on, that are not part of this architecture.