Introduction to the HTTP protocol
HTTP (Hypertext Transfer Protocol) is an application-layer network protocol that has been widely used for World Wide Web information services since 1990. Detailed descriptions of the protocol can be found online and in other documents.
The older version of the protocol is HTTP/1.0, and the most common version is HTTP/1.1. HTTP/1.1 is an upgrade of HTTP/1.0 that adds some features while remaining compatible with it. HTTP/1.0 does not support resuming a transfer from a breakpoint, so if the server uses HTTP/1.0, a download tool such as "Network Ants" can only download with a single thread. Fortunately, most current web servers use HTTP/1.1, so the following description is based on HTTP/1.1.
Important commands for HTTP protocol
When a browser fetches a web page or downloads a file over HTTP, it works in a way similar to the client/server model: the browser sends an HTTP request to the web server; after receiving a valid request, the web server returns a status line, one or more response headers, a blank line, and the requested document. Under this model, a download program must be able to send requests to the server and read the server's response status.
1. Send a GET request command to the server
An HTTP request consists of a request line, any number of optional request headers, a blank line, and, in the case of POST, some additional data. The format of the request line is:
Request-method URI HTTP/version-number
GET is the request method most commonly used by browsers to request a document. The program uses
GET URI HTTP/1.1
to send a request line to the web server (line 3 of the code below). The Java code is as follows:
. . . .
clientSocket = new Socket(host, port); // open a socket to the server that hosts the file
outStream = new PrintStream(clientSocket.getOutputStream());
. . . .
outStream.println("GET " + uri + " HTTP/1.1");
outStream.println("Host: " + host);
outStream.println("Accept: */*");
outStream.println("Referer: ");
outStream.println(); // the blank line ends the request headers
. . . .
Note: the Host header (line 4) gives the host name and port number from the URL, and the Accept header (line 5) tells the server that the client accepts all MIME types. Line 7 sends a blank line, which marks the end of the request.
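The request described above can also be assembled as a single string before it is written to the socket. The following is a minimal sketch, not the original program: the class name GetRequestSketch, the method buildGetRequest, and the example host and path are assumptions made for illustration. HTTP header lines end with CRLF, while println uses the platform line separator, so the sketch writes "\r\n" explicitly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original program.
final class GetRequestSketch {
    private static final String CRLF = "\r\n";

    // Builds the request line, the headers, and the terminating blank line.
    static String buildGetRequest(String host, int port, String uri) {
        // The port can be left out of the Host header when it is the default HTTP port 80.
        String hostHeader = (port == 80) ? host : host + ":" + port;
        return "GET " + uri + " HTTP/1.1" + CRLF
                + "Host: " + hostHeader + CRLF
                + "Accept: */*" + CRLF
                + CRLF; // blank line: end of the request headers
    }

    public static void main(String[] args) {
        // example.com and the path are placeholder values.
        String request = buildGetRequest("example.com", 8080, "/files/demo.zip");
        byte[] wire = request.getBytes(StandardCharsets.US_ASCII);
        System.out.print(request);
        System.out.println(wire.length + " bytes would be written to the socket");
    }
}
```

Building the whole request first makes it easy to inspect or log exactly what is sent before any network code is involved.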
2. Get server response status
After sending the HTTP request, the program can read the server's response. The response begins with a status line containing the HTTP status code, followed by a number of HTTP response headers.
1) HTTP status code
The status line consists of the HTTP version, a numeric status code, and a short reason phrase. Examples:
HTTP/1.0 200 OK // the server supports HTTP/1.0; the request succeeded
HTTP/1.1 200 OK // the server supports HTTP/1.1; the request succeeded
HTTP/1.0 404 Not Found // the server supports HTTP/1.0; the requested file was not found
If the program reads the string "HTTP/1.1 200 OK", the file can be downloaded and the server is taken to support resuming from a breakpoint, so multi-threaded download can be used. If it reads "HTTP/1.0 200 OK", the file can be downloaded but the server does not support breakpoints, so only single-threaded download is possible.
. . . . . .
while ((line = inStream.readLine()) != null) // read the server response line by line into line
. . . . . . . .
if (line.startsWith("HTTP/1.")) // status line of an HTTP/1.x response
{
    if (line.charAt(7) == '0') // minor version 0: HTTP/1.0, no breakpoint support
    {
        System.out.println("Server uses HTTP/1.0");
        threadCount = 1;
    }
    if (line.length() < 12 || !line.substring(9, 12).equals("200")) // check whether the request succeeded
    {
        System.out.println("Error: " + line);
        return false;
    }
}
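The status-line check above can be isolated in a small parser, which makes the fixed character positions easier to verify. This is a sketch under stated assumptions: the class StatusLineSketch and its methods are hypothetical names, and the rule "HTTP/1.0 means one thread" simply mirrors this article's logic rather than any rule in the HTTP standard.

```java
// Hypothetical helper for illustration; not part of the original program.
final class StatusLineSketch {
    final int minorVersion; // 0 for HTTP/1.0, 1 for HTTP/1.1
    final int statusCode;   // for example 200 or 404

    private StatusLineSketch(int minorVersion, int statusCode) {
        this.minorVersion = minorVersion;
        this.statusCode = statusCode;
    }

    // Parses a line such as "HTTP/1.1 200 OK".
    static StatusLineSketch parse(String line) {
        // "HTTP/1.x NNN" needs at least 12 characters, with a space at index 8.
        if (line == null || line.length() < 12
                || !line.startsWith("HTTP/1.") || line.charAt(8) != ' ') {
            throw new IllegalArgumentException("Not an HTTP/1.x status line: " + line);
        }
        int minor = Character.digit(line.charAt(7), 10);
        if (minor < 0) {
            throw new IllegalArgumentException("Bad minor version: " + line);
        }
        int code = Integer.parseInt(line.substring(9, 12)); // characters 9..11 hold the code
        return new StatusLineSketch(minor, code);
    }

    boolean isSuccess() {
        return statusCode == 200;
    }

    // Mirrors the article's rule: fall back to one thread on HTTP/1.0.
    int threadCountFor(int requestedThreads) {
        return (minorVersion == 0) ? 1 : requestedThreads;
    }

    public static void main(String[] args) {
        StatusLineSketch s = parse("HTTP/1.0 200 OK");
        System.out.println(s.isSuccess() + " " + s.threadCountFor(4)); // prints: true 1
    }
}
```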
2) Read the important response headers to get the length of the file to download
If the HTTP status code indicates success, the server then returns a number of header lines. The one we care about most is Content-Length. For example, if the server returns "Content-Length: 1000", the requested file is 1000 bytes long, so reading this line gives the length of the file:
. . . .
if (line.regionMatches(true, 0, "Content-Length:", 0, 15)) // header names are case-insensitive
{
    fileLength = Long.parseLong(line.substring(15).trim());
    System.out.println("File length: " + fileLength);
}
. . . . . .
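Because a server may write the header name in any letter case and with or without a space after the colon, the length extraction can be made a little more tolerant. The following sketch assumes a hypothetical class ContentLengthSketch with a method parseContentLength; it returns an empty value instead of throwing when the line is some other header or the number is malformed.

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration; not part of the original program.
final class ContentLengthSketch {

    // Returns the length if the line is a valid Content-Length header, otherwise empty.
    static OptionalLong parseContentLength(String headerLine) {
        int colon = headerLine.indexOf(':');
        if (colon < 0) {
            return OptionalLong.empty(); // not a "name: value" line
        }
        String name = headerLine.substring(0, colon).trim();
        if (!name.equalsIgnoreCase("Content-Length")) {
            return OptionalLong.empty(); // some other header
        }
        try {
            long value = Long.parseLong(headerLine.substring(colon + 1).trim());
            return (value >= 0) ? OptionalLong.of(value) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // value is not a number
        }
    }

    public static void main(String[] args) {
        System.out.println(parseContentLength("Content-Length: 1000")); // prints: OptionalLong[1000]
    }
}
```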
Sending the breakpoint to the server
As mentioned above, if the server supports HTTP/1.1, the program sends another GET request to the server, this time with a Range header:
. . . . .
outStream.println("GET " + uri + " HTTP/1.1");
outStream.println("Host: " + host);
outStream.println("Accept: */*");
outStream.println("Range: bytes=" + (fileBlockLength * thisThreadID) + "-"); // start offset of this thread's block
outStream.println();
. . . . .
The fourth line, the Range header, is the key. It tells the server to start sending from byte xxxx of the file, which is what is usually called resuming from a breakpoint.
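To make the arithmetic concrete, the sketch below computes one Range header per thread for a file of known length. It is a variation, not the article's exact method: the listing above sends an open-ended range ("bytes=start-"), whereas this sketch also sends the end offset so that each thread receives only its own block. The class RangeSketch and the method rangeHeader are hypothetical names. A server that honors the header answers with status 206 (Partial Content) rather than 200.

```java
// Hypothetical helper for illustration; not part of the original program.
final class RangeSketch {

    // Returns the Range header for thread threadId (0-based) out of threadCount threads.
    static String rangeHeader(long fileLength, int threadCount, int threadId) {
        if (threadCount < 1 || threadId < 0 || threadId >= threadCount
                || fileLength < threadCount) {
            throw new IllegalArgumentException("Invalid split parameters");
        }
        long blockLength = fileLength / threadCount;
        long start = blockLength * threadId;
        // Byte offsets are inclusive; the last thread also takes the remainder.
        long end = (threadId == threadCount - 1)
                ? fileLength - 1
                : start + blockLength - 1;
        return "Range: bytes=" + start + "-" + end;
    }

    public static void main(String[] args) {
        // A 1000-byte file split across 3 threads.
        for (int id = 0; id < 3; id++) {
            System.out.println(rangeHeader(1000, 3, id));
        }
    }
}
```

For a 1000-byte file and 3 threads, the blocks are bytes 0-332, 333-665, and 666-999.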
Splitting the file for multi-threaded download
Using multi-threaded programming, the program starts several threads, calculates the split positions of the file from the number of threads, sends a different breakpoint to the server from each thread, then receives the data and writes it to the file. This implements multi-threaded download.
.....
raf = new RandomAccessFile(file, "rw"); // open the file for random access
.....
synchronized (raf) // synchronize so that the threads' writes do not interleave
{
    // start of this thread's block plus the k buffers it has already written
    raf.seek(thisThreadID * (fileLength / threadCount) + k * bufLength);
    raf.write(readBytes);
......
}
......
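The write pattern above can be tried without a network by letting each thread copy its block from an in-memory array into a shared RandomAccessFile. This is only a sketch under assumptions: the byte array stands in for data received from the server, and SegmentWriterSketch, writeInParallel, and BUF_LENGTH are hypothetical names. As in the article, the seek and the write are performed together under one lock so that another thread cannot move the file pointer between them.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the original program.
final class SegmentWriterSketch {
    static final int BUF_LENGTH = 64; // bytes written per step (assumed value)

    // Each thread writes its own block of source into target at the matching offset.
    static void writeInParallel(byte[] source, File target, int threadCount)
            throws IOException, InterruptedException {
        try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
            raf.setLength(source.length);
            long blockLength = source.length / threadCount;
            Thread[] workers = new Thread[threadCount];
            IOException[] failure = new IOException[1];
            for (int id = 0; id < threadCount; id++) {
                final long start = id * blockLength;
                // The last thread also takes the remainder of the file.
                final long end = (id == threadCount - 1) ? source.length : start + blockLength;
                workers[id] = new Thread(() -> {
                    for (long pos = start; pos < end; pos += BUF_LENGTH) {
                        int n = (int) Math.min(BUF_LENGTH, end - pos);
                        synchronized (raf) { // seek and write must not be separated
                            try {
                                raf.seek(pos);
                                raf.write(source, (int) pos, n);
                            } catch (IOException e) {
                                failure[0] = e;
                                return;
                            }
                        }
                    }
                });
                workers[id].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            if (failure[0] != null) {
                throw failure[0];
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        File target = File.createTempFile("segments", ".bin");
        writeInParallel(data, target, 3);
        System.out.println(Arrays.equals(data, Files.readAllBytes(target.toPath())));
        target.delete();
    }
}
```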
Below is how the HTTP request is constructed when simulating a POST in PHP:
$request = "POST /hatpy/member.php HTTP/1.1\r\n";
$request .= "Pragma: no-cache\r\n";
$request .= "Host: phpx.com\r\n";
$request .= "User-Agent: " . $_SERVER['HTTP_USER_AGENT'] . "\r\n";
$request .= "Accept: */*\r\n";
$request .= "Accept-Language: " . $_SERVER['HTTP_ACCEPT_LANGUAGE'] . "\r\n";
$request .= "Keep-Alive: 300\r\n";
$request .= "Connection: keep-alive\r\n";
$request .= "Cache-Control: max-age=0\r\n";
$request .= "Content-Type: application/x-www-form-urlencoded\r\n";
$request .= "Content-Length: $length\r\n"; // byte length of $postValues
$request .= "\r\n"; // blank line between headers and body
$request .= $postValues;
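For comparison with the Java listings earlier in this article, the same kind of POST request can be sketched in Java. The form field names, the class PostRequestSketch, and its methods are assumptions made for illustration; only the host and path come from the PHP example. The point of the sketch is that Content-Length must be the byte length of the URL-encoded body, and that the body follows the blank line.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original program.
final class PostRequestSketch {

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes the fields as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    // Builds the request line, the headers, the blank line, and the body.
    static String buildPost(String host, String path, String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "tom");        // placeholder field names and values
        fields.put("password", "p&ss word");
        System.out.println(buildPost("phpx.com", "/hatpy/member.php", encodeForm(fields)));
    }
}
```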
=====================================
Below is the response returned when the request succeeds.
HTTP/1.1 200 OK
Date: Fri, 05 Nov 2004 01:06:59 GMT
Server: Apache
Set-Cookie: bblastvisit=1099616819; expires=Sat, 05-Nov-2005 01:06:59 GMT; path=/
Set-Cookie: bbuserid=17027; expires=Sat, 05-Nov-2005 01:06:59 GMT; path=/
Set-Cookie: bbpassword=3332DEF6F45E948BD403276B3B2002D4; expires=Sat, 05-Nov-2005 01:06:59 GMT; path=/
Set-Cookie: sessionhash=53A2B0EE3798FE2CA15342541B62F823; path=/
Content-Length: 3325
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=GB2312
........................................