8, Cookies
HttpClient manages cookies automatically: it accepts cookies set by the server and sends them back when appropriate, and it also lets you set cookies manually to send to the server. Unfortunately, several specifications for cookie handling conflict with one another (the Netscape cookie draft, RFC 2109, and RFC 2965), and many software vendors' cookie implementations follow none of them. To cope with this, HttpClient provides policy-driven cookie management. HttpClient supports the following cookie specifications:
The Netscape cookie draft is the earliest cookie specification and the basis for RFC 2109. Although it differs considerably from RFC 2109, using it provides compatibility with some servers.
RFC 2109 is the first official cookie specification, published by the IETF. In theory, every server that handles version 1 cookies should follow it, which is why HttpClient uses it as the default. Unfortunately, many servers implement it incorrectly or still use the Netscape draft, so this specification is sometimes too strict. In that case, the compatibility specification should be used.
The compatibility specification is designed to work with as many servers as possible, even those that do not follow the standards. Consider it whenever cookie parsing causes problems.
RFC 2965 defines version 2 cookies, addresses the shortcomings of version 1 cookies, and is intended to replace RFC 2109. HttpClient does not yet support it; support is planned for a later version.
In HttpClient, there are two ways to specify which cookie specification to use.
HttpClient client = new HttpClient();
client.getState().setCookiePolicy(CookiePolicy.COMPATIBILITY);
This setting applies only to the current HttpState. The parameter can be CookiePolicy.COMPATIBILITY, CookiePolicy.NETSCAPE_DRAFT, or CookiePolicy.RFC2109.
System.setProperty("apache.commons.httpclient.cookiespec", "COMPATIBILITY");
A specification chosen this way applies to every new HttpState object. The value can be "COMPATIBILITY", "NETSCAPE_DRAFT", or "RFC2109". Cookie parsing failures are a common problem, and switching to the compatibility specification resolves most of them.
9. What should I do when I run into problems with HttpClient?
Use a browser to access the server and confirm that the server responds normally.
If you are using a proxy, try without it.
Try another server (preferably one running different server software).
Check that your code follows the approach described in the tutorial.
Set the log level to debug and look for the cause of the problem.
Turn on wire trace logging to follow the communication between the client and the server.
Use telnet or netcat to send requests to the server by hand; this is a good way to test a guess about the cause.
Run netcat in listen mode as a stand-in server to check how HttpClient handles a particular response.
Try the latest version of HttpClient; the bug may already be fixed there.
Ask for help on the mailing list.
Report the bug in Bugzilla.
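How the log level is set to debug depends on the logging backend. The following is a minimal sketch, assuming that Commons Logging with its bundled SimpleLog implementation is the backend; the class name HttpClientDebugLogging is made up for illustration. It only sets system properties, so it must run before HttpClient creates its first logger.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HttpClientDebugLogging {

    /** Property names and values read by Commons Logging's SimpleLog. */
    static Map<String, String> debugProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumption: SimpleLog is the logging backend in use.
        props.put("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        props.put("org.apache.commons.logging.simplelog.showdatetime", "true");
        // The "httpclient.wire" category logs the data exchanged with the server.
        props.put("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        // This category logs HttpClient's own internal activity.
        props.put("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
                "debug");
        return props;
    }

    /** Copies the properties above into the JVM-wide system properties. */
    static void enable() {
        for (Map.Entry<String, String> e : debugProperties().entrySet()) {
            System.setProperty(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        enable();
        for (String name : debugProperties().keySet()) {
            System.out.println(name + "=" + System.getProperty(name));
        }
    }
}
```

Calling HttpClientDebugLogging.enable() at the very start of main keeps the setting in code; the same properties could also be passed on the command line with -D.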
10, SSL
With the Java Secure Socket Extension (JSSE), HttpClient fully supports HTTP over the Secure Sockets Layer (SSL) or IETF Transport Layer Security (TLS) protocols. JSSE is included in JRE 1.4 and later; earlier versions require manual installation and configuration, as described on the Sun website or in these study notes.
Using SSL in HttpClient is very simple; see the following two examples:
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());
If a proxy is required, use the following:
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setProxy("myproxyhost", 8080);
httpclient.getState().setProxyCredentials("my-proxy-realm", "myproxyhost",
    new UsernamePasswordCredentials("my-proxy-username", "my-proxy-password"));
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());
The steps to customize SSL in HttpClient are as follows:
Provide a socket factory that implements the org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory interface. This socket factory is responsible for opening a socket to the server, using the standard or a third-party SSL library, and for performing initialization such as the connection handshake. Normally this initialization happens automatically when the socket is created.
Instantiate an org.apache.commons.httpclient.protocol.Protocol object. Creating this instance requires a valid protocol scheme (such as https), the custom socket factory, and a default port number (such as 443 for HTTPS).
Protocol myhttps = new Protocol("https", new MySSLSocketFactory(), 443);
This instance can then be set as the protocol handler.
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setHost("www.whatever.com", 443, myhttps);
GetMethod httpget = new GetMethod("/");
httpclient.executeMethod(httpget);
This customized instance can be registered as the default handler for a particular protocol by calling the Protocol.registerProtocol method. This makes it easy to define your own protocol scheme (such as myhttps).
Protocol.registerProtocol("myhttps",
    new Protocol("https", new MySSLSocketFactory(), 9443));
...
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("myhttps://www.whatever.com/");
httpclient.executeMethod(httpget);
To replace the default HTTPS handler with your customized one, simply register it as "https".
Protocol.registerProtocol("https",
    new Protocol("https", new MySSLSocketFactory(), 443));
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.whatever.com/");
httpclient.executeMethod(httpget);
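The custom socket factory used in the examples above is not shown in the text. As a rough sketch of the JSSE part that such a factory would typically delegate to, the code below builds a javax.net.ssl.SSLSocketFactory from an SSLContext using the JVM's default key and trust stores. The class name DefaultTrustSocketSource is hypothetical, and wiring the result into a SecureProtocolSocketFactory implementation is deliberately left out.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

final class DefaultTrustSocketSource {

    /**
     * Builds a socket factory from a freshly initialized SSLContext.
     * Passing null for the key managers, trust managers and random source
     * tells JSSE to fall back to its installed defaults.
     */
    static SSLSocketFactory newSocketFactory() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            // Assumption for this sketch: a missing TLS provider is fatal.
            throw new IllegalStateException("JSSE is not usable on this JVM", e);
        }
    }

    public static void main(String[] args) {
        SSLSocketFactory factory = newSocketFactory();
        System.out.println("Supported cipher suites: "
                + factory.getSupportedCipherSuites().length);
    }
}
```

A custom factory would replace the null arguments of init with its own key or trust managers, which is where the customization actually happens.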
Known restrictions and problems
Persistent SSL connections do not work on Sun JVMs earlier than 1.4 because of a JVM bug.
Non-preemptive authentication fails when the server is accessed through a proxy because of a design flaw in HttpClient; this will be fixed in a later version.
Troubleshooting
Many problems, especially on JVMs earlier than 1.4, are caused by the JSSE installation.
The following code can be used as a last-resort check.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import javax.net.ssl.SSLSocketFactory;

public class Test {

    public static final String TARGET_HTTPS_SERVER = "www.verisign.com";
    public static final int TARGET_HTTPS_PORT = 443;

    public static void main(String[] args) throws Exception {
        // Open an SSL socket directly through JSSE, bypassing HttpClient.
        Socket socket = SSLSocketFactory.getDefault()
            .createSocket(TARGET_HTTPS_SERVER, TARGET_HTTPS_PORT);
        try {
            Writer out = new OutputStreamWriter(socket.getOutputStream(), "ISO-8859-1");
            out.write("GET / HTTP/1.1\r\n");
            out.write("Host: " + TARGET_HTTPS_SERVER + ":" + TARGET_HTTPS_PORT + "\r\n");
            out.write("Agent: SSL-TEST\r\n");
            out.write("\r\n");
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));
            String line = null;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            socket.close();
        }
    }
}
11, Multi-threaded processing with HttpClient
The main purpose of using multiple threads is to achieve parallel downloads. While HttpClient runs, each HTTP method uses an HttpConnection instance. Because connections are a limited resource and each connection can be used by only one thread and one method at a time, connections must be allocated correctly when they are needed. HttpClient manages connections in a way similar to a JDBC connection pool; this work is done by MultiThreadedHttpConnectionManager.
MultiThreadedHttpConnectionManager connectionManager =
    new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
The client can now be used from multiple threads to execute multiple methods. Each call to HttpClient.executeMethod() asks the connection manager for a connection instance. If the request succeeds, the connection is checked out, and it must be returned to the manager once it is no longer in use. The manager supports two settings:
maxConnectionsPerHost: the maximum number of parallel connections per host; the default is 2.
maxTotalConnections: the maximum total number of parallel connections for the client; the default is 20.
When the manager reuses connections, it picks the one that was returned earliest (a least recently used approach).
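The check-out, return, and least-recently-used reuse described above can be illustrated with a toy pool. This is only a sketch under simplifying assumptions, not the code of MultiThreadedHttpConnectionManager: connections are plain strings, there is a single total limit and no per-host limit, and an exhausted pool returns null where a real manager would make the caller wait. The class name TinyConnectionPool is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyConnectionPool {

    private final int maxTotal;
    private int created = 0;
    // Idle connections, ordered from earliest returned to latest returned.
    private final Deque<String> idle = new ArrayDeque<>();

    TinyConnectionPool(int maxTotal) {
        this.maxTotal = maxTotal;
    }

    /** Hands out an idle connection, or a new one while under the limit. */
    synchronized String checkOut() {
        if (!idle.isEmpty()) {
            return idle.pollFirst(); // reuse the connection returned earliest
        }
        if (created < maxTotal) {
            created++;
            return "connection-" + created;
        }
        return null; // pool exhausted; a real manager would block here
    }

    /** Returns a connection so that another caller can reuse it. */
    synchronized void release(String connection) {
        idle.addLast(connection);
    }

    public static void main(String[] args) {
        TinyConnectionPool pool = new TinyConnectionPool(2);
        String first = pool.checkOut();
        String second = pool.checkOut();
        System.out.println(pool.checkOut()); // null: both connections are in use
        pool.release(second);
        pool.release(first);
        System.out.println(pool.checkOut()); // connection-2: it was returned first
    }
}
```

The sketch also shows why a forgotten release matters: a connection that is never returned stays counted against the limit, and later callers find the pool empty.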
Because connections are used by the program that calls HttpClient rather than by HttpClient itself, HttpClient cannot tell when a connection is no longer needed. The application must therefore call releaseConnection() explicitly after reading the response body in order to release the connection.
MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
...
// in a thread
GetMethod get = new GetMethod("http://jakarta.apache.org/");
try {
    client.executeMethod(get);
    // print response to stdout
    System.out.println(get.getResponseBodyAsStream());
} finally {
    // be sure the connection is released back to the connection
    // manager
    get.releaseConnection();
}
Every call to HttpClient.executeMethod() must be paired with a call to method.releaseConnection().
12, HTTP methods
HttpClient supports eight HTTP methods, which are described below.
1, OPTIONS
The HTTP OPTIONS method sends a request to the server asking which communication options are available on the request/response chain identified by the request URI. With this method, the client can determine which actions are possible and which requirements apply to a resource before acting on it, or learn what capabilities the server provides. Its most typical use is to find out which HTTP methods the server supports. HttpClient provides a class named OptionsMethod to support this HTTP method; its getAllowedMethods method makes the typical use above straightforward.
OptionsMethod options = new OptionsMethod("http://jakarta.apache.org");
// execute the method and handle any exceptions
...
Enumeration allowedMethods = options.getAllowedMethods();
options.releaseConnection();
2, GET
The HTTP GET method retrieves whatever information (in the form of an entity) is identified by the request URI. If the request URI points to a data-producing process, the data generated by that process is returned as the entity, not the source code of the process.
If the request contains an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field, the GET becomes a "conditional GET": the entity is retrieved only if the conditions described by those fields are met. This reduces unnecessary network traffic and avoids multiple requests for the same resource (for example, a first request to check and a second to download). (Browsers generally keep a temporary directory that caches web content; when a page is visited again, only the modified content is downloaded, which speeds up browsing. That is the principle at work. Such checks are usually better done with the HEAD method than with GET.)
If the request contains a Range header field, only the part of the entity identified by the URI that falls within the specified range is returned. (Readers who have used multi-threaded download tools may find this easier to understand.)
The typical use of this method is to download documents from a web server. HttpClient defines a class named GetMethod to support it. The document (such as an HTML page) in the response can be retrieved with getResponseBody, getResponseBodyAsString, or getResponseBodyAsStream. Of these three functions, getResponseBodyAsStream is usually the best choice, mainly because it avoids buffering all of the downloaded data before the document is processed.
GetMethod get = new GetMethod("http://jakarta.apache.org");
// execute the method and handle any failed request
...
InputStream in = get.getResponseBodyAsStream();
// use the input stream to process the information
get.releaseConnection();
The most common mistake when using GetMethod is failing to read the entire response body. Also remember that the connection must be released manually.
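To make the conditional GET and Range requests described in this section more concrete, the sketch below assembles the raw text of such a request without using HttpClient. The class name ConditionalGetRequest, the host, the date, and the byte range are all illustrative assumptions.

```java
final class ConditionalGetRequest {

    /**
     * Builds the text of a GET request. ifModifiedSince and range are
     * optional; pass null to leave the corresponding header out.
     */
    static String build(String host, String path, String ifModifiedSince, String range) {
        StringBuilder request = new StringBuilder();
        request.append("GET ").append(path).append(" HTTP/1.1\r\n");
        request.append("Host: ").append(host).append("\r\n");
        if (ifModifiedSince != null) {
            // Makes the GET conditional: the entity is sent only if it changed.
            request.append("If-Modified-Since: ").append(ifModifiedSince).append("\r\n");
        }
        if (range != null) {
            // Asks for only part of the entity, e.g. "bytes=0-499".
            request.append("Range: ").append(range).append("\r\n");
        }
        request.append("\r\n"); // an empty line ends the header section
        return request.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("example.org", "/index.html",
                "Sat, 29 Oct 1994 19:43:31 GMT", "bytes=0-499"));
    }
}
```

With HttpClient the same header fields would be set on the GetMethod before it is executed, for example through its setRequestHeader method; the server then typically answers 304 Not Modified for an unchanged resource or 206 Partial Content for a satisfied range.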
3, HEAD
The HTTP HEAD method is identical to the GET method except that the server must not include a message body in the response. It lets clients obtain basic information about a resource without downloading the resource itself. It is often used to check whether a hyperlink is reachable and whether a resource has been modified recently.
The most typical use of the HEAD method is to obtain basic information about a resource. HttpClient defines the HeadMethod class to support this method. Like the other *Method classes, HeadMethod retrieves header information with getResponseHeaders() and has no special methods of its own.
HeadMethod head = new HeadMethod("http://jakarta.apache.org");
// execute the method and handle any failed request
...
// retrieve all the header fields of the response
Header[] headers = head.getResponseHeaders();
// retrieve only the value of the Last-Modified header field
String lastModified = head.getResponseHeader("Last-Modified").getValue();
4, POST
In English, "post" means to send or mail something. The HTTP POST method asks the server to accept the entity enclosed in the request and treat it as a new subordinate of the resource identified by the request URI. In essence, the server is expected to store the entity, usually after processing it. POST was designed to cover the following functions in a uniform way: posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles; submitting data to a data-handling process; and extending a database through an append operation. All of these are expected to produce some "side effect" on the server side, such as modifying a database.
HttpClient defines the PostMethod class to support this HTTP method. Using POST in HttpClient involves two basic steps: prepare the data for the request, then read the server's response. The data for the request is supplied by calling setRequestBody(), which accepts three kinds of parameters: an input stream, an array of name/value pairs, or a string. The response is read with the getResponseBody* family of methods, in the same way as a response to the GET method.
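When the request body is supplied as name/value pairs, what travels over the wire is a form-encoded string. The sketch below shows that encoding using only the JDK. It is an illustration, not HttpClient's own code: the class name FormBodySketch is hypothetical, and UTF-8 is an assumed charset that may differ from the one HttpClient applies.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FormBodySketch {

    /** Encodes alternating names and values as name1=value1&name2=value2. */
    static String encodePairs(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("every name needs a value");
        }
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            if (body.length() > 0) {
                body.append('&'); // separates one pair from the next
            }
            body.append(encode(namesAndValues[i]))
                .append('=')
                .append(encode(namesAndValues[i + 1]));
        }
        return body.toString();
    }

    private static String encode(String text) {
        try {
            // Assumption: UTF-8; spaces become '+', reserved characters %XX.
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePairs("name", "John Doe", "lang", "en"));
    }
}
```

Such a body is normally sent with the Content-Type application/x-www-form-urlencoded, which is what an HTML form submission produces.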
Common problems are failing to read the entire response (whether or not the program needs it) and failing to release the connection.