Jakarta Commons HttpClient Learning Notes

xiaoxiao, 2021-03-06


Although a page can be fetched with Telnet, interacting with a web server that involves cookies, HTTPS, or SSL still calls for a full-featured HTTP client. Browsers such as IE or Netscape do this very well, but driving a browser from your own program to achieve continuous interaction is, in my opinion, a considerable amount of work, and that is before considering licensing issues. The best option is an open source package that gives our programs HTTP client functionality. HttpClient is such a package. There may well be better ones, but this is the only one I have followed. :)

Below is the feature comparison table made by nogoop. The columns are, in order: nogoop, Sun JRE <1.4.2, Sun JRE 1.4.2, Innovation, Apache/Jakarta.

cookies: XX
plug compatible: XXXX [partial]
true request output stream: XX
true response input stream: X XX
connection keep alive: XXXXX
connection pool throttling: X X
connection / request timeout: X X [uns] XX
idle connection timeout: X X
pipelining of requests: X
alternate DNS resolution (dnsjava): X
SSL: XXXXX
basic authentication: XXXXX
digest authentication: XXXXX
NTLM authentication: X [Windows only] X
proxy authentication: XXXXX
minimum JRE version: 1.2, 1.0, 1.4.2, 1.2, 1.2
price: $499, free, free, free, free
source available: X XX
diagnostic tracing: X XX
actively supported: XXX X
fix turnaround: fast, slow, slow, none, medium
license: purchase, Sun JRE, Sun JRE, LGPL, Apache

1, HttpClient features

Standards based, pure Java implementation of HTTP 1.0 and 1.1.
All HTTP methods (GET, POST, PUT, DELETE, HEAD, OPTIONS, and TRACE) implemented in an extensible OO framework.
Supports encryption with HTTPS (HTTP over SSL).
Transparent connections through HTTP proxies.
Tunneled HTTPS connections through HTTP proxies, using the CONNECT method.
Transparent connections through SOCKS proxies, using native Java socket support.
Authentication using Basic, Digest, and the encrypting NTLM scheme.
Multi-part form POST for uploading large files.
Pluggable secure socket implementations, making it easy to use third-party solutions.
Connection management for multi-threaded applications: supports setting the maximum total connections and the maximum connections per host, and automatically detects and closes stale connections.
Request output streams that send the request body directly to the server socket.
Response input streams that read the response body directly from the server socket.
Persistent connections using KeepAlive in HTTP/1.0 and persistence in HTTP/1.1.
Direct access to the response code and headers sent by the server.
The ability to set connection timeouts.
HttpMethods implement the Command Pattern, allowing parallel requests and efficient reuse of connections.
Source code freely available under the Apache Software License.

2, Preparation

For JRE 1.3.*, if you want HttpClient to support HTTPS, you need to download and install JSSE and JCE.

The installation steps are as follows:

1) Download JSSE and JCE.

2) Make sure that no JAR files related to JSSE or JCE are already on the CLASSPATH.

3) Copy US_export_policy.jar, local_policy.jar, jsse.jar, jnet.jar, jce1_2_x.jar, sunjce_provider.jar, and jcert.jar to the directory:

UNIX: $JDK_HOME/jre/lib/ext

Windows: %JDK_HOME%/jre/lib/ext

4) Locate the java.security file in the following directory:

UNIX: $JDK_HOME/jre/lib/security/

Windows: %JDK_HOME%/jre/lib/security/

5) In that file, change

#
# List of providers and their preference orders:
#
security.provider.1=sun.security.provider.Sun
security.provider.2=com.sun.rsajca.Provider

to:

#
# List of providers and their preference orders:
#
security.provider.1=com.sun.crypto.provider.SunJCE
security.provider.2=sun.security.provider.Sun
security.provider.3=com.sun.rsajca.Provider
security.provider.4=com.sun.net.ssl.internal.ssl.Provider

HttpClient also requires commons-logging to be installed; its installation is described below together with HttpClient's.

3, Getting the source code

cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login

Password: anoncvs

cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout jakarta-commons/logging

cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout jakarta-commons/httpclient

Compile:

cd jakarta-commons/logging

ant dist

cp dist/*.jar ../httpclient/lib/

cd ../httpclient

ant dist

4, Basic steps for programming with HttpClient

1) Create an instance of HttpClient.

2) Create an instance of a method (DeleteMethod, EntityEnclosingMethod, ExpectContinueMethod, GetMethod, HeadMethod, MultipartPostMethod, OptionsMethod, PostMethod, PutMethod, TraceMethod), usually passing the target URL as the constructor parameter.

3) Tell HttpClient to execute the method.

4) Read the response.

5) Release the connection.

6) Process the response.

Two kinds of exception can occur while a method is being executed. One is HttpRecoverableException, which indicates a transient error: retrying the request may succeed. The other is IOException, which indicates a serious error.

The tutorial includes a sample program that you can download.
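The distinction between the two exceptions suggests a retry loop around the execute step. The following is only a minimal sketch of that idea using the standard library: RecoverableException, Attempt, and executeWithRetry are hypothetical names made up for illustration, standing in for HttpRecoverableException and a call to executeMethod. It is not HttpClient's own retry logic.

```java
import java.io.IOException;

/** Hypothetical stand-in for a recoverable transport error (not HttpClient's class). */
class RecoverableException extends IOException {
    RecoverableException(String message) {
        super(message);
    }
}

/** Hypothetical callback representing one attempt at executing a request. */
interface Attempt {
    int run() throws IOException;
}

final class RetrySketch {
    /**
     * Runs the attempt up to maxAttempts times. A RecoverableException
     * triggers another try; any other IOException is rethrown at once.
     */
    static int executeWithRetry(Attempt attempt, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RecoverableException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (RecoverableException e) {
                last = e; // transient failure: try again
            }
        }
        throw last; // every attempt failed with a recoverable error
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated request: fails twice with a transient error, then succeeds.
        int status = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RecoverableException("connection reset");
            }
            return 200;
        }, 3);
        System.out.println("status " + status + " after " + calls[0] + " attempts");
    }
}
```

In a real program the lambda body would be the call to executeMethod, and the connection would still have to be released in a finally block.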

5, Authentication

HttpClient supports three different authentication schemes: Basic, Digest, and NTLM. These schemes can be used to authenticate with a server or with a proxy, referred to as server authentication and proxy authentication respectively.

1) Server authentication

HttpClient handles server authentication almost transparently; the developer only needs to provide login credentials. The credentials are stored in an HttpState instance and can be set or retrieved with setCredentials(String realm, Credentials cred) and getCredentials(String realm). Note: to set default credentials that are not tied to a specific realm, pass null as the realm parameter. HttpClient's built-in automatic authentication can be turned off with the HttpMethod method setDoAuthentication(boolean doAuthentication); this only affects the current HttpMethod instance.

Preemptive authentication can be turned on as follows.

client.getState().setAuthenticationPreemptive(true);

In this mode, HttpClient sends the Basic authentication response to the server proactively, in some cases even before the server has returned an authentication failure response, which reduces the overhead of establishing connections. To make every newly created HttpState instance use preemptive authentication, set the system property as follows.

System.setProperty(Authenticator.PREEMPTIVE_PROPERTY, "true");

The preemptive authentication implemented by HttpClient follows RFC 2617.

2) Proxy authentication

Apart from how the credentials are stored, proxy authentication is almost identical to server authentication. Use setProxyCredentials(String realm, Credentials cred) and getProxyCredentials(String realm) to set and retrieve the credentials.

3) Authentication schemes

Basic

Basic is the earliest and most widely compatible authentication scheme in HTTP. Unfortunately it is also the least secure, because it transmits the username and password unencrypted. It requires a UsernamePasswordCredentials instance, which can be set for a specific realm or as the default credentials.

Digest

Digest authentication was added in HTTP 1.1. Although it is not as widely supported by software as Basic, it is still widely used. Digest is much more secure than Basic because it never sends the actual password over the network; instead it sends a value computed from the password and a random number (nonce) supplied by the server. It requires a UsernamePasswordCredentials instance, which can be set for a specific realm or as the default credentials.
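To make the difference between the two schemes concrete, here is a minimal standard-library sketch of what goes over the wire. It assumes the Basic encoding and the legacy Digest form without qop as described in RFC 2617; the class and method names are hypothetical, and HttpClient performs these computations internally, so this is not its API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class AuthHeaderSketch {
    /** Basic: "Basic " + Base64(user ":" password). Encoded, not encrypted. */
    static String basicHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Lowercase hexadecimal MD5 digest of a string. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(text.getBytes(StandardCharsets.ISO_8859_1))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Digest response in the form without qop: MD5(HA1 ":" nonce ":" HA2),
     * where HA1 = MD5(user:realm:password) and HA2 = MD5(method:uri).
     * Only this hash is sent, never the password itself.
     */
    static String digestResponse(String user, String realm, String password,
                                 String method, String uri, String nonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + ha2);
    }

    public static void main(String[] args) {
        // All values below are made-up sample inputs.
        System.out.println("Authorization: " + basicHeader("adrian", "secret"));
        System.out.println("Digest response: "
                + digestResponse("adrian", "example-realm", "secret", "GET", "/index.html", "abc123"));
    }
}
```

Anyone who captures the Basic header can decode the password directly, whereas the Digest value changes with every nonce the server issues.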

NTLM

This is the most complex authentication protocol supported by HttpClient. It is a proprietary protocol designed by Microsoft, with no publicly available specification. Early versions of NTLM were less secure than Digest because of design flaws; after these were fixed in a service pack, NTLM is considered more secure than Digest. NTLM requires an NTCredentials instance. Note that since NTLM does not use the concept of realms, HttpClient uses the domain name of the server as the realm name. Also note that the username given to NTCredentials must not be prefixed with the domain: "adrian" is correct, "DOMAIN\adrian" is wrong.

The mechanics of NTLM authentication differ significantly from Basic and Digest. HttpClient generally handles these differences itself, but understanding them helps avoid errors when using NTLM authentication.

From the perspective of the HttpClient API, NTLM works like any other authentication scheme, except that an NTCredentials instance must be supplied instead of UsernamePasswordCredentials (in fact, NTCredentials simply extends UsernamePasswordCredentials).

For NTLM authentication the realm is the domain name of the machine being connected to, which can be troublesome for machines with several domain names. Only the domain name specified in the HttpClient connection is used for authentication. It is recommended to set the realm to null so that the default credentials are used.

NTLM authenticates a connection rather than a request, so authentication must be performed every time a new connection is established, and it is essential to keep the connection open during authentication. For this reason NTLM cannot be used for proxy authentication and server authentication at the same time, nor can it be used over HTTP 1.0 connections or with servers that do not support persistent connections.

6, Redirects

Because of technical limitations, and to keep the 2.0 release API stable, HttpClient cannot follow every redirect automatically. It does support redirects to the same host, on the same port, using the same protocol. Redirects it cannot handle include those that require user interaction and those that are beyond HttpClient's capabilities.

When the server redirects to a different host, HttpClient simply returns the redirect status code as the response status. All status codes from 300 to 399 (inclusive) are redirect responses. The common ones are:

301 Moved Permanently. HttpStatus.SC_MOVED_PERMANENTLY
302 Moved Temporarily. HttpStatus.SC_MOVED_TEMPORARILY
303 See Other. HttpStatus.SC_SEE_OTHER
307 Temporary Redirect. HttpStatus.SC_TEMPORARY_REDIRECT

When a simple redirect is received, the program should extract the new URL from the HttpMethod object and fetch it. It is also a good idea to limit the number of redirects followed, to avoid endless redirect loops. The new URL can be extracted from the Location header field as follows:

String redirectLocation;
Header locationHeader = method.getResponseHeader("location");
if (locationHeader != null) {
    redirectLocation = locationHeader.getValue();
} else {
    // The response is invalid and did not provide the new location for
    // the resource. Report an error or possibly handle the response
    // like a 404 Not Found error.
}
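The advice to limit the number of redirects can be sketched as a small loop. This is an illustration only, written against the standard library: the map plays the role of the server (each key is a requested URL, each value the Location header of its redirect response), and RedirectSketch and resolve are hypothetical names, not part of HttpClient.

```java
import java.util.HashMap;
import java.util.Map;

final class RedirectSketch {
    /**
     * Follows Location values until a URL has no further redirect,
     * giving up once maxRedirects redirects have been followed.
     */
    static String resolve(String startUrl, Map<String, String> redirects, int maxRedirects) {
        String current = startUrl;
        int followed = 0;
        while (true) {
            String location = redirects.get(current);
            if (location == null) {
                return current; // not a redirect: this is the final URL
            }
            if (followed == maxRedirects) {
                throw new IllegalStateException("more than " + maxRedirects + " redirects");
            }
            current = location;
            followed++;
        }
    }

    public static void main(String[] args) {
        // Made-up URLs simulating a two-step redirect chain.
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://old.example/", "http://new.example/");
        redirects.put("http://new.example/", "http://new.example/home");
        System.out.println(resolve("http://old.example/", redirects, 5));
    }
}
```

With a real client, the lookup in the map would be replaced by executing a new method for the extracted URL and checking its status code.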

Special redirects:

300 Multiple Choices. HttpStatus.SC_MULTIPLE_CHOICES
304 Not Modified. HttpStatus.SC_NOT_MODIFIED
305 Use Proxy. HttpStatus.SC_USE_PROXY

7, Character encoding

The headers of an HTTP request or response must be encoded in US-ASCII. (In HTTP, a message is divided into two parts: the headers, consisting of name/value pairs, and the body, which carries the actual data, such as an HTML page.) This is because the headers do not carry content, only information describing the data being transmitted. One exception is cookies, which are data but are transmitted in the headers, so they too must use US-ASCII.

The body of an HTTP message can use any encoding. The default is ISO-8859-1; the actual encoding can be specified with the Content-Type header field. Use the addRequestHeader method to set the encoding of a request, and getResponseCharSet to obtain the encoding of a response. HTML, XML, and similar documents can also declare an encoding inside their own content; the two declarations have different scopes, and both must be taken into account to decode the document correctly.
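As an illustration of how the body encoding is derived from the Content-Type header with ISO-8859-1 as the fallback, here is a minimal standard-library sketch. CharsetSketch and charsetOf are hypothetical names; in HttpClient itself this job is done by getResponseCharSet.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {
    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=UTF-8", or ISO-8859-1 when none is given
     * or the name is not recognized.
     */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Parameter names are case-insensitive.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed name: fall back to the default below.
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xC3, (byte) 0xA9}; // the letter e with acute accent, in UTF-8
        System.out.println(new String(body, charsetOf("text/html; charset=UTF-8")));
        System.out.println(charsetOf("text/html")); // no charset parameter given
    }
}
```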

URL encoding is specified by RFC 1738. A URL may only consist of printable US-ASCII characters carried in 8-bit bytes. The bytes 80-FF are not US-ASCII characters and 00-1F are control characters, so characters in these two ranges must be encoded.
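The encoding rule can be sketched as follows. This is a simplified illustration under two assumptions of mine: characters are first converted to UTF-8 bytes (RFC 1738 itself does not prescribe a character set), and only letters, digits, and the characters "-", "_", and "." are left unencoded. PercentEncodeSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodeSketch {
    // Deliberately conservative set of characters that are passed through unchanged.
    private static final String SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.";

    /** Replaces every byte outside the safe set with %XX (uppercase hex). */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && SAFE.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("annual report.txt"));
        System.out.println(encode("caf\u00e9"));
    }
}
```

Control characters (00-1F) and all bytes in the 80-FF range fall outside the safe set, so they are always escaped.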

8, Cookies

HttpClient manages cookies automatically: it lets the server set cookies and returns them when needed. It also supports setting cookies manually to be sent to the server. Unfortunately, several conflicting specifications describe how cookies should be handled: the Netscape cookie draft, RFC 2109, and RFC 2965. In addition, many software vendors' cookie implementations follow none of them. To deal with this situation, HttpClient provides policy-driven cookie management. HttpClient supports the following cookie specifications:

Netscape cookie draft: the earliest cookie specification and the basis for RFC 2109. Although it differs considerably from RFC 2109, using it can provide compatibility with some servers.

RFC 2109: the first official cookie specification, published as an IETF RFC. In theory, every server that handles version 1 cookies should follow it, which is why HttpClient uses it as the default. Unfortunately, the specification is so strict that many servers implement it incorrectly or still use the Netscape draft. In that case, the compatibility specification should be used.

Compatibility: designed to be compatible with as many servers as possible, even those that do not follow the standard specifications. Consider it when you run into cookie parsing problems.

RFC 2965 is not yet supported by HttpClient (support is planned for a later version). It defines version 2 cookies, addresses the shortcomings of version 1 cookies, and is intended to replace RFC 2109.

In HttpClient there are two ways to specify which cookie specification to use.

HttpClient client = new HttpClient();
client.getState().setCookiePolicy(CookiePolicy.COMPATIBILITY);

This setting applies only to the current HttpState. The parameter can be CookiePolicy.COMPATIBILITY, CookiePolicy.NETSCAPE_DRAFT, or CookiePolicy.RFC2109.

System.setProperty("apache.commons.httpclient.cookiespec", "COMPATIBILITY");

This setting applies to every newly created HttpState object. The value can be "COMPATIBILITY", "NETSCAPE_DRAFT", or "RFC2109". Cookies that fail to parse are a common problem, and in most cases switching to the compatibility specification solves it.
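To show what "policy-driven" means in practice, here is a small standard-library sketch in which the same cookie Domain attribute is accepted or rejected depending on the chosen policy. The strict rule is a simplification of the RFC 2109 requirement that an explicit Domain start with a dot, contain an embedded dot, and match the request host; the lenient rule is my own assumption about what a compatibility mode might tolerate. CookiePolicySketch and acceptsDomain are hypothetical names, not HttpClient's implementation.

```java
import java.util.Locale;

final class CookiePolicySketch {
    enum Policy { RFC2109, COMPATIBILITY }

    /** Decides whether a cookie's Domain attribute is acceptable for the request host. */
    static boolean acceptsDomain(Policy policy, String domainAttribute, String requestHost) {
        String domain = domainAttribute.toLowerCase(Locale.ROOT);
        String host = requestHost.toLowerCase(Locale.ROOT);
        switch (policy) {
            case RFC2109:
                // Strict: leading dot, at least one embedded dot, host ends with the domain.
                return domain.startsWith(".")
                        && domain.indexOf('.', 1) > 0
                        && host.endsWith(domain);
            case COMPATIBILITY:
                // Lenient (assumed): the leading dot is optional.
                String bare = domain.startsWith(".") ? domain.substring(1) : domain;
                return host.equals(bare) || host.endsWith("." + bare);
            default:
                throw new AssertionError("unknown policy " + policy);
        }
    }

    public static void main(String[] args) {
        // A server that omits the leading dot is rejected by the strict policy only.
        System.out.println(acceptsDomain(Policy.RFC2109, "example.com", "www.example.com"));
        System.out.println(acceptsDomain(Policy.COMPATIBILITY, "example.com", "www.example.com"));
    }
}
```

This is the kind of situation where a cookie is silently dropped under the default policy but accepted after switching to the compatibility policy.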

9, What to do when you run into problems with HttpClient

Use a browser to access the server to confirm that the server responds normally.
If you are using a proxy, try turning the proxy off.
Try another server (preferably one running different server software).
Check that the code follows the approach described in the tutorial.
Set the log level to debug and look for the cause of the problem.
Turn on wire trace logging to follow the communication between the client and the server.
Use telnet or netcat to send requests to the server by hand; this is suitable for testing a guess about the cause.
Run netcat in listening mode as a server to check how HttpClient handles a response.
Try the latest HttpClient; the bug may already be fixed in the latest version.
Ask for help on the mailing list.
Report the bug to Bugzilla.

10, SSL

With the Java Secure Socket Extension (JSSE), HttpClient fully supports HTTP over Secure Sockets Layer (SSL) and the IETF Transport Layer Security (TLS) protocol. JSSE is included in JRE 1.4 and later; earlier versions require manual installation and configuration. For the procedure, see the Sun website or the preparation section of these notes.

Using SSL in HttpClient is very simple; see the following two examples:

HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());

If a proxy is required:

HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setProxy("myproxyhost", 8080);
httpclient.getState().setProxyCredentials("my-proxy-realm", "myproxyhost",
    new UsernamePasswordCredentials("my-proxy-username", "my-proxy-password"));
GetMethod httpget = new GetMethod("https://www.verisign.com/");
httpclient.executeMethod(httpget);
System.out.println(httpget.getStatusLine().toString());

The steps to customize SSL in HttpClient are as follows:

1) Provide a socket factory that implements the org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory interface. The socket factory is responsible for opening a socket to the server, using the standard or a third-party SSL library, and for performing initialization such as the connection handshake. Normally this initialization is performed automatically when the socket is created.

2) Instantiate an org.apache.commons.httpclient.protocol.Protocol object. Creating the instance requires a valid protocol scheme (such as https), the custom socket factory, and a default port number (such as 443 for HTTPS).

Protocol myhttps = new Protocol("https", new MySSLSocketFactory(), 443);

This instance can then be set as the protocol handler for a host.

HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setHost("www.whatever.com", 443, myhttps);
GetMethod httpget = new GetMethod("/");
httpclient.executeMethod(httpget);

A customized instance can also be registered as the default handler for a particular protocol identifier by calling Protocol.registerProtocol. This makes it easy to define your own protocol identifier (such as myhttps).

Protocol.registerProtocol("myhttps",
    new Protocol("https", new MySSLSocketFactory(), 9443));
...
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("myhttps://www.whatever.com/");
httpclient.executeMethod(httpget);

If you want to replace the default HTTPS handler with your customized one, simply register it as "https".

Protocol.registerProtocol("https",
    new Protocol("https", new MySSLSocketFactory(), 443));
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("https://www.whatever.com/");
httpclient.executeMethod(httpget);

Known limitations and problems

Persistent SSL connections do not work on Sun JVMs below 1.4, because of a JVM bug.
Non-preemptive authentication fails when accessing the server through a proxy, because of a design flaw in HttpClient; this will be fixed in a later version.

Troubleshooting

Many problems, especially on JVMs below 1.4, are caused by an incorrect JSSE installation.

The following code can be used as a last-resort check.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

import javax.net.ssl.SSLSocketFactory;

public class Test {

    public static final String TARGET_HTTPS_SERVER = "www.verisign.com";
    public static final int TARGET_HTTPS_PORT = 443;

    public static void main(String[] args) throws Exception {
        // Open an SSL socket directly through JSSE, bypassing HttpClient.
        Socket socket = SSLSocketFactory.getDefault()
            .createSocket(TARGET_HTTPS_SERVER, TARGET_HTTPS_PORT);
        try {
            // Send a minimal hand-written HTTP request.
            Writer out = new OutputStreamWriter(socket.getOutputStream(), "ISO-8859-1");
            out.write("GET / HTTP/1.1\r\n");
            out.write("Host: " + TARGET_HTTPS_SERVER + ":" + TARGET_HTTPS_PORT + "\r\n");
            out.write("Agent: SSL-TEST\r\n");
            out.write("\r\n");
            out.flush();
            // Print whatever the server sends back.
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));
            String line = null;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            socket.close();
        }
    }
}

11, Multi-threaded use of HttpClient

The main purpose of using multiple threads is to download in parallel. While HttpClient is running, each HTTP method uses an HttpConnection instance. Connections are a limited resource, and each connection can be used by only one thread and one method at a time, so connections must be allocated correctly when they are needed. HttpClient manages connections in a way similar to a JDBC connection pool; this is done by MultiThreadedHttpConnectionManager.

MultiThreadedHttpConnectionManager connectionManager =
    new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);

The client can now be used from multiple threads to execute multiple methods. Each call to HttpClient.executeMethod() asks the connection manager for a connection instance. When one is granted, the connection is checked out, and it must be returned to the manager after use. The manager supports two settings:

maxConnectionsPerHost: the maximum number of parallel connections per host; the default is 2.
maxTotalConnections: the maximum total number of parallel connections for the client; the default is 20.

When the manager reuses connections, it uses a least recently used approach: the connection that was returned earliest is reused first.

Because it is the program using HttpClient, not HttpClient itself, that reads the response body, HttpClient cannot determine when a connection is no longer in use. The application must therefore call releaseConnection() explicitly after reading the response body to release the connection.

MultiThreadedHttpConnectionManager connectionManager =
    new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
...
// In a thread.
GetMethod get = new GetMethod("http://jakarta.apache.org/");
try {
    client.executeMethod(get);
    // Print the response to stdout.
    System.out.println(get.getResponseBodyAsString());
} finally {
    // Be sure the connection is released back to the connection
    // manager.
    get.releaseConnection();
}

Every HttpClient.executeMethod() call must be paired with a method.releaseConnection() call.
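Why a missing release matters becomes clearer with a toy model of the two limits. The sketch below uses only the standard library and counts connection slots with semaphores; ConnectionLimiterSketch, tryCheckOut, and release are hypothetical names, and the real MultiThreadedHttpConnectionManager also keeps and reuses the connections themselves, which this sketch does not attempt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

final class ConnectionLimiterSketch {
    private final int maxPerHost;
    private final Semaphore total;
    private final ConcurrentHashMap<String, Semaphore> perHost = new ConcurrentHashMap<>();

    ConnectionLimiterSketch(int maxPerHost, int maxTotal) {
        this.maxPerHost = maxPerHost;
        this.total = new Semaphore(maxTotal);
    }

    /** Tries to check out a connection slot for the host without blocking. */
    boolean tryCheckOut(String host) {
        Semaphore hostSlots = perHost.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
        if (!hostSlots.tryAcquire()) {
            return false; // per-host limit reached
        }
        if (!total.tryAcquire()) {
            hostSlots.release(); // undo the per-host reservation
            return false; // total limit reached
        }
        return true;
    }

    /** Returns a slot; call exactly once for every successful checkout. */
    void release(String host) {
        perHost.get(host).release();
        total.release();
    }

    public static void main(String[] args) {
        // Same limits as the defaults mentioned above: 2 per host, 20 in total.
        ConnectionLimiterSketch limiter = new ConnectionLimiterSketch(2, 20);
        String host = "jakarta.apache.org";
        if (limiter.tryCheckOut(host)) {
            try {
                System.out.println("slot checked out, the request would run here");
            } finally {
                limiter.release(host); // always give the slot back
            }
        }
    }
}
```

If release is skipped twice for the same host, every later checkout for that host fails, which mirrors how leaked connections can stall a multi-threaded client.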

12, HTTP methods

HttpClient supports eight HTTP methods, which are described below.

1, OPTIONS

The HTTP OPTIONS method asks the server for information about the communication options available on the request/response chain identified by the request URI. With this method, a client can determine which actions are possible and/or which requirements apply to a resource, or learn about the server's capabilities, before taking any specific action on the resource. The most typical use of this method is to find out which HTTP methods the server supports. HttpClient provides the OptionsMethod class for this HTTP method; its getAllowedMethods method makes this typical use straightforward.

OptionsMethod options = new OptionsMethod("http://jakarta.apache.org");
// Execute the method and handle any exceptions.
...
Enumeration allowedMethods = options.getAllowedMethods();
options.releaseConnection();

2, GET

The HTTP GET method retrieves whatever information (in the form of an entity) is identified by the request URI. If the request URI refers to a data-producing process, the data produced by the process is returned as the entity, not the source code of the process.

If the HTTP request contains an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field, the GET becomes a "conditional GET": the entity is retrieved only if the conditions described by those fields are met. This can reduce unnecessary network traffic, or avoid multiple requests for the same resource (such as checking first and downloading second). (Browsers generally keep a temporary directory to cache web content; when a page is visited again, only the modified content is downloaded, which speeds up browsing. This is the mechanism behind it. For merely checking, HEAD is usually a better choice than GET.) If the HTTP request contains a Range header field, only the part of the entity identified by the URI that falls within the given range is returned. (Anyone who has used a multi-threaded download tool will find this easy to understand.)

The typical use of this method is to download documents from a web server. HttpClient defines the GetMethod class to support it. The document in the response (such as an HTML page) can be retrieved with getResponseBody, getResponseBodyAsString, or getResponseBodyAsStream. Of these three, getResponseBodyAsStream is usually the best choice, mainly because it avoids buffering all of the downloaded data before the document is processed.

GetMethod get = new GetMethod("http://jakarta.apache.org");
// Execute the method and handle any failure.
...
InputStream in = get.getResponseBodyAsStream();
// Use the input stream to process the information.
get.releaseConnection();

The most common mistake when using GetMethod is not reading the entire response body. Also remember to release the connection manually.
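For the conditional GET and Range requests described above, the header values have a fixed format. The following standard-library sketch builds an If-Modified-Since date and a byte range value; ConditionalGetSketch, httpDate, and byteRange are hypothetical helper names. With HttpClient the resulting strings would presumably be passed to addRequestHeader, but that wiring is left out here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class ConditionalGetSketch {
    // HTTP dates use a fixed English format and are always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Formats an instant as an HTTP date, for example for If-Modified-Since. */
    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** Builds a Range header value for an inclusive byte range. */
    static String byteRange(long firstByte, long lastByte) {
        if (firstByte < 0 || lastByte < firstByte) {
            throw new IllegalArgumentException("invalid range");
        }
        return "bytes=" + firstByte + "-" + lastByte;
    }

    public static void main(String[] args) {
        // Made-up example: the cached copy is one day old, and only the first 500 bytes are wanted.
        Instant cachedAt = Instant.now().minusSeconds(24 * 60 * 60);
        System.out.println("If-Modified-Since: " + httpDate(cachedAt));
        System.out.println("Range: " + byteRange(0, 499));
    }
}
```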

3, HEAD

The HTTP HEAD method is identical to GET, with one difference: the server must not include a message body in the response. This lets a client obtain basic information about a resource without downloading the resource itself. It is often used to check whether a hyperlink is reachable and whether a resource has been modified recently.

The most typical use of HEAD is to obtain basic information about a resource. HttpClient defines the HeadMethod class to support this method. Like the other *Method classes, HeadMethod retrieves header information with getResponseHeaders(); it has no special methods of its own.

HeadMethod head = new HeadMethod("http://jakarta.apache.org");
// Execute the method and handle any failure.
...
// Retrieve all the header fields of the response.
Header[] headers = head.getResponseHeaders();
// Retrieve only the Last-Modified header field.
String lastModified = head.getResponseHeader("last-modified").getValue();

4, POST

In English, POST carries the sense of "dispatch". The HTTP POST method asks the server to accept the entity enclosed in the request as a new subordinate of the resource identified by the request URI. In essence, this means the server should store the entity, usually after some server-side processing. POST was designed to handle the following functions in a uniform way:

Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles.
Providing a block of data to a data-handling process.
Extending a database through an append operation.

All of these are expected to produce some "side effect" on the server, such as modifying a database.

HttpClient defines the PostMethod class to support this HTTP method. Using POST in HttpClient involves two basic steps: prepare the data for the request, then read the server's response. The request data is supplied by calling setRequestBody(), which accepts three kinds of argument: an input stream, name/value pairs, or a string. To read the response, call one of the getResponseBody* methods, exactly as when handling the response to a GET.

Common problems are not reading the entire response (whether or not the program needs it) and not releasing the connection.
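When name/value pairs are posted, they travel in the request body as a form-encoded string. Here is a minimal standard-library sketch of that encoding, assuming UTF-8 for the field text; FormBodySketch and encode are hypothetical names, and HttpClient builds this string itself when it is given name/value pairs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBodySketch {
    /** Joins fields as name=value pairs separated by '&', escaping each part. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up form fields; LinkedHashMap keeps them in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("subject", "Hello world");
        fields.put("board", "news & updates");
        System.out.println(encode(fields));
    }
}
```

Such a body is sent with the Content-Type application/x-www-form-urlencoded, which is how a server-side form handler expects to receive it.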

When reposting, please cite the original address: https://www.9cbs.com/read-67222.html
