HttpClient

xiaoxiao2021-03-06  15

1, HttpClient features

Built on standard, pure Java; implements HTTP 1.0 and HTTP 1.1.

Implements all HTTP methods (GET, POST, PUT, DELETE, HEAD, OPTIONS, and TRACE) in an extensible object-oriented framework.

Supports encryption with HTTPS (HTTP over SSL).

Establishes connections transparently through HTTP proxies.

Tunnels HTTPS connections through HTTP proxies using the CONNECT method.

Establishes connections transparently through SOCKS proxies (versions 4 and 5) using native Java sockets.

Supports authentication with the Basic, Digest and NTLM schemes.

Supports multipart form POST for uploading large files.

Pluggable secure socket implementations, which makes third-party solutions easy to use.

Connection management for multithreaded applications: supports setting the maximum total number of connections and the maximum number of connections per host, and automatically detects and closes stale connections.

Streams request data directly to the server's socket.

Reads response data directly from the server's socket.

Supports persistent connections, using Keep-Alive in HTTP/1.0 and persistence in HTTP/1.1.

Gives direct access to the response code and headers sent by the server.

Supports connection timeouts.

HttpMethods implement the Command pattern, which allows parallel requests and efficient reuse of connections.

Source code is freely available under the Apache Software License.

2, Preparation

On JRE 1.3.*, HttpClient needs JSSE and JCE to be downloaded and installed before it can support HTTPS. The installation steps are:

1) Download JSSE and JCE.

2) Make sure no JSSE or JCE JAR files are already on the CLASSPATH.

3) Copy US_export_policy.jar, local_policy.jar, jsse.jar, jnet.jar, jce1_2_x.jar, sunjce_provider.jar and jcert.jar to the directory:

UNIX: $JDK_HOME/jre/lib/ext

Windows: %JDK_HOME%\jre\lib\ext

4) Edit the java.security file in the following directory.

UNIX: $JDK_HOME/jre/lib/security/

Windows: %JDK_HOME%\jre\lib\security\

5) Change

    #
    # List of providers and their preference orders:
    #
    security.provider.1=sun.security.provider.Sun
    security.provider.2=com.sun.rsajca.Provider

to:

    #
    # List of providers and their preference orders:
    #
    security.provider.1=com.sun.crypto.provider.SunJCE
    security.provider.2=sun.security.provider.Sun
    security.provider.3=com.sun.rsajca.Provider
    security.provider.4=com.sun.net.ssl.internal.ssl.Provider

HttpClient also requires Commons Logging, which is installed together with HttpClient in the next section.

3, Getting the source code

    cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login

Password: anoncvs

    cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout jakarta-commons/logging
    cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout jakarta-commons/httpclient

Compile:

    cd jakarta-commons/logging
    ant dist
    cp dist/*.jar ../httpclient/lib/
    cd ../httpclient
    ant dist

4, Basic steps for programming with HttpClient

Create an instance of HttpClient.

Create an instance of a method (DeleteMethod, EntityEnclosingMethod, ExpectContinueMethod, GetMethod, HeadMethod, MultipartPostMethod, OptionsMethod, PostMethod, PutMethod, TraceMethod), usually passing the target URL as a parameter.

Tell HttpClient to execute the method.

Read the response.

Release the connection.

Process the response.

Executing a method can throw two kinds of exception. HttpRecoverableException indicates an occasional error, and retrying the request may succeed. IOException indicates a serious error.

A sample program accompanies this tutorial and can be downloaded.
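The difference between the two kinds of exception can be sketched as a retry loop. This is a minimal, JDK-only sketch and not HttpClient code: `RecoverableException` is a hypothetical stand-in for HttpRecoverableException, and `executeWithRetry` and the attempt limit are illustrative names and values. Only the recoverable failure is retried; any other exception is treated as serious and propagates at once.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetrySketch {
    /** Hypothetical stand-in for HttpClient's recoverable exception. */
    static class RecoverableException extends IOException {
        RecoverableException(String message) {
            super(message);
        }
    }

    /** Runs the action, retrying only recoverable failures, at most maxAttempts times. */
    static <T> T executeWithRetry(Callable<T> action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (RecoverableException e) {
                last = e; // occasional error: try again
            }
            // Any other exception, including a plain IOException, is not caught
            // here and therefore ends the loop immediately.
        }
        throw last; // every attempt failed with a recoverable error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String status = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RecoverableException("simulated transient failure");
            }
            return "200 OK";
        }, 5);
        System.out.println(status + " after " + calls[0] + " attempts");
    }
}
```

In a real program the body of the action would be the execute, read and release steps listed above.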

5, Authentication

HttpClient supports three authentication schemes: Basic, Digest and NTLM. Each can be used to authenticate the client to a server or to a proxy, referred to as server authentication and proxy authentication respectively.

1) Server Authentication

HttpClient handles server authentication almost transparently; the developer only has to supply login credentials. The credentials are stored in an instance of the HttpState class and can be set and retrieved with setCredentials(String realm, Credentials cred) and getCredentials(String realm). To set default credentials that are not tied to a particular realm, pass null as the realm parameter. HttpClient's built-in automatic authentication can be turned off with the HttpMethod method setDoAuthentication(boolean doAuthentication); this affects only that HttpMethod instance.

Preemptive authentication can be enabled as follows.

    client.getState().setAuthenticationPreemptive(true);

In this mode HttpClient sends the Basic authentication response to the server up front, in some cases before the server has returned an authentication failure, which reduces the overhead of establishing connections. To make every newly created HttpState use preemptive authentication, set the system property as follows.

    System.setProperty(Authenticator.PREEMPTIVE_PROPERTY, "true");

The preemptive authentication implemented by HttpClient follows RFC 2617.

2) Proxy Authentication

Apart from how the credentials are supplied, proxy authentication works almost identically to server authentication. Use setProxyCredentials(String realm, Credentials cred) and getProxyCredentials(String realm) to set and retrieve the credentials.

3) Authentication Schemes

Basic

Basic is the earliest and most widely compatible HTTP authentication scheme. Unfortunately it is also the least secure, because it transmits the user name and password effectively in clear text (they are only Base64-encoded). It requires a UsernamePasswordCredentials instance, which can be set for a specific realm of the server or as the default credentials.
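To see why Basic offers no confidentiality, the header value can be built and reversed with nothing but the JDK. This is a minimal sketch of the encoding defined in RFC 2617, not HttpClient's own implementation; the class and method names are illustrative, and ISO-8859-1 is assumed as the character set for the credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthSketch {
    /** Builds the value of an Authorization header for the Basic scheme. */
    static String basicHeaderValue(String user, String password) {
        String pair = user + ":" + password;
        byte[] raw = pair.getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Recovers "user:password" from a Basic header value; no key or secret is needed. */
    static String decodeCredentials(String headerValue) {
        String encoded = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String header = basicHeaderValue("Aladdin", "open sesame");
        System.out.println("Authorization: " + header);
        System.out.println("Decoded by an eavesdropper: " + decodeCredentials(header));
    }
}
```

Anyone who can observe the request can run the second method, which is why Basic should only be used over an encrypted connection.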

Digest

Digest authentication was added in HTTP 1.1. Although software support for it is not as broad as for Basic, it is still widely used. Digest is much more secure than Basic because the actual password is never sent over the network; what is sent is a digest computed from the password and a random value (nonce) supplied by the server. It requires a UsernamePasswordCredentials instance, which can be set for a specific realm of the server or as the default credentials.
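The digest computation can be sketched with the JDK's MessageDigest. The sketch below follows the RFC 2617 formula for MD5 with qop=auth and uses the sample values from that RFC; the class and method names are illustrative and this is not HttpClient's internal code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestAuthSketch {
    /** Returns the lowercase hexadecimal MD5 digest of the given string. */
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Computes the "response" directive for qop=auth as described in RFC 2617. */
    static String response(String user, String realm, String password,
                           String method, String uri,
                           String nonce, String nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password); // depends on the password
        String ha2 = md5Hex(method + ":" + uri);                  // depends on the request
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
    }

    public static void main(String[] args) {
        // Sample values taken from the example in RFC 2617.
        System.out.println(response("Mufasa", "testrealm@host.com", "Circle Of Life",
                "GET", "/dir/index.html",
                "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b"));
    }
}
```

Only the final hash travels over the network, and because the server's nonce is mixed in, a captured response cannot simply be replayed against a fresh nonce.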

NTLM

NTLM is the most complex authentication protocol supported by HttpClient. It is a proprietary protocol designed by Microsoft, with no publicly available specification. Early versions of NTLM were less secure than Digest because of design flaws; after these were fixed in a service pack, NTLM came to be considered more secure than Digest. NTLM requires an NTCredentials instance. Note that because NTLM has no concept of realms, HttpClient uses the server's host name as the realm name. Also note that the user name given to NTCredentials must not be prefixed with the domain name: "adrian" is correct, and "DOMAIN\adrian" is wrong.

NTLM works very differently from Basic and Digest. HttpClient normally takes care of these differences, but understanding them helps to avoid errors when using NTLM authentication.

From the point of view of the HttpClient API, NTLM works the same way as the other authentication schemes, except that an NTCredentials instance must be supplied instead of a UsernamePasswordCredentials instance (in fact, the former simply extends the latter).

For NTLM, the realm is the domain name of the machine being connected to, which can cause trouble for machines with several domain names. Only the domain name given to HttpClient for the connection is used for authentication. Setting the realm to null, so that the default credentials are used, is recommended.

NTLM authenticates a connection rather than a request, so authentication is needed every time a new connection is established, and keeping the connection open during authentication is essential. As a result, NTLM cannot be used for proxy authentication and server authentication at the same time, and it cannot be used over HTTP 1.0 connections or with servers that do not support persistent connections.

6, Redirects

Because of technical limitations, and to keep the 2.0 release API stable, HttpClient cannot follow every redirect automatically. It does handle redirects to the same host, the same port and the same protocol. The cases it cannot handle include those that need user interaction and those that are beyond HttpClient's capabilities.

When the server redirects to a different host, HttpClient simply returns the redirect status code as the response status. All status codes from 300 to 399 inclusive are redirect responses. The common ones are:

301 Moved Permanently. HttpStatus.SC_MOVED_PERMANENTLY

302 Moved Temporarily. HttpStatus.SC_MOVED_TEMPORARILY

303 See Other. HttpStatus.SC_SEE_OTHER

307 Temporary Redirect. HttpStatus.SC_TEMPORARY_REDIRECT

When a simple redirect is received, the program should take the new URL from the HttpMethod object and download it. It is also a good idea to limit the number of redirects followed, which avoids redirect loops. The new URL can be read from the Location header, as follows:

    String redirectLocation;
    Header locationHeader = method.getResponseHeader("location");
    if (locationHeader != null) {
        redirectLocation = locationHeader.getValue();
    } else {
        // The response is invalid and did not provide the new location for
        // the resource. Report an error or possibly handle the response
        // like a 404 Not Found error.
    }
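The advice to cap the number of redirects can be sketched as a small loop. This is a JDK-only illustration under stated assumptions: `Response`, `follow` and the `fetch` function are hypothetical names standing in for executing a method and reading its status code and Location header, and the set of codes treated as simple redirects is the list of common ones above.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class RedirectSketch {
    /** Hypothetical minimal response: a status code and an optional Location header. */
    static final class Response {
        final int status;
        final String location; // null when the header is absent

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    static boolean isSimpleRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307;
    }

    /** Follows simple redirects and returns the final URI, giving up after maxRedirects hops. */
    static URI follow(URI start, Function<URI, Response> fetch, int maxRedirects) {
        URI current = start;
        for (int hops = 0; ; hops++) {
            Response response = fetch.apply(current);
            if (!isSimpleRedirect(response.status)) {
                return current;
            }
            if (response.location == null) {
                throw new IllegalStateException("redirect without Location header: " + current);
            }
            if (hops == maxRedirects) {
                throw new IllegalStateException("more than " + maxRedirects + " redirects");
            }
            // A relative Location value is resolved against the current URI.
            current = current.resolve(response.location);
        }
    }

    public static void main(String[] args) {
        // Simulated server: two redirects, then a normal response.
        Map<String, Response> site = new HashMap<>();
        site.put("http://a.example/", new Response(301, "/new"));
        site.put("http://a.example/new", new Response(302, "http://b.example/x"));
        Function<URI, Response> fetch =
                uri -> site.getOrDefault(uri.toString(), new Response(200, null));
        System.out.println(follow(URI.create("http://a.example/"), fetch, 5));
    }
}
```

The hop counter is what breaks a loop such as a page that redirects to itself; without it the client would request the same URL forever.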

Special redirects:

300 Multiple Choices. HttpStatus.SC_MULTIPLE_CHOICES

304 Not Modified. HttpStatus.SC_NOT_MODIFIED

305 Use Proxy. HttpStatus.SC_USE_PROXY

7, Character Encoding

The headers of an HTTP request or response must be encoded in US-ASCII. (An HTTP message has two parts: the headers, which consist of name/value pairs, and the body, which carries the actual data, such as an HTML page.) This is because the headers do not carry data, only information about the data being transmitted. The one exception is cookies, which are data but are sent in the headers, so they also use US-ASCII.

The body of an HTTP message can use any encoding. The default is ISO-8859-1, and the actual encoding can be specified in the Content-Type header. Use the addRequestHeader method to set the encoding of a request, and getResponseCharSet to get the encoding of a response. HTML, XML and similar document types can also declare an encoding in their own content; the two declarations have different scopes, and they must be told apart to decode the document correctly.
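How the body encoding is taken from the Content-Type header, with ISO-8859-1 as the fallback, can be sketched as follows. This is a simplified JDK-only parser written for illustration, not the code HttpClient uses; the class and method names are assumptions, and parameter values containing semicolons are not handled.

```java
import java.util.Locale;

final class ContentTypeCharsetSketch {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Returns the charset parameter of a Content-Type value, or the HTTP default. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip optional quotes
            }
            if (!value.isEmpty()) {
                return value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```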

URL encoding is specified by RFC 1738: a URL may consist only of printable US-ASCII characters, carried in 8-bit bytes. Bytes 80-FF are not US-ASCII characters and bytes 00-1F are control characters, so any character falling in either range must be encoded (escaped).
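The escaping rule can be sketched as a byte-by-byte loop. This is a minimal illustration, not a complete RFC 1738 encoder: it escapes the two byte ranges named above plus DEL, the space and the percent sign, and it assumes the text is first converted to bytes as UTF-8, a choice RFC 1738 itself does not make. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class UrlEscapeSketch {
    /** Percent-encodes control bytes, non-ASCII bytes, spaces and '%' in the input. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            boolean control = v <= 0x1f || v == 0x7f;
            boolean nonAscii = v >= 0x80;
            if (control || nonAscii || v == ' ' || v == '%') {
                out.append(String.format("%%%02X", v)); // e.g. 0x20 becomes %20
            } else {
                out.append((char) v);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("/docs/my file.html"));
        System.out.println(escape("/caf\u00e9"));
    }
}
```

A production encoder would also escape the other characters RFC 1738 calls unsafe, and would treat the reserved characters differently depending on which part of the URL is being built.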

8, Cookies

HttpClient manages cookies automatically: it lets the server set cookies and returns them to the server when needed, and it also supports setting cookies manually to send to the server. Unfortunately, several conflicting specifications describe how cookies should be handled: the Netscape cookie draft, RFC2109 and RFC2965, and the cookie implementations of many software vendors follow none of them. To deal with this, HttpClient provides policy-driven cookie management. HttpClient supports the following cookie specifications:

The Netscape cookie draft is the earliest cookie specification and the basis for RFC2109. Although it differs considerably from RFC2109, using it gives compatibility with some servers.

RFC2109 is the first official cookie specification, published by the IETF. In theory every server that handles version 1 cookies follows it, which is why HttpClient uses it as the default. Regrettably the specification is so strict that many servers implement it incorrectly or still follow the Netscape draft. In that case the compatibility specification should be used.

The compatibility specification is designed to work with as many servers as possible, even those that do not follow the standards. It should be considered whenever cookies fail to parse.

RFC2965 is not yet supported by HttpClient (support is planned for a later version). It defines version 2 cookies, addresses the shortcomings of version 1 cookies, and is intended to replace RFC2109.

In HttpClient there are two ways to specify which cookie specification to use.

    HttpClient client = new HttpClient();
    client.getState().setCookiePolicy(CookiePolicy.COMPATIBILITY);

This setting applies only to the current HttpState. The parameter can be CookiePolicy.COMPATIBILITY, CookiePolicy.NETSCAPE_DRAFT or CookiePolicy.RFC2109.

    System.setProperty("apache.commons.httpclient.cookiespec", "COMPATIBILITY");

This setting applies to every newly created HttpState object. The value can be "COMPATIBILITY", "NETSCAPE_DRAFT" or "RFC2109".

Cookies that fail to parse are a common problem; switching to the compatibility specification solves most cases.

9, What to do when HttpClient runs into problems

Access the server with a browser to confirm that the server responds.

If you are using a proxy, turn it off.

Try another server (preferably one running different server software).

Check that the code follows the approach described in this tutorial.

Set the log level to debug and look for the cause of the problem.

Turn on wire trace logging to follow the communication between client and server and see what is actually happening.

Use Telnet or netcat to send requests to the server by hand, which is useful for testing a guess about the cause.

Run netcat in listen mode and use it as the server, to check how HttpClient handles the response.

Try the latest HttpClient; the bug may already be fixed in the latest version.

Ask for help on the mailing list.

Report the bug in Bugzilla.

10, SSL

With the Java Secure Socket Extension (JSSE), HttpClient fully supports HTTP over Secure Sockets Layer (SSL) and IETF Transport Layer Security (TLS). JSSE is included in JRE 1.4 and later; earlier versions need it installed and configured manually, as described on the Sun website or in section 2 of these notes.

Using SSL in HttpClient is very simple, as the following two examples show:

    HttpClient httpclient = new HttpClient();
    GetMethod httpget = new GetMethod("https://www.verisign.com/");
    httpclient.executeMethod(httpget);
    System.out.println(httpget.getStatusLine().toString());

If a proxy is required:

    HttpClient httpclient = new HttpClient();
    httpclient.getHostConfiguration().setProxy("myproxyhost", 8080);
    httpclient.getState().setProxyCredentials("my-proxy-realm", "myproxyhost",
        new UsernamePasswordCredentials("my-proxy-username", "my-proxy-password"));
    GetMethod httpget = new GetMethod("https://www.verisign.com/");
    httpclient.executeMethod(httpget);
    System.out.println(httpget.getStatusLine().toString());

The steps to customize SSL in HttpClient are as follows:

Provide a socket factory that implements the org.apache.commons.httpclient.protocol.SecureProtocolSocketFactory interface. The socket factory is responsible for opening a socket to the server, using the standard or a third-party SSL library, and for performing any initialization such as the connection handshake. Normally this initialization happens automatically when the socket is created.

Instantiate an org.apache.commons.httpclient.protocol.Protocol object. Creating it requires a valid protocol scheme (such as https), the custom socket factory, and a default port number (such as 443 for HTTPS).

    Protocol myhttps = new Protocol("https", new MySSLSocketFactory(), 443);

This instance can then be set as the protocol handler for a host.

    HttpClient httpclient = new HttpClient();
    httpclient.getHostConfiguration().setHost("www.whatever.com", 443, myhttps);
    GetMethod httpget = new GetMethod("/");
    httpclient.executeMethod(httpget);

Alternatively, the custom instance can be registered as the default handler for a particular protocol name by calling Protocol.registerProtocol. This makes it easy to define your own protocol name (such as myhttps).

    Protocol.registerProtocol("myhttps",
        new Protocol("https", new MySSLSocketFactory(), 9443));
    ...
    HttpClient httpclient = new HttpClient();
    GetMethod httpget = new GetMethod("myhttps://www.whatever.com/");
    httpclient.executeMethod(httpget);

To replace the default HTTPS handler with your custom one, simply register it as "https".

    Protocol.registerProtocol("https",
        new Protocol("https", new MySSLSocketFactory(), 443));
    HttpClient httpclient = new HttpClient();
    GetMethod httpget = new GetMethod("https://www.whatever.com/");
    httpclient.executeMethod(httpget);

Known limitations and problems

Persistent SSL connections do not work on Sun JVMs older than 1.4, because of a JVM bug.

Non-preemptive authentication fails when the server is accessed through a proxy, because of a design flaw in HttpClient; this will be fixed in a later version.

Troubleshooting

Many problems, especially on JVMs older than 1.4, are caused by the JSSE installation.

The following code can be used as a last-resort check. It bypasses HttpClient and talks to the server over a plain JSSE socket.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.Socket;
    import javax.net.ssl.SSLSocketFactory;

    public class Test {
        public static final String TARGET_HTTPS_SERVER = "www.verisign.com";
        public static final int TARGET_HTTPS_PORT = 443;

        public static void main(String[] args) throws Exception {
            // Open an SSL socket using the default JSSE socket factory.
            Socket socket = SSLSocketFactory.getDefault().
                createSocket(TARGET_HTTPS_SERVER, TARGET_HTTPS_PORT);
            try {
                // Send a minimal HTTP/1.1 request by hand.
                Writer out = new OutputStreamWriter(
                    socket.getOutputStream(), "ISO-8859-1");
                out.write("GET / HTTP/1.1\r\n");
                out.write("Host: " + TARGET_HTTPS_SERVER + ":" +
                    TARGET_HTTPS_PORT + "\r\n");
                out.write("Agent: SSL-TEST\r\n");
                out.write("\r\n");
                out.flush();
                // Print whatever the server sends back.
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));
                String line = null;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                socket.close();
            }
        }
    }

11, Multithreading with HttpClient

The main purpose of using multiple threads is to download in parallel. While HttpClient runs, each HTTP method uses an HttpConnection instance. Connections are a limited resource, and each connection can be used by only one thread and one method at a time, so connections must be allocated correctly when they are needed. HttpClient manages connections in a way similar to a JDBC connection pool; this is done by MultiThreadedHttpConnectionManager.

    MultiThreadedHttpConnectionManager connectionManager =
        new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient(connectionManager);

The client can now be used from multiple threads to execute multiple methods. Each call to HttpClient.executeMethod() asks the connection manager for a connection instance. If one is granted, it is checked out, and it must be returned to the manager once it has been used. The manager supports two settings:

maxConnectionsPerHost: the maximum number of parallel connections per host, default 2

maxTotalConnections: the maximum total number of parallel connections for the client, default 20

When the manager reuses connections, it uses a least recently used approach.

Because it is the application, not HttpClient itself, that uses the connection, HttpClient cannot tell when a connection is no longer in use. The application must therefore call releaseConnection() explicitly after reading the response body, to release the connection it was given.

    MultiThreadedHttpConnectionManager connectionManager =
        new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient(connectionManager);
    ...
    // In a thread.
    GetMethod get = new GetMethod("http://jakarta.apache.org/");
    try {
        client.executeMethod(get);
        // Print response to stdout
        System.out.println(get.getResponseBodyAsStream());
    } finally {
        // Be sure the connection is released back to the connection
        // manager
        get.releaseConnection();
    }

Every HttpClient.executeMethod() call must be paired with a method.releaseConnection() call.
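Why a missing release matters can be seen in a toy model of the two limits. The sketch below is not MultiThreadedHttpConnectionManager; it is a JDK-only illustration that counts connection slots with semaphores, and its class and method names are assumptions. It also fails fast instead of waiting for a free slot, which keeps the example easy to run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

final class PerHostLimiterSketch {
    private final int maxPerHost;
    private final Semaphore total;
    private final Map<String, Semaphore> perHost = new ConcurrentHashMap<>();

    PerHostLimiterSketch(int maxPerHost, int maxTotal) {
        this.maxPerHost = maxPerHost;
        this.total = new Semaphore(maxTotal);
    }

    /** Tries to check out a connection slot for the host; false means a limit was reached. */
    boolean tryCheckout(String host) {
        Semaphore hostSlots = perHost.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
        if (!hostSlots.tryAcquire()) {
            return false; // per-host limit reached
        }
        if (!total.tryAcquire()) {
            hostSlots.release(); // undo the per-host reservation
            return false;        // overall limit reached
        }
        return true;
    }

    /** Returns a slot; the caller must pair this with one successful tryCheckout. */
    void release(String host) {
        Semaphore hostSlots = perHost.get(host);
        if (hostSlots == null) {
            throw new IllegalStateException("nothing was checked out for " + host);
        }
        hostSlots.release();
        total.release();
    }

    public static void main(String[] args) {
        PerHostLimiterSketch pool = new PerHostLimiterSketch(2, 20);
        System.out.println(pool.tryCheckout("jakarta.apache.org"));
        System.out.println(pool.tryCheckout("jakarta.apache.org"));
        // Both slots are taken and nothing has been released yet.
        System.out.println(pool.tryCheckout("jakarta.apache.org"));
        pool.release("jakarta.apache.org");
        System.out.println(pool.tryCheckout("jakarta.apache.org"));
    }
}
```

With the default of two connections per host, two forgotten releases are enough to starve every later request to that host, which is the reason for the try/finally pattern shown above.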

12, HTTP methods

HttpClient supports eight HTTP methods, which are described below.

1, OPTIONS

The HTTP OPTIONS method asks the server which communication options are available on the request/response chain for the requested resource. It lets the client find out the options and requirements associated with a resource, or the capabilities of the server, before taking any specific action on the resource. Its most typical use is to find out which HTTP methods the server supports. HttpClient provides the OptionsMethod class for this method, and its getAllowedMethods method makes that typical use straightforward.

    OptionsMethod options = new OptionsMethod("http://jakarta.apache.org");
    // Execute the method and handle any exceptions.
    ...
    Enumeration allowedMethods = options.getAllowedMethods();
    options.releaseConnection();
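On the wire, the server lists the methods it supports in the Allow response header as a comma-separated list. The following JDK-only sketch shows one way to split such a value; the class and method names are illustrative and it is not how OptionsMethod is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class AllowHeaderSketch {
    /** Splits the value of an Allow header into individual method names. */
    static List<String> parseAllow(String allowValue) {
        List<String> methods = new ArrayList<>();
        if (allowValue == null) {
            return methods; // header absent: nothing is known about the allowed methods
        }
        for (String token : allowValue.split(",")) {
            String method = token.trim(); // method names are case-sensitive, so keep the case
            if (!method.isEmpty()) {
                methods.add(method);
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        List<String> allowed = parseAllow("GET, HEAD, OPTIONS, TRACE");
        System.out.println(allowed);
        System.out.println("POST allowed? " + allowed.contains("POST"));
    }
}
```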

2, GET

The HTTP GET method retrieves whatever information (an entity) is identified by the request URI; as the name suggests, it "gets" the resource. If the request URI refers to a data-producing process, the data produced by the process is returned as the entity, not the source code of the process.

If the HTTP request contains an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match or If-Range header, the GET becomes a "conditional GET": the entity is retrieved only if the conditions described by those headers are met. This cuts down unnecessary network traffic and avoids making several requests for one resource (for example a check followed by a download). (This is how a browser's cache works: it keeps pages in a temporary directory and, when a page is visited again, downloads only the content that has changed, which speeds up browsing. The check itself is usually done with HEAD, which suits that purpose better than GET.) If the HTTP request contains a Range header, only the part of the entity identified by the URI that falls within the given range is returned. (Anyone who has used a multithreaded download tool will find this easy to understand.)
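The way a multithreaded download tool uses the Range header can be sketched by computing one header value per worker. This is a JDK-only illustration with assumed names; it only produces the header values and performs no network access. Byte positions in a Range header are inclusive and start at zero.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSketch {
    /** Splits a resource of the given length into at most `parts` contiguous byte ranges. */
    static List<String> rangeHeaders(long length, int parts) {
        if (length <= 0 || parts <= 0) {
            throw new IllegalArgumentException("length and parts must be positive");
        }
        List<String> ranges = new ArrayList<>();
        long chunk = (length + parts - 1) / parts; // ceiling division
        for (long start = 0; start < length; start += chunk) {
            long end = Math.min(start + chunk, length) - 1; // inclusive last byte
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each value would be sent as the Range header of one GET request.
        for (String range : rangeHeaders(1000, 3)) {
            System.out.println("Range: " + range);
        }
    }
}
```

Each worker thread would send its own GET with one of these values and write the returned part at the matching offset of the output file.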

A typical use of this method is to download documents from a web server. HttpClient defines the GetMethod class for it. The document in the response (an HTML page, for example) can be retrieved with getResponseBody, getResponseBodyAsString or getResponseBodyAsStream. Of these three functions, getResponseBodyAsStream is usually the best choice, mainly because it avoids buffering all of the downloaded data before the document is processed.

    GetMethod get = new GetMethod("http://jakarta.apache.org");
    // Execute the method and handle any failure.
    ...
    InputStream in = get.getResponseBodyAsStream();
    // Use the input stream to process the information.
    get.releaseConnection();

The most common mistake when using GetMethod is failing to read the whole response body. Remember also to release the connection manually.

3, HEAD

The HTTP HEAD method is identical to GET, with one difference: the server must not return a message body in the response. It lets the client obtain basic information about a resource without downloading the resource itself. It is often used to check whether a hyperlink is reachable and whether a resource has been modified recently.

The most typical use of HEAD is to retrieve basic information about a resource. HttpClient defines the HeadMethod class for this method. Like the other *Method classes, HeadMethod uses getResponseHeaders() to retrieve the headers and has no special methods of its own.

    HeadMethod head = new HeadMethod("http://jakarta.apache.org");
    // Execute the method and handle any failure.
    ...
    // Retrieve all the headers of the response.
    Header[] headers = head.getResponseHeaders();
    // Retrieve only the Last-Modified header.
    String lastModified = head.getResponseHeader("last-modified").getValue();

4, POST

In English, "post" means to send or mail something. The HTTP POST method asks the server to accept the entity enclosed in the request as a new subordinate of the resource identified by the request URI. In essence, the server is expected to store the entity, usually after processing it with a server-side program. POST is designed to provide a uniform way to do the following:

Annotate existing resources

Post a message to a bulletin board, newsgroup, mailing list or similar group of articles

Submit a block of data to a data-handling process

Extend a database through an append operation

All of these operations are expected to produce some "side effect" on the server, such as modifying a database.

HttpClient defines the PostMethod class for this method. Using POST in HttpClient takes two basic steps: prepare the data for the request, then read the server's response. The request data is supplied by calling setRequestBody(), which accepts three kinds of parameter: an input stream, an array of name/value pairs, or a string. The response is read with one of the getResponseBody* methods, in the same way as the response to a GET.
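When the request data is a set of name/value pairs, what travels in the request body is a form-encoded string. The following JDK-only sketch builds such a string with URLEncoder; it illustrates the application/x-www-form-urlencoded format and is not the code PostMethod uses. The class and method names are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBodySketch {
    /** Encodes name/value pairs as an application/x-www-form-urlencoded body. */
    static String encode(Map<String, String> fields, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&'); // pairs are separated by '&'
                }
                body.append(URLEncoder.encode(field.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("name", "John Smith");
        fields.put("comment", "a&b=c");
        System.out.println(encode(fields, "UTF-8"));
    }
}
```

Escaping the separators inside names and values is what lets the server split the body back into the original pairs.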

The common mistakes are failing to read the whole response (whether or not the program needs it) and failing to release the connection.

When reprinting, please cite the original address: https://www.9cbs.com/read-46252.html
