Tomcat's encoding process for String

zhaozj2021-02-11  203

Reprinted; please credit http://9cbs.net

Author: ggyy1977@hotmail.com

Handling Chinese text in Tomcat (1)

Many people have been asking about problems with Chinese text, so let's walk through how Tomcat 4.0, as a servlet and JSP engine, handles character encoding.

1) Accepting a request from the client

When a client requests a JSP page from Tomcat, Tomcat constructs an instance of its HttpServletRequest implementation class to represent that request. The client's data is obtained by reading the ServletInputStream.

In a JSP we usually call request.getParameter() to get the value of a parameter. What happens behind this function, and how is the resulting String encoded?

From the Tomcat source code of the HttpServletRequest implementation:

public String getParameter(String name)
{
    parseParameters(); // process the parameters
    String values[] = (String[]) parameters.get(name); // look up the object stored under this parameter name (an array)
    if (values != null)
    {
        return values[0];
    }
    else
    {
        return null;
    }
}

Here parameters is a Map-typed data member of the request that stores the data received from the client. In other words, for every client request Tomcat constructs a request instance, and that instance has a parameters member holding the client data read in for the request.

The code shows that the most important step is the parseParameters() function, which is where the parameters are processed.
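The lookup described above can be sketched in isolation. This is only an illustration, not Tomcat's class: ParameterLookupSketch and its put() helper are made-up names, and a plain HashMap stands in for Tomcat's ParameterMap.

```java
import java.util.HashMap;
import java.util.Map;

public class ParameterLookupSketch {
    // Stand-in for the request's "parameters" member: name -> all values for that name.
    private final Map<String, String[]> parameters = new HashMap<>();

    // Hypothetical helper used only to fill the map for the demo.
    public void put(String name, String... values) {
        parameters.put(name, values);
    }

    // Same shape as the getParameter() shown above: first value, or null if absent.
    public String getParameter(String name) {
        String[] values = parameters.get(name);
        return (values != null) ? values[0] : null;
    }

    public static void main(String[] args) {
        ParameterLookupSketch request = new ParameterLookupSketch();
        request.put("color", "red", "blue");
        System.out.println(request.getParameter("color")); // first of the two values
        System.out.println(request.getParameter("size"));  // no such parameter
    }
}
```

Storing an array per name is what lets a form submit the same field several times while getParameter() still returns a single String.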

Let's take a look:

protected void parseParameters()
{
    if (parsed)
    {
        return; // already processed, nothing more to do
    }
    ParameterMap results = parameters; // local reference to the parameters object
    if (results == null)
    {
        results = new ParameterMap(); // create one if no instance exists yet
    }
    results.setLocked(false);
    String encoding = getCharacterEncoding(); // get the HttpServletRequest's encoding
    if (encoding == null)
    {
        encoding = "ISO-8859-1"; // if the request does not specify an encoding, use "ISO-8859-1"
    }
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    RequestUtil.parseParameters(results, queryString, encoding); // decode the query string
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    is.read(buf, len, max - len); // read data from the stream
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    RequestUtil.parseParameters(results, buf, encoding); // decode the request body
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    parameters = results; // reassign the reference
}
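The fallback in parseParameters() is the step that matters for Chinese text. A minimal sketch of its effect follows; EncodingFallbackSketch, effectiveEncoding and decode are illustrative names, not Tomcat code, and the sample bytes assume a client that sent the character U+4E2D as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingFallbackSketch {
    // Mirrors the fallback above: a null request encoding becomes ISO-8859-1.
    static String effectiveEncoding(String requestEncoding) {
        return (requestEncoding != null) ? requestEncoding : "ISO-8859-1";
    }

    // Decodes raw parameter bytes the way the request would, given its encoding.
    static String decode(byte[] raw, String requestEncoding) {
        return new String(raw, Charset.forName(effectiveEncoding(requestEncoding)));
    }

    public static void main(String[] args) {
        byte[] raw = "\u4e2d".getBytes(StandardCharsets.UTF_8); // three bytes: E4 B8 AD
        // No encoding on the request: each byte becomes its own character.
        System.out.println(decode(raw, null).length());
        // Encoding declared: the three bytes decode back to one character.
        System.out.println(decode(raw, "UTF-8").length());
    }
}
```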

Next, look at RequestUtil.parseParameters(results, buf, encoding). The full source is not reproduced here.

RequestUtil.parseParameters(results, buf, encoding) works on the byte array and constructs key and value, which are the parameter name and the parameter value:

while (ix < data.length)
{
    byte c = data[ix++];
    switch ((char) c)
    {
    case 38: // '&' ends a value
        value = new String(data, 0, ox, encoding);
        if (key != null)
        {
            putMapEntry(map, key, value);
            key = null;
        }
        ox = 0;
        break;
    case 61: // '=' ends a key
        key = new String(data, 0, ox, encoding);
        ox = 0;
        break;
    case 43: // '+' stands for a space
        data[ox++] = 32;
        break;
    case 37: // '%' introduces two hex digits
        data[ox++] = (byte) ((convertHexDigit(data[ix++]) << 4) + convertHexDigit(data[ix++]));
        break;
    default:
        data[ox++] = c;
        break;
    }
}
// the last value is not followed by '&', so store it here
if (key != null)
{
    value = new String(data, 0, ox, encoding);
    putMapEntry(map, key, value);
}

Clearly, both the parameter name and the parameter value are built with new String(data, 0, ox, encoding), that is, the bytes are decoded using the specified encoding.

Conclusion: it is easy to see that if the request does not specify an encoding, the parameter names and values received from the client are Strings decoded as ISO-8859-1.

In other words, a parameter value submitted from a form element in a JSP page comes back from request.getParameter() as a String decoded as ISO-8859-1.

Now look at the Java file that Tomcat generates for a JSP. For Strings in a JSP that does not specify an encoding, Tomcat uses ISO-8859-1, not the system default.
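To experiment with this loop outside Tomcat, here is a self-contained sketch of the same technique. The class name QueryParseSketch and the helper hexDigit are assumptions made for the example, a LinkedHashMap with one value per key replaces putMapEntry, and a Charset lookup is used instead of the String-name constructor so that no checked exception is thrown. A truncated escape such as a trailing "%4" is not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParseSketch {
    // Hypothetical stand-in for convertHexDigit: value of one ASCII hex digit.
    static int hexDigit(byte b) {
        int v = Character.digit((char) b, 16);
        if (v < 0) {
            throw new IllegalArgumentException("not a hex digit: " + (char) b);
        }
        return v;
    }

    // Splits "k=v&k2=v2" bytes, undoing '+' and %XX escapes, then decodes with the given encoding.
    static Map<String, String> parse(byte[] input, String encoding) {
        Charset cs = Charset.forName(encoding);
        byte[] data = input.clone(); // decoded bytes are written back in place, so work on a copy
        Map<String, String> map = new LinkedHashMap<>();
        int ix = 0; // read position
        int ox = 0; // write position, never ahead of ix
        String key = null;
        while (ix < data.length) {
            byte c = data[ix++];
            switch ((char) c) {
                case '&':
                    if (key != null) {
                        map.put(key, new String(data, 0, ox, cs));
                        key = null;
                    }
                    ox = 0;
                    break;
                case '=':
                    key = new String(data, 0, ox, cs);
                    ox = 0;
                    break;
                case '+':
                    data[ox++] = (byte) ' ';
                    break;
                case '%':
                    data[ox++] = (byte) ((hexDigit(data[ix++]) << 4) + hexDigit(data[ix++]));
                    break;
                default:
                    data[ox++] = c;
                    break;
            }
        }
        if (key != null) { // the last value has no trailing '&'
            map.put(key, new String(data, 0, ox, cs));
        }
        return map;
    }

    public static void main(String[] args) {
        byte[] raw = "name=%E4%B8%AD&q=a+b".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parse(raw, "UTF-8"));
        System.out.println(parse(raw, "ISO-8859-1"));
    }
}
```

Running the same bytes through both encodings shows that the percent-decoding step is identical in each case; only the final new String(...) call decides which characters the bytes become.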
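One consequence worth illustrating: ISO-8859-1 maps every byte value to exactly one character, so the original bytes are not lost and can be decoded again. The sketch below assumes the client actually sent UTF-8 bytes (on a Chinese system of that era it would more likely have been GB2312); Latin1RoundTripSketch and redecode are illustrative names, not part of Tomcat.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1RoundTripSketch {
    // Recovers the raw bytes from an ISO-8859-1-decoded String and decodes them
    // again with the encoding the client really used (an assumption of the caller).
    static String redecode(String latin1Decoded, Charset actual) {
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);

        // What getParameter() would hand back when no encoding was specified.
        String received = new String(sent, StandardCharsets.ISO_8859_1);
        System.out.println(received.equals(original)); // false: the text is garbled

        String repaired = redecode(received, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original)); // true: the bytes survived
    }
}
```

Since parseParameters() only falls back to ISO-8859-1 when getCharacterEncoding() returns null, calling request.setCharacterEncoding(...) before the first getParameter() call should make this re-decoding unnecessary.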

For example:

<%
String name = new String("Hello"); // or: String name = "Hello"; either way the literal is read as ISO-8859-1
System.out.println(name); // garbled when the literal contains Chinese characters, because the console uses the system default encoding (GB2312 on a Chinese system, MS932 on a Japanese one)
%>

The next article covers how HttpServletResponse is processed.

When reprinting, please cite the original address: https://www.9cbs.com/read-4257.html
