JSP - read text files and Chinese characters

xiaoxiao2021-03-06 152

/*
 * Created on 2004-8-18
 *
 * TODO JavaBean that reads a TXT file
 */
package stud;

import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * @author yhwell.xiong
 *
 * Finally solved the problem of reading Chinese from a text file, as follows:
 * 1. Save the text file with 'ANSI' encoding.
 * 2. The essential line is:
 *    ls_heightArray[i] = new String(randomAccessFile.readLine().getBytes("ISO-8859-1"), "GBK");
 */
public class Readtxt {
    // Holds the first five lines of the file; allocated here so the loop below can fill it.
    private String[] ls_heightArray = new String[5];
    private int li_i = 0;
    // private String ls_PathtxtFile = "d:\\temp\\test.txt";

    public String[] getHeightArray(String ls_PathtxtFile) {
        // try-with-resources closes the file even if reading fails
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(ls_PathtxtFile, "r")) {
            for (int i = 0; i < 5; i++) {
                String line = randomAccessFile.readLine();
                if (line == null) {
                    break; // the file has fewer than five lines
                }
                // readLine() widens each byte to a char, so recover the raw bytes and decode them as GBK
                ls_heightArray[i] = new String(line.getBytes("ISO-8859-1"), "GBK");
                // ls_heightArray[i] = randomAccessFile.readLine();
            }
        } catch (IOException e) {
            System.out.println(e);
        }
        return ls_heightArray;
    }
}

===========================

****************************** Key points ******************************

1. When saving the text file you want to display, use "Save As" and select 'ANSI' as the encoding.
2. ls_Str = new String(randomAccessFile.readLine().getBytes("ISO-8859-1"), "GBK");
3. OK!
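The two steps above can be tried in isolation. The following is a minimal sketch, not the author's code: the class name `ReadGbkLine`, the helper `roundTrip` and the temporary file are assumptions made for illustration. It writes one GBK-encoded line to a temporary file, reads it back with `RandomAccessFile.readLine()` (which turns every byte into one char), and then recovers the text with the ISO-8859-1 to GBK conversion.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadGbkLine {
    // Reads the first line and re-decodes it with the given charset.
    // readLine() maps each byte to one char, so getBytes("ISO-8859-1") restores the raw bytes.
    static String readFirstLine(Path file, String charsetName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            String raw = raf.readLine();
            return raw == null ? null : new String(raw.getBytes("ISO-8859-1"), charsetName);
        }
    }

    // Hypothetical helper: writes the text to a temporary file in the given charset and reads it back.
    static String roundTrip(String text, String charsetName) {
        try {
            Path tmp = Files.createTempFile("readtxt-demo", ".txt");
            try {
                Files.write(tmp, (text + "\n").getBytes(Charset.forName(charsetName)));
                return readFirstLine(tmp, charsetName);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes to keep the source ASCII
        System.out.println(original.equals(roundTrip(original, "GBK")));
    }
}
```

The sketch assumes the JDK in use ships the GBK charset, which full JDK distributions normally do.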

******************************************************************************
Chinese character encoding problems in JSP / Servlet
******************************************************************************

1. The problem

Every country (or region) defines character encoding sets for computer information interchange, such as the extended ASCII code in the United States, GB2312-80 in China, and JIS in Japan. As the foundation of information processing in that country or region, they play an important unifying role. Character encoding sets fall into two categories: SBCS (single-byte character sets) and DBCS (double-byte character sets). Early software (especially operating systems) appeared in various localized versions (L10N) in order to process local character information, and concepts such as LANG and codepage were introduced to tell them apart. However, because the code ranges of the local character sets overlap, exchanging information between them is difficult, and maintaining each localized version separately is costly. It is therefore necessary to extract what the localization work has in common, handle it uniformly, and reduce the locale-specific processing to a minimum. This is what is called internationalization (I18N). Language information was further standardized as locale information, and the underlying character set became Unicode, which covers nearly all glyphs.

Most software with internationalization features now bases its core character processing on Unicode. At run time it determines the local character encoding from the current locale / LANG / codepage settings and processes local characters accordingly. During processing, conversion between Unicode and the local character set is required, and sometimes conversion between two different local character sets with Unicode as the intermediate. The same applies in a network environment: character information at either end of a connection has to be converted into acceptable content according to the character set settings.

Java represents characters internally in Unicode and conforms to Unicode V2.0. Whether a Java program reads or writes files in the file system as character streams, writes HTML to a URL connection, or reads parameter values from a URL connection, a character encoding conversion takes place. Although this adds programming complexity and is easy to get confused about, it is in line with the idea of internationalization.

In theory, these conversions driven by character set settings should not cause many problems. In practice, because applications run in different environments, because Unicode and the local character sets keep being supplemented and revised, and because systems and applications are not always implemented to the standard, transcoding problems constantly plague programmers and users.

2. The GB2312-80, GBK and GB18030-2000 Chinese character sets and encodings

The fix for a Chinese character encoding problem in a Java program is often simple, but understanding the reasons behind it and locating the problem requires knowledge of the existing Chinese character encodings and of encoding conversion.

GB2312-80 was developed in the initial phase of Chinese character information technology in China. It contains most of the commonly used level-1 and level-2 Chinese characters, plus the symbols in the first 9 rows. It is supported by almost all Chinese systems and internationalized software, and it is the most basic Chinese character set. Its coding range is 0xA1-0xFE for the high byte and 0xA1-0xFE for the low byte; Chinese characters start at 0xB0A1 and end at 0xF7FE.

GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20902 Chinese characters, and its coding range is 0x8140-0xFEFE, excluding positions whose low byte is 0x7F. All of its characters can be mapped one-to-one to Unicode 2.0, which means that Java actually supports the GBK character set. GBK is the default character set of Windows and some other Chinese operating systems, but not all internationalized software supports it, apparently because the vendors do not fully understand what GBK is. It is worth noting that GBK is not a national standard, only a specification. With the release of the GB18030-2000 national standard, it will complete its historical mission in the near future. GB18030-2000 (GBK2K) further extends the Chinese characters on the basis of GBK and adds the glyphs of minority scripts such as Tibetan and Mongolian. GBK2K fundamentally solves the problem of insufficient code positions and missing glyphs. It has several characteristics:

It does not define all glyphs; it only specifies the coding ranges and leaves room for later extension.
The encoding is variable-length. Its two-byte part is compatible with GBK; the four-byte part holds the extended glyphs and code positions, with the first byte in 0x81-0xFE, the second byte in 0x30-0x39, the third byte in 0x81-0xFE and the fourth byte in 0x30-0x39.
It is being rolled out in phases. The first requirement is to implement all glyphs that can be fully mapped to the Unicode 3.0 standard.
It is a national standard and is mandatory.
At the time of writing, no operating system or software implemented GBK2K; this is the work of current and future Chinese localization.

An introduction to Unicode is skipped here.
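The variable-length property can be observed with the charsets bundled in a current JDK. This is a small sketch under that assumption; the class name `Gb18030Length` and the sample characters are chosen for illustration and are not from the original article.

```java
import java.nio.charset.Charset;

public class Gb18030Length {
    // Number of bytes the text occupies once encoded with the named charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String common = "\u4e2d";   // a common Chinese character, present in GB2312 and GBK
        String extended = "\u3400"; // a CJK Extension A character outside the two-byte GBK range
        System.out.println("GBK, common:       " + encodedLength(common, "GBK"));
        System.out.println("GB18030, common:   " + encodedLength(common, "GB18030"));
        System.out.println("GB18030, extended: " + encodedLength(extended, "GB18030"));
    }
}
```

The common character takes two bytes in both encodings, which reflects the GBK-compatible two-byte part, while the extended character falls into the four-byte part of GB18030.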

Encodings supported by Java that are relevant to Chinese programming (several of them are not listed in the JDK documentation):

ASCII: 7-bit, same as ascii7
ISO8859-1: 8-bit, same as 8859_1, ISO-8859-1, ISO_8859-1, latin1 ...
GB2312-80: same as gb2312, GB2312-1980, EUC_CN, euccn, 1381, Cp1381, 1383, Cp1383, ISO2022CN, ISO2022CN_GB ...
GBK (note the case): same as MS936
UTF8: UTF-8
GB18030 (now only supported by IBM JDK 1.3.?): same as Cp1392, 1392
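Which of these names a particular JVM accepts can be checked at run time. The sketch below uses only `java.nio.charset.Charset`; the class name `CharsetAliases` and the selection of names are illustrative assumptions, and the set of supported aliases differs between JDK versions and vendors.

```java
import java.nio.charset.Charset;

public class CharsetAliases {
    // Returns the canonical charset name for an alias, or null if this JVM does not support it.
    static String canonicalName(String alias) {
        return Charset.isSupported(alias) ? Charset.forName(alias).name() : null;
    }

    public static void main(String[] args) {
        String[] names = {"latin1", "8859_1", "EUC_CN", "MS936", "UTF8", "GB18030", "Cp1381"};
        for (String name : names) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

A `null` in the output means the alias is unknown to the running JVM, so code that depends on it should fall back to a canonical name such as GBK.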

Java processes characters in Unicode. From another angle, though, a Java program can also work with a non-Unicode transcoding; what matters is that the Chinese character information is not distorted at the program's entry and exit points. For example, handling Chinese characters entirely as ISO-8859-1 can also produce correct results. Many of the solutions circulating on the network are of this type. To avoid confusion, this article does not discuss that approach.

3. Where '?' and garbled characters come from when transcoding Chinese

Conversion in either direction can produce wrong results:

Unicode -> byte: if the character does not exist in the target code set, the result is 0x3f. For example, "\u00d6\u00ec\u00e9\u0046\u00bb\u00f9".getBytes("GBK") gives "?ìéF?ù", whose hex value is 3fa8aca8a6463fa8b4. Look closely and you will find that \u00ec is converted to 0xa8ac and \u00e9 to 0xa8a6: the number of significant bits has grown. This is because some symbols in the GB2312 symbol area are mapped to common symbol code points. Since these symbols also appear in ISO-8859-1 or other SBCS character sets, they sit near the front of Unicode, some of them have only 8 significant bits, and they overlap with the encoding of Chinese characters. (This mapping is only a mapping of codes; the displayed glyphs are not identical. The symbols in Unicode are single-byte wide, while the symbols in the Chinese character sets are double-byte wide.) There are 20 such symbols between \u00a0 and \u00ff. Understanding this feature is very important. It explains why, in Java programming, the wrong results of Chinese character encoding often contain garbled characters (actually symbol characters) rather than only '?' characters, as in the example above.

Byte -> Unicode: if the character identified by the bytes does not exist in the source code set, the result is 0xfffd. For example, byte ba[] = {(byte)0x81, (byte)0x40, (byte)0xb0, (byte)0xa1}; new String(ba, "gb2312"); gives "?啊", whose hex value is "\ufffd\u554a". 0x8140 is a GBK character with no corresponding value in the GB2312 conversion table, so \ufffd is used. (Note: when this Unicode character is displayed, there is no corresponding local character, so the previous case applies as well and it is shown as a "?".)
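Both failure modes can be reproduced with a few lines. The following is a sketch rather than the article's own code; the class name `LossyConversion` and its helpers are assumptions. The exact characters produced for malformed input can differ slightly between JDK versions, so the comments only state what the text above reports.

```java
import java.nio.charset.Charset;

public class LossyConversion {
    // Lower-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Unicode to bytes: characters missing from the target charset are replaced by the charset's replacement byte.
    static String encodeHex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    // Bytes to Unicode: byte sequences that the source charset cannot decode become U+FFFD.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The article reports 3fa8aca8a6463fa8b4 for this string.
        System.out.println(encodeHex("\u00d6\u00ec\u00e9\u0046\u00bb\u00f9", "GBK"));

        byte[] ba = {(byte) 0x81, (byte) 0x40, (byte) 0xb0, (byte) 0xa1};
        String decoded = decode(ba, "GB2312");
        // Print the code points so the replacement character is visible regardless of the console encoding.
        for (char c : decoded.toCharArray()) {
            System.out.println(String.format("U+%04X", (int) c));
        }
    }
}
```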

In actual programming, the wrong Chinese character information that a JSP / Servlet program receives is often the result of these two processes combined, and sometimes of the combination being applied repeatedly.

4. JSP / Servlet Chinese character encoding problems and solutions in WAS

4.1 Symptoms of common encoding problems

The JSP / Servlet encoding problems commonly reported online generally show up in the browser or on the application side, for example:

Why have all the Chinese characters in the JSP / Servlet page seen in the browser become '?'?
Why have the Chinese characters in the Servlet page seen in the browser become garbled?
Why have the Chinese characters in the Java application interface become squares?
The JSP / Servlet page cannot display GBK Chinese characters.
Chinese text inside Java code embedded in JSP tags such as <% ... %> and <%= ... %> becomes garbled, while the other Chinese characters on the page are correct.
JSP / Servlet cannot receive Chinese characters submitted by a form.
JSP / Servlet database reads and writes do not return the correct content.

Behind these problems lie various incorrect character conversions and processing steps (except the third, which is caused by a wrong Java font setting). To solve such character encoding problems, you need to understand how a JSP / Servlet runs and check each point where a problem may arise.

4.2 Encoding issues in JSP / Servlet web programming

A JSP / Servlet running on a Java application server provides HTML content to the browser.

[Figure: JSP / Servlet processing flow between the application server and the browser]

Character encoding conversion takes place at the following points:

JSP compilation. The Java application server reads the JSP source file according to the JVM's file.encoding value, converts it to the internal character encoding to compile the JSP, generates a Java source file, and writes it back to the file system according to the file.encoding value. If the current system language supports GBK, no encoding problem arises at this point. On an English system, such as Linux, AIX or Solaris with LANG set to en_US, set the JVM's file.encoding value to GBK. If the system language is GB2312, decide whether file.encoding needs to be set; setting it to GBK avoids potential garbling of GBK characters.

Java compilation. The Java source must be compiled into .class files before it can run in the JVM, and this step has the same file.encoding problem as JSP compilation. From here on, Servlets and JSPs run in a similar way, except that Servlet compilation is not automatic. For JSP, the generated intermediate Java file is compiled automatically (the program calls the sun.tools.javac.Main class directly). So if a problem arises in this step, check the encoding and the OS language environment as well, or convert the static Chinese characters embedded in the JSP's Java code to Unicode escapes, or keep static text output out of the Java code. For Servlets, it is enough to specify the -encoding parameter by hand when compiling with javac.
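Converting static Chinese text to Unicode escapes makes the source file pure ASCII, so the compile step no longer depends on file.encoding. The sketch below shows one way to produce such escapes and to inspect the JVM setting; the class name `UnicodeEscaper` is an assumption, and the JDK's own native2ascii tool (where available) serves the same purpose.

```java
public class UnicodeEscaper {
    // Replaces every non-ASCII char with a backslash-u escape and leaves ASCII untouched.
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The value the article refers to; it can be set with -Dfile.encoding=GBK when the JVM starts.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println(toUnicodeEscapes("\u4e2d\u6587 ok"));
    }
}
```

The escaped form can be pasted into a Java string literal inside a JSP scriptlet or Servlet without any further encoding concerns.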

Servlet output. The Servlet needs to convert the HTML page content into an encoding the browser accepts and send it out. Depending on how each Java application server is implemented, some query the browser's Accept-Charset and Accept-Language parameters or guess the encoding in some other way, and some do nothing at all. Using a fixed encoding is therefore perhaps the best solution. For Chinese web pages, contentType="text/html; charset=GB2312" can be set in the JSP or Servlet; if the page contains GBK characters, set contentType="text/html; charset=GBK". Because IE and Netscape support GBK to different degrees, test this setting before relying on it. Because the high 8 bits of a 16-bit Java char are discarded during network transmission, and to ensure that the Chinese characters in the Servlet page (both embedded ones and those obtained while the Servlet runs) have the expected internal code, use PrintWriter out = res.getWriter() instead of ServletOutputStream out = res.getOutputStream(). The PrintWriter converts according to the charset specified in the contentType (the contentType must be specified before this call!). Alternatively, wrap the ServletOutputStream in an OutputStreamWriter and output Chinese strings with write(String). For JSP, the Java application server should ensure that the embedded Chinese characters are transmitted correctly at this stage.
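The OutputStreamWriter approach can be illustrated without a servlet container. In this sketch a `ByteArrayOutputStream` stands in for the `ServletOutputStream`; the class name `ExplicitCharsetWriter` and its method are assumptions for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class ExplicitCharsetWriter {
    // Writes the text through an OutputStreamWriter with an explicit charset and returns the bytes sent.
    static byte[] writeWithCharset(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the response output stream
        try {
            Writer writer = new OutputStreamWriter(sink, charsetName);
            writer.write(text);
            writer.flush(); // push the encoded bytes into the underlying stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sent = writeWithCharset("\u4e2d\u6587", "GBK");
        System.out.println(sent.length + " bytes written");
    }
}
```

Each Chinese character leaves the writer as a two-byte GBK sequence, so nothing is truncated to the low 8 bits on the way out.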

Reading parameters from the URL connection. This is the URL character encoding problem. If the parameter values returned from the browser through GET / POST contain Chinese characters, the Servlet will not get the correct value. In Sun's J2SDK, HttpUtils.parseName does not consider the browser's language settings at all when parsing parameters; it parses the values byte by byte. This is the encoding problem most discussed online. Because it is a design defect, the only options are to re-parse the resulting string in binary form or to hack the HttpUtils class. The reference articles describe both, but it is best to change the Chinese encodings GB2312 and CP1381 used there to GBK; otherwise problems will still occur when GBK Chinese characters are encountered. Servlet API 2.3 provides a new function, HttpServletRequest.setCharacterEncoding, for specifying the encoding the application expects before calling request.getParameter("param_name"), which will help solve this problem completely.
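Re-parsing a byte-wise decoded parameter can be sketched as follows. This is an illustration, not container code: `decodeLikeOldContainer` is a hypothetical stand-in for a server that decodes percent-escapes one byte per char, and `recover` applies the ISO-8859-1 to GBK conversion to the result.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParameterRecovery {
    // Mimics a container that turns each percent-escaped byte into one char (ISO-8859-1 semantics).
    static String decodeLikeOldContainer(String rawValue) {
        try {
            return URLDecoder.decode(rawValue, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }

    // Recovers the original bytes from the mis-decoded string and decodes them with the real charset.
    static String recover(String misdecoded, String charsetName) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // GBK bytes D6 D0 CE C4 as a browser would send them in a query string.
        String garbled = decodeLikeOldContainer("%D6%D0%CE%C4");
        String fixed = recover(garbled, "GBK");
        System.out.println(fixed.equals("\u4e2d\u6587"));
    }
}
```

With Servlet API 2.3 and later, calling setCharacterEncoding before the first getParameter call makes this manual step unnecessary for request bodies.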

--------------------------------------------------------------------------------
Eclipse replied on: 2002-08-18 10:35:14

In addition, regarding the "Servlet API 2.3 provides a new function HttpServletRequest.setCharacterEncoding" mentioned above: I tried it and it works well (Tomcat 4.0.1). The method is to configure a filter that processes every request. The filter is as follows:

[code]
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.UnavailableException;

/**
 * Title: Chinese problem
 * Description: Chinese problem
 * Company:
 * @author Writeonce
 * @version 1.0
 */
public class EncodingFilter implements Filter {
    protected String encoding = null;
    protected FilterConfig filterConfig = null;

    public void destroy() {
        this.encoding = null;
        this.filterConfig = null;
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Select and set (if needed) the character encoding to be used
        String encoding = selectEncoding(request);
        if (encoding != null) {
            request.setCharacterEncoding(encoding);
        }
        // Pass control on to the next filter
        chain.doFilter(request, response);
    }

    public void init(FilterConfig filterConfig) throws ServletException {
        this.filterConfig = filterConfig;
        // Read the encoding from the filter's init-param in web.xml
        this.encoding = filterConfig.getInitParameter("encoding");
    }

    protected String selectEncoding(ServletRequest request) {
        return (this.encoding);
    }
}
[/code]

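--------------------------------------------------------------------------------
Eclipse replied on: 2002-08-18 10:35:48
[This post was last edited by Writeon on 2002/07/23 12:39 PM]

At the same time, add the following configuration to web.xml:

[code]
<filter>
    <filter-name>Set Character Encoding</filter-name>
    <filter-class>EncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>GBK</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>Set Character Encoding</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
[/code]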

--------------------------------------------------------------------------------
QDWHT replied on: 2002-11-14 15:27:08

Thank you very much, it works. However, although it works, an error appears when Tomcat starts, as follows:

[ERROR] Digester - -Parse Error at line 60 column 11: The content of element type "web-app" must match "(icon?,display-name?,description?,distributable?,context-param*,filter*,filter-mapping*,listener*,servlet*,servlet-mapping*,session-config?,mime-mapping*,welcome-file-list?,error-page*,taglib*,resource-env-ref*,resource-ref*,security-constraint*,login-config?,security-role*,env-entry*,ejb-ref*,ejb-local-ref*)".

I am already using the latest http://java.sun.com/dtd/web-app_2_3.dtd. Why?

--------------------------------------------------------------------------------
QDWHT replied on: 2002-12-10 16:27:05

Can no expert help?

--------------------------------------------------------------------------------
arren211314 replied on: 2004-07-08 18:29:32

Really stupid! Is it so difficult to handle Chinese characters? I built a Chinese management system under a Japanese operating system. Note that the content type of the JSP output is UTF-8, and the Content-Type in the head of the HTML is also set to UTF-8.

1. When the Servlet receives the client's request, just call request.setCharacterEncoding("UTF-8") before calling getParameter(). The information received afterwards can be stored directly in SQL Server without garbling.
2. The Chinese characters read from the database need no conversion at all; just output them directly with out.println.

Very simple. Does it need to be so cumbersome? Note that UTF-8 is an international encoding.

--------------------------------------------------------------------------------
Justui replied on: 2004-07-12 21:04:18

I also ran into similar problems. I started with:

response.setContentType("text/html; charset=GB2312");
request.setCharacterEncoding("GB2312");
PrintWriter out = response.getWriter();
....
// name is a parameter passed in through an external URL or through a form GET / POST; it may be Chinese
String name = request.getParameter("name");

The code above handles data encoded by a form GET / POST correctly, but it fails for Chinese parameters in a hand-typed URL. Everything displays normally after changing it to the following code:

response.setContentType("text/html; charset=GB2312");
request.setCharacterEncoding("ISO-8859-1");
PrintWriter out = response.getWriter();
....
// name is a parameter passed in through an external URL or through a form GET / POST; it may be Chinese
String name = new String(request.getParameter("name").getBytes("ISO-8859-1"), "GB2312");
// the Chinese characters now display normally

The reason for the difference: with the first version, the parameters of a manually entered URL arrive as single bytes that request.setCharacterEncoding("GB2312") does not decode correctly, so they are displayed as question marks and garbled text, while data sent by a form GET or POST is handled as GB2312. The second version uniformly receives everything as "ISO-8859-1" and uniformly converts it to GB2312, so it displays normally.

This lets the text box accept Chinese input normally, and out.print() outputs it normally. Chinese data read from the database also displays normally after this.

Problem: when writing Chinese to the database, it is stored incorrectly no matter what I do.

This is how the JavaBean connects to the database:

Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=GB2312", "root", "");

Keeping this in the JSP page achieves the goal.

******************************************************************************
Nothing solves my problem :(
******************************************************************************

For example:

String name = "abcde";
Cookie myck1 = new Cookie("ck1", name);
response.addCookie(myck1);
--------------------------------------------------
String sname, svalue;
Cookie[] cks = request.getCookies();
for (int i = 0; i < cks.length; i++) {
    sname = cks[i].getName();
    svalue = cks[i].getValue();
    out.println(sname + " " + svalue + " ");
}

Everything is normal.
--------------------------------------------------
But if name is Chinese, it fails. Why? I tried java.net.URLEncoder.encode and getBytes("ISO-8859-1"), and neither works. Please help, thank you.
--------------------------------------------------
Change the original statements to:

sname = new String(cks[i].getName().getBytes("ISO-8859-1"), "GBK");
svalue = new String(cks[i].getValue().getBytes("ISO-8859-1"), "GBK");

But take care not to perform any encoding operation beforehand.

<mime-mapping>
    <extension>htm</extension>
    <mime-type>text/html; charset=GB2312</mime-type>
</mime-mapping>
<mime-mapping>
    <extension>html</extension>
    <mime-type>text/html; charset=GB2312</mime-type>
</mime-mapping>

Thank you, I solved it using the method above! The solution is to modify the /webapps/WEB-INF/web.xml file in the Tomcat program directory and add this configuration:

<mime-mapping>
    <extension>htm</extension>
    <mime-type>text/html; charset=GB2312</mime-type>
</mime-mapping>
<mime-mapping>
    <extension>html</extension>
    <mime-type>text/html; charset=GB2312</mime-type>
</mime-mapping>

This solves the garbled HTML problem when browsing through Tomcat. To fix the garbling for all projects in Tomcat, modify the conf/web.xml file in the Tomcat directory instead. Thank you!
--------------------------------------------------
Another solution is to combine Tomcat with Apache: let Apache serve the HTML files and let Tomcat handle files such as JSP. Then only the Apache configuration needs a simple change. In httpd.conf under Apache's conf folder there is the line AddDefaultCharset ISO-8859-1; change it to AddDefaultCharset GB2312.

When reposting, please credit the original address: https://www.9cbs.com/read-103604.html
