Tuning Java™ I/O Performance
Glen McCluskey (translation: cheami)
This article discusses and illustrates a variety of techniques for improving Java™ I/O performance. Most of the techniques center around tuning disk file I/O, but some apply to network I/O and window output as well. The first set of techniques covers low-level I/O issues, followed by higher-level issues such as compression, formatting, and serialization. The discussion does not cover application design issues, such as search algorithms and data structures, nor does it cover system-level issues such as file caching.

When discussing Java I/O, it is worth noting that the Java language offers two distinct ways of structuring disk files: one based on byte streams, the other on character sequences. A character in the Java language occupies two bytes, rather than one byte as in a conventional language such as C. Because of this, some kind of conversion is required when characters are read from a file. This distinction is important in certain cases, as several of the examples below will show.

Contents

Low-Level I/O Issues
    Basic rules for speeding up I/O
    Buffering
    Reading and writing text files
    Formatting costs
    Random access
Higher-Level I/O Issues
    Compression
    Caching
    Tokenization
    Serialization
    Obtaining file information
Further Information

Basic Rules for Speeding Up I/O

As a starting point for this discussion, here are several basic rules for speeding up I/O:
1. Avoid accessing the disk.
2. Avoid accessing the underlying operating system.
3. Avoid method calls.
4. Avoid processing bytes and characters individually.

Clearly these rules cannot be applied in all cases, for if they could, no I/O would actually get done. But consider the following three-part example that counts newline characters ('\n') in a file.

Method 1: the read method

The first approach simply uses the read method of FileInputStream:
import java.io.*;

public class intro1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            int cnt = 0;
            int b;
            while ((b = fis.read()) != -1) {
                if (b == '\n')
                    cnt++;
            }
            fis.close();
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
However, this approach triggers a lot of calls into the underlying runtime system: FileInputStream.read is a native method that returns the next byte of the file, and it is invoked once per byte.

Method 2: Using a large buffer

The second approach avoids the problem above by using a large buffer:
import java.io.*;

public class intro2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            BufferedInputStream bis = new BufferedInputStream(fis);
            int cnt = 0;
            int b;
            while ((b = bis.read()) != -1) {
                if (b == '\n')
                    cnt++;
            }
            bis.close();
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}
BufferedInputStream.read takes the next byte from its internal buffer, and only rarely accesses the underlying system.

Method 3: Doing your own buffering

The third approach avoids BufferedInputStream and does the buffering directly, thereby eliminating the read method calls entirely:
import java.io.*;

public class intro3 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            byte buf[] = new byte[2048];
            int cnt = 0;
            int n;
            while ((n = fis.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n')
                        cnt++;
                }
            }
            fis.close();
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

For a 1 MB input file, the execution times in seconds are:

    intro1    6.9
    intro2    0.9
    intro3    0.4

or roughly 17 to 1 between the slowest and the fastest methods.

This huge speedup does not prove that you should always use the third method, that is, do your own buffering. It can be error prone, especially around end-of-file handling, if it is not carefully implemented, and it is less readable than the alternatives. But it is worth remembering where the time goes and how it can be recovered when necessary. Method 2 is probably the "right" one for most applications.

Buffering

Methods 2 and 3 use the technique of buffering, in which large chunks of a file are read from disk and then accessed one byte or character at a time. Buffering is a basic and important technique for speeding up I/O, and several classes support it (BufferedInputStream for bytes, BufferedReader for characters). An obvious question is: will making the buffer bigger make I/O even faster? Java buffers are typically 1024 or 2048 bytes long; a larger buffer may speed up I/O, but only by a small percentage, say 5 to 10%.
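If you do want to experiment with buffer size, both BufferedInputStream and BufferedReader accept an explicit size in their constructors. Below is a minimal sketch of the intro2 example with a larger buffer; the class name intro2big and the 8192-byte figure are arbitrary choices made here for illustration, so measure on your own system before settling on a size.

import java.io.*;

public class intro2big {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            // the second constructor argument sets the internal buffer size
            BufferedInputStream bis = new BufferedInputStream(fis, 8192);
            int cnt = 0;
            int b;
            while ((b = bis.read()) != -1) {
                if (b == '\n')
                    cnt++;
            }
            bis.close();
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}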
Method 4: Buffering the whole file

An extreme case of buffering is to determine the length of a file in advance and then read the whole thing in at once:

import java.io.*;

public class readfile {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            int len = (int)(new File(args[0]).length());
            FileInputStream fis = new FileInputStream(args[0]);
            byte buf[] = new byte[len];
            fis.read(buf);
            fis.close();
            int cnt = 0;
            for (int i = 0; i < len; i++) {
                if (buf[i] == '\n')
                    cnt++;
            }
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

This approach is very convenient, in that the file is treated simply as an array of bytes. But there is an obvious problem: there may not be enough memory to read in a very large file all at once.
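A further caveat, not addressed in the example above, is that a single call to FileInputStream.read(buf) is not guaranteed to fill the array; it may return after reading fewer bytes. The sketch below (the class name readfull and the readFully helper are invented here for illustration) loops until the whole array has been filled; DataInputStream also provides a readFully method that does the same job.

import java.io.*;

public class readfull {
    // Read exactly buf.length bytes, looping because a single
    // read call may return fewer bytes than requested.
    static void readFully(InputStream in, byte buf[]) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n == -1)
                throw new EOFException("unexpected end of file");
            off += n;
        }
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            int len = (int)(new File(args[0]).length());
            FileInputStream fis = new FileInputStream(args[0]);
            byte buf[] = new byte[len];
            readFully(fis, buf);
            fis.close();
            System.out.println(buf.length + " bytes read");
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}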
Another aspect of buffering concerns text output to a terminal window. By default, System.out (a PrintStream) is line buffered, meaning that the output buffer is flushed each time a newline character is encountered. This matters for interactivity, where you would like an input prompt to be displayed before you actually type any input.

Method 5: Turning off line buffering

Line buffering can be disabled, as in this example:

import java.io.*;

public class bufout {
    public static void main(String args[]) {
        FileOutputStream fdout =
            new FileOutputStream(FileDescriptor.out);
        BufferedOutputStream bos =
            new BufferedOutputStream(fdout, 1024);
        PrintStream ps =
            new PrintStream(bos, false);

        System.setOut(ps);

        final int N = 100000;
        for (int i = 1; i <= N; i++)
            System.out.println(i);

        ps.close();
    }
}

This program writes the integers 1 to 100,000 to standard output, and runs about three times faster than it does with the default line buffering.

Buffering is also an important part of one of the examples presented below, where a buffer is used to speed up random file access.

Reading and Writing Text Files

It was mentioned earlier that the cost of the method calls made when reading characters from a file can be significant. The problem shows up again in another example, a program that counts the number of lines in a text file:

import java.io.*;

public class line1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            int cnt = 0;
            while (dis.readLine() != null)
                cnt++;
            dis.close();
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

This program uses the old DataInputStream.readLine method, which is implemented with read calls that fetch one character at a time. A newer approach is:

import java.io.*;

public class line2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            int cnt = 0;
            while (br.readLine() != null)
                cnt++;
            br.close();
            System.out.println(cnt);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

This approach is faster. For example, on a 6 MB text file with 200,000 lines, the second program is around 20% faster than the first. But even if it were no faster, the first program has an important problem worth noting: it draws a deprecation warning from the Java™ 2 compiler, because DataInputStream.readLine is obsolete. It does not convert bytes to characters properly, and so it may be an inappropriate choice for manipulating text files containing non-ASCII characters. (The Java language uses the Unicode character set rather than ASCII.)

This is where the distinction between byte streams and character streams mentioned earlier comes in. A program like this one:

import java.io.*;

public class conv1 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("out1");
            PrintStream ps = new PrintStream(fos);
            ps.println("\uffff\u4321\u1234");
            ps.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

writes a file, but without preserving the actual Unicode characters in the output. The Reader/Writer I/O classes are character based and are designed to deal with this issue; OutputStreamWriter is where the encoding of characters into bytes is applied. A program that writes Unicode characters using a PrintWriter looks like this:

import java.io.*;

public class conv2 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("out2");
            OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF8");
            PrintWriter pw = new PrintWriter(osw);
            pw.println("\uffff\u4321\u1234");
            pw.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

This program uses the UTF8 encoding, which has the property that ASCII text is written as itself, while other characters take two or three bytes.
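To read such a file back, the byte-to-character decoding has to be specified in the same way. The following sketch (a hypothetical companion program, here named conv3, that is not part of the original examples) uses an InputStreamReader with the UTF8 encoding to recover the characters written by conv2; it prints their values in hex simply to make the round trip easy to verify.

import java.io.*;

public class conv3 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream("out2");
            // InputStreamReader decodes bytes back into characters
            // using the named encoding (UTF8 here, to match conv2)
            InputStreamReader isr = new InputStreamReader(fis, "UTF8");
            BufferedReader br = new BufferedReader(isr);
            String ln;
            while ((ln = br.readLine()) != null) {
                for (int i = 0; i < ln.length(); i++)
                    System.out.println(Integer.toHexString(ln.charAt(i)));
            }
            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}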
Formatting Costs

Actually writing data to a file is only part of the cost of output. Another significant cost is data formatting. Consider a three-part example that writes lines like this one:

    The square of 5 is 25

Method 1

The first approach simply writes out a fixed string, to get a sense of the intrinsic I/O overhead:

public class format1 {
    public static void main(String args[]) {
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = "The square of 5 is 25\n";
            System.out.print(s);
        }
    }
}

Method 2

The second approach uses simple "+" formatting:

public class format2 {
    public static void main(String args[]) {
        int n = 5;
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = "The square of " + n + " is " + n * n + "\n";
            System.out.print(s);
        }
    }
}

Method 3

The third approach uses the MessageFormat class from the java.text package:

import java.text.*;

public class format3 {
    public static void main(String args[]) {
        MessageFormat fmt =
            new MessageFormat("The square of {0} is {1}\n");
        Object values[] = new Object[2];
        int n = 5;
        values[0] = new Integer(n);
        values[1] = new Integer(n * n);
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = fmt.format(values);
            System.out.print(s);
        }
    }
}

These programs produce identical output. The running times are:

    format1    1.3
    format2    1.8
    format3    7.8

or about 6 to 1 between the slowest and fastest methods. The third method would be slower still if the format were not precompiled, that is, if the static convenience method MessageFormat.format(String, Object[]) were used instead:

Method 4

import java.text.*;

public class format4 {
    public static void main(String args[]) {
        String fmt = "The square of {0} is {1}\n";
        Object values[] = new Object[2];
        int n = 5;
        values[0] = new Integer(n);
        values[1] = new Integer(n * n);
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = MessageFormat.format(fmt, values);
            System.out.print(s);
        }
    }
}

This takes about a third longer than the previous example. The fact that the third method is much slower than the first two does not mean you should avoid it, but you do need to be aware of the cost in time. Message formats are important in internationalization, and an application that cares about this issue typically reads its formats from a resource bundle and then uses them.
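As a rough illustration of that last point, the sketch below pulls the pattern out of a ResourceBundle and compiles it into a MessageFormat once, before the loop. The bundle base name "Messages" and the key "square" are made up for this example; a real application would supply its own properties files, for instance a Messages.properties containing the line square=The square of {0} is {1}.

import java.text.*;
import java.util.*;

public class format5 {
    public static void main(String args[]) {
        // "Messages" and "square" are hypothetical names; the bundle
        // is looked up for the default locale (e.g. Messages.properties)
        ResourceBundle bundle = ResourceBundle.getBundle("Messages");
        MessageFormat fmt = new MessageFormat(bundle.getString("square"));

        Object values[] = new Object[2];
        int n = 5;
        values[0] = new Integer(n);
        values[1] = new Integer(n * n);

        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = fmt.format(values);
            System.out.println(s);
        }
    }
}

Compiling the format once keeps the per-iteration cost close to that of format3 while leaving the message text translatable.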
Random Access

RandomAccessFile is a class for doing random file I/O at the byte level. The class provides a seek method, similar to the one found in C/C++, that moves the file pointer to an arbitrary position, after which bytes can be read or written at that position.

The seek method accesses the underlying runtime system, and as such it tends to be expensive. A cheaper alternative is to set up your own buffering on top of a RandomAccessFile and implement a read method for bytes directly. The parameter to read is the byte offset (>= 0) of the desired byte. An example looks like this:

import java.io.*;

public class ReadRandom {
    private static final int DEFAULT_BUFSIZE = 4096;

    private RandomAccessFile raf;
    private byte inbuf[];
    private long startpos = -1;
    private long endpos = -1;
    private int bufsize;

    public ReadRandom(String name) throws FileNotFoundException {
        this(name, DEFAULT_BUFSIZE);
    }

    public ReadRandom(String name, int b) throws FileNotFoundException {
        raf = new RandomAccessFile(name, "r");
        bufsize = b;
        inbuf = new byte[bufsize];
    }

    public int read(long pos) {
        if (pos < startpos || pos > endpos) {
            long blockstart = (pos / bufsize) * bufsize;
            int n;
            try {
                raf.seek(blockstart);
                n = raf.read(inbuf);
            }
            catch (IOException e) {
                return -1;
            }
            startpos = blockstart;
            endpos = blockstart + n - 1;
            if (pos < startpos || pos > endpos)
                return -1;
        }
        return inbuf[(int)(pos - startpos)] & 0xffff;
    }

    public void close() throws IOException {
        raf.close();
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            ReadRandom rr = new ReadRandom(args[0]);
            long pos = 0;
            int c;
            byte buf[] = new byte[1];
            while ((c = rr.read(pos)) != -1) {
                pos++;
                buf[0] = (byte)c;
                System.out.write(buf, 0, 1);
            }
            rr.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

The driver program simply reads the bytes in sequence and writes them out. This technique is helpful when you have locality of access, that is, when nearby bytes in the file are read at about the same time. For example, if you are implementing a binary search scheme over a sorted file, this approach may be useful. It is of less value if you are doing truly random access at arbitrary points in a huge file.
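To make the binary-search remark concrete, here is a minimal sketch that is not part of the original examples. It assumes a file of 4-byte big-endian integers sorted in ascending order, such as one written with DataOutputStream.writeInt; the class name FileSearch is invented here. Because it seeks directly to the middle record at each step, only a handful of records are ever touched.

import java.io.*;

public class FileSearch {
    // Binary search over a file of sorted 4-byte big-endian ints.
    // Returns the record index of key, or -1 if not found.
    static long search(RandomAccessFile raf, int key) throws IOException {
        long lo = 0;
        long hi = raf.length() / 4 - 1;
        while (lo <= hi) {
            long mid = (lo + hi) / 2;
            raf.seek(mid * 4);          // jump straight to record mid
            int val = raf.readInt();
            if (val == key)
                return mid;
            else if (val < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("usage: FileSearch file key");
            System.exit(1);
        }
        try {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            System.out.println(search(raf, Integer.parseInt(args[1])));
            raf.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

For repeated lookups against the same file, the same idea could sit on top of the ReadRandom buffer above, since successive probes near the end of a search touch bytes that are close together.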
Compression

Java provides classes for compressing and uncompressing byte streams. They are found in the java.util.zip package, and they also serve as the basis for JAR files (a JAR file is a ZIP file with an added manifest). The following program takes a single input file and writes it to a compressed ZIP file:

import java.io.*;
import java.util.zip.*;

public class compress {
    public static void doit(String filein, String fileout) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(filein);
            fos = new FileOutputStream(fileout);
            ZipOutputStream zos = new ZipOutputStream(fos);
            ZipEntry ze = new ZipEntry(filein);
            zos.putNextEntry(ze);
            final int BUFSIZ = 4096;
            byte inbuf[] = new byte[BUFSIZ];
            int n;
            while ((n = fis.read(inbuf)) != -1)
                zos.write(inbuf, 0, n);
            fis.close();
            fis = null;
            zos.close();
            fos = null;
        }
        catch (IOException e) {
            System.err.println(e);
        }
        finally {
            try {
                if (fis != null)
                    fis.close();
                if (fos != null)
                    fos.close();
            }
            catch (IOException e) {
            }
        }
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }
        if (args[0].equals(args[1])) {
            System.err.println("filenames are identical");
            System.exit(1);
        }
        doit(args[0], args[1]);
    }
}

The next program performs the opposite process, taking what is assumed to be a ZIP file containing a single entry and uncompressing it into an output file:

import java.io.*;
import java.util.zip.*;

public class uncompress {
    public static void doit(String filein, String fileout) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(filein);
            fos = new FileOutputStream(fileout);
            ZipInputStream zis = new ZipInputStream(fis);
            ZipEntry ze = zis.getNextEntry();
            final int BUFSIZ = 4096;
            byte inbuf[] = new byte[BUFSIZ];
            int n;
            while ((n = zis.read(inbuf, 0, BUFSIZ)) != -1)
                fos.write(inbuf, 0, n);
            zis.close();
            fis = null;
            fos.close();
            fos = null;
        }
        catch (IOException e) {
            System.err.println(e);
        }
        finally {
            try {
                if (fis != null)
                    fis.close();
                if (fos != null)
                    fos.close();
            }
            catch (IOException e) {
            }
        }
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }
        if (args[0].equals(args[1])) {
            System.err.println("filenames are identical");
            System.exit(1);
        }
        doit(args[0], args[1]);
    }
}

Whether compression helps or hurts I/O performance depends a great deal on your hardware configuration, in particular the relative speeds of the processor and the disk drive. Compression using ZIP technology typically means a reduction of around 50% in data size, at the cost of the time spent compressing and decompressing. A large (5 to 10 MB) compressed text file, on a 300-MHz Pentium PC with an IDE hard drive, could be read from the hard disk in about one third of the time required for the uncompressed original.

A useful example of where compression pays off is writing data to very slow media such as floppy disks. Using a fast processor (a 300-MHz Pentium) and a slow floppy drive (a conventional floppy drive on a PC), compressing a large text file and then writing it to the floppy was about 50% faster than writing the uncompressed original.

Caching

A detailed discussion of hardware caching is beyond the scope of this article. But in some cases software caching can be used to speed up I/O. Consider a case where you need to read the lines of a text file in random order. One way to do this is to read in all the lines and store them in an ArrayList (a collection class similar to Vector):

import java.io.*;
import java.util.ArrayList;

public class LineCache {
    private ArrayList list = new ArrayList();

    public LineCache(String fn) throws IOException {
        FileReader fr = new FileReader(fn);
        BufferedReader br = new BufferedReader(fr);
        String ln;
        while ((ln = br.readLine()) != null)
            list.add(ln);
        br.close();
    }

    public String getLine(int n) {
        if (n < 0)
            throw new IllegalArgumentException();
        return (n < list.size() ? (String)list.get(n) : null);
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            LineCache lc = new LineCache(args[0]);
            int i = 0;
            String ln;
            while ((ln = lc.getLine(i++)) != null)
                System.out.println(ln);
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

The getLine method is then used to retrieve arbitrary lines. This technique is quite useful, but it clearly uses a lot of memory for large files, and so has limitations. An alternative is to remember only, say, the 100 most recently requested lines, and read any other requested line directly from disk. This arrangement works well when there is locality of access, but not so well for truly random access.
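One way to sketch that bounded-cache alternative is with a LinkedHashMap kept in access order, which can evict its eldest entry once a size limit is reached. Everything below is illustrative rather than part of the original example: the class name, the 100-line limit, and the deliberately naive miss path that rescans the file (a real program would keep an index of line offsets and seek instead). LinkedHashMap itself was added to the platform after the release this article targets.

import java.io.*;
import java.util.*;

public class BoundedLineCache {
    private static final int MAX_LINES = 100;   // illustrative limit

    private String filename;

    // access-ordered map that drops the least recently used line
    // once more than MAX_LINES entries are cached
    private LinkedHashMap cache = new LinkedHashMap(16, 0.75f, true) {
        protected boolean removeEldestEntry(Map.Entry eldest) {
            return size() > MAX_LINES;
        }
    };

    public BoundedLineCache(String fn) {
        filename = fn;
    }

    public String getLine(int n) throws IOException {
        if (n < 0)
            throw new IllegalArgumentException();
        Integer key = new Integer(n);
        String ln = (String)cache.get(key);
        if (ln == null) {
            ln = readFromDisk(n);           // cache miss: go to the file
            if (ln != null)
                cache.put(key, ln);
        }
        return ln;
    }

    // Naive miss path: rescan the file up to line n.
    private String readFromDisk(int n) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(filename));
        String ln = null;
        for (int i = 0; i <= n; i++) {
            ln = br.readLine();
            if (ln == null)
                break;
        }
        br.close();
        return ln;
    }

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            BoundedLineCache lc = new BoundedLineCache(args[0]);
            System.out.println(lc.getLine(5));
            System.out.println(lc.getLine(5));  // second request hits the cache
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}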
Tokenization

Tokenization refers to the process of breaking a sequence of bytes or characters into logical chunks such as words. Java offers a StreamTokenizer class, used like this:

import java.io.*;

public class token1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            StreamTokenizer st = new StreamTokenizer(br);
            st.resetSyntax();
            st.wordChars('a', 'z');
            int tok;
            while ((tok = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (tok == StreamTokenizer.TT_WORD)
                    ;   // st.sval has the token
            }
            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

This example tokenizes lowercase words (sequences of the letters a-z). If you implement the equivalent functionality yourself, it might look like this:

import java.io.*;

public class token2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            int maxlen = 256;
            int currlen = 0;
            char wordbuf[] = new char[maxlen];
            int c;
            do {
                c = br.read();
                if (c >= 'a' && c <= 'z') {
                    if (currlen == maxlen) {
                        maxlen *= 1.5;
                        char xbuf[] = new char[maxlen];
                        System.arraycopy(wordbuf, 0, xbuf, 0, currlen);
                        wordbuf = xbuf;
                    }
                    wordbuf[currlen++] = (char)c;
                }
                else if (currlen > 0) {
                    String s = new String(wordbuf, 0, currlen);
                    // do something with s
                    currlen = 0;
                }
            } while (c != -1);
            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

The second program runs about 20% faster than the first, at the cost of writing some fiddly low-level code. StreamTokenizer is something of a hybrid class: it reads from character streams (such as a BufferedReader), but operates in terms of bytes, treating all characters with values greater than 0xff as though they were alphabetic, even when they are not.

Serialization

Serialization converts arbitrary Java data structures into byte streams, using a standard format. For example, the following program writes out an array of random integers:

import java.io.*;
import java.util.*;

public class serial1 {
    public static void main(String args[]) {
        ArrayList al = new ArrayList();
        Random rn = new Random();
        final int N = 100000;
        for (int i = 1; i <= N; i++)
            al.add(new Integer(rn.nextInt()));
        try {
            FileOutputStream fos = new FileOutputStream("test.ser");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(al);
            oos.close();
        }
        catch (Throwable e) {
            System.err.println(e);
        }
    }
}

and this program reads the array back in:

import java.io.*;
import java.util.*;

public class serial2 {
    public static void main(String args[]) {
        ArrayList al = null;
        try {
            FileInputStream fis = new FileInputStream("test.ser");
            BufferedInputStream bis = new BufferedInputStream(fis);
            ObjectInputStream ois = new ObjectInputStream(bis);
            al = (ArrayList)ois.readObject();
            ois.close();
        }
        catch (Throwable e) {
            System.err.println(e);
        }
    }
}

Note that buffering is used to speed up the I/O operations. Is there a faster way than serialization to write out large volumes of data and then read them back? Probably not, except in special cases. For example, suppose you decide to write a 64-bit long integer as text rather than as a set of 8 bytes. The maximum length of a long written as text is around 20 characters, or 2.5 times as long as the binary representation, so this format is unlikely to be any faster. In some cases, however, such as bitmaps, a special-purpose format may be an improvement. But using your own scheme rather than serialization involves some tradeoffs.

Besides the actual I/O and formatting costs of serialization (done using DataInputStream and DataOutputStream), there are other overheads, such as the need to create new objects when a serialized structure is restored.

Note that the methods of DataOutputStream can also be used to develop semi-custom data formats, for example:

import java.io.*;
import java.util.*;

public class binary1 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("outdata");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            DataOutputStream dos = new DataOutputStream(bos);
            Random rn = new Random();
            final int N = 10;
            dos.writeInt(N);
            for (int i = 1; i <= N; i++) {
                int r = rn.nextInt();
                System.out.println(r);
                dos.writeInt(r);
            }
            dos.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

together with:

import java.io.*;

public class binary2 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream("outdata");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            int n = dis.readInt();
            for (int i = 1; i <= n; i++) {
                int r = dis.readInt();
                System.out.println(r);
            }
            dis.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }
    }
}

These programs write 10 integers to a file and then read them back.

Obtaining File Information

So far the discussion has centered on input and output for individual files. But there is another aspect of speeding up I/O: obtaining information about file characteristics. For example, consider a small program that prints the length of a file:

import java.io.*;

public class length1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        File f = new File(args[0]);
        long len = f.length();
        System.out.println(len);
    }
}

The Java runtime itself does not know the length of the file, and so it must query the underlying operating system to obtain this information. The same holds for other file information, such as whether a file is a directory, the time it was last modified, and so on. The File class in the java.io package provides a set of methods for querying this information. The methods are generally expensive in time, so they should be called as few times as possible.

A longer example of querying file information is the following program, which recursively walks the roots of the file system and prints every file path:

import java.io.*;

public class roots {
    public static void visit(File f) {
        System.out.println(f);
    }

    public static void walk(File f) {
        visit(f);
        if (f.isDirectory()) {
            String list[] = f.list();
            for (int i = 0; i < list.length; i++)
                walk(new File(f, list[i]));
        }
    }

    public static void main(String args[]) {
        File list[] = File.listRoots();
        for (int i = 0; i < list.length; i++) {
            if (list[i].exists())
                walk(list[i]);
            else
                System.err.println("not accessible: " + list[i]);
        }
    }
}

This example uses File methods such as isDirectory and exists to navigate the directory structure. Each file is queried exactly once for its type (ordinary file or directory).

Further Information

The papers JDC Performance Tips and Firewall Tunneling Techniques discuss some general methods for improving the performance of Java applications. Some of the issues mentioned above are covered there, along with other lower-level topics. Don Knuth's book The Art of Computer Programming, Volume 3, discusses sorting and searching algorithms, such as the use of B-trees.

About the Author

Glen McCluskey has focused on programming languages since 1988. He consults in the areas of Java and C++ performance, testing, and technical documentation.