Article source: http://www.blogjava.net/gf7/28140.aspx
This article discusses and illustrates a variety of techniques for improving Java I/O performance. Most of the techniques center on tuning disk file I/O, but some apply equally to network I/O and window output. The first set of techniques covers low-level I/O issues, followed by higher-level issues such as compression, formatting, and serialization. The discussion does not cover application design issues, such as search algorithms and data structures, nor system-level issues such as file caching.

When discussing Java I/O, it is worth noting that the Java language takes two distinct views of disk files: one based on byte streams, the other on character sequences. A character in the Java language is two bytes, rather than one byte as in a typical language such as C. Because of this, some kind of conversion is required when characters are read from a file. This distinction is important in some situations, as several of the examples will show.

Topics covered:

Low-Level I/O Issues
- Basic rules for speeding up I/O
- Buffering
- Reading and writing text files
- Formatting costs
- Random access

Advanced I/O Issues
- Compression
- Caching
- Tokenization
- Serialization
- Obtaining file information
- Further information

Basic Rules for Speeding Up I/O

As a starting point for this discussion, here are several basic rules for speeding up I/O:

1. Avoid accessing the disk.
2. Avoid accessing the underlying operating system.
3. Avoid method calls.
4. Avoid processing bytes and characters individually.

Obviously these rules cannot be followed in all cases, because if they could, no actual I/O would ever get done. But consider the following three ways of counting the newline characters ('\n') in a file.

Method 1: read method

The first approach simply uses the read method of FileInputStream:

import java.io.*;

public class intro1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            int cnt = 0;
            int b;
            while ((b = fis.read()) != -1) {
                if (b == '\n')
                    cnt++;
            }
            fis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

However, this approach triggers a large number of calls into the underlying runtime system: FileInputStream.read is invoked once for every single byte of the file.
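To make the per-byte cost of this pattern directly observable, here is a self-contained sketch (not from the original article): it creates its own temporary file, so the newline count can be checked, and times the byte-at-a-time read loop. The file contents and sizes are this sketch's own assumptions.

```java
import java.io.*;

public class ReadCost {
    // Count '\n' bytes by calling read() once per byte, as in the
    // intro1 example, against a temp file this sketch creates itself.
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("readcost", ".txt");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        for (int i = 0; i < 1000; i++)
            out.write("hello world\n".getBytes());
        out.close();

        FileInputStream fis = new FileInputStream(f);
        int cnt = 0, b;
        long t0 = System.nanoTime();
        while ((b = fis.read()) != -1)   // one underlying call per byte
            if (b == '\n') cnt++;
        long elapsed = System.nanoTime() - t0;
        fis.close();
        System.out.println(cnt + " lines, " + elapsed + " ns unbuffered");
    }
}
```

Even on this small 12 KB file the loop makes 12,000 read calls; the timing printed at the end is what the buffered variants in the next sections reduce.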
Method 2: Using a big buffer

The second approach avoids the problem above by using a big buffer:

import java.io.*;

public class intro2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            BufferedInputStream bis = new BufferedInputStream(fis);
            int cnt = 0;
            int b;
            while ((b = bis.read()) != -1) {
                if (b == '\n')
                    cnt++;
            }
            bis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

BufferedInputStream.read takes the next byte from an input buffer, and only rarely accesses the underlying system.

Method 3: Doing your own buffering

The third approach avoids BufferedInputStream and does its own buffering, thereby eliminating the read method calls:

import java.io.*;

public class intro3 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            byte buf[] = new byte[2048];
            int cnt = 0;
            int n;
            while ((n = fis.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n')
                        cnt++;
                }
            }
            fis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

The third approach eliminates the per-byte method call overhead and is typically the fastest of the three, but method 2 is probably the "right" choice for most applications.

Buffering

Methods 2 and 3 use the technique of buffering, in which large chunks of a file are read from disk and then accessed one byte or character at a time. Buffering is a basic and important technique for speeding up I/O, and several classes support it (BufferedInputStream for bytes, BufferedReader for characters).

An obvious question is: will making the buffer bigger make I/O still faster? Java buffers typically default to 1024 or 2048 bytes in length. A larger buffer may speed up I/O, but usually only by a small proportion, around 5 to 10%.

Method 4: Whole-file buffering

The extreme case of buffering is to determine the length of the file in advance, and then read the whole file at once:

import java.io.*;

public class readfile {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            int len = (int)(new File(args[0]).length());
            FileInputStream fis = new FileInputStream(args[0]);
            byte buf[] = new byte[len];
            fis.read(buf);
            fis.close();
            int cnt = 0;
            for (int i = 0; i < len; i++) {
                if (buf[i] == '\n')
                    cnt++;
            }
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This is convenient, in that the file can then be treated as an array of bytes, but there may not be enough memory to hold a very large file.

Method 5: Turning buffering off

One aspect of buffering concerns output to a terminal window, where output is line buffered by default. Line buffering can be disabled, as in this example:

import java.io.*;

public class bufout {
    public static void main(String args[]) {
        FileOutputStream fdout = new FileOutputStream(FileDescriptor.out);
        BufferedOutputStream bos = new BufferedOutputStream(fdout, 1024);
        PrintStream ps = new PrintStream(bos, false);
        System.setOut(ps);
        final int N = 100000;
        for (int i = 1; i <= N; i++)
            System.out.println(i);
        ps.close();
    }
}

This program writes the integers 1 to 100000 to the default output, and runs about three times faster than with the default line buffering.

Buffering is also an important part of one of the examples shown later, in which a buffer is used to speed up random file access.

Reading and Writing Text Files

It was mentioned earlier that the overhead of method calls when reading characters from a file can be significant. The problem can be seen in another example, one that counts the number of lines in a text file:

import java.io.*;

public class line1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileInputStream fis = new FileInputStream(args[0]);
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            int cnt = 0;
            while (dis.readLine() != null)
                cnt++;
            dis.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This program uses the old DataInputStream.readLine method, which is implemented using read to obtain each character. A newer approach is:

import java.io.*;

public class line2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            int cnt = 0;
            while (br.readLine() != null)
                cnt++;
            br.close();
            System.out.println(cnt);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This approach is faster. For example, on a 6 MB text file with 200,000 lines, the second program is about 20% faster than the first. But even if it were no faster, there is an important point to note about the first program: it elicits a deprecation warning from the Java 2 compiler, because DataInputStream.readLine is obsolete. It does not properly convert bytes to characters, and would be an inappropriate choice for manipulating text files containing non-ASCII characters. (The Java language uses the Unicode character set rather than ASCII.)

This is where the distinction between byte streams and character streams mentioned earlier comes into play. A program like this:

import java.io.*;

public class conv1 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("out1");
            PrintStream ps = new PrintStream(fos);
            ps.println("\uffff\u4321\u1234");
            ps.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

writes to a file, but the actual Unicode characters are not output. The Reader/Writer I/O classes are character-based, and are designed to resolve this issue. OutputStreamWriter is where the encoding of characters into bytes is applied. A program that writes Unicode characters using PrintWriter looks like this:

import java.io.*;

public class conv2 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("out2");
            OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF8");
            PrintWriter pw = new PrintWriter(osw);
            pw.println("\uffff\u4321\u1234");
            pw.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This program uses the UTF-8 encoding, which has the property that ASCII text is written as itself, while other characters take two or three bytes.
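To see that the encoding is actually applied both ways, here is a self-contained round-trip sketch (not from the original article): it writes non-ASCII characters through an OutputStreamWriter in UTF-8, reads them back through the matching InputStreamReader, and checks that they survive intact. The temp-file scaffolding and sample characters are this sketch's own assumptions.

```java
import java.io.*;

public class ConvRoundTrip {
    // Write Unicode characters with OutputStreamWriter in UTF-8, read
    // them back with InputStreamReader, and verify the round trip.
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("conv", ".txt");
        f.deleteOnExit();
        String s = "\u4321\u1234 plain ASCII";

        Writer w = new OutputStreamWriter(new FileOutputStream(f), "UTF8");
        w.write(s);
        w.close();

        Reader r = new InputStreamReader(new FileInputStream(f), "UTF8");
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1)
            sb.append((char) c);
        r.close();

        System.out.println(s.equals(sb.toString()) ? "round trip ok" : "mismatch");
    }
}
```

The key design point is symmetry: the same charset name must be given to both the writer and the reader, otherwise the non-ASCII characters are corrupted silently.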
Formatting Costs

Actually writing data to a file is only part of the cost of output. Another significant cost is data formatting. Consider a three-part example that writes lines like:

The square of 5 is 25

Method 1

The first approach simply writes out a fixed string, to get an idea of the intrinsic I/O overhead:

public class format1 {
    public static void main(String args[]) {
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = "The square of 5 is 25\n";
            System.out.print(s);
        }
    }
}

Method 2

The second approach uses simple formatting with "+":

public class format2 {
    public static void main(String args[]) {
        int n = 5;
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = "The square of " + n + " is " + n * n + "\n";
            System.out.print(s);
        }
    }
}

Method 3

The third approach uses the MessageFormat class from the java.text package:

import java.text.*;

public class format3 {
    public static void main(String args[]) {
        MessageFormat fmt = new MessageFormat("The square of {0} is {1}\n");
        Object values[] = new Object[2];
        int n = 5;
        values[0] = new Integer(n);
        values[1] = new Integer(n * n);
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = fmt.format(values);
            System.out.print(s);
        }
    }
}

These programs produce identical output. The running times are:

format1   1.3
format2   1.8
format3   7.8

That is, the ratio between the slowest and the fastest is about 6 to 1. The third approach is even slower if the format is not precompiled, that is, if the static convenience method is used instead:

Method 4: MessageFormat.format(String, Object[])

import java.text.*;

public class format4 {
    public static void main(String args[]) {
        String fmt = "The square of {0} is {1}\n";
        Object values[] = new Object[2];
        int n = 5;
        values[0] = new Integer(n);
        values[1] = new Integer(n * n);
        final int COUNT = 25000;
        for (int i = 1; i <= COUNT; i++) {
            String s = MessageFormat.format(fmt, values);
            System.out.print(s);
        }
    }
}

This takes about 1/3 longer than the previous example.
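Since the point of the comparison is that the methods differ only in cost, not in output, here is a small consistency check (not from the original article) confirming that simple "+" concatenation and a compiled MessageFormat produce the same line for this pattern:

```java
import java.text.MessageFormat;

public class FormatCheck {
    // Confirm that concatenation and MessageFormat agree for the
    // article's example line, so the choice between them is purely
    // a speed/flexibility trade-off.
    public static void main(String[] args) {
        int n = 5;
        String byConcat = "The square of " + n + " is " + (n * n) + "\n";

        MessageFormat fmt = new MessageFormat("The square of {0} is {1}\n");
        String byFormat = fmt.format(new Object[] { n, n * n });

        System.out.print(byConcat);
        System.out.println(byConcat.equals(byFormat) ? "identical" : "different");
    }
}
```

Note one caveat of MessageFormat worth knowing: numeric arguments are run through a NumberFormat, so large values can pick up locale-specific grouping separators (for example "100,000") that plain concatenation would not produce; for the small values here the outputs match exactly.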
The fact that the third approach is slower than the first two doesn't mean you shouldn't use it; it means you need to be aware of the cost in time. Message formats are important in internationalization, and an application concerned with this issue would typically read a format from a resource bundle and then use it.

Random Access

RandomAccessFile is a class for doing random file I/O at the byte level. The class provides a seek method, similar to that found in C/C++, that moves the file pointer to an arbitrary position, after which bytes can be read or written at that position. The seek method accesses the underlying runtime system, and is therefore expensive. A cheaper alternative is to set up your own buffering on top of a RandomAccessFile, and implement a read method for bytes directly. The parameter to read is a byte offset (>= 0) into the file. An example of how this is done is (the body of the read method, which was truncated in the posted copy, is reconstructed here from the surrounding description):

import java.io.*;

public class ReadRandom {
    private static final int DEFAULT_BUFSIZE = 4096;

    private RandomAccessFile raf;
    private byte inbuf[];
    private long startpos = -1;
    private long endpos = -1;
    private int bufsize;

    public ReadRandom(String name) throws FileNotFoundException {
        this(name, DEFAULT_BUFSIZE);
    }

    public ReadRandom(String name, int b) throws FileNotFoundException {
        raf = new RandomAccessFile(name, "r");
        bufsize = b;
        inbuf = new byte[bufsize];
    }

    public int read(long pos) {
        // refill the buffer if pos falls outside the buffered block
        if (pos < startpos || pos > endpos) {
            long blockstart = (pos / bufsize) * bufsize;
            int n;
            try {
                raf.seek(blockstart);
                n = raf.read(inbuf);
            } catch (IOException e) {
                return -1;
            }
            startpos = blockstart;
            endpos = blockstart + n - 1;
            if (pos < startpos || pos > endpos)
                return -1;
        }
        return inbuf[(int)(pos - startpos)] & 0xff;
    }
}

This technique is useful if you have locality of access, where nearby bytes in the file are read at about the same time. For example, if you are implementing a binary search on a sorted file, this approach may be helpful. It is of less value if you are doing truly random access at arbitrary points in a huge file.

Compression

Java provides classes for compressing and uncompressing byte streams. These are found in the java.util.zip package, and they also serve as the basis for JAR files (a JAR file is a ZIP file with an added manifest). The following program takes a single input file and writes it to a compressed ZIP file:

import java.io.*;
import java.util.zip.*;

public class compress {
    public static void doit(String filein, String fileout) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(filein);
            fos = new FileOutputStream(fileout);
            ZipOutputStream zos = new ZipOutputStream(fos);
            ZipEntry ze = new ZipEntry(filein);
            zos.putNextEntry(ze);
            final int BUFSIZ = 4096;
            byte inbuf[] = new byte[BUFSIZ];
            int n;
            while ((n = fis.read(inbuf)) != -1)
                zos.write(inbuf, 0, n);
            fis.close();
            fis = null;
            zos.close();
            fos = null;
        } catch (IOException e) {
            System.err.println(e);
        } finally {
            try {
                if (fis != null)
                    fis.close();
                if (fos != null)
                    fos.close();
            } catch (IOException e) {
            }
        }
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }
        if (args[0].equals(args[1])) {
            System.err.println("filenames are identical");
            System.exit(1);
        }
        doit(args[0], args[1]);
    }
}

The next program does the reverse, taking a ZIP file containing a single entry as input and extracting it to an output file:

import java.io.*;
import java.util.zip.*;

public class uncompress {
    public static void doit(String filein, String fileout) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(filein);
            fos = new FileOutputStream(fileout);
            ZipInputStream zis = new ZipInputStream(fis);
            ZipEntry ze = zis.getNextEntry();
            final int BUFSIZ = 4096;
            byte inbuf[] = new byte[BUFSIZ];
            int n;
            while ((n = zis.read(inbuf, 0, BUFSIZ)) != -1)
                fos.write(inbuf, 0, n);
            zis.close();
            fis = null;
            fos.close();
            fos = null;
        } catch (IOException e) {
            System.err.println(e);
        } finally {
            try {
                if (fis != null)
                    fis.close();
                if (fos != null)
                    fos.close();
            } catch (IOException e) {
            }
        }
    }

    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("missing filenames");
            System.exit(1);
        }
        if (args[0].equals(args[1])) {
            System.err.println("filenames are identical");
            System.exit(1);
        }
        doit(args[0], args[1]);
    }
}
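The same ZipOutputStream/ZipInputStream pairing can be exercised entirely in memory, which makes the round trip easy to verify. This is a minimal sketch (not from the original article) using a byte array in place of the files; the repetitive sample data is this sketch's own assumption, chosen so the compression is visible:

```java
import java.io.*;
import java.util.Arrays;
import java.util.zip.*;

public class ZipRoundTrip {
    // Compress a byte array into a single-entry ZIP stream, decompress
    // it again, and verify the contents and report the size reduction.
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];
        for (int i = 0; i < data.length; i++)
            data[i] = (byte) ('a' + i % 4);   // repetitive, compresses well

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bos);
        zos.putNextEntry(new ZipEntry("data"));
        zos.write(data);
        zos.close();
        byte[] packed = bos.toByteArray();

        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(packed));
        zis.getNextEntry();
        ByteArrayOutputStream back = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = zis.read(buf)) != -1)
            back.write(buf, 0, n);
        zis.close();

        System.out.println("compressed " + data.length + " bytes to " + packed.length);
        System.out.println(Arrays.equals(data, back.toByteArray())
            ? "contents match" : "contents differ");
    }
}
```

Testing the round trip in memory first, before pointing the same code at real files, is a cheap way to catch mistakes such as forgetting putNextEntry or closing the streams in the wrong order.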
Whether compression improves or hurts I/O performance depends heavily on your hardware configuration, in particular the relative speeds of the processor and the disk drive. Compression using ZIP technology typically implies something like a 50% reduction in data size, at the cost of time to compress and decompress. An experiment with a large (5 to 10 MB) compressed text file, using a 300-MHz Pentium PC with an IDE hard drive, showed it could be read from the hard disk in about 1/3 less time than the uncompressed version.

One useful application of compression is when writing data to a very slow medium such as a floppy disk. An experiment using a fast processor (300-MHz Pentium) and a slow floppy drive (a conventional floppy drive on a PC) showed that compressing a large text file and then writing it to the floppy disk was about 50% faster than writing it uncompressed.

Caching

A detailed discussion of hardware caching is beyond the scope of this article. But in some cases a software cache can be used to speed up I/O. Consider reading the lines of a text file in random order. One way to do this is to read in all the lines and store them in an ArrayList (a collection class similar to Vector); the body of getLine, truncated in the posted copy, is reconstructed here:

import java.io.*;
import java.util.ArrayList;

public class LineCache {
    private ArrayList list = new ArrayList();

    public LineCache(String fn) throws IOException {
        FileReader fr = new FileReader(fn);
        BufferedReader br = new BufferedReader(fr);
        String ln;
        while ((ln = br.readLine()) != null)
            list.add(ln);
        br.close();
    }

    public String getLine(int n) {
        if (n < 0)
            throw new IllegalArgumentException();
        return (n < list.size() ? (String)list.get(n) : null);
    }
}

After the constructor runs, getLine serves any requested line from memory rather than from disk.

Tokenization

Tokenization refers to the process of breaking a byte or character sequence into logical chunks, such as words. Java provides the StreamTokenizer class, which is used like this:

import java.io.*;

public class token1 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            StreamTokenizer st = new StreamTokenizer(br);
            st.resetSyntax();
            st.wordChars('a', 'z');
            int tok;
            while ((tok = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (tok == StreamTokenizer.TT_WORD)
                    ; // st.sval has token
            }
            br.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

This example tokenizes words made up of lowercase letters (a-z). If you implement the equivalent functionality yourself, it might look like this:

import java.io.*;

public class token2 {
    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("missing filename");
            System.exit(1);
        }
        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            int maxlen = 256;
            int currlen = 0;
            char wordbuf[] = new char[maxlen];
            int c;
            do {
                c = br.read();
                if (c >= 'a' && c <= 'z') {
                    if (currlen == maxlen) {
                        // grow the word buffer
                        maxlen *= 1.5;
                        char xbuf[] = new char[maxlen];
                        System.arraycopy(wordbuf, 0, xbuf, 0, currlen);
                        wordbuf = xbuf;
                    }
                    wordbuf[currlen++] = (char)c;
                } else if (currlen > 0) {
                    String s = new String(wordbuf, 0, currlen);
                    // do something with s
                    currlen = 0;
                }
            } while (c != -1);
            br.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

The second program runs about 20% faster than the first, at the price of writing some tricky low-level code. StreamTokenizer is something of a hybrid class: it reads from character streams (such as BufferedReader), but operates in terms of bytes, treating all characters with values greater than 0xFF as if they were alphabetic, even when they are not.
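The StreamTokenizer configuration above can be tried without a file at all, since any Reader will do. Here is a minimal sketch (not from the original article) that runs the same resetSyntax/wordChars setup over an in-memory StringReader and counts the word tokens:

```java
import java.io.*;

public class TokenCheck {
    // Run StreamTokenizer over an in-memory string (StringReader stands
    // in for the file) and count lowercase words, as in token1.
    public static void main(String[] args) throws IOException {
        String text = "alpha beta gamma delta";
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();           // clear the default syntax table
        st.wordChars('a', 'z');     // only lowercase letters form words

        int words = 0, tok;
        while ((tok = st.nextToken()) != StreamTokenizer.TT_EOF)
            if (tok == StreamTokenizer.TT_WORD)
                words++;
        System.out.println(words + " words");
    }
}
```

Note that after resetSyntax, the space characters are "ordinary" rather than whitespace; they come back as single-character tokens, which the TT_WORD test simply skips, so the count is still four.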
Serialization

Serialization converts arbitrary Java data structures into byte streams, using a standard format. For example, the following program writes out an array of random integers:

import java.io.*;
import java.util.*;

public class serial1 {
    public static void main(String args[]) {
        ArrayList al = new ArrayList();
        Random rn = new Random();
        final int N = 100000;
        for (int i = 1; i <= N; i++)
            al.add(new Integer(rn.nextInt()));
        try {
            FileOutputStream fos = new FileOutputStream("test.ser");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(al);
            oos.close();
        } catch (Throwable e) {
            System.err.println(e);
        }
    }
}

and the following program reads the array back in:

import java.io.*;
import java.util.*;

public class serial2 {
    public static void main(String args[]) {
        ArrayList al = null;
        try {
            FileInputStream fis = new FileInputStream("test.ser");
            BufferedInputStream bis = new BufferedInputStream(fis);
            ObjectInputStream ois = new ObjectInputStream(bis);
            al = (ArrayList)ois.readObject();
            ois.close();
        } catch (Throwable e) {
            System.err.println(e);
        }
    }
}

Note that we used buffering to speed up the I/O operations.

Is there a faster way than serialization to write out large volumes of data and then read it back? Probably not, except in special cases. For example, suppose you decide to write out a 64-bit long integer as text instead of as a set of 8 bytes. The maximum length of a long integer as text is around 20 characters, or 2.5 times as long as the binary representation, so this format isn't likely to be any faster. In some cases, however, such as bitmaps, a special format might be an improvement. But using your own scheme rather than serialization involves some trade-offs.

Besides the actual I/O and formatting costs of serialization (which uses DataInputStream and DataOutputStream underneath), there are other costs, such as the need to create new objects when a serialized structure is read back in.
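The write/read pair above can be collapsed into one verifiable program by serializing to a byte array instead of a file. This is a sketch (not from the original article) of that in-memory round trip; the list contents are this sketch's own assumption:

```java
import java.io.*;
import java.util.ArrayList;

public class SerialRoundTrip {
    // Serialize an ArrayList to a byte array via ObjectOutputStream,
    // deserialize it with ObjectInputStream, and compare the results.
    public static void main(String[] args) throws Exception {
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < 100; i++)
            list.add(i * i);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(list);
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(
            new ByteArrayInputStream(bos.toByteArray()));
        Object copy = ois.readObject();
        ois.close();

        System.out.println(list.equals(copy) ? "lists equal" : "lists differ");
    }
}
```

Because everything stays in memory, the run exercises exactly the formatting and object-creation costs the text describes, with the disk removed from the picture.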
Note that the DataOutputStream methods can also be used to develop semi-custom data formats, for example:

import java.io.*;
import java.util.*;

public class binary1 {
    public static void main(String args[]) {
        try {
            FileOutputStream fos = new FileOutputStream("outdata");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            DataOutputStream dos = new DataOutputStream(bos);
            Random rn = new Random();
            final int N = 10;
            dos.writeInt(N);
            for (int i = 1; i <= N; i++) {
                int r = rn.nextInt();
                System.out.println(r);
                dos.writeInt(r);
            }
            dos.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

and:

import java.io.*;

public class binary2 {
    public static void main(String args[]) {
        try {
            FileInputStream fis = new FileInputStream("outdata");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream dis = new DataInputStream(bis);
            int N = dis.readInt();
            for (int i = 1; i <= N; i++) {
                int r = dis.readInt();
                System.out.println(r);
            }
            dis.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

These programs write 10 integers to a file and then read them back.

Obtaining File Information

So far our discussion has centered on input and output for individual files. But there is another aspect of speeding up I/O performance: obtaining information about file characteristics.
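The posted copy breaks off here. As a small illustration of what "obtaining information about file characteristics" involves, this sketch (not from the original article) queries a few java.io.File properties on a temp file it creates itself; each such query may go through to the underlying file system, so repeated queries carry a cost of their own:

```java
import java.io.*;

public class FileInfo {
    // Query basic file characteristics via java.io.File on a temp
    // file created by this sketch (the file name is scaffolding).
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("info", ".txt");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write("12345".getBytes());
        out.close();

        System.out.println("exists: " + f.exists());
        System.out.println("length: " + f.length());
        System.out.println("canRead: " + f.canRead());
    }
}
```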