Compressing and Decompressing Data Using Java APIs
By Qusay H. Mahmoud with contributions from Konstantin Kladko
February 2002
Many sources of information contain redundant data or data that adds little to the stored information. This results in tremendous amounts of data being transferred between client and server applications or computers in general. The obvious solution to the problems of data storage and information transfer is to install additional storage devices and expand existing communication facilities. To do so, however, requires an increase in an organization's operating costs. One method to alleviate a portion of the data storage and information transfer problem is to represent data with more efficient code. This article presents a brief introduction to data compression and decompression, and shows how to compress and decompress data, efficiently and conveniently, from within your Java applications using the java.util.zip package.
While it is possible to compress and decompress data using tools such as WinZip, gzip, and Java ARchive (or jar), these tools are used as standalone applications. It is possible to invoke these tools from your Java applications, but this is neither a straightforward approach nor an efficient solution. This is especially true if you wish to compress and decompress data on the fly (before transferring it to a remote machine, for example). This article:
- Gives you a brief overview of data compression
- Describes the java.util.zip package
- Shows how to use this package to compress and decompress data
- Shows how to compress and decompress serialized objects to save disk space
- Shows how to compress and decompress data on the fly to improve the performance of client/server applications

Overview of Data Compression
The simplest type of redundancy in a file is the repetition of characters. For example, consider the following string:
BBBBHHDDXXXXKKKKWWZZZZ
This string can be encoded more compactly by replacing each repeated string of characters by a single instance of the repeated character and a number that represents the number of times it is repeated. The earlier string can be encoded as follows:
4B2H2D4X4K2W4Z
Here "4B" means four B's, and so on. Compressing a string in this way is called run-length encoding.
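As a minimal sketch of the idea (not code from the java.util.zip package), run-length encoding of a string can be written in a few lines of Java. The class and method names below are hypothetical, and the sketch assumes the input contains no digit characters so that counts and characters cannot be confused.

```java
// Run-length encoding sketch: each run becomes <count><character>.
// Assumption: the input text contains no digits, so decoding is unambiguous.
class RunLengthDemo {

    // Encodes runs of identical characters, e.g. a run of four B's becomes "4B".
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int run = 1;
            while (i + run < text.length() && text.charAt(i + run) == c) {
                run++;
            }
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    // Reverses encode(): reads a decimal count, then repeats the next character.
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c >= '0' && c <= '9') {
                count = count * 10 + (c - '0');
            } else {
                for (int k = 0; k < count; k++) {
                    out.append(c);
                }
                count = 0;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("BBBBHHDDXXXXKKKKWWZZZZ");
        System.out.println(encoded);          // 4B2H2D4X4K2W4Z
        System.out.println(decode(encoded));  // the original string again
    }
}
```

Note that a string with no repeated characters grows under this scheme (every character gains a "1" prefix), which is one reason run-length encoding does not suit all files.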
As another example, consider the storage of a rectangular image. As a single-color bitmapped image, it can be stored as shown in Figure 1.
Figure 1: A bitmap with information for run-length encoding
Another approach might be to store the image as a graphics metafile:
Rectangle 11, 3, 20, 5
This says that the rectangle starts at coordinate (11, 3) and has a width of 20 pixels and a length of 5 pixels.
The rectangular image can be compressed with run-length encoding by counting identical bits as follows:
0, 40
0, 40
0, 10 1, 20 0, 10
0, 10 1, 1 0, 18 1, 1 0, 10
0, 10 1, 1 0, 18 1, 1 0, 10
0, 10 1, 1 0, 18 1, 1 0, 10
0, 10 1, 20 0, 10
0, 40
The first line above says that the first line of the bitmap consists of 40 0's. The third line says that the third line of the bitmap consists of 10 0's followed by 20 1's followed by 10 more 0's, and so on for the other lines. Note that run-length encoding requires separate representations for the file and its encoded version. Therefore, this method cannot work for all files. Other compression techniques include variable-length encoding (also known as Huffman coding), and many others.
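The row listing above can be reproduced with a short sketch. It assumes a 40-pixel-wide bitmap with 1-based column numbers, which is what the rectangle at column 11 with width 20 implies; the class and helper names are made up for illustration.

```java
// Encodes one bitmap row as "bit, count" pairs, matching the listing above.
// Assumption: rows are 40 pixels wide and columns are numbered from 1.
class BitmapRowRle {

    // Sets the pixels in columns from..to (inclusive, 1-based) to 1.
    static void setOnes(int[] row, int from, int to) {
        for (int col = from; col <= to; col++) {
            row[col - 1] = 1;
        }
    }

    // Produces pairs such as "0, 10 1, 20 0, 10" for a row of bits.
    static String encodeRow(int[] bits) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bits.length) {
            int run = 1;
            while (i + run < bits.length && bits[i + run] == bits[i]) {
                run++;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(bits[i]).append(", ").append(run);
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] edge = new int[40];   // top or bottom edge of the rectangle
        setOnes(edge, 11, 30);
        int[] side = new int[40];   // a row that crosses only the two side walls
        setOnes(side, 11, 11);
        setOnes(side, 30, 30);
        System.out.println(encodeRow(new int[40])); // 0, 40
        System.out.println(encodeRow(edge));        // 0, 10 1, 20 0, 10
        System.out.println(encodeRow(side));        // 0, 10 1, 1 0, 18 1, 1 0, 10
    }
}
```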
There are many benefits to data compression. The main advantage, however, is reduced storage requirements. Also, for data communications, the transfer of compressed data over a medium results in an increase in the rate of information transfer. Note that data compression can be implemented on existing hardware by software or through the use of special hardware devices that incorporate compression techniques. Figure 2 shows a basic data-compression block diagram.
Figure 2: Data-Compression Block Diagram
ZIP vs. GZIP
If you are working on Windows, you might be familiar with the WinZip tool, which is used to create a compressed archive and to extract files from a compressed archive. On UNIX, however, things are done a bit differently. The tar command is used to create an archive (not compressed) and another program (gzip or compress) is used to compress the archive.
Tools such as WinZip and PKZIP act as both an archiver and a compressor. They compress files and store them in an archive. On the other hand, gzip does not archive files. Therefore, on UNIX, the tar command is usually used to create an archive, and then gzip is used to compress the archived file.

The java.util.zip Package
Java provides the java.util.zip package for zip-compatible data compression. It provides classes that allow you to read, create, and modify ZIP and GZIP file formats. It also provides utility classes for computing checksums of arbitrary input streams that can be used to validate input data. This package provides one interface, fourteen classes, and two exception classes, as shown in Table 1.
Table 1: The java.util.zip package
Item | Type | Description
Checksum | Interface | Represents a data checksum. Implemented by the classes Adler32 and CRC32
Adler32 | Class | Used to compute the Adler32 checksum of a data stream
CheckedInputStream | Class | An input stream that maintains the checksum of the data being read
CheckedOutputStream | Class | An output stream that maintains the checksum of the data being written
CRC32 | Class | Used to compute the CRC32 checksum of a data stream
Deflater | Class | Supports general compression using the ZLIB compression library
DeflaterOutputStream | Class | An output stream filter for compressing data in the deflate compression format
GZIPInputStream | Class | An input stream filter for reading compressed data in the GZIP file format
GZIPOutputStream | Class | An output stream filter for writing compressed data in the GZIP file format
Inflater | Class | Supports general decompression using the ZLIB compression library
InflaterInputStream | Class | An input stream filter for decompressing data in the deflate compression format
ZipEntry | Class | Represents a ZIP file entry
ZipFile | Class | Used to read entries from a ZIP file
ZipInputStream | Class | An input stream filter for reading files in the ZIP file format
ZipOutputStream | Class | An output stream filter for writing files in the ZIP file format
DataFormatException | Exception Class | Thrown to signal a data format error
ZipException | Exception Class | Thrown to signal a zip error

Note: The ZLIB compression library was initially developed as part of the Portable Network Graphics (PNG) standard that is not protected by patents.
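To make the lower-level classes in Table 1 concrete, here is a hedged sketch that compresses a byte array with Deflater, restores it with Inflater, and uses CRC32 to confirm the round trip. It works purely in memory; the class name, helper names, and buffer size are illustrative choices rather than anything the package prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// In-memory round trip with Deflater and Inflater, verified with CRC32.
class DeflateRoundTrip {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();                       // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);    // number of compressed bytes produced
            out.write(buffer, 0, n);
        }
        deflater.end();                          // release native resources
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            text.append("BBBBHHDDXXXXKKKKWWZZZZ");
        }
        byte[] original = text.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println(original.length + " bytes -> " + packed.length + " bytes");
        System.out.println("checksums match: " + (crc32(original) == crc32(restored)));
    }
}
```

The stream classes in the table (DeflaterOutputStream, GZIPOutputStream, ZipOutputStream) wrap this same machinery, which is why most applications never need to drive Deflater and Inflater directly.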
Decompressing and Extracting Data from a ZIP File
The java.util.zip package provides classes for data compression and decompression. Decompressing a ZIP file is a matter of reading data from an input stream. The java.util.zip package provides a ZipInputStream class for reading ZIP files. A ZipInputStream can be created just like any other input stream. For example, the following segment of code can be used to create an input stream for reading data from a ZIP file:
   FileInputStream fis = new FileInputStream("figs.zip");
   ZipInputStream zin = new ZipInputStream(new BufferedInputStream(fis));
Once a ZIP input stream is opened, you can read the zip entries using the getNextEntry method. If the end of the file is reached, getNextEntry returns null:
   ZipEntry entry;
   while ((entry = zin.getNextEntry()) != null) {
      // extract data
      // open output streams
   }
Now, it is time to set up a decompressed output stream, which can be done as follows:
   int BUFFER = 2048;
   FileOutputStream fos = new FileOutputStream(entry.getName());
   BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
Note: In this segment of code we have used the BufferedOutputStream instead of the ZipOutputStream. The ZipOutputStream and the GZIPOutputStream use internal buffer sizes of 512. The use of the BufferedOutputStream is only justified when the size of the buffer is much more than 512 (in this example it is set to 2048). While the ZipOutputStream doesn't allow you to set the buffer size, in the case of the GZIPOutputStream you can specify the internal buffer size as a constructor argument.
In this segment of code, a file output stream is created using the entry's name, which can be retrieved using the entry.getName method. Source zipped data is then read and written to the decompressed stream:
   while ((count = zin.read(data, 0, BUFFER)) != -1) {
      //System.out.write(x);
      dest.write(data, 0, count);
   }
And finally, close the input and output streams:
   dest.flush();
   dest.close();
   zin.close();
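The steps above can be condensed into one sketch. To stay self-contained it builds a tiny two-entry archive in memory and collects each entry's bytes in a map instead of writing files, and it uses try-with-resources (available since Java 7, so newer than this article) to close the streams. The class name, entry names, and helper methods are assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Reads every entry of a ZIP archive sequentially with ZipInputStream.
class InMemoryUnzip {
    static final int BUFFER = 2048;

    // Returns entry name -> uncompressed bytes, in archive order.
    static Map<String, byte[]> readEntries(byte[] zipBytes) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(
                new BufferedInputStream(new ByteArrayInputStream(zipBytes)))) {
            byte[] data = new byte[BUFFER];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                ByteArrayOutputStream dest = new ByteArrayOutputStream();
                int count;
                // read() returns -1 at the end of the current entry
                while ((count = zin.read(data, 0, BUFFER)) != -1) {
                    dest.write(data, 0, count);
                }
                result.put(entry.getName(), dest.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Builds a small sample archive so the sketch needs no files on disk.
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.write("world".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, byte[]> e : readEntries(sampleZip()).entrySet()) {
            System.out.println(e.getKey() + ": "
                    + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

Swapping the ByteArrayInputStream for a FileInputStream and the ByteArrayOutputStream for a BufferedOutputStream over a FileOutputStream gives the file-based form described in the text.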
The source program in Code Sample 1 shows how to decompress and extract files from a ZIP archive. To test this sample, compile the class and run it by passing a compressed file in ZIP format:
prompt> java UnZip somefile.zip
Note that somefile.zip could be a ZIP archive created using any ZIP-compatible tool, such as WinZip.
Code Sample 1: UnZip.java
import java.io.*;
import java.util.zip.*;

public class UnZip {
   // static so that it can be used from the static main method
   static final int BUFFER = 2048;

   public static void main (String argv[]) {
      try {
         BufferedOutputStream dest = null;
         FileInputStream fis = new FileInputStream(argv[0]);
         ZipInputStream zis =
            new ZipInputStream(new BufferedInputStream(fis));
         ZipEntry entry;
         // getNextEntry returns null when there are no more entries
         while ((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " + entry);
            int count;
            byte data[] = new byte[BUFFER];
            // write the files to the disk
            FileOutputStream fos =
               new FileOutputStream(entry.getName());
            dest = new BufferedOutputStream(fos, BUFFER);
            while ((count = zis.read(data, 0, BUFFER)) != -1) {
               dest.write(data, 0, count);
            }
            dest.flush();
            dest.close();
         }
         zis.close();
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
It is important to note that the ZipInputStream class reads ZIP files sequentially. The class ZipFile, however, reads the contents of a ZIP file using a random access file internally so that the entries of the ZIP file do not have to be read sequentially.

Note: Another fundamental difference between ZipInputStream and ZipFile is in terms of caching. Zip entries are not cached when the file is read using a combination of ZipInputStream and FileInputStream. However, if the file is opened using ZipFile(fileName) then it is cached internally, so if ZipFile(fileName) is called again the file is opened only once. The cached value is used on the second open. If you work on UNIX, it is worth noting that ZIP files opened using ZipFile are memory mapped, and therefore the performance of ZipFile is superior to ZipInputStream. If the contents of the same ZIP file, however, are changed and reloaded frequently during program execution, then using ZipInputStream is preferred.
This is how a ZIP file can be decompressed using the ZipFile class:
1. Create a ZipFile object by specifying the ZIP file to be read either as a String filename or as a File object:
   ZipFile zipfile = new ZipFile("figs.zip");
2. Use the entries method, which returns an Enumeration object, to loop through all the ZipEntry objects of the file:
   Enumeration e = zipfile.entries();
   while (e.hasMoreElements()) {
      entry = (ZipEntry) e.nextElement();
      // read contents and save them
   }
3. Read the contents of a specific ZipEntry within the ZIP file by passing the ZipEntry to getInputStream, which will return an InputStream object from which you can read the entry's contents:
   is = new BufferedInputStream(zipfile.getInputStream(entry));
4. Retrieve the entry's filename and create an output stream to save it:
   byte data[] = new byte[BUFFER];
   FileOutputStream fos = new FileOutputStream(entry.getName());
   dest = new BufferedOutputStream(fos, BUFFER);
   while ((count = is.read(data, 0, BUFFER)) != -1) {
      dest.write(data, 0, count);
   }
5. Finally, close all input and output streams:
   dest.flush();
   dest.close();
   is.close();
The complete source program is shown in Code Sample 2. Again, to test this class, compile it and run it by passing a file in ZIP format as an argument:
prompt> java UnZip2 somefile.zip
Code Sample 2: UnZip2.java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class UnZip2 {
   static final int BUFFER = 2048;

   public static void main (String argv[]) {
      try {
         BufferedOutputStream dest = null;
         BufferedInputStream is = null;
         ZipEntry entry;
         ZipFile zipfile = new ZipFile(argv[0]);
         // entries() lists every entry without reading the data sequentially
         Enumeration e = zipfile.entries();
         while (e.hasMoreElements()) {
            entry = (ZipEntry) e.nextElement();
            System.out.println("Extracting: " + entry);
            is = new BufferedInputStream(zipfile.getInputStream(entry));
            int count;
            byte data[] = new byte[BUFFER];
            FileOutputStream fos =
               new FileOutputStream(entry.getName());
            dest = new BufferedOutputStream(fos, BUFFER);
            while ((count = is.read(data, 0, BUFFER)) != -1) {
               dest.write(data, 0, count);
            }
            dest.flush();
            dest.close();
            is.close();
         }
         zipfile.close();
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
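Because ZipFile gives random access to an archive, a single entry can be fetched by name without looping over the others. The following is a small sketch of that idea, assuming a throwaway archive written to a temporary file; the class name, entry names, and contents are invented for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Looks up one entry by name with ZipFile instead of scanning the archive.
class ZipFileLookup {

    // Returns the entry's uncompressed bytes, or null if no such entry exists.
    static byte[] readEntry(File zip, String name) {
        try (ZipFile zipfile = new ZipFile(zip)) {
            ZipEntry entry = zipfile.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream is =
                     new BufferedInputStream(zipfile.getInputStream(entry))) {
                ByteArrayOutputStream dest = new ByteArrayOutputStream();
                byte[] data = new byte[2048];
                int count;
                while ((count = is.read(data, 0, data.length)) != -1) {
                    dest.write(data, 0, count);
                }
                return dest.toByteArray();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a three-entry sample archive to a temporary file.
    static File writeSampleZip() {
        try {
            File zip = Files.createTempFile("sample", ".zip").toFile();
            zip.deleteOnExit();
            try (ZipOutputStream out =
                     new ZipOutputStream(new FileOutputStream(zip))) {
                for (String name : new String[] {"first.txt", "second.txt", "third.txt"}) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(("contents of " + name).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File zip = writeSampleZip();
        byte[] third = readEntry(zip, "third.txt");
        System.out.println(new String(third, StandardCharsets.UTF_8));
    }
}
```

With ZipInputStream the same lookup would require calling getNextEntry until the wanted name turned up.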
Compressing and Archiving Data in a ZIP File
The ZipOutputStream can be used to compress data to a ZIP file. The ZipOutputStream writes data to an output stream in a ZIP format. There are a number of steps involved in creating a ZIP file.
The first step is to create a ZipOutputStream object, to which we pass the output stream of the file we wish to write to. Here is how you create a ZIP file entitled "myfigs.zip":
   FileOutputStream dest = new FileOutputStream("myfigs.zip");
   ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
Once the target zip output stream is created, the next step is to open the source data files. In this example, the source data files are the files in the current directory. The list method is used to get a list of files in the current directory:
   File f = new File(".");
   String files[] = f.list();
   for (int i = 0; i < files.length; i++) {
      System.out.println("Adding: " + files[i]);
      FileInputStream fi = new FileInputStream(files[i]);
      // create zip entry
      // add entries to ZIP file
   }
Note: This code sample is capable of compressing all files in the current directory. It doesn't handle subdirectories. As an exercise, you may want to modify Code Sample 3 to handle subdirectories.
Create a zip entry for each file that is read:
   ZipEntry entry = new ZipEntry(files[i]);
Before you can write data to the ZIP output stream, you must first put the zip entry object using the putNextEntry method:
   out.putNextEntry(entry);
Write the data to the ZIP file:
   int count;
   while ((count = origin.read(data, 0, BUFFER)) != -1) {
      out.write(data, 0, count);
   }
Finally, you close the input and output streams:
   origin.close();
   out.close();
The complete source program is shown in Code Sample 3.
Code Sample 3: Zip.java
import java.io.*;
import java.util.zip.*;

public class Zip {
   static final int BUFFER = 2048;

   public static void main (String argv[]) {
      try {
         BufferedInputStream origin = null;
         FileOutputStream dest =
            new FileOutputStream("c:\\zip\\myfigs.zip");
         ZipOutputStream out =
            new ZipOutputStream(new BufferedOutputStream(dest));
         //out.setMethod(ZipOutputStream.DEFLATED);
         byte data[] = new byte[BUFFER];
         // get a list of files from current directory
         File f = new File(".");
         String files[] = f.list();

         for (int i = 0; i < files.length; i++) {
            System.out.println("Adding: " + files[i]);
            FileInputStream fi = new FileInputStream(files[i]);
            origin = new BufferedInputStream(fi, BUFFER);
            ZipEntry entry = new ZipEntry(files[i]);
            out.putNextEntry(entry);
            int count;
            while ((count = origin.read(data, 0, BUFFER)) != -1) {
               out.write(data, 0, count);
            }
            origin.close();
         }
         out.close();
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
Note: Entries can be added to a ZIP file either in a compressed (DEFLATED) or uncompressed (STORED) form. The setMethod method can be used to set the storage method.
For example, to set the method to DEFLATED (compressed) use out.setMethod(ZipOutputStream.DEFLATED), and to set it to STORED (not compressed) use out.setMethod(ZipOutputStream.STORED).

ZIP File Properties
The ZipEntry class describes a compressed file stored in a ZIP file. The various methods contained in this class can be used to set and get pieces of information about the entry. The ZipEntry class is used by the ZipFile and ZipInputStream to read ZIP files, and by the ZipOutputStream to write ZIP files. Some of the most useful methods available in the ZipEntry class are shown, along with a description, in Table 2.
Table 2: Some useful methods from the ZipEntry class
Method Signature | Description
public String getComment() | Returns the comment string for the entry, null if none
public long getCompressedSize() | Returns the compressed size of the entry, -1 if not known
public int getMethod() | Returns the compression method of the entry, -1 if not specified
public String getName() | Returns the name of the entry
public long getSize() | Returns the uncompressed size of the entry, -1 if unknown
public long getTime() | Returns the modification time of the entry, -1 if not specified
public void setComment(String c) | Sets the optional comment string for the entry
public void setMethod(int method) | Sets the compression method for the entry
public void setSize(long size) | Sets the uncompressed size of the entry
public void setTime(long time) | Sets the modification time of the entry

Checksums
Some of the other important classes in the java.util.zip package are the Adler32 and CRC32 classes, which implement the java.util.zip.Checksum interface and compute the checksums required for data compression. The Adler32 algorithm is known to be faster than CRC32. Checksums can be used to detect corrupted files or messages. For example, suppose you want to create a ZIP file then transfer it to a remote machine.
Once it is at the remote machine, using the checksum you can check whether the file got corrupted during the transmission. To demonstrate how to create checksums, we modify Code Sample 1 and Code Sample 3 to use CheckedInputStream and CheckedOutputStream as shown in Code Sample 4 and Code Sample 5.
Code Sample 4: Zip.java
import java.io.*;
import java.util.zip.*;

public class Zip {
   static final int BUFFER = 2048;

   public static void main (String argv[]) {
      try {
         BufferedInputStream origin = null;
         FileOutputStream dest =
            new FileOutputStream("c:\\zip\\myfigs.zip");
         // every byte written to the file also updates the Adler32 checksum
         CheckedOutputStream checksum =
            new CheckedOutputStream(dest, new Adler32());
         ZipOutputStream out =
            new ZipOutputStream(new BufferedOutputStream(checksum));
         //out.setMethod(ZipOutputStream.DEFLATED);
         byte data[] = new byte[BUFFER];
         // get a list of files from current directory
         File f = new File(".");
         String files[] = f.list();

         for (int i = 0; i < files.length; i++) {
            System.out.println("Adding: " + files[i]);
            FileInputStream fi = new FileInputStream(files[i]);
            origin = new BufferedInputStream(fi, BUFFER);
            ZipEntry entry = new ZipEntry(files[i]);
            out.putNextEntry(entry);
            int count;
            while ((count = origin.read(data, 0, BUFFER)) != -1) {
               out.write(data, 0, count);
            }
            origin.close();
         }
         out.close();
         System.out.println("checksum: " + checksum.getChecksum().getValue());
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
Code Sample 5: UnZip.java
import java.io.*;
import java.util.zip.*;

public class UnZip {
   public static void main (String argv[]) {
      try {
         final int BUFFER = 2048;
         BufferedOutputStream dest = null;
         FileInputStream fis = new FileInputStream(argv[0]);
         // every byte read from the file also updates the Adler32 checksum
         CheckedInputStream checksum =
            new CheckedInputStream(fis, new Adler32());
         ZipInputStream zis =
            new ZipInputStream(new BufferedInputStream(checksum));
         ZipEntry entry;
         while ((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " + entry);
            int count;
            byte data[] = new byte[BUFFER];
            // write the files to the disk
            FileOutputStream fos =
               new FileOutputStream(entry.getName());
            dest = new BufferedOutputStream(fos, BUFFER);
            while ((count = zis.read(data, 0, BUFFER)) != -1) {
               dest.write(data, 0, count);
            }
            dest.flush();
            dest.close();
         }
         zis.close();
         System.out.println("Checksum: " + checksum.getChecksum().getValue());
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
To test Code Samples 4 and 5, compile the classes and then run the Zip class to create a ZIP archive (a checksum value will be calculated and printed on the screen for your information) and then run the UnZip class to decompress the archive (a checksum value will be printed on the console). The two values must be exactly the same, otherwise the file is corrupted. Checksums are very useful in validating data. For example, you can create a ZIP file and send it to your friend along with a checksum. Your friend unzips the file and compares the checksum with the one you provided; if the two match, your friend knows the file arrived intact.

Compressing Objects
We have seen how to compress data available in file form and add it to an archive. But what if the data you wish to compress is not available in a file? Assume, for example, that you are transferring large objects over sockets. To improve the performance of your application, you may want to compress the objects before sending them across the network and uncompress them at the destination. As another example, let's say you want to save objects on the disk in compressed format. The ZIP format, which is record-based, is not really suitable for this job. The GZIP format is more appropriate as it operates on a single stream of data.
Now, let's see an example of how to compress objects before writing them on disk and how to decompress them after reading them. Code Sample 6 is a simple class that implements the Serializable interface.
Code Sample 6: Employee.java
import java.io.*;

public class Employee implements Serializable {
   String name;
   int age;
   int salary;

   public Employee(String name, int age, int salary) {
      this.name = name;
      this.age = age;
      this.salary = salary;
   }

   public void print() {
      System.out.println("Record for: " + name);
      System.out.println("Name: " + name);
      System.out.println("Age: " + age);
      System.out.println("Salary: " + salary);
   }
}
Now, write another class that creates a couple of objects from the Employee class. Code Sample 7 creates two objects (sarah and sam) of the Employee class, then saves their state in a file in a compressed format.
Code Sample 7: SaveEmployee.java
import java.io.*;
import java.util.zip.*;

public class SaveEmployee {
   public static void main(String argv[]) throws Exception {
      // create some objects
      Employee sarah = new Employee("S. Jordan", 28, 56000);
      Employee sam = new Employee("S. McDonald", 29, 58000);
      // serialize the objects sarah and sam through a GZIP stream
      FileOutputStream fos = new FileOutputStream("db");
      GZIPOutputStream gz = new GZIPOutputStream(fos);
      ObjectOutputStream oos = new ObjectOutputStream(gz);
      oos.writeObject(sarah);
      oos.writeObject(sam);
      oos.flush();
      oos.close();
      fos.close();
   }
}
Now, the ReadEmployee class shown in Code Sample 8 is used to reconstruct the state of the two objects. Once the state has been reconstructed, the print method is invoked on them.
Code Sample 8: ReadEmployee.java
import java.io.*;
import java.util.zip.*;

public class ReadEmployee {
   public static void main(String argv[]) throws Exception {
      // deserialize objects sarah and sam
      FileInputStream fis = new FileInputStream("db");
      GZIPInputStream gs = new GZIPInputStream(fis);
      ObjectInputStream ois = new ObjectInputStream(gs);
      Employee sarah = (Employee) ois.readObject();
      Employee sam = (Employee) ois.readObject();
      // print the records after reconstruction of state
      sarah.print();
      sam.print();
      ois.close();
      fis.close();
   }
}
The same idea can be used to compress large objects that are sent over sockets.
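As a brief aside, the round trip in Code Samples 7 and 8 can also be tried entirely in memory, which is the form that matters when the bytes go to a socket rather than a file. The sketch below is an illustration under that assumption; the class and method names are hypothetical, and any Serializable object can stand in for Employee.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Serializes an object through GZIP into a byte array and back again.
class GzipObjects {

    static byte[] toGzip(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // closing the object stream also finishes and closes the GZIP stream
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes);
             ObjectOutputStream oos = new ObjectOutputStream(gz)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromGzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
             ObjectInputStream ois = new ObjectInputStream(gz)) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            big.append("S. Jordan, 28, 56000; ");
        }
        byte[] largePacked = toGzip(big.toString());
        byte[] smallPacked = toGzip("hi");
        System.out.println("large object: " + big.length() + " chars -> "
                + largePacked.length + " compressed bytes");
        System.out.println("small object: 2 chars -> "
                + smallPacked.length + " compressed bytes");
        System.out.println("restored equals original: "
                + big.toString().equals(fromGzip(largePacked)));
    }
}
```

Running the main method on a large repetitive object and on a tiny one gives a feel for when compression pays for its fixed overhead.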
The following segment of code shows how to write objects in a compressed format from the server to the client:
   // write to client
   GZIPOutputStream gzipout = new GZIPOutputStream(socket.getOutputStream());
   ObjectOutputStream oos = new ObjectOutputStream(gzipout);
   oos.writeObject(obj);
   oos.flush();       // push buffered object data into the GZIP stream
   gzipout.finish();  // write the remaining compressed data and the GZIP trailer
And the following segment of code shows how to decompress the objects at the client side once they are received from the server:
   // read from server
   Socket socket = new Socket(remoteServerIP, PORT);
   GZIPInputStream gzipin = new GZIPInputStream(socket.getInputStream());
   ObjectInputStream ois = new ObjectInputStream(gzipin);
   Object o = ois.readObject();

What About JAR Files?
The Java ARchive (JAR) format is based on the standard ZIP file format with an optional manifest file. If you wish to create JAR files or extract files from a JAR file from within your Java applications, use the java.util.jar package, which provides classes for reading and writing JAR files. Using the classes provided by the java.util.jar package is very similar to using the classes provided by the java.util.zip package as described in this article. Therefore, you should be able to adapt much of the code in this article if you wish to use the java.util.jar package.

Conclusion
This article discussed the APIs that you can use to compress and decompress data from within your applications, with code samples throughout the article to show how to use the java.util.zip package to compress and decompress data. Now you have the tools to utilize data compression and decompression in your applications.
The article also shows how to compress and decompress data on the fly in order to reduce network traffic and improve the performance of your client/server applications. Compressing data on the fly, however, improves the performance of client/server applications only when the objects being compressed are more than a couple of hundred bytes.
You would not be able to observe an improvement in performance if the objects being compressed and transferred are simple String objects, for example.

For More Information
- The java.util.zip package
- The java.util.jar package
- Object Serialization
- Transporting Objects over Sockets

About the Author
Qusay H. Mahmoud provides Java consulting and training services. He has published dozens of articles on Java, and is the author of Distributed Programming with Java (Manning Publications, 1999) and Learning Wireless Java (O'Reilly, 2002).