The big-endian / little-endian problem in Java
1. Solving endian issues: a summary
Everything in Java binary files is stored big-endian, most significant byte first, which is sometimes called network order. This is good news if you use only Java: all files are handled the same way on all platforms (Mac, PC, Solaris, etc.), and binary data can be exchanged freely, over the Internet or on floppy disk, without considering endianness. The trouble starts when you exchange data files with programs that are not written in Java. Such programs often use little-endian order, as C programs on the PC usually do. Some platforms use big-endian byte order internally (Mac, IBM 390); some use little-endian byte order (Intel). Java hides the platform's internal endianness from you.
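As a small illustration of the big-endian rule, the sketch below writes one int with java.io.DataOutputStream and dumps the resulting bytes. The class name BigEndianDemo and the helper intToBytes are made up for this example; only DataOutputStream.writeInt comes from the standard library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original article.
class BigEndianDemo {
    /** Serialises one int with DataOutputStream and returns the raw bytes. */
    static byte[] intToBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // always most significant byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : intToBytes(0x12345678)) {
            System.out.printf("%02x ", b);
        }
        System.out.println(); // prints 12 34 56 78 on every platform
    }
}
```

A little-endian C program writing the same int on an Intel PC would typically produce 78 56 34 12 instead, which is the mismatch the rest of this page deals with.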
Binary files have no separators between fields; the file is raw binary, not human-readable ASCII. If the data you want to read is not in a standard Java format, it was usually prepared by a non-Java program. You have four options:
1) Rewrite the program that produces the input file so that it directly outputs a big-endian byte stream in DataOutputStream binary format or DataOutputStream character format.
2) Write a separate translation program that reads the bytes and rearranges them. It can be written in any language.
3) Read the data as bytes and rearrange them on the fly.
4) Easiest of all, use LEDataInputStream, LEDataOutputStream and LERandomAccessFile, which are analogues of DataInputStream, DataOutputStream and RandomAccessFile that work with little-endian byte streams. You can read about LEDataStream, and you can download the code and source free. Help on how to use the classes is also available; just tell it you have little-endian binary data.
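Option 3 can also be sketched with the standard java.nio.ByteBuffer class, which lets you pick the byte order explicitly. This is a different technique from the LE classes named in option 4, shown here only as an assumption-laden illustration; the class name ByteBufferLittleEndian and its two helpers are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: decodes little-endian fields from a byte array.
class ByteBufferLittleEndian {
    /** Reads a little-endian int starting at the given offset. */
    static int decodeInt(byte[] raw, int offset) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Reads a little-endian short starting at the given offset. */
    static short decodeShort(byte[] raw, int offset) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getShort(offset);
    }

    public static void main(String[] args) {
        // An int 0x12345678 followed by a short 0xabcd, both stored least significant byte first.
        byte[] raw = { 0x78, 0x56, 0x34, 0x12, (byte) 0xcd, (byte) 0xab };
        System.out.println(Integer.toHexString(decodeInt(raw, 0)));            // prints 12345678
        System.out.println(Integer.toHexString(decodeShort(raw, 4) & 0xffff)); // prints abcd
    }
}
```

The bytes would normally come from a file or socket rather than a literal array; the array just keeps the sketch self-contained.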
2. You may not even have any problems.
Many Java newcomers coming from C think they still need to consider whether the platform is big-endian or little-endian internally. In Java this does not matter. Further, ordinary Java arithmetic gives you no way to observe how values are laid out in memory.
Endianness matters only when you communicate with legacy C/C++ applications. The following code produces the same results on a big-endian or a little-endian machine:
    // Take a 16-bit short apart into two 8-bit bytes.
    short x = (short) 0xabcd;
    byte high = (byte) (x >>> 8);
    byte low = (byte) x; /* cast implies & 0xff */
    System.out.println("x=" + x + " high=" + high + " low=" + low);
3. Reading little-endian binary files
The most common problem is dealing with files stored in little-endian format.
I had to implement routines parallel to those in java.io.DataInputStream which reads raw binary, in my LEDataInputStream and LEDataOutputStream classes. Do not confuse this with the io.DataInput human-readable character-based file-interchange format.
If you want to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique. In these fragments readByte() is assumed to return the next byte of input, as DataInputStream.readByte() does.
Presuming your integers are in two's complement little-endian format, shorts are pretty easy to handle:
    short readShortLittleEndian() throws IOException
    {
        // 2 bytes
        int low = readByte() & 0xff;
        int high = readByte() & 0xff;
        return (short) (high << 8 | low);
    }
Or, if you want to get clever and puzzle your readers, you can avoid one mask, since the high bits will later be shaved off by the conversion back to short:
    short readShortLittleEndian() throws IOException
    {
        // 2 bytes
        int low = readByte() & 0xff;
        int high = readByte(); // avoid masking here
        return (short) (high << 8 | low);
    }
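To see why the second mask can be dropped but the first cannot, the sketch below compares both variants over every possible pair of bytes. The class ShortMaskDemo and its method names are invented for this check; the bytes are passed in as parameters rather than read from a stream.

```java
// Hypothetical check, not part of the original article.
class ShortMaskDemo {
    static short withBothMasks(byte lowByte, byte highByte) {
        int low = lowByte & 0xff;
        int high = highByte & 0xff;
        return (short) (high << 8 | low);
    }

    static short withOneMask(byte lowByte, byte highByte) {
        int low = lowByte & 0xff;
        int high = highByte; // sign-extended; the extra bits land above bit 15
        return (short) (high << 8 | low);
    }

    /** Returns true if the two variants agree for all 65536 byte pairs. */
    static boolean variantsAgree() {
        for (int lo = -128; lo <= 127; lo++) {
            for (int hi = -128; hi <= 127; hi++) {
                if (withBothMasks((byte) lo, (byte) hi) != withOneMask((byte) lo, (byte) hi)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(variantsAgree()); // prints true

        // The mask on the LOW byte is not optional: sign extension would
        // overwrite the high byte with ones.
        int unmaskedLow = (byte) 0xcd; // -51
        System.out.println((short) (0x12 << 8 | unmaskedLow)); // prints -51, not 0x12cd
    }
}
```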
Longs are a little more complicated:
    long readLongLittleEndian() throws IOException
    {
        // 8 bytes
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
        {
            // must cast to long or the shift is done modulo 32
            accum |= (long) (readByte() & 0xff) << shiftBy;
        }
        return accum;
    }
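The comment about casting to long is easy to overlook, so here is a minimal sketch of what goes wrong without it. Java masks the shift count of an int to its low five bits, so an int shifted by 40 is really shifted by 8. The class and method names below are hypothetical.

```java
// Hypothetical demo of int versus long shift widths.
class ShiftWidthDemo {
    /** Shift performed on an int, then widened: the count is taken modulo 32. */
    static long shiftAsInt(int b, int shiftBy) {
        return (b & 0xff) << shiftBy;
    }

    /** Widened to long first: the count is taken modulo 64. */
    static long shiftAsLong(int b, int shiftBy) {
        return (long) (b & 0xff) << shiftBy;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(shiftAsInt(0x12, 40)));  // prints 1200
        System.out.println(Long.toHexString(shiftAsLong(0x12, 40))); // prints 120000000000
    }
}
```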
In a similar way we handle char and int:
    char readCharLittleEndian() throws IOException
    {
        // 2 bytes
        int low = readByte() & 0xff;
        int high = readByte();
        return (char) (high << 8 | low);
    }
    int readIntLittleEndian() throws IOException
    {
        // 4 bytes
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
        {
            accum |= (readByte() & 0xff) << shiftBy;
        }
        return accum;
    }
Floating point is a little trickier. Presuming your data is in IEEE little-endian format, you need something like this:
    double readDoubleLittleEndian() throws IOException
    {
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
        {
            // must cast to long or the shift is done modulo 32
            accum |= (long) (readByte() & 0xff) << shiftBy;
        }
        return Double.longBitsToDouble(accum);
    }
    float readFloatLittleEndian() throws IOException
    {
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
        {
            accum |= (readByte() & 0xff) << shiftBy;
        }
        return Float.intBitsToFloat(accum);
    }
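A worked example may make the floating-point case more concrete. The sketch below applies the same accumulate-then-convert steps to byte arrays instead of a stream, assuming IEEE 754 data: 1.0f has the bit pattern 0x3f800000, and -2.0 has the bit pattern 0xc000000000000000. The class FloatBitsDemo and its methods are made up for this illustration.

```java
// Hypothetical demo: decodes IEEE little-endian floats and doubles from arrays.
class FloatBitsDemo {
    static float floatFromLittleEndian(byte[] raw) {
        int accum = 0;
        for (int i = 0; i < 4; i++) {
            accum |= (raw[i] & 0xff) << (8 * i);
        }
        return Float.intBitsToFloat(accum);
    }

    static double doubleFromLittleEndian(byte[] raw) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            // cast to long so the shift is not done modulo 32
            accum |= (long) (raw[i] & 0xff) << (8 * i);
        }
        return Double.longBitsToDouble(accum);
    }

    public static void main(String[] args) {
        byte[] one = { 0x00, 0x00, (byte) 0x80, 0x3f };          // 0x3f800000, low byte first
        byte[] minusTwo = { 0, 0, 0, 0, 0, 0, 0, (byte) 0xc0 };  // 0xc000000000000000, low byte first
        System.out.println(floatFromLittleEndian(one));       // prints 1.0
        System.out.println(doubleFromLittleEndian(minusTwo)); // prints -2.0
    }
}
```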
You don't need a readByteLittleEndian, since the code would be identical to readByte, though you might create one just for consistency:
    byte readByteLittleEndian() throws IOException
    {
        // 1 byte
        return readByte();
    }
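An alternative to assembling values byte by byte, not used in the routines above, is to let DataInputStream read the value big-endian and then flip it with the standard reverseBytes methods (Short.reverseBytes, Integer.reverseBytes, Long.reverseBytes). The sketch below assumes that approach; the class ReverseBytesDemo and the helper firstInt are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo of the read-then-reverse technique.
class ReverseBytesDemo {
    static short readShortLittleEndian(DataInputStream in) throws IOException {
        return Short.reverseBytes(in.readShort());
    }

    static int readIntLittleEndian(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    static double readDoubleLittleEndian(DataInputStream in) throws IOException {
        return Double.longBitsToDouble(Long.reverseBytes(in.readLong()));
    }

    /** Convenience wrapper for the demo: decodes the first int of a byte array. */
    static int firstInt(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return readIntLittleEndian(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = { 0x78, 0x56, 0x34, 0x12 };
        System.out.println(Integer.toHexString(firstInt(raw))); // prints 12345678
    }
}
```

Whether this or the shift-and-mask loops read better is a matter of taste; both produce the same values.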
4. History
In Gulliver's Travels the Lilliputians liked to break their eggs on the small end and the Blefuscudians on the big end. They fought wars over this. There is a computer analogy: should numbers be stored most or least significant byte first? This is sometimes referred to as byte sex.
Those in the big-endian camp (most significant byte stored first) include the Java VM virtual computer, the Java binary file format, the IBM 360 and follow-on mainframes such as the 390, and the Motorola 68K and most mainframes. The Power PC is endian-agnostic.
Blefuscudians (big-endians) assert this is the way God intended integers to be stored, most important part first. At the assembler level, fields of mixed positive integers and text can be sorted as if they were one big text field key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.
In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS Technology 6502 popularised by the Apple ][.
Lilliputians (little-endians) assert that putting the low-order part first is more natural because when you do arithmetic manually, you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier since you work up, not down. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java) it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work.
If a machine is word addressable, with no finer addressing supported, the concept of endianness means nothing, since words are fetched from RAM in parallel, both ends first.
5. What sex is your CPU?
Byte sex (endianness) of CPUs

CPU | Endianness | Notes
AMD Duron, Athlon, Thunderbird | Little | Used in machines running Windows 95/98/ME/NT/2000/XP.
Apple ][ 6502 | Little | The 6502 was used in the Apple ][.
Apple Mac 68000 | Big | Uses Motorola 68000.
Apple Power PC | Big | CPU is bisexual but stays big in the Mac OS.
Burroughs 1700, 1800, 1900 | ? | Bit addressable. Used different interpreter firmware instruction sets for each language.
Burroughs 7800 | ? | Algol machine.
CDC LGP-30 | Word-addressable only, hence no endianness | 31½-bit words. Low-order bit must be 0 on the drum, but can be 1 in the accumulator.
CDC 3300, 6600 | Word-addressable | ?
DEC PDP, VAX | Little |
IBM 360, 370, 380, 390 | Big |
IBM 7044, 7090 | Word-addressable | 36 bits.
IBM AS-400 | Big | ?
Power PC | Either | The endian-agnostic Power PCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other, e.g. Mac PowerPCs are big-endian.
Intel 8080, 8086, 80286, 80386, 80486, Pentium I, II, III, IV | Little | Chips used in PCs.
Intel 8051 | Big |
MIPS R4000, R5000, R10000 | Big | Used in Silicon Graphics IRIX.
Motorola 6800, 6809, 680x0, 68HC11 | Big | Early Macs used the 68000. Amiga.
NCR 8500 | Big |
NCR Century | Big |
Sun Sparc and UltraSparc | Big | Sun's Solaris. Normally used as big-endian, but also has support for operating in little-endian mode, including being able to switch endianness under program control for particular loads and stores.
Univac 1100 | Word-addressable | 36-bit words.
Univac 90/30 | Big | IBM 370 clone.
Zilog Z80 | Little | Used in CP/M machines.
If you know the endianness of other CPUs / OSes / platforms, please email me at ket @MindProd.com.
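If you are simply curious about the machine you are sitting at, the standard java.nio.ByteOrder class can report the byte order of the underlying platform. The sketch below is only an illustration; the class NativeOrderDemo and the method describeNativeOrder are hypothetical names, and the result does not change how Java itself reads or writes DataInputStream and DataOutputStream data.

```java
import java.nio.ByteOrder;

// Hypothetical demo: reports the byte order of the hardware the JVM runs on.
class NativeOrderDemo {
    static String describeNativeOrder() {
        return ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN
                ? "little-endian"
                : "big-endian";
    }

    public static void main(String[] args) {
        // The answer depends on the platform, so no fixed output is shown here.
        System.out.println("This CPU is " + describeNativeOrder());
    }
}
```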
In theory data can have two different byte sexes, but CPUs can have four. Be thankful, in this world of mixed left- and right-hand drive, that there are no real CPUs with all four sexes to contend with.
The four possible byte sexes for CPUs

Which byte is stored in the lower-numbered address? | Which byte is addressed? | Used in
LSB | LSB | Intel, AMD, Power PC, DEC.
LSB | MSB | None that I know of.
MSB | LSB | Perhaps one of the old word mark architecture machines.
MSB | MSB | Mac, IBM 390, Power PC.
You can get an updated copy of this page from http://mindprod.com/endian.html