MD5 to be considered harmful someday
Dan Kaminsky
Senior Security Consultant, Avaya
Dan@doxpara.com
06-DEC-2004
Abstract
Joux and Wang's Multicollision Attack Has Yielded Collisions for Several One-Way Hash Algorithms. Of
THESE, MD5 Is The MOST Problematic Due To Its Heavy Deployment, But There Exists a Perception That The Flaws
Identified Have No Applied Implications. We show what the appendability of Merkle-Damgard Allows US To Add
Any PayD Hashes Released by Wang et al. We then Demonstrate A Tool, Stripwire,
That Uses this Capability To create Two Files - One Which Executes An Arbitrary Sequence of Commans, The
Other Which Hides Those Commands with The Stregth of Aes - Both with The Same Md5 Hash. We show how
This Affects File-Oriented System Auditors Such As Tripwire, But Point Out That The Failure Is Nowhere Near AS
Catastrophic As It Appears at First Glance. We Examine How this Failure Affects HMAC and Digital Signatures
Withnin Digital Rights Management (DRM) Systems, And How The Full Attack Expands Into An Unusual Pseudosteganographic
Strikeback Methodology Against Peer to Peer Networks.
I. Introduction
THE MODERN Application Of Cryptographic Principles Is Actually Quite Primitive - Not in ITS Complexity,
But in the way of complexity has been management. Independent Primitives Such As Hashes and
Ciphers Completely Specify THE BEHAVIOR OF A LIMITED SET OF AGGIRESSIVELY AUSTED ALGORITHMS. Each Trusted
Implementation is Chosen to Be Entirely Functionally Equivalent To One Another; Choosing ONE over ANOTHER
Is to have no impact on what the user (legitimate or otherwise) can do. Deviations Between the chosen
algorithms are limited to speed of operation, some mild key and block size constraints, and a vaguelyunderstood "security level" of the underlying mathematics It is this last fear -. that even after all our auditing,
Something Will Still Get Through - That Drives Adherence To The Primitive Specification. if ethes
Implements the Same Specification, We can swap out a broken mail information for a Correct One.
But Just Because We can do something doesn't mean we will. Joux [1] andwang [2] have already it plainly
Clear That MD5 HAS Serious Problems. This Shouldn't come as much surprise; Dobbertin's Work [3] Almost
A Decade Ago Made It Clear That this was coming. Yet eve yrhe it who have hinted what there
Isn't Any Applied Risk and That The Vulnerabilities Are Purely Theoretical. Outside of FIPS'S UnwillingnessNess
To certify md5 there is no apparent push to migrate Away from MD5 As We Once Did for ITS Predecessor,
MD4.
The attacks discovered area indeed obscure. But complesetly theoretical? No. Even Given What Little
Data Has Been Released - Code Implementing The Attack Isn't Even Public YET - SUFFICIENT INFORMATION HAS
Been Released to Piece Together A Rudimentary Proof of Concept Tool That Demonstrates, At minimum, That
The Selection of MD5 Exposes New and Potentially Deeply Undesirable Functionality Above and Beyond
What The One-Way Hash Primitive Specifies. The Tool, Stripwire, Implements Some of the Attacks Described
Herein.
That Being Said, this paper is not a "smalking gun" Indictment of MD5. I've Taken Great Pains to include
The Caveats of Each Vulnerability, As It Is Far Too Easy To Overestimate The Risks Described in this Paper. IT IS
For That Reason I am Not saying "Today", or "any day now". The title states "Someday" for a real, there is the Risk Of MD5. Here Are A Few More, In The Hopes That They Will Start
TO BE Connected.
MD5 to be considered harmful someday 2
Ii. MD5 HOWTO
For A Detailed Description, Look Elsewhere [4] [5]. Put Simply Though, MD5 is An Implementation of
A One-Way Hash by Which An Arbitrary Amount of Data May Be Reduced to a 128 Bit Fingerprint of What
Went In. The Hash IS One When It's Simple To Compute The Hash From Arbitrary Data But Difficult - in
A "Computational Infeasible" SENSE - TO REVERSE The Process, Finding Data That Matches a Particular Hash.
The Hashing Process Needs to Be Resistant To The Point Where Two DataSets Cannot Even Be CREATED for THE
Express Purpose Of "Colliding" - Having The Same Hash Value. These Cryptographical Strong ONE WAY
Hashes Are Quite Useful When We Want To Store Summaries of Data, And Retain The Ability To Recognize That
Data At a Later Time, WITHOUT ACTUALLING TO Keep A Copy of The Original Data Around or Needing to Worry
About Other people..
Iii. The Discovery: Joux and Wang's MultiCollision Attack
For MD5 (and actually a number of popular hashing algorithms, sha-1 not among the), IT IS Possible
To Compute Particular Classes of Input Data for Which Subtle Changes Can Be Silently Introducesd Without
Causing Apparent Changes in The Final Md5 Hash. Capacity Is Not Hue - of The Two 128 Byte Proof-ofconcept
Files Released by Wang, Only Six Bits Differ. But Many "Doppelganger" sets can becomput, EACH
of which may be swapped out with the other at no effect to the resultant hash. The sets are two MD5blocks long. Because it's possible to compute new blocks on demand, a generic "antivirus" style colliding
Block Detector isn't Possible. it may be possible to generate a Custom Weak Class Detector. The Ability To
Generate Colliding DataSets Exposes A Fundamentally New Mode of Operation for MD5.
IV. EXTENDING THE ATATACK
To Seek This Relatively Obscure New Mode Can Cause Problems, IT IS Necessary To Understand How
MD5 Works. In What's Referred To As a Merkle-Damgard Construction, MD5 Starts with An Arbitrary Initial
State 128 Bits in Length. 512 Bits of Input Data Are "Stirred" Into this 128 Bit State, With New, Massively
Shuffled 128 Bit Value As The Result. 512 More Bits Are Constantly Stirred in, over and over, unsre of
Further Data. 64 Bits More Are Appended to the Data Street to Explicitly Reflecting The Amount of Data Being
Hashed WITH ANOTHER ROUND OF MD5 Being Done if Need BE (if Thee Wasn't Enough Room in A Previous Round
To Hash In That 64 Bits), And The Final 128 Bit ValueAfter All The Stirring Is Complete Is Christened The MD5
Hash.
Now, Amongst the cryptological community there is a well known failure mode to the this particular
Construction: if at Any Point in The Cascade Two Different DataSets Are Stirred Into Equal 128 Bit Values,
Arbitrary Data Can Be appended to Both DataSets and Their Hashes Will Remain Equal. in Mathematical Terms,
Using the " " Sign TO REFER TO Concatenation and Assuming Length (x) and length (y) Both Evenly Divide Into
The 64 Byte Blocksize Of MD5, IF MD5 (X) = MD5 (Y), Then MD5 (X Q) = MD5 (Y Q).
It's relatively straightforward to see why this occurs: Files are read in 512 bits at a time with each blocksummarized into only what can fit inside the 128 bit value Once two deviant datasets collide to the same.
128 Bit Value, Anything Added On After the Fact Is Too Late - MD5 May Be a Chaotic And Nonlinear Function,
But from The Same Seed, The Chaotic Linearity Between The Two DataSets Will Remain Forever Synchronized
With The Early Difference Forever Cloake.
The Original Attack Gives US Our Two Deviant Datasets. This Extension Shows Us How We CAN Append
Arbitrary Data After The Datasets and Still Retain Collision. Stripwire Demonstrates How We can Convert this
Collision Into An Applied Attack.
Md5 to be considered harmful someday 3
V. Stripwire
We Begin by Defining Two Files, "VEC1" and "vec2", as the proof-of-concept test vectors release by
WANG. Vec1 and vec2 Have The Same MD5 Hash But Difer by Six Bits Out of 1024.WE Also Define
"PayLoad" as some arbitrary string of commists to be executed. The "encrypted payload" IS SIMPLY THE
AES encrypted representation of payload, using the sha-1 of vec1 as the key. (It is useful to note what
While vec1 and vec2 do share the Same MD5 Hash, Thede Do Not in This Case Share The Same Sha-1 Hash.)
We now define two more files, "fire" and "ip". Fire is simply vec1 with the encrypted payload appended
To it, while Ice is vec2 with the encrypted payload attached. Only Six Bits Separate Fire and Ice But
THIS SMALL Deviation IS critical. Fire Contains Vec1, Which Can Be Easily Hashed To Acquire The Key To The THE
Encrypted PayLoad. Ice Contains Vec2, Which Can Be Run Through The Sha-1 Hash But Yields a Useless Value
That Fails to Decrypt The PayLoad. so while fire easily express
Fire Burns. Fire and Ice Have The Same Md5 Hash. Returning to the Math, The
Encrypted payload is q; it is a constant payload appended to the x and y of colliding sets. through this
Mechanism Ice and Fire Can Be Exchanged At Will, And as Far AS MD5 IS Concerned, Nothing Ever Happened.
This is not theoretical. It looks like this:
A. Demo
Stripwire itself Has Been Designed to Be As Readable As Possible; for Some Readers ITS Source Code Will
Be Much Better Documentation Than This Paper. for Those Seeking To Reimplement The Attack from this
Document Alone, The Two Test Vectors Are As Follows:
$ vec1 = h2b ("
D1 31 DD 02 C5 E6 EE C4 69 3D 9A 06 98 AF F9 5C
2F CA B5 87 12 46 7e AB 40 04 58 3E B8 FB 7F 89
55 AD 34 06 09 F4 B3 02 83 E4 88 83 25 71 41 5A
08 51 25 E8 F7 CD C9 9F D9 1D BD F2 80 37 3C 5B
D8 82 3E 31 56 34 8F 5B AE 6D AC D4 36 C9 19 C6
DD 53 E2 B4 87 Da 03 FD 02 39 63 06 D2 48 CD A0
E9 9F 33 42 0F 57 7e E8 CE 54 B6 70 80 A8 0D 1E
C6 98 21 BC B6 A8 83 93 96 F9 65 2B 6F F7 2A 70
");
$ vec2 = h2b ("
D1 31 DD 02 C5 E6 EE C4 69 3D 9A 06 98 AF F9 5C
2F CA B5 07 12 46 7e AB 40 04 58 3E B8 FB 7F 89
55 AD 34 06 09 F4 B3 02 83 E4 88 83 25 F1 41 5A
08 51 25 E8 F7 CD C9 9F D9 1D BD 72 80 37 3C 5B
D8 82 3E 31 56 34 8F 5B AE 6D AC D4 36 C9 19 C6
DD 53 E2 34 87 Da 03 FD 02 39 63 06 D2 48 CD A0
E9 9F 33 42 0F 57 7e E8 CE 54 B6 70 80 28 0D 1e
C6 98 21 BC B6 A8 83 93 96 F9 65 AB 6F F7 2A 70
");
1 Since Ice and Fire deviate by only 6 bits, it may be possible for a particularly adept auditor to brute-force convert vec2 to vec1 and thusacquire the correct key to examine the AES encrypted payload. If bit deviations can be at arbitrary positions, this Becomes a 245 attic; if
WANG's Attack Only Allows A Few Locations To Be Involved in Multicollisions, (As IT APPEARS TO) Converting VEC2 To VEC1 May Be a Near-Trivial
.
Md5 to be considered harmful someday 4
A line Has Been Inserted Between The Two 64 Byte Md5 Blocks and bytes with Deviant Bits Have Been
HIGHLIGHTED. FOR EXAMPLE, THETE SET TO "87" IN VEC1 IS SET to "07" IN VEC2. It's Worth Noticing That That
Changes foring, in The Same Position, Between Their First Block and their second
BLOCK.
Now ONTO OUR PayLoad .ur paypted to be encrypted may be of arbitrary size; for the purposes of this
Paper We Will Demonstrate A Bare-Bones Application That Opens a Pseudoshell on an Arbitrary Port.
$ cat backlash.pl
#! / usr / bin / perl
# Backlash: Open a pseudoshell on port 50023
# Author: samy kamkar,
Www.lucidx.com
USE IO;
While (1) {
While ($ C = New IO :: Socket :: inet (localport,
50023, Reuse, 1, Listen -> ACCEPT) {
$ ~-> fdopen ($ C, W);
Stdin-> FDOpen ($ C, R);
System $ _ while <>;
}
}
First We generate Fire and Ice.
$ ./stripwire.pl -v -b backlash.pl
Fire.bin: MD5 = 4DF01EC3A18DF7D7D6CDF8E16E98CD99
Ice.bin: MD5 = 4DF01EC3A18DF7D7D6CDF8E16E98CD99
Fire.bin: SHA1 = a7f6ebb805ac595e4553f84cb9ec40865cc11e08
Ice.bin: SHA1 = 85F602DE91440CD877C7393F2A58B5F0D72CBC35
Note, Their Md5sum's Match, But Not Their Sha1 Sums. And of Course The Share The Same FileSize. $ Ls -l fire.bin ice.bin
-rw-r - r - 1 kaminsky mkgroup_496 NOV 30 20:50 Fire.bin
-rw-r - r - 1 kaminsky mkgroup_496 NOV 30 20:50 Ice.bin
Binary Comparison Cannot Be Fooled.
$ DIFF FIRE.BIN ICE.BIN
FILES Fire.bin and Ice.bin Differ
Stripwire Contains The Execution Harness for Fire and Ice. WHEN We Run It Against Ice ...
$ ./stripwire.pl -v -r ice.bin
Unable to decrypt file: Ice.bin
Fireure. Fire Is Another Story:
$ ./stripwire.pl -v -r fire.bin &
[1] 1420
$ telnet 127.0.0.1 50023
Trying 127.0.0.1 ...
Connected to 127.0.0.1.
Escape Character is ']'.
CAT / etc / ssh_host_dsa_key_demo
----- Begin DSA Private Key -----
Mih5ageaakeAlctshggpyy0eqgrbjryqcrbdgxhfwftbxazsgbrkiebh1aal4et6
VPYZ7 / OLPBRKXWMNX5MCEHYWMEHOCK00PWIVAJYQ0ZLKPRPR2EJWZ / ECGR1XGUVP
AKBWEUY6MJHAPO5SF T0V7VS319FGVW0J8DTHUEQ2PAZHJL063SC2N9JKAMZRHEN
MD5 to be considered harmful someday 5
J7C04XMEHNFDMIVXTNFCAVKZAKEAIEVTNTFNNV7SIF0M4Z60MJ1HZ3ZJ50R7IH1S
SPON IXZKSOAEP9JKYJS67 HBQGPOWXNUKOFAQDWL1GCLGFWIVAJUPPSN6YJ2E
Z5M7ATZZ72B131H8
----- End DSA Private Key -----
Vi. Caveats
IT Should Come As No Surprise That The Primary Applied Target for the Stripwire Tool Is The Hightly Popular
"Tripwire" File System Auditing Tool. Alth Tripwire Can Configured To Use More Trustworthy Algorithms,
Under Common Configurations IT Works by Collecting MD5 Hashes of Every System File Contained
Within the file system and alarming if any of those hashes change. The base security presumption is there
As Long As The File System Doesn't Change, Neither Will The Behavior of The System Running On Top of IT.
Stripwire makes it trivial for an attacker to swap out the harmless Ice for the arbitrarily dangerous Firewith Tripwire none the wiser. So does this mean Tripwire is fundamentally broken? The short answer, no,
Absolutely Not. The Longer Answer is Where Things Get Interesting.
WE Begin by Looking Closer At Tripwire's Base PRESUMPTION - YES, Security Engineers Use Tripwire To
Detect Unauthorized Changes in The File System, But Altering The File System Is Not The Only Way Operations
Can Be Affected. The File System Doesn't Fully Define An Operating Environment Any More Than Laws Fully
Define a legal system. Any Number of External Sources Can Alter Behavior. Faults Amidst ITS Files Are But
One Path, And Not Necessarily The Best One. An entire Branch of Exploit Research Focuses On Memory-ONLY
Attacks That Use the network as their inJection Vector and Alter Only The in-Ram Kernel Or Library Structure
To Support Remote Control of the OS. The Disk is never touch; all evidence bleeds away the moment the
Plug is pulled by a naive forensic analyst.
And Of Course, Systems Do Not Need To Be.comed To Exhibit Deviant Behavior with a constant software
Load. Anything from cpu speed to motherboard temperature sensors to the particular date emitted by the
RTC (Real Time Clock) Can Be Used to Select Between Completely Different Sets of Instructions. Systems
Can Be Even Be configured to alter their behavior randomly. What matters is what the system isprogrammed. IS What THATE
To do, and what's the second problem: tripwire doesn't tell you, you can trust,, only auditors
Can do this if you could trust it before, youst, for stripwire to pose an activity
threat to a deployed environment not only would Ice need to be added to the trusted list of MD5-monitoredfiles but so too would the Stripwire execution harness itself. That is an unlikely circumstance.
SO MOST Uses of MD5, Even by TripWire, Remain Secure - Under The Present Threat Regime At Least. There
STILL Remains a critical blind spot in anything md5; to Pick One Example this is a fantastic channel
For a group of malicious development to submit innocuous and undecryptable content to their auditors for
Approval, and the once That's acquired to swap in a self-decrypting and unaudited payload. Audits Against
The Shipped Code Would SHOW The Same MD5 Hashes, And All Would Appear Well.
NOT THAT MALICE from The Developers IS A Required Component of Such An Attack. MayNor [6] Describes
A Fascinating Failure Mode Where The Multitude of Compilation, Assembly, and Packaging Tools Used To
Bring code from raw text to deployed code is the logical progression of thompson's
Classic Essay On Trusting Trust [7], IN WHICH A C Compiler Was Infected and Would Subsequently
INFECT Anything Else Compiled with it, including Other C Compilers, MayNor's Approach Has Some Interesting
Implications When Combined with stripwire. Conceivably, "Ice" Could Be inJected Into Each Build
Assembled by the developers, thus allowing internal testing to proced uninhibited. But, Upon Shipment
"Fire" Would Be Swapped in by a Malicious Third Party. Even if system administrators had a process by
Which the value to be installed with the we code to be installed with the development 'concert of what sum
(Say, THROUGH An Automated Package Manager), They Would Still Find Themselves Installing The Corrupted
Code.
Md5 to be considered Harmful Someday 6ULTIMATELY, MD5 Cannot Be Depunded Up PROTECT AGAINST A BAT AND SWITCH, AND NEITHER CAN Anything
That Depends on it.
Vii. Digital Signatures and DRM
Digital Rights Management, OR DRM, HAS Become a catch-all term for a extensive reimagining of issues
NOT Simply Technical, But Legal, Political, And Economic AS Well. The Latter Three Have Effectively Driven
The Concept of a Mutually Trusted "Third Party Attester" Into Technology That Has Traditionally Operated
ON A "Dumb Automaton" Model of Command / Execution. Third Party To THIRD PARTY TO
Control The Precise Manner in Which a System Should Operate, Independent of Mere Technology.
Cryptographic Primitives Are Chaind Together In DRM SYSTEMS to LINK GRANTABLE RESOURCES To the Externally
Provided Objects That Provide the granting.
DRM SYSTEMS with MD5 AS Part of Their Chain Could Conceivables Face Problems Even with they never
Hash Data Directly. All Three Major Digital Signature Algorithms - RSA, DSA / ELGAMEL, AND ElliPTICAL
Curve - Are Almost Universally Used in a Mode Where They Do Not Sign Directly, But Rather Sign A
ASYMMETRIC ALGORITHMS Are Quite Slow; this Maneuver Makes It Realistics
To Sign Arbitrarily Large Files.) OFTEN THE HASH Algorithm of Choice Is MD5. Identical Input Yields Identical
Output - if Two Files Have The Same Hash, They'll Both Verify Against The Same Signature. So, a KEY CONSTRAINT
Of the Digital Signature Primitive, That No Other Data Could Survive Signature Verification Save for the Data
That Was Originally Signed, Cannot Be Met.
There appears to be only limited vulnerability to this in open deployment. Microsoft's Authenticodetechnology, used within its browser to limit executable content within web page to signed documents, does
Indeed Use (or at Least Allow) MD5 Hashes To Be Signed. It Would Be Trivial To Sign Something Innocuous
And the Actually Release Something Malicious. But The Security Model of Authenticode Has Always Been
One of Legal Accountability - Having Someone to Sue - And NOT OF TECHNICAL RESTRICTION. Indeed The Amount
Of Abjectly Destructure "Spyware" tunneled to user machine through internet explorer is astonishing.
IT IS Worth Noting That Probably The Widest-Deployed Hardware That Employs Digital Signatures for ThidParty
Attestation, The Microsoft X-Box, Uses SHA-1 AS ITS Hashing Algorithm and Not MD5 [8]. SO IT IS Not
Vulnerable. But it's alsoworth noting That Had Microsoft SELECTED MD5 INSTEAD OF SHA-1 THEIR USE OF A
2048 Bit Key for their RSA Signature 10.
Viii. MultiCollisions Unleashed
Interestingly Enough, None of What's Been Discussed Already Actually Requires The Full Attack Discovered DISCOVERED
By Joux and Wang. Thus far evenning has been based online
WANG's Test Vectors. But Failures Inside Cryptographic Primitives, Even Very Small Ones, TEND TO Lead To
Slowly Discovered Catastrophicles. Md5 Does Not See Rule.
We can do Much More with the actual multicollision attack. The test vectors collide only when stirred
INTO The Default Initial State for the MD5 Algorithm; The Attack Itself Works Against Any Arbitrary State.
THE UPSHOT Here Is That We can not Only APPITRARY DATA But Prepend It As Well. Presently Fireand Ice Required A Dedicated External Execution Harness. With prepending
Available, a Correctly Formatted Binary Executable Could Be Synthesized That Would Self-Analyze and Branch
ApproPriately Depending On Which Vector Was Contained within.
In addui, being limited to the md5 initial state means only Hashes Calculated on a per-file basis CAN
Be Made to Collide; A Full Disc or Partition Sum Will Come Across The DoppelGanger Set At A Vastly Different
Initial State and Fail to Collide. with the full attact we could specifyur colliding block against the MD5
State That Would Be Found During A Full Disk Or Partition Hashing Operation. of Course, Then the Colliding
MD5 to Be Considered Harmful Someday 7
Set We generated Wouldn't Collide On a per-file basis. Thus far we can only Adapt to a Single MD5 State At
A Time.
IX. Hmac
Most Observers Have Written That The Hmac, or Hashed Message Authentication Code, Construction IS
Entirely Immune to The Multicollision Attack. They Are Mostly Correct, Just Not Entirely Correct. HMAC
Is A Method of Taking An Arbitrary Hashing Algorithm Like Md5 and Introducing a Secret To It Such That
Only Someone with That Secret Can Either Synthesize or Verify The Correct Hash for Some Arbitrary Input.
In Simple Terms, HMAC Is A Mechanism for Altering The Initial State According to Some Password. More
Precisely, HMAC Does The Following:
Inner = MD5 (Key XOR 0x36 DATA)
Outer = MD5 (Key XOR 0x5c Inner)
HMAC-MD5 = OUTER
There Are Three Things Going On Here:
1. a key is prepnded to the data being hashed.
2. Additional noise is added to the user provided key.3. Two rounds of hasening area Used instead of one, with the noise varying between the Two Rounds.
There's A Fair Amount of Defeensive Cryptosystem Design Inside of Hmac - It's An Avowed Goal of To
Algorithm to Still Function Even IF SMALL FAULTS Are Found in The Underlying Hash. But for All ITS Defensive
Operation, The Data Portion is Only Invoked ONCE, Prepended With New Block. This New Block
Creates A New 128 Bit State for Data To Be Stirred Into, and this State Diverges Substantially from Our Generic
MD5 initial state.
But The Multicollision Attack Works Against Any State, Not Just The Generic One. That Means It's StraightForward,
Given the key, to adapt to the new system state and cause a collision in the inner hash. onCe
The Inner Hash Has Collided, The Outer Must As Well, AS It's Getting The Same Input from Both DataSets (IF
The Outer Hash Also Concatenated The Data Portion, The Attack Would Fail Entirely, Because It's Thus Far
Impossible to adapt to the two separate md5 stats by the 0x36 v. 0x5c padding xors.)
This is the first known method of create under hmac-md5. Once Again,
Though, The Caveats Are Deep.
From One Perspective, Saying Hmac Is Insecure When THE KEY IS KNOWN IS A Little Like Saying AES IS
INSECURE WHEN THE IS KNOWN - The Whole Point Is That The Key Is Unknown To The Attacker. It's Quite
Arguable That The Mac Primitive, Like Anything Else with a key, is allowed to collapse if the key leaks. The
Basic IDEA of Open Crypto Design, After All, IS To Migrate All Secrecy and Security Out of the Algorithm and
into the key. Is it really fair to complain when, given this design, risks show up with the key being lost? Probably not. For all the analysis in this paper, the multicollision attack against MD5 remains relatively
WEAK, WITH SPECIAL CIRCUMSTANCES Required for An Attack to successd. for tripwire to be committed, the
Initial Trust Database Needed to Be Infected. for Digital Signatures To BE Affected, Something Only Apparently
Innocuous Needed to Be Signed First. in Both Cases, The Use of MD5 Opened Up A SMALL THREAT VECTOR
Would HMAC HAVE CHANGED THIS? IF An Attacker Has Access to a System Such That Files May Be Altered, HE
PROBABLY Also Has Access To Whatever Hmac Key Tripwire Had Been Reconfigured To Use (Assuming The
Entire Contents of the File System Aren't Being Street Over The Network To An Uncomised Host). so
HMAC Doesn't Change The Threat Scenario for Tripwire. And for a Digital Signature Bait and Switch, The
Attacker Has To Have The Hmac Key To Create A Signable Payload for the Third Party Attester.
Ultimately, As Most Uses of MD5 Are Immune to the MultiCollision Vulnerability, So Too Are Most USES
Of HMAC-MD5. But when Md5 Does Experience An Applied THREAT, HMAC-MD5 Provides Limited IF ANY
Protection.
MD5 to be considered harmful someday 8
X. Strikeback: traitor Tracing
Security Is a Battle Between Attackers and Defenders, and Defenders Do Not Necessarily Need To CEDE The
Ground of cryptographic exploitation to attackers alone. a study path known as "strikeback" Examines
The Mechanisms by Which a Defender Under Attack Can Exploit Weakness in His Attackers To Defend His
Systems.
There Are Strikeback Implications To this MD5 Research.
The Proof of Concept for Stripwire WAS SIMPLE: Take An Audio File Encoded in The MPEG-1 Layer 3 (MP3) Format. Append It to Both vec1 and vec2. Note That the agglomerated files Both Play flawlessly and
Identically. this wasn't a surprising result; mp3 is a bitstream Format and as such is highly resistant to potalled
"Junk Data" Infecting The DataStream. But this Was The First Proof That Two Files with bit-Differences
And the Same MD5 Hash Could Still Function Correctly Given a Cooperative Execution Harness, And LED to the
Basic Design of Stripwire.
IT Also Yielded An Mp3 File That Contained An Extra Bit of Information - WHETER VEC1 or VEC2 HAD BEEN
Prepended. A Single Bit is Not Useful. But we are not constrained to a single bit.
Wang Has Disclosed That, Given An Arbitrary Md5 System State, Her Implementation IS Capable of Finding
A Multicollision-Capable Set After Approximately One Hour of Computation with One DoppelGanger Computable
Every Fifteen Minutes After That. It is Well With The Realm of Feasiblity To Compute 16 Sequential
Multicollision Sets, Each Adapting to the MD5 State Emitted by The Previous, With 256 (or 28) computed
DoppelGangers for Each Set. Now, Instead of the Single Bit of Information Reperesented by The Choice Between
Vec1 and vec2, we have 8 Bits of Information Per Prepended Block - and There Are 16 block. this
Yields Space for A 128 Bit Signature, And Things Just Got Much More Interesting.
A. MP3
Consider the problem of traacing the path of an mp3 file as it WINDS ITS WAY THROUGH a peer to peer
NetWork. (Peer to Peer Networks Are, Of Course, Just A Special Case of A Distributed Content Network of A Distributed Content Network of A
Which there is. - Google, For One.) Since MP3 Files Arere Error-Resilient, OneCould Connect A Custom Client To The Network That Prepended a Unique 128-Bit Serial Number To Every Song
Transmitted. EVERY Second-Level Copy Would Now Be Individually Tagged and It'd Be Possible To TRACE EVERY
File on the network back to the second-level host this retrieved it from the custom file server. Adding
A Deviarating Serial Number To Each File Transmitted Would Normal Cause Problems As Both the Search
Algorithms and File Integrity Checks on P2P NetWorks TEND to BE MD5 CENTRIC. But Since The Serial Number
Is Represented in a form there md5 is blind to, Nothing fails - Except Perhaps Some of the Opacity of Tore
P2P NetWork.
That's Not to Say The aren't countermeasures. The Serial Number is Easy To Detect, Can Be TriVially
Stripped, IS Simple To Alter, And Can Be Rendered Inoperable SIMPLY by Switching The Network To Another
Hashing Algorithm. But Even Here There Caveats - Detection May Be Simple, But Eliminating The Serial
Number Entirely Will Yield a Different Hash Value, Subtly Breaking The Network's Ability To Co. manufacturers, ALL
Identical PayLoads. and while it's possible for hosts on the p2p network to "Mix and Match" DoppelGanger
Sets from Several Hosts, It's Relative STRIGHTFORWARD to Identify A Cryptographical Secure Subset of The
2128 Possible Serial Numbers That Makes It Impossible for Uses To Synthesize Valid Serial Numbers Through
Any Other Means of Acquiring Them From A First Or Second Generation Source. And Finally Peer To Peer
NetWorks Are Still Networks and as Such Are Vulnerable to The Greatest Caveat of Network Effects: Even After
Faults are identified, so many nodes may dependhing. ..........................
There Is One Special Case on P2P NetWorks - Some Designs Allow A File to Be Acquired in Pieces from
SEVERAL DIFFERENT NODES. ONE Solution To this is to seed a file with serial number,
Md5 to be considered harmful someday 9
Perhaps three sets every 128 kilobytes. this is a much more compute-intensive operation, though, since
The Multicollision Sets Must Be Comput ON A Per-File Basis As Different Data Will Preceed Each Group of
Doppelgangers.
B. EXECUTABLES
Barring Some of The More Creative and Noticeably Illegal Designs Which Infect MP3 Files with Executable
Content, It's Not Possible for MP3-Embedded Signatures To Yield Any More Evidence Except for What THEY
Present by their existence on any number of hosts.
Actual Executables Are Another Story. They Are Generally Quite Full of undocunted and undocumentable
Functionality, Much of It Inserted by a compiler. (Thus The Limits of Auditors - They May BE
Able To Read Source, But How Many Can Read What THE Compiler Actually Emits? Because That What THE
System Ultimately Needs to Trust.) Particularly IF An Executable IS AGGRESSILY PROTECTED FROM PUBLIC DISTRIBUTION,
There Can Be No Expectation of Publicly Safe Behavior (in Fact, That's generally why aggressive
Protections Are Instituted In The First Place. That and profit motives.) It would be quite irresponsible to
Embed Code That ERASED HARD DRIVES or FLOODED NETWORKS ...
But why not locate the source of the leak?
128 Bits IS A FAIR AMOUNT OF CAPACITY - ENGLISH TEXT Only Takes 1.3 Bits Per Character, Compressed -andT'd Be Reasonable To Quadruple That If Needed. Before DISTRIBUTING An IlliTLY ACQUIRED Executable,
An Attacker Is Likey to Test It During Their Packaging Process. During this Testing, The Executable Installer
Could Be Configured To Collect PII (Personally Identifiable Information) from across the file system. The file system..
128 TO 512 MOST VALUABLE BITS OF INFORMATION WOULD BE LOCALLY TRANSFORMED INTO The Requisite MD5-BLIND
Series of DoppelGangers, and INJECTED BACK INTO The Installer Upon ITS Exit Before Mass Distribution Could
Take Place. The Range of Acquirable Data IS Extensive - Potential Sources Include:
1. NetWork Data - IP Address, DNS Name, Default Name Server, Mac Address
2. Browser Cookies, Caches, And Password Stores - Online Banking, Hotmail, Amazon 1-Click
3. Cached Instant Messenger Credentials - Yahoo, AOL IM, MSN, TRILLIAN
4. P2P Memberships - Kazaa, Gnutella2
5. Corporate Identifiers - VPN Client Data / Logs
6. SHIPPED Material - CPU ID, Vendor ID, Windows Activation Key
7. System Configurations - Time Zone, Telephone API Area Code
8. Wireless Data - Mac Addresses of Local Access Points
9. Existence Tests - Special Files in Download Directory
Also Possible But Legally ProbleMatic Would Be Acquiring Not Just One Hop's Worth of Data But Watching
The Executable As It Travels Across Large Networks, Containing Identifying Information for As Many Previous
HOPACITY BECOMES A Problem, As It Does with IP's "Record Route" Option, But We CAN
Handle It by Dynamically Reducing Resolution (The Rrdtool Approach) or by Simply Keeping An Overflow
Counter (What IP DOES).
This is not the first scheme assembled to uniquely tag executables. What's interesting here is that thesetags are self-updating as the file is trafficked, and that the self-updating tags are difficult to detect even
With Dedicated File Integrity Checks (MD5SUMS). IN A Very Unique Sense, this Is A Steganographic Strategy
AIMED NOT AT The Human Analyst But at The Precise Internal. It's quite effective.
Xi. Contlusions
THE POINT IS Not That MD5 HAS Collapsed. It isn't. The point Clear Trend Regarding
The security level of md5, and it isn't good. it is now understand That The Selection of MD5 Matters - The SELECTION OF MD5 MATTERS - THE
ConsTRAINT THAT DEPLOYED IMPLEMENTATIONS OF THE One-Way Hash Primitive Be Functionally Identical Has Been
Md5 to be considered harmful someday 10
Broken. The Failures Detected Are Not Merely Algorithmic or theoretical, Rather New Capabilities Above and
Beyond What The Primitive Specifies Are Made Available By The Selection of Md5. It is not expected.. IT IS not expertted
THIS Paper Will Cause A Precipitous Decline In The Use of Md5; That Will Probably Occur When A Means of
Silently Introducing Single-Bit Errors in Arbitrary (Rather Than Chosen) MD5 PayLoads is Discovered.
But in the security community, we measure "Phase Change" Nature of Our Systems
That Suddenly Collapse from Secure to Insecure on The Discovery of a "Zero Day" Exploit. The Phase Change
For MD5 ISN'T Here Yet, But It Will Come, Someday. NoBody Should Be Surprised When That Day Arrives.
References
[1] Antoine Joux, "MultiCollisions in Iterated Hash Functions. Applications to Cascaded Constructions.,"
2004.
[2] xiaoyunwang, Xiaoyun FENG, XUEJIA LAI, AND Hongbo Yu, "Collisions for Hash Functions MD4, MD5, HAVAL-128 and Ripemd," Cryptology EPRINT Archive, Report 2004/199, 2004, http: / / /).
Iacr. org /.
[3] Hans Dobbertin, "Cryptanalysis of Md5 Compress," 1996, http: / / / citeseer. IST. PSU.
EDU / 68442. HTML.
[4] Philip Hawkes, Michael Paddon, And Gregory G. Rose, "Musings on the Wang et al. MD5 COLLISION,"
Cryptology Eprint Archive, Report 2004/264, 2004, http: / / eprint. Iacr. ORG /.
[5] Stefan Lucks, "Design Principles for Iterated Hash Functions," Cryptology EPRINT Archive, Report
2004/253, 2004, http: / / /.
[6] David MayNor, "Trust No One, Not Even Yourself, or the Weak Link Might Be Your Build Tools," in
The Black Hat Briefings USA, 2004, http: / / www. Blackhat. Com / presentations /
BH-USA-04 / BH-US-04-MayNor. PDF.
[7] Ken Thompson, "Reflections On Trusting Trust," Commun. ACM, VOL. 27, NO. 8, PP. 761-763, 1984.
[8] Andy Green Peter Barth, Jeff Mears, "Project B (Hacking) Overview," 2004, http: / // www.
Xbox-linux. ORG / DOCS / ProjectBoverView. HTML.