I recently worked on adding support to oclHashcat in order to crack the different versions of password protected MS Office documents. So far I've finished MS Office 2013, 2010, 2007 and the 97-2003 versions (there are different ones). The next version of oclHashcat (v1.31) will ship with support for all these versions.
While I was working on the 97-2003 version I found out that there's a weakness in the scheme that I want to share. I'm not sure if this technique is already known, but I've never seen a cracker that does what is neccessary to exploit it. OTH, I don't know all of them. Also, as far as I understood it, this attack is different to the flaw discovered back in 2005, when Office is using the same keystream to encrypt two different versions / revisions of the same document, but don't nail me on that!
The following explanations describe the vulnerability to a meet-in-the-middle attack. If you don't know what I mean with meet-in-the-middle attack, see here: https://hashcat.net/events/p13/js-ocohaaaa.pdf
With a single hd7970 I'm able to collide every password protected MS Office 97-2003 documented using MD5/RC4 in less than 6 hours! And yes, you can open the document "normally" with a collided password. In order to explain it you first need to understand the KDF MS office uses.
KDF
OK, what is this encryptedVerifier and encryptedVerifierHash? From oclHashcat's view it's just two blocks of random data of length 16 bytes. Why is MS office doing this? The answer is it enables a key validation. The idea is the following: generate 16 bytes of random data, then run it through MD5. Then store both, the original random data and the MD5 result after both were encrypted with RC4. Now, if one has the correct RC4 Key, he can decrypt back to the original random data and MD5 it. If the result matches the decryptedVerifierHash data we know we use the correct RC4 Key. Office will work the same way.
MS original description: http://msdn.microsoft.com/en-us/library/...12%29.aspx
Problem
The problem here is the truncation of the MD5 digest in step 6. OK, the key for the RC4 is 128 bit, but it is calculated out of a MD5 which uses just a 40 bit input (because the last 32 bit are just zeros).
Exploitation
The idea is to iterate through those 2^40 combinations, beginning from step 8. Once we find the correct RC4 Key, which is the case when step 13 is true, we do not need to do those steps ever again. From now on, in oclHashcat, we will just calculate steps 1-5 and then compare the first 5 byte with our pre-cracked intermediate hash. That's the meet-in-the-middle attack.
To make it more comfortable, this technique requires two seperate oclHashcat modes. The first to brute-force the intermediate hash and once you cracked it just store it. Then you can use oclHashcat "normally" with the second mode. This mode works with the previosly cracked key and therefore only needs to calculates steps 1-5. Note that in this second mode you are not stick to Brute-Force, you can use any attack-mode you want. Actually you will not need to do that unless you try to find the real password that was used (forensic stuff). Just continue reading...
So I made some real life tests how the cracking performance changes. One need to know RC4 isn't very GPU friendly. If we can avoid to compute it we will speed up the performance alot. The current speed, on a single hd7970, to crack password protected MS office 97-2003 documents is ~65 MH/s. That's still faster than any other cracker but if we exploit the weakness the numbers changes as following (on a single hd7970):
So we can speed up from 66MH/s to 390MH/s on a single hd7970. Not too bad, eh?
Colliding
With 390MH MH/s, all we need to find is a MD5 digest where the first 5 byte matches the precomputed intermediate hash. Problem is that in theory even by iterating through 2^40 combinations it's not guaranteed to find at least one matching one. But I believe there's still a good chance as it's only the first 5 byte we need to collide. After that, the collision will work as a valid password in MS Office.
In the following experiment below I found my first collision after 14 seconds :-)
Here's the log from my experiment that I made to ensure all this theoretical stuff is really working:
Note: If you're wondering what this code section is about, ignore it. It's just info for myself if I want to reproduce it
First mode, precompute intermediate hash
Now that we have the intermediate hash $HEX[f2ab1219ae], we can move forward to our second module:
Second mode, collide intermediate hash
Collision found in 14 seconds
Of course, there's no guarantee always to find a collision in 14 seconds, but it will not take to long...
Test
Yes, you can use the collision to open the document. If you want to try, here's the .doc I've been using:
Document: https://hashcat.net/misc/DocOld2010.doc
You can use both passwords:
Jens "atom" Steube
While I was working on the 97-2003 version I found out that there's a weakness in the scheme that I want to share. I'm not sure if this technique is already known, but I've never seen a cracker that does what is neccessary to exploit it. OTH, I don't know all of them. Also, as far as I understood it, this attack is different to the flaw discovered back in 2005, when Office is using the same keystream to encrypt two different versions / revisions of the same document, but don't nail me on that!
The following explanations describe the vulnerability to a meet-in-the-middle attack. If you don't know what I mean with meet-in-the-middle attack, see here: https://hashcat.net/events/p13/js-ocohaaaa.pdf
With a single hd7970 I'm able to collide every password protected MS Office 97-2003 documented using MD5/RC4 in less than 6 hours! And yes, you can open the document "normally" with a collided password. In order to explain it you first need to understand the KDF MS office uses.
KDF
- Generate 16 byte random salt
- Calculate MD5 of unicode version of the password
- Truncate 16 byte result to 5 byte
- Generate a string of length 336 byte by repeating the string "$digest$salt" 16 times -- (16 * (5 + 16)) = 336
- MD5 the 336 bytes
- Truncate 16 byte result to 5 byte
- Append 4 byte zeros to result
- MD5 the 9 bytes
- Use 16 byte result as 128 bit RC4 Key
- Decrypt encryptedVerifier with RC4 to decryptedVerifier
- Decrypt encryptedVerifierHash with RC4 to decryptedVerifierHash
- MD5 the decrypted encryptedVerifier
- Compare 16 byte result with decrypted encryptedVerifierHash
OK, what is this encryptedVerifier and encryptedVerifierHash? From oclHashcat's view it's just two blocks of random data of length 16 bytes. Why is MS office doing this? The answer is it enables a key validation. The idea is the following: generate 16 bytes of random data, then run it through MD5. Then store both, the original random data and the MD5 result after both were encrypted with RC4. Now, if one has the correct RC4 Key, he can decrypt back to the original random data and MD5 it. If the result matches the decryptedVerifierHash data we know we use the correct RC4 Key. Office will work the same way.
MS original description: http://msdn.microsoft.com/en-us/library/...12%29.aspx
Problem
The problem here is the truncation of the MD5 digest in step 6. OK, the key for the RC4 is 128 bit, but it is calculated out of a MD5 which uses just a 40 bit input (because the last 32 bit are just zeros).
Exploitation
The idea is to iterate through those 2^40 combinations, beginning from step 8. Once we find the correct RC4 Key, which is the case when step 13 is true, we do not need to do those steps ever again. From now on, in oclHashcat, we will just calculate steps 1-5 and then compare the first 5 byte with our pre-cracked intermediate hash. That's the meet-in-the-middle attack.
To make it more comfortable, this technique requires two seperate oclHashcat modes. The first to brute-force the intermediate hash and once you cracked it just store it. Then you can use oclHashcat "normally" with the second mode. This mode works with the previosly cracked key and therefore only needs to calculates steps 1-5. Note that in this second mode you are not stick to Brute-Force, you can use any attack-mode you want. Actually you will not need to do that unless you try to find the real password that was used (forensic stuff). Just continue reading...
So I made some real life tests how the cracking performance changes. One need to know RC4 isn't very GPU friendly. If we can avoid to compute it we will speed up the performance alot. The current speed, on a single hd7970, to crack password protected MS office 97-2003 documents is ~65 MH/s. That's still faster than any other cracker but if we exploit the weakness the numbers changes as following (on a single hd7970):
- The first mode to pre-crack 2^40 combinations with a speed of ~80MH/s
- The first mode is guaranteed to succeed and will take a maximum of 4 hours (2 hours avg)
- The second mode to calculate steps 1-5. I tried it with a hand-hacked kernel, as a form of a prototype, it will do ~ 390MH/s
So we can speed up from 66MH/s to 390MH/s on a single hd7970. Not too bad, eh?
Colliding
With 390MH MH/s, all we need to find is a MD5 digest where the first 5 byte matches the precomputed intermediate hash. Problem is that in theory even by iterating through 2^40 combinations it's not guaranteed to find at least one matching one. But I believe there's still a good chance as it's only the first 5 byte we need to collide. After that, the collision will work as a valid password in MS Office.
In the following experiment below I found my first collision after 14 seconds :-)
Here's the log from my experiment that I made to ensure all this theoretical stuff is really working:
Note: If you're wondering what this code section is about, ignore it. It's just info for myself if I want to reproduce it
First mode, precompute intermediate hash
Code:
// kernel hacked at position step 5, check final hash to get out the intermediate hash
// comment out everything up to here
// use ?b?b?b?b?b mask to simulate countercracker
// --outfile-format 5
// --markov-disable (easier to debug)
// remove unicode in host
// w0_t[0] = digest[0];
// w0_t[1] = digest[1] & 0xff;
w0_t[0] = w0[0];
w0_t[1] = w0[1] & 0xff;
Quote:
root@et:~/oclHashcat-1.31# ./oclHashcat64.bin -m 9700 hash -a 3 ?b?b?b?b?b -w 3 --potfile-disable
oclHashcat v1.31 starting...
Device #1: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #2: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #3: Tahiti, 2967MB, 1000Mhz, 32MCU
Hashes: 1 hashes; 1 unique digests, 1 unique salts
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Applicable Optimizers:
* Zero-Byte
* Precompute-Init
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Kernel ./amd/m9700_a3.cl (31816 bytes)
Device #1: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #1: Kernel ./amd/bzero.cl (887 bytes)
Device #2: Kernel ./amd/m9700_a3.cl (31816 bytes)
Device #2: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #2: Kernel ./amd/bzero.cl (887 bytes)
Device #3: Kernel ./amd/m9700_a3.cl (31816 bytes)
Device #3: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #3: Kernel ./amd/bzero.cl (887 bytes)
$oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d:$HEX[f2ab1219ae]
Session.Name...: oclHashcat
Status.........: Cracked
Input.Mode.....: Mask (?b?b?b?b?b) [5]
Hash.Target....: $oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d
Hash.Type......: MS Office <= 2003 MD5 + RC4, oldoffice$0, oldoffice$1
Time.Started...: Mon Sep 8 19:44:59 2014 (52 mins, 9 secs)
Speed.GPU.#1...: 79666.8 kH/s
Speed.GPU.#2...: 79675.2 kH/s
Speed.GPU.#3...: 79660.5 kH/s
Speed.GPU.#*...: 239.0 MH/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 747777294336/1099511627776 (68.01%)
Skipped........: 0/747777294336 (0.00%)
Rejected.......: 0/747777294336 (0.00%)
HWMon.GPU.#1...: 92% Util, 43c Temp, 37% Fan
HWMon.GPU.#2...: 88% Util, 43c Temp, 37% Fan
HWMon.GPU.#3...: 92% Util, 44c Temp, 37% Fan
Started: Mon Sep 8 19:44:59 2014
Stopped: Mon Sep 8 20:37:12 2014
Now that we have the intermediate hash $HEX[f2ab1219ae], we can move forward to our second module:
Second mode, collide intermediate hash
Code:
// kernel hacked at Position step 5, check intermediate Hash:
// switch back to unicode
uintx a = 0;
uintx b = 0;
uintx c = 0;
uintx d = 0;
if ((digest[0] == swap_workaround (0xf2ab1219)) && ((digest[1] & 0xff) == swap_workaround (0xae000000)))
{
a = search[0];
b = search[1];
c = search[2];
d = search[3];
}
// comment out everything from here
Quote:
root@et:~/oclHashcat-1.31# ./oclHashcat64.bin -m 9700 hash -a 3 ?a?a?a?a?a?a -w 3 --potfile-disable
oclHashcat v1.31 starting...
Device #1: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #2: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #3: Tahiti, 2967MB, 1000Mhz, 32MCU
Hashes: 1 hashes; 1 unique digests, 1 unique salts
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Applicable Optimizers:
* Zero-Byte
* Precompute-Init
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Kernel ./amd/m9700_a3.cl (32100 bytes)
Device #1: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #1: Kernel ./amd/bzero.cl (887 bytes)
Device #2: Kernel ./amd/m9700_a3.cl (32100 bytes)
Device #2: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #2: Kernel ./amd/bzero.cl (887 bytes)
Device #3: Kernel ./amd/m9700_a3.cl (32100 bytes)
Device #3: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #3: Kernel ./amd/bzero.cl (887 bytes)
$oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d:zvDtu!
Session.Name...: oclHashcat
Status.........: Cracked
Input.Mode.....: Mask (?a?a?a?a?a?a) [6]
Hash.Target....: $oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d
Hash.Type......: MS Office <= 2003 MD5 + RC4, oldoffice$0, oldoffice$1
Time.Started...: Mon Sep 8 21:31:16 2014 (14 secs)
Speed.GPU.#1...: 392.6 MH/s
Speed.GPU.#2...: 392.6 MH/s
Speed.GPU.#3...: 392.6 MH/s
Speed.GPU.#*...: 1177.8 MH/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 15860301824/735091890625 (2.16%)
Skipped........: 0/15860301824 (0.00%)
Rejected.......: 0/15860301824 (0.00%)
HWMon.GPU.#1...: 98% Util, 43c Temp, 37% Fan
HWMon.GPU.#2...: 98% Util, 43c Temp, 39% Fan
HWMon.GPU.#3...: 98% Util, 43c Temp, 41% Fan
Started: Mon Sep 8 21:31:16 2014
Stopped: Mon Sep 8 21:31:34 2014
Collision found in 14 seconds
Of course, there's no guarantee always to find a collision in 14 seconds, but it will not take to long...
Test
Yes, you can use the collision to open the document. If you want to try, here's the .doc I've been using:
Document: https://hashcat.net/misc/DocOld2010.doc
You can use both passwords:
- hashcat
- zvDtu!
Jens "atom" Steube