Crackmes.de - GordonBM's Reverse Keygenme
This blog posts presents my solution to the Reverse Keygen by GordonBM on the “reversers’s playground” at crackmes.de. I first used IDA Pro to get the CIL representation of the key generator. You can see the full listing of the key generator function here here.
The code responsible for generating the key is in the method pump
. This function receives the message as the first argument. The function generates the key in either the simple of hard mode based on the check box status passed as the second argument. The function returns the key back to the caller, who simply displays the value by copying it to the message text box. These are the first few lines of pump
:
.method public hidebysig instance string pump(string str, bool check) // CODE XREF: button1_Click+25p { .maxstack 5 .locals init (char[] V0, string[] V1, string V2, int32 V3, string V4, bool V5) nop ldarg.1 callvirt instance char[] [mscorlib]System.String::ToCharArray() stloc.0 ldloc.0 ldlen conv.i4 newarr [mscorlib]System.String stloc.1 ldstr "" stloc.2 ldarg.2 ldc.i4.0 ceq stloc.s 5 ldloc.s 5 brtrue.s loc_4A1 // branch if check is FALSE nop ldc.i4.0 stloc.3 br.s loc_48F
The lines roughly translate to the following pseudo C# code:
string pump(string szMessage, bool bHardcoreMode) { /* v0: rgchMessage (char array representation of szMessage) v1: rgszResult (empty string array of same size as v0) v2: szResult (empty null-terminated string) v3: iIndex (index of char array) */ char[] rgchMessage = szMessage.ToCharArray(); string[] rgszResult = new string[szMessage.Length]; string szResult = ""; if( bHardcoreMode ) { // do hardcore mode } else { // do simple mode at loc_4A1 } }
The snippet converts the message string to an array of characters and stores the result in rgchMessage
. It also initializes two results variables szResult
(an empty string) and szResult
(an empty array with same size as the message). After that, the code either continues with the hardcore keygenerator (when the checkbox is set) or jumps to simple mode key generating at loc_4A1
. Let’s start with the easy version.
Simple Mode
Let’s first investigate the simple version, which starts at line loc_4A1
loc_4A1: // CODE XREF: pump+1Fj nop ldc.i4.0 stloc.3 br.s loc_4C1 (...) loc_4C1: // CODE XREF: pump+64j ldloc.3 ldarg.1 callvirt instance int32 [mscorlib]System.String::get_Length() clt stloc.s 5 ldloc.s 5 brtrue.s loc_4A6 nop loc_4D1: // CODE XREF: pump+5Fj ldloc.2 stloc.s 4 br.s loc_4D6 loc_4D6: ldloc.s 4 ret }
This snippet doesn’t to much. We are probably looking at a for
loop that iterates over all characters in szMessage
. After the for
-loop, the method pump
returns the string in szResult
. Here is the code in pseudo C#:
// loc_4A1: int iIndex = 0 // in v3 // loc_4C1: if( iIndex < szMessage.Length ) // jump to loc_4A6 return szResult;
The body of the loop is at loc_4A6
:
loc_4A6: // CODE XREF: pump+8Ej nop ldloc.1 ldloc.3 ldloc.0 ldloc.3 ldelem.u2 call string [mscorlib]System.Convert::ToString(int32) stelem.ref ldloc.2 ldloc.1 ldloc.3 ldelem.ref call string [mscorlib]System.String::Concat(string, string) stloc.2 nop ldloc.3 ldc.i4.1 add stloc.3
We can decompile it to the following pseudo C# code:
/* if rgchMessage = {'t', ...} then nCharacter = 116 and szCharacterCode = "116" */ unsigned short nCharacter = rgchMessage[iIndex]; // ldelem.u2 string szCharacterCode = nCharacter.ToString(); rgszResult[iIndex] = szCharacterCode; szResult = string.Concat(szResult, szCharacterCode); iIndex++; // continue at loc_4C1
Those few lines implement the whole magic of the simple encryption:
- They take the letter at
iIndex
into the message. For example, if the message is “test” andiIndex
is 2, then the character would bee
- The character is implicitly converted to an unsigned short. By doing this we get the ASCII code of the letter. In our example for letter
e
we get101
. - The integer code is then converted to a string, so
101
becomes"101"
Putting all parts together leads to the following key generation algorithm:
string pump(string szMessage, bool bHardcoreMode) { /* v0: rgchMessage (char array representation of szMessage) v1: rgszResult (empty string array of same size as v0) v2: szResult (empty null-terminated string) v3: iIndex (index of char array) */ char[] rgchMessage = szMessage.ToCharArray(); string[] rgszResult = new string[szMessage.Length]; string szResult = ""; if( bHardcoreMode ) { // do hardcore mode } else { // do simple mode for( int iIndex = 0; iIndex < szMessage.Length; iIndex++ ) { unsigned short nCharacter = rgchMessage[iIndex]; string szCharacterCode = nCharacter.ToString(); rgszResult[iIndex] = szCharacterCode; szResult = string.Concat(szResult, szCharacterCode); } return szResult; } }
Reversing encrypted text is straightforward. The only tricky part is differentiating between the two digit ASCII codes and the three digits ones. The latter always start with 1, while the former never do (the ASCII codes 10 to 19 are not printable). The following Python script first tokenizes the key into ASCII codes, and then transforms those code back to characters:
import argparse import re def decrypt(message): res = "" for c in re.findall('([2-9]\d|1\d{2})', message): res += chr(int(c)) return res if __name__=="__main__": parser = argparse.ArgumentParser(description="Simple Keygen") parser.add_argument("encrypted") args = parser.parse_args() print(decrypt(args.encrypted))
Example:
$ python simple_keygen.py 84104105115327311532653284101115116 This Is A Test
Hardcore Mode
Decompiling the Algorithm
The hardcore mode starts right after the branch to the simple version:
nop ldc.i4.0 stloc.3 br.s loc_48F (...) loc_48F: // CODE XREF: pump+24j ldloc.3 ldarg.1 callvirt instance int32 [mscorlib]System.String::get_Length() clt stloc.s 5 ldloc.s 5 brtrue.s loc_466 nop br.s loc_4D1 (...) loc_4D1: // CODE XREF: pump+5Fj ldloc.2 stloc.s 4 br.s loc_4D6 loc_4D6: ldloc.s 4 ret }
This snippet translates to:
int iIndex = 0 // in v3 // loc_48F if( iIndex < szMessage.Length ) // jump to loc_466 return szResult;
which, as for the simple mode, is simply a loop over all characters. The body of the loop at loc_466
reads as:
loc_466: // CODE XREF: pump+5Cj nop ldloc.1 ldloc.3 ldloc.0 ldloc.3 ldelem.u2 conv.r8 ldarg.0 ldarg.1 callvirt instance int32 [mscorlib]System.String::get_Length() call instance float64 Encrypter.Goodies.Engine::ran(int32 l) add call string [mscorlib]System.Convert::ToString(float64) stelem.ref ldloc.2 ldloc.1 ldloc.3 ldelem.ref call string [mscorlib]System.String::Concat(string, string) stloc.2 nop ldloc.3 ldc.i4.1 add stloc.3 loc_48F: // CODE XREF: pump+24j ldloc.3 ldarg.1 callvirt instance int32 [mscorlib]System.String::get_Length() clt stloc.s 5 ldloc.s 5 brtrue.s loc_466 nop br.s loc_4D1
or in C#:
double dbCharacter = (double)rgchMessage[iIndex]; double dbRanResult = ran(szMessage.Length); double dbSum = dbCharacter + dbRanResult; string szSum = dbSum.ToString(); rgszResult[iIndex] = szSum; szResult = string.Concat(szResult, szSum); iIndex++; // continue loop at loc_48F
So the hardcore mode, just like the simple version, operates on the ASCII codes of the letters and concatenates the results. This time, however, there is an additional method ran
whose return value is added to the ASCII character codes. Also the code uses double types instead of integers. Let’s try to decompile ran
:
.method private hidebysig instance float64 ran(int32 l) // CODE XREF: pump+34p { .maxstack 5 .locals init (float64 V0, int32 V1, float64 V2, bool V3) nop ldc.r8 0.0 stloc.0 ldc.i4.0 stloc.1 br.s loc_53A (...) loc_53A: // CODE XREF: ran+Dj ldloc.1 ldarg.1 ldc.i4.2 mul clt stloc.3 ldloc.3 brtrue.s loc_4EF ldloc.0 stloc.2 br.s loc_548 loc_548: ldloc.2 ret }
The function decompiles to:
double ran(int iStringLength) { double dbV0 = 0.0; int iCounter = 0; // in v1 // loc_53A: double dbTemp = 2*iStringLength; if( dbTemp > dbV0 ) // goto loc_4EF return dbV0; }
At loc_4EF
we find:
loc_4EF: // CODE XREF: ran+62j nop ldloc.1 ldc.i4.2 rem ldc.i4.0 ceq ldc.i4.0 ceq stloc.3 ldloc.3 brtrue.s loc_522 nop ldloc.0 ldloc.0 ldc.r8 2.0 mul ldarg.0 ldfld class [mscorlib]System.Random Encrypter.Goodies.Engine::random ldc.i4.0 ldc.i4.2 callvirt instance int32 [mscorlib]System.Random::Next(int32, int32) ldloc.1 ldc.i4.6 xor mul conv.r8 add add stloc.0 nop br.s loc_535
which in C# is:
if( iCounter % 2 ) // goto loc_522 else double mul = dbV0*2.0 Random rnd = new Random(); int nRandZeroOrOne = rnd.Next(0, 2); int iXOR = iCounter ^ 6; double mul2 = (double) iXOR * nRandZeroOrOne; mul2 = mul2 + mul; dbV0 = mul2 + dbV0; // go to loc_535
If the iCounter
is odd, the snippet at loc_522
gets executed:
loc_522: // CODE XREF: ran+1Bj nop ldloc.0 ldloc.0 ldc.r8 2.0 mul ldc.i4.0 conv.r8 add add stloc.0 nop loc_535: (...)
which translates to the following C# pseudo code:
double mul = 2.0 * dbV0; double res = mul + 0.0; res = res + dbV0; dbV0 = res
We can further simplify this to:
dbV0 = 3.0*dbV0;
After the two cases for iCounter
(even and odd) follows line loc_535
:
loc_535: // CODE XREF: ran+40j nop ldloc.1 ldc.i4.1 add stloc.1
These lines just increment the iCounter
variable:
iCounter++;
So to summarize all parts, this is the whole ran
method:
double ran(int iStringLength) { double dbV0 = 0.0; for(int iCounter = 0; iCounter < 2.0*iStringLength; iCounter++ ) { if( iCounter % 2 ) dbV0 = 3.0*dbV0; else double mul = dbV0*2.0 Random rnd = new Random(); int nRandZeroOrOne = rnd.Next(0, 2); int iXOR = iCounter ^ 6; double mul2 = (double) iXOR * nRandZeroOrOne; mul2 = mul2 + mul; dbV0 = mul2 + dbV0; } return dbV0; }
We can simplify this code by unrolling the even and odd case:
double ran(int iStringLength) { double dbV0 = 0.0; for(int iCounter = 0; iCounter < iStringLength; iCounter++ ) { double mul = dbV0*2.0 Random rnd = new Random(); int nRandZeroOrOne = rnd.Next(0, 2); int iXOR = (2*iCounter) ^ 6; double mul2 = (double) iXOR * nRandZeroOrOne; mul2 = mul2 + mul; dbV0 = mul2 + dbV0; dbV0 = 3.0*dbV0; } return dbV0; }
Putting all calculations in one line leads to:
double ran(int iStringLength) { Random rnd = new Random(); double dbV0 = 0.0; for(int iCounter = 0; iCounter < iStringLength; iCounter++ ) { result += 3*( ((2*iCounter) ^ 6) * rnd.Next(0, 2) + dbV0*3) return dbV0;
So to summarize: The hardcore mode also iterates over all characters and takes the ASCII codes. But then it adds random noise with the function ran
. The hardest part about reversing this key generation algorithm is definitely the uncertainty of this noise. The next section investigates this noise in more detail.
Learning about the Noise
Strings of Length 1
Let’s start with the easiest case - strings of length 1. The for loop is executed exactly once. The expression (2*iCounter) ^ 6 evaluates to 6. The random function rnd.Next(0, 2) either evaluates to 0 or to 1. This means the result of ran
is either 0 or 18.
Strings of Length 2
The for
-loop gets executed twice for two letter words. After the first pass, dbV0
holds either 0 or 18 as seen before. For the next iteration (2*iCounter) ^ 6 evaluates to 4. The random function rnd.Next(0, 2) again is 0 or 1. So if dbV0
was 0 after the first iteration, we get either 0 or 4. If, on the other hand, dbV0
was 18, we get the value 162 or 174.
Strings of Length n
The following recursive Python function generates all possible noise values for a given length:
def ran_values(length, values=None, pos=0, val=0): """get all potential random values for length as list """ if values is None: values = set() if pos == length: values.add(val) else: xor = (2*pos) ^ 6 pos += 1 # case 1: rand is 0 next_val = 9*val ran_values(length, values, pos, next_val) # case 2: rand is 1 next_val = 3*(val*3 + xor) ran_values(length, values, pos, next_val) return sorted(values) if __name__ == "__main__": for i in range(1,7): tmp = " {} {} " print(tmp.format(i, ', '.join([str(x) for x in ran_values(i)])))
Running it yields these values:
<th>
noise values
</th>
</tr>
<tr>
<td>
1
</td>
<td>
0, 18
</td>
</tr>
<tr>
<td>
2
</td>
<td>
0, 12, 162, 174
</td>
</tr>
<tr>
<td>
3
</td>
<td>
0, 6, 108, 114, 1458, 1464, 1566, 1572
</td>
</tr>
<tr>
<td>
4
</td>
<td>
0, 54, 972, 1026, 13122, 13176, 14094, 14148
</td>
</tr>
<tr>
<td>
5
</td>
<td>
0, 42, 486, 528, 8748, 8790, 9234, 9276, 118098, 118140, 118584, 118626, 126846, 126888, 127332, 127374
</td>
</tr>
<tr>
<td>
6
</td>
<td>
0, 36, 378, 414, 4374, 4410, 4752, 4788, 78732, 78768, 79110, 79146, 83106, 83142, 83484, 83520, 1062882, 1062918, 1063260, 1063296, 1067256, 1067292, 1067634, 1067670, 1141614, 1141650, 1141992, 1142028, 1145988, 1146024, 1146366, 1146402
</td>
</tr>
iStringLength |
---|
iCounter = 3
.
Not only do the number of potential noise values increase, the number of digits also varies more and more. For 6 letter strings, for example, the noise could be 0 up to the 7 digit variable 1146402. So an additional problem is to know how many characters the message has. The next section examines the mean key length for different message sizes.
Average Key Length
To guess how many characters were in the message, we need to have a statistic for the average key length given a certain message length. The following Python script simply generates all potential noise values with the method ran_values
shown above. It then adds the average ASCII code value. For the average ASCII code, I’m assuming the characters of the message are withing the range 32 to 126. This ranges includes all printable characters. The mean character would therefore have a code of 79. This is, of course, a very rough estimate. Better algorithms would determine the mean based on average messages. But to just guess the string length it should be fine. Here’s a script that lists the average length of the key for different message lengths up to 14:
from noise_overview import ran_values ASCII_RANGE = [32, 126] def generate_len_db(limit): """generate a list of mean key lengths for all msg lengths Args: limit: up to which msg length should the mean be calculated Returns: a list of mean key lengths, index i corresponds to msg length i """ len_db = [] noise_values = set() mean_off = sum(ASCII_RANGE)/float(2) for i in range(0,limit): noise_values = ran_values(i) mean_val = 0 for noise in noise_values: char = noise + mean_off mean_val += len(str(char)) mean_val *= i mean_val /= len(noise_values) len_db.append(mean_val) return len_db for i,l in enumerate(generate_len_db(15)): print("<tr><td>{}</td><td>{}</td></tr>".format(i, l))
Note that the code takes a while to complete. But the values need to be computed only once and can later be hard coded into the reverse key generator.
<th>
avg. key length
</th>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
1
</td>
<td>
2
</td>
</tr>
<tr>
<td>
2
</td>
<td>
5
</td>
</tr>
<tr>
<td>
3
</td>
<td>
9
</td>
</tr>
<tr>
<td>
4
</td>
<td>
16
</td>
</tr>
<tr>
<td>
5
</td>
<td>
23
</td>
</tr>
<tr>
<td>
6
</td>
<td>
33
</td>
</tr>
<tr>
<td>
7
</td>
<td>
44
</td>
</tr>
<tr>
<td>
8
</td>
<td>
56
</td>
</tr>
<tr>
<td>
9
</td>
<td>
72
</td>
</tr>
<tr>
<td>
10
</td>
<td>
90
</td>
</tr>
<tr>
<td>
11
</td>
<td>
110
</td>
</tr>
<tr>
<td>
12
</td>
<td>
132
</td>
</tr>
<tr>
<td>
13
</td>
<td>
156
</td>
</tr>
<tr>
<td>
14
</td>
<td>
182
</td>
</tr>
message length |
---|
Given the average key length we can estimate the length of the message.
Average Key Length
The example key given by the GordonBM is 9247109931023928283286380308924708453882326686447837
and has 52 characters. The closest value in the above table is 56 which corresponds to a message length of 8. The real message “GordonBM” indeed has 8 characters. The following Python snippet returns the best guess for the message length based on the hardcoded results from the previous section:
def guess_length(key): """get the best guess for the length of the message based on hardcoded mean key lengths""" len_db = [0,2,5,9,16,23,33,44,56,72,90,110,132,156,182] len_msg = len(key) diffs = [abs(c-len_msg) for c in len_db] val, idx = min((val, idx) for (idx, val) in enumerate(diffs)) return (idx, val)
With the above code we can guess the expected length of the message. Given this information we can then calculate the noise values. The “only” remaining part is to actually crack the key. I’m doing this with brute force.
Brute-Forcing the Message
With the length information of the message and, more importantly, the resulting noise values, we can brute force the message. The following algorithm starts at the beginning of the key and iterates over all potential noise values. It then checks if any character from the ASCII_RANGE = [32, 126]
could have produced the key at hand. If yes, the algorithm advances the resulting number of digits and repeats the procedure. If it manages to reach the end of the key with all valid message characters (meaning they are in the ASCII_RANGE
), then the function tests if the length of the message checks out and simply prints the message to stdout:
ASCII_RANGE = [32, 126] def brute_force(crypt, msg_len, noises, pos=0, res=""): """brute-force potential messages Prints all potential messages (characters in ASCII_RANGE) Args: crypt: the key that should be reverse msg_len: the length of the message noises: the list of potential noise values for the given length Returns: nothing, prints all strings to stdout """ for noise in noises: low, upp = (tmp + noise for tmp in ASCII_RANGE) for span in range(len(str(low)), len(str(upp))+1): digits = crypt[pos:pos+span] code = int(digits) if low <= code <= upp: msg_char = chr(code - noise) concat = res + msg_char if pos+span >= len(crypt): if len(concat) == msg_len: print(concat) else: pass # string does not match expected nr of chars else: brute_force(crypt, msg_len, noises, pos+span, concat)
Putting all together
The entire reverse key generators looks as follows:
"""Reverse key generator for GordonBM's Reverse Keygenme see http://crackmes.cf/users/gordonbm/reverse_keygenme/ (hardcore mode)""" import argparse ASCII_RANGE = [32, 126] def ran_values(length, values=None, pos=0, val=0): """get all potential random values for length as list """ if values is None: values = set() if pos == length: values.add(val) else: xor = (2*pos) ^ 6 pos += 1 # case 1: rand is 0 next_val = 9*val ran_values(length, values, pos, next_val) # case 2: rand is 1 next_val = 3*(val*3 + xor) ran_values(length, values, pos, next_val) return sorted(values) def guess_length(key): """get the best guess for the length of the message based on hardcoded mean key lengths""" len_db = [0,2,5,9,16,23,33,44,56,72,90,110,132,156,182] len_msg = len(key) diffs = [abs(c-len_msg) for c in len_db] val, idx = min((val, idx) for (idx, val) in enumerate(diffs)) return (idx, val) def brute_force(crypt, msg_len, noises, pos=0, res=""): """brute-force potential messages Prints all potential messages (characters in ASCII_RANGE) Args: crypt: the key that should be reverse msg_len: the length of the message noises: the list of potential noise values for the given length Returns: nothing, prints all strings to stdout """ for noise in noises: low, upp = (tmp + noise for tmp in ASCII_RANGE) for span in range(len(str(low)), len(str(upp))+1): digits = crypt[pos:pos+span] code = int(digits) if low <= code <= upp: msg_char = chr(code - noise) concat = res + msg_char if pos+span >= len(crypt): if len(concat) == msg_len: print(concat) else: pass # string does not match expected nr of chars else: brute_force(crypt, msg_len, noises, pos+span, concat) def crack(key): """crack the key""" msg_len, error = guess_length(key) print("message length is probably {}, (delta {})".format(msg_len, error)) noise = ran_values(msg_len) print("there are {} different noise values".format(len(noise))) print("potential messages are:") brute_force(key, msg_len, noise) if __name__ == "__main__": """ reverses keys like: 142641551691142 9247109931023928283286380308924708453882326686447837 """ parser = argparse.ArgumentParser(description="Hardcore Reverse Keygen") parser.add_argument('key', help="the key to be reversed") args = parser.parse_args() crack(args.key)
Let’s test it with the key 142641551691142
which was produced entering the message “test
”:
$ python hardcore_keygen.py 142641551691142 message length is probably 4, (delta 1) there are 8 different noise values potential messages are: test
Nice! It found the message. Now let’s try the longer example key 9247109931023928283286380308924708453882326686447837
:
$ python hardcore_keygen.py 9247109931023928283286380308924708453882326686447837 message length is probably 8, (delta 4) there are 128 different noise values potential messages are: _ordonBe _ordonBM _ordon*e _ordon*M _ordWnBe _ordWnBM _ordWn*e _ordWn*M _orLonBe _orLonBM _orLon*e _orLon*M _orLWnBe _orLWnBM _orLWn*e _orLWn*M _oZdonBe _oZdonBM _oZdon*e _oZdon*M _oZdWnBe _oZdWnBM _oZdWn*e _oZdWn*M _oZLonBe _oZLonBM _oZLon*e _oZLon*M _oZLWnBe _oZLWnBM _oZLWn*e _oZLWn*M GordonBe GordonBM Gordon*e Gordon*M GordWnBe GordWnBM GordWn*e GordWn*M GorLonBe GorLonBM GorLon*e GorLon*M GorLWnBe GorLWnBM GorLWn*e GorLWn*M GoZdonBe GoZdonBM GoZdon*e GoZdon*M GoZdWnBe GoZdWnBM GoZdWn*e GoZdWn*M GoZLonBe GoZLonBM GoZLon*e GoZLon*M GoZLWnBe GoZLWnBM GoZLWn*e GoZLWn*M
Because the message is longer (and therefore also the potential noise values), we get not just one message back but 64 slightly different ones. Among those values is also the original message “GordonBM
”. But without additional knowledge about the message we can’t do better than provide the extensive list of potential message.
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.