Crackmes.de - GordonBM's Reverse Keygenme
This blog posts presents my solution to the Reverse Keygen by GordonBM on the “reversers’s playground” at crackmes.de. I first used IDA Pro to get the CIL representation of the key generator. You can see the full listing of the key generator function here here.
The code responsible for generating the key is in the method pump. This function receives the message as the first argument. The function generates the key in either the simple of hard mode based on the check box status passed as the second argument. The function returns the key back to the caller, who simply displays the value by copying it to the message text box. These are the first few lines of pump:
.method public hidebysig instance string pump(string str, bool check)
// CODE XREF: button1_Click+25p
{
.maxstack 5
.locals init (char[] V0,
string[] V1,
string V2,
int32 V3,
string V4,
bool V5)
nop
ldarg.1
callvirt instance char[] [mscorlib]System.String::ToCharArray()
stloc.0
ldloc.0
ldlen
conv.i4
newarr [mscorlib]System.String
stloc.1
ldstr ""
stloc.2
ldarg.2
ldc.i4.0
ceq
stloc.s 5
ldloc.s 5
brtrue.s loc_4A1 // branch if check is FALSE
nop
ldc.i4.0
stloc.3
br.s loc_48F
The lines roughly translate to the following pseudo C# code:
string pump(string szMessage, bool bHardcoreMode) {
/* v0: rgchMessage (char array representation of szMessage)
v1: rgszResult (empty string array of same size as v0)
v2: szResult (empty null-terminated string)
v3: iIndex (index of char array)
*/
char[] rgchMessage = szMessage.ToCharArray();
string[] rgszResult = new string[szMessage.Length];
string szResult = "";
if( bHardcoreMode ) {
// do hardcore mode
}
else {
// do simple mode at loc_4A1
}
}
The snippet converts the message string to an array of characters and stores the result in rgchMessage. It also initializes two results variables szResult (an empty string) and szResult (an empty array with same size as the message). After that, the code either continues with the hardcore keygenerator (when the checkbox is set) or jumps to simple mode key generating at loc_4A1. Let’s start with the easy version.
Simple Mode
Let’s first investigate the simple version, which starts at line loc_4A1
loc_4A1: // CODE XREF: pump+1Fj
nop
ldc.i4.0
stloc.3
br.s loc_4C1
(...)
loc_4C1: // CODE XREF: pump+64j
ldloc.3
ldarg.1
callvirt instance int32 [mscorlib]System.String::get_Length()
clt
stloc.s 5
ldloc.s 5
brtrue.s loc_4A6
nop
loc_4D1: // CODE XREF: pump+5Fj
ldloc.2
stloc.s 4
br.s loc_4D6
loc_4D6:
ldloc.s 4
ret
}
This snippet doesn’t to much. We are probably looking at a for loop that iterates over all characters in szMessage. After the for-loop, the method pump returns the string in szResult. Here is the code in pseudo C#:
// loc_4A1:
int iIndex = 0 // in v3
// loc_4C1:
if( iIndex < szMessage.Length )
// jump to loc_4A6
return szResult;
The body of the loop is at loc_4A6:
loc_4A6: // CODE XREF: pump+8Ej
nop
ldloc.1
ldloc.3
ldloc.0
ldloc.3
ldelem.u2
call string [mscorlib]System.Convert::ToString(int32)
stelem.ref
ldloc.2
ldloc.1
ldloc.3
ldelem.ref
call string [mscorlib]System.String::Concat(string, string)
stloc.2
nop
ldloc.3
ldc.i4.1
add
stloc.3
We can decompile it to the following pseudo C# code:
/* if rgchMessage = {'t', ...} then nCharacter = 116 and
szCharacterCode = "116" */
unsigned short nCharacter = rgchMessage[iIndex]; // ldelem.u2
string szCharacterCode = nCharacter.ToString();
rgszResult[iIndex] = szCharacterCode;
szResult = string.Concat(szResult, szCharacterCode);
iIndex++;
// continue at loc_4C1
Those few lines implement the whole magic of the simple encryption:
- They take the letter at
iIndexinto the message. For example, if the message is “test” andiIndexis 2, then the character would bee - The character is implicitly converted to an unsigned short. By doing this we get the ASCII code of the letter. In our example for letter
ewe get101. - The integer code is then converted to a string, so
101becomes"101"
Putting all parts together leads to the following key generation algorithm:
string pump(string szMessage, bool bHardcoreMode) {
/* v0: rgchMessage (char array representation of szMessage)
v1: rgszResult (empty string array of same size as v0)
v2: szResult (empty null-terminated string)
v3: iIndex (index of char array)
*/
char[] rgchMessage = szMessage.ToCharArray();
string[] rgszResult = new string[szMessage.Length];
string szResult = "";
if( bHardcoreMode ) {
// do hardcore mode
}
else {
// do simple mode
for( int iIndex = 0; iIndex < szMessage.Length; iIndex++ ) {
unsigned short nCharacter = rgchMessage[iIndex];
string szCharacterCode = nCharacter.ToString();
rgszResult[iIndex] = szCharacterCode;
szResult = string.Concat(szResult, szCharacterCode);
}
return szResult;
}
}
Reversing encrypted text is straightforward. The only tricky part is differentiating between the two digit ASCII codes and the three digits ones. The latter always start with 1, while the former never do (the ASCII codes 10 to 19 are not printable). The following Python script first tokenizes the key into ASCII codes, and then transforms those code back to characters:
import argparse
import re
def decrypt(message):
res = ""
for c in re.findall('([2-9]\d|1\d{2})', message):
res += chr(int(c))
return res
if __name__=="__main__":
parser = argparse.ArgumentParser(description="Simple Keygen")
parser.add_argument("encrypted")
args = parser.parse_args()
print(decrypt(args.encrypted))Example:
$ python simple_keygen.py 84104105115327311532653284101115116
This Is A TestHardcore Mode
Decompiling the Algorithm
The hardcore mode starts right after the branch to the simple version:
nop
ldc.i4.0
stloc.3
br.s loc_48F
(...)
loc_48F: // CODE XREF: pump+24j
ldloc.3
ldarg.1
callvirt instance int32 [mscorlib]System.String::get_Length()
clt
stloc.s 5
ldloc.s 5
brtrue.s loc_466
nop
br.s loc_4D1
(...)
loc_4D1: // CODE XREF: pump+5Fj
ldloc.2
stloc.s 4
br.s loc_4D6
loc_4D6:
ldloc.s 4
ret
}This snippet translates to:
int iIndex = 0 // in v3
// loc_48F
if( iIndex < szMessage.Length )
// jump to loc_466
return szResult;which, as for the simple mode, is simply a loop over all characters. The body of the loop at loc_466 reads as:
loc_466: // CODE XREF: pump+5Cj
nop
ldloc.1
ldloc.3
ldloc.0
ldloc.3
ldelem.u2
conv.r8
ldarg.0
ldarg.1
callvirt instance int32 [mscorlib]System.String::get_Length()
call instance float64 Encrypter.Goodies.Engine::ran(int32 l)
add
call string [mscorlib]System.Convert::ToString(float64)
stelem.ref
ldloc.2
ldloc.1
ldloc.3
ldelem.ref
call string [mscorlib]System.String::Concat(string, string)
stloc.2
nop
ldloc.3
ldc.i4.1
add
stloc.3
loc_48F: // CODE XREF: pump+24j
ldloc.3
ldarg.1
callvirt instance int32 [mscorlib]System.String::get_Length()
clt
stloc.s 5
ldloc.s 5
brtrue.s loc_466
nop
br.s loc_4D1or in C#:
double dbCharacter = (double)rgchMessage[iIndex];
double dbRanResult = ran(szMessage.Length);
double dbSum = dbCharacter + dbRanResult;
string szSum = dbSum.ToString();
rgszResult[iIndex] = szSum;
szResult = string.Concat(szResult, szSum);
iIndex++;
// continue loop at loc_48FSo the hardcore mode, just like the simple version, operates on the ASCII codes of the letters and concatenates the results. This time, however, there is an additional method ran whose return value is added to the ASCII character codes. Also the code uses double types instead of integers. Let’s try to decompile ran:
.method private hidebysig instance float64 ran(int32 l) // CODE XREF: pump+34p
{
.maxstack 5
.locals init (float64 V0,
int32 V1,
float64 V2,
bool V3)
nop
ldc.r8 0.0
stloc.0
ldc.i4.0
stloc.1
br.s loc_53A
(...)
loc_53A: // CODE XREF: ran+Dj
ldloc.1
ldarg.1
ldc.i4.2
mul
clt
stloc.3
ldloc.3
brtrue.s loc_4EF
ldloc.0
stloc.2
br.s loc_548
loc_548:
ldloc.2
ret
}The function decompiles to:
double ran(int iStringLength) {
double dbV0 = 0.0;
int iCounter = 0; // in v1
// loc_53A:
double dbTemp = 2*iStringLength;
if( dbTemp > dbV0 )
// goto loc_4EF
return dbV0;
}At loc_4EF we find:
loc_4EF: // CODE XREF: ran+62j
nop
ldloc.1
ldc.i4.2
rem
ldc.i4.0
ceq
ldc.i4.0
ceq
stloc.3
ldloc.3
brtrue.s loc_522
nop
ldloc.0
ldloc.0
ldc.r8 2.0
mul
ldarg.0
ldfld class [mscorlib]System.Random Encrypter.Goodies.Engine::random
ldc.i4.0
ldc.i4.2
callvirt instance int32 [mscorlib]System.Random::Next(int32, int32)
ldloc.1
ldc.i4.6
xor
mul
conv.r8
add
add
stloc.0
nop
br.s loc_535which in C# is:
if( iCounter % 2 )
// goto loc_522
else
double mul = dbV0*2.0
Random rnd = new Random();
int nRandZeroOrOne = rnd.Next(0, 2);
int iXOR = iCounter ^ 6;
double mul2 = (double) iXOR * nRandZeroOrOne;
mul2 = mul2 + mul;
dbV0 = mul2 + dbV0;
// go to loc_535If the iCounter is odd, the snippet at loc_522 gets executed:
loc_522: // CODE XREF: ran+1Bj
nop
ldloc.0
ldloc.0
ldc.r8 2.0
mul
ldc.i4.0
conv.r8
add
add
stloc.0
nop
loc_535:
(...)which translates to the following C# pseudo code:
double mul = 2.0 * dbV0;
double res = mul + 0.0;
res = res + dbV0;
dbV0 = resWe can further simplify this to:
dbV0 = 3.0*dbV0;
After the two cases for iCounter (even and odd) follows line loc_535:
loc_535: // CODE XREF: ran+40j
nop
ldloc.1
ldc.i4.1
add
stloc.1These lines just increment the iCounter variable:
iCounter++;
So to summarize all parts, this is the whole ran method:
double ran(int iStringLength) {
double dbV0 = 0.0;
for(int iCounter = 0; iCounter < 2.0*iStringLength; iCounter++ ) {
if( iCounter % 2 )
dbV0 = 3.0*dbV0;
else
double mul = dbV0*2.0
Random rnd = new Random();
int nRandZeroOrOne = rnd.Next(0, 2);
int iXOR = iCounter ^ 6;
double mul2 = (double) iXOR * nRandZeroOrOne;
mul2 = mul2 + mul;
dbV0 = mul2 + dbV0;
}
return dbV0;
}We can simplify this code by unrolling the even and odd case:
double ran(int iStringLength) {
double dbV0 = 0.0;
for(int iCounter = 0; iCounter < iStringLength; iCounter++ ) {
double mul = dbV0*2.0
Random rnd = new Random();
int nRandZeroOrOne = rnd.Next(0, 2);
int iXOR = (2*iCounter) ^ 6;
double mul2 = (double) iXOR * nRandZeroOrOne;
mul2 = mul2 + mul;
dbV0 = mul2 + dbV0;
dbV0 = 3.0*dbV0;
}
return dbV0;
}Putting all calculations in one line leads to:
double ran(int iStringLength) {
Random rnd = new Random();
double dbV0 = 0.0;
for(int iCounter = 0; iCounter < iStringLength; iCounter++ ) {
result += 3*( ((2*iCounter) ^ 6) * rnd.Next(0, 2) + dbV0*3)
return dbV0;So to summarize: The hardcore mode also iterates over all characters and takes the ASCII codes. But then it adds random noise with the function ran. The hardest part about reversing this key generation algorithm is definitely the uncertainty of this noise. The next section investigates this noise in more detail.
Learning about the Noise
Strings of Length 1
Let’s start with the easiest case - strings of length 1. The for loop is executed exactly once. The expression (2*iCounter) ^ 6 evaluates to 6. The random function rnd.Next(0, 2) either evaluates to 0 or to 1. This means the result of ran is either 0 or 18.
Strings of Length 2
The for-loop gets executed twice for two letter words. After the first pass, dbV0 holds either 0 or 18 as seen before. For the next iteration (2*iCounter) ^ 6 evaluates to 4. The random function rnd.Next(0, 2) again is 0 or 1. So if dbV0 was 0 after the first iteration, we get either 0 or 4. If, on the other hand, dbV0 was 18, we get the value 162 or 174.
Strings of Length n
The following recursive Python function generates all possible noise values for a given length:
def ran_values(length, values=None, pos=0, val=0):
"""get all potential random values for length as list """
if values is None:
values = set()
if pos == length:
values.add(val)
else:
xor = (2*pos) ^ 6
pos += 1
# case 1: rand is 0
next_val = 9*val
ran_values(length, values, pos, next_val)
# case 2: rand is 1
next_val = 3*(val*3 + xor)
ran_values(length, values, pos, next_val)
return sorted(values)
if __name__ == "__main__":
for i in range(1,7):
tmp = "
{}
{}
"
print(tmp.format(i, ', '.join([str(x) for x in ran_values(i)])))
Running it yields these values:
<th>
noise values
</th>
</tr>
<tr>
<td>
1
</td>
<td>
0, 18
</td>
</tr>
<tr>
<td>
2
</td>
<td>
0, 12, 162, 174
</td>
</tr>
<tr>
<td>
3
</td>
<td>
0, 6, 108, 114, 1458, 1464, 1566, 1572
</td>
</tr>
<tr>
<td>
4
</td>
<td>
0, 54, 972, 1026, 13122, 13176, 14094, 14148
</td>
</tr>
<tr>
<td>
5
</td>
<td>
0, 42, 486, 528, 8748, 8790, 9234, 9276, 118098, 118140, 118584, 118626, 126846, 126888, 127332, 127374
</td>
</tr>
<tr>
<td>
6
</td>
<td>
0, 36, 378, 414, 4374, 4410, 4752, 4788, 78732, 78768, 79110, 79146, 83106, 83142, 83484, 83520, 1062882, 1062918, 1063260, 1063296, 1067256, 1067292, 1067634, 1067670, 1141614, 1141650, 1141992, 1142028, 1145988, 1146024, 1146366, 1146402
</td>
</tr>
| iStringLength |
|---|
iCounter = 3.
Not only do the number of potential noise values increase, the number of digits also varies more and more. For 6 letter strings, for example, the noise could be 0 up to the 7 digit variable 1146402. So an additional problem is to know how many characters the message has. The next section examines the mean key length for different message sizes.
Average Key Length
To guess how many characters were in the message, we need to have a statistic for the average key length given a certain message length. The following Python script simply generates all potential noise values with the method ran_values shown above. It then adds the average ASCII code value. For the average ASCII code, I’m assuming the characters of the message are withing the range 32 to 126. This ranges includes all printable characters. The mean character would therefore have a code of 79. This is, of course, a very rough estimate. Better algorithms would determine the mean based on average messages. But to just guess the string length it should be fine. Here’s a script that lists the average length of the key for different message lengths up to 14:
from noise_overview import ran_values
ASCII_RANGE = [32, 126]
def generate_len_db(limit):
"""generate a list of mean key lengths for all msg lengths
Args:
limit: up to which msg length should the mean be calculated
Returns:
a list of mean key lengths, index i corresponds to msg length i
"""
len_db = []
noise_values = set()
mean_off = sum(ASCII_RANGE)/float(2)
for i in range(0,limit):
noise_values = ran_values(i)
mean_val = 0
for noise in noise_values:
char = noise + mean_off
mean_val += len(str(char))
mean_val *= i
mean_val /= len(noise_values)
len_db.append(mean_val)
return len_db
for i,l in enumerate(generate_len_db(15)):
print("<tr><td>{}</td><td>{}</td></tr>".format(i, l))
Note that the code takes a while to complete. But the values need to be computed only once and can later be hard coded into the reverse key generator.
<th>
avg. key length
</th>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
1
</td>
<td>
2
</td>
</tr>
<tr>
<td>
2
</td>
<td>
5
</td>
</tr>
<tr>
<td>
3
</td>
<td>
9
</td>
</tr>
<tr>
<td>
4
</td>
<td>
16
</td>
</tr>
<tr>
<td>
5
</td>
<td>
23
</td>
</tr>
<tr>
<td>
6
</td>
<td>
33
</td>
</tr>
<tr>
<td>
7
</td>
<td>
44
</td>
</tr>
<tr>
<td>
8
</td>
<td>
56
</td>
</tr>
<tr>
<td>
9
</td>
<td>
72
</td>
</tr>
<tr>
<td>
10
</td>
<td>
90
</td>
</tr>
<tr>
<td>
11
</td>
<td>
110
</td>
</tr>
<tr>
<td>
12
</td>
<td>
132
</td>
</tr>
<tr>
<td>
13
</td>
<td>
156
</td>
</tr>
<tr>
<td>
14
</td>
<td>
182
</td>
</tr>
| message length |
|---|
Given the average key length we can estimate the length of the message.
Average Key Length
The example key given by the GordonBM is 9247109931023928283286380308924708453882326686447837 and has 52 characters. The closest value in the above table is 56 which corresponds to a message length of 8. The real message “GordonBM” indeed has 8 characters. The following Python snippet returns the best guess for the message length based on the hardcoded results from the previous section:
def guess_length(key):
"""get the best guess for the length of the message based on hardcoded
mean key lengths"""
len_db = [0,2,5,9,16,23,33,44,56,72,90,110,132,156,182]
len_msg = len(key)
diffs = [abs(c-len_msg) for c in len_db]
val, idx = min((val, idx) for (idx, val) in enumerate(diffs))
return (idx, val)With the above code we can guess the expected length of the message. Given this information we can then calculate the noise values. The “only” remaining part is to actually crack the key. I’m doing this with brute force.
Brute-Forcing the Message
With the length information of the message and, more importantly, the resulting noise values, we can brute force the message. The following algorithm starts at the beginning of the key and iterates over all potential noise values. It then checks if any character from the ASCII_RANGE = [32, 126] could have produced the key at hand. If yes, the algorithm advances the resulting number of digits and repeats the procedure. If it manages to reach the end of the key with all valid message characters (meaning they are in the ASCII_RANGE), then the function tests if the length of the message checks out and simply prints the message to stdout:
ASCII_RANGE = [32, 126]
def brute_force(crypt, msg_len, noises, pos=0, res=""):
"""brute-force potential messages
Prints all potential messages (characters in ASCII_RANGE)
Args:
crypt: the key that should be reverse
msg_len: the length of the message
noises: the list of potential noise values for the given length
Returns:
nothing, prints all strings to stdout
"""
for noise in noises:
low, upp = (tmp + noise for tmp in ASCII_RANGE)
for span in range(len(str(low)), len(str(upp))+1):
digits = crypt[pos:pos+span]
code = int(digits)
if low <= code <= upp:
msg_char = chr(code - noise)
concat = res + msg_char
if pos+span >= len(crypt):
if len(concat) == msg_len:
print(concat)
else:
pass # string does not match expected nr of chars
else:
brute_force(crypt, msg_len, noises, pos+span, concat)Putting all together
The entire reverse key generators looks as follows:
"""Reverse key generator for GordonBM's Reverse Keygenme
see http://crackmes.cf/users/gordonbm/reverse_keygenme/
(hardcore mode)"""
import argparse
ASCII_RANGE = [32, 126]
def ran_values(length, values=None, pos=0, val=0):
"""get all potential random values for length as list """
if values is None:
values = set()
if pos == length:
values.add(val)
else:
xor = (2*pos) ^ 6
pos += 1
# case 1: rand is 0
next_val = 9*val
ran_values(length, values, pos, next_val)
# case 2: rand is 1
next_val = 3*(val*3 + xor)
ran_values(length, values, pos, next_val)
return sorted(values)
def guess_length(key):
"""get the best guess for the length of the message based on hardcoded
mean key lengths"""
len_db = [0,2,5,9,16,23,33,44,56,72,90,110,132,156,182]
len_msg = len(key)
diffs = [abs(c-len_msg) for c in len_db]
val, idx = min((val, idx) for (idx, val) in enumerate(diffs))
return (idx, val)
def brute_force(crypt, msg_len, noises, pos=0, res=""):
"""brute-force potential messages
Prints all potential messages (characters in ASCII_RANGE)
Args:
crypt: the key that should be reverse
msg_len: the length of the message
noises: the list of potential noise values for the given length
Returns:
nothing, prints all strings to stdout
"""
for noise in noises:
low, upp = (tmp + noise for tmp in ASCII_RANGE)
for span in range(len(str(low)), len(str(upp))+1):
digits = crypt[pos:pos+span]
code = int(digits)
if low <= code <= upp:
msg_char = chr(code - noise)
concat = res + msg_char
if pos+span >= len(crypt):
if len(concat) == msg_len:
print(concat)
else:
pass # string does not match expected nr of chars
else:
brute_force(crypt, msg_len, noises, pos+span, concat)
def crack(key):
"""crack the key"""
msg_len, error = guess_length(key)
print("message length is probably {}, (delta {})".format(msg_len, error))
noise = ran_values(msg_len)
print("there are {} different noise values".format(len(noise)))
print("potential messages are:")
brute_force(key, msg_len, noise)
if __name__ == "__main__":
""" reverses keys like:
142641551691142
9247109931023928283286380308924708453882326686447837
"""
parser = argparse.ArgumentParser(description="Hardcore Reverse Keygen")
parser.add_argument('key', help="the key to be reversed")
args = parser.parse_args()
crack(args.key)Let’s test it with the key 142641551691142 which was produced entering the message “test”:
$ python hardcore_keygen.py 142641551691142 message length is probably 4, (delta 1) there are 8 different noise values potential messages are: test
Nice! It found the message. Now let’s try the longer example key 9247109931023928283286380308924708453882326686447837:
$ python hardcore_keygen.py 9247109931023928283286380308924708453882326686447837 message length is probably 8, (delta 4) there are 128 different noise values potential messages are: _ordonBe _ordonBM _ordon*e _ordon*M _ordWnBe _ordWnBM _ordWn*e _ordWn*M _orLonBe _orLonBM _orLon*e _orLon*M _orLWnBe _orLWnBM _orLWn*e _orLWn*M _oZdonBe _oZdonBM _oZdon*e _oZdon*M _oZdWnBe _oZdWnBM _oZdWn*e _oZdWn*M _oZLonBe _oZLonBM _oZLon*e _oZLon*M _oZLWnBe _oZLWnBM _oZLWn*e _oZLWn*M GordonBe GordonBM Gordon*e Gordon*M GordWnBe GordWnBM GordWn*e GordWn*M GorLonBe GorLonBM GorLon*e GorLon*M GorLWnBe GorLWnBM GorLWn*e GorLWn*M GoZdonBe GoZdonBM GoZdon*e GoZdon*M GoZdWnBe GoZdWnBM GoZdWn*e GoZdWn*M GoZLonBe GoZLonBM GoZLon*e GoZLon*M GoZLWnBe GoZLWnBM GoZLWn*e GoZLWn*M
Because the message is longer (and therefore also the potential noise values), we get not just one message back but 64 slightly different ones. Among those values is also the original message “GordonBM”. But without additional knowledge about the message we can’t do better than provide the extensive list of potential message.
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.
