Practical Reverse Engineering Solutions – Page 78 (Part III)
my go at mystery7, mystery8 and mystery 9 on pages 78ff- Mystery 7
- ARM or Thumb
- Instruction Semantic
- Types
- Function Prototype
- Prologue and Epilogue
- Purpose and Pseudo-code
- C-Code
- Mystery 8
- ARM or Thumb
- Instruction Semantic
- Types
- Function Prototype
- Prologue and Epilogue
- Purpose and Pseudo-code
- Mystery 9
- ARM or Thumb
- Instruction Semantic
- Types
- Function Prototype
- Prologue and Epilogue
- Purpose and Pseudo-code
This blog post presents my solution to exercises 7 to 9 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.
For the code in each exercise, do the following in order (whenever possible):
- Determine whether it is in Thumb or ARM state.
- Explain each instruction’s semantic. If the instruction is
LDR/STR
, explain the addressing mode as well.- Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
- Recover the function prototype.
- Identify the function prologue and epilogue.
- Explain what the function does and then write pseudo-code for it.
- Decompile the function back to C and give it a meaningful name.
Mystery 7
Figure 2-13 illustrates a common routine, but you may not have seen it
implemented this way.
This is the code from Figure 2-13 (note that the listing in the book has a typo in line 13, 00 2B CMB R3, #0
should of course read 00 2B CMP R3, #0
):
mystery7 02 46 MOV R2, R0 08 B9 CBNZ R0, loc_100E1D8 00 20 MOVS R0, #0 70 47 BX LR loc_100E1D8 90 F9 00 30 LDRSB.W R3, [R0] 02 E0 B loc_100E1E4 loc_100E1DE 01 32 ADDS R2, #1 92 F9 00 30 LDRSB.W R3, [R2] loc_100E1E4 00 2B CMP R3, #0 FA D1 BNE loc_100E1DE 10 1A SUBS R0, R2, R0 6F F3 9F 70 BFC.W R0, #0x1E, #2 70 47 BX LR ; End of function mystery7
ARM or Thumb
The code is in Thumb state: The snippet uses 16bit and 32bit instructions and some instructions have the .W
suffix.
Instruction Semantic
The only non common instruction is BFC.W R0, #0x1E, #2
which does a bit field clear. The instruction sets the two most significant bits to zero, so 0xFFFFFFFF
would become 0x3FFFFFFF
.
Types
The function only takes one argument arg1 = R0
. The loop in lines 9 to 14 iterate over bytes of arg1
(LDRSB.W
in line 11 accesses a single byte, and ADDS R2, #1
in line 10 increments the array index by one byte). Furthermore, the loop ends if in line 13 an array element is . This indicates that
arg1
is a pointer to a null terminated string. The function returns an unsigned int.
Function Prototype
The function prototype is:
UNSIGNED INT mystery7(CHAR*);
Prologue and Epilogue
There is no prologue or epilogue. The function does not overwrite any registers except R0
, R2
and R3
. It returns with BX LR
which switches back to ARM state.
Purpose and Pseudo-code
The function searches for the null byte in string arg1
. Line 15 SUBS R0, R2, R0
computes the difference between the address of the null byte and the address of the start of the string. This corresponds to the length of the string. The function implements strlen. I don’t understand the purpose of setting the two most significant bits of the difference to zero. Those bits shouldn’t be set in the first place for any reasonable strings.
FUNCTION mystery7(str) len = 0 WHILE str[len] != 0 DO len = len + 1 ENDWHILE RETURN len END
C-Code
unsigned int strlen(const char *s) { unsigned int len = 0; while( s[len] != '\0' ) len++; return len; }
Mystery 8
In Figure 2-14, byteArray is a 256-character array whose content is
byteArray[] = {0, 1, …, 0xff}.
This is the code from Figure 2-14:
mystery8 2D E9 78 48 PUSH.W {R3–R6,R11,LR} 0D F2 10 0B ADDW R11, SP, #0x10 0C 4E LDR R6, =byteArray 09 E0 B loc_100E34C loc_100E338 05 78 LDRB R5, [R0] 01 3A SUBS R2, #1 4D B1 CBZ R5, loc_100E352 0B 78 LDRB R3, [R1] 9C 5D LDRB R4, [R3,R6] AB 5D LDRB R3, [R5,R6] A3 42 CMP R3, R4 04 D1 BNE loc_100E352 01 30 ADDS R0, #1 01 31 ADDS R1, #1 loc_100E34C 00 2A CMP R2, #0 F3 DC BGT loc_100E338 01 3A SUBS R2, #1 loc_100E352 00 2A CMP R2, #0 01 DA BGE loc_100E35A 00 20 MOVS R0, #0 04 E0 B locret_100E364 loc_100E35A 0B 78 LDRB R3, [R1] 9A 5D LDRB R2, [R3,R6] 03 78 LDRB R3, [R0] 9B 5D LDRB R3, [R3,R6] 98 1A SUBS R0, R3, R2 locret_100E364 BD E8 78 88 POP.W {R3–R6,R11,PC} ; End of function mystery8
ARM or Thumb
The code is in Thumb state:
- The snippet uses 16bit and 32bit instructions.
- Some instructions have the
.W
suffix. - The
CBZ
instruction is only available in Thumb state. - The code uses
PUSH.W
andPOP.W
as function prologue and epilogue
Instruction Semantic
Nothing special, except maybe LDR R6, =byteArray
which is a pseudo-instruction that sets R6
to point to the array {0,1,2,…,255}
Types
The function takes three arguments. arg1 = R0
and arg2 = R1
are both used in an array fashion: Lines 6 to 19 iterate over bytes of those to arrays. The code also compares elements of arg1
to arg2
, so the two parameters are probably of the same type. In line 9 there’s a check if the elements in arg1
are , so
arg1
and arg2
are null terminated strings.
The third parameter arg3
acts as a limit counter (see lines 8 and 18/19). So the type is probably an unsigned int (or any other unsigned integer type).
Function Prototype
The function prototype is:
UNSIGNED INT mystery8(CHAR*, CHAR*, UNSIGNED INT);
Prologue and Epilogue
Lines 2 and 33 preserve registers R3-R6
and R11
. Apart from the three function arguments R0
to R2
the functions doesn’t write any other registers. The PUSH
and POP
also save and jump to the return address respectively.
Purpose and Pseudo-code
If found it easiest to start with the loop in line 6 to line 19. Here’s an almost one to one translation to pseudo-code:
FUNCTION mystery8(string1, string2, limit) byteArray = {0, 1, 2, ..., 255} index = 0 DO IF limit <= 0 THEN // break to line 20 R5 = string1[index] limit = limit -1 IF R5 == 0 THEN // break to line 21 R4 = byteArray[string2[index]] R3 = byteArray[string1[index]] index = index + 1 WHILE R3 != R4 index = index - 1 // line 21
Line 20 just decrements limit
, we can eliminate the instruction by moving limit = limit-1
up and changing to a strict check limit < 0
.
Starting with line 21 the snippet checks if the limit is not yet zero (meaning the code did take the second BREAK
). If this is the case, the code returns a difference based on first array elements where there was a difference.
If the limit is zero, then the first BREAK
must have been taken and the code returns .
FUNCTION mystery8(string1, string2, limit) byteArray = {0, 1, 2, ..., 255} index = 0 DO limit = limit -1 IF limit < 0 THEN BREAK R5 = string1[index] IF R5 == 0 THEN BREAK R4 = byteArray[string2[index]] R3 = byteArray[string1[index]] index = index + 1 WHILE R3 != R4 // line 21 index = index - 1 IF limit >= 0 THEN R2 = byteArray[string2[index]] R3 = byteArray[string1[index]] RETURN R3 - R2 ELSE RETURN 0
The byteArray
is {0,1,…,255}
, so for all bytes n
we have byteArray[n] = n
. We can further simplify the code by moving the last IF/ELSE
construct inside the loop:
FUNCTION mystery8(string1, string2, limit) FOR index = 0 to limit - 1 DO IF string1[index] == 0 OR string1[index] != string2[index] THEN RETURN string1[index] - string2[index] ENDIF ENDFOR RETURN 0 END
From this it should be obvious that the snippet implements strncmp, which compares two strings up to limit
characters. If the strings are equal up to limit
, the snippet returns 0. Otherwise it returns a negative number if the second string comes lexicographically after the first, and a positive number vice-versa.
Example:
string 1 | string 2 | limit | result |
---|---|---|---|
“string a” | “string b” | 1000 | -1 |
“string a” | “string b” | 5 | |
“string a” | “String b” | 1000 | 32 |
“word” | “wording” | 5 | -105 |
Mystery 9
What does the function shown in Figure 2-15 do?
The code is a stripped down version of mystery8:
mystery9 2D E9 30 48 PUSH.W {R4,R5,R11,LR} 0D F2 08 0B ADDW R11, SP, #8 09 4D LDR R5, =byteArray 06 E0 B loc_100E312 loc_100E304 0B 78 LDRB R3, [R1] 5A 5D LDRB R2, [R3,R5] 63 5D LDRB R3, [R4,R5] 93 42 CMP R3, R2 04 D1 BNE loc_100E318 01 30 ADDS R0, #1 01 31 ADDS R1, #1 loc_100E312 04 78 LDRB R4, [R0] 00 2C CMP R4, #0 F5 D1 BNE loc_100E304 loc_100E318 0B 78 LDRB R3, [R1] 5A 5D LDRB R2, [R3,R5] 03 78 LDRB R3, [R0] 5B 5D LDRB R3, [R3,R5] 98 1A SUBS R0, R3, R2 BD E8 30 88 POP.W {R4,R5,R11,PC} ; End of function mystery9
ARM or Thumb
The code is in Thumb state:
- The snippet uses 16bit and 32bit instructions.
- Some instructions have the
.W
suffix. - The code uses
PUSH.W
andPOP.W
as function prologue and epilogue
Instruction Semantic
See mystery 8.
Types
The function takes two arguments. arg1 = R0
and arg2 = R1
. They are used in the same manner as in mystery 8 so arg1
and arg2
are null terminated strings.
Function Prototype
The function prototype is:
UNSIGNED INT mystery9(CHAR*, CHAR*);
Prologue and Epilogue
The same as in mystery 8 applies, except the function doesn’t need R6
and therefore doesn’t save and restore it.
Purpose and Pseudo-code
We can get the pseudo-code by removing all missing parts from mystery8:
FUNCTION mystery8(string1, string2) index = 0 WHILE TRUE IF string1[index] == 0 OR string1[index] != string2[index] THEN RETURN string1[index] - string2[index] ENDIF index = index + 1 ENDWHILE RETURN 0 END
The code implements strcmp, which is the unlimited version of strncmp from mystery8. So the comparison always checks the entire strings up to the length of the shorter one:
Example:
string 1 | string 2 | result |
---|---|---|
“string a” | “string b” | -1 |
“string a” | “String b” | 32 |
“word” | “wording” | -105 |