Practical Reverse Engineering Solutions – Page 78 (Part I)
My go at mystery 1 to mystery 4 on pages 78ff.
This blog post presents my solutions to the first four exercises on pages 78ff. of the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions, consult this progress page.
For the code in each exercise, do the following in order (whenever possible):
- Determine whether it is in Thumb or ARM state.
- Explain each instruction’s semantic. If the instruction is LDR/STR, explain the addressing mode as well.
- Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
- Recover the function prototype.
- Identify the function prologue and epilogue.
- Explain what the function does and then write pseudo-code for it.
- Decompile the function back to C and give it a meaningful name.
Mystery 1
Figure 2-8 shows a function that takes two arguments. It may seem somewhat challenging at first, but its functionality is very common. Have patience.
This is the function in Figure 2-8:
01: mystery1
02: F0 01 2D E9  STMFD SP!, {R4-R8}
03: 00 30 D0 E5  LDRB R3, [R0]
04: 2D 00 53 E3  CMP R3, #0x2D
05: 29 00 00 0A  BEQ loc_B348
06: 2B 00 53 E3  CMP R3, #0x2B
07: 00 60 A0 E3  MOV R6, #0
08: 01 30 F0 05  LDREQB R3, [R0,#1]!
09: loc_B2AC
10: 30 00 53 E3  CMP R3, #0x30
11: 04 00 00 1A  BNE loc_B2C8
12: 01 30 80 E2  ADD R3, R0, #1
13: loc_B2B8
14: 03 00 A0 E1  MOV R0, R3
15: 01 20 D3 E4  LDRB R2, [R3],#1
16: 30 00 52 E3  CMP R2, #0x30
17: FB FF FF 0A  BEQ loc_B2B8
18: loc_B2C8
19: 00 C0 A0 E3  MOV R12, #0
20: 00 40 A0 E3  MOV R4, #0
21: 00 50 A0 E3  MOV R5, #0
22: 0A 80 A0 E3  MOV R8, #0xA
23: 01 00 00 EA  B loc_B2E4
24: loc_B2DC
25: 07 40 92 E0  ADDS R4, R2, R7
26: C7 5F A3 E0  ADC R5, R3, R7,ASR#31
27: loc_B2E4
28: 0C 70 D0 E7  LDRB R7, [R0,R12]
29: 01 c0 8C E2  ADD R12, R12, #1
30: 94 28 83 E0  UMULL R2, R3, R4, R8
31: 30 70 57 E2  SUBS R7, R7, #0x30
32: 07 00 00 4A  BMI loc_B318
33: 09 00 57 E3  CMP R7, #9
34: 98 35 23 E0  MLA R3, R8, R5, R3
35: 04 00 00 CA  BGT loc_B318
36: 0B 00 5C E3  CMP R12, #0xB
37: F3 FF FF 1A  BNE loc_B2DC
38: loc_B30C
39: 00 00 A0 E3  MOV R0, #0
40: loc_B310
41: F0 01 BD E8  LDMFD SP!, {R4-R8}
42: 1E FF 2F E1  BX LR
43: loc_B318
44: 06 20 54 E0  SUBS R2, R4, R6
45: C6 3F C5 E0  SBC R3, R5, R6,ASR#31
46: 02 01 52 E3  CMP R2, #0x80000000
47: 00 00 D3 E2  SBCS R0, R3, #0
48: F7 FF FF AA  BGE loc_B30C
49: 00 00 56 E3  CMP R6, #0
50: 01 00 00 0A  BEQ loc_B33C
51: 00 40 74 E2  RSBS R4, R4, #0
52: 00 50 E5 E2  RSC R5, R5, #0
53: loc_B33C
54: 00 40 81 E5  STR R4, [R1]
55: 01 00 A0 E3  MOV R0, #1
56: F1 FF FF EA  B loc_B310
57: loc_B348
58: 01 30 F0 E5  LDRB R3, [R0,#1]!
59: 01 60 A0 E3  MOV R6, #1
60: D5 FF FF EA  B loc_B2AC
61: ; End of function mystery1
ARM or Thumb
The code is in ARM state for the following reasons:
- The code uses only 32-bit instructions, and none of them has the .W suffix required by many 32-bit Thumb instructions. For example, LDRB instead of LDRB.W.
- The code uses conditionally executed instructions like LDREQB in line 8. In Thumb state, instructions other than branches can only be conditional inside an IT block, and there is no IT instruction here.
- The RSC instruction in line 52 is not available in Thumb state (only the RSB instruction is).
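The conditional execution can also be read directly from the opcode bytes. The following is a minimal sketch, assuming the listing prints each instruction's bytes in little-endian order; the helper names arm_word and arm_cond are made up for this illustration. In ARM encodings the top four bits of the instruction word are the condition field, so 0xE (always) shows up in most lines, while line 8 carries 0x0 (EQ).

```c
#include <assert.h>
#include <stdint.h>

/* Assemble the four bytes of a listing line (assumed little-endian)
 * into one 32-bit ARM instruction word. Hypothetical helper. */
static uint32_t arm_word(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Bits 31:28 of an ARM instruction hold the condition code. */
static unsigned arm_cond(uint32_t insn)
{
    return (unsigned)(insn >> 28);
}

/* Map the condition field to its mnemonic suffix. */
static const char *cond_name(unsigned cond)
{
    static const char *const names[16] = {
        "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL", "0xF"
    };
    assert(cond < 16);
    return names[cond];
}
```

For example, the bytes 01 30 F0 05 of line 8 give the word 0x05F03001 with condition field 0 (EQ), whereas the otherwise identical line 58 (01 30 F0 E5) has condition field 0xE (AL).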
Instruction Semantic
Only the more interesting instructions are discussed:
Line 2, STMFD SP!, {R4-R8}: Store multiple, full descending (decrement before). SP is decremented by 20 bytes and the five registers are stored at the new SP, with the lowest register at the lowest address. From higher to lower addresses the stack then holds: the previous top of the stack (where SP pointed before), R8, R7, R6, R5 and R4 (where SP points now).
Line 3, LDRB R3, [R0]: Load register byte. R3 becomes ZeroExtend(MemU(R0,1), 32), i.e., the byte at memory location R0.
Line 8, LDREQB R3, [R0,#1]!: Load register byte pre-indexed, executed only if the zero flag is set. If R3 == 43 according to line 6, then R3 becomes ZeroExtend(MemU(R0+1,1), 32), and R0 = R0+1.
Line 15, LDRB R2, [R3],#1: Load byte post-indexed. R2 becomes ZeroExtend(MemU(R3,1), 32), and R3 = R3+1.
Line 26, ADC R5, R3, R7,ASR#31: Add with carry. R5 = R3 + (R7>>31) + APSR.C, where >> is an arithmetic shift, so R7>>31 is the sign extension of R7 (zero for the digit values 0 to 9 that reach this line).
Line 28, LDRB R7, [R0,R12]: Load byte with register offset. R7 becomes ZeroExtend(MemU(R0+R12, 1), 32).
Line 30, UMULL R2, R3, R4, R8: 64bit multiply (R3,R2) = R4*R8, i.e., R2 = (R4*R8)<31:0> and R3 = (R4*R8)<63:32>.
Line 34, MLA R3, R8, R5, R3: Multiply and add. R3 = ((R8*R5) + R3)<31:0>.
Line 41, LDMFD SP!, {R4-R8}: Load multiple, increment after. Does the reverse of line 2.
Line 45, SBC R3, R5, R6,ASR#31: Subtract with carry. Since R6 is 0 or 1 at this point, R6,ASR#31 is zero, therefore the instruction boils down to R3 = R5 - NOT(APSR.C), i.e., R5 minus the borrow produced by the SUBS in line 44.
Line 51, RSBS R4, R4, #0: Reverse subtract with S modifier. The instruction negates the register, R4 = 0 - R4, and updates the flags.
Line 52, RSC R5, R5, #0: Reverse subtract with carry. R5 = 0 - R5 - NOT(APSR.C), i.e., the register is negated and the borrow from line 51 is subtracted.
Line 54, STR R4, [R1]: Store R4 at memory location R1, i.e., MemU(R1, 4) = R4.
Line 58, LDRB R3, [R0,#1]!: Same as line 8, except unconditional.
The instructions 25 and 26 implement a 64bit addition, the instructions 30 and 34 implement a 64bit multiplication. Likewise, lines 44 and 45 form a 64bit subtraction, lines 46 and 47 a 64bit comparison, and lines 51 and 52 a 64bit negation.
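To see how the 32-bit halves work together, here is a small sketch in C that mimics lines 30, 34, 25 and 26 with explicit 32-bit variables. The function name mul10_add and its parameters are my own; the sketch assumes the digit is in the range 0 to 9, as it is when the loop reaches line 25.

```c
#include <assert.h>
#include <stdint.h>

/* Computes (hi:lo) = (hi:lo) * 10 + digit using only 32-bit pieces,
 * mirroring the register usage of the listing. Hypothetical helper. */
static void mul10_add(uint32_t *lo, uint32_t *hi, uint32_t digit)
{
    assert(lo != 0 && hi != 0 && digit <= 9);

    /* UMULL R2, R3, R4, R8: full 64-bit product of the low word and 10 */
    uint64_t prod = (uint64_t)*lo * 10u;
    uint32_t r2 = (uint32_t)prod;
    uint32_t r3 = (uint32_t)(prod >> 32);

    /* MLA R3, R8, R5, R3: add 10 * high word (low 32 bits only) */
    r3 = 10u * *hi + r3;

    /* ADDS R4, R2, R7: add the digit and remember the carry */
    uint32_t sum = r2 + digit;
    uint32_t carry = (sum < r2) ? 1u : 0u;

    /* ADC R5, R3, R7,ASR#31: the sign extension of a digit is 0 */
    *lo = sum;
    *hi = r3 + 0u + carry;
}
```

Starting from lo = 0x33333333 and hi = 0 with digit 9, the low word wraps around and the carry ends up in the high word, giving 0x2:00000007.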
Types
The blocks in lines 13 to 17 and 27 to 37 implement loops that iterate over the bytes at R0. Therefore arg1 is probably an array. Since the increment is one byte and the data is retrieved with LDRB (which loads bytes), the elements of the array are probably of type char. In line 4 and line 6 the first entry of the array arg1 is compared to 45 and 43 respectively. Those are the ASCII codes for - and +. In lines 10 and 16 the code compares array entries with 48, the ASCII code for "0". Lines 31/32 and 33/35 check if array entries are in the range "0" to "9". From this we conclude that arg1 is probably a string that contains a representation of a number. The number has the following format:
- An optional leading sign ‘-‘ or ‘+’.
- Optionally any number of leading zeros.
- Digits 0-9
The regex for valid strings is [-+]?0*[0-9]+.
The result of the conversion is stored in arg2 with a 32-bit STR (line 54), so arg2 is a pointer to a 32-bit integer. The return value of the function is a boolean. FALSE means the conversion of the string to an integer failed, TRUE means the conversion succeeded.
The pairs R3,R2 and R5,R4 for the most part contain 64bit numbers, where R3 and R5 hold the higher 32 bits, and R2 and R4 hold the lower 32 bits.
Function Prototype
The function prototype is:
BOOL string2integer(char*, int*);
Prologue and Epilogue
Line 2 preserves registers R4 to R8, line 41 restores the registers. Apart from the registers R0 to R3, which hold function parameters or serve as scratch registers, only register R12 is changed by the function.
Purpose and Pseudo-code
The function converts a string to a signed integer. The string can start with +, - or nothing, and can contain leading zeros. Only integer numbers are allowed. The digits need to be terminated by a non-digit, e.g., the null byte. The number needs to be in the range -2^31 to 2^31 - 1, i.e., represent a 32bit signed integer. In addition, the check in line 36 rejects strings with more than ten digits after the leading zeros.
Examples:
- '123\0' will be converted to 123
- '+00437Euro' will be converted to 437
- '-00001 ' will be converted to -1
- '643' gives an unpredictable result, because the digits are not followed by a terminating character.
- '-6.37' will be converted to -6
- '1E7' will be converted to 1
- '3000000000' will return FALSE, because the number exceeds 2^31 - 1
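The exact bounds follow from lines 44 to 48 of the listing: the sign flag R6 (0 or 1) is subtracted from the 64-bit magnitude, and the conversion fails if the result is greater than or equal to 0x80000000. The sketch below restates that check in C; the function name fits_int32 and its parameters are hypothetical, and it assumes the magnitude has at most ten digits, which the loop guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* magnitude: the parsed digits as an unsigned value (R5:R4 in the listing)
 * negative:  true if the string started with '-' (R6 == 1)
 * Returns true if the signed result fits into 32 bits. */
static bool fits_int32(uint64_t magnitude, bool negative)
{
    /* at most ten digits can have been accumulated */
    assert(magnitude <= UINT64_C(9999999999));

    /* lines 44-45: 64-bit subtraction of the sign flag */
    int64_t adjusted = (int64_t)magnitude - (negative ? 1 : 0);

    /* lines 46-48: signed 64-bit comparison with 0x80000000 */
    return adjusted < INT64_C(0x80000000);
}
```

With this check, 2147483647 passes as a positive number, 2147483648 passes only with a leading minus sign, and 3000000000 fails either way.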
The following pseudo-code describes the algorithm:
bool string2integer(char *str, int *result)
{
    index = 0
    res = 0
    sign = 1
    # parse the (optional) sign
    if str[index] == '+' then
        index = index + 1
    elsif str[index] == '-' then
        index = index + 1
        sign = -1
    # skip any leading zeros
    while str[index] == '0' do
        index = index + 1
    # parse the number (the assembly gives up at the eleventh digit)
    while '0' <= str[index] <= '9' do
        res = res*10 + (str[index] - '0')
        index = index + 1
    # range check, then apply the sign
    if sign*res < -2^31 or sign*res >= 2^31 then
        return FALSE
    else
        *result = sign*res
        return TRUE
}
C-Code
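#include <stdbool.h>

bool string2integer(char *str, int *result)
{
    int index = 0;
    int digits = 0;
    long long res = 0;   // 64 bit, like the register pair R5:R4
    int sign = 1;

    // parse the (optional) sign
    if (str[index] == '+') {
        index++;
    } else if (str[index] == '-') {
        index++;
        sign = -1;
    }

    // skip any leading zeros
    while (str[index] == '0')
        index++;

    // parse the number, giving up at the eleventh digit (line 36)
    while ('0' <= str[index] && str[index] <= '9') {
        if (++digits > 10)
            return false;
        res = res * 10 + (str[index] - '0');
        index++;
    }

    // range check: -2^31 <= sign * res <= 2^31 - 1
    if (res - (sign < 0 ? 1 : 0) >= 0x80000000LL)
        return false;

    *result = (int)(sign * res);
    return true;
}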
Mystery 2
Figure 2-9 shows a function that was found in the export table.
This is the function in Figure 2-9:
01: mystery2
02: 28 B1        CBZ R0, loc_C672
03: 90 F8 63 00  LDRB.W R0, [R0,#0x63]
04: 00 38        SUBS R0, #0
05: 18 BF        IT NE
06: 01 20        MOVNE R0, #1
07: 70 47        BX LR
08: loc_C672
09: 01 20        MOVS R0, #1
10: 70 47        BX LR
11: ; End of function mystery2
ARM or Thumb
The code is in Thumb state for the following reasons:
- The instructions CBZ and IT NE are only available in Thumb
- Line 3 shows an instruction with .W suffix
- All instructions except line 3 are 16bit
Instruction Semantic
Only the more interesting instructions are discussed:
- Line 2, CBZ R0, loc_C672: means if R0 is zero, goto loc_C672, else continue with line 3.
- Line 3, LDRB.W R0, [R0,#0x63]: does char R0 = MemU(R0 + 63h, 1), using immediate offset addressing.
- Lines 4, 5 and 6 implement the check: if R0 != 0 then return 1 else return 0
Types
The function takes one argument arg1 = R0. Line 3 accesses a byte at offset 63h, so the parameter is probably a struct:
struct1
...
+0x063 field63_c ; char
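One way to double-check such a recovered layout is to write it down in C and let the compiler confirm the offset. The sketch below is only an assumption about the structure: the 0x63 bytes in front of the field are unknown and are represented by a placeholder array, and the helper byte_at_0x63_is_set is a made-up name that reads the byte the same way the LDRB.W in line 3 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reconstruction: only the byte at offset 0x63 is known. */
struct struct1 {
    unsigned char unknown[0x63];  /* placeholder for unrecovered fields */
    unsigned char field63_c;      /* the byte loaded in line 3 */
};

/* The compiler verifies that the field really sits at offset 0x63. */
static_assert(offsetof(struct struct1, field63_c) == 0x63,
              "field63_c must be at offset 0x63");

/* Untyped view of the same access: base pointer plus 0x63 bytes. */
static bool byte_at_0x63_is_set(const void *p)
{
    return ((const unsigned char *)p)[0x63] != 0;
}
```

Both views read the same byte, so setting field63_c through the struct changes what the untyped helper reports.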
Function Prototype
The function prototype is:
BOOL field63c_isnotzero(struct1*);
Prologue and Epilogue
There is no prologue or epilogue.
Purpose and Pseudo-code
The function checks whether the byte-sized member at offset 63h of a struct is zero. If it is zero, the function returns 0. If the member is non-zero, or if the struct pointer is null, 1 is returned:
BOOL field63c_isnotzero(struct1 *s)
{
    if s == NULL then
        return TRUE
    elsif s->field63c == 0 then
        return FALSE
    else
        return TRUE
}
C-Code
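#include <stdbool.h>
#include <stddef.h>

typedef struct struct1 {
    char unknown[0x63];  // fields not accessed by this function
    char field63c;
} struct1;

bool field63c_isnotzero(struct1 *s)
{
    // a null pointer also yields true, see loc_C672
    if (s == NULL || s->field63c != 0)
        return true;
    else
        return false;
}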
Mystery 3
Here is a simple function: [see below]
This is the function:
01: mystery3
02: 83 68  LDR R3, [R0,#8]
03: 0B 60  STR R3, [R1]
04: C3 68  LDR R3, [R0,#0xC]
05: 00 20  MOVS R0, #0
06: 4B 60  STR R3, [R1,#4]
07: 70 47  BX LR
08: ; End of function mystery3
ARM or Thumb
The code is clearly in Thumb state because all instructions are 16 bit.
Instruction Semantic
Only the more interesting instructions are discussed:
- Line 2, LDR R3, [R0,#8]: sets R3 = MemU(R0+8, 4)
- Line 3, STR R3, [R1]: sets R1->field00_i = R3
- Line 4, LDR R3, [R0,#0xC]: sets R3 = MemU(R0+12, 4)
- Line 6, STR R3, [R1,#4]: sets R1->field04_i = R3
Types
The function takes two arguments. Both might be structs, the first parameter:
struct_arg1
...
+0x008 field08_i ; int
+0x00C field0c_i ; int
the second parameter:
struct_arg2
+0x000 field00_i ; int, same type as struct_arg1.field08_i
+0x004 field04_i ; int, same type as struct_arg1.field0c_i
Function Prototype
The function prototype is:
BOOL copy_members(struct_arg1*, struct_arg2*);
Prologue and Epilogue
There is no prologue or epilogue. The function only modifies ``R0`` and ``R3``, which are argument/scratch registers that do not need to be preserved anyway.
Purpose and Pseudo-code
The function copies two members of a struct passed as the first parameter to a struct passed as the second parameter.
BOOL copymembers(struct_arg1 *s1, struct_arg2 *s2)
{
    s2->field00_i = s1->field08_i
    s2->field04_i = s1->field0c_i
    return FALSE
}
C-Code
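#include <stdbool.h>

typedef struct { char unknown[8]; int field08_i; int field0c_i; } struct_arg1;
typedef struct { int field00_i; int field04_i; } struct_arg2;

bool copymembers(struct_arg1 *s1, struct_arg2 *s2)
{
    // data flows from the first argument (R0) to the second (R1)
    s2->field00_i = s1->field08_i;
    s2->field04_i = s1->field0c_i;
    return false;
}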
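Since the assembly only moves two 32-bit words, the field types cannot be determined with certainty. An untyped reading is that the function copies the 8 bytes at offset 8 of the first argument to the start of the second argument. The following sketch expresses that view; the name copy_8_bytes is made up, and the null check is an addition for the sketch that the original code does not perform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies bytes 8..15 of src to bytes 0..7 of dst and returns 0,
 * like the two LDR/STR pairs and MOVS R0, #0 in the listing. */
static int copy_8_bytes(const void *src, void *dst)
{
    /* not present in the original function, added for the sketch */
    assert(src != NULL && dst != NULL);

    memcpy(dst, (const unsigned char *)src + 8, 8);
    return 0;
}
```

Called with a source of four 32-bit words {1, 2, 3, 4}, the destination receives the third and fourth word, {3, 4}.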
Mystery 4
Figure 2-10 shows another easy function.
This is the function:
01: mystery4
02: 08 B9        CBNZ R0, loc_100C3DA
03: 00 20        MOVS R0, #0
04: 70 47        BX LR
05: loc_100C3DA
06: 50 F8 08 0C  LDR.W R0, [R0,#-8]
07: 70 47        BX LR
08: ; End of function mystery4
ARM or Thumb
The code is clearly in Thumb state for the following reasons:
- The instruction ``CBNZ`` only exists in Thumb state
- Most instructions are 16 bit
- The only 32bit instruction has the ``.W`` suffix
Instruction Semantic
Only the more interesting instructions are discussed:
- Line 2, ``CBNZ R0, loc_100C3DA``: does ``if arg1 != 0 then goto loc_100C3DA else continue with line 3``
- Line 3, ``MOVS R0, #0``, sets ``R0 = 0``
- Lines 4 and 7, ``BX LR``, return to the caller
- Line 6, ``LDR.W R0, [R0,#-8]``, loads the 32 bit word located 8 bytes below ``arg1`` into ``R0``, the return value
Types
The function takes one argument, which is a pointer. At offset -8 to this pointer we find a 32 bit value. The function returns this value in R0, therefore the return type is a 32 bit integer.
Function Prototype
The function prototype is:
INT32 retrieve_value_at_minus8(int*);
Prologue and Epilogue
There is no prologue or epilogue. The function only changes the function parameter in R0, and it returns with BX LR, which switches to ARM or Thumb state depending on the lowest bit of LR.
Purpose and Pseudo-code
The function returns the 32 bit value at memory location arg1 - 8. If arg1 is null, the function returns 0.
INT32 retrieve_value_at_minus8(int* something)
{
    if not something then
        return 0
    else
        return the 32 bit value stored 8 bytes below something
}
C-Code
#include <stddef.h>

int retrieve_value_at_minus8(int* something)
{
    if (!something)
        return 0;
    // the offset is 8 bytes; plain (something - 8) would step back 8 ints
    return *(int *)((char *)something - 8);
}
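A negative offset from an argument often means that the pointer refers to the payload of some object whose header is stored directly in front of it. This is only a guess about mystery4, but the sketch below shows the pattern. The struct block and its field names are invented for the illustration; the 8-byte distance is the only detail taken from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: an 8-byte header in front of the payload. */
struct block {
    int32_t tag;         /* at payload - 8: the value that is returned */
    int32_t reserved;    /* at payload - 4 */
    int32_t payload[4];  /* callers only see a pointer to this member */
};

static_assert(offsetof(struct block, payload) == 8,
              "payload must start 8 bytes after tag");

/* Same logic as mystery4, with the byte offset made explicit. */
static int32_t value_at_minus8(const int32_t *p)
{
    if (p == NULL)
        return 0;
    return *(const int32_t *)((const char *)p - 8);
}
```

Given a block whose tag is 42, passing its payload pointer returns 42, while a null pointer returns 0.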
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.