Practical Reverse Engineering Solutions – Page 78 (Part II)
my go at mystery5 and mystery6 on pages 78ffThis blog post presents my solution to exercises 5 and 6 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.
For the code in each exercise, do the following in order (whenever possible):
- Determine whether it is in Thumb or ARM state.
- Explain each instruction’s semantic. If the instruction is
LDR/STR
, explain the addressing mode as well.- Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
- Recover the function prototype.
- Identify the function prologue and epilogue.
- Explain what the function does and then write pseudo-code for it.
- Decompile the function back to C and give it a meaningful name.
Mystery 5
Figure 2-11 is simple as well. The actual string names have been removed
so you cannot cheat by searching the Internet.
This is the function in Figure 2-11:
mystery5 03 46 MOV R3, R0 06 2B CMP R3, #6 0D D0 BEQ loc_1032596 07 2B CMP R3, #7 09 D0 BEQ loc_1032592 08 2B CMP R3, #8 05 D0 BEQ loc_103258E 09 2B CMP R3, #9 01 D0 BEQ loc_103258A 09 48 LDR R0, =aA ; "A" 70 47 BX LR loc_103258A 07 48 LDR R0, =aB ; "B" 70 47 BX LR loc_103258E 05 48 LDR R0, =ac ; "C" 70 47 BX LR loc_1032592 03 48 LDR R0, =aD ; "D" 70 47 BX LR loc_1032596 01 48 LDR R0, =aE ; "E" 70 47 BX LR ; End of function mystery5
ARM or Thumb
The code is in Thumb state, all instructions are 16bit.
Instruction Semantic
Nothing fancy. Instructions like LDR R0, =aE
are pseudo-instructions which set R0
to point to strings (see page 51 of the book).
Types
The function takes one argument arg1
which is compared to the literals 6 to 9. The type could be an unsigned char or any other integer type. The return value is a string.
Function Prototype
The function prototype is:
CHAR* get_message_for_code(UNSIGNED CHAR*);
Again, the type of the function parameter could be any other integer type.
Prologue and Epilogue
There is no prologue or epilogue. The function does not overwrite any registers except R3
and it returns with BX LR
which switches back to ARM state.
Purpose and Pseudo-code
The function implements a translation of numbers to strings. It could for instance translate error codes to error messages.
FUNCTION(integer number) CASE number OF 6 : RETURN "E" 7 : RETURN "D" 8 : RETURN "C" 9 : RETURN "B" OTHERS: return "A" ENDCASE END
C-Code
char* number_to_string(unsigned char number) { switch (number) { case 6: return "E"; case 7: return "D"; case 8: return "C"; case 9: return "B"; default: return "A"; } }
Mystery 6
Figure 2-12 involves some twiddling.
Here is the code from Figure 2-12:
mystery6 2D E9 18 48 PUSH.W {R3,R4,R11,LR} 0D F2 08 0B ADDW R11, SP, #8 04 68 LDR R4, [R0] 00 22 MOVS R2, #0 00 2C CMP R4, #0 06 DD BLE loc_103B3B6 loc_103B3A8 50 F8 04 3F LDR.W R3, [R0,#4]! 8B 42 CMP R3, R1 06 D0 BEQ loc_103B3BE 01 32 ADDS R2, #1 A2 42 CMP R2, R4 F8 DB BLT loc_103B3A8 loc_103B3B6 00 20 MOVS R0, #0 00 21 MOVS R1, #0 locret_103B3BA BD E8 18 88 POP.W {R3,R4,R11,PC} loc_103B3BE B2 F1 20 03 SUBS.W R3, R2, #0X20 01 21 MOVS R1, #1 99 40 LSLS R1, R3 01 23 MOVS R3, #1 13 FA 02 F0 LSLS.W R0, R3, R2 F5 E7 B locret_103B3BA ; End of function mystery6
ARM or Thumb
The snippet is in Thumb state:
- Many instructions are 16bit.
- 32bit instructions have the
W
or.W
suffix. - The function prologue and epilogue uses
PUSH.W
andPOP.W
.
Instruction Semantic
Nothing fancy. The instructions in lines 21 to 25 implement a 64 bit 232, the upper 32 bits of the result are stored in R1, to lower 32 bits in R0.
Types
The function takes two parameters arg1 = R0
and arg2 = R1
.
The first parameter has a 32bit value at offset 0 which is loaded in line 4. Line 13 compares R2 (a counter, see below) to this value and loops if the counter is smaller. Therefore, at offset 0 of argument 1 we probably have a size. Starting with line 9, the code iterates over 32bit values starting at offset 4 of argument 1. Here we probably have an array of 32bit values. Argument 1 is a struct:
struct_arg1 +0x000: size ; int (size of following array) +0x004: array ; array of 32bit values (same type as arg2)
Argument 2 is compared to to elements of the array in
arg1
. Therefore,arg2
has the same 32bit type as the elements inarg1->array
As mentioned above, lines 22 to 25 implement a 64bit version of 2R2, the result of which is stored in
R0
andR1
. The return value of the function is therefore a 64bit number, where R0 holds the lower and R1 the upper 32 bits respectively. The return value is always a power of 2 (or zero), and probably a bitmaskThe loop uses
R2
as the index into the array
Function Prototype
The function prototype is:
INT64 mistery6(struct_arg1*, int32);
Prologue and Epilogue
The function saves and restores registers R3
, R4
and R11
. Apart from those registers, only R0
and R1
are modified, they hold the return value. The function preserves all registers R2
and higher.
The prologue and epilogue use the same PUSH
and POP
instructions to save the return address and return to the caller.
Purpose and Pseudo-code
The function searches for a 32bit value passed as the second function parameter inside an array contained in a struct passed as the first function parameter. If no match is found, the function returns 0. Otherwise it returns a 64bit bitmask, where the position of the one represents the first match.
For example, let’s say for the struct s
of type struct_arg1
we have s.size = 9
and s.array = [8,1,6,7,5,7,9,3,2]
. Then mystery6(&s, 7)
would return 23 or 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001000
, and mystery6(&s, 13)
would return which indicates that no match was found.
Obviously, if the array exceeds 64 elements then the return value cannot represent matches beyond the 64th element.
FUNCTION bitmask_of_match(s, m) SIZE = s->size ARRAY = s->array FOR index = 0 to SIZE-1 IF ARRAY[index] == m THEN RETURN 2^index ENDIF ENDFOR RETURN 0 END
C-Code
struct struct_arg1 { int size; int array[]; }; __int64 bitmask_of_match(struct struct_arg1 *s, int m) { int size = s->size; int index; for (index = 0; index < size; index++) { if (s->array[index] == m) return __int64( 1 << index ); } return 0; }