cover image for post 'Practical Reverse Engineering Solutions – Page 78 (Part II)'

Practical Reverse Engineering Solutions – Page 78 (Part II)

my go at mystery5 and mystery6 on pages 78ff

This blog post presents my solution to exercises 5 and 6 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

For the code in each exercise, do the following in order (whenever possible):

  • Determine whether it is in Thumb or ARM state.
  • Explain each instruction’s semantic. If the instruction is LDR/STR, explain the addressing mode as well.
  • Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
  • Recover the function prototype.
  • Identify the function prologue and epilogue.
  • Explain what the function does and then write pseudo-code for it.
  • Decompile the function back to C and give it a meaningful name.

Mystery 5

Figure 2-11 is simple as well. The actual string names have been removed
so you cannot cheat by searching the Internet.

This is the function in Figure 2-11:

mystery5
03 46          MOV R3, R0
06 2B          CMP R3, #6
0D D0          BEQ loc_1032596
07 2B          CMP R3, #7
09 D0          BEQ loc_1032592
08 2B          CMP R3, #8
05 D0          BEQ loc_103258E
09 2B          CMP R3, #9
01 D0          BEQ loc_103258A
09 48          LDR R0, =aA ; "A"
70 47          BX LR
loc_103258A
07 48          LDR R0, =aB ; "B"
70 47          BX LR
loc_103258E
05 48          LDR R0, =ac ; "C"
70 47          BX LR
loc_1032592
03 48          LDR R0, =aD ; "D"
70 47          BX LR
loc_1032596
01 48          LDR R0, =aE ; "E"
70 47          BX LR
; End of function mystery5

ARM or Thumb

The code is in Thumb state, all instructions are 16bit.

Instruction Semantic

Nothing fancy. Instructions like LDR R0, =aE are pseudo-instructions which set R0 to point to strings (see page 51 of the book).

Types

The function takes one argument arg1 which is compared to the literals 6 to 9. The type could be an unsigned char or any other integer type. The return value is a string.

Function Prototype

The function prototype is:

CHAR* get_message_for_code(UNSIGNED CHAR*);

Again, the type of the function parameter could be any other integer type.

Prologue and Epilogue

There is no prologue or epilogue. The function does not overwrite any registers except R3 and it returns with BX LR which switches back to ARM state.

Purpose and Pseudo-code

The function implements a translation of numbers to strings. It could for instance translate error codes to error messages.

FUNCTION(integer number)
    CASE number OF
        6   : RETURN "E"
        7   : RETURN "D"
        8   : RETURN "C"
        9   : RETURN "B"
        OTHERS:
            return "A"
    ENDCASE
END

C-Code

char* number_to_string(unsigned char number)
{
	switch (number) {
	case 6: return "E";
	case 7: return "D";
	case 8: return "C";
	case 9: return "B";
	default: return "A";
	}
}

Mystery 6

Figure 2-12 involves some twiddling.

Here is the code from Figure 2-12:

mystery6
2D E9 18 48    PUSH.W {R3,R4,R11,LR}
0D F2 08 0B    ADDW R11, SP, #8
04 68          LDR R4, [R0]
00 22          MOVS R2, #0
00 2C          CMP R4, #0
06 DD          BLE loc_103B3B6
loc_103B3A8
50 F8 04 3F    LDR.W R3, [R0,#4]!
8B 42          CMP R3, R1
06 D0          BEQ loc_103B3BE
01 32          ADDS R2, #1
A2 42          CMP R2, R4
F8 DB          BLT loc_103B3A8
loc_103B3B6
00 20          MOVS R0, #0
00 21          MOVS R1, #0
locret_103B3BA
BD E8 18 88    POP.W {R3,R4,R11,PC}
loc_103B3BE
B2 F1 20 03    SUBS.W R3, R2, #0X20
01 21          MOVS R1, #1
99 40          LSLS R1, R3
01 23          MOVS R3, #1
13 FA 02 F0    LSLS.W R0, R3, R2
F5 E7          B locret_103B3BA
; End of function mystery6

ARM or Thumb

The snippet is in Thumb state:

  • Many instructions are 16bit.
  • 32bit instructions have the W or .W suffix.
  • The function prologue and epilogue uses PUSH.W and POP.W.

Instruction Semantic

Nothing fancy. The instructions in lines 21 to 25 implement a 64 bit 232, the upper 32 bits of the result are stored in R1, to lower 32 bits in R0.

Types

The function takes two parameters arg1 = R0 and arg2 = R1.

  • The first parameter has a 32bit value at offset 0 which is loaded in line 4. Line 13 compares R2 (a counter, see below) to this value and loops if the counter is smaller. Therefore, at offset 0 of argument 1 we probably have a size. Starting with line 9, the code iterates over 32bit values starting at offset 4 of argument 1. Here we probably have an array of 32bit values. Argument 1 is a struct:

    struct_arg1
    +0x000: size  ; int (size of following array)
    +0x004: array ; array of 32bit values (same type as arg2)

  • Argument 2 is compared to to elements of the array in arg1. Therefore, arg2 has the same 32bit type as the elements in arg1->array

  • As mentioned above, lines 22 to 25 implement a 64bit version of 2R2, the result of which is stored in R0 and R1. The return value of the function is therefore a 64bit number, where R0 holds the lower and R1 the upper 32 bits respectively. The return value is always a power of 2 (or zero), and probably a bitmask

  • The loop uses R2 as the index into the array

Function Prototype

The function prototype is:

INT64 mistery6(struct_arg1*, int32);

Prologue and Epilogue

The function saves and restores registers R3, R4 and R11. Apart from those registers, only R0 and R1 are modified, they hold the return value. The function preserves all registers R2 and higher.

The prologue and epilogue use the same PUSH and POP instructions to save the return address and return to the caller.

Purpose and Pseudo-code

The function searches for a 32bit value passed as the second function parameter inside an array contained in a struct passed as the first function parameter. If no match is found, the function returns 0. Otherwise it returns a 64bit bitmask, where the position of the one represents the first match.

For example, let’s say for the struct s of type struct_arg1 we have s.size = 9 and s.array = [8,1,6,7,5,7,9,3,2]. Then mystery6(&s, 7) would return 23 or 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001000, and mystery6(&s, 13) would return which indicates that no match was found.

Obviously, if the array exceeds 64 elements then the return value cannot represent matches beyond the 64th element.

FUNCTION bitmask_of_match(s, m)
    SIZE = s->size
    ARRAY = s->array

    FOR index = 0 to SIZE-1
        IF ARRAY[index] == m THEN
            RETURN 2^index
        ENDIF
    ENDFOR

    RETURN 0
END

C-Code

struct struct_arg1 {
	int size;
	int array[];
};


__int64 bitmask_of_match(struct struct_arg1 *s, int m)
{
	int size = s->size;	
	int index;
	for (index = 0; index < size; index++) {
		if (s->array[index] == m)
			return __int64( 1 << index );
	}
	return 0;
}