Practical Reverse Engineering Solutions – Page 78 (Part I)

my go at mystery 1 to mystery 4 on pages 78ff

This blog post presents my solution to the first four exercises on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page

For the code in each exercise, do the following in order (whenever possible):
Determine whether it is in Thumb or ARM state.
Explain each instruction’s semantic. If the instruction is LDR/STR, explain the addressing mode as well.
Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
Recover the function prototype.
Identify the function prologue and epilogue.
Explain what the function does and then write pseudo-code for it.
Decompile the function back to C and give it a meaningful name.

Mystery 1

Figure 2-8 shows a function that takes two arguments. It may seem somewhat challenging at first, but its functionality is very common. Have patience.

This is the function in Figure 2-8:

01: mystery1
02: F0 01 2D E9    STMFD SP!, {R4-R8}
03: 00 30 D0 E5    LDRB R3, [R0]
04: 2D 00 53 E3    CMP R3, #0x2D
05: 29 00 00 0A    BEQ loc_B34806
06: 2B 00 53 E3    CMP R3, #0x2B
07: 00 60 A0 E3    MOV R6, #0
08: 01 30 F0 05    LDREQB R3, [R0,#1]!
09: loc_B2AC
10: 30 00 53 E3    CMP R3, #0x30
11: 04 00 00 1A    BNE loc_B2C8
12: 01 30 80 E2    ADD R3, R0, #1
13: loc_B2B8
14: 03 00 A0 E1    MOV R0, R3
15: 01 20 D3 E4    LDRB R2, [R3],#1
16: 30 00 52 E3    CMP R2, #0x30
17: FB FF FF 0A    BEQ loc_B2B8
18: loc_B2C8
19: 00 C0 A0 E3    MOV R12, #0
20: 00 40 A0 E3    MOV R4, #0
21: 00 50 A0 E3    MOV R5, #0
22: 0A 80 A0 E3    MOV R8, #0xA
23: 01 00 00 EA    B loc_B2E4
24: loc_B2DC
25: 07 40 92 E0    ADDS R4, R2, R7
26: C7 5F A3 E0    ADC R5, R3, R7,ASR#31
27: loc_B2E4
28: 0C 70 D0 E7    LDRB R7, [R0,R12]
29: 01 c0 8C E2    ADD R12, R12, #1
30: 94 28 83 E0    UMULL R2, R3, R4, R8
31: 30 70 57 E2    SUBS R7, R7, #0x30
32: 07 00 00 4A    BMI loc_B318
33: 09 00 57 E3    CMP R7, #9
34: 98 35 23 E0    MLA R3, R8, R5, R3
35: 04 00 00 CA    BGT loc_B318
36: 0B 00 5C E3    CMP R12, #0xB
37: F3 FF FF 1A    BNE loc_B2DC
38: loc_B30C
39: 00 00 A0 E3    MOV R0, #0
40: loc_B310
41: F0 01 BD E8    LDMFD SP!, {R4-R8}
42: 1E FF 2F E1    BX LR
43: loc_B318
44: 06 20 54 E0    SUBS R2, R4, R6
45: C6 3F C5 E0    SBC R3, R5, R6,ASR#31
46: 02 01 52 E3    CMP R2, #0x80000000
47: 00 00 D3 E2    SBCS R0, R3, #0
48: F7 FF FF AA    BGE loc_B30c
49: 00 00 56 E3    CMP R6, #0
50: 01 00 00 0A    BEQ loc_B33C 
51: 00 40 74 E2    RSBS R4, R4, #0
52: 00 50 E5 E2    RSC R5, R5, #0
53: loc_B33C
54: 00 40 81 E5    STR R4, [R1]
55: 01 00 A0 E3    MOV R0, #1
56: F1 FF FF EA    B loc_B310
57: loc_B348
58: 01 30 F0 E5    LDRB R3, [R0,#1]!
59: 01 60 A0 E3    MOV R6, #1
60: D5 FF FF EA    B loc_B2Ac
61: ; End of function mystery1

ARM or Thumb

The code is in ARM state for the following reasons:

The code uses only 32bit instruction, none of the has the .W suffix required by many Thumb 32bit instructions. For example, LDRB instead of LDRB.W
The code uses conditional instructions like LDREQ in line 8. Condition codes are not allowed in ARM state.
The RSC instruction in line 52 is not available in Thumb state (only the RSB instruction is).

Instruction Semantic

Only the more interesting instructions are discussed:

Line 2, STMFD SP!, {R4-R8}: Store multiple, decrement before. The stack picture changes as follows:
SP -> ▭ ▭
R8 R7 R6 R5 SP -> R4
Line 3, LDRB R3, [R0]: Load register byte. R3 becomes ZeroExtend( Memu(R0,1), 32), i.e., the byte at memory location R0.
Line 8, LDREQB R3, [R0,#1]!: Load register byte pre-indexed, if zero byte is set. If R3 == 43 according to line 6, then R3 becomes ZeroExtend( Memu(R0+1,1), 32), and R0 = R0+1.
Line 15, LDRB R2, [R3],#1: Load byte post-indexed. R2 becomes ZeroExtend( Memu(R3,1), 32), and R3 = R3+1.
Line 26, ADC R5, R3, R7,ASR#31: Add with carry. R5 = R3 + R7>>31 + APSR.C
Line 28, LDRB R7, [R0,R12]: Load byte. R7 becomes ZeroExtend( Memu(R0+R12, 1), 32).
Line 30, UMULL R2, R3, R4, R8: 64bit multiply (R3,R2) = R4*R8, i.e., R2 = (R4*R8)<31:0> and R3 = (R4*R8)<63:32>.
Line 34, MLA R3, R8, R5, R4: multiply and add. R3 = ((R8*R5) + R3)<31:0>.
Line 41, LDMFD SP!, {R4-R8}: Load multiple, increment after. Does the reverse of line 2.
Line 45, SBC R3, R5, R6,ASR#31: Subtract with carry. Since R6 is 0 or 1 at this point, R6, ASRR#31 is zero, therefore the instruction boils down to R3 = -R5.
Line 51, RSBS R4, R4, #0: Reverse subtract with S modifier. The instruction negates the register R4 = -R4.
Line 52, RSC R5, R5, #0: Reverse subtract with carry. The instruction negates the register, including the carry flag: R5 = -(R5 + ASPR.C)
Line 54, STR R4, [R1]: Store R4 at memory location R1, i.e., MemU(R1, 4) = R4
Line 58, LDRB R3, [R0,#1]!: Same as line 8, except unconditional.

The instructions 25 and 26 implement 64 addition, the instructions 30 and 34 implement 64 multiplication.

Types

The blocks in lines 13 to 17 as well as 27-37 implement loops that iterates over bytes in R0. Therefore arg1 is probably an array. Since the increment is one byte, and data is retrieved with LDREQB (which loads bytes). The elements of the array are probably of type char. In line 4 and line 6 the first entry of the array arg1 is compared to 43 and 45 respectively. Those are the Ascii codes for - and +. In lines 10 and 16 the code compares array entries with 48, the Ascii code for "0". Lines 31/32 and 33/35 check if array entries are in range "0" to "9". From this we conclude that arg1 is probably a string that contains a representation of a number. The number has the following formats:

An optional leading sign ‘-‘ or ‘+’.
Optionally any number of leading zeros.
Numbers 0-9

The Regex for valid strings is [-+]?0*[0-9]+.

The result of the conversion is stored in arg2, which therefore is a pointer to an integer. The return value of the function is a boolean. FALSE means the conversion of the string to an integer failed, TRUE means the conversion succeeded.

The pairs R3,R2 and R5, R4 for the most part contain 64bit numbers, where R3 and R5 hold the higher 32 bits, and R2 and R4 hold to lower 32 bits.

Function Prototype

The function prototype is:

BOOL string2integer(char*, int*);

Prologue and Epilogue

Line 2 preserves registers R4 to R8, line 41 restores the registers. Apart from the function parameters R0 to R4, only register R12 is changed by the function.

Purpose and Pseudo-code

The function converts a string to a signed integer. The string can start with +, - or nothing, and can contain leading zeros. Only integer numbers are allowed. The string needs to be terminated by a non-number, e.g., the null-byte. The string needs to be in range -2³¹ to 2³¹, i.e., represent a 32bit signed integer.

Examples:

'123\0' will be converted to 123
'+00437Euro' will be converted to 437
'-00001 ' will be converted to -1
'643' gives an unpredictable result.
'-6.37' will be converted to -6
'1E7' will be converted to 1
'3000000000' will return FALSE, because the number exceeds 2³¹

The following pseudo-code describes the algorithm:

bool string2integer(char *str, int *result) {
    index = 0
    res = 0
    sign = 1

    # parse the (optional) sign
    if str[index] == '+' then
        index = index + 1
    elsif str[index] == '-' then
        index = index + 1
        sign = -1

    # skip any leading zeros
    while str[index] == '0' do
        index = index + 1

    # parse the number
    while '0' <= str[index] <= '9' do
        res = res*10 + (str[index] - '0')

    if abs(res) >= 2^31 then
        return FALSE
    else
        *result = res
        return TRUE
}

C-Code

#include <stdlib.h>

bool string2integer(char *str, int *result) {
	int index = 0;
	long res = 0;
	char sign = 1;

	// parse the(optional) sign
	if (str[index] == '+') {
		index++;
	}
	else if (str[index] == '-') {
		index++;
		sign = -1;
	}

	// skip any leading zeros
	while (str[index] == '0')
		index++;

	// parse the number
	while ('0' <= str[index] <= '9')
		res = res * 10 + (str[index] - '0');

	if (abs(res) >= 2 ^ 31)
		return false;
	else {
		*result = res;
		return true;
	}
}

Mystery 2

Figure 2-9 shows a function that was found in the export table.

This is the function in Figure 2-9:

01: mystery2
02: 28 B1          CBZ R0, loc_C672
03: 90 F8 63 00    LDRB.W R0, [R0,#0x63]
04: 00 38          SUBS R0, #0
05: 18 BF          IT NE
06: 01 20          MOVNE R0, #1
07: 70 47          BX LR
08: loc_C672
09: 01 20          MOVS R0, #1
10: 70 47          BX LR
11: ; End of function mystery2

ARM or Thumb

The code is in Thumb state for the following reasons:

The instructions CBZ and IT NE are only available in Thumb
Line 3 shows an instruction with .W suffix
All instructions except line 3 are 16bit

Instruction Semantic

Only the more interesting instructions are discussed:

Line 2, B1 CBZ R0, loc_C672: means if R0 is zero, goto loc_C672, else continue with line 3.
Line 3, LDRB.W R0, [R0,#0x63], does char R0 = MemU(R0 + 63h, 1)
Line 4, 5, 6 implement the check: if R0 != 0 then return 1 else return 0

Types

The function takes one argument arg1 = R0. Line 3 accesses a byte at offset 63h, the parameter is probably a struct:

struct1
...
    +0x063 field63_c  ; char

Function Prototype

The function prototype is:

BOOL field63c_isnotzero(struct1*);

Prologue and Epilogue

There is no prologue or epilogue.

Purpose and Pseudo-code

The function checks, if the member of type byte at offset 63h of a struct is zero. If it is zero, the function returns 0, if the member is non-zero, or if the struct is a null pointer, then 1 is returned:

BOOL field63c_isnotzero(struct1 *s)
{
    if s == NULL then
        return TRUE
    elsif s->field63c == 0 then
        return FALSE
    else
        return TRUE
}

C-Code

#include <stdlib.h>

bool field63c_isnotzero(struct1 *s)
{
	if (s && s->field63c)
		return true;
	else
		return false;
}

Mystery 3

Here is a simple function: [see below]
This is the function:
01: mystery3

02: 83 68 LDR R3, [R0,#8] 03: 0B 60 STR R3, [R1] 04: C3 68 LDR R3, [R0,#0xC] 05: 00 20 MOVS R0, #0 06: 4B 60 STR R3, [R1,#4] 07: 70 47 BX LR 08: ; End of function mystery3

ARM or Thumb

The code is clearly in Thumb state because all instructions are 16 bit.

Instruction Semantic

Only the more interesting instructions are discussed:

Line 2, LDR R3, [R0,#8]: sets ``R3 = Memu(R0+8, 4)
Line 3, STR R3, [R1], sets R1->field0_i = R3
Line 4, LDR R3, [R0,#0xC], sets R3 = Memu(R0+12, 4)
Line 5, STR R3, [R1, #4], sets R1->field4_i = R3

Types

The function takes two arguments. Both might be structs, the first parameter:

struct_arg1
..
x0x008 field08_i ; int
x0x00C field0c_i ; int</pre>

the second parameter

struct_arg2
x0x000 field00_i ; int, same type as struct_arg1.field08_i
x0x004 field04_i ; int, same type as struct_arg1.field0c_i

Function Prototype

The function prototype is:

BOOL copy_members(struct_arg1*, struct_arg2*);

Prologue and Epilogue

There is no prologue or epilogue. The function only touches function parameters ``R1`` to ``R3`` anyway.

Purpose and Pseudo-code

The function copies two members of a struct passed as the first parameter to a struct passed as the second parameter.

BOOL copymembers(struct_arg1 *s1, struct_arg2 *s2)
{
s1->field00_i = s2->field08_i
s1->field04_i = s2->field0c_i
return FALSE
}

C-Code

#include &lt;stdlib.h>

bool copymembers(struct_arg1 *s1, struct_arg2 *s2)
{
    s1->field00_i = s2->field08_i;
    s1->field04_i = s2->field0c_i;
    return false;
}

Mystery 4

Figure 2-10 shows another easy function.

This is the function:

01: mystery4
02: 08 B9          CBNZ R0, loc_100C3DA
03: 00 20          MOVS R0, #0
04: 70 47          BX LR
05: loc_100C3DA
06: 50 F8 08 0C    LDR.W R0, [R0,#–8]
07: 70 47          BX LR
08: ; End of function mystery4

ARM or Thumb

The code is clearly in Thumb state for the following reasons:

The instruction ``CBNZ`` only exists in Thumb state
Most instructions are 16 bit
The only 32bit instruction has the ``.W`` suffix

Instruction Semantic

Only the more interesting instructions are discussed:

Line 2, ``CBNZ R0, loc_100C3DA``: does ``if arg1 == 0 then goto loc_100C3DA else continue with line 3``
Line 3, ``MOVS R0, #0``, sets ``R0 = 0``
Line 4, and 7 ``BX LR``, returns
Line 6, ``LDR.W R0, [R0,#-8]``, sets ``int return_value = *(arg1-8)``

Types

The function take one argument, which is a pointer. At offset -8 to this pointer we find an 32 bit value. The function returns this value as R0, therefore

Function Prototype

The function prototype is:

INT32 retrieve_value_at_minus8(int*);

Prologue and Epilogue

There is no prologue or epilogue. The function only changes the function parameter in R0, it returns with BX LR (changing the state back to ARM).

Purpose and Pseudo-code

The function returns the 32 bit value at memory location arg1 -8. If arg1 is null, the function returns null. INT32 retrieve_value_at_minus8(int* something) { if not something then return NULL else return *(something - 8) }

C-Code

#include <stdlib.h>

int retrieve_value_at_minus8(int* something)
{
	if (!something)
		return 0;
	else
		return *(something - 8);
}

Archived Comments

Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.

sebas sujeen Jun 27, 2015 08:52:49 UTC

Hi, in mystery1, in line 44, there is a subs r2, r4, r6. r6(is_neg) is set to 1, if there is a minus sign in the start of the string. Following that is the sbc r3, r5, r6, #asr 31. Both instructions combine to do a 64-bit subtraction (if r2, r3 represent a temp var and r4, r5 represent the acc, then the resulting pseudocode is temp = acc - is_neg).
And then we have a cmp r2, #0x80000000/ sbcs r0, r3, #0 / bge loc. I am not sure about the reason we have sbcs r0, r3, #0. Do you have any idea why?

sebas sujeen Jun 27, 2015 08:59:52 UTC

I think I got it. let's say the input string is "-0". Then the acc will be 0 and the is_neg = 1. temp = acc - is_neg will result in temp be 0xFFFFFFFF (low 32 bit) and 0xFFFFFFFF (high 32 bit).

Now, we cmp r2, #0x80000000 (since r2 is 0xFFFFFFFF), the carry flag will be set. Also the overflow flag and negative flag are clear. Now if there is no sbcs r0, r3, #0, then the bge will succeed because the condition for bge to succeed is overflow flag == negative flag. but because of the sbcs r0, r3, #0, the negative flag is set therefore the bge would fail and we get the abs(-0) which is 0 and write the value to the pointer which is pointed to by r1.

sebas sujeen Jun 28, 2015 01:57:04 UTC

The pseudo code for mystery 2 should be
if (s == NULL || s->field63c)
return 1;
else
return 0;