Practical Reverse Engineering Solutions – Page 78 (Part III)

my go at mystery7, mystery8 and mystery 9 on pages 78ff

This blog post presents my solution to exercises 7 to 9 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

For the code in each exercise, do the following in order (whenever possible):
Determine whether it is in Thumb or ARM state.
Explain each instruction’s semantic. If the instruction is LDR/STR, explain the addressing mode as well.
Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
Recover the function prototype.
Identify the function prologue and epilogue.
Explain what the function does and then write pseudo-code for it.
Decompile the function back to C and give it a meaningful name.

Mystery 7

Figure 2-13 illustrates a common routine, but you may not have seen it
implemented this way.

This is the code from Figure 2-13 (note that the listing in the book has a typo in line 13, 00 2B CMB R3, #0 should of course read 00 2B CMP R3, #0):

mystery7
02 46            MOV R2, R0
08 B9            CBNZ R0, loc_100E1D8
00 20            MOVS R0, #0
70 47            BX LR
loc_100E1D8
90 F9 00 30      LDRSB.W R3, [R0]
02 E0            B loc_100E1E4
loc_100E1DE
01 32            ADDS R2, #1
92 F9 00 30      LDRSB.W R3, [R2]
loc_100E1E4
00 2B            CMP R3, #0
FA D1            BNE loc_100E1DE
10 1A            SUBS R0, R2, R0
6F F3 9F 70      BFC.W R0, #0x1E, #2
70 47            BX LR
; End of function mystery7

ARM or Thumb

The code is in Thumb state: The snippet uses 16bit and 32bit instructions and some instructions have the .W suffix.

Instruction Semantic

The only non common instruction is BFC.W R0, #0x1E, #2 which does a bit field clear. The instruction sets the two most significant bits to zero, so 0xFFFFFFFF would become 0x3FFFFFFF.

Types

The function only takes one argument arg1 = R0. The loop in lines 9 to 14 iterate over bytes of arg1 (LDRSB.W in line 11 accesses a single byte, and ADDS R2, #1 in line 10 increments the array index by one byte). Furthermore, the loop ends if in line 13 an array element is . This indicates that arg1 is a pointer to a null terminated string. The function returns an unsigned int.

Function Prototype

The function prototype is:

UNSIGNED INT mystery7(CHAR*);

Prologue and Epilogue

There is no prologue or epilogue. The function does not overwrite any registers except R0, R2 and R3. It returns with BX LR which switches back to ARM state.

Purpose and Pseudo-code

The function searches for the null byte in string arg1. Line 15 SUBS R0, R2, R0 computes the difference between the address of the null byte and the address of the start of the string. This corresponds to the length of the string. The function implements strlen. I don’t understand the purpose of setting the two most significant bits of the difference to zero. Those bits shouldn’t be set in the first place for any reasonable strings.

FUNCTION mystery7(str)
    len = 0
    WHILE str[len] != 0 DO
        len = len + 1
    ENDWHILE
    RETURN len
END

C-Code

unsigned int strlen(const char *s) {
    unsigned int len = 0;
    while( s[len] != '\0' )
        len++;
    return len;    
}

Mystery 8

In Figure 2-14, byteArray is a 256-character array whose content is byteArray[] = {0, 1, …, 0xff}.

This is the code from Figure 2-14:

mystery8
2D E9 78 48      PUSH.W {R3–R6,R11,LR}
0D F2 10 0B      ADDW R11, SP, #0x10
0C 4E            LDR R6, =byteArray
09 E0            B loc_100E34C
loc_100E338
05 78            LDRB R5, [R0]
01 3A            SUBS R2, #1
4D B1            CBZ R5, loc_100E352
0B 78            LDRB R3, [R1]
9C 5D            LDRB R4, [R3,R6]
AB 5D            LDRB R3, [R5,R6]
A3 42            CMP R3, R4
04 D1            BNE loc_100E352
01 30            ADDS R0, #1
01 31            ADDS R1, #1
loc_100E34C
00 2A            CMP R2, #0
F3 DC            BGT loc_100E338
01 3A            SUBS R2, #1
loc_100E352
00 2A            CMP R2, #0
01 DA            BGE loc_100E35A
00 20            MOVS R0, #0
04 E0            B locret_100E364
loc_100E35A
0B 78            LDRB R3, [R1]
9A 5D            LDRB R2, [R3,R6]
03 78            LDRB R3, [R0]
9B 5D            LDRB R3, [R3,R6]
98 1A            SUBS R0, R3, R2
locret_100E364
BD E8 78 88      POP.W {R3–R6,R11,PC}
; End of function mystery8

ARM or Thumb

The code is in Thumb state:

The snippet uses 16bit and 32bit instructions.
Some instructions have the .W suffix.
The CBZ instruction is only available in Thumb state.
The code uses PUSH.W and POP.W as function prologue and epilogue

Instruction Semantic

Nothing special, except maybe LDR R6, =byteArray which is a pseudo-instruction that sets R6 to point to the array {0,1,2,…,255}

Types

The function takes three arguments. arg1 = R0 and arg2 = R1 are both used in an array fashion: Lines 6 to 19 iterate over bytes of those to arrays. The code also compares elements of arg1 to arg2, so the two parameters are probably of the same type. In line 9 there’s a check if the elements in arg1 are , so arg1 and arg2 are null terminated strings.

The third parameter arg3 acts as a limit counter (see lines 8 and 18/19). So the type is probably an unsigned int (or any other unsigned integer type).

Function Prototype

The function prototype is:

UNSIGNED INT mystery8(CHAR*, CHAR*, UNSIGNED INT);

Prologue and Epilogue

Lines 2 and 33 preserve registers R3-R6 and R11. Apart from the three function arguments R0 to R2 the functions doesn’t write any other registers. The PUSH and POP also save and jump to the return address respectively.

Purpose and Pseudo-code

If found it easiest to start with the loop in line 6 to line 19. Here’s an almost one to one translation to pseudo-code:

FUNCTION mystery8(string1, string2, limit)
    byteArray = {0, 1, 2, ..., 255}
    index = 0 
    DO
        IF limit <= 0 THEN
            // break to line 20

        R5 = string1[index]
        limit = limit -1
        IF R5 == 0 THEN
            // break to line 21

        R4 = byteArray[string2[index]]
        R3 = byteArray[string1[index]]
        index = index + 1
    WHILE R3 != R4
    index = index - 1

    // line 21

Line 20 just decrements limit, we can eliminate the instruction by moving limit = limit-1 up and changing to a strict check limit < 0.

Starting with line 21 the snippet checks if the limit is not yet zero (meaning the code did take the second BREAK). If this is the case, the code returns a difference based on first array elements where there was a difference.

If the limit is zero, then the first BREAK must have been taken and the code returns .

FUNCTION mystery8(string1, string2, limit)
    byteArray = {0, 1, 2, ..., 255}
    index = 0 
    DO
        limit = limit -1
        IF limit < 0 THEN
            BREAK

        R5 = string1[index]
        IF R5 == 0 THEN
            BREAK 

        R4 = byteArray[string2[index]]
        R3 = byteArray[string1[index]]
        index = index + 1
    WHILE R3 != R4

    // line 21
    index = index - 1
    IF limit >= 0 THEN
        R2 = byteArray[string2[index]]
        R3 = byteArray[string1[index]]
        RETURN R3 - R2
    ELSE
        RETURN 0

The byteArray is {0,1,…,255}, so for all bytes n we have byteArray[n] = n. We can further simplify the code by moving the last IF/ELSE construct inside the loop:

FUNCTION mystery8(string1, string2, limit)
    FOR index = 0 to limit - 1 DO 
        IF string1[index] == 0 OR string1[index] != string2[index] THEN
            RETURN string1[index] - string2[index]
        ENDIF
    ENDFOR
    RETURN 0
END

From this it should be obvious that the snippet implements strncmp, which compares two strings up to limit characters. If the strings are equal up to limit, the snippet returns 0. Otherwise it returns a negative number if the second string comes lexicographically after the first, and a positive number vice-versa.

Example:

string 1	string 2	limit	result
“string a”	“string b”	1000	-1
“string a”	“string b”	5
“string a”	“String b”	1000	32
“word”	“wording”	5	-105

Mystery 9

What does the function shown in Figure 2-15 do?

The code is a stripped down version of mystery8:

mystery9
2D E9 30 48      PUSH.W {R4,R5,R11,LR}
0D F2 08 0B      ADDW R11, SP, #8
09 4D            LDR R5, =byteArray
06 E0            B loc_100E312
loc_100E304
0B 78            LDRB R3, [R1]
5A 5D            LDRB R2, [R3,R5]
63 5D            LDRB R3, [R4,R5]
93 42            CMP R3, R2
04 D1            BNE loc_100E318
01 30            ADDS R0, #1
01 31            ADDS R1, #1
loc_100E312
04 78            LDRB R4, [R0]
00 2C            CMP R4, #0
F5 D1            BNE loc_100E304
loc_100E318
0B 78            LDRB R3, [R1]
5A 5D            LDRB R2, [R3,R5]
03 78            LDRB R3, [R0]
5B 5D            LDRB R3, [R3,R5]
98 1A            SUBS R0, R3, R2
BD E8 30 88      POP.W {R4,R5,R11,PC}
; End of function mystery9

ARM or Thumb

The code is in Thumb state:

The snippet uses 16bit and 32bit instructions.
Some instructions have the .W suffix.
The code uses PUSH.W and POP.W as function prologue and epilogue

Instruction Semantic

See mystery 8.

Types

The function takes two arguments. arg1 = R0 and arg2 = R1. They are used in the same manner as in mystery 8 so arg1 and arg2 are null terminated strings.

Function Prototype

The function prototype is:

UNSIGNED INT mystery9(CHAR*, CHAR*);

Prologue and Epilogue

The same as in mystery 8 applies, except the function doesn’t need R6 and therefore doesn’t save and restore it.

Purpose and Pseudo-code

We can get the pseudo-code by removing all missing parts from mystery8:

FUNCTION mystery8(string1, string2)
    index = 0
    WHILE TRUE
        IF string1[index] == 0 OR string1[index] != string2[index] THEN
            RETURN string1[index] - string2[index]
        ENDIF
        index = index + 1
    ENDWHILE
    RETURN 0
END

The code implements strcmp, which is the unlimited version of strncmp from mystery8. So the comparison always checks the entire strings up to the length of the shorter one:

Example:

string 1	string 2	result
“string a”	“string b”	-1
“string a”	“String b”	32
“word”	“wording”	-105