
Practical Reverse Engineering Solutions – Page 78 (Part IV)

My go at mystery10 and mystery11 on pages 78ff

This blog post presents my solution to exercises 10 and 11 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

Problem Statement

For the code in each exercise, do the following in order (whenever possible):
1. Determine whether it is in Thumb or ARM state.

2. Explain each instruction’s semantic. If the instruction is LDR/STR, explain the addressing mode as well.
3. Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.

4. Recover the function prototype.

5. Identify the function prologue and epilogue.

6. Explain what the function does and then write pseudo-code for it.

7. Decompile the function back to C and give it a meaningful name.

Mystery 10

Exercise 10

Figure 2-16 is a function from Windows RT. Read MSDN if needed. Ignore the security PUSH/POP cookie routines.

This is mystery10 from Figure 2-16:

mystery10
2D E9 70 48 PUSH.W {R4–R6,R11,LR}
0D F2 0C 0B ADDW R11, SP, #0xC
37 F0 CC F9 BL __security_push_cookie
84 B0 SUB SP, SP, #0x10
0D 46 MOV R5, R1
00 24 MOVS R4, #0
10 2D CMP R5, #0x10
16 46 MOV R6, R2
0C D3 BCC loc_1010786
1A 4B LDR R3, =__imp_GetSystemTime
68 46 MOV R0, SP
1B 68 LDR R3, [R3]
98 47 BLX R3
00 9B LDR R3, [SP,#0x1C+var_1C]
10 24 MOVS R4, #0x10
33 60 STR R3, [R6]
01 9B LDR R3, [SP,#0x1C+var_18]
73 60 STR R3, [R6,#4]
02 9B LDR R3, [SP,#0x1C+var_14]
B3 60 STR R3, [R6,#8]
03 9B LDR R3, [SP,#0x1C+var_10]
F3 60 STR R3, [R6,#0xC]
loc_1010786
2B 1B SUBS R3, R5, R4
04 2B CMP R3, #4
04 D3 BCC loc_1010796
11 4B LDR R3, =__imp_GetCurrentProcessId
1B 68 LDR R3, [R3]
98 47 BLX R3
30 51 STR R0, [R6,R4]
04 34 ADDS R4, #4
loc_1010796
2B 1B SUBS R3, R5, R4
04 2B CMP R3, #4
04 D3 BCC loc_10107A6
0C 4B LDR R3, =__imp_GetTickCount
1B 68 LDR R3, [R3]
98 47 BLX R3
30 51 STR R0, [R6,R4]
04 34 ADDS R4, #4
loc_10107A6
2B 1B SUBS R3, R5, R4
08 2B CMP R3, #8
09 D3 BCC loc_10107C0
07 4B LDR R3, =__imp_QueryPerformanceCounter
68 46 MOV R0, SP
1B 68 LDR R3, [R3]
98 47 BLX R3
00 9B LDR R3, [SP,#0x1C+var_1C]
32 19 ADDS R2, R6, R4
33 51 STR R3, [R6,R4]
01 9B LDR R3, [SP,#0x1C+var_18]
08 34 ADDS R4, #8
53 60 STR R3, [R2,#4]
loc_10107C0
20 46 MOV R0, R4
04 B0 ADD SP, SP, #0x10
37 F0 A4 F9 BL __security_pop_cookie
BD E8 70 88 POP.W {R4–R6,R11,PC}
; End of function mystery10

ARM or Thumb

The code is in Thumb state:

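  • The code uses the PUSH.W/POP.W prologue and epilogue pattern.
  • The encodings mix 16-bit (two-byte) and 32-bit (four-byte) instructions, which only happens in Thumb state; ARM instructions are always 32 bits wide.
  • Some of the 32-bit instructions carry the .W (wide) suffix.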

Instruction Semantic

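  • The code uses BL and BLX to call subroutines. BL branches to a fixed PC-relative target and stays in the current state. BLX R3 branches to the address in R3 and selects the state from bit 0 of that address (1 = Thumb, 0 = ARM). Windows RT code runs exclusively in Thumb-2, so the imported functions are Thumb code and these calls do not switch to ARM.
  • BCC branches if the carry flag is clear; after a CMP this means “unsigned lower”.
  • LDR R3, =__imp_GetSystemTime is a PC-relative load from the literal pool that fetches the address of the import table entry; LDR R3, [R3] then reads the function pointer stored in that entry.
  • In instructions like LDR R3, [SP,#0x1C+var_1C], the name var_1C comes from the disassembler and stands for -0x1C, so the instruction boils down to LDR R3, [SP,#0]. Likewise, var_18, var_14 and var_10 yield the immediate offsets 4, 8 and 0xC.
  • STR R0, [R6,R4] uses register offset addressing: the value is stored at address R6 + R4.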

Types

The first function parameter in R0 is never read; MOV R0, SP overwrites it before any use. The second function parameter in R1 (copied to R5) determines which calls are executed. It is an unsigned integer type, as the BCC (unsigned lower) branches after each comparison show. The third parameter in R2 (copied to R6) points to a structure that receives the results of the four API calls:

typedef struct _STRUCT1 {
  WORD wYear;
  WORD wMonth;
  WORD wDayOfWeek;
  WORD wDay;
  WORD wHour;
  WORD wMinute;
  WORD wSecond;
  WORD wMilliseconds;
  DWORD dwProcessId;
  DWORD dwTickCount;
  LARGE_INTEGER liPerformanceCounter;
} STRUCT1;

The function returns the number of bytes placed in the above structure.

Function Prototype

The function prototype is:

unsigned int mystery10(UNKNOWN, unsigned int, STRUCT1 *);

Prologue and Epilogue

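The prologue consists of PUSH.W {R4–R6,R11,LR}, ADDW R11, SP, #0xC, the cookie call and SUB SP, SP, #0x10. The epilogue mirrors it with ADD SP, SP, #0x10, the cookie check and POP.W {R4–R6,R11,PC}. PUSH.W saves the callee-saved registers the function modifies (R4–R6), the frame pointer R11 and the return address in LR. ADDW points R11 at the saved R11 so that it serves as the frame pointer, and SUB reserves 16 bytes of local storage. R0–R3 need no saving because they are scratch (caller-saved) registers under the ARM procedure call standard. The function returns by popping the saved LR directly into PC.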

Purpose and Pseudo-code

The function calls up to four different API functions and stores their results in the structure passed as the third function parameter. Depending on the value of the second function parameter (R1), each API call is either made or skipped:

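Value of R1    | GetSystemTime | GetCurrentProcessId | GetTickCount | QueryPerformanceCounter
R1 ≥ 32        | executed      | executed            | executed     | executed
32 > R1 ≥ 24   | executed      | executed            | executed     | -
24 > R1 ≥ 20   | executed      | executed            | -            | -
20 > R1 ≥ 16   | executed      | -                   | -            | -
16 > R1 ≥ 8    | -             | executed            | executed     | -
8 > R1 ≥ 4     | -             | executed            | -            | -
4 > R1 ≥ 0     | -             | -                   | -            | -

The first comparison, CMP R5, #0x10, tests R1 against 16 bytes, the size of SYSTEMTIME; each later check compares the remaining space R5 − R4 with the size of the next value. The value of R1 is therefore presumably meant to be at least 16, so that GetSystemTime runs and the following values land in their STRUCT1 fields; with 32 or more, all four API calls are made. If R1 is less than 16, GetSystemTime is skipped and the other values are written starting at offset 0, so they no longer line up with the fields of STRUCT1. The four API calls are well documented on MSDN.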
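To double-check the table, the offset bookkeeping can be replayed without calling any API. The following C sketch is my own model, not code recovered from the binary: it mirrors the CMP/SUBS/BCC sequence and records which of the four calls would run for a given R1. The function name calls_for_size and the bit flags are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical bit flags, one per API call made by mystery10. */
enum {
    CALL_GET_SYSTEM_TIME = 1u << 0,
    CALL_GET_PROCESS_ID  = 1u << 1,
    CALL_GET_TICK_COUNT  = 1u << 2,
    CALL_QUERY_PERF_CTR  = 1u << 3
};

/* Replays the size checks of mystery10 for a buffer of r1 bytes.
   Returns the set of calls that would execute; *written receives the
   final R4, i.e. the byte count the function returns.
   r1 - r4 never wraps: R4 only grows when at least that much room is left. */
unsigned calls_for_size(uint32_t r1, uint32_t *written)
{
    uint32_t r4 = 0;                 /* MOVS R4, #0                     */
    unsigned calls = 0;

    if (r1 >= 0x10) {                /* CMP R5, #0x10 ; BCC             */
        calls |= CALL_GET_SYSTEM_TIME;
        r4 = 0x10;                   /* MOVS R4, #0x10                  */
    }
    if (r1 - r4 >= 4) {              /* SUBS R3, R5, R4 ; CMP R3, #4    */
        calls |= CALL_GET_PROCESS_ID;
        r4 += 4;                     /* ADDS R4, #4                     */
    }
    if (r1 - r4 >= 4) {              /* SUBS R3, R5, R4 ; CMP R3, #4    */
        calls |= CALL_GET_TICK_COUNT;
        r4 += 4;
    }
    if (r1 - r4 >= 8) {              /* SUBS R3, R5, R4 ; CMP R3, #8    */
        calls |= CALL_QUERY_PERF_CTR;
        r4 += 8;                     /* ADDS R4, #8                     */
    }
    *written = r4;                   /* MOV R0, R4                      */
    return calls;
}
```

For example, with R1 = 26 the model reports only the first three calls and 24 written bytes: after SYSTEMTIME, the process ID and the tick count only 2 bytes are left, too few for the 8-byte performance counter.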

GetSystemTime

Retrieves the current system date and time in UTC and stores it in the structure passed as its only parameter, see MSDN. The structure is

typedef struct _SYSTEMTIME {
  WORD wYear;
  WORD wMonth;
  WORD wDayOfWeek;
  WORD wDay;
  WORD wHour;
  WORD wMinute;
  WORD wSecond;
  WORD wMilliseconds;
} SYSTEMTIME, *PSYSTEMTIME;

Our code passes the stack pointer (MOV R0, SP) to GetSystemTime, so the structure is filled in a local buffer on the stack, not in arg3. The SUB SP, SP, #0x10 in the prologue reserves 16 bytes, exactly enough for the eight two-byte members of SYSTEMTIME. After the call, the code loads the words at offsets 0, 4, 8 and 12. Each LDR reads 32 bits, i.e., two members of SYSTEMTIME, so the four LDR/STR pairs copy the entire structure to arg3.
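The copy can be reproduced in portable C. In the sketch below, SYSTEMTIME_LIKE is a stand-in I defined with the same eight 16-bit members so that the code compiles outside of Windows, and copy_by_words is a hypothetical name. The loop performs the same four 32-bit loads and stores as the LDR/STR pairs.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for SYSTEMTIME: eight 16-bit members, 16 bytes in total. */
typedef struct {
    uint16_t wYear, wMonth, wDayOfWeek, wDay;
    uint16_t wHour, wMinute, wSecond, wMilliseconds;
} SYSTEMTIME_LIKE;

/* Copies the structure the way mystery10 does: four 32-bit loads from the
   stack copy (LDR R3, [SP,#0..#0xC]) and four 32-bit stores into the
   output (STR R3, [R6,#0..#0xC]). Each word carries two WORD members. */
void copy_by_words(void *dst, const SYSTEMTIME_LIKE *src)
{
    for (size_t off = 0; off < sizeof(*src); off += 4) {
        uint32_t word;
        memcpy(&word, (const uint8_t *)src + off, 4);   /* LDR */
        memcpy((uint8_t *)dst + off, &word, 4);         /* STR */
    }
}
```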

GetCurrentProcessId

This API call takes no parameters and returns a DWORD with the process ID, see MSDN. The STR R0, [R6,R4] after the call stores the return value at arg3 + R4. That is arg3->dwProcessId provided that arg2 is at least 20; for arg2 below 16 the value lands at offset 0 instead.

GetTickCount

Another easy function with no parameters that returns “the number of milliseconds that have elapsed since the system was started”, see MSDN. The return value is stored in arg3->dwTickCount.

QueryPerformanceCounter

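The function takes “a pointer to a variable that receives the current performance-counter value, in counts” as its only parameter. This parameter points to a LARGE_INTEGER, which is 8 bytes wide. As with GetSystemTime, the code passes a local buffer on the stack (MOV R0, SP) and afterwards copies the 8 bytes into arg3->liPerformanceCounter with two LDR/STR pairs: STR R3, [R6,R4] stores the low word and STR R3, [R2,#4] (with R2 = R6 + R4) stores the high word.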
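The two STR instructions move the 64-bit counter as a low and a high word. The following C sketch (the helper name store_counter is hypothetical) shows that, on a little-endian target such as Windows RT on ARM, storing the low word at the current offset and the high word four bytes later produces the same bytes as storing the whole 64-bit value at once.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the copy after QueryPerformanceCounter: the 64-bit counter is
   written as two 32-bit words, the low word to buf + offset and the high
   word to buf + offset + 4. On a little-endian target this matches the
   byte layout of a single 64-bit store. */
void store_counter(uint8_t *buf, uint32_t offset, uint64_t counter)
{
    uint32_t low  = (uint32_t)counter;           /* LDR R3, [SP,#0] */
    uint32_t high = (uint32_t)(counter >> 32);   /* LDR R3, [SP,#4] */
    memcpy(buf + offset, &low, 4);               /* STR R3, [R6,R4] */
    memcpy(buf + offset + 4, &high, 4);          /* STR R3, [R2,#4] */
}
```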

C-Code

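#include <windows.h>
#include <string.h>

/* arg1 is never read. nr_bytes is the size of the buffer behind result.
   Each value is appended at the current offset, so the STRUCT1 fields only
   line up when nr_bytes >= sizeof(SYSTEMTIME). The differences
   nr_bytes - nr_of_copied_bytes never wrap, because the offset only grows
   when enough room is left. */
unsigned int system_info(void *arg1, unsigned int nr_bytes, STRUCT1 *result)
{
    BYTE *out = (BYTE *)result;
    unsigned int nr_of_copied_bytes = 0;
    (void)arg1;

    if ( nr_bytes >= sizeof(SYSTEMTIME) ) {              /* CMP R5, #0x10 */
        SYSTEMTIME SystemTime;
        GetSystemTime(&SystemTime);
        memcpy(out, &SystemTime, sizeof(SYSTEMTIME));
        nr_of_copied_bytes += sizeof(SYSTEMTIME);
    }
    if ( nr_bytes - nr_of_copied_bytes >= sizeof(DWORD) ) {
        DWORD dwProcessId = GetCurrentProcessId();
        memcpy(out + nr_of_copied_bytes, &dwProcessId, sizeof(DWORD));
        nr_of_copied_bytes += sizeof(DWORD);
    }
    if ( nr_bytes - nr_of_copied_bytes >= sizeof(DWORD) ) {
        DWORD dwTickCount = GetTickCount();
        memcpy(out + nr_of_copied_bytes, &dwTickCount, sizeof(DWORD));
        nr_of_copied_bytes += sizeof(DWORD);
    }
    if ( nr_bytes - nr_of_copied_bytes >= sizeof(LARGE_INTEGER) ) {
        LARGE_INTEGER perfCounter;
        QueryPerformanceCounter(&perfCounter);
        memcpy(out + nr_of_copied_bytes, &perfCounter, sizeof(LARGE_INTEGER));
        nr_of_copied_bytes += sizeof(LARGE_INTEGER);
    }
    return nr_of_copied_bytes;
}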

Mystery 11

exercise 11

In Figure 2-17, sub_101651C takes three arguments and returns nothing. If you complete this exercise, you should pat yourself on the back.

I wasn’t able to solve this exercise, but I’m posting my preliminary results regardless. Maybe they help someone else reverse the code.

This is mystery11 from Figure 2-17:

mystery11
2D E9 F8 4F    PUSH.W {R3–R11,LR}
0D F2 20 0B    ADDW R11, SP, #0x20
B0 F9 5A 30    LDRSH.W R3, [R0,#0x5A]
07 46          MOV R7, R0
90 46          MOV R8, R2
00 EB 83 03    ADD.W R3, R0, R3,LSL#2
D3 F8 84 A0    LDR.W R10, [R3,#0x84]
7B 8F          LDRH R3, [R7,#0x3A]
89 46          MOV R9, R1
CB B9          CBNZ R3, loc_1018602
B0 F9 5A 40    LDRSH.W R4, [R0,#0x5A]
17 F1 20 02    ADDS.W R2, R7, #0x20
00 EB 44 03    ADD.W R3, R0, R4,LSL#1
B3 F8 5C 50    LDRH.W R5, [R3,#0x5C]
00 EB 84 03    ADD.W R3, R0, R4,LSL#2
D3 F8 84 00    LDR.W R0, [R3,#0x84]
83 89          LDRH R3, [R0,#0xC]
06 6C          LDR R6, [R0,#0x40]
03 EB 45 03    ADD.W R3, R3, R5,LSL#1
9B 19          ADDS R3, R3, R6
1C 78          LDRB R4, [R3]
5B 78          LDRB R3, [R3,#1]
43 EA 04 24    ORR.W R4, R3, R4,LSL#8
43 8A          LDRH R3, [R0,#0x12]
23 40          ANDS R3, R4
99 19          ADDS R1, R3, R6
FD F7 8D FF    BL sub_101651C
loc_1018602
BA 8E          LDRH R2, [R7,#0x34]
BB 6A          LDR R3, [R7,#0x28]
D0 18          ADDS R0, R2, R3
9A F8 02 30    LDRB.W R3, [R10,#2]
0B B1          CBZ R3, loc_1018612
00 22          MOVS R2, #0
00 E0          B loc_1018614
loc_1018612
3A 6A          LDR R2, [R7,#0x20]
loc_1018614
FB 8E          LDRH R3, [R7,#0x36]
B8 F1 00 0F    CMP.W R8, #0
01 D0          BEQ loc_1018620
80 18          ADDS R0, R0, R2
9B 1A          SUBS R3, R3, R2
loc_1018620
C9 F8 00 30    STR.W R3, [R9]
BD E8 F8 8F    POP.W {R3–R11,PC}
; End of function mystery11

ARM or Thumb

The code is in Thumb state:

  • The function uses PUSH.W and POP.W as function prologue and epilogue.
  • There are both 16-bit and 32-bit instructions.
  • Some of the 32-bit instructions have the .W suffix.
  • The CBZ and CBNZ instructions are only available in Thumb state.

Instruction Semantic

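LDRB/LDRB.W load a byte and zero-extend it, LDRH/LDRH.W load a halfword and zero-extend it (unsigned short), LDRSH.W loads a halfword and sign-extends it (signed short), and LDR/LDR.W load a 32-bit word. All loads and the final STR.W use immediate offset addressing ([Rn,#imm]). The array accesses first build the address with a shifted register operand: ADD.W R3, R0, R3,LSL#2 computes R0 + R3*4, so the following LDR.W R10, [R3,#0x84] reads element R3 of an array of 4-byte values at offset 0x84. CBZ/CBNZ compare a register with zero and branch if it is zero/nonzero, without changing the flags.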

Types

The first function parameter arg1 in R0 is a pointer to a complicated structure. Let this structure be struct1. One can infer a couple of members of this structure from the different LDR instructions. I came up with the following picture, but again, since I couldn’t figure out what the function does the picture could be completely off:

[Figure: mystery11_structures.png, my sketch of the structures accessed by mystery11]

The second parameter arg2 is a pointer to a 32-bit integer: the function writes a full word to it with STR.W R3, [R9]. The third parameter arg3 is only compared to zero (CMP.W R8, #0) and could be almost anything, e.g., a pointer, an integer or a boolean. The return value of mystery11 is an integer.
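Since the picture is only available as an image, here is a text version of the layout that the loads imply. This is my own sketch built from the listing and may differ from the figure: only the offsets and access widths are taken from the instructions. The field names follow the fieldXX_ convention of the pseudo-code, STRUCT4 is the structure that pS4 points to, and the unknownXX arrays stand for bytes the function never touches. Pointers are declared as uint32_t on the assumption of 32-bit pointers on Windows RT, so the offsets also hold when compiled on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/* Structure behind pS4 = arg1->field84_pa[index]. Hypothetical names. */
typedef struct {
    uint8_t  unknown00[0x02];
    uint8_t  field02_c;       /* LDRB.W R3, [R10,#2], tested with CBZ       */
    uint8_t  unknown03[0x09];
    uint16_t field0C_s;       /* LDRH R3, [R0,#0xC], added to byte offset   */
    uint8_t  unknown0E[0x04];
    uint16_t field12_s;       /* LDRH R3, [R0,#0x12], used as an AND mask   */
    uint8_t  unknown14[0x2C];
    uint32_t field40_p;       /* LDR R6, [R0,#0x40], 32-bit pointer to bytes */
} STRUCT4;

/* Structure behind arg1. Hypothetical names. */
typedef struct {
    uint8_t  unknown00[0x20];
    uint32_t field20_i;       /* LDR R2, [R7,#0x20]; its address goes to sub_101651C */
    uint8_t  unknown24[0x04];
    uint32_t field28_i;       /* LDR R3, [R7,#0x28]                     */
    uint8_t  unknown2C[0x08];
    uint16_t field34_s;       /* LDRH R2, [R7,#0x34]                    */
    uint16_t field36_s;       /* LDRH R3, [R7,#0x36]                    */
    uint8_t  unknown38[0x02];
    uint16_t field3A_s;       /* LDRH R3, [R7,#0x3A], tested with CBNZ  */
    uint8_t  unknown3C[0x1E];
    int16_t  field5A_s;       /* LDRSH.W, signed index into both arrays */
    uint16_t field5C_sa[20];  /* LDRH.W [R3,#0x5C], R3 = arg1 + 2*index;
                                 20 entries fit before 0x84, real length unknown */
    uint32_t field84_pa[];    /* LDR.W [R3,#0x84], R3 = arg1 + 4*index:
                                 32-bit pointers to STRUCT4, length unknown */
} STRUCT1;
```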

Function Prototype

The function prototype might look like this:

int mystery11(struct1*, int*, unknown*) 

Prologue and Epilogue

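The prologue is PUSH.W {R3–R11,LR} followed by ADDW R11, SP, #0x20, which points R11 at the saved R11 and makes it the frame pointer; the epilogue is POP.W {R3–R11,PC}. The function saves R4 to R11 because it uses them, and it returns by popping the saved LR directly into PC. R3 is a scratch register that does not need preserving; it is most likely included only to keep the stack 8-byte aligned (ten registers occupy 40 bytes, nine would occupy 36).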

Purpose and Pseudo-code

I have no clue what the function does. Most lines just access different members of the structure in arg1. The following three instructions are interesting:

1C 78          LDRB R4, [R3]
5B 78          LDRB R3, [R3,#1]
43 EA 04 24    ORR.W R4, R3, R4,LSL#8

They load two bytes from the memory location in R3, shift the first byte left by 8 (i.e., multiply it by 256) and OR in the second byte. So this snippet loads a big-endian 16-bit short. This could indicate that the function operates on an external data format with big-endian shorts, like TCP/IP data.
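In C, this pattern is the classic byte-wise big-endian load. The helper name load_be16 is hypothetical; the body corresponds to the three instructions above.

```c
#include <stdint.h>

/* C equivalent of
     LDRB  R4, [R3]          ; first byte  -> high half
     LDRB  R3, [R3,#1]       ; second byte -> low half
     ORR.W R4, R3, R4,LSL#8
   Because the two bytes are read separately, the result does not depend on
   the host's byte order or on the alignment of p. */
uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes FF 01 the result is 0xFF01 = 255 × 256 + 1.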

My draft of the pseudo-code is:

INT mystery11(STRUCT1 *arg1, INT *arg2, UNKNOWN *arg3) 

    struct2* pS2 = (BYTE*)arg1 + 2*(arg1->field5A_s)
    struct3* pS3 = (BYTE*)arg1 + 4*(arg1->field5A_s)
    struct4* pS4 = pS3->field84_p

    IF arg1->field3A_s == 0 THEN
        int index = pS4->field0C_s + 2*pS2->field5C_s
        unsigned short bigEndian = *(unsigned short*)(pS4->field40_p + index)
        unsigned short val = CONVERT_BIG_ENDIAN_SHORT(bigEndian)
        int index2 = pS4->field12_s & val
        sub_101651C(pS4, pS4->field40_p + index2, &arg1->field20_i)
    ENDIF
    ENDIF

    int offset;
    IF pS4->field02_c == 0 THEN
        offset = arg1->field20_i;
    ELSE
        offset = 0;
    ENDIF


    int return_value = arg1->field28_i + arg1->field34_s
    int new_value = arg1->field36_s

    IF arg3 != 0 THEN
        return_value = return_value + offset
        new_value = new_value - offset
    ENDIF

    *arg2 = new_value
    RETURN return_value
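The part from loc_1018602 to the end is fully determined by the listing, so it can be written as real C. The sketch below is a hypothetical helper, mystery11_tail, that takes the structure fields as plain values instead of reading them through arg1 and pS4; treating field28 as a 32-bit integer rather than a pointer is my assumption. It follows the instructions one by one, including the 32-bit subtraction whose result ends up in *arg2.

```c
#include <stdint.h>

/* Transliteration of mystery11 from loc_1018602 to the end. */
uint32_t mystery11_tail(uint32_t field28_i, uint16_t field34_s,
                        uint16_t field36_s, uint32_t field20_i,
                        uint8_t pS4_field02_c, uint32_t arg3,
                        uint32_t *arg2)
{
    uint32_t return_value = field34_s + field28_i;            /* ADDS R0, R2, R3 */
    uint32_t offset = (pS4_field02_c == 0) ? field20_i : 0;   /* CBZ / MOVS R2, #0 */
    uint32_t new_value = field36_s;                           /* LDRH R3, [R7,#0x36] */

    if (arg3 != 0) {                                          /* CMP.W R8, #0 ; BEQ */
        return_value += offset;                               /* ADDS R0, R0, R2 */
        new_value -= offset;                                  /* SUBS R3, R3, R2 */
    }
    *arg2 = new_value;                                        /* STR.W R3, [R9] */
    return return_value;
}
```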