Practical Reverse Engineering Solutions – Page 35 (Part I)

My go at exercises 1 to 4 on page 35

Table of Contents

Exercise 1
Exercise 2
Exercise 3
Exercise 4
strlen
strchr
memcpy
memset
strcmp
strset

This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub.

The walk-through in the book has a few minor typos:

page 31, last listing: 0x80047400h won’t work, needs to be 0x80047400 (no trailing h).
page 33, middle: Line 42 tests the return value of Process32Next should read Line 42 tests the return value of Process32First.
page 34, while-listing: “explorer”.exe" should read “explorer.exe”.
page 34, top: continue execution at 37 should read continue execution at 73.
page 34, top: is also a jump target in line 43 should read is also a jump target in line 51.

Furthermore, there is a check for th32ParentProcessID == th32ProcessID not mentioned in the book.

Exercise 1

Repeat the walk-through by yourself. Draw the stack layout, including parameters and local variables.

The function uses the STDCALL convention, so the caller pushes all three parameters onto the stack, from right to left, before the call.

The first few lines are:

push ebp
mov ebp, esp
sub esp, 130h
push edi
sidt fword ptr [ebp-8]
mov eax, [ebp-6]
cmp eax, 8003F400h
jbe short loc_10001C88 (line 18)
cmp eax, 80047400h
jnb short loc_10001C88 (line 18)

This contains the function prologue in lines 3 and 4, the reservation of 0x130 bytes for local variables in line 5, and pushing edi on the stack.
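The sidt check in the listing above can be modelled in portable C. This is only a sketch: idtr_model, parse_idtr and idt_base_in_range are names I made up, and the 6-byte buffer stands in for what sidt writes to [ebp-8] (a 16-bit limit followed by a 32-bit base, which is why the code reads the base from [ebp-6]).

```c
#include <stdint.h>

/* Hypothetical model of the six bytes that sidt stores: limit first, then base. */
typedef struct {
    uint16_t limit;
    uint32_t base;
} idtr_model;

/* Decode a raw little-endian 6-byte IDTR image, as it would sit at [ebp-8]. */
idtr_model parse_idtr(const uint8_t raw[6])
{
    idtr_model r;
    r.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    r.base  = (uint32_t)raw[2] | ((uint32_t)raw[3] << 8) |
              ((uint32_t)raw[4] << 16) | ((uint32_t)raw[5] << 24);
    return r;
}

/* Mirrors the two compares: jbe skips the early exit when base <= 0x8003F400,
   jnb skips it when base >= 0x80047400. Returns 1 only for bases strictly
   between the two constants. */
int idt_base_in_range(uint32_t base)
{
    return base > 0x8003F400u && base < 0x80047400u;
}
```

Both boundary values themselves fall outside the range, because the jumps use "below or equal" and "not below".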

I assume that one of the two jumps is taken. At loc_10001C88 we find the following assembly code:

loc_10001C88:
xor eax, eax
mov ecx, 49h
lea edi, [ebp-12Ch]
mov dword ptr [ebp-130h], 0
push eax
push 2
rep stosd
call CreateToolhelp32Snapshot
mov edi, eax
cmp edi, 0FFFFFFFFh
jnz short loc_10001CB9 (line 35)

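The first few lines initialize tagPROCESSENTRY32. The structure has 296 bytes (0x128), 260 of which are for the szExeFile member. On the stack, the structure starts at ebp-130h with dwSize; the members used later are th32ProcessID at ebp-128h, th32ParentProcessID at ebp-118h and szExeFile at ebp-10Ch.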
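The offsets can be double-checked with a small C sketch. It assumes the 32-bit, non-Unicode layout of PROCESSENTRY32, where every scalar member is 4 bytes wide; procentry32_model and ebp_distance are hypothetical names, not part of the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 32-bit ANSI PROCESSENTRY32 layout. */
typedef struct {
    uint32_t dwSize;
    uint32_t cntUsage;
    uint32_t th32ProcessID;
    uint32_t th32DefaultHeapID;   /* ULONG_PTR, assumed 4 bytes on x86 */
    uint32_t th32ModuleID;
    uint32_t cntThreads;
    uint32_t th32ParentProcessID;
    int32_t  pcPriClassBase;
    uint32_t dwFlags;
    char     szExeFile[260];      /* MAX_PATH characters */
} procentry32_model;

/* The structure starts at ebp-130h, so a member at offset k sits at
   ebp-(130h - k). Returns that distance below ebp. */
unsigned ebp_distance(size_t member_offset)
{
    return 0x130u - (unsigned)member_offset;
}
```

For example, ebp_distance(offsetof(procentry32_model, szExeFile)) gives the 10Ch used by the lea instruction before the call to _stricmp.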

Lines 23 and 24 push the arguments for CreateToolhelp32Snapshot onto the stack. Since this function uses the STDCALL convention, the callee cleans up the stack.

I assume the jump in line 29 is taken. The lines at loc_10001CB9 are as follows:

loc_10001CB9:
lea eax, [ebp-130h]
push esi
push eax
push edi
mov dword ptr [ebp-130h], 128h
call Process32First
test eax, eax
jz short loc_10001D24 (line 70)
mov esi, ds:_stricmp
lea ecx, [ebp-10Ch]
push 10007C50h
push ecx
call esi ; _stricmp
add esp, 8
test eax, eax
jz short loc_10001D16 (line 66)

I assume the jump in line 43 is not taken. This snippet has two function calls. Process32First uses the STDCALL convention; _stricmp, on the other hand, uses CDECL. In the latter case, the caller adjusts the stack pointer (add esp, 8).

If the jump in line 51 is not taken, the above procedure is repeated until “explorer.exe” matches the process name or the call to Process32Next fails. The only difference in lines 53 to 65 is the call to Process32Next instead of Process32First. The stack picture looks the same, so I take the jump to loc_10001D16.

loc_10001D16:
mov eax, [ebp-118h]
mov ecx, [ebp-128h]
jmp short loc_10001D2A (line 73)

The above snippet doesn’t change the stack.

loc_10001D2A:
cmp eax, ecx
pop esi
jnz short loc_10001D38 (line 82)

Line 75 restores ESI. Let’s assume we jump to line 82:

loc_10001D38:
mov eax, [ebp+0Ch]
dec eax
jnz short loc_10001D53 (line 93)
push 0
push 0
push 0
push 100032D0h
push 0
push 0
call ds:CreateThread

I don’t take the jump in line 85. The stack for lines starting at line 66 should look like this:

The only remaining lines set the return value and clean up the stack:

loc_10001D53:
mov eax, 1
pop edi
mov esp, ebp
pop ebp
retn 0Ch

Exercise 2

In the example walk-through, we did a nearly one-to-one translation of the assembly code to C. As an exercise, re-decompile this whole function so that it looks more natural. What can you say about the developer’s skill level/experience? Explain your reasons. Can you do a better job?

Here’s my decompiled version with references to the assembly lines:

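#include <windows.h>
#include <TlHelp32.h>
#include <intrin.h>

// sidt stores six bytes: a 16-bit limit followed by the 32-bit base
#pragma pack(push, 1)
typedef struct _IDTR {
	WORD limit;
	DWORD base;
} IDTR, *PIDTR;
#pragma pack(pop)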

BOOL APIENTRY DllMain(HMODULE hModule,
	DWORD ul_reason_for_call,
	LPVOID lpReserved
	) // line 1
{
	// line 2 ---
	IDTR idtr;
	__sidt(&idtr);	
	if (idtr.base > 0x8003F400 && idtr.base < 0x80047400) {
		return FALSE;
	}
	// --- line 17
	// line 19 ---
	PROCESSENTRY32 procentry;
	memset(&procentry, 0, sizeof(PROCESSENTRY32));
	procentry.dwSize = sizeof(procentry); // 0x128
	HANDLE h;	
	h = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
	if (h == INVALID_HANDLE_VALUE)
		return FALSE;
	// --- line 34
	// line 36 ---
	int ret = Process32First(h, &procentry);
	while (ret) {
		// line 44 - line 51 AND line 59 - line 65
		if (!_stricmp(procentry.szExeFile, "explorer.exe")) { // non-Unicode build, as in the sample
			break;
		}
		ret = Process32Next(h, &procentry);
	}
	// --- line 65
	// line 66 --
	if (ret) {
		if (procentry.th32ParentProcessID == procentry.th32ProcessID)
			return FALSE;
	}
	// --- line 81
	// line 70 ---
	else {
		// no check for ul_reason_for_call == DLL_PROCESS_DETACH here (error in the book)
		return FALSE;
	}
	// --- line 81

	// line 82
	if (ul_reason_for_call == DLL_PROCESS_ATTACH)
		CreateThread(0, 0, (LPTHREAD_START_ROUTINE)0x100032D0, 0, 0, 0);
	return TRUE;
}
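The search loop can also be exercised without the Windows API. The following is a rough sketch under my own assumptions: proc_model, iequal and find_process are made-up names, the array stands in for the snapshot that Process32First and Process32Next walk, and iequal stands in for _stricmp(...) == 0.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the PROCESSENTRY32 fields the function reads. */
typedef struct {
    const char *exe;
    unsigned pid;
    unsigned parent_pid;
} proc_model;

/* Case-insensitive string equality; returns 1 if equal, 0 otherwise. */
int iequal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Walk the list front to back, as the Process32First/Process32Next loop does.
   Returns the index of the first matching entry, or -1 if the list ends first. */
int find_process(const proc_model *list, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (iequal(list[i].exe, name))
            return (int)i;
    }
    return -1;
}
```

A result of -1 corresponds to the path through loc_10001D24, where the function gives up and returns FALSE.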

Exercise 3

In some of the assembly listings, the function name has a @ prefix followed by a number. Explain when and why this decoration exists.

According to this source:

Names with a _ prefix and an @n suffix indicate the __stdcall calling convention, which the Win32 API uses. The callee has to clean up the stack. The number n in the suffix says how many bytes are used for function parameters. Our _DllMain@12 therefore uses 12 bytes for parameters, i.e., four bytes for each of the three parameters.
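The decoration rule is easy to reproduce in C. This is a minimal sketch, assuming every parameter occupies four bytes on the stack (true for the three DllMain parameters on 32-bit x86, not for larger types such as 64-bit integers); stdcall_decorate is a hypothetical helper, not a real toolchain function.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "_<name>@<bytes>" into out, assuming 4 bytes per parameter.
   Returns the number of characters snprintf would have written. */
int stdcall_decorate(char *out, size_t out_size, const char *name,
                     unsigned param_count)
{
    return snprintf(out, out_size, "_%s@%u", name, param_count * 4u);
}
```

Calling it with "DllMain" and 3 produces the _DllMain@12 seen in the listing, and the same 12 appears as the operand of retn 0Ch at the end of the function.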

Exercise 4

You can find the full examples, including how to use the functions, on my GitHub page. All functions use the CDECL calling convention.

strlen

Declaration:

size_t strlen(const char *str)

My assembly x86 implementation:

strlen:
push ebp
mov ebp, esp
mov edi, [ebp+8]     ; get first parameter
mov edx, edi         ; copy address to start of string
xor eax, eax         ; set eax to null byte
mov ecx, -1          ; make sure ecx does not become zero 
repne scasb          ; search null byte
sub edi, edx         ; subtract start address from end address
dec edi              ; decrement difference to compensate for null byte
mov eax, edi         ; return strlen result
mov esp, ebp
pop ebp
ret

strchr

Declaration:

char *strchr(const char *str, int c)

My assembly x86 implementation:

strchr:
push ebp
mov ebp, esp
mov edi, [ebp+8]     ; get first parameter
mov bl, [ebp+12]     ; set bl to second parameter
mov al, 0            ; set al to null byte
_loop:
mov cl, [edi]        ; store current character
cmp cl, bl           ; check if character is what we search         
jz _return           ; jump to return if match
scasb                ; check if null byte
jnz _loop            ; loop if no match
mov edi, 0           ; set edi to zero, so function will return null
_return:
mov eax, edi         ; return pointer to first occurrence
mov esp, ebp
pop ebp
ret

memcpy

Declaration:

void *memcpy(void *str1, const void *str2, size_t n)

My assembly x86 implementation:

memcpy:
push ebp
mov ebp, esp
mov edi, [ebp+8]     ; dst location (first parameter)
mov esi, [ebp+12]    ; src location (second parameter)
mov ecx, [ebp+16]    ; number of bytes (third parameter)
test ecx, ecx        ; nothing to copy if n is zero
jz _done
_loop:
mov al, [esi]        ; copy byte from src ...
mov [edi], al        ; ... to dst
inc esi              ; go to next byte in src ...
inc edi              ; ... and dst
dec ecx              ; decrement counter
jnz _loop            ; loop n times
_done:
mov eax, [ebp+8]     ; return dst
mov esp, ebp
pop ebp
ret
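For comparison, here is the same byte-wise copy in C. It is only a sketch, and my_memcpy is a made-up name to avoid clashing with the library function. The while loop tests the counter before the first copy, so a length of zero copies nothing.

```c
#include <stddef.h>

/* Byte-wise copy of n bytes from src to dst; returns dst like memcpy does. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n != 0) {       /* checked first, so n == 0 is a no-op */
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A loop that decrements first and tests afterwards would wrap the counter around when n is 0 and copy far more than intended.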

memset

Declaration:

void *memset(void *str, int c, size_t n)

My assembly x86 implementation:

memset:
push ebp
mov ebp, esp
mov edi, [ebp+8]     ; string (first parameter) 
mov al, [ebp+12]     ; character (second parameter) 
mov ecx, [ebp+16]    ; number of bytes (third parameter)
rep stosb            ; store al n times
mov eax, [ebp+8]     ; return pointer to the filled memory
mov esp, ebp
pop ebp
ret

strcmp

Declaration:

int strcmp(const char *str1, const char *str2)

My assembly x86 implementation (uses the strlen routine):

strcmp:
push ebp
mov ebp, esp
mov esi, [ebp+8]     ; get first string
push esi             ; next four lines calc len of string a
call strlen          ; ^^
add esp, 4           ; ^^
mov ecx, eax         ; ^^
inc ecx              ; also compare the null byte of string a
mov esi, [ebp+8]     ; get first string (restore)
mov edi, [ebp+12]    ; get second string
repe cmpsb           ; compare until first mismatch or end of string a
ja _greater          ; string a is greater (unsigned bytes)
jb _less             ; string b is greater
mov eax, 0           ; strings are equal
jmp _return
_greater:
mov eax, 1
jmp _return
_less:
mov eax, -1
_return:
mov esp, ebp
pop ebp
ret
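A C model is handy for cross-checking an assembly strcmp against the standard semantics: bytes are compared as unsigned char, and the first differing position decides the result, regardless of the string lengths. This is a sketch with a made-up name, my_strcmp, and it returns only -1, 0 or 1, like the assembly routine.

```c
/* Compares two null-terminated strings byte by byte as unsigned chars.
   Returns 1 if a > b, -1 if a < b, and 0 if they are equal. */
int my_strcmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    while (*pa != 0 && *pa == *pb) {   /* stop at a mismatch or at the end of a */
        pa++;
        pb++;
    }
    if (*pa > *pb)
        return 1;
    if (*pa < *pb)
        return -1;
    return 0;
}
```

Note that "b" compares greater than "aa" even though it is shorter, so comparing the lengths first would give the wrong answer.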

strset

Declaration:

char *strset(char *str, char ch);

My assembly x86 implementation (uses the strlen routine):

strset:
push ebp
mov ebp, esp
mov edi, [ebp+8]     ; get string
mov edx, edi         ; make copy of edi (strlen leaves the same address in edx)
push edi             ; next four lines put str length in ecx
call strlen          ; ^
add esp, 4           ; ^
mov ecx, eax         ; ^
mov al, [ebp+12]     ; get fill character
mov edi, edx         ; restore edi
rep stosb            ; fill string
mov eax, edx         ; return pointer to string
mov esp, ebp
pop ebp
ret

Archived Comments

Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.

sean Sep 15, 2014 21:46:18 UTC

hi,
i have been confused by some concepts in exercise 2.
i know that at loc_10001D24 there are mov eax, [ebp+0ch] and mov ecx [ebp+0ch] and then in line 74 comes with an instruction cmp eax, ecx.
does it means to compare the same address ?? because eax and ecx are the same dword in the identical address ( [ebp+0ch] or fdwReason ).
so, if eax and ecx are identical, the ZF flag would be set to 1, finally the it would reach at line 81??

also i am wondering where the instruction imply
"if (ul_reason_for_call == DLL_PROCESS_DETACH)
return FALSE;
"

thanks in advance!

Bader Sep 16, 2014 13:21:14 UTC

You are right about loc_10001D24. There are only two ways to reach this location:

- In line 41, the call to Process32First fails
- In line 56, the call to Process32Next fails

So if - and only if - no next process information could be retrieved before reaching "explorer.exe", we get to loc_10001D24. The two lines 71 and 72 set eax and ecx to the same value. The actual value (fdwReason), is never used, so the lines might just as well be

xor eax, eax
xor ecx, ecx

or
pop esi
jmp

The result is always that the function returns NULL in line 81, meaning it couldn't find the "explorer.exe" process.

As for your second question, there is no check

if (ul_reason_for_call == DLL_PROCESS_DETACH)
return FALSE;

I got confused by the book where they mistakenly came up with this check - thinking lines 70-74 compare fdwReason to 0. I downloaded sample J and disassembled it with OllyDbg, the lines 71 and 72 are as printed in the book and definitely do not lead to fdwReason == 0 in line 74.

dagan Dec 08, 2014 20:44:39 UTC

which program did you use to draw call stack picture for exercise 1?

Johannes Dec 08, 2014 21:11:05 UTC

I wrote a small Python script to generate most of the graphics in SVG, after that Adobe Illustrator to clean up.

sebas sujeen Jun 09, 2015 06:21:18 UTC

In your memset implementation, it should be rep stosb and not repne stosb

Johannes Bader Jun 10, 2015 19:52:55 UTC

You are right of course, fixed it. Thanks you reporting..

lisz Nov 25, 2016 02:00:32 UTC

About Exercise 2: in my opinion, you spotted some errors author of the book did, but you also got it a bit wrong. Jump to line 70 happens when Process32First or Process32Next fail. In that case, EAX and EXC are set to a same value (fdwReason). If Process32First/Process32Next and _stricmp both succeed ("explorer.exe" is found"), then EAX and EXC are set to ParentProcessId and ProcessId of PROCESSENTRY32. So, only then EAX and ECX are not a same number. In either case, line 74 is executed next, which does "cmp EAX, EXC", and that produces non-zero only if "explorer.exe" was found. So, in my opinion, relevant part of the function should look like this:

DWORD parentProcId = fdwReason;
DWORD procId = fdwReason;
if (Process32First(snapshot, pe) != 0) {
while (_stricmp(pe.szExeFile, "explorer.exe") != 0)
if (Process32Next(snapshot, pe) == 0)
break;

if (_stricmp(pe.szExeFile, "explorer.exe") == 0) {
parentProcId = pe.ParentProcessId;
procId = pe.ProcessId;
}
}
if (parentProcId == procId) return 0;
if (fdwReason == DLL_PROCESS_ATTACH)
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)0x100032D0, 0, 0);
return 1;

(It's not real code - just to show how this function performs)

slinkin Aug 15, 2019 21:01:39 UTC

You have a bug in memcpy function. The function always copy at least one byte. If the third param is 0 (number of bytes) - occurred integer overflow and you will get loop with max of size_t iterations.

Johannes Bader Aug 16, 2019 14:29:40 UTC

Good catch, there should be a check for non-positive sizes. I'll fix that next week.