Practical Reverse Engineering Solutions – Page 11
My go at exercise 1 on page 11.

- Problem Statement
- Context of the Snippet
- Walk-Through
- ► Line 1: mov edi, [ebp+8]
- ► Line 2: mov edx, edi
- ► Line 3: xor eax, eax
- ► Line 4: or ecx, 0FFFFFFFFh
- ► Line 5: repne scasb
- ► Line 6: add ecx, 2
- ► Line 7: neg ecx
- ► Line 8: mov al, [ebp+0Ch]
- ► Line 9: mov edi, edx
- ► Line 10: rep stosb
- ► Line 11: mov eax, edx
- C-Code
This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.
Problem Statement
This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does:
01: 8B 7D 08    mov edi, [ebp+8]
02: 8B D7       mov edx, edi
03: 33 C0       xor eax, eax
04: 83 C9 FF    or ecx, 0FFFFFFFFh
05: F2 AE       repne scasb
06: 83 C1 02    add ecx, 2
07: F7 D9       neg ecx
08: 8A 45 0C    mov al, [ebp+0Ch]
09: 8B FA       mov edi, edx
10: F3 AA       rep stosb
11: 8B C2       mov eax, edx
Context of the Snippet
The snippet probably receives its parameters according to the C calling convention (cdecl). This convention places the function parameters on the stack before the call is made, in reverse order of the function prototype, i.e., the last parameter is pushed first. The CALL instruction then pushes the return address (the address of the instruction following the CALL) on the stack. Finally, the standard function prologue pushes the base pointer on the stack and sets EBP to the value of the stack pointer ESP. This leads to the following stack image before line 1 of the exercise snippet is executed (see left-hand side):
- In the following analysis we see that [EBP+8] (the first function parameter) is of type char *, i.e., a pointer to a sequence of bytes. The snippet requires that this sequence be terminated by a zero byte, so it is probably a null-terminated string.
- The value at [EBP+C] (the second function parameter) is of type char, i.e., a single byte such as a letter.
I’m using the string “The pool on the roof must have a leak.” (with a null byte at the end) as the first argument at [EBP+8] and the character ‘x’ as the second argument at [EBP+C] (0Ch = 12). See the right-hand stack in the figure above. Note that while ‘x’ itself is placed at EBP+C, the stack slot at EBP+8 contains a memory address pointing to the first letter of the string.
To check my guesses about what the code snippet does, I put the function prologue and epilogue around it and added a caller to get a fully functional assembly program (GitHub link):
SECTION .data
my_str: db 'The pool on the roof must have a leak.', 0

SECTION .text
GLOBAL _start
_start:
    nop
    push byte 'x'       ; second function parameter
    push dword my_str   ; first function parameter
    call black_out      ; call function
    add esp, 8          ; cleaning out the stack
    mov ebx,0           ; parameter for exit call (return value)
    mov eax,1           ; exit system call
    int 080h            ; run system call, see page 79 pal

black_out:
    push ebp            ; function prologue, save stack base pointer
    mov ebp, esp        ; point base pointer to ESP
    ; ------------ start code from book ---------
    mov edi, [ebp+8]
    mov edx, edi
    xor eax, eax
    or ecx, 0FFFFFFFFh
    repne scasb
    add ecx, 2
    neg ecx
    mov al, [ebp+0Ch]
    mov edi, edx
    rep stosb
    mov eax, edx
    ; ------------ end code from book -----------
    mov esp, ebp        ; restore stack pointer
    pop ebp             ; restore stack base pointer
    ret
I assembled and linked the code on a 64-bit machine with:
$ nasm -f elf32 -g -F dwarf code.asm
$ ld -m elf_i386 -o code code.o
and started debugging with:
$ gdb -q code
Reading symbols from code...done.
(gdb) break *_start
Breakpoint 1 at 0x8048080: file code.asm, line 7.
(gdb) run
Starting program: /home/jb/pre/chapter_1/page_11/exercise_1/code

Breakpoint 1, _start () at code.asm:7
7 nop
The caller first pushes the second function parameter ‘x’ on the stack:
(gdb) s
8 push byte 'x'
(gdb) s
9 push dword my_str
(gdb) x/cb $esp
0xffffcfec: 120 'x'
Then it pushes the first parameter, “The pool on the roof must have a leak.”:
(gdb) s
10 call black_out
In contrast to the second parameter, the value on the stack is a pointer to the string in memory. The command x/xw $esp shows the 4-byte word at the address held in ESP, i.e., at the top of the stack:
(gdb) x/xw $esp
0xffffcfe8: 0x080490c0
So the string is stored at 0x080490c0:
(gdb) x/s 0x080490c0
0x80490c0 <my_str>: "The pool on the roof must have a leak."
The next three instructions call the function and run the function prologue:
(gdb) s
17 push ebp
(gdb) p/x $ebp
$1 = 0x0
(gdb) s
18 mov ebp, esp
(gdb) p/x $esp
$1 = 0xffffcfe0
(gdb) s
black_out () at code.asm:20
20 mov edi, [ebp+8]
After that, we enter the snippet, which is analyzed step by step in the next section.
Walk-Through
► Line 1: mov edi, [ebp+8]
As discussed before, [ebp+8] is the stack slot holding the first function parameter (see the right-hand side of the stack image). This instruction copies the parameter, a pointer to the string, to register EDI. Now EDI references our string:
(gdb) x/s $edi
0x80490c0 <my_str>: "The pool on the roof must have a leak."
► Line 2: mov edx, edi
This simply makes a copy of EDI. The reason for this will become clear in line 5. For reference, EDI and EDX now both contain the double word 0x80490c0:
(gdb) p/x $edi
$5 = 0x80490c0
► Line 3: xor eax, eax
This sets the value of EAX to zero:
(gdb) p/x $eax
$6 = 0x0
Again, the purpose of this will be clear in line 5.
► Line 4: or ecx, 0FFFFFFFFh
This sets the value of ECX to 0xFFFFFFFF, regardless of its previous value, since an OR with all ones sets every bit:
(gdb) p/x $ecx
$7 = 0xffffffff
Interpreted as a signed integer, ECX is -1:
(gdb) p/d $ecx
$7 = -1
The register ECX is used as a counter by the next instruction.
► Line 5: repne scasb
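Line 5 is where a lot of the magic happens. The instruction scasb compares the byte in AL (the lowest byte of EAX) with the byte at address EDI and then increments EDI by one (assuming the direction flag is clear, as it is here). The repne prefix repeats scasb while the bytes differ, decrementing ECX by one after each comparison; it stops once a matching byte has been compared or ECX reaches zero.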
In our example, we search for the null byte (in AL) in the null-terminated string “The pool on the roof must have a leak.” (referenced by EDI). The counter ECX starts from -1. The following image illustrates the registers before and after repne scasb:
The string has 38 characters, so together with the null byte 39 bytes are compared, and ECX ends up being -1 - 39 = -40:
(gdb) p/d $ecx
$8 = -40
The value of EDI changes too, which is why we made a copy of it in line 2. EDI now points one byte past the null byte:
(gdb) p/x $edi
$9 = 0x80490e7
(the start of the string is at 0x80490c0, and 0x80490e7 - 0x80490c0 = 0x27 = 39).
► Line 6: add ecx, 2
Adding 2 to ECX makes it -38:
(gdb) p/d $ecx
$10 = -38
This is -1 times the length of the string. Adding two compensates for two things: firstly, the count did not start at 0 but at -1, and secondly, the null byte was counted as well.
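The two corrections can be written as a short derivation. Let n be the number of characters before the terminating null byte. repne scasb compares n + 1 bytes (the n characters plus the null byte) and decrements ECX once per comparison, starting from -1; lines 6 and 7 then undo the offset and the sign:

$$
\begin{aligned}
\text{ECX}_{\text{after line 5}} &= -1 - (n + 1) = -(n + 2) \\
\text{ECX}_{\text{after line 6}} &= -(n + 2) + 2 = -n \\
\text{ECX}_{\text{after line 7}} &= -(-n) = n
\end{aligned}
$$

With n = 38 this gives -40, -38 and 38, matching the gdb output.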
► Line 7: neg ecx
This simply negates the value of ECX, so now it actually corresponds to the string length:
(gdb) p/d $ecx
$11 = 38
To summarize: up to and including line 7, the snippet calculates the length of the string passed at [EBP+8].
► Line 8: mov al, [ebp+0Ch]
Starting with line 8, we enter the second part of the snippet. This instruction copies the byte at stack location [EBP+0Ch], i.e., the second function parameter, to register AL. Since the second parameter is of type char, only one byte in size, the value fits in the lower 8 bits of the EAX register. AL now holds the character ‘x’:
(gdb) p/c $al
$12 = 120 'x'
► Line 9: mov edi, edx
The instruction in line 10 again operates on EDI. Since line 5 modified EDI so that it no longer points to the start of the string, we restore it from the backup in EDX that we created in line 2. After that, EDI once again points to the string:
(gdb) p/x $edi
$13 = 0x80490c0
(compare 0x80490c0 to the output shown for line 2).
► Line 10: rep stosb
Again a very powerful instruction. It copies the byte in AL (in our case the character ‘x’) to every byte in the sequence starting at EDI (in our case the string “The pool on the roof must have a leak.”). It does so exactly ECX times, i.e., for the entire length of the string, excluding the terminating null byte. In other words, this instruction performs a memset, overwriting the entire string with a single character. After the instruction, our string is blacked out by ‘x’s:
(gdb) x/s $edx
0x80490c0 <my_str>: 'x' <repeats 38 times>
(The instruction again modifies EDI, so you have to use EDX to reference the string.)
► Line 11: mov eax, edx
This copies the address of the string to EAX. Under the C calling convention, EAX holds the return value of the function, so the snippet returns a pointer to the modified string.
C-Code
The walk-through demonstrated that the function overwrites every character of the string passed as the first argument with the character passed as the second argument. Here is working C code, where the function black_out corresponds to the snippet in this exercise:
#include <stdio.h>

char* black_out(char *str, char ch) {
    /* find length of string */
    int len = 0;
    char *str_cpy = str;
    while (*str_cpy != '\0') {
        len++;
        str_cpy++;
    }
    /* set each character of string to <ch> (back to front; rep stosb
       goes front to back, but the result is the same) */
    while (len-- > 0) {
        str[len] = ch;
    }
    return str;
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("usage: %s string character\n", argv[0]);
        return 1;
    }
    char *test2 = black_out(argv[1], *argv[2]);
    printf("%s\n", test2);
    return 0;
}
The function can be simplified by using the strlen and memset functions:
#include <string.h>

char* black_out(char *str, char ch) {
    /* find length of string */
    size_t len = strlen(str);
    /* set each character of string to <ch> */
    memset(str, ch, len);
    return str;
}
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.