
Practical Reverse Engineering Solutions – Page 11

my go at exercise 1 on page 11

This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

Problem Statement

This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does:

01: 8B 7D 08    mov edi, [ebp+8]
02: 8B D7       mov edx, edi
03: 33 C0       xor eax, eax
04: 83 C9 FF    or ecx, 0FFFFFFFFh
05: F2 AE       repne scasb
06: 83 C1 02    add ecx, 2
07: F7 D9       neg ecx
08: 8A 45 0C    mov al, [ebp+0Ch]
09: 8B FA       mov edi, edx
10: F3 AA       rep stosb
11: 8B C2       mov eax, edx

Context of the Snippet

The function snippet probably receives its parameters according to the C calling convention (cdecl). This convention places the function parameters on the stack before the call is made, in reverse order of the function prototype, i.e., the last parameter is pushed first. The CALL instruction then pushes the return address (the address of the instruction following the CALL) onto the stack. Finally, the standard function prologue pushes the base pointer EBP onto the stack and sets EBP to the value of the stack pointer ESP. This leads to the following stack image before line 1 of the exercise snippet is executed (see left-hand side):

  • The following analysis shows that [EBP+8] (the first function parameter) is of type char *, i.e., a pointer to a sequence of bytes. The snippet requires that this sequence be terminated by a zero byte, so it is most likely a null-terminated string.
  • The value at [EBP+C] (the second function parameter) is of type char, i.e., a single byte such as a letter.

I’m using the string “The pool on the roof must have a leak.” (with a null byte at the end) as the first argument at [EBP+8] and the character ‘x’ as the second argument at [EBP+C]. See the right stack in the above figure. Note that while ‘x’ itself is placed at EBP+C, the stack slot at EBP+8 holds a memory address pointing to the first letter of the string.
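To make the offsets concrete, the frame layout can be modeled as a C struct. This is my own illustration, not something from the book: the struct and field names are hypothetical, each field stands for one 4-byte stack slot of a 32-bit process (so the string pointer is modeled as a uint32_t address), and it assumes the compiler inserts no padding between uint32_t members, which holds on common compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 32-bit cdecl frame as seen from EBP after the
 * prologue (push ebp; mov ebp, esp). Members are listed from the lowest
 * address upwards; every stack slot is 4 bytes wide. */
struct black_out_frame {
    uint32_t saved_ebp;   /* [EBP+0]:   caller's EBP, pushed by the prologue */
    uint32_t return_addr; /* [EBP+4]:   return address, pushed by CALL */
    uint32_t str;         /* [EBP+8]:   first parameter, address of the string */
    uint32_t ch;          /* [EBP+0Ch]: second parameter, only the low byte is used */
};

/* Offsets of the two parameters relative to EBP. */
enum {
    OFFSET_STR = offsetof(struct black_out_frame, str),
    OFFSET_CH  = offsetof(struct black_out_frame, ch)
};
```

The char argument still occupies a full 4-byte slot (in 32-bit code, NASM's push byte 'x' pushes a sign-extended dword), which is why the second parameter sits at +0Ch rather than +9.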

To check my guesses about what the code snippet does, I wrapped it in a function prologue and epilogue and added a caller, which yields a complete, runnable assembly program (GitHub link):

SECTION  .data
my_str: 
    db     'The pool on the roof must have a leak.', 0
SECTION  .text
GLOBAL _start
_start: 
    nop
    push byte 'x'      ; second function parameter
    push dword my_str  ; first function parameter
    call black_out     ; call function
    add esp, 8         ; cleaning out the stack
    mov  ebx,0         ; parameter for exit call (return value) 
    mov  eax,1         ; exit system call
    int 080h           ; run system call, see page 79 pal

black_out:
    push ebp           ; function prologue, save stack base pointer
    mov ebp, esp       ; point base pointer to ESP    
    ; ------------ start code from book ---------
    mov edi, [ebp+8]   
    mov edx, edi       
    xor eax, eax       
    or ecx, 0FFFFFFFFh 
    repne scasb        
    add ecx, 2         
    neg ecx            
    mov al, [ebp+0Ch]  
    mov edi, edx       
    rep stosb          
    mov eax, edx       
    ; ------------ end code from book -----------
    mov esp, ebp       ; restore stack pointer
    pop ebp            ; restore stack base pointer
    ret

I assembled and linked the code on a 64-bit machine with:

$ nasm -f elf32 -g -F dwarf code.asm
$ ld -m elf_i386 -o code code.o

and started debugging with:

$ gdb -q code
Reading symbols from code...done.
(gdb) break *_start
Breakpoint 1 at 0x8048080: file code.asm, line 7.
(gdb) run
Starting program: /home/jb/pre/chapter_1/page_11/exercise_1/code 

Breakpoint 1, _start () at code.asm:7
7	    nop

The caller first pushes the second function parameter, ‘x’, onto the stack:

(gdb) s
8	    push byte 'x'      
(gdb) s
9	    push dword my_str  
(gdb) x/cb $esp
0xffffcfec:	120 'x'

Then it pushes the first parameter, which refers to the string “The pool on the roof must have a leak.”:

(gdb) s
10	    call black_out     

In contrast to the second parameter, this stack value is not the data itself but a pointer to the string in memory. The command x/xw $esp displays the 32-bit word at the address held in ESP:

(gdb) x/xw $esp
0xffffcfe8:	0x080490c0

So the string is stored at 0x080490c0:

(gdb) x/s 0x080490c0
0x80490c0 <my_str>:	"The pool on the roof must have a leak."

The next three instructions call the function and run the function prologue:

(gdb) s
17	    push ebp           
(gdb) p/x $ebp
$1 = 0x0
(gdb) s
18	    mov ebp, esp       
(gdb) p/x $esp
$1 = 0xffffcfe0
(gdb) s
black_out () at code.asm:20
20	    mov edi, [ebp+8]   

After that, we enter the snippet, which is analyzed step by step in the next section.

Walk-Through

► Line 1: mov edi, [ebp+8]

As discussed before, [ebp+8] is the stack slot holding the first function parameter (see right-hand side of the stack image). This instruction copies the parameter, a pointer to the string, into register EDI. Now EDI references our string:

(gdb) x/s $edi
0x80490c0 <my_str>:	"The pool on the roof must have a leak."

► Line 2: mov edx, edi

This simply makes a copy of EDI. The reason for this will become clear in line 5. For reference, EDI and EDX now both contain the double word 0x80490c0:

(gdb) p/x $edi
$5 = 0x80490c0

► Line 3: xor eax, eax

This sets the value of EAX to zero:

(gdb) p/x $eax
$6 = 0x0

Again, the purpose of this will become clear in line 5.

► Line 4: or ecx, 0FFFFFFFFh

This sets the value of ECX to 0xFFFFFFFF:

(gdb) p/x $ecx
$7 = 0xffffffff

Interpreted as a signed integer, ECX holds -1:

(gdb) p/d $ecx
$7 = -1

The register ECX is used in the next instruction.
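Since OR-ing with 0xFFFFFFFF sets every bit regardless of ECX's previous value, the result is always the bit pattern 0xFFFFFFFF. The following sketch (as_signed is a hypothetical helper name) shows that reading those 32 bits as a two's complement int32_t gives -1; it copies the bytes with memcpy instead of casting, so the reinterpretation is well defined in C.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of a register value as a signed integer.
 * int32_t is required to use two's complement, so 0xFFFFFFFF reads as -1. */
int32_t as_signed(uint32_t reg)
{
    int32_t value;
    memcpy(&value, &reg, sizeof value);
    return value;
}
```

The same helper decodes other register dumps as well, e.g. 0xFFFFFFD8 reads as -40.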

► Line 5: repne scasb

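Line 5 is where much of the magic happens. The instruction scasb compares the byte in AL (the lowest byte of EAX) with the byte at the address in EDI and then increments EDI by one (assuming the direction flag is clear, as usual). The repne prefix repeats scasb as long as ECX is not zero and the compared bytes differ, decrementing ECX by one for each comparison. The loop therefore stops right after the matching byte has been compared, with EDI pointing one byte past it.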
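As a rough sketch, the behavior described above can be emulated in C. The struct regs and the function repne_scasb are hypothetical names chosen for illustration; the sketch assumes a clear direction flag and models only the equal/not-equal outcome of the comparison, not the full flags register.

```c
#include <stdint.h>

/* Hypothetical register state for the emulation. */
struct regs {
    uint8_t al;          /* byte to search for */
    const uint8_t *edi;  /* current scan position */
    uint32_t ecx;        /* repeat counter */
};

/* Emulates REPNE SCASB with the direction flag clear: while ECX != 0,
 * compare AL with [EDI], advance EDI, decrement ECX, and stop as soon
 * as the compared bytes were equal (ZF set). */
void repne_scasb(struct regs *r)
{
    while (r->ecx != 0) {
        uint8_t byte = *r->edi;
        r->edi++;
        r->ecx--;
        if (byte == r->al)
            break;  /* found the searched byte */
    }
}
```

Starting with AL = 0, EDI at “The pool on the roof must have a leak.” and ECX = 0xFFFFFFFF, the emulation stops after 39 comparisons (38 characters plus the null byte), leaving ECX at 0xFFFFFFD8 (-40) and EDI 39 bytes past the start of the string.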

In our example, we search for the null byte (in AL) in the null-terminated string “The pool on the roof must have a leak.” (referenced by EDI). The counter ECX starts at -1. The following image illustrates the registers before and after repne scasb:

[Figure: registers before and after repne scasb]

So ECX ends up at -40: repne scasb decremented it 39 times, once for each of the 38 characters and once for the null byte:

(gdb) p/d $ecx
$8 = -40

The value of EDI changes too, which is why we made a copy of it in line 2:

(gdb) p/x $edi
$9 = 0x80490e7

(The string starts at 0x80490c0, so EDI has advanced by 0x27 = 39 bytes and now points one byte past the null terminator.)

► Line 6: add ecx, 2

This adds 2 to ECX, so ECX becomes -38:

(gdb) p/d $ecx
$10 = -38

This is the negative of the string length. Adding two compensates for two things: firstly, the count started at -1 rather than 0, and secondly, the null byte was counted as well.
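The arithmetic of lines 4 to 7 can be written out for a string of length n. Since repne scasb compares n + 1 bytes (the characters plus the null byte) and decrements ECX once per comparison:

$$
\begin{aligned}
\mathrm{ECX}_{\text{line 4}} &= -1 \\
\mathrm{ECX}_{\text{line 5}} &= -1 - (n + 1) = -(n + 2) \\
\mathrm{ECX}_{\text{line 6}} &= -(n + 2) + 2 = -n \\
\mathrm{ECX}_{\text{line 7}} &= -(-n) = n
\end{aligned}
$$

With n = 38 this gives the -40 and -38 observed above, and a final value of 38.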

► Line 7: neg ecx

This negates ECX (a two's complement negation), so ECX now holds the string length:

(gdb) p/d $ecx
$11 = 38
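NEG computes the two's complement of its operand, which for a 32-bit register is the same as inverting all bits and adding one. A minimal C sketch using unsigned 32-bit arithmetic, which wraps modulo 2^32 just like the register (the function name neg32 is hypothetical):

```c
#include <stdint.h>

/* Two's complement negation of a 32-bit value: invert all bits, add one.
 * The cast keeps the result within 32 bits, mirroring the register. */
uint32_t neg32(uint32_t x)
{
    return (uint32_t)(~x + 1u);
}
```

For the bit pattern of -38 (0xFFFFFFDA), inverting gives 0x25 = 37, and adding one gives 38.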

To summarize: lines 1 to 7 calculate the length of the string passed at [EBP+8], much like strlen does.
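Putting lines 3 to 7 together, the following C sketch reproduces the snippet's length computation with 32-bit unsigned arithmetic, so that the wrap-around matches the register. The function name snippet_length is hypothetical, and the loop omits repne's ECX != 0 check, so it assumes the string is shorter than 0xFFFFFFFF bytes.

```c
#include <stdint.h>

/* Mirrors lines 3 to 7: start ECX at -1, decrement it once per scanned
 * byte (terminator included), then add 2 and negate. Assumes str is
 * null-terminated. */
uint32_t snippet_length(const char *str)
{
    uint32_t ecx = 0xFFFFFFFFu;   /* or ecx, 0FFFFFFFFh */
    const char *edi = str;        /* mov edi, [ebp+8] */
    for (;;) {                    /* repne scasb with AL = 0 */
        ecx--;
        if (*edi++ == '\0')
            break;
    }
    ecx += 2;                     /* add ecx, 2 */
    return (uint32_t)(~ecx + 1u); /* neg ecx */
}
```

For the empty string the scan stops at the first byte, ECX goes from -1 to -2, adding 2 gives 0, and the result is 0, as expected.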

► Line 8: mov al, [ebp+0Ch]

Starting with line 8, we enter the second part of the snippet. This instruction copies the byte at stack location [EBP+0Ch], i.e., the second function parameter, to register AL. Since the second parameter is of type char (only one byte in size), the value fits in the lower 8 bits of the EAX register. AL now holds the character ‘x’:

(gdb) p/c $al
$12 = 120 'x'
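Writing to AL changes only the lowest 8 bits of EAX, and because x86 is little-endian, the byte at [EBP+0Ch] is the least significant byte of the 4-byte parameter slot. The sketch below (mov_al is a hypothetical helper) models that partial register write. Since line 3 cleared EAX, EAX holds 0x00000078 after line 8, although rep stosb only uses AL anyway.

```c
#include <stdint.h>

/* Models "mov al, [ebp+0Ch]": the low byte of the stack slot replaces
 * the low 8 bits of EAX; the upper 24 bits of EAX are left unchanged. */
uint32_t mov_al(uint32_t eax, uint32_t stack_slot)
{
    return (eax & 0xFFFFFF00u) | (stack_slot & 0xFFu);
}
```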

► Line 9: mov edi, edx

The instruction in line 10 again operates on EDI. Since line 5 modified EDI so that it no longer points to the start of the string, we restore it from the backup in EDX that we created in line 2. After that, EDI once again points to the string:

(gdb) p/x $edi
$13 = 0x80490c0

(compare 0x80490c0 to the output in line 2).

► Line 10: rep stosb

This is again a very powerful instruction. It copies the byte in AL (in our case the character ‘x’) to every byte in the sequence starting at EDI (in our case the string “The pool on the roof must have a leak.”). It does so exactly ECX times, so in our case for the entire length of the string, leaving the terminating null byte untouched. In other words, this instruction performs a memset, overwriting the entire string with a single character. After the instruction, the content of our string is blacked out by ‘x’s:

(gdb) x/s $edx
0x80490c0 <my_str>:	'x'

(The instruction again modifies EDI, so you have to use EDX to reference the string.)
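A matching C sketch of rep stosb, again assuming a clear direction flag; rep_stosb is a hypothetical name. It returns the final EDI, which for our string ends up on the terminating null byte, 38 bytes past the start. This is why the snippet relies on EDX rather than EDI afterwards.

```c
#include <stdint.h>

/* Emulates REP STOSB with the direction flag clear: store AL at [EDI],
 * advance EDI, decrement ECX, until ECX reaches zero. Returns final EDI. */
uint8_t *rep_stosb(uint8_t *edi, uint8_t al, uint32_t ecx)
{
    while (ecx != 0) {
        *edi++ = al;
        ecx--;
    }
    return edi;
}
```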

► Line 11: mov eax, edx

This copies the address of the string to EAX. EAX holds the return value of the function, so the snippet returns a pointer to the modified string.

C-Code

The walk-through demonstrated that the function overwrites every character in the string passed as the first function parameter with the character passed as the second argument. Here is working C code, where the function black_out corresponds to the snippet in this exercise:

#include <stdio.h>

char* black_out(char *str, char ch) 
{
    /* find length of string */
    int len = 0;
    char *str_cpy = str;
    while (*str_cpy != '\0') {
        len++;
        str_cpy++;
    }
    /* set each character of string to <ch> */
    while (len-- > 0) {
        str[len] = ch;
    }
    return str;
}

int main (int argc, char *argv[] ) 
{
    if (argc != 3) {
        /* print usage to stderr and signal the error */
        fprintf(stderr, "usage: %s string character\n", argv[0]);
        return 1;
    }
    char *blacked_out = black_out(argv[1], *argv[2]);
    printf("%s\n", blacked_out);
    return 0;
}

The function can be simplified by using the strlen and memset functions:

#include <string.h>

char* black_out(char *str, char ch) 
{
    /* find length of string */
    size_t len = strlen(str);
    /* set each character of string to <ch> */
    memset(str, ch, len);
    return str;
}

Archived Comments

Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.

eric Jun 27, 2014 00:34:55 UTC

This is such a beautifully formatted and organized writeup. I scribbled my answers on a napkin while reading this book in my car at lunch. Your rigorous approach has made me realize that I should at least invest in a notebook :)

But seriously, great work. This is the standard that all reversing write-ups should be held to. I will certainly be referring back to this for inspiration in any of my work.

steve Dec 17, 2014 14:30:38 UTC

I was sat in a coffee shop wishing for an Intel manual to decipher this exercise. Kudos to you for offering a smartphone enabled- solution!

helper Feb 06, 2015 20:10:07 UTC

Line 7: neg ecx

Doesn't "simply negates the value of ECX". It's the two's complement negation. I think this is important to mention

https://en.wikipedia.org/wi... see NEG
https://en.wikipedia.org/wi...

Bob Feb 07, 2015 22:13:28 UTC

Very nice writeup! But now that we understand >what< the code does, it begs another question: >why< was it done this way. All that arithmetic with ecx seems a bit confusing and unnecessary, since you could just as easily get the length of the string by sub edi, edx.

nop Jul 07, 2016 13:21:00 UTC

Why is there a question mark at the end of your string in the first graphic?

nop Jul 07, 2016 13:32:46 UTC

Sorry, I mean in the second one!

Johannes Bader Jul 08, 2016 13:17:09 UTC

It stands for unknown data. The string ends with the zero byte, after that might follow another string, any other data type, uninitialized data whatever.

The point is that repne scasbe will have edi point to the byte *after* the one searched for.

Matthew Nunes Sep 27, 2016 18:25:59 UTC

How to do you know the arguments to the function are of type Char? Couldn't they be an array of ints?

Johannes Bader Sep 30, 2016 08:32:51 UTC

Because "scasb" operates on bytes (~char). If it were an array of ints (DWORD), you would expect "scasd".

Sieu Truc Oct 21, 2016 21:13:08 UTC

mov al, [ebx+8] => so this is one byte move => type byte(char)

Book Reader May 05, 2017 11:13:05 UTC

Great info ! Thank you very much.