Practical Reverse Engineering Solutions – Page 17
My go at the exercises on page 17
This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.
Exercise 1
Given what you learned about CALL and RET, explain how you would read the value of EIP? Why can’t you just do MOV EAX, EIP?
MOV EAX, EIP does not work, because EIP is not an ordinary register: on 32-bit x86 it cannot be named as an operand of MOV. There is rarely a real need to read EIP, as it is handled for you by the processor.
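The CALL instruction pushes the return address (the address of the instruction following the CALL, which is what EIP would hold next) onto the stack before jumping to the function address. So on entering the function, the return address sits at the top of the stack, at [ESP].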
We can therefore get the value of EIP by calling a dummy function read_eip (thereby placing the return address at the top of the stack), and then copying the value from the stack memory to a register, e.g., EAX:
SECTION .data
SECTION .text
GLOBAL _start
_start:
    nop
    call read_eip
    mov ebx,0
    mov eax,1
    int 080h
read_eip:
    mov eax, [esp]
    ret
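The same trick can be sketched in C. This is only an illustration, assuming GCC or Clang: `__builtin_return_address(0)` is a compiler builtin that yields the return address of the current function, which is exactly the value `read_eip` copies from `[esp]`. The names `read_ip` and `read_ip_works` are mine and not part of the exercise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C counterpart of read_eip: returns the address that the
   call instruction pushed, i.e. where the caller resumes execution.
   noinline keeps the compiler from folding the call away. */
__attribute__((noinline))
uintptr_t read_ip(void)
{
    /* GCC/Clang builtin: return address of the current function. */
    return (uintptr_t)__builtin_return_address(0);
}

/* Small self-check: the address we read back must not be null. */
int read_ip_works(void)
{
    uintptr_t ip = read_ip();
    assert(ip != 0);
    return ip != 0;
}
```

Calling `read_ip()` from `main` and printing the result with `%p` (after a cast to `void *`) should give an address inside `main`, just after the call.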
Let’s test the code with gdb. The value of EIP before calling read_eip is 0x8048061:
$ nasm -f elf32 -g -F dwarf code.asm
$ ld -m elf_i386 -o code code.o
phreak@phreak:exercise 1]$ gdb -q code
Reading symbols from code...done.
(gdb) set disassemble-next-line on
(gdb) break *_start
Breakpoint 1 at 0x8048060: file code.asm, line 5.
(gdb) run
Starting program: /home/jb/pre/chapter_1/page_17/exercise_1/code

Breakpoint 1, _start () at code.asm:5
5 nop
=> 0x08048060 <_start+0>: 90 nop
(gdb) s
6 call read_eip
=> 0x08048061 <_start+1>: e8 0c 00 00 00 call 0x8048072 <read_eip>
(gdb) p/x $eip
$1 = 0x8048061
If we inspect EAX right after the function call we get the value 0x8048066, which is now also the value of EIP.
(gdb) s
_start () at code.asm:7
7 mov ebx,0
=> 0x08048066 <_start+6>: bb 00 00 00 00 mov $0x0,%ebx
(gdb) p/x $eax
$3 = 0x8048066
(gdb) p/x $eip
$3 = 0x8048066
So in fact we get the EIP after the CALL, which is 5 bytes greater than before the CALL. These 5 bytes are the length of the CALL instruction: the opcode byte e8 plus a four-byte relative offset.
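The arithmetic can be checked against the gdb output above. The following is a small sketch with helper names of my own choosing (`return_address`, `call_target`), assuming the near-call encoding seen in the disassembly (`e8` followed by a 32-bit little-endian offset relative to the next instruction).

```c
#include <assert.h>
#include <stdint.h>

/* Length of a near CALL: opcode e8 plus a 4-byte relative offset. */
enum { NEAR_CALL_LEN = 5 };

/* Address pushed by CALL: the instruction right after the CALL. */
uint32_t return_address(uint32_t call_addr)
{
    return call_addr + NEAR_CALL_LEN;
}

/* Target of the CALL: the offset is relative to the return address. */
uint32_t call_target(uint32_t call_addr, int32_t rel32)
{
    return return_address(call_addr) + (uint32_t)rel32;
}

/* Self-check with the values from the gdb session. */
void check_against_gdb(void)
{
    assert(return_address(0x8048061u) == 0x8048066u);
    assert(call_target(0x8048061u, 0x0c) == 0x8048072u);
}
```

With the CALL at 0x8048061 and the encoded offset 0x0c, this gives the return address 0x8048066 and the target 0x8048072, matching `<read_eip>` in the disassembly.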
Exercise 2
Come up with at least two code sequences to set EIP to 0xAABBCCDD.
I know three instructions that manipulate the EIP:
- RET
- JMP
- CALL
Version 1 – Based on RET
The instruction RET jumps to the address stored at the top of the stack, i.e., it sets the EIP to the double word stored at [ESP]. So pushing the desired address onto the stack, followed by RET, should set the EIP:
SECTION .data
SECTION .text
GLOBAL _start
_start:
    nop
    push 0AABBCCDDh
    ret
We can check with the GNU debugger:
(gdb) s
6 push 0AABBCCDDh
(gdb) p/x $eip
$1 = 0x8048061
(gdb) s
_start () at version_1.asm:7
7 ret
(gdb) s
0xaabbccdd in ?? ()
(gdb) p/x $eip
$2 = 0xaabbccdd
Version 2 – Based on JMP
Instead of pushing the address on the stack and using RET to jump to an address, doing a plain JMP also works:
SECTION .data
SECTION .text
GLOBAL _start
_start:
    nop
    jmp 0AABBCCDDh
Again let’s check with the GNU debugger:
(gdb) s
6 jmp 0AABBCCDDh
(gdb) p/x $eip
$1 = 0x8048061
(gdb) s
0xaabbccdd in ?? ()
(gdb) p/x $eip
$2 = 0xaabbccdd
Version 3 – Based on CALL
CALL works similarly to JMP (compared to version 2 it additionally pushes the return address onto the stack, which is unnecessary here):
SECTION .data
SECTION .text
GLOBAL _start
_start:
    nop
    call 0AABBCCDDh
In GNU debugger:
(gdb) s
6 call 0AABBCCDDh
(gdb) p/x $eip
$1 = 0x8048061
(gdb) s
0xaabbccdd in ?? ()
(gdb) p/x $eip
$2 = 0xaabbccdd
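Although versions 2 and 3 are written with an absolute address, a near JMP or CALL with an immediate operand is normally encoded with an offset relative to the next instruction. The sketch below computes that offset. It assumes the 5-byte near form (one opcode byte plus a 32-bit offset) and takes the instruction address 0x8048061 from the gdb sessions; the function name `rel32_for` is my own.

```c
#include <assert.h>
#include <stdint.h>

/* Offset the assembler would have to encode so that a 5-byte near
   JMP/CALL located at insn_addr lands on target.
   Unsigned arithmetic wraps modulo 2^32, like the CPU does. */
uint32_t rel32_for(uint32_t insn_addr, uint32_t target)
{
    return target - (insn_addr + 5u);
}

/* Round trip: next instruction address plus offset gives the target. */
void check_round_trip(void)
{
    uint32_t rel = rel32_for(0x8048061u, 0xAABBCCDDu);
    assert(rel == 0xA2B74C77u);
    assert((uint32_t)(0x8048061u + 5u + rel) == 0xAABBCCDDu);
}
```

For the instruction at 0x8048061 and the target 0xAABBCCDD the offset comes out as 0xA2B74C77.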
Exercise 3
In the example function, addme, what would happen if the stack pointer were not properly restored before executing RET?
You can see the addme function below (called add_me in my code), with the referenced instruction marked by a comment:
SECTION .data
SECTION .text
GLOBAL _start
_start:
    nop
    mov eax, 7
    mov ecx, 5
_before:
    push eax
    push ecx
    call add_me
    add esp, 8
_after:
    mov ebx,0
    mov eax,1
    int 080h
add_me:
    push ebp
    mov ebp, esp
    movsx eax, word [ebp+8]
    movsx ecx, word [ebp+0Ch] ; second argument goes to ecx, not eax
    add eax, ecx
    mov esp, ebp ; the restore in question
    pop ebp
    retn
The restore is part of the function epilogue, which is standard for C-style functions. Resetting the ESP ensures that any values placed on the stack within the function, but not cleaned up, don’t interfere with the RET instruction. If, for instance, the function had pushed a value onto the stack but never popped it, then without the restore POP EBP and RET would read the wrong stack slots, and RET would jump to whatever double word it finds there instead of the saved return address. Restoring the ESP prevents this. But if the function properly cleans up the stack there is no need to back up and restore the ESP. In the present add_me function there are no instructions that modify the ESP between the prologue and epilogue. So there is no need to restore the ESP, and removing the instruction has no effect.
Here’s validation with the GNU debugger, first with the restore instruction:
$ gdb -q addme_with_restore
Reading symbols from addme_with_restore...done.
(gdb) break *_before
Breakpoint 1 at 0x804806b: file addme_with_restore.asm, line 9.
(gdb) break *_after
Breakpoint 2 at 0x8048075: file addme_with_restore.asm, line 14.
(gdb) run
Starting program: /home/baderj/chapter 1/page 17/exercise 3/addme_with_restore

Breakpoint 1, _before () at addme_with_restore.asm:9
9 push eax
(gdb) p/x $esp
$1 = 0xffffd000
(gdb) c
Continuing.

Breakpoint 2, _after () at addme_with_restore.asm:14
14 mov ebx,0
(gdb) p/x $esp
$2 = 0xffffd000
and the same without the restore instruction:
Breakpoint 1, _before () at addme_without_restore.asm:9
9 push eax
(gdb) p/x $esp
$1 = 0xffffd000
(gdb) c
Continuing.

Breakpoint 2, _after () at addme_without_restore.asm:14
14 mov ebx,0
(gdb) p/x $esp
$2 = 0xffffd000
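The case where the restore does matter, a function that leaves a value on the stack, is not covered by the two runs above. It can be sketched with a toy stack model in C. Everything here is hypothetical: the `struct cpu` model, the helper names, and the concrete values (0x08048070 as return address, 0x11111111 as the caller's EBP, 0xDEADBEEF as the stray value) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model: sp is an index into stack[], the stack grows downward. */
struct cpu {
    uint32_t stack[STACK_WORDS];
    int sp;        /* index of the top-of-stack word */
    uint32_t ebp;  /* models EBP (holds a stack index while used as frame pointer) */
    uint32_t eip;
};

static void push(struct cpu *c, uint32_t v) { c->stack[--c->sp] = v; }
static uint32_t pop(struct cpu *c) { return c->stack[c->sp++]; }

/* Runs CALL, prologue, one stray push and the epilogue.
   Returns the value that ends up in EIP after RET. */
uint32_t run_callee(int restore_esp)
{
    struct cpu c = { .sp = STACK_WORDS, .ebp = 0x11111111u };

    push(&c, 0x08048070u);      /* call: pushes the return address */
    push(&c, c.ebp);            /* push ebp */
    c.ebp = (uint32_t)c.sp;     /* mov ebp, esp */
    push(&c, 0xDEADBEEFu);      /* stray push that is never popped */
    if (restore_esp)
        c.sp = (int)c.ebp;      /* mov esp, ebp */
    c.ebp = pop(&c);            /* pop ebp */
    c.eip = pop(&c);            /* ret */
    return c.eip;
}
```

In this model `run_callee(1)` ends at the return address 0x08048070, while `run_callee(0)` loads the stray value into EBP and "returns" to 0x11111111, the saved EBP of the caller.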
Exercise 4
In all of the calling conventions explained, the return value is stored in a 32-bit register (EAX). What happens when the return value does not fit in a 32-bit register? Write a program to experiment and evaluate your answer. Does the mechanism change from compiler to compiler?
I use the following C code:
#include <stdio.h>

struct data {
    int n1;
    int n2;
};

struct data test_return(void) {
    struct data test_object;
    test_object.n1 = 7;
    test_object.n2 = 5;
    return test_object;
}

int main (int argc, char *argv[] ) {
    struct data ret;
    ret = test_return();
    int res = (ret.n1 + ret.n2);
    return res;
}
The struct contains two integer values and is therefore bigger than 32 bits. I use gcc to compile the code:
gcc -fno-asynchronous-unwind-tables -masm=intel -Os -S -m32 code.c
The full output is on GitHub, here’s the function excerpt:
test_return:
    push ebp
    mov ebp, esp
    mov eax, DWORD PTR [ebp+8]
    mov DWORD PTR [eax], 7
    mov DWORD PTR [eax+4], 5
    pop ebp
    ret 4
- Lines 2 and 3 are the standard function prologue.
- Line 4 loads the value at [EBP+8] from the stack into EAX.
- Lines 5 and 6 store the values 7 and 5 at the location referenced by EAX, i.e., at the address read from [EBP+8].
- Lines 7 and 8 are the function epilogue.
The return value is placed in memory at the location whose address is stored on the stack at [EBP+8]. So in order to use the function, the caller needs to reserve space for the struct in memory, and push the address onto the stack before calling the function. Compiling the C code with the -Os flag produces assembly code where the function is never called (since the return value is always 12). To see the call I recompiled the code with -O0. The function now contains unnecessary mov statements, but in essence is the same (see GitHub for full output).
The main function now does call the function:
main:
    push ebp
    mov ebp, esp
    sub esp, 20
    lea eax, [ebp-8]
    mov DWORD PTR [esp], eax
    call test_return
In line 4 the instruction sub esp, 20 reserves 20 bytes on the stack. The next two instructions compute the address EBP-8 and store it at the top of the stack. The following image shows how the stack changes for the six lines above:
The value at the top of the stack is the address of the stack memory at ESP+12 (sub esp, 20 leaves ESP at EBP-20, so EBP-8 equals ESP+12). The stack before the function epilogue, i.e., after mov DWORD PTR [eax+4], 5, looks like the right-hand side of the above image. Inside test_return, the CALL has pushed the return address and the prologue has pushed the old EBP, so the callee's EBP is 8 bytes below the caller's ESP. EAX contains the value of the memory at [EBP+8], and therefore contains the address of the stack at EBP+20. The function places the member n1 of the struct at EAX (= EBP+20) and the member n2 at EAX+4 (= EBP+24).
So, long story short, the function writes its return value into stack memory owned by the caller and returns the address of that location in EAX. The caller has to reserve the necessary space on the stack and has to pass the address of that reserved space to the function as a hidden argument (it therefore doesn’t need to check the return value, since it knows the address already). The ret 4 at the end of test_return removes this hidden argument from the stack.
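Expressed in C, the mechanism looks roughly like the sketch below. It is my hand-written approximation of what the 32-bit GCC output does, not compiler output; the names `test_return_lowered` and `sum_lowered` are hypothetical.

```c
#include <assert.h>

struct data {
    int n1;
    int n2;
};

/* The function as written in the source. */
struct data test_return(void)
{
    struct data d = { 7, 5 };
    return d;
}

/* Approximation of the generated code: the caller passes a hidden
   pointer to space it reserved, the callee fills it in and hands
   the same pointer back (in EAX on 32-bit x86). */
struct data *test_return_lowered(struct data *out)
{
    out->n1 = 7;
    out->n2 = 5;
    return out;
}

/* Caller side: reserve the space, pass its address, use the result. */
int sum_lowered(void)
{
    struct data ret;                            /* space reserved by the caller */
    struct data *p = test_return_lowered(&ret);
    assert(p == &ret);                          /* callee returns the same address */
    return ret.n1 + ret.n2;
}
```

Both variants produce the same struct, and `sum_lowered()` yields 12, the value main returns in the original program.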
I got very similar results with Clang. Again the caller reserves space for the structure and moves its address to the top of the stack (lea edx, dword ptr [ebp - 32], and mov dword ptr [esp], edx):
    sub esp, 40
    mov eax, dword ptr [ebp + 12]
    mov ecx, dword ptr [ebp + 8]
    lea edx, dword ptr [ebp - 32]
    mov dword ptr [ebp - 4], 0
    mov dword ptr [ebp - 8], ecx
    mov dword ptr [ebp - 12], eax
    mov dword ptr [esp], edx
    call test_return
Clang moves more stuff onto the stack (copies of the two arguments of main at [ebp + 8] and [ebp + 12], plus a zero), but that’s probably a matter of optimization. The function looks almost the same as for GCC:
    push ebp
    mov ebp, esp
    sub esp, 8
    mov eax, dword ptr [ebp + 8]
    mov dword ptr [ebp - 8], 7
    mov dword ptr [ebp - 4], 5
    movsd xmm0, qword ptr [ebp - 8]
    movsd qword ptr [eax], xmm0
    add esp, 8
    pop ebp
    ret 4
Unlike GCC, Clang first builds the struct in local stack space that the function reserves itself with sub esp, 8 in line 3, i.e., at [EBP-8] and [EBP-4] (at lower addresses than EBP). It then copies all 8 bytes in one go through XMM0 to the address in EAX, which is the pointer passed by the caller at [EBP+8]. So the return value still ends up in the memory reserved by the caller.
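Going by the listing, the two movsd instructions copy the 8 bytes at [ebp - 8] to the address loaded from [ebp + 8]. A rough C equivalent of that sequence is sketched below; `memcpy` stands in for the movsd pair, and the function name `test_return_clang_style` is my own.

```c
#include <assert.h>
#include <string.h>

struct data {
    int n1;
    int n2;
};

/* Approximation of the Clang output: build the struct in a local
   temporary first, then copy it as one block to the caller's buffer. */
struct data *test_return_clang_style(struct data *out)
{
    struct data tmp;                 /* local slot, like [ebp - 8] */
    tmp.n1 = 7;
    tmp.n2 = 5;
    memcpy(out, &tmp, sizeof tmp);   /* stands in for the two movsd instructions */
    return out;
}

/* Self-check: the caller's buffer receives both members. */
void check_clang_style(void)
{
    struct data ret = { 0, 0 };
    assert(test_return_clang_style(&ret) == &ret);
    assert(ret.n1 == 7 && ret.n2 == 5);
}
```

The observable result is the same as with the GCC version; only the intermediate temporary differs.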
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.