Practical Reverse Engineering Solutions – Page 35 (Part V)

my go at KeInitializeDpc, KeInitializeApc and ObFastDereferenceObject

This blog post presents my solution to KeInitializeDpc, KeInitializeApc and ObFastDereferenceObject of exercise 5 on page 35 from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

Problem Statement

Decompile the following kernel routines in Windows:

  • KeInitializeDpc
  • KeInitializeApc
  • ObFastDereferenceObject (and explain its calling convention)
  • KeInitializeQueue
  • KxWaitForLockChainValid
  • KeReadyThread
  • KiInitializeTSS
  • RtlValidateUnicodeString


I’m using a virtual machine running Windows Vista Ultimate (32-bit) and the kernel debugger in WinDbg, which is part of the Windows 7 SDK. I use the great LiveKd from Microsoft Sysinternals to access the kernel debugger on the live system. The syntax of the kernel routines is taken from the Windows DevCenter.


► KeInitializeDpc


The routine has the following syntax, see Windows DevCenter:

VOID KeInitializeDpc(
  _Out_     PRKDPC Dpc,
  _In_      PKDEFERRED_ROUTINE DeferredRoutine,
  _In_opt_  PVOID DeferredContext
);


The KDPC structure looks as follows on Vista 32bit:

> dt _KDPC
   +0x000 Type             : UChar
   +0x001 Importance       : UChar
   +0x002 Number           : Uint2B
   +0x004 DpcListEntry     : _LIST_ENTRY
   +0x00c DeferredRoutine  : Ptr32     void
   +0x010 DeferredContext  : Ptr32 Void
   +0x014 SystemArgument1  : Ptr32 Void
   +0x018 SystemArgument2  : Ptr32 Void
   +0x01c DpcData          : Ptr32 Void

It’s straightforward to convert this to C++ syntax. I’m using the Windows Data Types, so the unsigned 2-byte member Number becomes a WORD (the signed 16-bit type is INT16 and the unsigned one is WORD; there’s no UWORD):

typedef struct _KDPC
{
     UCHAR Type;
     UCHAR Importance;
     WORD Number;
     LIST_ENTRY DpcListEntry;
     PVOID DeferredRoutine;
     PVOID DeferredContext;
     PVOID SystemArgument1;
     PVOID SystemArgument2;
     PVOID DpcData;
} KDPC, *PKDPC, *PRKDPC;
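A quick way to double-check such a conversion is to let the compiler compare the offsets with the dt output. The following is only a sketch: KDPC32 and LIST_ENTRY32 are hypothetical stand-in names, and every Ptr32 is modelled as a uint32_t so that the check also holds when compiled on a 64-bit host.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: pointers are modelled as 32-bit integers so the
// layout matches the 32-bit kernel regardless of the host architecture.
struct LIST_ENTRY32 {
    uint32_t Flink;
    uint32_t Blink;
};

struct KDPC32 {
    uint8_t      Type;             // +0x000
    uint8_t      Importance;       // +0x001
    uint16_t     Number;           // +0x002
    LIST_ENTRY32 DpcListEntry;     // +0x004
    uint32_t     DeferredRoutine;  // +0x00c
    uint32_t     DeferredContext;  // +0x010
    uint32_t     SystemArgument1;  // +0x014
    uint32_t     SystemArgument2;  // +0x018
    uint32_t     DpcData;          // +0x01c
};

// Compilation fails if any offset disagrees with the debugger output.
static_assert(offsetof(KDPC32, Number) == 0x02, "Number");
static_assert(offsetof(KDPC32, DpcListEntry) == 0x04, "DpcListEntry");
static_assert(offsetof(KDPC32, DeferredRoutine) == 0x0c, "DeferredRoutine");
static_assert(offsetof(KDPC32, DeferredContext) == 0x10, "DeferredContext");
static_assert(offsetof(KDPC32, DpcData) == 0x1c, "DpcData");
static_assert(sizeof(KDPC32) == 0x20, "sizeof(KDPC32)");
```

If the file compiles, the member order and sizes agree with what the debugger reports.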


The kernel debugger command uf unassembles kernel routines:

> uf KeInitializeDpc
81a41776 8bff            mov     edi,edi
81a41778 55              push    ebp
81a41779 8bec            mov     ebp,esp
81a4177b 8b4508          mov     eax,dword ptr [ebp+8]
81a4177e 8b4d0c          mov     ecx,dword ptr [ebp+0Ch]
81a41781 83601c00        and     dword ptr [eax+1Ch],0
81a41785 89480c          mov     dword ptr [eax+0Ch],ecx
81a41788 8b4d10          mov     ecx,dword ptr [ebp+10h]
81a4178b c60013          mov     byte ptr [eax],13h
81a4178e c6400101        mov     byte ptr [eax+1],1
81a41792 66c740020000    mov     word ptr [eax+2],0
81a41798 894810          mov     dword ptr [eax+10h],ecx
81a4179b 5d              pop     ebp
81a4179c c20c00          ret     0Ch

Counting the uf command as line 1, lines 2 to 4 are the prologue and the last two lines are the epilogue. Line 5 loads the first parameter Dpc into eax, line 6 loads the second parameter DeferredRoutine into ecx, and line 9 loads the third parameter DeferredContext, again into ecx. The remaining lines set members of Dpc. Here’s the decompiled version:

VOID KeInitializeDpc(
        _Out_     PRKDPC Dpc,
        _In_      PKDEFERRED_ROUTINE DeferredRoutine,
        _In_opt_  PVOID DeferredContext
)
{
    Dpc->DpcData = 0;
    Dpc->DeferredRoutine = DeferredRoutine;
    // the assembly reuses ecx for DeferredContext here; no C statement needed
    Dpc->Type = 0x13;
    Dpc->Importance = 1;
    Dpc->Number = 0;
    Dpc->DeferredContext = DeferredContext;
}
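To play with the decompiled logic outside the kernel, here is a minimal user-mode sketch. The names Dpc, ListEntry, DeferredRoutineFn, InitializeDpcSketch and MarkDone are hypothetical stand-ins, and the callback signature is simplified compared to the real PKDEFERRED_ROUTINE.

```cpp
#include <cstdint>

// Hypothetical user-mode stand-ins for the kernel types.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

using DeferredRoutineFn = void (*)(void* context);  // simplified signature

struct Dpc {
    uint8_t           Type;
    uint8_t           Importance;
    uint16_t          Number;
    ListEntry         DpcListEntry;
    DeferredRoutineFn DeferredRoutine;
    void*             DeferredContext;
    void*             SystemArgument1;
    void*             SystemArgument2;
    void*             DpcData;
};

// Mirrors the stores seen in the disassembly. DpcListEntry and the two
// SystemArgument members are left untouched, as in the listing above.
void InitializeDpcSketch(Dpc* dpc, DeferredRoutineFn routine, void* context) {
    dpc->DpcData = nullptr;
    dpc->DeferredRoutine = routine;
    dpc->Type = 0x13;
    dpc->Importance = 1;
    dpc->Number = 0;
    dpc->DeferredContext = context;
}

// Example callback: sets the bool that the context points to.
void MarkDone(void* context) {
    *static_cast<bool*>(context) = true;
}
```

Calling InitializeDpcSketch(&dpc, MarkDone, &flag) and then dpc.DeferredRoutine(dpc.DeferredContext) sets flag, which is roughly what happens when the kernel later runs a queued DPC.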

► KeInitializeApc


The routine has the following syntax, see page 132 of the book:

VOID KeInitializeApc(
    PKAPC Apc,
    PKTHREAD Thread,
    KAPC_ENVIRONMENT Environment,
    PKKERNEL_ROUTINE KernelRoutine,
    PKRUNDOWN_ROUTINE RundownRoutine,
    PKNORMAL_ROUTINE NormalRoutine,
    KPROCESSOR_MODE ProcessorMode,
    PVOID NormalContext
);


The KAPC structure looks as follows on Vista 32bit:

kd> dt _KAPC
   +0x000 Type             : UChar
   +0x001 SpareByte0       : UChar
   +0x002 Size             : UChar
   +0x003 SpareByte1       : UChar
   +0x004 SpareLong0       : Uint4B
   +0x008 Thread           : Ptr32 _KTHREAD
   +0x00c ApcListEntry     : _LIST_ENTRY
   +0x014 KernelRoutine    : Ptr32     void
   +0x018 RundownRoutine   : Ptr32     void
   +0x01c NormalRoutine    : Ptr32     void
   +0x020 NormalContext    : Ptr32 Void
   +0x024 SystemArgument1  : Ptr32 Void
   +0x028 SystemArgument2  : Ptr32 Void
   +0x02c ApcStateIndex    : Char
   +0x02d ApcMode          : Char
   +0x02e Inserted         : UChar

which looks like this in C++:

typedef struct _KAPC
{
    UCHAR Type;
    UCHAR SpareByte0;
    UCHAR Size;
    UCHAR SpareByte1;
    DWORD SpareLong0;
    KTHREAD* Thread;
    LIST_ENTRY ApcListEntry;
    PVOID KernelRoutine;
    PVOID RundownRoutine;
    PVOID NormalRoutine;
    PVOID NormalContext;
    PVOID SystemArgument1;
    PVOID SystemArgument2;
    CHAR ApcStateIndex;
    CHAR ApcMode;
    UCHAR Inserted;
} KAPC, *PKAPC;
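The same kind of compile-time check works for this structure. Again this is only a sketch with hypothetical names (KAPC32, LIST_ENTRY32) and 32-bit integers in place of the Ptr32 members. It also shows that the whole structure is 0x30 bytes, the same value the routine stores at offset +2 (the Size member).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins with 32-bit "pointers" so the layout matches the
// 32-bit kernel even on a 64-bit host.
struct LIST_ENTRY32 {
    uint32_t Flink;
    uint32_t Blink;
};

struct KAPC32 {
    uint8_t      Type;             // +0x000
    uint8_t      SpareByte0;       // +0x001
    uint8_t      Size;             // +0x002
    uint8_t      SpareByte1;       // +0x003
    uint32_t     SpareLong0;       // +0x004
    uint32_t     Thread;           // +0x008
    LIST_ENTRY32 ApcListEntry;     // +0x00c
    uint32_t     KernelRoutine;    // +0x014
    uint32_t     RundownRoutine;   // +0x018
    uint32_t     NormalRoutine;    // +0x01c
    uint32_t     NormalContext;    // +0x020
    uint32_t     SystemArgument1;  // +0x024
    uint32_t     SystemArgument2;  // +0x028
    int8_t       ApcStateIndex;    // +0x02c
    int8_t       ApcMode;          // +0x02d
    uint8_t      Inserted;         // +0x02e
};

// Compilation fails if any offset disagrees with the debugger output.
static_assert(offsetof(KAPC32, Size) == 0x02, "Size");
static_assert(offsetof(KAPC32, Thread) == 0x08, "Thread");
static_assert(offsetof(KAPC32, KernelRoutine) == 0x14, "KernelRoutine");
static_assert(offsetof(KAPC32, NormalContext) == 0x20, "NormalContext");
static_assert(offsetof(KAPC32, ApcStateIndex) == 0x2c, "ApcStateIndex");
static_assert(offsetof(KAPC32, Inserted) == 0x2e, "Inserted");
static_assert(sizeof(KAPC32) == 0x30, "sizeof(KAPC32)");
```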

The KTHREAD structure is huge – the only relevant member is:

kd> dt _KTHREAD
   +0x130 ApcStateIndex    : UChar

Finally, the parameter Environment is of type KAPC_ENVIRONMENT, which is an enum (see page 133 of the book):

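typedef enum _KAPC_ENVIRONMENT {
    OriginalApcEnvironment,
    AttachedApcEnvironment,
    CurrentApcEnvironment,
    InsertApcEnvironment
} KAPC_ENVIRONMENT, *PKAPC_ENVIRONMENT;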


Here’s the disassembly of KeInitializeApc:

81ab3956 8bff            mov     edi,edi
81ab3958 55              push    ebp
81ab3959 8bec            mov     ebp,esp
81ab395b 8b4508          mov     eax,dword ptr [ebp+8]
81ab395e 8b5510          mov     edx,dword ptr [ebp+10h]
81ab3961 83fa02          cmp     edx,2
81ab3964 8b4d0c          mov     ecx,dword ptr [ebp+0Ch]
81ab3967 c60012          mov     byte ptr [eax],12h
81ab396a c6400230        mov     byte ptr [eax+2],30h
81ab396e 7506            jne     nt!KeInitializeApc+0x20 (81ab3976)
81ab3970 8a9130010000    mov     dl,byte ptr [ecx+130h]
81ab3976 894808          mov     dword ptr [eax+8],ecx
81ab3979 8b4d14          mov     ecx,dword ptr [ebp+14h]
81ab397c 894814          mov     dword ptr [eax+14h],ecx
81ab397f 8b4d18          mov     ecx,dword ptr [ebp+18h]
81ab3982 88502c          mov     byte ptr [eax+2Ch],dl
81ab3985 894818          mov     dword ptr [eax+18h],ecx
81ab3988 8b4d1c          mov     ecx,dword ptr [ebp+1Ch]
81ab398b 33d2            xor     edx,edx
81ab398d 3bca            cmp     ecx,edx
81ab398f 89481c          mov     dword ptr [eax+1Ch],ecx
81ab3992 740e            je      nt!KeInitializeApc+0x4c (81ab39a2)
81ab3994 8a4d20          mov     cl,byte ptr [ebp+20h]
81ab3997 88482d          mov     byte ptr [eax+2Dh],cl
81ab399a 8b4d24          mov     ecx,dword ptr [ebp+24h]
81ab399d 894820          mov     dword ptr [eax+20h],ecx
81ab39a0 eb06            jmp     nt!KeInitializeApc+0x52 (81ab39a8)
81ab39a2 88502d          mov     byte ptr [eax+2Dh],dl
81ab39a5 895020          mov     dword ptr [eax+20h],edx
81ab39a8 88502e          mov     byte ptr [eax+2Eh],dl
81ab39ab 5d              pop     ebp
81ab39ac c22000          ret     20h

I’m doing the decompilation without additional software and start with a first pass to pseudocode:

    eax = Apc
    edx = Environment
    ecx = Thread
    Apc->Type = 12h
    Apc->Size = 30h
    if edx != 2 then
        # jump to +0x20
    else
        # continue with +0x1a
        # edx must be 2, so the higher bytes are 0
        edx = Thread->ApcStateIndex
    # line +0x20
    Apc->Thread = Thread
    ecx = KernelRoutine
    Apc->KernelRoutine = ecx
    ecx = RundownRoutine
    Apc->ApcStateIndex = edx
    Apc->RundownRoutine = ecx
    ecx = NormalRoutine
    edx = 0
    Apc->NormalRoutine = NormalRoutine
    if NormalRoutine == 0 then
        # line +0x4c
        Apc->ApcMode = edx # = 0
        Apc->NormalContext = edx # = 0
    else
        # continue with +0x3e
        ecx = ProcessorMode
        Apc->ApcMode = ProcessorMode
        ecx = NormalContext
        Apc->NormalContext = NormalContext
        # jump to +0x52
    # line +0x52
    Apc->Inserted = edx # = 0
Given the pseudo-code it’s easy to come up with C++ code:

VOID KeInitializeApc(
    PKAPC Apc,
    PKTHREAD Thread,
    KAPC_ENVIRONMENT Environment,
    PKKERNEL_ROUTINE KernelRoutine,
    PKRUNDOWN_ROUTINE RundownRoutine,
    PKNORMAL_ROUTINE NormalRoutine,
    KPROCESSOR_MODE ProcessorMode,
    PVOID NormalContext
)
{
    Apc->Type = 0x12;
    Apc->Size = 0x30; // offset +2 is Size; 0x30 is sizeof(KAPC)
    if( Environment == CurrentApcEnvironment )
        Apc->ApcStateIndex = Thread->ApcStateIndex;
    else
        Apc->ApcStateIndex = Environment;
    Apc->Thread = Thread; // offset +8 is Thread
    Apc->KernelRoutine = KernelRoutine;
    Apc->RundownRoutine = RundownRoutine;
    Apc->NormalRoutine = NormalRoutine;
    if( NormalRoutine == NULL ) {
        Apc->ApcMode = 0;
        Apc->NormalContext = NULL;
    }
    else {
        Apc->ApcMode = ProcessorMode;
        Apc->NormalContext = NormalContext;
    }
    Apc->Inserted = 0;
}
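The two branches are easier to see in a small user-mode sketch. Everything here is a hypothetical stand-in (ApcSketch, ThreadSketch, RoutineFn, InitializeApcSketch, DummyRoutine), only the members touched by the routine are modelled, and the enumerator order is an assumption that matches the comparison with 2 in the disassembly.

```cpp
#include <cstdint>

// Assumed enumerator order; CurrentApcEnvironment must evaluate to 2.
enum ApcEnvironmentSketch {
    OriginalApcEnvironment,   // 0
    AttachedApcEnvironment,   // 1
    CurrentApcEnvironment,    // 2, the value the disassembly compares against
    InsertApcEnvironment      // 3
};

struct ThreadSketch {
    uint8_t ApcStateIndex;
};

using RoutineFn = void (*)();  // simplified routine signature

struct ApcSketch {
    uint8_t       Type;
    uint8_t       Size;
    ThreadSketch* Thread;
    RoutineFn     KernelRoutine;
    RoutineFn     RundownRoutine;
    RoutineFn     NormalRoutine;
    void*         NormalContext;
    int8_t        ApcStateIndex;
    int8_t        ApcMode;
    uint8_t       Inserted;
};

void InitializeApcSketch(ApcSketch* apc, ThreadSketch* thread,
                         ApcEnvironmentSketch environment,
                         RoutineFn kernelRoutine, RoutineFn rundownRoutine,
                         RoutineFn normalRoutine, int8_t processorMode,
                         void* normalContext) {
    apc->Type = 0x12;
    apc->Size = 0x30;
    // Only CurrentApcEnvironment is resolved through the thread.
    apc->ApcStateIndex = (environment == CurrentApcEnvironment)
        ? static_cast<int8_t>(thread->ApcStateIndex)
        : static_cast<int8_t>(environment);
    apc->Thread = thread;
    apc->KernelRoutine = kernelRoutine;
    apc->RundownRoutine = rundownRoutine;
    apc->NormalRoutine = normalRoutine;
    if (normalRoutine == nullptr) {
        // Without a normal routine, mode and context are forced to zero.
        apc->ApcMode = 0;
        apc->NormalContext = nullptr;
    } else {
        apc->ApcMode = processorMode;
        apc->NormalContext = normalContext;
    }
    apc->Inserted = 0;
}

// Placeholder routine used to exercise the sketch.
void DummyRoutine() {}
```

Passing a null normal routine shows the second branch: ApcMode and NormalContext end up as zero no matter what the caller supplied.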

► ObFastDereferenceObject


According to this source, the routine has the following syntax:

VOID FASTCALL ObFastDereferenceObject(
    IN PEX_FAST_REF FastRef,
    IN PVOID Object
);

The EX_FAST_REF structure looks as follows:

kd> dt _EX_FAST_REF
   +0x000 Object           : Ptr32 Void
   +0x000 RefCnt           : Pos 0, 3 Bits
   +0x000 Value            : Uint4B

All members are located at offset 0, i.e., they are overlapping and likely part of a union:

typedef struct _EX_FAST_REF
{
     union
     {
          PVOID Object;
          ULONG RefCnt: 3;
          ULONG Value;
     };
} EX_FAST_REF, *PEX_FAST_REF;

The : 3 notation specifies a bit field. The first member is obviously a pointer to an Object. Because of memory alignment, those pointers need to be multiples of 8, i.e., the last 3 bits are always zero. Windows uses this space to encode another variable RefCnt. The last member Value represents the combination of the two values.
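The packing can be illustrated with plain bit masks. This is a sketch under the assumption stated above (object addresses are multiples of 8); the helper names PackFastRef, FastRefObject and FastRefCount as well as the example address are made up for illustration.

```cpp
#include <cstdint>

// The three least significant bits hold RefCnt, the rest the object address.
constexpr uint32_t kRefCntMask = 0x7;

// Combines an 8-byte aligned address and a 3-bit counter into one value.
constexpr uint32_t PackFastRef(uint32_t objectAddress, uint32_t refCnt) {
    return (objectAddress & ~kRefCntMask) | (refCnt & kRefCntMask);
}

// Recovers the object address by clearing the counter bits.
constexpr uint32_t FastRefObject(uint32_t value) {
    return value & ~kRefCntMask;
}

// Recovers the counter by keeping only the low three bits.
constexpr uint32_t FastRefCount(uint32_t value) {
    return value & kRefCntMask;
}
```

For a hypothetical object at 0x81a4b6c8 with RefCnt 5, PackFastRef yields 0x81a4b6cd, and the two accessors return the original parts again.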


Here’s the disassembly of ObFastDereferenceObject:

kd> uf ObFastDereferenceObject
81aadb39 8bff            mov     edi,edi
81aadb3b 55              push    ebp
81aadb3c 8bec            mov     ebp,esp
81aadb3e 8b0a            mov     ecx,dword ptr [edx]
81aadb40 56              push    esi
81aadb41 57              push    edi
81aadb42 8bc1            mov     eax,ecx
81aadb44 eb13            jmp     nt!ObFastDereferenceObject+0x20 (81aadb59)
81aadb46 8d4101          lea     eax,[ecx+1]
81aadb49 8bf0            mov     esi,eax
81aadb4b 8bfa            mov     edi,edx
81aadb4d 8bc1            mov     eax,ecx
81aadb4f f00fb137        lock cmpxchg dword ptr [edi],esi
81aadb53 3bc1            cmp     eax,ecx
81aadb55 7412            je      nt!ObFastDereferenceObject+0x30 (81aadb69)
81aadb57 8bc8            mov     ecx,eax
81aadb59 334508          xor     eax,dword ptr [ebp+8]
81aadb5c 83f807          cmp     eax,7
81aadb5f 72e5            jb      nt!ObFastDereferenceObject+0xd (81aadb46)
81aadb61 8b4d08          mov     ecx,dword ptr [ebp+8]
81aadb64 e8c671f9ff      call    nt!ObfDereferenceObject (81a44d2f)
81aadb69 5f              pop     edi
81aadb6a 5e              pop     esi
81aadb6b 5d              pop     ebp
81aadb6c c20400          ret     4

We can deduce the following about the calling convention:

  • The routine cleans up the stack itself with ret 4, so it uses a callee-cleans-up convention.
  • The ret 4 removes one 4-byte stack element, so exactly one of the two function parameters is passed on the stack.
  • The other function parameter must be passed in a register. The FASTCALL convention uses ecx, edx, or both. In our case, mov ecx,dword ptr [edx] overwrites ecx before it is ever read, so ecx can’t contain a function parameter. The same instruction reads edx without the function having set it before. Hence, one function parameter is passed in edx.

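Standard FASTCALL passes the first parameter in ecx and the second in edx, so this routine deviates from it slightly: only edx carries a parameter. Since edx is dereferenced as a pointer and [ebp+8] is later handed to ObfDereferenceObject, we can assume that: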

  • FastRef is passed as edx
  • Object is passed on the stack at [ebp+8]

Let’s start with the first block:

mov     edi,edi
    push    ebp
    mov     ebp,esp
    mov     ecx,dword ptr [edx]
    push    esi
    push    edi
    mov     eax,ecx
    jmp     nt!ObFastDereferenceObject+0x20 (81aadb59)

The mov edi,edi at the top is the hot patch point. The instructions push ebp, mov ebp,esp, push esi and push edi make up the function prologue. The remaining two, mov ecx,dword ptr [edx] and mov eax,ecx, create two copies of the value referenced by FastRef:

    efr_1 = FastRef->Value
    efr_2 = FastRef->Value
    GOTO nt!ObFastDereferenceObject+0x20

The previous block always jumps to offset +0x20:

    xor     eax,dword ptr [ebp+8]
    cmp     eax,7
    jb      nt!ObFastDereferenceObject+0xd (81aadb46)

The one-to-one translation of this snippet is:

    eax = FastRef->Value ^ Object
    IF eax < 7 THEN
        GOTO nt!ObFastDereferenceObject+0xd
    ELSE
        GOTO nt!ObFastDereferenceObject+0x28

For eax < 7 to hold, two things need to be true:

  1. All bits except the three least significant ones of FastRef->Value and Object need to match. Since the three least significant bits of Object are always zero due to memory alignment, this also means FastRef->Object == Object (ignoring the RefCnt bits).
  2. The three least significant bits of FastRef->Value are not 111 (=7), so the line also checks FastRef->RefCnt != 7.

We can therefore rewrite the check as:

    IF FastRef->Object == Object AND FastRef->RefCnt != 7 THEN
        GOTO nt!ObFastDereferenceObject+0xd
    ELSE
        GOTO nt!ObFastDereferenceObject+0x28
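That the single XOR comparison really equals the two-part condition can be checked with a small sketch. The function names and the example address are hypothetical, and the equivalence relies on the assumption that the object address has its three least significant bits clear.

```cpp
#include <cstdint>

// The check as it appears in the disassembly: xor, then compare with 7.
constexpr bool CanFastDereference(uint32_t value, uint32_t object) {
    return (value ^ object) < 7;
}

// The same condition spelled out: the upper 29 bits match the object
// address and the 3-bit counter is not yet saturated.
constexpr bool CanFastDereferenceExplicit(uint32_t value, uint32_t object) {
    return (value & ~7u) == object && (value & 7u) != 7;
}
```

For every counter value from 0 to 7 and an aligned object address, both functions return the same result. They only differ if the object address itself has low bits set, which the alignment assumption rules out.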

The next block is the hardest one:

   lea     eax,[ecx+1]
   mov     esi,eax
   mov     edi,edx
   mov     eax,ecx
   lock    cmpxchg dword ptr [edi],esi
   cmp     eax,ecx
   je      nt!ObFastDereferenceObject+0x30 (81aadb69)

The lock cmpxchg dword ptr [edi],esi does the following atomically:

    IF [edi] == eax THEN
        [edi] = esi
    ELSE
        eax = [edi]
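The C++ standard library offers the same semantics through compare_exchange_strong, which makes it easy to experiment with. In this sketch the wrapper CmpXchgSketch is a hypothetical helper; its eax parameter plays the role of the accumulator register and the return value corresponds to the zero flag.

```cpp
#include <atomic>
#include <cstdint>

// Models `lock cmpxchg [dest], src`:
//   if dest == eax: dest = src, returns true  (ZF set)
//   otherwise:      eax = dest, returns false (ZF clear)
bool CmpXchgSketch(std::atomic<uint32_t>& dest, uint32_t& eax, uint32_t src) {
    return dest.compare_exchange_strong(eax, src);
}
```

On success the memory location holds the new value and eax is unchanged; on failure memory is unchanged and eax receives the value currently in memory, which is why the routine can simply loop with the updated register.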

Given this, we can translate the snippet to:

    efr_2.value = efr_1.value
    efr_2.RefCnt += 1
    EX_FAST_REF esi = efr_2.value
    PEX_FAST_REF edi = FastRef
    efr_2.value = efr_1.value
    IF FastRef->value == efr_2.value THEN
        FastRef->value = esi # i.e. FastRef->RefCnt += 1
    ELSE
        efr_2 = *FastRef
    IF efr_2.value == efr_1.value THEN
        # increment successful
        GOTO nt!ObFastDereferenceObject+0x30
    ELSE
        # increment failed
        GOTO nt!ObFastDereferenceObject+0x1e

These lines do only one thing: they try to atomically increment FastRef->RefCnt. If this succeeds, the code jumps to nt!ObFastDereferenceObject+0x30; otherwise it continues at +0x1e:

    _Atomic(EX_FAST_REF) *FastRef;
    old = efr_1;    # the value loaded from *FastRef earlier
    new = old;
    new.RefCnt += 1;
    IF !atomic_compare_exchange_strong(FastRef, &old, new) THEN
        GOTO nt!ObFastDereferenceObject+0x1e
    ELSE
        GOTO nt!ObFastDereferenceObject+0x30

At 0x30 we have the function epilogue and return statement:

81aadb69 5f              pop     edi
81aadb6a 5e              pop     esi
81aadb6b 5d              pop     ebp
81aadb6c c20400          ret     4

At 0x1e we find:

    mov     ecx,eax

This just updates efr_1 with the current FastRef->value, which the failed cmpxchg left in eax, and then continues with line 0x20, which we already tackled.

The only remaining snippet is

    mov     ecx,dword ptr [ebp+8]
    call    nt!ObfDereferenceObject (81a44d2f)

It copies the second function parameter Object to ecx and calls ObfDereferenceObject. This function must use the FASTCALL convention and probably receives one parameter, Object, in ecx.

Here’s the complete pseudocode:

    efr_1 = FastRef->Value
    efr_2 = FastRef->Value
    # line +0x20
    IF FastRef->Object == Object AND FastRef->RefCnt != 7 THEN
        _Atomic(EX_FAST_REF) *FastRef;
        old = efr_1;
        new = old;
        new.RefCnt += 1;
        IF atomic_compare_exchange_strong(FastRef, &old, new) THEN
            return
        ELSE
            efr_1 = FastRef->value
            goto nt!ObFastDereferenceObject+0x20
    ELSE
        return ObfDereferenceObject(Object)

In C++ this becomes:

VOID FASTCALL ObFastDereferenceObject(
    IN PEX_FAST_REF FastRef,
    IN PVOID Object
)
{
    // treat the fast reference as one atomic 32-bit word
    _Atomic(ULONG) *Ref = (_Atomic(ULONG) *)&FastRef->Value;
    ULONG oldVal = atomic_load(Ref);
    // (oldVal ^ Object) < 7: same object pointer and RefCnt != 7
    while( (oldVal ^ (ULONG)Object) < 7 ) {
        // on failure the compare-exchange stores the current value in oldVal
        if( atomic_compare_exchange_strong(Ref, &oldVal, oldVal + 1) )
            return; // Successful
    }
    // fast dereference didn't work out, try the regular dereference routine
    ObfDereferenceObject(Object);
}
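To try the whole routine in user mode, here is a minimal sketch built on std::atomic. FastDereferenceSketch, SlowDereferenceSketch and g_slowPathCalls are hypothetical names, the fast reference is modelled as a plain 32-bit atomic, and the slow path merely counts its calls instead of doing what ObfDereferenceObject does.

```cpp
#include <atomic>
#include <cstdint>

// Counts how often the slow path runs; stands in for ObfDereferenceObject.
static int g_slowPathCalls = 0;

void SlowDereferenceSketch(uint32_t /*object*/) {
    ++g_slowPathCalls;
}

void FastDereferenceSketch(std::atomic<uint32_t>& fastRef, uint32_t object) {
    uint32_t oldVal = fastRef.load();
    // Fast path: same object address and the 3-bit counter is below 7.
    while ((oldVal ^ object) < 7) {
        // Try to bump the counter; on failure oldVal is refreshed with the
        // value currently stored in fastRef and the condition is re-checked.
        if (fastRef.compare_exchange_strong(oldVal, oldVal + 1)) {
            return;
        }
    }
    // Different object or saturated counter: fall back to the slow path.
    SlowDereferenceSketch(object);
}
```

Starting with a counter of 5, two calls take the fast path and raise it to 7; the third call finds the counter saturated and goes to the slow path, as does any call with a different object address.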

Archived Comments

Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.

Atlas Oct 11, 2014 21:34:38 UTC

Hello , sorry for bothering you , but what software did you use to debug windows kernel ?
I'm new in windows , I installed windbg and click on "kernel debug" but when I type "dt _KDPC" I have "Symbol _KDPC not found"
Thank you in advance !

Johannes Oct 13, 2014 12:33:48 UTC

I used livekd.exe to get read access to the kernel. It uses kd by default, but if you run livekd with the "-w" flag it uses WinDbg. If that fails, you can also specify the complete path to WinDbg, like so:

C:\livekd>livekd.exe -k "C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64\windbg.exe"

If the debugger can't find a symbol give the module name too, e.g, instead of "dt _KDPC" run "dt ntdll!_KDPC". If that still doesn't work you might need to manually set the symbols path before running livekd.exe:

C:\livekd>set _NT_SYMBOL_PATH=srv*c:\symbolscache*

Paul Mar 16, 2015 01:07:24 UTC


I was comparing my work with yours. I notice in your KeInitializeApc you are setting Apc->Thread = 0x30. With the structures you have and the source you have you would be correct. I think you are running into a mismatch of Vista and some other Windows version. Vista structures or source tend to be quite different than the rest of Windows at times. the 0x30 is being set to apc->Size. 0x30 is the size of the structure, think of it like apc->size = sizeof(KAPC).

soffensive Jun 20, 2017 08:26:53 UTC

You wrote: "Since due to memory alignment the remaining 3 bits are always zero this also means FastRef->Object == Object"
Why do the remaining 3 bits always have to be zero due to memory alignment? If I understand correctly, this would only be the case for 64 bit systems and not necessarily for 32 bit systems?

xorecx Aug 08, 2017 15:01:44 UTC

I think you missed something about the "lock cmpxchg dword ptr [edi],esi" command,
it initially checks if eax==&edi, (value of eax eq to value at pointer of edi), if you follow the code correctly, you should see that eax at this point is ecx which is "value at pointer of edx". and edi IS edx, so "value at pointer of edi" is the same as "value at pointer of edx". So generally this check should always be true, setting eax to be with the value of pointer at edx, which is the same value of esi, then this should always be true - unless one of the values is changed by a different thread running at the same time.

Take a look at my solution,