Wednesday, July 17, 2013

Postmortem Debugging with WinDBG or NTSD

I have mentioned several times before that using application verifier will help root cause and solve many bugs like for example memory bugs (leaks, heap corruptions, etc.), locking bugs, etc.  If in user mode (UM) you can just attach the debugger to your UM process and run your tests until in breaks in, or if in KM you would use a KD but the same thing applies.  

Let’s say that you have a heap corruption that is hard to track down in your UM process.  Normally you would do this:
1.       Isolate your code into a single process with a unique image name.
2.       Enable page heap on that image name using gflags.
3.       Start that process and attach a debugger (windbg)
4.       Reproduce the issue that causes the corruption and watch the debugger break in
5.       Then use the handy !heap –triage command to give you some more clues as to what is going on

Alternatively depending on what the memory problem is, in step 2, application verifier could also help.  It can be enable on your image with appverif.exe or gflags.exe.  If in KM, use driver verifier and KD. 

However with some bugs relating to race conditions, or memory corruptions, enabling these tools on your process or attaching the debugger its self could be enough to make it so that the issue doesn’t repro anymore.  That can be frustrating.

If directly debugging your process causes you to lose the repro, there are a few other techniques you can employ:
-          Tracing: adding WPP tracing to your code can be a great way to trouble shoot some issues with your code that are hard to capture with a debugger.  This is an especially good method for race conditions and deadlocks that don’t hit with a debugger.
-          KD: you can set up your machine for kernel debugging.  When you UM process has an unhandled exception, then the KD will break in.  KD is always a solid choice, but kernel debugging may be harder for some than UM debugging.  Also, some machines are hard to setup for KD like Ultrabooks where the debugable USB port may be internal to the chassis and soldered to the web cam, or it is an ARM tablet.
-          WER: Windows Error Reporting can be configured to capture dumps of your process when it crashes.  You can debug the dump after the fact.  Dump files are not as nice as debugging a live machine, but this could be a great option where your code is deployed to a lot of machines, and you want to see what happened on a machine where the bug repros.
-          Postmortem Debugging: you can configure Windows to launch and attach a debugger in response to a un handled exception.  This is called postmortem debugging.

All of these have their uses, but I will focus on postmortem debugging because it was the only way I could chase down a heap corruption that I have been trying to root cause for the past few days.

First off, you can use this link to read about it in detail if you like:

The basics are:
-          Simply type in an elevated command prompt “windbg –I” to make windbg the postmortem debugger.  This works great for processes in your login session. 
-          Use NTSD if you need to used named pipe debugging.  Type “ntsd –iae”.  You can use the –iaec “extra options” to add extra options, but it doesn’t work for –server like you need for name pipe debugging.  You need to manually edit the reg key  at \\HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug. 


Friday, June 28, 2013

GFlags.exe and Pageheap

I was tracking down a heap corruption today, and it occurred to me that I haven't mentioned how to enable pageheap.  For instance, when you use the handy "!heap -traige", if there was a heap corruption, sometimes it will tell to try the repro again with pageheap enabled.  How do you do that?

There are actually several ways to enable pageheap, but I will only talk about one, modifying the global flags using gflags.exe.  It should get installed when you install windbg.

Basically for a user mode (UM) process, you go to the image file tab, and type in the name of your exe.  If you run in a svchost.exe, then you should probably break out your service to a uniquely named servicehost.exe.  I use myhost.exe normally for debugging a service.  After that, check "Enable page heap."

Next, you need to run the code in question under  the debugger, and then you reproduce the heap corruption.  Normally the debugger will break in This time you will be able to get a lot more useful information out of !heap and !analyze.

Tuesday, June 4, 2013

See the OS Version in WinDBG

I seem to forget this command every so often, so it is ripe for a post.  Often I will be debugging a remote KD from some other team, and I want to know what version of the OS I am debugging.  The command for this is vertarget.  Super obvious.  Obviously the next most obvious command would be version, which is for the version of the debugger, and ver which is nothing.

Here is some sample output of vertarget:

0:005> vertarget
Windows 8 Version 9200 MP (4 procs) Free x64 <= looks like the output of GetVersionEx which lies after 8, and always says it Windows 8.
Product: WinNt, suite: SingleUserTS
kernel32.dll version: 6.3.9418.0 (winmain.130530-1753) <= This looks the actual version you would get fro m RtlGetVersion
Machine Name: "bob"
Debug session time: Fri May 31 19:12:24.000 2013 (UTC - 7:00)
System Uptime: 0 days 1:03:12.334
Process Uptime: 0 days 0:16:35.000
  Kernel time: 0 days 0:00:00.000
  User time: 0 days 0:00:00.000

This is what it looks like from a KD:
0: kd> vertarget
Windows 8 Kernel Version 9658 MP (4 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 9658.0.amd64fre.winmain.131120-1618
Machine Name: "bobby-brown"
Kernel base = 0xfffff801`d3e15000 PsLoadedModuleList = 0xfffff801`d40dfb90
Debug session time: Fri Nov 22 14:28:29.746 2013 (UTC - 8:00)
System Uptime: 0 days 0:45:58.445

Thursday, March 28, 2013

WinDBG: dt vs ??

dt

The dt command stands for display type.

usage:
dt var

dt address

??

The ?? command is for evaluating a C++ expression.  You can use it like dt, but you can use it to do even more.  I personally like to just use it most of the time because C are more accessible to me often than trying to remember which switches I need to use with dt.

usage:
?? sizeof(var)

?? this

?? this->m_member_obj

?? array[25].member_ptr->val

?? g_some_global

Thursday, March 21, 2013

Checking an Event's Signaled State in KM w/ WinDBG

I've talked about debugging events in the past in UM in simple terms.  !handle works great, and although is use is slightly different in UM vs KM it provides a lot of good information, but lets say you are in KD and want to know if your event is signaled or not.  !handle will not tell you that.  This is how you do it.  Note: you need to have a KD to look at the kernel's objects.

1. Find the index of handle


1: kd> dt g_hMyEvent
my_bin!g_hMyEvent
0x00001218
Void



2. Use !handle to get the address of the event object.

1: kd> !handle 0x00001218 f -1 Event

Searching for handles of type Event

PROCESS 845f4b40  SessionId: 0  Cid: 040c    Peb: 7fbb4000  ParentCid: 023c
    DirBase: 7ffea200  ObjectTable: a194d280  HandleCount: 1071.
    Image: svchost.exe

Handle table at a194d280 with 1071 entries in use

1218: Object: aa934288  GrantedAccess: 001f0003 Entry: bb591430
Object: aa934288  Type: (845692b8) Event
    ObjectHeader: aa934270 (new version)
        HandleCount: 1  PointerCount: 62

3. Cast that pointer to the dispatch header it is
1: kd> ?? (PDISPATCHER_HEADER)0xaa934288
PDISPATCHER_HEADER 0xaa934288
   +0x000 Type             : 0 ''
   +0x001 TimerControlFlags : 0 ''
   +0x001 Absolute         : 0y0
   +0x001 Wake             : 0y0
   +0x001 EncodedTolerableDelay : 0y000000 (0)
   +0x001 Abandoned        : 0 ''
   +0x001 Signalling       : 0 ''
   +0x002 ThreadControlFlags : 0x4 ''
   +0x002 CycleProfiling   : 0y0
   +0x002 CounterProfiling : 0y0
   +0x002 GroupScheduling  : 0y1
   +0x002 AffinitySet      : 0y0
   +0x002 Reserved         : 0y0000
   +0x002 Hand             : 0x4 ''
   +0x002 Size             : 0x4 ''
   +0x003 TimerMiscFlags   : 0 ''
   +0x003 Index            : 0y0
   +0x003 Processor        : 0y00000 (0)
   +0x003 Inserted         : 0y0
   +0x003 Expired          : 0y0
   +0x003 DebugActive      : 0 ''
   +0x003 DpcActive        : 0 ''
   +0x000 Lock             : 0n262144
   +0x000 LockNV           : 0n262144
   +0x004 SignalState      : 0n1
   +0x008 WaitListHead     : _LIST_ENTRY [ 0xaa934290 - 0xaa934290 ]

There you go, you can see this event is in a signaled state.

By the way, the member Type 0 means event notification.  Other common values:

    EventNotificationObject = 0,
    EventSynchronizationObject = 1,
    MutantObject = 2,
    ProcessObject = 3,
    QueueObject = 4,
    SemaphoreObject = 5,
    ThreadObject = 6,




Tuesday, March 5, 2013

Setting FILETIME Timeout for SetThreadpoolWait

Using Window's new (since Vista) Threadpool is great.  You can read up on it here.  If you want some sample code on usage patterns, message me, and I can do a post on it later.

The threadpool has a generic wait on an handle (eg. event, process, etc.) mechanism.  It even has a timeout so say you want to wait either for an event or a timeout, but the timeout is in FILETIME format.  How do you convert a specified amount say in ms of time into a FILETIME?


HRESULT DoWaitWithTimeout(
    __in DWORD Timeout) // in ms
{
    HRESULT hr = S_OK;
    PMY_CONTEXT pContext = NULL;
    FILETIME timeout = {0};

    pContext = (PMY_CONTEXT)my_alloc(sizeof(MY_CONTEXT));

    if (!pContext) {
        hr = E_OUTOFMEMORY;
        goto Exit;
    }

    ZeroMemory(pContext, sizeof(MY_CONTEXT));
    pContext->hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    if (!pContext->hEvent) {
        hr = HRESULT_FROM_WIN32(GetLastError());
        goto Exit;
    }

    pContext->hWait = CreateThreadpoolWait(
            MyWaitCallback,
            pContext,
            NULL);

    if (!pContext->hWait) {
        hr = HRESULT_FROM_WIN32(GetLastError());
        goto Exit;
    }

    if (Timeout) {
        ULARGE_INTEGER ulTimeout = {0};

        // Timeout in ms, FILETIME is in 100 ns increments
        ulTimeout.QuadPart = Timeout * -10000; 
        timeout.dwHighDateTime = ulTimeout.HighPart;
        timeout.dwLowDateTime = ulTimeout.LowPart;
    }
  
    SetThreadpoolWait(
            pContext->hWait,
            pContext->hEvent,
            &timeout);
// ...



Thursday, February 21, 2013

How to Debug a Deadlock in KD & WinDBG

In my last post, I was looking at some threads in KD and noticed there was a deadlock.  I will quickly go through the analysis of that particular deadlock.  You can review that post here.

As a quick review, I dumped the stacks of the running threads in the process.  The one that we knew was hung, was waiting to be notified.  Then the problem thread that was supposed to notify was hung waiting on an object.  Inspecting the stack led us to frame 9.  Looking at the source:


DWORD WINAPI CNotificationQueue::ThreadProc( void * pv )
{
...
        while ( 0 == s_lThreadShuttingDown )
        {
            EnterCriticalSection( &s_cs );
            poNotification = RemoveHead();
            if ( poNotification )

We can see this code is taking a lock on s_cs, a global.  

0: kd> ?? CNotificationQueue::s_cs

struct _RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x026acca0 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 0n-1
   +0x008 RecursionCount   : 0n0
   +0x00c OwningThread     : (null)
   +0x010 LockSemaphore    : 0x00000558 Void
   +0x014 SpinCount        : 0x20007d0
0: kd> !cs 0x026acca0
-----------------------------------------
Critical section   = 0x026acca0 (+0x26ACCA0)
Uninitialized or deleted.
LOCKED
LockCount          = 0xE4C5EB20
WaiterWoken        = Yes
OwningThread       = 0x00857380
RecursionCount     = 0x2658C70
LockSemaphore      = 0x0
SpinCount          = 0x00000031
0: kd>

So, this code is taking a lock on an invalid critical section, and hangs.  At this point if this bug reproduces, it would be good to run this again with Application Verifier enabled on the faulty code.

How to Set the Context & View a Thread in KD



In KD you have already:
  1. Found the process you were looking for
  2. Set the context to that process
  3. Looked at the process and its threads

Often the next thing you want to do while debugging your process in user mode (UM) is look at other threads running in your process.  That is kind of what post #3 in this series was about.

In UM you might do something like:
> ~* kn 
To see what the other the other threads are doing.

In KM the equivalent it is more like:
0: kd> !process -1 7

Normally there are a lot of system threads that you don't care about, threadpool threds, SCM threads, threads injected by the kernel, etc.  Generally you can ignore 75% of the thread stacks.

In this case, I am debugging a hang in a program.  There are two threads that are actually interesting:

        THREAD 86e35400  Cid 02b8.04d4  Teb: 7f8fc000 Win32Thread: 927e99d0 WAIT: (UserRequest) UserMode Non-Alertable
            86cfd7c8  SynchronizationEvent
            86d46838  SynchronizationEvent
        Not impersonating
        DeviceMap                 a9dee8e8
        Owning Process            88d07a80       Image:         program.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      122295         Ticks: 1859 (0:00:00:29.046)
        Context Switch Count      575            IdealProcessor: 0
        UserTime                  00:00:00.031
        KernelTime                00:00:00.015
Loading symbols for 75f20000       msvcrt.dll ->   msvcrt.dll
        Win32 Start Address msvcrt!_threadstartex (0x75f554bd)
        Stack Init a8d77ed8 Current a8d777f0 Base a8d78000 Limit a8d75000 Call 0
        Priority 10 BasePriority 8 UnusualBoost 1 ForegroundBoost 0 IoPriority 2 PagePriority 5
Loading symbols for 76590000       USER32.dll ->   USER32.dll
        Child-SP RetAddr  : Args to Child                       : Call Site
        a8d77800 82442d6c : 00000000 00000002 86d26580 825d7580 : nt!KiSwapContext+0x1a
        a8d77838 8244288c : 00000000 86e35400 825d8580 00000000 : nt!KiSwapThread+0x374
        a8d77870 8252e48c : 86e354d8 00000002 86e354d8 86e35400 : nt!KiCommitThreadWait+0x208
        a8d778b0 82699ab8 : ba7d97c8 00000004 00000000 00000000 : nt!KeWaitForMultipleObjects+0x3d0
        a8d77920 826997ae : 00000001 00000000 a8d77bc0 00000000 : nt!ObWaitForMultipleObjects+0x2f0
        a8d77bb0 824eae80 : 00000000 a8d77bc0 ffffff01 ea453738 : nt!NtWaitForMultipleObjects+0xba
        a8d77d00 824eac06 : a8d77bc0 00000e01 00000000 0000011a : nt!KiSystemService+0x100
        a8d77d40 77d8d29a : 825d84f0 00000000 761d6398 77ce5339 : nt!KiSWIException+0xe6 (TrapFrame @ a8d77d40)
        01dafa10 758ca340 : 01dafa28 00863888 01dafbb0 00000000 : ntdll!ZwWaitForMultipleObjects+0x6
        01dafa10 765ae4a2 : 01dafa28 00863888 01dafbb0 00000000 : KERNELBASE!WaitForMultipleObjectsEx+0x108
        01dafb98 765ae3a6 : 00000000 00000080 0001d4c0 00000000 : USER32!RealMsgWaitForMultipleObjectsEx+0xc2
        01dafbf8 00dad464 : 00000000 00000000 0001d4c0 765ae369 : USER32!MsgWaitForMultipleObjects+0x3e
        01dafc18 00dafbf4 : 00001cbf 00000000 00020158 00000400 : my_dll!obj4::WaitForNotification+0x74
        01dafc60 00dc5a68 : 00695f38 00000000 01dafc80 00000000 : my_dll!obj3::Listen+0x37c
        01dafdb8 00dbfbf4 : 0086f700 77e069d0 826803f9 01dafdd8 : my_dll!obj2::Execute+0x184
        01dafdd0 75f554a2 : 01dafde0 75f554a3 77d59b01 01dafe00 : my_dll!obj1::method1+0x1c
        01dafdd8 75f5550a : 77d59b01 01dafe00 01dafdf0 75f5550b : msvcrt!_callthreadstartex+0x16
        01dafde8 765116c0 : 765116b5 75f554bd 01dafdf8 765116c1 : msvcrt!_threadstartex+0x4e
        01dafdf8 77d842f0 : 01dafe10 77d842f1 00000000 77d842cd : KERNEL32!BaseThreadInitThunk+0xc
        01dafe00 00000000 : 00000000 77d842cd 00000000 00000000 : ntdll!RtlUserThreadStart+0x24

Then this the other interesting thread:

        THREAD 86ddd300  Cid 02b8.0cf8  Teb: 7f8f4000 Win32Thread: 925bbcf0 WAIT: (UserRequest) UserMode Non-Alertable
            86c45678  SynchronizationEvent
        Not impersonating
        DeviceMap                 a9dee8e8
        Owning Process            88d07a80       Image:         program.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      122295         Ticks: 1859 (0:00:00:29.046)
        Context Switch Count      164            IdealProcessor: 1
        UserTime                  00:00:00.078
        KernelTime                00:00:00.000
Loading symbols for 6ce70000      FunDisc.dll ->   FunDisc.dll
        Win32 Start Address FunDisc!CNotificationQueue::ThreadProc (0x6ce73725)
        Stack Init 933d7ed8 Current 933d7b70 Base 933d8000 Limit 933d5000 Call 0
        Priority 11 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
        Child-SP RetAddr  : Args to Child                       : Call Site
        933d7b80 82442d6c : 00000000 00000002 86e35400 825d7580 : nt!KiSwapContext+0x1a
        933d7bb8 824428c4 : 00000000 00000001 00000000 00000002 : nt!KiSwapThread+0x374
        933d7bf0 8247aa1c : 86ddd3d8 00000000 86ddd3d8 00000000 : nt!KiCommitThreadWait+0x240
        933d7c30 826f79a8 : 00100000 86c45660 00000001 86c45660 : nt!KeWaitForSingleObject+0x3b0
        933d7c98 824eae80 : 00000000 933d7cac 00000000 82937f69 : nt!NtWaitForSingleObject+0xec
        933d7d00 824eac06 : 00000001 00000d02 00000000 000000e7 : nt!KiSystemService+0x100
        933d7d40 77d8cfe2 : 8c2514f0 00000000 0068a5e0 75f6dd09 : nt!KiSWIException+0xe6 (TrapFrame @ 933d7d40)
        02bbfa08 758bc8fa : 00000000 006925c0 6ce873c4 6ce85048 : ntdll!ZwWaitForSingleObject+0x6
        02bbfa08 6ce737a6 : 00000000 006925c0 6ce873c4 6ce85048 : KERNELBASE!WaitForSingleObjectEx+0xbe
        02bbfa68 765116c0 : 0071f180 00000000 00000000 00000000 : system_dll!CNotificationQueue::ThreadProc+0x82
        02bbfab0 77d842f0 : 02bbfac8 77d842f1 00000000 77d842cd : KERNEL32!BaseThreadInitThunk+0xc
        02bbfab8 00000000 : 00000000 77d842cd 00000000 00000000 : ntdll!RtlUserThreadStart+0x24

Clearly thread 86e35400 is deadlocked waiting on thread 86ddd300 which also appears to be deadlocked waiting for an event (probably).  Since 86ddd300 deadlock is probably the issue here because is 86e35400 waiting for it, that is a good place to start.  In this case it is a system DLL.  You may source debug it directly if you have source code access to Windows system components as I have, or at least you can get symbols if you don't.

This post isn't about debugging deadlocks, so I am not going to show you how to solve this issue in this post, but as I have said in the past, Application Verifier is great at root causing even tricky deadlocks. 

This post is how to change your thread context in KD.  With WinDBG in UM, you do this like:
> ~3 s 
Where 3 is the thread number, and s is set.

In KD however, the kernel is omnipresent and all threads are accessible.  This how you change your thread context in KD:
0: kd> .thread /p /r 86ddd300
Implicit thread is now 86ddd300
Implicit process is now 88d07a80
.cache forcedecodeuser done
Loading User Symbols
.ModLoad: 00d60000 00df0000   D:\program.exe
.ModLoad: 77d30000 77e76000   C:\Windows\SYSTEM32\ntdll.dll
.ModLoad: 764a0000 76589000   C:\Windows\system32\KERNEL32.DLL
.ModLoad: 758a0000 75941000   C:\Windows\system32\KERNELBASE.dll
.ModLoad: 75f20000 75f99000   C:\Windows\system32\msvcrt.dll
...



/p <= will page in the threads pages if needed
/r <= is will do .relaod for you

Now you can use your UM favorite commands like you did in WinDBG like: kn, .frame, dv, r, etc.

Monday, February 11, 2013

How to View a Process' Threads in KD

In KD you have already:
  1. Found the process you were looking for
  2. And, set the context to that process
How do you look at the threads of that given process?  In windbg, you can type something like ~* kn to show the stacks of the process you are debugging.  That doesn't work in a KD.  How can you do something similar in KD?  You can use .process to show you detailed information about the threads in a process

!process ffffe00002d0a940 7 <= will show the stacks and detailed information about the process ffffe00002d0a940's threads

If you are only interested in the current process, then -1 should be specified.

!process -1 7 <= will show the threads from the current process' context

Setting the Process Context in KD

In my last post I showed you how to find your UM process in KD.  It produced output something like this:

kd> !process /m my_module.dll 0 0
Searching processes with loaded module 'my_module.dll'
**** NT ACTIVE PROCESS DUMP ****
Unable to get LeftChild of nt!_MMVAD_SHORT at ffffe00002d075c0
failed to count VADs
PROCESS ffffe00002d0a940
    SessionId: 0  Cid: 0394    Peb: 7ff72d81d000  ParentCid: 025c
    DirBase: 0ca19000  ObjectTable: ffffc00001423e40  HandleCount: 706.
    Image: svchost.exe

To change your context to this process, you use the .process command.

.process /r /p ffffe00002d0a940

/r <= is for reloading the symbols for the new process context
/p <= is for translating the page table entries so you can inspect the process' memory translated to KM

How To Find a User Mode Process With KD

Windbg, NTSD, and KD are based off the same debug engine so most things are the same; however, when debugging in kernel mode, some things are a little different.  Specifically you are debugging the entire computer and not an instance of the program.

How do you find your process within a kernel debugger (KD)?

Normally when you have a UM break in your KD, the debugger is put into the break's process and thread context.  What if you want to debug another process, say the RPC server?

!process is your command.  It lets you walk the kernel's list of process objects.

!process 0 0 <= will list the basic information about all processes
!process 0 7 <= will display detailed information about all process, this can be a lot
!process fb667780 7 <= will display detailed information about process fb667780, but you need to know the process ID or process address

If you don't know the process ID or address, but do know the name of the module that is in the process you are looking for, you can try this:

!process /m module.dll 0 0 <= this will give you the needed information like process ID to set the context to debug that process.

It will look something like this:

kd> !process /m my_module.dll 0 0
Searching processes with loaded module 'my_module.dll'
**** NT ACTIVE PROCESS DUMP ****
Unable to get LeftChild of nt!_MMVAD_SHORT at ffffe00002d075c0
failed to count VADs
PROCESS ffffe00002d0a940
    SessionId: 0  Cid: 0394    Peb: 7ff72d81d000  ParentCid: 025c
    DirBase: 0ca19000  ObjectTable: ffffc00001423e40  HandleCount: 706.
    Image: svchost.exe
...