Wednesday, July 17, 2013

Postmortem Debugging with WinDBG or NTSD

I have mentioned several times before that Application Verifier can help you root-cause and fix many kinds of bugs, such as memory bugs (leaks, heap corruptions, etc.) and locking bugs.  In user mode (UM), you can simply attach the debugger to your UM process and run your tests until it breaks in.  In kernel mode (KM), you would use a kernel debugger (KD), but the same approach applies.  

Let’s say that you have a heap corruption that is hard to track down in your UM process.  Normally you would do this:
1.       Isolate your code into a single process with a unique image name.
2.       Enable page heap on that image name using gflags.
3.       Start that process and attach a debugger (WinDbg).
4.       Reproduce the issue that causes the corruption and watch the debugger break in.
5.       Then use the handy !heap -triage command to get more clues about what is going on.
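Steps 2 and 3 mostly come down to a couple of commands. As a rough sketch (not the exact commands from my session), the hypothetical Python helper below builds them as argument lists. "myapp.exe" is a placeholder, gflags.exe and windbg.exe are assumed to be on PATH, and nothing is actually executed.

```python
"""Command lines for the page heap workflow (steps 2, 3 and cleanup).

Hypothetical helper: "myapp.exe" is a placeholder image name, and gflags.exe
and windbg.exe are assumed to be on PATH. Nothing is executed here; on the
test machine each argv list could be passed to subprocess.run() from an
elevated prompt.
"""
from typing import List


def page_heap_commands(image: str) -> List[List[str]]:
    """Return the argv lists for steps 2 and 3 of the workflow."""
    return [
        # Step 2: enable full page heap for processes started from this image.
        ["gflags", "/p", "/enable", image, "/full"],
        # Step 3: attach WinDbg to the already-running process by image name.
        # Once it breaks in (step 4), run !heap -triage at the WinDbg prompt.
        ["windbg", "-pn", image],
    ]


def page_heap_cleanup(image: str) -> List[str]:
    """Return the argv that turns page heap off again when you are done."""
    return ["gflags", "/p", "/disable", image]


if __name__ == "__main__":
    for argv in page_heap_commands("myapp.exe") + [page_heap_cleanup("myapp.exe")]:
        print(" ".join(argv))
```

Remember to run the cleanup command afterwards; full page heap makes every allocation much more expensive.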

Alternatively, depending on what the memory problem is, Application Verifier could also help in step 2.  It can be enabled on your image with appverif.exe or gflags.exe.  In KM, use Driver Verifier and KD. 
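As a rough sketch of those alternatives, this hypothetical helper builds the Application Verifier and Driver Verifier command lines. The image and driver names are placeholders, and the particular checks chosen (Heaps and Locks for UM, /standard for KM) are only examples; pick the layers that match your bug.

```python
"""Hypothetical helper: argv lists for Application Verifier and Driver Verifier.

"myapp.exe" and "mydriver.sys" are placeholders; appverif.exe and verifier.exe
must be run from an elevated prompt on the machine under test. Nothing is
executed here.
"""
from typing import Dict, List


def verifier_commands(image: str, driver: str) -> Dict[str, List[str]]:
    return {
        # UM: enable the Heaps and Locks checks for this image name.
        "appverif_on": ["appverif", "-enable", "Heaps", "Locks", "-for", image],
        # UM: remove all Application Verifier checks for this image again.
        "appverif_off": ["appverif", "-disable", "*", "-for", image],
        # KM: standard Driver Verifier checks for one driver (after a reboot).
        "verifier_on": ["verifier", "/standard", "/driver", driver],
        # KM: clear all Driver Verifier settings (also needs a reboot).
        "verifier_off": ["verifier", "/reset"],
    }


if __name__ == "__main__":
    for name, argv in verifier_commands("myapp.exe", "mydriver.sys").items():
        print(f"{name}: {' '.join(argv)}")
```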

However, with some bugs involving race conditions or memory corruption, enabling these tools on your process, or even attaching the debugger itself, can change the timing or memory layout enough that the issue no longer repros.  That can be frustrating.

If directly debugging your process causes you to lose the repro, there are a few other techniques you can employ:
-          Tracing: adding WPP tracing to your code can be a great way to troubleshoot issues that are hard to capture with a debugger.  This is an especially good method for race conditions and deadlocks that don’t repro under a debugger.
-          KD: you can set up your machine for kernel debugging.  When your UM process has an unhandled exception, the KD will break in.  KD is always a solid choice, but kernel debugging may be harder for some people than UM debugging.  Also, some machines are hard to set up for KD, such as Ultrabooks, where the debuggable USB port may be internal to the chassis and soldered to the webcam, or ARM tablets.
-          WER: Windows Error Reporting can be configured to capture dumps of your process when it crashes, and you can debug the dump after the fact.  A dump file is not as nice as debugging a live machine, but this can be a great option when your code is deployed to many machines and you want to see what happened on one where the bug repros.
-          Postmortem Debugging: you can configure Windows to launch and attach a debugger when a process hits an unhandled exception.  This is called postmortem debugging.
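For the WER option, one way to capture dumps locally is the documented LocalDumps registry key (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps). The sketch below generates a .reg file for a per-image subkey. The image name and dump folder are placeholders, and DumpType 2 (full dump) is only an assumption about what you want; review the output before importing it.

```python
"""Generate a .reg file that turns on WER LocalDumps for one image.

Assumptions: "myapp.exe" and C:\\CrashDumps are placeholders. Value names
follow the LocalDumps key: DumpFolder (REG_EXPAND_SZ), DumpCount and DumpType
(REG_DWORD; DumpType 0 = custom, 1 = mini, 2 = full).
"""

LOCAL_DUMPS = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows"
               r"\Windows Error Reporting\LocalDumps")


def expand_sz(value: str) -> str:
    """Encode a REG_EXPAND_SZ value as regedit does: hex(2): UTF-16LE bytes."""
    data = (value + "\0").encode("utf-16-le")
    return "hex(2):" + ",".join(f"{b:02x}" for b in data)


def local_dumps_reg(image: str, folder: str, count: int = 10,
                    dump_type: int = 2) -> str:
    if dump_type not in (0, 1, 2):
        raise ValueError("DumpType must be 0 (custom), 1 (mini) or 2 (full)")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # A per-image subkey limits dump collection to this one program.
        f"[{LOCAL_DUMPS}\\{image}]",
        f'"DumpFolder"={expand_sz(folder)}',
        f'"DumpCount"=dword:{count:08x}',
        f'"DumpType"=dword:{dump_type:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(local_dumps_reg("myapp.exe", r"C:\CrashDumps"))
```

On the target machine you could save the text with open("localdumps.reg", "w", encoding="utf-16", newline="") and import it from an elevated prompt with reg import localdumps.reg.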

All of these have their uses, but I will focus on postmortem debugging because it was the only way I could chase down a heap corruption that I have been trying to root cause for the past few days.

First off, you can use this link to read about it in detail if you like:

The basics are:
-          Type “windbg -I” in an elevated command prompt to make WinDbg the postmortem debugger.  This works great for processes in your login session. 
-          Use NTSD if you need named pipe debugging.  Type “ntsd -iae”.  You can use “-iaec” followed by a key string to append extra options, but that doesn’t work for the -server option that named pipe debugging needs.  Instead, you need to manually edit the registry key at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug.
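Editing AeDebug by hand is error prone because of .reg escaping, so here is a hedged sketch that generates the file. The ntsd.exe path and the pipe name "crashpipe" are placeholders. The -p %ld -e %ld -g arguments match what ntsd -iae writes by default (Windows fills in the crashing process ID and an event handle), and Auto="1" starts the debugger without prompting. On 64-bit Windows, 32-bit processes read the Wow6432Node copy of the key instead (HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug).

```python
"""Generate a .reg file that makes NTSD a named-pipe postmortem debugger.

Assumptions: the ntsd.exe path and the pipe name are placeholders. Windows
replaces the two %ld tokens with the process ID and an event handle; -g
continues past the initial breakpoint; Auto="1" skips the prompt.
"""

AEDEBUG = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"


def reg_sz(value: str) -> str:
    """Quote a REG_SZ value for a .reg file (escape backslashes, then quotes)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def aedebug_reg(ntsd_path: str, pipe: str) -> str:
    # -server npipe:pipe=NAME lets a remote WinDbg connect to this NTSD.
    debugger = f'"{ntsd_path}" -server npipe:pipe={pipe} -p %ld -e %ld -g'
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{AEDEBUG}]",
        f'"Debugger"={reg_sz(debugger)}',
        '"Auto"="1"',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(aedebug_reg(r"C:\Debuggers\ntsd.exe", "crashpipe"))
```

Once a process crashes and NTSD attaches, you should be able to connect from another machine with windbg -remote npipe:server=<machine>,pipe=crashpipe.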