I have mentioned several times before that using Application
Verifier helps root cause and fix many bugs, such as memory bugs
(leaks, heap corruptions, etc.) and locking bugs. In user mode (UM) you can just attach the
debugger to your UM process and run your tests until it breaks in. In kernel mode (KM)
you would use a kernel debugger (KD), but the same approach applies.
Let’s say that you have a heap corruption that is hard to
track down in your UM process. Normally
you would do this:
1. Isolate your code into a single process with a unique image name.
2. Enable page heap on that image name using gflags.
3. Start that process and attach a debugger (windbg).
4. Reproduce the issue that causes the corruption and watch the debugger break in.
5. Then use the handy !heap -triage command to give you some more clues as to what is going on.
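Steps 2 and 3 above boil down to a couple of command lines. Here is a minimal sketch in Python that just builds them, so you can see the switches involved. The image name myservice.exe is a made-up placeholder, and the sketch assumes gflags.exe and windbg.exe are on your PATH. It only prints the commands; you would run them yourself from an elevated prompt on the Windows machine.

```python
# Sketch only: builds the command lines for enabling full page heap and
# attaching WinDbg by process name. "myservice.exe" is a hypothetical image.

def page_heap_command(image_name, enable=True):
    """Return the gflags command that turns full page heap on or off."""
    if enable:
        # /p selects page heap options; /full asks for full page heap.
        return ["gflags.exe", "/p", "/enable", image_name, "/full"]
    return ["gflags.exe", "/p", "/disable", image_name]


def attach_command(image_name):
    """Return the WinDbg command that attaches to a running process by name."""
    return ["windbg.exe", "-pn", image_name]


if __name__ == "__main__":
    image = "myservice.exe"  # hypothetical image name
    for cmd in (page_heap_command(image),
                attach_command(image),
                page_heap_command(image, enable=False)):
        print(" ".join(cmd))
```

Remember to disable page heap again when you are done (the last command printed), since the setting is stored per image name and persists until it is removed.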
Alternatively, depending on what the memory problem is,
Application Verifier could also help in step 2.
It can be enabled on your image with appverif.exe or gflags.exe. If in KM, use Driver Verifier and KD.
However, with some bugs relating to race conditions or
memory corruptions, enabling these tools on your process or attaching the debugger
itself could be enough to change the timing or memory layout so that the issue doesn’t repro
anymore. That can be frustrating.
If directly debugging your process causes you to lose the
repro, there are a few other techniques you can employ:
- Tracing: adding WPP tracing to your code can be a great way to troubleshoot issues that are hard to capture with a debugger. This is an especially good method for race conditions and deadlocks that don’t hit with a debugger attached.
- KD: you can set up your machine for kernel debugging. When your UM process has an unhandled exception, the KD will break in. KD is always a solid choice, but kernel debugging may be harder for some than UM debugging. Also, some machines are hard to set up for KD, such as Ultrabooks where the debuggable USB port may be internal to the chassis and soldered to the webcam, or ARM tablets.
- WER: Windows Error Reporting can be configured to capture dumps of your process when it crashes. You can debug the dump after the fact. Dump files are not as nice as debugging a live machine, but this could be a great option where your code is deployed to a lot of machines, and you want to see what happened on a machine where the bug repros.
- Postmortem Debugging: you can configure Windows to launch and attach a debugger in response to an unhandled exception. This is called postmortem debugging.
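For the WER option, local dump collection is driven by a few registry values under the LocalDumps key, optionally in a subkey named after your image. Here is a rough sketch in Python that builds the matching reg add commands. The image name and dump folder are hypothetical placeholders, and the choice of ten retained dumps is just an example value. Nothing is written to the registry; the commands are only printed.

```python
import subprocess

# Per-application settings live in a subkey named after the image.
LOCAL_DUMPS = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dump_commands(image_name, dump_folder, full_dump=True, dump_count=10):
    """Return 'reg add' argument lists that enable WER local dumps for one image."""
    key = LOCAL_DUMPS + "\\" + image_name
    dump_type = 2 if full_dump else 1  # 2 = full dump, 1 = minidump
    values = [
        ("DumpFolder", "REG_EXPAND_SZ", dump_folder),
        ("DumpType", "REG_DWORD", str(dump_type)),
        ("DumpCount", "REG_DWORD", str(dump_count)),
    ]
    return [["reg", "add", key, "/v", name, "/t", kind, "/d", data, "/f"]
            for name, kind, data in values]


if __name__ == "__main__":
    # "myservice.exe" and C:\Dumps are hypothetical; run the printed
    # commands from an elevated prompt on the target machine.
    for cmd in local_dump_commands("myservice.exe", r"C:\Dumps"):
        print(subprocess.list2cmdline(cmd))
```

A full dump is usually what you want for a heap corruption, since a minidump will not contain the heap contents you need to inspect.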
All of these have their uses, but I will focus on postmortem
debugging because it was the only way I could chase down a heap corruption that
I have been trying to root cause for the past few days.
First off, you can use this link to read about it in detail if
you like:
The basics are:
- Simply type “windbg -I” in an elevated command prompt to make windbg the postmortem debugger. This works great for processes in your login session.
- Use NTSD if you need to use named pipe debugging. Type “ntsd -iae”. You can use -iaec “extra options” to add extra options, but it doesn’t work for -server like you need for named pipe debugging. Instead, you need to manually edit the reg key at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug.
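To give an idea of what the manual edit looks like, here is a minimal sketch in Python that composes a Debugger value for the AeDebug key with a named pipe server added, plus the command a second debugger would use to connect. The ntsd path, pipe name, and machine name are hypothetical placeholders, and the "-p %ld -e %ld -g" tail is assumed to match what ntsd -iae wrote on your machine; check the existing value and splice -server into that rather than trusting this string blindly.

```python
AEDEBUG_KEY = r"HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug"


def aedebug_debugger_value(ntsd_path, pipe_name):
    """Compose a Debugger string that starts ntsd as a named pipe server.

    %ld placeholders are filled in by Windows with the crashing process ID
    and an event handle when the postmortem debugger is launched.
    """
    return f'"{ntsd_path}" -server npipe:pipe={pipe_name} -p %ld -e %ld -g'


def remote_client_command(server_machine, pipe_name):
    """Return the WinDbg command that connects to the ntsd pipe server."""
    return ["windbg.exe", "-remote",
            f"npipe:server={server_machine},pipe={pipe_name}"]


if __name__ == "__main__":
    # All three names below are hypothetical examples.
    print(AEDEBUG_KEY, "-> Debugger =")
    print(aedebug_debugger_value(r"C:\Debuggers\ntsd.exe", "crashpipe"))
    print(" ".join(remote_client_command("TESTBOX", "crashpipe")))
```

With a value like this in place, the crashing process sits under ntsd waiting on the pipe, and you connect to it from your own session with the printed windbg -remote command.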