Friday, August 24, 2012

C++11 and Modern C++ for Modern Nay Metro Nay Windows 8 Style Apps for the C Programmer

As a developer who almost exclusively does low-level C and C like C++ systems programming for Windows and avoids high-level abstractions like STL, looking at the modern Windows 8 code samples which are decidedly C++11, one can feel like one is reading code from some other language.  At first I was a little bemused at the C++11 syntax used heavily throughout the samples, but as a developer one must keep current and always be moving forward, or become obsolete … right!? RIGHT!?!?  I guess you only need so many Windows kernel programmers.

If you want, you can crazy and program modern apps using JavaScript or C# as you would C++ in the new Windows runtime, but I will start with getting my feet wet with C++.  Plus C# and JavaScript will eventually get thunked down to C++ for the runtime projections, and C++ code calls directly into the projections without thunking.  I guess if you are going with JavaScript or C#, you don’t really care about those kind of things anyway, but I digress.  

I found this MSDN article which is a nice primer.  Read this first: http://msdn.microsoft.com/en-us/library/hh279654.aspx
I think it does a good job going over the big ideas of what this C++11 paradigm in VS12 is about.  To me, it feels like they were trying to take the goals of various C++ programming models found in previous versions of STL, ATL, COM, and so on and refine it to the next "modern" version of C++.  I was resistant at first, but maybe C++11 is for me as I am not a big fan of those C++ programming models because of their various shortcomings and generally avoid them.

You can read all of the official documentation if you have time, but I just like to jump in with the code and get my hands dirty.  Often you learn more anyway when you have to accomplish something.  I am going to start with the device enumeration RT projection, because I normally do a lot of device-related programming and I have done a few posts in the past on how to enumerate devices with the various Windows APIs.

http://code.msdn.microsoft.com/windowsapps/Device-Enumeration-Sample-a6e45169

This sample shows you how to enumerate device containers and interfaces.  The same API is also used for devnodes and everything else.  So, jumping to the containers.xaml.cpp file which enumerates containers you will find the EnumerateDeviceContainers method which does the actual work of enumerating.

void Containers::EnumerateDeviceContainers(Object^ sender, RoutedEventArgs^ eventArgs)
{
    Windows::UI::Xaml::FocusState focusState = EnumerateContainersButton->FocusState;
    EnumerateContainersButton->IsEnabled = false;

    DeviceContainersOutputList->Items->Clear();

    auto properties = ref new Vector();
    properties->Append("System.ItemNameDisplay");
    properties->Append("System.Devices.ModelName");
    properties->Append("System.Devices.Connected");

    task(
        PnpObject::FindAllAsync(PnpObjectType::DeviceContainer, properties))
        .then([this](PnpObjectCollection^ containers)
    {
        rootPage->NotifyUser(containers->Size + " device container(s) found\n\n", NotifyType::StatusMessage);
;

        std::for_each(begin(containers), end(containers),
            [this](PnpObject^ container)
        {
            DisplayDeviceContainer(container);
        });
    });

    EnumerateContainersButton->IsEnabled = true;
}
  

C++/CX Pointers and Handles

For the C++11 uninitiated, there are a lot of weird things going on here.  First off, what does those carat/hat '^' instead of the '*'  mean by the types (ex. PnpObjectCollection^ containers)?
The very basic idea is that * types are the standard pointers that you are familiar with, and the ^ types are really handles to managed allocations.  The simplistic idea is that these var^ var allocation lifetimes are managed for you (think: ATL smart pointers).


Lambda Expressions

Lambdas are used all over the place in the samples.  What are they?  Simplistically they are functions that you can define inline or declare in say a method  ... more or less. ;)

You May go directly to the official documentation on lambdas on MSDN if you want to get it straight from the horse’s mouth.

The syntax: http://msdn.microsoft.com/en-us/library/dd293603.aspx
Use examples of lambdas: http://msdn.microsoft.com/en-us/library/dd293599.aspx

        std::for_each(begin(containers), end(containers),
            [this](PnpObject^ container)
        {
            DisplayDeviceContainer(container);
        });

From the sample, you see this for_each iterator from std.  The last parameter of the for each is the lambda.


[this] <= means you are "capturing" the "this" variable by value
(PnpObject^ container) <= parameter list for the function
{DisplayDeviceContainer(container);} <= this is the body of the function.

Asynchronus Programming

More details:

You may be confused by this syntax:
task(
        PnpObject::FindAllAsync(PnpObjectType::DeviceContainer, properties))
        .then([this](PnpObjectCollection^ containers)
    {
What is this task class actually doing?  Simplistically put, it means execute this task asynchronously.  



Tuesday, June 19, 2012

How to Safely Walk a MultiSz String List

You may have ran into MultiSz string lists in your Windows programming.  They are used in the registry (REG_MULTI_SZ), as device properties (DEVPROP_TYPE_STRING_LIST), et cetera.  The format of the multisz isn't standardized either.  The format really depends on the API that you are using, so walking these multisz can be tricky.  When there is tricky code in C, programmers often get it wrong and introduce bugs.  The code below will work correctly regardless of what kind of multisz is passed in.

What is a MultiSz?

The basic idea is that you take some strings, concatenate them together, and add add an extra NULL at the end.  The idea is kind of cool.  You only need one allocation for many strings.  They can also help reduce heap fragmentation by not having a lot of small allocations.  But like I mentioned before, programmers screw up using them all of the time.  Also, they are put at an unfair disadvantage with multiszs not being standardized.

simple many string ex.  Lets take two strings L"str1" and L"str2".  As strings by themselves, they would look like wchar_t buffers five elements long.

L’s’
L’t’
L’r’
L’1’
L’\0’
L’s’
L’t’
L’r’
L’2’
L’\0’


As a muiltisz, they would look like this:

L’s’
L’t’
L’r’
L’1’
L’\0’
L’s’
L’t’
L’r’
L’2’
L’\0’
L’\0’


You can now extrapolate what a multisz would look like with many strings in it.  What if there is only one string?  What is the valid format?


single string ex.  This is what the buffer would look like for L"str1".  Same as the last example.

L’s’
L’t’
L’r’
L’1’
L’\0’


As a multisz, it could look like this:

L’s’
L’t’
L’r’
L’1’
L’\0’
L’\0’


But, it could actually look like this too:

L’s’
L’t’
L’r’
L’1’
L’\0’


Which one is correct?  Both are; deal with them and don't AV.  Remember the is no standard for these kind of cases.

empty list ex. Lets say you have a empty list.  What are the possible formats?

NULL is possible.


A buffer like this is valid:

L’\0’


And a buffer like this is also valid:

L’\0’
L’\0’


When style should you code for?  All of them.

Code Example for Walking MultiSzs

This C function walks two MulitSzs.  I think the function is selfexplanatory, but you pass in a string list, called StringList, that what to check to see if it contains any strings from another list, ContainsStringList.  If there is at least one string match between the two lists, it returns true, else false.  There is also an ignore case flag.  

/******************************************/
BOOL StringListContains(
        __in PCZZWSTR StringList,
        __in PCZZWSTR ContainsStringList,
        __in BOOL IgnoreCase)
{
    BOOL Result = FALSE;
    PCZZWSTR Temp = NULL;

    for (; *ContainsStringList; ContainsStringList += wcslen(ContainsStringList) + 1) {
        for (Temp = StringList; *Temp; Temp += wcslen(Temp) + 1) {
            if ((IgnoreCase && !_wcsicmp(Temp, ContainsStringList))
                    || (!IgnoreCase && !wcscmp(Temp, ContainsStringList))) {
                Result = TRUE;
                break;
            }
        }

        if (Result)
            break;
    }

    return Result;
}
/****************************************/

There is one thing to point out, that may be confusing; I am not checking for NULL before dereferencing the string lists and I did say you should handle an empty string list being NULL.  SAL "__in" annotation implies that they cannot be NULL.  Use SAL annotations and run prefast against your code unless you are the type that likes to waste time finding silly bugs later.  If you don't use SAL, just check for NULL in the for (ex. for (; ContainsStringList && *ContainsStringList; ...)

Tuesday, June 5, 2012

Quality: Tracking Down Memory Usage With Application Verifier, TruScan, and UMDH

This post is about how to drive up code quality specifically relating to memory usage.  These are things you should consider doing before you ship your code even if you feel like your code's memory usage is where it should be.  I will be talking about three tools here: Appverif, TruScan, affiliated with TTT (Time Travel Tracing aka iDNA), and UMDH.

Step 1: Enable Application Verifier on your code and attach Windbg to the process!  Get it here.  I think I may have said this a million times in this blog, but it will find almost all common C/C++ type defects, not just leaks.  This should be part of your normal workflow.  Period.  Always validate significant changes with appverif.  If you have this part of day to day development workflow, then you will not miss most defects.  If you are writing drivers, there is also Driver Verifier.

Step 2: Use TruScan.  TruScan utilizes TTT to keep track of all of your allocations.  It knows, correctly most of the time, when there are no more references to allocated memory and the memory is leaked.  It kind of takes a while, so this is something good to do like at a end of a milestone and you feel your code is mostly stable.

Step 3: User Mode Heap Dump (umhd.exe) is where you go next when all leaks are eradicated.  Your typical working set is not where you would like it to be, so you can use UMHD to help you profile your code to see what stacks are the biggest memory hogs.  After you identify these code paths, you should rationalize whether or not the memory usage was worth the cost, or maybe you were doing something dumb that should be changed.

Actually, there is a Step 0.

Step 0: Architect and design your code to be high performance and efficient to begin with.  Yes, the Analysis of Algorithms course you take in your CS program really is important.  Common sense often also applies here.

I didn't go into details on how to use these tools.  I figured you should be able to find them on the web, and their documentation should get you off the ground.  Feel free to give some feed back if you want more instructions on these tools.

Thursday, March 29, 2012

Targeting Only Specific Moduels with Appverifier



I have always been a proponent of using Appverifier whenever you are debugging your code with Windbg, but lets say you keep hitting verifier breaks in other modules that you aren't currently trying to debug.  Aside from being a good practice to fix all verifier breaks, and it is good to help other teams debug their components, sometimes you just want those breaks to go away to you can focus on your code.  After enabling Appverifier, you can do just that in the debugger.

Lets say you only care about foo.dll.  This is how you can have verifier only enabled on that module.

0:005> !avrf -skp all
Verifier package version >= 3.00
Exclusion ranges and suspend period have been reset.
0:005> !avrf -trg foo

It is that simple.

From Windbg help on !avrf.
"
-trg [ Start End | dll Module | all ]
Specifies a target range. Start is the beginning address of the target range. End is the ending address of the target range. Module specifies the name (including the .exe or .dll extension, but not including the path) of a module to be targeted. If you enter -trg all, all target ranges are reset. If you enter -trg with no additional parameters, the current target ranges are displayed.
-skp [ Start End | dll Module | all | Time ]
Specifies an exclusion range. Start is the beginning address of the exclusion range. End is the ending address of the exclusion range. Module specifies the name of a module to be targeted or excluded. Module specifies the name (including the .exe or .dll extension, but not including the path) of a module to be excluded. If you enter -skp all, all target ranges or exclusion ranges are reset. If you enter aTime value, all faults are suppressed for Time milliseconds after execution resumes.
"

Tuesday, March 20, 2012

Windows Kernel Function Prefixes


When kernel debugging, you will see a lot of windows kernel internal function names with two letter prefixes (assuming you have symbols).  Knowing what the prefixes mean can help you figure out what is going on.  I will give you a quick rundown of some of the basics.

Common Prefixes:
Cc
Cache manager
Cm
Configuration manager
Ex
Executive support routines
FsRtl
File system driver run time lib
Hal
Hardware abstraction layer
Io
IO manager
Ke
Kernel
Lpc
Local procedure call
Lsa
Local security authority
Mm
Memory manager
Nt
System services
Ob
Object manager
Po
Power manager
Pp
PnP manager
Ps
Process support
Rtl
Runtime lib
Se
Security
Wmi
Windows management instrumentation
Zw
Kernel version of Nt functions

Within these prefixes, there are variations to denote internal (second letter changed to an 'i') or private functions (an extra p is tacked on the end of the prefix.  For instance, an internal PnP function would have the Pi prefix instead of Pp.

Monday, March 19, 2012

Windows System Events + Windbg Debugging

Events are great ways to synchronize threads, processes, and even UM and KM code.  Creating and waiting for events is almost always the best (most efficient) way to synchronize.  Spin waiting is generally bad (ie do {/*empty*/} while (!bFlag);), and spinning with a sleep is worse because it will surely yield the CPU loosing at least an order of magnitude of more cycles.

What is an event?  The simple answer is it is a kernel object.  At the fundamental level, it is a data structure in the kernel.  In fact, because an event is the simplest kernel object, its structure is the header for all other kernel objects.

Basics:

This is how you create an event.  Crating an event, if successful, returns a handle to the corresponding kernel object.  This is how you close an event.  You close the handle to it like you would for any other kernel object.  To signal the event, you use SetEvent, and to reset the event, you use ResetEvent if it is set to be manually reset.  Nothing too complicated here.  To wait for events, you simply use the normal wait functions like WaitForSingleObject.  Obviously after you create an event, you must close the handle when you are done with it.  You are also required to close the handle for each time you duplicate one.  Most of this stuff applies to all kernel objects because like I mentioned before, the heeders of their structures are the same (they are the same as events).


Debugging:


Many common handle bugs are easily found by just doing a code review.  For instance, for each event create, make sure there is a close handle.  Make sure you don't close the handle before you are done using it.  Etc.  If a code review doesn't do it for you, application verifier can auto detect many handle related bugs.  Use it, love it.


Windbg: next regular debugging can help a lot to validate your expectations.  Next the debugger extension !handle is your friend.  I can show detailed information about your handles.  Some of the information about handle are only available in KM.

0:000> !handle
Handle 4
  Type          Section
Handle 8
  Type          Event
Handle c
  Type          Event
Handle 10
  Type          Event
Handle 14
  Type          Directory
Handle 5c
  Type          File
6 Handles
Type            Count
Event           3
Section         1
File            1
Directory       1

0:002> !handle 160 7
Handle 160
  Type          Event
  Attributes    0
  GrantedAccess 0x1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount   2
  PointerCount  65
  Name          


Wednesday, March 7, 2012

How to Setup a KD (Kernel Deugger) in Windows With 1394 or Over the Network

Lets say you are starting to write drivers and need some kernel mode (km) debugging, or lets say you've decided that user mode (UM) debugging using windbg on the host is for sissies.  In this post I will show you how to setup a KD.

Assumptions:
You will need two machines: the TARGET machine that you want to debug, and the HOST machine that you will be doing the actual debugging.

First thing you need to do is install windbg on both the target and host.  You can find the installer here.

Pick a You KD Method:
Next decide what kind of debugging you want to do.  The options are:
NET (i.e. debugging over a TCP/IP network just using NICs) (supported on Win8+),
1394 (supported on WinXP+),
COM (serial) (supported since the dawn of KD), or
USB (2.0 supported on Vista+, 3.0 in Win8+)

Generally the port you use is decided for you based on what OS you need to debug, and what hardware your machines have.  I will make it simple, use 1394 (aka firewire) if you can, or if the machines aren't close, net.

If you two machines are next to each other favor 1394.  If you are going to be kernel debugging often and don't have 1394 in your machines, buy some cards.  1394 is simple and fast.

If your target isn't close to your debugger machine, use net, short for network, debugging, but note it is a Win8+ feature at the moment.  Net debugging is also great for getting someone else to remote debug something.  Also in Win8, over 90+% of the mainstream NICs are supported for net debugging; most Intel, Broadcom, and Realtek NICs are supported.

COM is slow, but works assuming your machines have serial ports.

USB might be a choice if your USB controllers support kernel debugging.  In my experience, they rarely do.  This is especially true when the machine doesn't have 1394 and you can't net debug.  You are kind of screwed at this point.  The joke is even funnier when you do find a port that does support KD, but it is internally wired to the built in webcam, or doesn't have an external port.

Setting Up a 1394 KD

TARGET

  1. open a command prompt
  2. bcdedit -debug on
  3. bcdedit -dbgsettings 1394 channel:1
    - you will have to pass bus params if you have more than one 1394 controler)
    - channel can be 1-62
  4. reboot


HOST

  1. plug in 1394 cable into target and host
  2. open a command prompt
  3. kd -k 1394:channel=1

    windbg work instead of kd as well
Setting Up a NET KD

TARGET
  1. open a command prompt
  2. bcdedit -dbgsettings net hostip:192.168.1.11 port:50000
    - for hostip, put your machine's IP instead of  192.168.1.11
    - you can pick whatever TCP port you want as long it is between 49151 and 65536.
  3. It will output something like:
    "Key=
    aaaaaaaaaaaaa.vvvvvvvvvvvvv.yyyyyyyyyyyyy.xxxxxxxxxxxxx"
    Save that string in a text file to a thumb drive or network share, you will need it again on the host
  4. bcdedit -debug on
  5. reboot
HOST
  1. open a command prompt
  2. windbg -k net:port= 50000,key=aaaaaaaaaaaaa.vvvvvvvvvvvvv.yyyyyyyyyyyyy.xxxxxxxxxxxxx

    you can use kd instead of windbg if you want



Friday, March 2, 2012

Debugging Heap Failures

Sometimes heap failures can be mysterious, but they don't have to be.  For instance, today I got this kd (kernel debugger) break.

////////////////////////////////////////////////////////////////////////////
Output of !analyze -v



*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************


Loading symbols for 680c0000          component.dll ->    component .dll
Loading symbols for 75fb0000     KERNEL32.DLL ->   KERNEL32.DLL
Force unload of C:\Windows\SYSTEM32\user32.dll
Loading symbols for 76680000       user32.dll ->   user32.dll
ModLoad: 76680000 767a1000   C:\Windows\SYSTEM32\user32.dll
Force unload of C:\Windows\system32\ole32.dll
Loading symbols for 76560000        ole32.dll ->   ole32.dll
ModLoad: 76560000 76672000   C:\Windows\system32\ole32.dll
Debugger Dbgportaldb Connection::Open failed 80040e4d
Database Dbgportaldb not connected


FAULTING_IP:
ntdll!RtlReportCriticalFailure+33
001b:77d912d6 cc              int     3


EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 77d912d6 (ntdll!RtlReportCriticalFailure+0x00000033)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 83b70d40
   Parameter[2]: 0000fffd


ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.


EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid


EXCEPTION_PARAMETER1:  00000000


EXCEPTION_PARAMETER2:  83b70d40


EXCEPTION_PARAMETER3:  0000fffd


NTGLOBALFLAG:  0


APPLICATION_VERIFIER_FLAGS:  0


APP:  ntkrpamp.exe


LAST_CONTROL_TRANSFER:  from 77d924a1 to 77d912d6


FAULTING_THREAD:  ffffffff


BUGCHECK_STR:  APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_lfh_bitmap_mismatch


PRIMARY_PROBLEM_CLASS:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_lfh_bitmap_mismatch


DEFAULT_BUCKET_ID:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_lfh_bitmap_mismatch


STACK_TEXT:
0225f520 77d924a1 c0000374 77dc0130 0225f564 ntdll!RtlReportCriticalFailure+0x33
0225f530 77d9168d 00000002 934d01f7 00000016 ntdll!RtlpReportHeapFailure+0x21
0225f564 77d6434b 0000000e 00ce0000 00dd22d0 ntdll!RtlpLogHeapFailure+0xa2
0225f604 77cc3561 00000050 00fa6728 00000000 ntdll!RtlpLowFragHeapAllocFromContext+0x2d4
0225f68c 680ca40f 00ce0000 00000000 00000050 ntdll!RtlAllocateHeap+0x105
0225f6d4 680c981a 00fa6728 00d4b774 00000000  component !DoMoreWork+0x37
0225f768 680c8f84 00000001 00d4b774 00000000  component !DoWork+0xce
0225f918 680d0169 00d27c48 02483530 02483590  component !Query+0x12b
0225f92c 77cf5935 00d4d170 00000000 02483530  component !WorkDispatchThreadProc+0x99
0225fa70 77cd9139 0225fad4 02483590 934d08cb ntdll!TppWorkpExecuteCallback+0x338
0225fc58 75fb2a32 00cef980 0225fca4 77d0cdfe ntdll!TppWorkerThread+0x6da
0225fc64 77d0cdfe 00cef980 934d0837 00000000 KERNEL32!BaseThreadInitThunk+0xe
0225fca4 77d0cdaa ffffffff 77d88566 00000000 ntdll!__RtlUserThreadStart+0x4a
0225fcb4 00000000 77d0d633 00cef980 00000000 ntdll!_RtlUserThreadStart+0x1c


...



1: kd> .frame 2
02 0225f564 77d6434b ntdll!RtlpLogHeapFailure+0xa2 [d:\5858\minkernel\ntos\rtl\heaplog.c @ 672]
1: kd> dv
0225f56c              FailureType = heap_failure_lfh_bitmap_mismatch (0n14)
0225f570              HeapAddress = 0x00ce0000
0225f574                  Address = 0x00dd22d0
0225f578                   Param1 = 0x00000000
0225f57c                   Param2 = 0x00000000
0225f580                   Param3 = 0x00000000



////////////////////////////////////////////////////////////////////////////////


My code in question is uninteresting:



    pData = (PDATA)HeapAlloc(GetProcessHeap(), 0, sizeof(DATA) * m_cData);

    if (!pData) {
        hr = E_OUTOFMEMORY;
        goto Exit;
    }

//////////////////////////////////////////////////////////////////////


So what is a HEAP_FAILURE_LFH_BITMAP_MISMATCH?  I wasn't sure, so I had to find out.  First off LFH refers to the Low-Fragmentation Heap which became the default in Windows Vista; you can read more about here.  Basically the LFH uses bitmap masks to track whether blocks are free or busy.  This information is also available in each LFH block's metadata.  This failure indicates that the busy status between these two do not agree and is therefore corrupted.  This still doesn't solve the mystery.  Luckily there is a debugger extension help us find more clues, !heap.

/////////////////////////////////////////////////////////////

1: kd> !heap -triage ce0000
**************************************************************
*                                                            *
*                  HEAP ERROR DETECTED                       *
*                                                            *
**************************************************************


Details:


Heap address:  00ce0000
Error address: 00dd22d0
Error type:    HEAP_FAILURE_LFH_BITMAP_MISMATCH
Details:       The LFH detected a mismatch between an individual
               block's metadata and its corresponding subsegment's
               metadata.
Follow-up:     Enable pageheap.
Error type: Unrecognized failure.
Follow-up:  This may be a bug in the extension. Send a
            remote or dump to ______.




Stack trace:
                77d6434b: ntdll!RtlpLowFragHeapAllocFromContext+0x000002d4
                77cc3561: ntdll!RtlAllocateHeap+0x00000105
                680ca40f: component!DoMoreWork+0x00000037
                680c981a: component!DoWork+0x000000ce
                680c8f84: component!DoWork+0x0000012b
                680d0169: Component!Query+0x00000099
                77cf5935: ntdll!TppWorkpExecuteCallback+0x00000338
                77cd9139: ntdll!TppWorkerThread+0x000006da
                75fb2a32: KERNEL32!BaseThreadInitThunk+0x0000000e
                77d0cdfe: ntdll!__RtlUserThreadStart+0x0000004a
                77d0cdaa: ntdll!_RtlUserThreadStart+0x0000001c


**********************************************************
** !heap: Searching for the heap and segment that
**        contain the specified address. To search
**        for the entry that contains this address,
**        use !heap -x 00ce0000.
**********************************************************


** !heap: Analyzing heap at 00ce0000...


** !heap: The following LFH allocations are missing a flag in their
          'unused bytes' field that identifies them as LFH allocations.
          This is usually caused by entry corruption in the client
          application.
** !heap: To view the state of the invalid blocks:
          !heap -i
          !heap -i


Heap address        Entry address       Unused bytes
----------------------------------------------------------------------------
ce0000              dd1b98              49
ce0000              dd22d0              49
ce0000              dd2590              49
ce0000              dd1b98              49
ce0000              dd22d0              49
ce0000              dd2590              49
ce0000              dd1b98              49
ce0000              dd22d0              49
ce0000              dd2590              49
ce0000              dd1b98              49
ce0000              dd22d0              49
ce0000              dd2590              49




** !heap: If these failures are easily reproducible, they can
          be detected as they occur by enabling pageheap for
          this scenario.


1: kd>
///////////////////////////////////////////////////////////////////////////////////////

Ok, still no luck.  I will try enabling pageheap if this issue is reproducible.

This is how you enable page heap verification.

//////// update /////////

The corruption of the heap is indicative of a buffer overrun.  I think I have been able to pinpoint the source.  I have a RTL linked list that tracks the state of sub-operations within a larger RPC client driven operation.  It turned out there were two latent conditions where this list was not correctly locked: the first one was when a sub-operation failed to initialize and I would remove it out of the list without locking it, and the second was when sub-operations in other threads would send a state update (which causes the lock to be taken) at the same moment when the list was getting torn down.  Obviously removing elements from the list while some other thread is actively traversing it can cause the other thread to be executing on bad Flinks.  Fixing these issues should make this corruption go away.  This was a very seldom repro on x86 and AMD64, but apparently common on ARM.

RTL linked lists

critical sections


Friday, February 17, 2012

Conditional WPP Tracing with WPP_LEVEL_ENABLED

In general WPP tracing is a no-op if tracing is not enabled on your component.  For instance, you don't need to worry about if this line is making your production code slower when tracing is not enabled:

LogTrace(MY_ERROR_LEVEL, L"Failed to start the RPC subsystem. hr = 0x%08x", hr);

But, there are times when you would like to trace something more involved than capturing an error and HRESULT.

For instance in this code, I wanted to be able to trace what SIDs I was generating, so to trace the string of the SID, some memory needed to be allocated.  I didn't want this to happen in my production code if tracing wasn't enabled.



LPWSTR pszSid = NULL;


if (!ConvertSidToStringSid(pTokenGroups->Groups[i].Sid, &pszSid)) {
    LogTrace(DAS_ERROR_LEVEL, "Could not generate SID string 0x%x\n", HRESULT_FROM_WIN32(GetLastError()));
}
else {
    LogTrace(DAS_INFO_LEVEL, "Created process SID %ls\n", (LPCWSTR)pszSid);
}


if (pszSid) {
    LocalFree(pszSid);
}


Sure, the LogTrace is a no-op if tracing isn't enabled, but I will always be allocating the string regardless.  I didn't want that in my production code.  This is where the WPP_LEVEL_ENABLED macro comes in.

WPP_LEVEL_ENABLED(level) can check to see if a certain level is currently enabled.  It is generated by the preproccessor, and uses the same mechanism the other WPP generated functions use to decided when to actually do work.

So, to fix that code snip:


if (WPP_LEVEL_ENABLED(MY_INFO_LEVEL) || WPP_LEVEL_ENABLED(MY_ERROR_LEVEL)) {
    LPWSTR pszSid = NULL;


    if (!ConvertSidToStringSid(pTokenGroups->Groups[i].Sid, &pszSid)) {
        LogTrace(MY_ERROR_LEVEL, "Could not generate SID string 0x%x\n", HRESULT_FROM_WIN32(GetLastError()));
    }
    else {
        LogTrace(MY_INFO_LEVEL, "Created process SID %ls\n", (LPCWSTR)pszSid);
    }


    if (pszSid) {
        LocalFree(pszSid);
    }
}

Now ConvertSidToStringSid will not be called and the string will not be allocated unless the component's WPP info level or error level is enabled.

Use this macro then to programmatically skip work that is only needed for WPP logging when logging isn't turned on.

Thursday, January 5, 2012

Looking at Heap Allocations With Windbg

I will keep this brief.  !heap is main debugger extension for allocation debugging.  I will show you some tricks to get started using it.  You can read up on all of the options by either typing !heap -? in the debugger or reading up on it in MSDN.

As always, don't do it the hard way.  Use tools like Application Verifier to help you find issues more quickly.  Appverif is awesome at identifying and even root causing many heap, leak, deadlock, and many other issues.  You should get in the habit of always debugging with it on.  Still, you should still know how to debug heap issues, so read on.


First think you will notice is that most of !heap's options take either a heap index or a heap address.  You can use 0 for process' heap, but there can be a lot and you might have to wait a while.  I will show you how to get it, and then you can play with the other options to learn what they do. 

Generally you have a pointer of an allocation, and you want to know more about it.  To find the heap for the allocation you are looking for there are two simple ways that I can think of.
  1. !heap -x ptr_address

    This will cause the debugger to search for the heap block with that address.  This can take a little while depending on how many allocations there are.  Using the -v option will take even longer, so you should probably avoid using it.

    example, double free:
    0:000> !analyze -v
    ...
    STACK_TEXT: 
    008ffbf4 77b37d01 c0000374 77b67130 008ffc38 ntdll!RtlReportCriticalFailure+0x33
    008ffc04 77b36eb5 00000002 91731eca 00c50000 ntdll!RtlpReportHeapFailure+0x21
    008ffc38 77b06e39 00000008 00c50000 00c50d88 ntdll!RtlpLogHeapFailure+0xa2
    008ffc6c 76de9898 00c50000 00000000 00c50d90 ntdll!RtlFreeHeap+0x5c
    008ffcb8 00df14c2 00c50d90 00000000 00c50d90 msvcrt!free+0x65
    008ffcd0 00df1884 00000001 00c50c80 00c52f10 program!wmain+0x42
    008ffd20 00df193f 008ffd34 759c2914 7fe6f000 program!__wmainCRTStartup+0x164
    008ffd28 759c2914 7fe6f000 008ffd74 77ab7fb6 program!wmainCRTStartup+0xf
    008ffd34 77ab7fb6 7fe6f000 91731f86 00000000 KERNEL32!BaseThreadInitThunk+0xe
    008ffd74 77ab7f62 ffffffff 77b2db43 00000000 ntdll!__RtlUserThreadStart+0x4a
    008ffd84 00000000 00df1930 7fe6f000 00000000 ntdll!_RtlUserThreadStart+0x1c

    FOLLOWUP_IP:
    program!wmain+42 [s:\program\program.cpp @ 423]
    00df14c2 83c404          add     esp,4

    FAULTING_SOURCE_CODE: 
       419:     UNREFERENCED_PARAMETER(argv);
       420:
       421:     p = malloc(1);
       422:     free(p);
    >  423:     free(p);
       424:
       425:

    So the double free in this case is obvious, but normally if you hit one it wasn't.

    0:000> .frame 7
    07 008ffcd0 00df1884 program!wmain+0x42 [s:\program\program.cpp @ 423]
    0:000> dv
               argc = 1
               argv = 0x00c50c80
                 hr = 0x00000000
                  p = 0x00c50d90
    0:000> !heap -x 0x00c50d90
    Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
    -----------------------------------------------------------------------------
    00c50d88  00c50d90  00c50000  00c50000       4c0       110         0  free fill

    0:000> !heap -a 00c50000
    Index   Address  Name      Debugging options enabled
      3:   00c50000
        Segment at 00c50000 to 00c5f000 (00004000 bytes committed)
        Flags:                40001062
        ForceFlags:           40000060
        Granularity:          8 bytes
        Segment Reserve:      00100000
        Segment Commit:       00002000
        DeCommit Block Thres: 00000200
        DeCommit Total Thres: 00002000
        Total Free Size:      000002a7
        Max. Allocation Size: 7ffdefff
        Lock Variable at:     00c50248
        Next TagIndex:        0000
        Maximum TagIndex:     0000
        Tag Entries:          00000000
        PsuedoTag Entries:    00000000
        Virtual Alloc List:   00c5009c
        Uncommitted ranges:   00c5008c
                00c54000: 0000b000  (45056 bytes)
        FreeList[ 00 ] at 00c500c0: 00c52388 . 00c50d90 
            00c50d88: 00110 . 004c0 [104] - free
            00c53af0: 00be8 . 004f0 [104] - free
            00c52380: 00818 . 00b88 [104] - free

    Segment00 at 00c50000:
            Flags:           00000000
            Base:            00c50000
            First Entry:     00c50498
            Last Entry:      00c5f000
            Total Pages:     0000000f
            Total UnCommit:  0000000b
            Largest UnCommit:00000000
            UnCommitted Ranges: (1)

    Heap entries for Segment00 in Heap 00c50000
             address: psize . size  flags   state (requested size)
            00c50000: 00000 . 00498 [101] - busy (497)
            00c50498: 00498 . 00118 [107] - busy (117), tail fill Internal
            00c505b0: 00118 . 00230 [107] - busy (214), tail fill
            00c507e0: 00230 . 00498 [107] - busy (480), tail fill
            00c50c78: 00498 . 00110 [107] - busy (f8), tail fill
            00c50d88: 00110 . 004c0 [104] free fill        00c51248: 004c0 . 00238 [107] - busy (220), tail fill
            00c51480: 00238 . 00650 [107] - busy (632), tail fill
            00c51ad0: 00650 . 00098 [107] - busy (80), tail fill
            00c51b68: 00098 . 00818 [107] - busy (800), tail fill
            00c52380: 00818 . 00b88 [104] free fill
            00c52f08: 00b88 . 00be8 [107] - busy (bd0), tail fill
            00c53af0: 00be8 . 004f0 [104] free fill
            00c53fe0: 004f0 . 00020 [111] - busy (1d)
            00c54000:      0000b000      - uncommitted bytes.

    You can see the memory is already freed. 
  2. If your heap call is on the stack at the time, you can find this value by walking the stack to and dumping variables.  This can be helpful where there are a lot of allocations and -x will take a long time.  The heap handle is the first parameter passed in to the ntdll RtlHeap*/Heap* functions.

    example:
    free((void*)1234);This will cause an access violation.
    0:000> kn
     # ChildEBP RetAddr 
    00 0102fa6c 77b37d01 ntdll!RtlReportCriticalFailure+0x33 [d:\6588\minkernel\ntos\rtl\rtlutil.c @ 161]
    01 0102fa7c 77b36eb5 ntdll!RtlpReportHeapFailure+0x21 [d:\6588\minkernel\ntos\rtl\heaplog.c @ 161]
    02 0102fab0 77b06e4c ntdll!RtlpLogHeapFailure+0xa2 [d:\6588\minkernel\ntos\rtl\heaplog.c @ 672]
    03 (Inline) -------- ntdll!RtlpProbeUserBufferUnsafe+0x8066d [d:\6588\minkernel\ntos\rtl\heappriv.h @ 2720]
    04 (Inline) -------- ntdll!RtlpProbeUserBuffer+0x80677 [d:\6588\minkernel\ntos\rtl\heappriv.h @ 2754]
    05 0102fae4 76de9898 ntdll!RtlFreeHeap+0x6c [d:\6588\minkernel\ntos\rtl\heap.c @ 1910]
    06 0102fb30 00ce14a1 msvcrt!free+0x65 [d:\6588\minkernel\crts\crtw32\heap\free.c @ 183]
    07 0102fb44 00ce1864 program!wmain+0x21 [s:\program\program.cpp @ 420]
    08 0102fb94 00ce191f program!__wmainCRTStartup+0x164 [s:\dep\minkernel\crts\crtw32\dllstuff\crtexe.c @ 692]
    09 0102fb9c 759c2914 program!wmainCRTStartup+0xf [s:\dep\minkernel\crts\crtw32\dllstuff\crtexe.c @ 510]
    0a 0102fba8 77ab7fb6 KERNEL32!BaseThreadInitThunk+0xe [d:\6588\base\win32\client\thread.c @ 65]
    0b 0102fbe8 77ab7f62 ntdll!__RtlUserThreadStart+0x4a [d:\6588\minkernel\ntdll\rtlstrt.c @ 1018]
    0c 0102fbf8 00000000 ntdll!_RtlUserThreadStart+0x1c [d:\6588\minkernel\ntdll\rtlstrt.c @ 936]
    0:000> .frame 5
    05 0102fae4 76de9898 ntdll!RtlFreeHeap+0x6c [d:\6588\minkernel\ntos\rtl\heap.c @ 1910]
    0:000> dv
         HeapHandle = 0x010d0000          Flags = 0
        BaseAddress = 0x000004d2
          BusyBlock = 0x00000000
        Interceptor =
    0:000> !heap -a 0x010d0000Index   Address  Name      Debugging options enabled
      3:   010d0000
        Segment at 010d0000 to 010df000 (00004000 bytes committed)
        Flags:                40001062
        ForceFlags:           40000060
        Granularity:          8 bytes
        Segment Reserve:      00100000
        Segment Commit:       00002000
        DeCommit Block Thres: 00000200
        DeCommit Total Thres: 00002000
        Total Free Size:      000002a7
        Max. Allocation Size: 7ffdefff
        Lock Variable at:     010d0248
        Next TagIndex:        0000
        Maximum TagIndex:     0000
        Tag Entries:          00000000
        PsuedoTag Entries:    00000000
        Virtual Alloc List:   010d009c
        Uncommitted ranges:   010d008c
                010d4000: 0000b000  (45056 bytes)
        FreeList[ 00 ] at 010d00c0: 010d2388 . 010d0d90 
            010d0d88: 00110 . 004c0 [104] - free
            010d3af0: 00be8 . 004f0 [104] - free
            010d2380: 00818 . 00b88 [104] - free
    Segment00 at 010d0000:
            Flags:           00000000
            Base:            010d0000
            First Entry:     010d0498
            Last Entry:      010df000
            Total Pages:     0000000f
            Total UnCommit:  0000000b
            Largest UnCommit:00000000
            UnCommitted Ranges: (1)
    Heap entries for Segment00 in Heap 010d0000
             address: psize . size  flags   state (requested size)
            010d0000: 00000 . 00498 [101] - busy (497)
            010d0498: 00498 . 00118 [107] - busy (117), tail fill Internal
            010d05b0: 00118 . 00230 [107] - busy (214), tail fill
            010d07e0: 00230 . 00498 [107] - busy (480), tail fill
            010d0c78: 00498 . 00110 [107] - busy (f8), tail fill
            010d0d88: 00110 . 004c0 [104] free fill
            010d1248: 004c0 . 00238 [107] - busy (220), tail fill
            010d1480: 00238 . 00650 [107] - busy (632), tail fill
            010d1ad0: 00650 . 00098 [107] - busy (80), tail fill
            010d1b68: 00098 . 00818 [107] - busy (800), tail fill
            010d2380: 00818 . 00b88 [104] free fill
            010d2f08: 00b88 . 00be8 [107] - busy (bd0), tail fill
            010d3af0: 00be8 . 004f0 [104] free fill
            010d3fe0: 004f0 . 00020 [111] - busy (1d)
            010d4000:      0000b000      - uncommitted bytes.