Tuesday, June 19, 2012

How to Safely Walk a MultiSz String List

You may have run into MultiSz string lists in your Windows programming.  They are used in the registry (REG_MULTI_SZ), as device properties (DEVPROP_TYPE_STRING_LIST), et cetera.  The format of a multisz isn't standardized; it really depends on the API that you are using, so walking a multisz can be tricky.  When there is tricky code in C, programmers often get it wrong and introduce bugs.  The code below will work correctly regardless of what kind of multisz is passed in.

What is a MultiSz?

The basic idea is that you take some strings, concatenate them together (each one keeping its own null terminator), and add an extra null terminator at the end.  The idea is kind of cool.  You only need one allocation for many strings, which can also help reduce heap fragmentation by avoiding a lot of small allocations.  But like I mentioned before, programmers screw up using them all of the time.  Also, they are put at an unfair disadvantage by multiszs not being standardized.

Simple many-string ex.  Let's take two strings, L"str1" and L"str2".  As strings by themselves, they would look like two wchar_t buffers, each five elements long.

L's'
L't'
L'r'
L'1'
L'\0'

L's'
L't'
L'r'
L'2'
L'\0'


As a multisz, they would look like this:

L's'
L't'
L'r'
L'1'
L'\0'
L's'
L't'
L'r'
L'2'
L'\0'
L'\0'
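
To make the layout concrete, here is a minimal sketch in standard C (no Windows headers, so it builds anywhere).  ExampleList and MultiSzLength are names I made up for illustration; they are not part of any API.  The sketch assumes a double-null terminated list that holds at least one non-empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* The two-string example above as one buffer.  The literal's implicit
 * terminator supplies the final, extra null, so the array has 11 elements. */
static const wchar_t ExampleList[] = L"str1\0str2\0";

/* Hypothetical helper: returns the number of wchar_t elements the list
 * occupies, counting every terminator.  Assumes a non-NULL, double-null
 * terminated list that holds at least one non-empty string. */
static size_t MultiSzLength(const wchar_t *list)
{
    const wchar_t *p = list;

    assert(list != NULL);

    /* Hop over one string (and its terminator) at a time. */
    while (*p != L'\0') {
        p += wcslen(p) + 1;
    }

    /* p now points at the extra null that ends the list. */
    return (size_t)(p - list) + 1;
}
```

Calling MultiSzLength(ExampleList) counts five elements for each string plus one for the extra null, which matches the size of the array itself.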


You can now extrapolate what a multisz would look like with many strings in it.  What if there is only one string?  What is the valid format?


Single string ex.  This is what the buffer would look like for L"str1".  It is the same as the first buffer in the last example.

L's'
L't'
L'r'
L'1'
L'\0'


As a multisz, it could look like this:

L's'
L't'
L'r'
L'1'
L'\0'
L'\0'


But, it could actually look like this too:

L's'
L't'
L'r'
L'1'
L'\0'


Which one is correct?  Both are; deal with them and don't AV.  Remember there is no standard for these kinds of cases.

Empty list ex.  Let's say you have an empty list.  What are the possible formats?

NULL is possible.


A buffer like this is valid:

L'\0'


And a buffer like this is also valid:

L'\0'
L'\0'


Which style should you code for?  All of them.
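
One caveat is worth spelling out.  A walk that relies only on terminators cannot tell where a list ends if the last string has a single null and nothing after it.  Many APIs also hand back the buffer size, so here is a minimal sketch, in standard C, of a walk bounded by that length.  CountStringsBounded and its cch parameter (the buffer length in wchar_t elements) are assumptions of mine for illustration, not something from a Windows header.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: counts the strings in a multisz without reading past
 * cch, the buffer length in wchar_t elements (an assumed extra parameter).
 * Accepts NULL, a single null, a double null, and a list whose final string
 * has only one terminator or none at all. */
static size_t CountStringsBounded(const wchar_t *list, size_t cch)
{
    size_t count = 0;
    size_t i = 0;

    if (list == NULL) {
        return 0;                       /* NULL is an empty list */
    }

    /* An empty string, or the end of the buffer, ends the list. */
    while (i < cch && list[i] != L'\0') {
        /* Scan one string, stopping at its terminator or the buffer end. */
        while (i < cch && list[i] != L'\0') {
            i++;
        }
        count++;
        if (i < cch) {
            i++;                        /* step over the terminator */
        }
    }
    return count;
}
```

With this shape, every format listed above gives the same answer: the three empty-list forms count as zero strings, and L"str1" counts as one string whether it is followed by one null or two.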

Code Example for Walking MultiSzs

This C function walks two MultiSzs.  I think the function is self-explanatory, but you pass in a string list, called StringList, that you want to check to see if it contains any strings from another list, ContainsStringList.  If there is at least one string match between the two lists, it returns TRUE, else FALSE.  There is also an ignore case flag.

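/******************************************/
#include <windows.h>
#include <wchar.h>

// Returns TRUE if any string in ContainsStringList also appears in StringList.
// Both lists must be non-NULL (per __in) and end with an empty string.
BOOL StringListContains(
        __in PCZZWSTR StringList,
        __in PCZZWSTR ContainsStringList,
        __in BOOL IgnoreCase)
{
    BOOL Result = FALSE;
    PCZZWSTR Temp = NULL;

    // Outer loop: each string we are looking for.  The walk stops at the
    // empty string that terminates the list.
    for (; *ContainsStringList; ContainsStringList += wcslen(ContainsStringList) + 1) {
        // Inner loop: each candidate string in StringList.
        for (Temp = StringList; *Temp; Temp += wcslen(Temp) + 1) {
            if ((IgnoreCase && !_wcsicmp(Temp, ContainsStringList))
                    || (!IgnoreCase && !wcscmp(Temp, ContainsStringList))) {
                Result = TRUE;
                break;
            }
        }

        // One match is enough; stop walking.
        if (Result)
            break;
    }

    return Result;
}
/****************************************/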

There is one thing to point out that may be confusing: I am not checking for NULL before dereferencing the string lists, even though I said you should handle an empty string list being NULL.  The SAL "__in" annotation states that the parameters cannot be NULL.  Use SAL annotations and run PREfast against your code unless you are the type that likes to waste time finding silly bugs later.  If you don't use SAL, just check for NULL in the for loops (ex. for (; ContainsStringList && *ContainsStringList; ...)).
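
For the no-SAL case, here is a minimal sketch of what the NULL checks in the for loops could look like.  It is written in standard C so it builds outside of Windows too, which means _wcsicmp is swapped for a small towlower() based compare.  StringListContainsSafe and EqualIgnoreCase are hypothetical names of mine, and the sketch still assumes that any non-NULL list ends with an empty string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Case-insensitive compare built on towlower(), standing in for _wcsicmp. */
static bool EqualIgnoreCase(const wchar_t *a, const wchar_t *b)
{
    while (*a != L'\0' && *b != L'\0') {
        if (towlower((wint_t)*a) != towlower((wint_t)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Hypothetical NULL-tolerant variant: a NULL list is treated as empty.
 * A non-NULL list must still end with an empty string (the extra null). */
static bool StringListContainsSafe(const wchar_t *list,
                                   const wchar_t *wanted,
                                   bool ignoreCase)
{
    const wchar_t *w;
    const wchar_t *s;

    for (w = wanted; w != NULL && *w != L'\0'; w += wcslen(w) + 1) {
        for (s = list; s != NULL && *s != L'\0'; s += wcslen(s) + 1) {
            if (ignoreCase ? EqualIgnoreCase(s, w) : wcscmp(s, w) == 0) {
                return true;   /* one match is enough */
            }
        }
    }
    return false;
}
```

Returning straight from the inner loop also removes the need for the Result flag and the second break.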

Tuesday, June 5, 2012

Quality: Tracking Down Memory Usage With Application Verifier, TruScan, and UMDH

This post is about how to drive up code quality, specifically relating to memory usage.  These are things you should consider doing before you ship your code, even if you feel like your code's memory usage is where it should be.  I will be talking about three tools here: Application Verifier (appverif); TruScan, which is affiliated with TTT (Time Travel Tracing, aka iDNA); and UMDH.

Step 1: Enable Application Verifier on your code and attach Windbg to the process!  I think I may have said this a million times in this blog, but it will find almost all common C/C++ type defects, not just leaks.  This should be part of your normal workflow.  Period.  Always validate significant changes with appverif.  If you make this part of your day-to-day development workflow, you will not miss most defects.  If you are writing drivers, there is also Driver Verifier.

Step 2: Use TruScan.  TruScan utilizes TTT to keep track of all of your allocations.  It knows, correctly most of the time, when there are no more references to allocated memory and the memory is leaked.  It kind of takes a while, so this is something good to do at the end of a milestone, when you feel your code is mostly stable.

Step 3: User-Mode Dump Heap (umdh.exe) is where you go next when all leaks are eradicated.  If your typical working set is not where you would like it to be, you can use UMDH to help you profile your code and see which stacks are the biggest memory hogs.  After you identify these code paths, you should decide whether the memory usage is worth the cost, or whether you were doing something dumb that should be changed.

Actually, there is a Step 0.

Step 0: Architect and design your code to be high performance and efficient to begin with.  Yes, the Analysis of Algorithms course you take in your CS program really is important.  Common sense often also applies here.

I didn't go into details on how to use these tools.  I figured you should be able to find them on the web, and their documentation should get you off the ground.  Feel free to give some feedback if you want more instructions on these tools.