Tuesday, June 19, 2012

How to Safely Walk a MultiSz String List

You may have run into MultiSz string lists in your Windows programming.  They are used in the registry (REG_MULTI_SZ), as device properties (DEVPROP_TYPE_STRING_LIST), et cetera.  The multisz format isn't standardized; it really depends on the API that you are using, so walking a multisz can be tricky.  When there is tricky code in C, programmers often get it wrong and introduce bugs.  The code below is written to cope with whichever kind of multisz is passed in.

What is a MultiSz?

The basic idea is that you take some strings, each with its own null terminator, concatenate them together, and add an extra null terminator (L'\0') at the end.  The idea is kind of cool.  You only need one allocation for many strings, which also helps reduce heap fragmentation because you avoid a lot of small allocations.  But like I mentioned before, programmers screw up using them all of the time.  They are also put at an unfair disadvantage because multiszs are not standardized.

Simple many-string example.  Let's take two strings, L"str1" and L"str2".  As strings by themselves, they would each look like a wchar_t buffer five elements long.

L's'
L't'
L'r'
L'1'
L'\0'

L's'
L't'
L'r'
L'2'
L'\0'


As a multisz, they would look like this:

L's'
L't'
L'r'
L'1'
L'\0'
L's'
L't'
L'r'
L'2'
L'\0'
L'\0'
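
As a rough illustration of the layout above, here is how the two-string multisz could be written as a C literal and walked.  The names multisz_count and two_strings are hypothetical, made up for this sketch, and the walk assumes the list really does end with the extra terminator.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * The embedded \0 characters end "str1" and "str2"; the terminator the
 * compiler appends to every string literal supplies the extra L'\0'.
 * Total: 11 wchar_t elements.
 */
static const wchar_t two_strings[] = L"str1\0str2\0";

/*
 * Hypothetical helper: counts the strings in a multisz.
 * Assumes list is non-NULL and ends with an empty string.
 */
static size_t multisz_count(const wchar_t *list)
{
    size_t count = 0;

    /* Each step skips one string plus its terminator. */
    for (; *list; list += wcslen(list) + 1)
        count++;

    return count;
}
```

Calling multisz_count(two_strings) returns 2, and the array itself holds 11 elements: four characters and a terminator for each string, plus the final extra terminator.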


You can now extrapolate what a multisz would look like with many strings in it.  What if there is only one string?  What is the valid format?


Single-string example.  This is what the buffer would look like for L"str1" as a plain string, the same as in the last example.

L's'
L't'
L'r'
L'1'
L'\0'


As a multisz, it could look like this:

L's'
L't'
L'r'
L'1'
L'\0'
L'\0'


But, it could actually look like this too:

L's'
L't'
L'r'
L'1'
L'\0'


Which one is correct?  Both are; deal with them and don't AV (access violation).  Remember, there is no standard for these kinds of cases.
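
One way to cope with the form that lacks the extra terminator is to bound the walk by the buffer size, since a walk that only looks for an empty string would read past the end of a five-element buffer.  The sketch below assumes the caller knows the size of the buffer in elements (registry APIs, for instance, report a size in bytes that can be divided by sizeof(wchar_t)).  The name multisz_count_bounded and the cch parameter are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: counts the strings in a multisz without ever
 * reading past cch elements. Works whether or not the extra terminator
 * is present, and treats NULL as an empty list.
 */
static size_t multisz_count_bounded(const wchar_t *list, size_t cch)
{
    size_t count = 0;
    size_t i = 0;

    if (list == NULL)
        return 0;

    /* An empty string, or running out of buffer, ends the list. */
    while (i < cch && list[i] != L'\0') {
        count++;

        /* Skip to the end of this string, never reading past cch. */
        while (i < cch && list[i] != L'\0')
            i++;

        /* Step over the terminator if there is one. */
        if (i < cch)
            i++;
    }

    return count;
}
```

With this approach the six-element and five-element forms of L"str1" both count as one string, because the loop stops at whichever comes first: an empty string or the end of the buffer.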

Empty-list example.  Let's say you have an empty list.  What are the possible formats?

NULL is possible.


A buffer like this is valid:

L'\0'


And a buffer like this is also valid:

L'\0'
L'\0'


Which style should you code for?  All of them.
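
A minimal sketch of a check that accepts all three empty-list forms follows.  The name multisz_is_empty is hypothetical.  It only ever reads the first element, so it does not matter whether one terminator or two are present.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: returns true for every empty-list form
 * described above: a NULL pointer, a single L'\0', or two L'\0's.
 */
static bool multisz_is_empty(const wchar_t *list)
{
    return list == NULL || list[0] == L'\0';
}
```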

Code Example for Walking MultiSzs

This C function walks two MultiSzs.  I think the function is self-explanatory, but you pass in a string list, called StringList, that you want to check to see if it contains any strings from another list, ContainsStringList.  If there is at least one string match between the two lists, it returns TRUE; otherwise it returns FALSE.  There is also an ignore-case flag.

/******************************************/
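/*
 * Returns TRUE if any string in ContainsStringList also appears in StringList.
 * Both lists must be non-NULL (__in) and must end with an empty string.
 */
BOOL StringListContains(
        __in PCZZWSTR StringList,
        __in PCZZWSTR ContainsStringList,
        __in BOOL IgnoreCase)
{
    BOOL Result = FALSE;
    PCZZWSTR Temp = NULL;

    // Outer loop: each string to look for.  An empty string ends the list.
    for (; *ContainsStringList; ContainsStringList += wcslen(ContainsStringList) + 1) {
        // Inner loop: each string in the list being searched.
        for (Temp = StringList; *Temp; Temp += wcslen(Temp) + 1) {
            if ((IgnoreCase && !_wcsicmp(Temp, ContainsStringList))
                    || (!IgnoreCase && !wcscmp(Temp, ContainsStringList))) {
                Result = TRUE;
                break;
            }
        }

        // Stop at the first match.
        if (Result)
            break;
    }

    return Result;
}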
/****************************************/

There is one thing to point out that may be confusing: I am not checking for NULL before dereferencing the string lists, even though I said you should handle an empty string list being NULL.  The SAL "__in" annotation implies that they cannot be NULL.  Use SAL annotations and run PREfast against your code unless you are the type that likes to waste time finding silly bugs later.  If you don't use SAL, just check for NULL in the for loop (ex. for (; ContainsStringList && *ContainsStringList; ...)).
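
For code that does not use SAL, the NULL check in the loop condition might look like the portable sketch below.  It is not the function above: the names string_list_contains and wide_equal are hypothetical, standard C types stand in for BOOL and PCZZWSTR, and wide_equal is a stand-in for _wcsicmp built on towlower.  Like the original, it assumes any non-NULL list ends with an empty string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for wcscmp/_wcsicmp; tests equality only. */
static bool wide_equal(const wchar_t *a, const wchar_t *b, bool ignore_case)
{
    if (!ignore_case)
        return wcscmp(a, b) == 0;

    for (; *a && *b; a++, b++) {
        if (towlower((wint_t)*a) != towlower((wint_t)*b))
            return false;
    }

    /* Equal only if both strings ended at the same point. */
    return *a == *b;
}

/*
 * Returns true if any string in needles also appears in list.
 * A NULL pointer is treated as an empty list.
 */
static bool string_list_contains(const wchar_t *list,
                                 const wchar_t *needles,
                                 bool ignore_case)
{
    const wchar_t *n;
    const wchar_t *s;

    /* The pointer test comes first, so NULL is never dereferenced. */
    for (n = needles; n && *n; n += wcslen(n) + 1) {
        for (s = list; s && *s; s += wcslen(s) + 1) {
            if (wide_equal(s, n, ignore_case))
                return true;
        }
    }

    return false;
}
```

Returning from inside the inner loop removes the need for the Result flag and the second break, at the cost of having more than one exit point.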
