Monday, March 16, 2015

Making Changes to the Windows Kernel

We all have to modify the Windows kernel from time to time...am I right?  If you are like me, you don't have to do it all that often, and when I do, I always seem to forget these small things.

The kernel you build must be signed or it will not load.  As a side note, many core DLLs, cfgmgr32.dll for instance, must also be signed.  They are checked by smss.exe early in boot, and will cause your system to bugcheck (with 0xc000021a aka STATUS_SYSTEM_PROCESS_TERMINATED) if they are not signed.  Also, how is the build signed?  If it is a PRS signed build, you will need to install the test signing certificate on the target if you want to run a test signed kernel.  Make sure these environment variables are set:
set NT_SIGNCODE=1
set NT_SIGNCODE_PH=1

Your kernel and HAL need to match.  This may also be the case with other components like ACPI, etc. but these are less likely to cause you problems.  You should just always build and replace the kernel and the HAL together.

They are found in c:\windows\system32\ and are called:
x86
ntkrpamp.exe
halmacpi.dll
=or=
AMD64
ntkrnlmp.exe
hal.dll

You can just clobber them and reboot, but your system will probably just bugcheck.  I would suggest copying your builds in under alternative names instead, and pointing the boot entry at them.  Try the following:
reagentc /disable
bcdedit /bootdebug on
bcdedit /set BootStatusPolicy IgnoreAllFailures
bcdedit /set testsigning yes
bcdedit /set kernel mykernel.exe
bcdedit /set hal myhal.dll

Likewise, you should set up a KD on the target so you can see what bugchecks you are hitting.  Ex:
    bcdedit /debug on
    bcdedit /dbgsettings 1394 channel:1


    But what if you forgot one of these steps and now your machine is in a bugcheck loop that you can't debug?  You can add the settings temporarily by pressing F10 while the Windows boot manager is running and entering them as boot options.

    If you need to change the kernel and the PC will not boot, you can simply change the kernel or HAL offline using WinPE.  There are lots of ways to get into WinPE, so I won't describe them here.

    Handy tip: here is the command to see what drive is mounted as what in WinPE:
    wmic LOGICALDISK LIST BRIEF

    Hopefully this was a handy refresher!

    Tuesday, February 17, 2015

    SAL to SAL2 Porting Guide

    I have written before that you should use SAL in your Windows C/C++ code.  It allows you to do neat things like use Microsoft's static analysis and kill lots of common bugs at compile time.  Whereas C can be ambiguous about how, say, a pointer parameter is used, SAL can make the intended use of parameters and struct members very crisp.  A: If it is hard to figure out what SAL annotation defines your parameter's use in the function, you are probably doing it wrong.  In general, you should make your functions use the parameters in straightforward ways.  B: Likewise, there is always a correct set of SAL annotations for a parameter's behavior, so don't cop out; figure it out.  If you are still confused, refer to A.

    Why port to SAL2?  Because 2 > 1 obviously.  SAL2 does add some new functionality.  Don't be lazy: just use SAL2 in new code, and clean up old code to use SAL2 while you are in there working on it.  Should you go back and fix old code to use SAL2?  It is up to you.  SAL2 has some new functionality, but SAL1 will continue to work just fine.

    Const Protection, Use It

    This isn't SAL, just regular C.  Use const protection.  Parameters that should not be modified should be const protected.  For instance: _In_ const char *p is more correct than _In_ char *p if the p buffer should not be modified.  Please use const correctly for parameters and struct members.
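As a quick illustration of the point above, here is a minimal sketch.  CountChar is a hypothetical helper made up for this example, and the #else branch only exists so the snippet also builds where <sal.h> is not available.  It uses _In_z_ rather than plain _In_ because the raw const char * here is a null-terminated string.

```cpp
#include <cassert>
#include <cstddef>

// <sal.h> ships with MSVC; elsewhere, define the annotation as a no-op
// so the sketch still compiles (assumption: no analyzer is running there).
#if defined(_MSC_VER)
#include <sal.h>
#else
#define _In_z_
#endif

// Hypothetical helper: counts how many times ch appears in the string p.
// p is input only, so it is const protected; any write through p
// (for example *p = 'x') is rejected by the compiler.
std::size_t CountChar(_In_z_ const char *p, char ch)
{
    std::size_t count = 0;
    for (; *p != '\0'; ++p) {
        if (*p == ch) {
            ++count;
        }
    }
    return count;
}
```

The annotation tells the analyzer how the parameter is used, and const lets the compiler enforce the read-only part on its own.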

    Here is a table for the basics of SAL2.

    Here is my version with SAL to SAL2 notes.

    Pointer Parameters

    For the annotations in the following table, when a pointer parameter is being annotated, the analyzer reports an error if the pointer is null. This applies to pointers and to any data item that's pointed to.

    SAL 1
    SAL 2
    Description
    __in
    _In_
    Annotates input parameters that are scalars, structures, pointers to structures and the like. Explicitly may be used on simple scalars. The parameter must be valid in pre-state and will not be modified.
    __out
    _Out_
    Annotates output parameters that are scalars, structures, pointers to structures and the like. Do not apply this to an object that cannot return a value—for example, a scalar that's passed by value. The parameter does not have to be valid in pre-state but must be valid in post-state.
    __inout
    _Inout_
    Annotates a parameter that will be changed by the function. It must be valid in both pre-state and post-state, but is assumed to have different values before and after the call. Must apply to a modifiable value.
    __in_z
    _In_z_
    A pointer to a null-terminated string that's used as input. The string must be valid in pre-state. Variants of PSTR, which already have the correct annotations, are preferred.  <= This is a pet annoyance of mine, don't be the guy that does _In_z_ PCWSTR.
    __inout_z
    _Inout_z_
    A pointer to a null-terminated character array that will be modified. It must be valid before and after the call, but the value is assumed to have changed. The null terminator may be moved, but only the elements up to the original null terminator may be accessed.
    __in_ecount(s)
    __in_bcount(s)
    _In_reads_(s)
    _In_reads_bytes_(s)
    A pointer to an array, which is read by the function. The array is of size s elements, all of which must be valid.
    The _bytes_ variant gives the size in bytes instead of elements. Use this only when the size cannot be expressed as elements. For example, char strings would use the _bytes_ variant only if a similar function that uses wchar_t would.
    __in_ecount_z(s) 
    _In_reads_z_(s)
    A pointer to an array that is null-terminated and has a known size. The elements up to the null terminator—or s if there is no null terminator—must be valid in pre-state. If the size is known in bytes, scale s by the element size.

    _In_reads_or_z_(s)
    A pointer to an array that is null-terminated or has a known size, or both. The elements up to the null terminator—or s if there is no null terminator—must be valid in pre-state. If the size is known in bytes, scale s by the element size. (Used for the strn family.)

    _Out_writes_(s)
    _Out_writes_bytes_(s)
    A pointer to an array of s elements (resp. bytes) that will be written by the function. The array elements do not have to be valid in pre-state, and the number of elements that are valid in post-state is unspecified. If there are annotations on the parameter type, they are applied in post-state. For example, consider the following code.
    C++
    typedef _Null_terminated_ wchar_t *PWSTR;
    void MyStringCopy(_Out_writes_ (size) PWSTR p1,
       _In_ size_t size,
       _In_ PWSTR p2);
    In this example, the caller provides a buffer of size elements for p1. MyStringCopy makes some of those elements valid. More importantly, the _Null_terminated_ annotation on PWSTR means that p1 is null-terminated in post-state. In this way, the number of valid elements is still well-defined, but a specific element count is not required.
    The _bytes_ variant gives the size in bytes instead of elements. Use this only when the size cannot be expressed as elements. For example, char strings would use the _bytes_ variant only if a similar function that uses wchar_t would.

    _Out_writes_z_(s)
    A pointer to an array of s elements. The elements do not have to be valid in pre-state. In post-state, the elements up through the null terminator—which must be present—must be valid. If the size is known in bytes, scale s by the element size.

    _Inout_updates_(s)
    _Inout_updates_bytes_(s)
    A pointer to an array, which is both read and written to in the function. It is of size s elements, and valid in pre-state and post-state.
    The _bytes_ variant gives the size in bytes instead of elements. Use this only when the size cannot be expressed as elements. For example, char strings would use the _bytes_ variant only if a similar function that uses wchar_t would.

    _Inout_updates_z_(s)
    A pointer to an array that is null-terminated and has a known size. The elements up through the null terminator—which must be present—must be valid in both pre-state and post-state. The value in the post-state is presumed to be different from the value in the pre-state; this includes the location of the null terminator. If the size is known in bytes, scale s by the element size.

    _Out_writes_to_(s,c)
    _Out_writes_bytes_to_(s,c)
    _Out_writes_all_(s)
    _Out_writes_bytes_all_(s)
    A pointer to an array of s elements. The elements do not have to be valid in pre-state. In post-state, the elements up to the c-th element must be valid. If the size is known in bytes, scale s and c by the element size or use the _bytes_ variant. The _all_ variants are defined as equivalent to:
    C++
       _Out_writes_to_(_Old_(s), _Old_(s))
       _Out_writes_bytes_to_(_Old_(s), _Old_(s))
    In other words, every element that exists in the buffer up to s in the pre-state is valid in the post-state. For example:
    C++
    void *memcpy(_Out_writes_bytes_all_(s) char *p1,
       _In_reads_bytes_(s) char *p2,
       _In_ int s);
    void * wordcpy(_Out_writes_all_(s) DWORD *p1,
       _In_reads_(s) DWORD *p2,
       _In_ int s);

    _Inout_updates_to_(s,c)
    _Inout_updates_bytes_to_(s,c)
    A pointer to an array, which is both read and written by the function. It is of size s elements, all of which must be valid in pre-state, and c elements must be valid in post-state.
    The _bytes_ variant gives the size in bytes instead of elements. Use this only when the size cannot be expressed as elements. For example, char strings would use the _bytes_ variant only if a similar function that uses wchar_t would.

    _Inout_updates_all_(s)
    _Inout_updates_bytes_all_(s)
    A pointer to an array, which is both read and written by the function of size s elements. Defined as equivalent to:
    C++
       _Inout_updates_to_(_Old_(s), _Old_(s))
       _Inout_updates_bytes_to_(_Old_(s), _Old_(s))
    In other words, every element that exists in the buffer up to s in the pre-state is valid in the pre-state and post-state.
    The _bytes_ variant gives the size in bytes instead of elements. Use this only when the size cannot be expressed as elements. For example, char strings would use the _bytes_ variant only if a similar function that uses wchar_t would.

    _In_reads_to_ptr_(p)
    A pointer to an array for which the expression p – _Curr_ (that is, p minus _Curr_) is defined by the appropriate language standard. The elements prior to p must be valid in pre-state.

    _In_reads_to_ptr_z_(p)
    A pointer to a null-terminated array for which the expression p – _Curr_ (that is, p minus _Curr_) is defined by the appropriate language standard. The elements prior to p must be valid in pre-state.

    _Out_writes_to_ptr_(p)
    A pointer to an array for which the expression p – _Curr_ (that is, p minus _Curr_) is defined by the appropriate language standard. The elements prior to p do not have to be valid in pre-state and must be valid in post-state.

    _Out_writes_to_ptr_z_(p)
    A pointer to a null-terminated array for which the expression p – _Curr_ (that is, p minus _Curr_) is defined by the appropriate language standard. The elements prior to p do not have to be valid in pre-state and must be valid in post-state.
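
To make a few rows of the table concrete, here is a minimal sketch of a prototype that combines _In_reads_ with _Out_writes_to_.  CopyEvens and its parameter names are hypothetical, and the no-op macros in the #else branch are only there so the snippet builds with compilers that do not ship <sal.h>.

```cpp
#include <cassert>
#include <cstddef>

// Use the real annotations on MSVC; otherwise fall back to no-ops
// (assumption: nothing is analyzing the code in that case).
#if defined(_MSC_VER)
#include <sal.h>
#else
#define _In_
#define _Out_
#define _In_reads_(s)
#define _Out_writes_to_(s, c)
#endif

// Hypothetical helper: copies the even values from src into dst.
// src is read for srcCount elements; dst has room for dstCount elements,
// and on return the first *written of them are valid.
// Returns false if dst filled up before all even values were copied.
bool CopyEvens(_In_reads_(srcCount) const int *src,
               _In_ std::size_t srcCount,
               _Out_writes_to_(dstCount, *written) int *dst,
               _In_ std::size_t dstCount,
               _Out_ std::size_t *written)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < srcCount; ++i) {
        if (src[i] % 2 != 0) {
            continue;               // odd value, skip it
        }
        if (n == dstCount) {
            *written = n;           // out of room: report what was copied
            return false;
        }
        dst[n++] = src[i];
    }
    *written = n;
    return true;
}
```

Note that the input buffer is also const protected, and that the element-count forms are used because both sizes are naturally expressed in elements rather than bytes.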

    Wednesday, January 28, 2015

    WRL and the ComPtr

    More recently I have been forced out of my comfort zone of C-style systems programming, where you manage your own memory and concurrency, and into the world of writing code for the WinRT ABI.

    In case you don't know, WinRT (universal, modern, tailored, Windows Store?) apps can be written in C#, C++, or even *gasp* JavaScript.  It actually doesn't matter which language you use, because they all thunk down to the WinRT ABI.  The ABI is native (read: C++) code, so it is fast and efficient.  Under the covers, the WinRT API is just COM, or more accurately modern COM.  The MIDL syntax has been updated to make way for modern runtime APIs.

    How does a systems programmer write modern COM for ABI code?  In short, with the Windows Runtime C++ Template Library, or WRL.  Modern COM still has classic COM under the hood.  One thing they tried to do was to abstract away some of the error-prone aspects of COM while making it more developer-friendly.  In some ways it is similar in purpose to ATL but without all of the ATL grossness.  It uses ComPtr instead of CComPtr for smart pointers, and I don't totally hate them like I did CComPtrs.  Modern COM uses a lot of template types, and doesn't use exceptions.  Most Windows systems programmers avoid exceptions to make the code easier to debug with the KD.  Personally I also hate debugging templated code, but I guess it is needed for the generic programming concepts to work.

    ComPtr

    ComPtr is the modern COM smart pointer.  Its behavior is similar to that of the ^ (hat) operator in C++/CX code.  ComPtr is only for COM objects, and the lifetime is automatically managed.  The ^ can be a smart pointer for non-COM things.


    Wednesday, June 18, 2014

    Managing ASSERT & NT_ASSERT in WinDBG & KD

    What are ASSERTs and how to use them:

    In general, Windows developers may use ASSERT and NT_ASSERT to validate expected assumptions in their code.  In CHK (checked, non-optimized) binaries, an exception will be thrown if the expression does not evaluate to true.

    For example, you may have an assumption in a function, that a pointer should never be NULL and the buffer size should be less than 1k.  Then you can write an ASSERT like this:

    ASSERT(pBuff && (cbBuff < 1024));

    This assertion will fire if the expression is false, or in this case: !pBuff || cbBuff >= 1024.

    0:003> p
    Assertion s:\dllsrv\info.cpp(83): pBuff && (cbBuff < 1024)
    ...

    There are differences between the two, and normally it is better to use NT_ASSERT, but that is not what this article is about.  There is also NT_VERIFY, which still evaluates its expression in FRE (free, optimized) binaries, but I can't think of any reasons off the top of my head why you would need to use it.

    As a rule of thumb, you shouldn't use ASSERTs too heavily.  For instance, in my example, you can validate the same thing using SAL annotations, where bugs can be found at compile time with static analysis tools like PREfast, which is always preferable to runtime debugging.  You may consider adding ASSERTs while developing your code, and removing a bunch of them once it has stabilized.
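
Here is a rough sketch of what that could look like for the pBuff/cbBuff example.  SumBytes is a hypothetical function, standard assert() stands in for ASSERT/NT_ASSERT so the snippet builds outside the WDK, and the no-op macros in the #else branch are only there for compilers without <sal.h>.  Whether the analyzer flags a particular caller depends on your tool setup, so treat this as an illustration rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Real SAL on MSVC; no-op stand-ins elsewhere (assumption for portability).
#if defined(_MSC_VER)
#include <sal.h>
#else
#define _In_reads_bytes_(s)
#define _In_range_(lo, hi)
#endif

// Hypothetical helper: sums the bytes of a small buffer.
// The annotations state the same assumptions as the ASSERT:
// pBuff must point at cbBuff readable bytes, and cbBuff must be below 1k.
unsigned SumBytes(_In_reads_bytes_(cbBuff) const unsigned char *pBuff,
                  _In_range_(0, 1023) std::size_t cbBuff)
{
    // Run-time backstop for callers the analyzer never saw.
    assert(pBuff && (cbBuff < 1024));

    unsigned sum = 0;
    for (std::size_t i = 0; i < cbBuff; ++i) {
        sum += pBuff[i];
    }
    return sum;
}
```

Once the annotations are in place and the analysis runs clean, the assert is a good candidate for removal.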

    ASSERTs in WinDBG & KD

    Normally when I am testing my code, I run my code under WinDBG or KD.  I normally copy private CHK versions of my binaries with symbols.  It is always a good idea to enable application verifier as well, or use gflags to add standard checks.

    In the simplest form, let's say your ASSERT throws an exception because it evaluates false.  You do the normal debugging thing and figure out why your assumptions were not correct, fix the bug(s), replace the binary, and restart the test.  That is the point of the assert after all.

    But let's say you want to continue on.  What are the commands to do that?

    gh - this is the basic one.  It is like pressing g or F5 except you are telling the debugger to treat the exception as handled.

    ah - you use ah to control a specific assert.  For instance, you can use ahi to ignore an assertion if it is noisy and you don't want to or can't change the binary to remove or fix it.

    sx - the sx* commands (sxe, sxd, sxn, sxi) control exceptions globally.  For instance, sxi asrt will ignore all asserts.

    Wednesday, July 17, 2013

    Postmortem Debugging with WinDBG or NTSD

    I have mentioned several times before that using application verifier will help root cause and solve many bugs, for example memory bugs (leaks, heap corruptions, etc.), locking bugs, etc.  If in user mode (UM) you can just attach the debugger to your UM process and run your tests until it breaks in, or if in KM you would use a KD but the same thing applies.

    Let’s say that you have a heap corruption that is hard to track down in your UM process.  Normally you would do this:
    1.       Isolate your code into a single process with a unique image name.
    2.       Enable page heap on that image name using gflags.
    3.       Start that process and attach a debugger (windbg)
    4.       Reproduce the issue that causes the corruption and watch the debugger break in
    5.       Then use the handy !heap -triage command to give you some more clues as to what is going on

    Alternatively, depending on what the memory problem is, application verifier could also help in step 2.  It can be enabled on your image with appverif.exe or gflags.exe.  If in KM, use driver verifier and KD.

    However, with some bugs relating to race conditions or memory corruptions, enabling these tools on your process or attaching the debugger itself could be enough to make it so that the issue doesn’t repro anymore.  That can be frustrating.

    If directly debugging your process causes you to lose the repro, there are a few other techniques you can employ:
    -          Tracing: adding WPP tracing to your code can be a great way to troubleshoot some issues that are hard to capture with a debugger.  This is an especially good method for race conditions and deadlocks that don’t hit with a debugger attached.
    -          KD: you can set up your machine for kernel debugging.  When your UM process has an unhandled exception, the KD will break in.  KD is always a solid choice, but kernel debugging may be harder for some than UM debugging.  Also, some machines are hard to set up for KD, like Ultrabooks where the debuggable USB port may be internal to the chassis and soldered to the web cam, or ARM tablets.
    -          WER: Windows Error Reporting can be configured to capture dumps of your process when it crashes.  You can debug the dump after the fact.  Dump files are not as nice as debugging a live machine, but this could be a great option where your code is deployed to a lot of machines, and you want to see what happened on a machine where the bug repros.
    -          Postmortem Debugging: you can configure Windows to launch and attach a debugger in response to an unhandled exception.  This is called postmortem debugging.

    All of these have their uses, but I will focus on postmortem debugging because it was the only way I could chase down a heap corruption that I have been trying to root cause for the past few days.

    First off, you can use this link to read about it in detail if you like:

    The basics are:
    -          Simply type “windbg -I” in an elevated command prompt to make windbg the postmortem debugger.  This works great for processes in your login session.
    -          Use NTSD if you need to use named pipe debugging.  Type “ntsd -iae”.  You can use -iaec “extra options” to add extra options, but it doesn’t work for -server like you need for named pipe debugging.  You need to manually edit the reg key at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug.


    Friday, June 28, 2013

    GFlags.exe and Pageheap

    I was tracking down a heap corruption today, and it occurred to me that I haven't mentioned how to enable pageheap.  For instance, when you use the handy "!heap -triage", if there was a heap corruption, sometimes it will tell you to try the repro again with pageheap enabled.  How do you do that?

    There are actually several ways to enable pageheap, but I will only talk about one, modifying the global flags using gflags.exe.  It should get installed when you install windbg.

    Basically for a user mode (UM) process, you go to the image file tab, and type in the name of your exe.  If you run in a svchost.exe, then you should probably break out your service to a uniquely named servicehost.exe.  I use myhost.exe normally for debugging a service.  After that, check "Enable page heap."

    Next, you need to run the code in question under the debugger, and then reproduce the heap corruption.  Normally the debugger will break in.  This time you will be able to get a lot more useful information out of !heap and !analyze.

    Tuesday, June 4, 2013

    See the OS Version in WinDBG

    I seem to forget this command every so often, so it is ripe for a post.  Often I will be debugging a remote KD from some other team, and I want to know what version of the OS I am debugging.  The command for this is vertarget.  Super obvious.  Obviously the next most obvious command would be version, which is for the version of the debugger, and ver which is nothing.

    Here is some sample output of vertarget:

    0:005> vertarget
    Windows 8 Version 9200 MP (4 procs) Free x64 <= looks like the output of GetVersionEx, which lies after 8 and always says it is Windows 8.
    Product: WinNt, suite: SingleUserTS
    kernel32.dll version: 6.3.9418.0 (winmain.130530-1753) <= This looks like the actual version you would get from RtlGetVersion
    Machine Name: "bob"
    Debug session time: Fri May 31 19:12:24.000 2013 (UTC - 7:00)
    System Uptime: 0 days 1:03:12.334
    Process Uptime: 0 days 0:16:35.000
      Kernel time: 0 days 0:00:00.000
      User time: 0 days 0:00:00.000

    This is what it looks like from a KD:
    0: kd> vertarget
    Windows 8 Kernel Version 9658 MP (4 procs) Free x64
    Product: WinNt, suite: TerminalServer SingleUserTS
    Built by: 9658.0.amd64fre.winmain.131120-1618
    Machine Name: "bobby-brown"
    Kernel base = 0xfffff801`d3e15000 PsLoadedModuleList = 0xfffff801`d40dfb90
    Debug session time: Fri Nov 22 14:28:29.746 2013 (UTC - 8:00)
    System Uptime: 0 days 0:45:58.445