Monday, July 18, 2011

Performance Instrumentation Testing for Windows Programming

Performance is something you should always build in as you write code, whether you are building something from the ground up or fixing bugs in legacy code. Performance is one of those core software principles that is just good hygiene. Similarly, you should always write secure code, reliable code, et cetera, but those are separate topics.

Performance considerations start at the very beginning, when you start designing your features, and continue throughout the process. So, let’s say that you already came up with a high-performance design that you validated through prototyping. Great, you are starting the project off on the right foot. As you know, it becomes much harder to make design changes as the project goes down the pike, and it can become impossible towards the end of the development cycle.

So now you are in the thick of coding, or perhaps on the tail end of development, and you want to make sure that you are still meeting the performance goals for your feature work. Saying you are a good programmer and you always write high-performance code isn’t quite good enough; you have to actually test your code. Writing code with timers in it and printing the times with printf is OK for prototypes, but it doesn’t work for real production code. Not to worry: there are a number of tools at the Windows developer’s disposal to actually do the job right.
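
For reference, the prototype-only approach mentioned above looks roughly like the following C++ sketch. SumTo and TimeSumTo are hypothetical names made up for illustration, and std::chrono::steady_clock is used only so the sketch builds anywhere; on Windows, QueryPerformanceCounter is a common alternative.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical workload standing in for the code you want to measure.
long long SumTo(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Times a single call to SumTo, prints the elapsed time with printf, and
// returns the elapsed time in microseconds.
long long TimeSumTo(long long n) {
    const auto start = std::chrono::steady_clock::now();
    const long long result = SumTo(n);
    const auto stop = std::chrono::steady_clock::now();

    const long long elapsed_us = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());

    std::printf("SumTo(%lld) = %lld took %lld us\n", n, result, elapsed_us);
    return elapsed_us;
}
```

Calling TimeSumTo(1000000) from main prints one timing line. That is handy while prototyping, but the timer and the printf live inside your code path, which is exactly why this doesn’t belong in production builds.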

Profiling is one tool in the toolbox; one such profiler is Microsoft’s F1, made by the developer division. I think it is available in the wild as a standalone tool, and it looks like it has been included with VS08 and VS10. I personally have used the standalone F1 and still write code in Vim, but maybe I am weird like that. The basic idea of F1 is that it breaks into your process at a set time interval and checks what is on your threads’ call stacks. Based on what is on the stacks, F1 can give you an idea of how much time is spent in each function and of the allocations being made. The advantages of using a profiler like F1 are: you don’t have to modify your binaries to instrument them, it is lightweight, and it gives you an idea of how much time your code spends in each function. Some downsides are: you don’t get call count information, CPU usage is skewed by the context switching the timer causes, it doesn’t profile kernel-mode usage, it is less effective for system-wide issues, it is heavyweight, and it doesn’t scale well if you want to profile often. To sum it up, F1 is a great tool with limitations. It really can give you a real-world performance view of your code (if it runs in user mode), but there are better ways to instrument if you want to look at performance more than a few times.
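
To make the sampling idea concrete, here is a toy C++ model of it. This is not how F1 is implemented; it only simulates “look at what is running every N milliseconds” over a made-up timeline. Span and SampleTimeline are hypothetical names invented for this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// One entry in a made-up execution timeline: which function is on top of
// the stack, and for how many milliseconds it runs.
struct Span {
    std::string function;
    int duration_ms;
};

// Walks the timeline and takes a sample every interval_ms (starting at
// t = 0), recording whichever function is running at that instant.
// Returns the number of samples attributed to each function.
std::map<std::string, int> SampleTimeline(const std::vector<Span>& timeline,
                                          int interval_ms) {
    std::map<std::string, int> samples;
    if (interval_ms <= 0) {
        return samples;  // no valid sampling interval, nothing to record
    }

    int next_sample_ms = 0;  // absolute time of the next sample
    int span_start_ms = 0;   // absolute start time of the current span
    for (const Span& span : timeline) {
        const int span_end_ms = span_start_ms + span.duration_ms;
        // Every sample that falls inside [span_start_ms, span_end_ms)
        // is charged to this span's function.
        while (next_sample_ms < span_end_ms) {
            ++samples[span.function];
            next_sample_ms += interval_ms;
        }
        span_start_ms = span_end_ms;
    }
    return samples;
}
```

With a 10 ms interval and the timeline Parse 30 ms, Render 70 ms, Parse 5 ms, Parse 5 ms, this yields 4 samples for Parse and 7 for Render. The proportions roughly track the time spent, but nothing in the result tells you Parse was called three times, and the last short call is missed entirely. That is the call count limitation in miniature.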

One of these better ways is ETW (Event Tracing for Windows). I will not get into the nitty-gritty of ETW; I will assume that the reader knows how to Google and can read the copious content out there, but I will give you a quick and dirty idea of what it is. Basically, ETW is the Windows operating system’s built-in tracing mechanism. It is baked in all the way down to the kernel level, and it is very high performance. Since ETW is baked into the operating system, you can track scenarios from process to process, and even into kernel-mode (KM) components, with very little overhead. You can use it for many other things as well, like writing events to logs and so on, but I will focus on using ETW to instrument your code for performance. For some reason, though, I haven’t found much help on the Internet on how to do performance instrumentation. I plan to help with that.
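
As a first taste, here is a minimal C++ sketch of start/stop markers written through the ETW provider API (EventRegister, EventWriteString, and EventUnregister from evntprov.h; link with Advapi32.lib). Treat it as a sketch under assumptions, not necessarily the approach later posts will take: the provider GUID is made up, PerfMarkerProvider and ScopedRegion are hypothetical names, and real performance instrumentation usually uses structured events rather than plain strings. The non-Windows branch is a no-op so the sketch still compiles elsewhere.

```cpp
#include <string>
#include <utility>

#ifdef _WIN32
#include <windows.h>
#include <evntprov.h>  // EventRegister, EventWriteString, EventUnregister

// Made-up provider GUID for illustration only; generate your own.
static const GUID kSketchProviderId =
    {0x3f2a1b4c, 0x5d6e, 0x4f70, {0x81, 0x92, 0xa3, 0xb4, 0xc5, 0xd6, 0xe7, 0xf8}};
#endif

// Registers an ETW provider for its lifetime and writes string events.
class PerfMarkerProvider {
public:
    PerfMarkerProvider() {
#ifdef _WIN32
        registered_ = EventRegister(&kSketchProviderId, nullptr, nullptr,
                                    &handle_) == ERROR_SUCCESS;
#endif
    }
    ~PerfMarkerProvider() {
#ifdef _WIN32
        if (registered_) EventUnregister(handle_);
#endif
    }
    PerfMarkerProvider(const PerfMarkerProvider&) = delete;
    PerfMarkerProvider& operator=(const PerfMarkerProvider&) = delete;

    // Requests one marker event. Returns true only if ETW accepted it;
    // always false when the provider is not registered or not on Windows.
    bool Write(const std::wstring& text) {
        ++attempts_;
#ifdef _WIN32
        if (!registered_) return false;
        // Level 4 is "informational"; keyword 0 means no keyword filtering.
        return EventWriteString(handle_, 4, 0, text.c_str()) == ERROR_SUCCESS;
#else
        (void)text;
        return false;
#endif
    }

    // Number of markers requested so far, whether or not ETW accepted them.
    int attempts() const { return attempts_; }

private:
    int attempts_ = 0;
#ifdef _WIN32
    REGHANDLE handle_ = 0;
    bool registered_ = false;
#endif
};

// Writes a "Start" marker on construction and a "Stop" marker on destruction,
// so a trace can show how long the enclosing scope took.
class ScopedRegion {
public:
    ScopedRegion(PerfMarkerProvider& provider, std::wstring name)
        : provider_(provider), name_(std::move(name)) {
        provider_.Write(L"Start " + name_);
    }
    ~ScopedRegion() { provider_.Write(L"Stop " + name_); }

private:
    PerfMarkerProvider& provider_;
    std::wstring name_;
};
```

In use, you would create one PerfMarkerProvider for the process and put a ScopedRegion at the top of each scope you care about, for example ScopedRegion region(provider, L"LoadFile");. The events carry their own timestamps in the trace, so the code never has to read a clock or print anything itself.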

In the days and weeks to come, I will break down how to instrument your code with ETW and how to use the Windows tools for performance tracking and testing.

ETW Overview: http://msdn.microsoft.com/en-us/magazine/cc163437.aspx

Do not confuse ETW with WPP. Yes, WPP is built on ETW, but I use WPP for more conventional printf-style tracing. It is also good to add WPP tracing to your code for debugging purposes, but I don’t use it for performance testing. In fact, WPP tracing is probably the best way to debug nasty concurrency issues that don’t repro well under a debugger, so you should look into it.