Monday, November 16, 2020

COM Access Violation (AV) Crashes Relating to Unloaded DLLs

My team uses a broadly used WinRT API.  Basically, all apps that have a device-related scenario use it.  This means buggy apps that crash will sometimes have our component featured in their Watson dumps.  Based on certain metrics, Watson will turn buckets turn into code bugs, so we have to triage them.  Aside: in case you were wondering, bugs from WER (Windows Error Reporting) service and dumps, do generate bugs and get fixed, so WER is useful.

A number of AV crash bugs we see are related to our device watcher calling back into the app's handlers for the device events, which almost always means a lifetime management bug in the app.  

You can check the memory allocations to see if they are busy or free.  This applies more to the component that owns the object's memory.  I think I covered that before.  That will let you know if the object has already been released.  A lot of times you can still see the object's state after it has been freed, but the important part is that it has been released/freed already, so the object is not being ref counted correctly.  AppVerif has a handy COM set of checks which might help if you think you may have this problem in your code.  With smart pointers, it can be a little harder to see what is happening with the object's references.  If the object model is not too complicated, you can just find the bug with code inspection.

A super common reason for AVs like this is the DLL is unloaded.  COM aggressively unloads DLLs not being used by calling DllCanUnloadNow.  If the implementation has a bug, the DLL will get unloaded while objects and other COM operations are in flight.  Eventually, when COM tries to CoUnitialize an object that belongs to that DLL, it will AV.  This also applies to other COM things like when COM tries to marshal between apartments, etc.

 We see crashes like this fairly often:

0:049> .excr
rax=00007ff8f305cf70 rbx=0000026dfc5cc900 rcx=0000026dfc5cc900
rdx=0000000000012f06 rsi=0000026dfa091b00 rdi=0000026dfa091ae0
rip=00007ff990c9094d rsp=000000b92deff490 rbp=00000000800401fd
 r8=00007ff970a0b370  r9=000000b92deff5d0 r10=dee01e7974d59970
r11=000000b92deff5e0 r12=0000000000012f06 r13=00007ff970a0b370
r14=000000b92deff5d0 r15=00007ff8f305cf70
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
combase!CGIPTable::GetRequestedInterface+0x22 [inlined in combase!CGIPTable::GetInterfaceFromGlobal+0x1ed]:
00007ff9`90c9094d 488b00          mov     rax,qword ptr [rax] ds:00007ff8`f305cf70=????????????????
0:049> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     combase!CGIPTable::GetRequestedInterface+0x22 [onecore\com\combase\dcomrem\giptbl.cxx @ 1615] 
01 000000b9`2deff490 00007ff9`7097515b     combase!CGIPTable::GetInterfaceFromGlobal+0x1ed [onecore\com\combase\dcomrem\giptbl.cxx @ 1660] 
02 000000b9`2deff560 00007ff9`70977823     Windows_Devices_Enumeration!Gip<Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *> >::Localize+0x6b [onecore\private\base\inc\devices\Git.h @ 167] 
03 (Inline Function) --------`--------     Windows_Devices_Enumeration!GitDelegate2<Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *>,Windows::Devices::Enumeration::IDeviceWatcher *,Windows::Devices::Enumeration::IDeviceInformation *>::GetHandler+0x11 [onecoreuap\base\devices\rtenum\dllsrv\GitDelegate.h @ 254] 
04 000000b9`2deff5a0 00007ff9`70976d95     Windows_Devices_Enumeration!GitDelegate2<Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *>,Windows::Devices::Enumeration::IDeviceWatcher *,Windows::Devices::Enumeration::IDeviceInformation *>::Invoke+0x23 [onecoreuap\base\devices\rtenum\dllsrv\GitDelegate.h @ 303] 
05 (Inline Function) --------`--------     Windows_Devices_Enumeration!Microsoft::WRL::EventSource<Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *>,Microsoft::WRL::InvokeModeOptions<-2> >::InvokeAll::__l2::<lambda_00d1300cb6cb75d6b2b5344e37267964>::operator()+0x36 [onecore\external\sdk\inc\wrl\event.h @ 964] 
06 000000b9`2deff5d0 00007ff9`70976cf4     Windows_Devices_Enumeration!Microsoft::WRL::InvokeTraits<-2>::InvokeDelegates<<lambda_00d1300cb6cb75d6b2b5344e37267964>,Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *> >+0x79 [onecore\internal\sdk\inc\wrl\internalevent.h @ 121] 
07 000000b9`2deff630 00007ff9`70981d3d     Windows_Devices_Enumeration!Microsoft::WRL::EventSource<Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *>,Microsoft::WRL::InvokeModeOptions<-2> >::DoInvoke<<lambda_00d1300cb6cb75d6b2b5344e37267964> >+0x78 [onecore\external\sdk\inc\wrl\event.h @ 954] 
08 (Inline Function) --------`--------     Windows_Devices_Enumeration!Microsoft::WRL::EventSource<Windows::Foundation::ITypedEventHandler<Windows::Devices::Enumeration::DeviceWatcher *,Windows::Devices::Enumeration::DeviceInformation *>,Microsoft::WRL::InvokeModeOptions<-2> >::InvokeAll+0x19 [onecore\external\sdk\inc\wrl\event.h @ 964] 
09 000000b9`2deff670 00007ff9`8ee2d946     Windows_Devices_Enumeration!Watcher<Windows::Devices::Enumeration::DeviceWatcher,Windows::Devices::Enumeration::IDeviceWatcher,Windows::Devices::Enumeration::IDeviceWatcher2,Windows::Devices::Enumeration::DeviceInformation,Windows::Devices::Enumeration::IDeviceInformation,Windows::Devices::Enumeration::IDeviceInformation2,DeviceInformationServer,Windows::Devices::Enumeration::DeviceInformationUpdate,Windows::Devices::Enumeration::IDeviceInformationUpdate,DeviceInformationUpdateServer,&RuntimeClass_Windows_Devices_Enumeration_DeviceWatcher>::Impl::DevQueryCallback+0x3ad [onecoreuap\base\devices\rtenum\dllsrv\Watcher.h @ 889] 
0a 000000b9`2deff710 00007ff9`91b9b0ea     cfgmgr32!TQuery::ServiceActionQueue+0xe2 [onecore\base\pnp\devquery\lib\query.cpp @ 245] 
0b 000000b9`2deff7a0 00007ff9`91b3ec06     ntdll!TppWorkpExecuteCallback+0x13a [minkernel\threadpool\ntdll\work.c @ 671] 
0c 000000b9`2deff7f0 00007ff9`8ff94ede     ntdll!TppWorkerThread+0x686 [minkernel\threadpool\ntdll\worker.c @ 1109] 
0d 000000b9`2deffae0 00007ff9`91b87c6b     kernel32!BaseThreadInitThunk+0x1e [clientcore\base\win32\client\thread.c @ 70] 
0e 000000b9`2deffb10 00000000`00000000     ntdll!RtlUserThreadStart+0x2b [minkernel\ntdll\rtlstrt.c @ 1152] 
0:049> .frame 0n0;dv /t /v
00 (Inline Function) --------`--------     combase!CGIPTable::GetRequestedInterface+0x22 [onecore\com\combase\dcomrem\giptbl.cxx @ 1615] 
@rbx              struct IUnknown * pUnk = 0x0000026d`fc5cc900
@r15              void * pVtableAddress = 0x00007ff8`f305cf70
<unavailable>     HRESULT hr = <value unavailable>
0:049> dps 0x0000026d`fc5cc900
0000026d`fc5cc900  00007ff8`f305cf70 <Unloaded_xxxxxxxxx.dll>+0xc2cf70
0000026d`fc5cc908  00000001`00000000
0000026d`fc5cc910  0000026d`fbdc9af0
0000026d`fc5cc918  00080000`00000000
0000026d`fc5cc920  00000000`00000008
0000026d`fc5cc928  00000008`4d454d4c
0000026d`fc5cc930  0000026d`fc4d3bf8
You can see that the handler's dll is already unloaded.
// or another example ///
:000> k
combase!CStdMarshal::DisconnectSrvIPIDs::__l29::<lambda_2a3a7b5175b0a5e47c77e1d8eff078e5>::operator()+0x7
combase!ObjectMethodExceptionHandlingAction<<lambda_2a3a7b5175b0a5e47c77e1d8eff078e5> >+0x24
combase!CStdMarshal::DisconnectSrvIPIDs+0x30d
combase!CStdMarshal::DisconnectWorker_ReleasesLock+0x2d7
combase!CStdMarshal::DisconnectSwitch_ReleasesLock+0x1c
combase!CStdMarshal::DisconnectAndReleaseWorker_ReleasesLock+0x32
combase!COIDTable::ThreadCleanup+0x117
combase!FinishShutdown::__l2::<lambda_3d4acc620ec77839d81caec938b15158>::operator()+0x5
combase!ObjectMethodExceptionHandlingAction<<lambda_3d4acc620ec77839d81caec938b15158> >+0x9
combase!FinishShutdown+0x78
combase!ApartmentUninitialize+0xc9
combase!wCoUninitialize+0x17d
combase!CoUninitialize+0xea
wuaueng!UHRunRemoteHandlerServer+0x25e
...
0:000> .exr -1
ExceptionAddress: 00007ffe8571a510 (combase!CStdMarshal::DisconnectSrvIPIDs::__l29::<lambda_2a3a7b5175b0a5e47c77e1d8eff078e5>::operator()+0x0000000000000007)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
Attempt to read from address 00007ffe59d74ee8
0:000> ln 00007ffe59d74ee8
(00007ffe`59d74ee8)   <Unloaded_xxxxxxxxxxx.dll>+0x1b4ee8