Windows Terminal Sometimes Hangs, ControlCore::UpdatePatternLocations #12607

Closed
zadjii-msft opened this issue Mar 1, 2022 · 13 comments
Labels
Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Resolution-Fix-Available It's available in an Insiders build or a release

Comments

@zadjii-msft
Member

From MSFT:38068994

REPRO STEPS
I have a machine which I normally only RDP into. On this machine, I will fairly frequently find Windows Terminal hung. I don't see this on other machines where I normally access from console. When hung, you can't even get the Alt-space Move/resize window, and right-click close on taskbar won't close it.

I have taken a crash dump and can post to a corpnet file share.

Stack:

 # Child-SP          RetAddr               Call Site
00 00000089`df2ff0f8 00007ffc`9a4f379d     ntdll!ZwWaitForAlertByThreadId+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 3891] 
01 00000089`df2ff100 00007ffc`9a4f3652     ntdll!RtlpWaitOnAddressWithTimeout+0x81 [minkernel\ntos\rtl\waitaddr.c @ 851] 
02 00000089`df2ff130 00007ffc`9a4f3363     ntdll!RtlpWaitOnAddress+0xae [minkernel\ntos\rtl\waitaddr.c @ 1094] 
03 00000089`df2ff1a0 00007ffc`97dfce3f     ntdll!RtlWaitOnAddress+0x13 [minkernel\ntos\rtl\waitaddr.c @ 946] 
04 00000089`df2ff1e0 00007ffc`682bd5e5     KERNELBASE!WaitOnAddress+0x2f [minkernel\kernelbase\synch.c @ 2229] 
05 (Inline Function) --------`--------     Microsoft_Terminal_Control!til::atomic_wait+0x18 [C:\a\_work\1\s\src\inc\til\atomic.h @ 13] 
06 00000089`df2ff220 00007ffc`682bd574     Microsoft_Terminal_Control!til::ticket_lock::lock+0x61 [C:\a\_work\1\s\src\inc\til\ticket_lock.h @ 35] 
07 00000089`df2ff260 00007ffc`682b9ee8     Microsoft_Terminal_Control!std::unique_lock<til::ticket_lock>::unique_lock<til::ticket_lock>+0x18 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\mutex @ 139] 
08 (Inline Function) --------`--------     Microsoft_Terminal_Control!Microsoft::Terminal::Core::Terminal::LockForWriting+0x23 [C:\a\_work\1\s\src\cascadia\TerminalCore\Terminal.cpp @ 869] 
09 00000089`df2ff290 00007ffc`682ebc35     Microsoft_Terminal_Control!winrt::Microsoft::Terminal::Control::implementation::ControlCore::UpdatePatternLocations+0x38 [C:\a\_work\1\s\src\cascadia\TerminalControl\ControlCore.cpp @ 495] 
0a (Inline Function) --------`--------     Microsoft_Terminal_Control!winrt::Microsoft::Terminal::Control::implementation::ControlCore::{ctor}::__l2::<lambda_bb2a51cbd1e4df8c07a1e01e653345d0>::operator()+0x21 [C:\a\_work\1\s\src\cascadia\TerminalControl\ControlCore.cpp @ 174] 
0b (Inline Function) --------`--------     Microsoft_Terminal_Control!std::invoke+0x21 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\type_traits @ 1585] 
0c (Inline Function) --------`--------     Microsoft_Terminal_Control!std::_Invoker_ret<void,1>::_Call+0x21 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\functional @ 744] 
0d 00000089`df2ff2e0 00007ffc`682cb0c9     Microsoft_Terminal_Control!std::_Func_impl_no_alloc<<lambda_bb2a51cbd1e4df8c07a1e01e653345d0>,void>::_Do_call+0x25 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\functional @ 920] 
0e (Inline Function) --------`--------     Microsoft_Terminal_Control!std::invoke+0x5 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\type_traits @ 1585] 
0f (Inline Function) --------`--------     Microsoft_Terminal_Control!std::_Apply_impl+0x5 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\tuple @ 1001] 
10 (Inline Function) --------`--------     Microsoft_Terminal_Control!std::apply+0x5 [C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\tuple @ 1006] 
11 00000089`df2ff310 00007ffc`682cb05d     Microsoft_Terminal_Control!<lambda_10a22f137e05dd06c4abfcbebc9b23bf>::operator()+0x5d [C:\a\_work\1\s\src\cascadia\WinRTUtils\inc\ThrottledFunc.h @ 116] 
12 00000089`df2ff360 00007ffc`94c97406     Microsoft_Terminal_Control!winrt::impl::delegate<winrt::Windows::System::DispatcherQueueHandler,<lambda_10a22f137e05dd06c4abfcbebc9b23bf> >::Invoke+0xd [C:\a\_work\1\s\src\cascadia\TerminalControl\Generated Files\winrt\Windows.system.h @ 1560] 
13 00000089`df2ff390 00007ffc`94c653db     

Hung on looking at a hyperlink.
version:
Microsoft.WindowsTerminal_1.11.3471

There were some changes here in 1.11 looks like:
Issues · microsoft/terminal (github.com)

I went ahead and installed latest v1.12.10393.0. I just RDP'ed in and got a hang. So, I have like three VS2022 tabs, two Ubuntu's, and what I saw was that one VS2022 tab stopped accepting my keystrokes, I tried the other tabs, they were taking my keystrokes fine, back to the non-accepting tab, and at that point Windows Terminal became non-responsive. Task manager refused to give a crash dump (The operation could not be completed / The operation is not valid for this process), but I was able to attach VS2022 debugger and get a crash dump that way:

(Dumps are in the internal bug, didn't feel comfortable sharing their dumps publicly)

@zadjii-msft zadjii-msft added Issue-Bug It either shouldn't be doing this or needs an investigation. Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Product-Terminal The new Windows Terminal. Priority-1 A description (P1) labels Mar 1, 2022
@zadjii-msft zadjii-msft added this to the Terminal v1.14 milestone Mar 1, 2022
@ghost ghost added the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Mar 1, 2022
@zadjii-msft zadjii-msft removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Mar 2, 2022
@lhecker
Member

lhecker commented Mar 3, 2022

This can be either of these two things:

  • an unbalanced LockConsole / UnlockConsole call
    I think this is quite unlikely. I checked all invocations of these two functions throughout the project just now and they're either solid (wil::scope_exit = no possibility for mistakes) or not linked with WT (for instance conhost win32 control interactivity code is somewhat wonky but it's not linked with WT)
  • a thread holding the lock while being blocked on something, thus creating a deadlock

In case of the latter we can just check the threads in the crash dump for any stack that includes a prior lock acquisition. 🙂
I can take a stab at this after finishing my other tasks, if no one else gets to this first.
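
The first bullet's distinction can be sketched in isolation. This is a minimal illustration, assuming `std::mutex` as a stand-in for the console lock; `ManualUpdate` and `GuardedUpdate` are hypothetical names, not functions from the Terminal codebase. A scope-bound guard releases the lock on every exit path, while a manual lock/unlock pair can be left unbalanced by an exception or early return.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Stand-in for the console lock; the real project uses its own lock type.
std::mutex g_consoleLock;

// Manual pairing: if work() throws, unlock() is skipped and the lock
// stays held, which is the "unbalanced" case from the first bullet.
template <typename F>
void ManualUpdate(F&& work)
{
    g_consoleLock.lock();
    work();
    g_consoleLock.unlock();
}

// Scope-bound pairing: the guard's destructor unlocks on every exit path,
// including when work() throws.
template <typename F>
void GuardedUpdate(F&& work)
{
    std::lock_guard<std::mutex> guard{ g_consoleLock };
    work();
}
```

With `GuardedUpdate`, a call that throws still leaves the lock free for the next caller. The same throwing call through `ManualUpdate` would leave the lock held for good.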

@lhecker
Member

lhecker commented Mar 3, 2022

According to the latest dump from the user it's a deadlock due to the Nvidia driver.
Here's the stack trace:

 118  Id: 14848.32ce4 Suspend: 0 Teb: 00000086`0a114000 Unfrozen "Rendering Output Thread"
 # Child-SP          RetAddr               Call Site
00 00000086`0aefe738 00007ffe`05621a5e     ntdll!ZwWaitForSingleObject+0x14 [minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 211] 
01 00000086`0aefe740 00007ffd`e7bfbe76     KERNELBASE!WaitForSingleObjectEx+0x8e [minkernel\kernelbase\synch.c @ 1328] 
02 00000086`0aefe7e0 00007ffd`e7bfced0     nvwgf2umx_cfg!NVDEV_Thunk+0xab4d36
03 00000086`0aefed60 00007ffd`e7c0417c     nvwgf2umx_cfg!NVDEV_Thunk+0xab5d90
04 00000086`0aefed90 00007ffd`e7ca83a1     nvwgf2umx_cfg!NVDEV_Thunk+0xabd03c
05 00000086`0aefedc0 00007ffe`00df9be3     nvwgf2umx_cfg!OpenAdapter10+0x270c1
06 00000086`0aeff030 00007ffe`00df8f58     d3d11!NDXGI::CDevice::ReclaimResourcesImpl+0x673 [onecoreuap\windows\directx\dxg\d3d11\d3dcore\lowfreq\dxgidevice.cpp @ 5030] 
07 00000086`0aeff180 00007ffe`0106a8ee     d3d11!NDXGI::CDevice::ReclaimResources+0x68 [onecoreuap\windows\directx\dxg\d3d11\d3dcore\lowfreq\dxgidevice.cpp @ 4904] 
08 (Inline Function) --------`--------     d2d1!OfferableResourceManager::ReclaimResourcesImmediately+0x1e [onecoreuap\windows\direct2d\d2dcommon\resources\offerableresource.cpp @ 361] 
09 (Inline Function) --------`--------     d2d1!OfferableResourceManager::ReclaimResourcesInList+0x1ea [onecoreuap\windows\direct2d\d2dcommon\resources\offerableresource.cpp @ 325] 
0a 00000086`0aeff2a0 00007ffe`0106a4cd     d2d1!CHwSurfaceRenderTarget::BeginProcessBatch+0x23e [onecoreuap\windows\direct2d\core\hw\hwsurfrt.cpp @ 7309] 
0b 00000086`0aeff4a0 00007ffe`0109520f     d2d1!CHwSurfaceRenderTarget::ProcessBatch+0x2d [onecoreuap\windows\direct2d\core\hw\hwsurfrt.cpp @ 7388] 
0c 00000086`0aeff4f0 00007ffe`0106b2a8     d2d1!CBatchSerializer::FlushInternal+0xcf [onecoreuap\windows\direct2d\core\batching\batchserializer.cpp @ 308] 
0d (Inline Function) --------`--------     d2d1!CBatchSerializer::Flush+0x14 [onecoreuap\windows\direct2d\core\batching\batchserializer.cpp @ 440] 
0e (Inline Function) --------`--------     d2d1!DrawingContext::FlushBatch+0x18 [onecoreuap\windows\direct2d\core\targets\drawingcontext.cpp @ 5958] 
0f 00000086`0aeff580 00007ffe`0106a16b     d2d1!DrawingContext::Flush+0x88 [onecoreuap\windows\direct2d\core\targets\drawingcontext.cpp @ 3409] 
10 (Inline Function) --------`--------     d2d1!DrawingContext::EndDraw+0x2d [onecoreuap\windows\direct2d\core\targets\drawingcontext.cpp @ 4670] 
11 00000086`0aeff5e0 00007ffd`d478bbe7     d2d1!D2DDeviceContextBase<ID2D1DeviceContext6,ID2D1DeviceContext6,null_type>::EndDraw+0xeb [onecoreuap\windows\Direct2D\core\targets\D2DRenderTarget.inl @ 804] 
12 00000086`0aeff6b0 00007ffd`d478b7ee     Microsoft_Terminal_Control!Microsoft::Console::Render::DxEngine::EndPaint+0xa7 [C:\a\_work\1\s\src\renderer\dx\DxRenderer.cpp @ 1386] 
13 (Inline Function) --------`--------     Microsoft_Terminal_Control!Microsoft::Console::Render::Renderer::_PaintFrameForEngine::__l3::<lambda_13faae4558a4495f8deb13383abe653f>::operator()+0x12 [C:\a\_work\1\s\src\renderer\base\renderer.cpp @ 133] 
14 (Inline Function) --------`--------     Microsoft_Terminal_Control!wil::details::lambda_call<<lambda_13faae4558a4495f8deb13383abe653f> >::reset+0x17 [C:\a\_work\1\s\dep\wil\include\wil\resource.h @ 478] 
15 00000086`0aeff950 00007ffd`d478ba9a     Microsoft_Terminal_Control!Microsoft::Console::Render::Renderer::_PaintFrameForEngine+0x1aa [C:\a\_work\1\s\src\renderer\base\renderer.cpp @ 172] 
16 00000086`0aeff9f0 00007ffd`d47b0574     Microsoft_Terminal_Control!Microsoft::Console::Render::Renderer::PaintFrame+0x4a [C:\a\_work\1\s\src\renderer\base\renderer.cpp @ 78] 
17 00000086`0aeffa20 00007ffe`06707034     Microsoft_Terminal_Control!Microsoft::Console::Render::RenderThread::_ThreadProc+0x54 [C:\a\_work\1\s\src\renderer\base\thread.cpp @ 213] 
18 00000086`0aeffa50 00007ffe`07e22651     kernel32!BaseThreadInitThunk+0x14 [clientcore\base\win32\client\thread.c @ 64] 
19 00000086`0aeffa80 00000000`00000000     ntdll!RtlUserThreadStart+0x21 [minkernel\ntdll\rtlstrt.c @ 1153] 

Since the renderer is holding the lock and has no means for timeouts for synchronous calls like EndPaint(), the lock is held indefinitely.
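
The shape of this hang can be reproduced with standard-library stand-ins. This is a hedged sketch, not Terminal code: `UiThreadGetsLock` is a hypothetical helper, the sleep simulates a synchronous driver call that does not return promptly, and `std::timed_mutex` is used only so the demo terminates. The `til::ticket_lock` in the stack above waits without a timeout.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Returns true if the "UI thread" (the caller) acquired the lock within
// `patience` while the "render thread" was stuck in a blocking call.
inline bool UiThreadGetsLock(std::chrono::milliseconds driverStall,
                             std::chrono::milliseconds patience)
{
    std::timed_mutex consoleLock;
    std::promise<void> lockTaken;

    std::thread renderThread([&] {
        std::lock_guard<std::timed_mutex> guard{ consoleLock };
        lockTaken.set_value();                     // lock is now held
        std::this_thread::sleep_for(driverStall);  // simulated stalled EndPaint()
    });

    lockTaken.get_future().wait();                 // ensure the renderer owns the lock
    const bool acquired = consoleLock.try_lock_for(patience);
    if (acquired)
    {
        consoleLock.unlock();
    }
    renderThread.join();
    return acquired;
}
```

If the stall outlasts the caller's patience, the caller never gets the lock. With an untimed lock and a stall that never ends, that wait becomes the hang reported here.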

@lhecker
Member

lhecker commented Mar 9, 2022

I've taken this off of our 1.14 and 1.12/1.13 backport targets, because I found that this is a bug that almost certainly originates from the Nvidia driver. I left it in 22H1 if we want to follow up on this.

@zadjii-msft
Member Author

Thought - can we do UpdatePatternLocations off the main thread? The throttled func comes back on the UI thread. There we try to take the terminal lock, but nvidia has it. If we resume_background at the top of UpdatePatternLocations, then take the lock, at least we don't deadlock the UI thread... Presumably, we would hang the UI thread somewhere else, but maybe this lock on the UI thread is what's preventing the rendering thread from returning from whatever WFSO Nvidia is doing
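
A rough sketch of that idea in portable C++, assuming `std::async` as a stand-in for hopping to a background thread (the actual code would presumably use something like `resume_background`, as mentioned above). `FakeTerminal` and `UpdatePatternLocationsInBackground` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the terminal and its lock.
struct FakeTerminal
{
    std::mutex lock;
    std::vector<int> patterns;

    // Must be called with `lock` held.
    void UpdatePatternsLocked()
    {
        patterns.push_back(static_cast<int>(patterns.size()));
    }
};

// The caller (the "UI thread") returns immediately; the lock is taken on a
// worker thread, so a stalled lock holder blocks the worker, not the caller.
// Note: the future returned by std::async blocks in its destructor, so the
// caller has to keep it alive or hand it off.
inline std::future<void> UpdatePatternLocationsInBackground(FakeTerminal& term)
{
    return std::async(std::launch::async, [&term] {
        std::lock_guard<std::mutex> guard{ term.lock };
        term.UpdatePatternsLocked();
    });
}
```

The work still needs the lock, so this only moves the wait. The caller stays responsive while the worker blocks until the lock is released.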

@zadjii-msft zadjii-msft added the Needs-Discussion Something that requires a team discussion before we can proceed label Jul 28, 2022
@lhecker
Member

lhecker commented Jul 28, 2022

Since it reads from the TextBuffer (and a lot at that) it needs to hold the buffer lock.

What we could do however is to rewrite our asynchronous rendering functions (like UpdatePatternLocations) to work in terms of the regular rendering loop via RenderThread. So right now we got:

  • 1 throttled func for TSF
  • 1 throttled func for patterns
  • 2 independent throttled funcs for the scrollbar position
  • 1 timer for cursor blinking

In the near future we'll get another one:

  • 1 timer to discard unused memory from AtlasEngine

So we got 6 timers soon which all lock the buffer in one way or another and all of them are supposed to sorta run in sync with the rendering loop. Now I'm not saying it's trivial to invert control from "ControlCore updates RenderThread" to "RenderThread updates ControlCore", but I'm sure that this is worth the effort, because we can then avoid issues like that easily and simplify our code by reducing the amount of "state" of our application.
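
One possible shape for that inversion, as a minimal sketch under stated assumptions: `RenderLoop`, `AddFrameTask` and `RunFrame` are hypothetical names, and the real `RenderThread` may look nothing like this. Instead of each timer taking the buffer lock on its own schedule, tasks register with the loop and all run under a single lock acquisition per frame.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class RenderLoop
{
public:
    // Register work that used to live in its own timer or throttled func.
    void AddFrameTask(std::string name, std::function<void()> task)
    {
        assert(task && "frame task must be callable");
        _tasks.emplace_back(std::move(name), std::move(task));
    }

    // One loop iteration: take the buffer lock once, run every task, release.
    void RunFrame()
    {
        std::lock_guard<std::mutex> guard{ _bufferLock };
        ++_lockAcquisitions;
        for (auto& entry : _tasks)
        {
            entry.second();
        }
    }

    int LockAcquisitions() const { return _lockAcquisitions; }

private:
    std::mutex _bufferLock;
    std::vector<std::pair<std::string, std::function<void()>>> _tasks;
    int _lockAcquisitions = 0;
};
```

With two registered tasks and two frames, the lock is acquired twice rather than four times, and no task ever competes with the render loop for it.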

@zadjii-msft
Member Author

note to future readers

Checking the "Use software rendering (WARP)" box in the settings will likely mitigate this for you, at the cost of some performance penalty. At least until we've got this sorted out.

@lhecker lhecker modified the milestones: Terminal v1.16, 22H2 Aug 16, 2022
ghost pushed a commit that referenced this issue Aug 16, 2022
We have a number of theories why #12607 is happening, one of which is that
some GPU drivers somehow rely on Win32 messages or similar which we process
on the main thread. If we then try to acquire the console lock on the main
thread, while the GPU-driver thread itself is holding that lock, we've got
ourselves a deadlock. This PR makes this less likely by running the repeat
offender `UpdatePatternLocations` on a background thread instead.
We have a number of other locations which acquire the console lock on the
main thread and a thorough bug fix must be done in a different way.

## Validation Steps Performed
* After pasting a URL it gets underlined on hover ✅
PKRoma pushed a commit to PKRoma/Terminal that referenced this issue Oct 15, 2022

(cherry picked from commit 23e4d31)
Service-Card-Id: 85033019
Service-Version: 1.15
PKRoma pushed a commit to PKRoma/Terminal that referenced this issue Oct 15, 2022

(cherry picked from commit 23e4d31)
Service-Card-Id: 85033018
Service-Version: 1.14
@zadjii-msft zadjii-msft modified the milestones: 22H2, Terminal v1.17 Dec 5, 2022
@rwasef1830

Windows 11 22H2 KB5022913 seems to have fixed this hang for me in Chrome and Edge. It may be fixed in Windows Terminal as well.

@lhecker
Member

lhecker commented Apr 12, 2023

This will most likely be fixed by #14959 because it doesn't call any graphics APIs while the console lock is being held.
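
The general idea of keeping graphics calls out of the locked region can be sketched as follows. This is not the code from #14959; `TextBufferStub`, `PresentFn` and `PaintFrame` are hypothetical names, and the `present` callback stands in for a potentially blocking call such as `EndPaint()`.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the terminal's text buffer and its lock.
struct TextBufferStub
{
    std::mutex lock;
    bool lockHeld = false; // demo bookkeeping only, toggled while `lock` is owned
    std::vector<std::string> rows;
};

using PresentFn = std::function<void(const std::vector<std::string>&)>;

// Copy what the frame needs while holding the lock, release the lock,
// and only then make the potentially slow "graphics" call.
inline void PaintFrame(TextBufferStub& buffer, const PresentFn& present)
{
    assert(present && "present callback must be callable");

    std::vector<std::string> snapshot;
    {
        std::lock_guard<std::mutex> guard{ buffer.lock };
        buffer.lockHeld = true;
        snapshot = buffer.rows;
        buffer.lockHeld = false;
    } // lock released here

    present(snapshot); // no lock held, so a stall here cannot block other lock users
}
```

The cost is a copy per frame. In exchange, a driver stall inside `present` no longer blocks other threads that need the lock.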

@zadjii-msft
Member Author

Anyone still seeing this on 1.18 Stable & 1.19 Preview?

@microsoft-github-policy-service microsoft-github-policy-service bot added Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something No-Recent-Activity This issue/PR is going stale and may be auto-closed without further activity. labels Sep 27, 2023
@rwasef1830

This issue seems to have been solved for me since the Windows 11 22H2 update.

@microsoft-github-policy-service microsoft-github-policy-service bot removed the No-Recent-Activity This issue/PR is going stale and may be auto-closed without further activity. label Oct 2, 2023
@zadjii-msft
Member Author

Welp, that sounds good to me. Thanks for following up!

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Attention The core contributors need to come back around and look at this ASAP. label Oct 2, 2023
@zadjii-msft zadjii-msft added Resolution-Fix-Available It's available in an Insiders build or a release and removed Needs-Attention The core contributors need to come back around and look at this ASAP. labels Oct 2, 2023
@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Oct 2, 2023
@DrPhillOS

This issue seems to be back with 24H2. Enabling WARP software rendering instantly unlocks my terminal (ssh session). I can even toggle it on and off to freeze and unfreeze my shell. NICE!

Was it an NVIDIA driver issue?

@lhecker
Member

lhecker commented Jan 30, 2025

It may be related to this issue which is a bug in 24H2 (the fix is currently in the process of being released): #18040
