fix init image at creation #759

Open
wants to merge 1 commit into
base: main

Conversation

rjodinchr
Contributor

We need to create the cvk_command_image_init with a valid context. Otherwise we can hit a race condition between the destruction of the queue and the deletion of commands in the executor thread.

@rjodinchr
Contributor Author

The SwiftShader artifact needs to be refreshed.

@kpet
Owner

kpet commented Jan 24, 2025

Kicked off https://github.com/kpet/clvk/actions/runs/12951700691

@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch from 8908894 to 085672d Compare January 24, 2025 15:12
@rjodinchr
Contributor Author

Can you re-trigger the failing bot? It's the first time I've seen that error. I wonder whether it only happens very rarely. I can't reproduce it on my machine. It may be something we just want to filter out in the test to avoid flakiness.

@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch 2 times, most recently from 5e3541a to a9683d9 Compare February 25, 2025 06:45
@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch from a9683d9 to 9ef9dc1 Compare March 18, 2025 14:51
@rjodinchr
Contributor Author

In fact, the bug was much simpler than I thought. It's just about not forcing the deletion of the queue.

@rjodinchr
Contributor Author

CI should be fixed with #771

@rjodinchr
Contributor Author

Using a cvk_command_queue_holder would look cleaner, but we would run back into the headers issue: #735
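
For readers unfamiliar with the holder idea mentioned here, the following is a minimal sketch of a reference-holding wrapper that keeps a queue alive for as long as a command refers to it. Every name in it (refcounted_sketch, holder_sketch, queue_sketch, command_sketch) is hypothetical and chosen for illustration; it is not clvk's actual implementation and does not reflect the change made in this PR.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count. The creator owns the first reference.
struct refcounted_sketch {
    void retain() { m_refcount.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        // The object is destroyed only when the last reference goes away,
        // never by a forced delete from one of its users.
        if (m_refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
        }
    }
    virtual ~refcounted_sketch() = default;

private:
    std::atomic<unsigned> m_refcount{1};
};

// Hypothetical RAII holder: retains on construction, releases on destruction.
template <typename T> struct holder_sketch {
    explicit holder_sketch(T* obj) : m_obj(obj) { m_obj->retain(); }
    ~holder_sketch() { m_obj->release(); }
    holder_sketch(const holder_sketch&) = delete;
    holder_sketch& operator=(const holder_sketch&) = delete;
    T* operator->() const { return m_obj; }

private:
    T* m_obj;
};

// Stand-in for a queue; it records its own destruction so it can be observed.
struct queue_sketch : refcounted_sketch {
    explicit queue_sketch(bool* destroyed) : m_destroyed(destroyed) {}
    ~queue_sketch() override { *m_destroyed = true; }

private:
    bool* m_destroyed;
};

// Stand-in for a command that must not outlive its queue.
struct command_sketch {
    explicit command_sketch(queue_sketch* q) : m_queue(q) {}
    holder_sketch<queue_sketch> m_queue;
};
```

With this shape, the creator can call release() on the queue while a command is still pending, and the queue is only destroyed once the command itself has been deleted.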

@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch from 9ef9dc1 to 12af440 Compare March 24, 2025 07:51
@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch from 12af440 to 7891c2c Compare April 8, 2025 14:21
@rjodinchr
Contributor Author

Fixing the bug also showed another deadlock between the executor and the main thread, for which I've added the fix.

The deadlock involves two locks: cvk_executor_thread.m_lock and cvk_executor_thread_pool.m_lock. The main thread holds cvk_executor_thread_pool.m_lock in cvk_executor_thread_pool::get_executor and is trying to take cvk_executor_thread.m_lock in cvk_executor_thread::is_idle. Meanwhile, an executor holds cvk_executor_thread.m_lock in cvk_executor_thread::executor and is trying to take cvk_executor_thread_pool.m_lock in cvk_executor_thread_pool::return_executor, while deleting the queue that no one holds anymore.

main:
  cvk_executor_thread::is_idle -> tries to take cvk_executor_thread.m_lock
  cvk_executor_thread_pool::get_executor -> has cvk_executor_thread_pool.m_lock
executor:
  cvk_executor_thread_pool::return_executor -> tries to take cvk_executor_thread_pool.m_lock
  cvk_executor_thread::executor -> has cvk_executor_thread.m_lock
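
The two stacks above form a classic lock-order inversion: one thread takes the pool lock and then the thread lock, the other takes them in the opposite order. One common way to break such a cycle is to make sure the executor path drops its own lock before it asks for the pool lock. The sketch below illustrates that idea only; thread_sketch, pool_sketch and executor_finish are hypothetical names, and this is not necessarily how the PR fixes the problem.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an executor thread and its lock.
struct thread_sketch {
    std::mutex m_lock; // plays the role of cvk_executor_thread.m_lock
    bool idle = true;

    bool is_idle() {
        std::lock_guard<std::mutex> guard(m_lock);
        return idle;
    }
};

// Hypothetical stand-in for the pool and its lock.
struct pool_sketch {
    std::mutex m_lock; // plays the role of cvk_executor_thread_pool.m_lock
    std::vector<thread_sketch*> free_list;

    // Main-thread path: pool lock first, then the thread lock inside is_idle().
    thread_sketch* get_executor() {
        std::lock_guard<std::mutex> guard(m_lock);
        for (auto it = free_list.begin(); it != free_list.end(); ++it) {
            if ((*it)->is_idle()) {
                thread_sketch* t = *it;
                free_list.erase(it);
                return t;
            }
        }
        return nullptr;
    }

    void return_executor(thread_sketch* t) {
        std::lock_guard<std::mutex> guard(m_lock);
        free_list.push_back(t);
    }
};

// Executor path: the thread lock is released before the pool lock is
// requested, so this path never holds both locks at once and cannot
// complete the cycle with get_executor().
void executor_finish(pool_sketch& pool, thread_sketch& self) {
    {
        std::lock_guard<std::mutex> guard(self.m_lock);
        self.idle = true;
    } // thread lock released here
    pool.return_executor(&self);
}
```

The alternative is to fix a single global order (for example, always pool lock before thread lock), which trades a little flexibility for an invariant that is easier to audit.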

@kpet
Owner

kpet commented Apr 8, 2025

How did you bump into that deadlock? What test or application were you running?

@rjodinchr
Contributor Author

> How did you bump into that deadlock? What test or application were you running?

Several tests in api_tests started to be flaky (with that deadlock) once I updated ChromeOS with that new version of the fix (with only the release on the command queue).

@rjodinchr
Contributor Author

rjodinchr commented Apr 14, 2025

TODO: move queue_holder into group?
Done

@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch from 7891c2c to 4a9460d Compare April 15, 2025 08:09
@rjodinchr rjodinchr force-pushed the pr/fix-init-image-at-creation branch from 4a9460d to 32f2248 Compare April 15, 2025 08:13