[BUG] Unit tests time out randomly on ubuntu-latest #673
Comments
It seems that the tests always get stuck on the Ubuntu runner, but not always on the same test after all. Last failure:
|
And another one, also on Ubuntu:
|
I suggest we try adding a global timeout to the tests, so that we actually get some debug information rather than GitHub killing the runner.
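One possible way to get that debug information, as a minimal sketch only: Python's standard `faulthandler` module can dump the tracebacks of all threads if a deadline passes, which would show where a hung test is stuck. The helper names `arm_watchdog` and `disarm_watchdog` are hypothetical and not part of MDANSE, and calling them from a `conftest.py` fixture is an assumption about how this could be wired in. pytest also has a built-in `faulthandler_timeout` ini option that does something similar per test.

```python
import faulthandler


def arm_watchdog(timeout_s, log_file, exit_process=True):
    """Schedule a traceback dump of all threads after timeout_s seconds.

    log_file must be an open file object with a real file descriptor
    (for example sys.stderr, or a file opened on disk).
    If exit_process is True, the process exits right after the dump,
    so a hung test run ends instead of waiting for the runner limit.
    """
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=log_file, exit=exit_process
    )


def disarm_watchdog():
    """Cancel the pending dump; call this when the test finished in time."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage inside a test session (values are assumptions):
#   arm_watchdog(1800, sys.stderr)   # 30 minutes for the whole run
#   ... run the tests ...
#   disarm_watchdog()
```

With something like this in place, a stuck run would print the stack of every thread to the job log before exiting, instead of producing no output until the runner is killed.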
We just had another test that got stuck: https://github.com/ISISNeutronMuon/MDANSE/actions/runs/13585226327/job/37978614629 It was a parallel job. The most relevant part of the error output is:
|
Description of the error
Our unit test workflow seems to get stuck on the ubuntu-latest runner. We have 4 workflows on ubuntu-latest, for different versions of Python. Sometimes one out of four (or, rarely, two out of four: https://github.com/ISISNeutronMuon/MDANSE/actions/runs/13350554804 ) ends up marked as failed, with the error:
The job running on runner GitHub Actions X has exceeded the maximum execution time of 360 minutes.
Describe the expected result
Most of the time, if all the tests pass on the other platforms, they should also pass on Linux.
Describe the actual result
Usually, the tests all complete without problems. Sometimes, the unit tests stop running after the first file. The output is:
Analysis/test_average_structure.py ................ [ 2%]
and the next test in Analysis/test_dynamics.py either never starts, or never completes.
In the end the entire workflow times out.
Suggested fix
Since the failure is not reproducible and seems to affect only one platform, it is difficult to tell whether the error is on our side or is a problem with the runner. However, when the tests fail, they always seem to fail at the same point in the workflow, so maybe the unit tests need to be changed somehow.
Alternatively, we could lower the time limit for the workflows, so that they fail sooner. We don't normally need more than 20 minutes, so a limit of 30 minutes would be more than enough.
Additional details
N/A