scheduler.py: Try to not fail if submit have exception #367

Merged
merged 1 commit into from
Dec 20, 2023

Conversation

nuclearcat
Member

We quite often get errors on LAVA submit, and these cause the scheduler to crash.
This is a generic attempt to keep running after such failures, which might be handled better in the future.

Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
Contributor

@pawiecz pawiecz left a comment

Could you point to the failing nodes? It would be useful for eliminating the root cause of such issues in the future.

try:
    running_job = runtime.submit(output_file)
except Exception as e:
    self.log.error(' '.join([
Contributor

It would probably be neater to collect the node/runtime/job data first and then append the TestJob ID or the error message for logging, but that could be the next step.
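A minimal sketch of that suggestion, under stated assumptions: `format_submit_log`, `submit_safely`, and their parameter names are hypothetical and are not taken from scheduler.py. The idea is only to build the shared context fields once and append either the job ID or the error text.

```python
import logging

log = logging.getLogger("scheduler")


def format_submit_log(node_id, runtime_name, platform, job_name, detail):
    """Join the shared context fields with a trailing job ID or error text."""
    context = [node_id, runtime_name, platform, job_name]
    return ' '.join(context + [str(detail)])


def submit_safely(runtime, output_file, node_id, runtime_name, platform, job_name):
    """Submit a job; log the error and return None instead of raising."""
    try:
        running_job = runtime.submit(output_file)
    except Exception as exc:  # deliberately broad, mirroring the change in this PR
        log.error(format_submit_log(node_id, runtime_name, platform, job_name, exc))
        return None
    log.info(format_submit_log(node_id, runtime_name, platform, job_name, running_job))
    return running_job
```

With this shape, the success and failure log lines share the same prefix, so they can be matched on node ID when looking for failing nodes.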

Member Author

Yes. Handling failures of the schedulers, or of any process in the pipeline, is a very big and interesting topic: where we need to retry, where we need to log the failure, and how to log it.
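One possible shape for the retry-and-log part, as a sketch only: `submit_with_retry` and its `attempts`, `delay`, and `sleep` parameters are hypothetical and do not exist in the KernelCI pipeline code. Whether a given error (for example a 400 Bad Request) is worth retrying at all is a separate decision that this sketch does not make.

```python
import logging
import time

log = logging.getLogger("scheduler")


def submit_with_retry(submit, output_file, attempts=3, delay=1.0, sleep=time.sleep):
    """Call submit(output_file), retrying on any exception.

    Returns the submit result, or None once all attempts have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return submit(output_file)
        except Exception as exc:
            log.warning("submit attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    log.error("giving up on %s after %d attempts", output_file, attempts)
    return None
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a caller would pass something like `runtime.submit` as the `submit` argument.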

@nuclearcat
Member Author

> Could you point to the failing nodes? It would be useful to eliminate root cause of such issues in the future.

12/07/2023 09:39:02 PM UTC [ERROR] 65723b76ea59c4c039f324ff lava-collabora-staging minnowboard-turbot-E3826 baseline-x86 400 Client Error: Bad Request for url: https://staging.lava.collabora.dev/api/v0.2/jobs/?format=json&limit=256

@nuclearcat nuclearcat marked this pull request as ready for review December 20, 2023 12:57
@nuclearcat nuclearcat added this pull request to the merge queue Dec 20, 2023
Merged via the queue into kernelci:main with commit da52a66 Dec 20, 2023
3 checks passed
@nuclearcat nuclearcat deleted the workaround-exception-submit branch December 20, 2023 12:57