Restarting a service can fail because a process is still running #1570
Comments
@benhoyt this probably belongs in canonical/pebble.
Hi @gboutry, thanks for the report. As of Pebble v1.17.0, which happily is what you're using, Pebble will report that the error occurred (via the ChangeError that you're seeing) but keep trying to start the service in the background. You'll notice the "will restart" at the end of the error message, indicating it'll do that. So assuming Pebble >= v1.17.0, you can probably catch the ChangeError, log it, and expect Pebble to keep trying to auto-restart the service in backoff mode. However, it does raise the question: is the httpd server not shutting down correctly? It seems like it's not doing a graceful shutdown and the port is staying open, or a daemonised subprocess (pid 434) is still running. Ideally it would shut down gracefully/correctly. Or maybe there's a way to have httpd not daemonise? Not quite sure on the details here.
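A minimal sketch of the "catch the ChangeError, log it, and let Pebble retry" suggestion might look like the following. The helper name `restart_or_defer` and its `change_error_type` parameter are hypothetical, introduced only for illustration; in a charm you would presumably pass `ops.pebble.ChangeError` and the workload container. It assumes Pebble >= v1.17.0, so that a failed start is retried in the background.

```python
import logging

logger = logging.getLogger(__name__)


def restart_or_defer(container, service, change_error_type):
    """Restart `service`, tolerating a failed start that Pebble will retry.

    Hypothetical helper: `container` is anything with a `restart(name)`
    method, and `change_error_type` is the exception class raised when the
    change fails (assumed to be ops.pebble.ChangeError in a real charm).

    Returns True if the restart succeeded right away, False if it failed
    and is left to Pebble's auto-restart backoff.
    """
    try:
        container.restart(service)
    except change_error_type as e:
        # Assumption: Pebble >= v1.17.0 keeps retrying in the background,
        # so the error is logged rather than propagated.
        logger.warning("restart of %s failed, leaving it to Pebble: %s", service, e)
        return False
    return True
```

Usage in a charm would be something like `restart_or_defer(container, "wsgi-nova-api", ops.pebble.ChangeError)`. Swallowing every ChangeError is a blunt choice; a stricter variant could re-raise unless the message indicates that Pebble will restart the service.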
Off-topic: see https://modwsgi.readthedocs.io/en/latest/user-guides/application-issues.html#registration-of-signal-handlers to change the default signal behaviour, if oslo-reports responding to USR2 "meditation report" is intended.
Hey, thanks for the answer. Yes, the restart goes through at some point. -DFOREGROUND makes the httpd daemon not fork, and according to its docs, sending SIGTERM will stop all the child processes and httpd itself. From Pebble's implementation, I was under the impression that it waits for a grace period for SIGTERM to stop the process, and sends SIGKILL if that takes too long. I thought there might be an issue in managing that group of processes, or in detecting that the process is down when it is not? Maybe it's all on the httpd side.
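For reference, the SIGTERM-then-SIGKILL pattern described above can be sketched in Python roughly as follows. This is a generic illustration of the technique, not Pebble's actual code; the function name `stop_process_group` and the 5-second default grace period are assumptions made for the example, and it expects the child to have been started in its own session (for instance with `start_new_session=True`) so that the signal does not reach the caller.

```python
import os
import signal
import subprocess


def stop_process_group(proc, grace_seconds=5.0):
    """Send SIGTERM to proc's process group, then SIGKILL after a grace period.

    Hypothetical helper: `proc` is a subprocess.Popen started in its own
    session/process group. Returns "terminated" if the group leader exited
    within the grace period, otherwise "killed".
    """
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)  # ask the whole group to shut down
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)  # grace period elapsed: force it
        proc.wait()
        return "killed"
```

Note that `proc.wait()` only observes the group leader. A child that moved to a different process group, or that outlives the leader for a moment, would not be detected here, which is the kind of gap that could leave a port in use when the service is started again.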
Hard to say without the pstree snapshot before the attempted restart... Could it be that
In sunbeam, some events require the restart of the wsgi application. This application is served by apache httpd. In some occurrences, the call to
container.restart("wsgi-nova-api")
fails because the httpd process is still running. This event occurs rarely in my test environment, but some of our users see it frequently.
This behavior has been observed in at least 5 different charms using apache httpd. The following logs are for the nova-k8s charm:
pebble plan:
pebble version:
Process tree:
Debug logs: