Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restarting a service can fail because a process is still running #1570

Open
gboutry opened this issue Feb 6, 2025 · 5 comments
Open

Restarting a service can fail because a process is still running #1570

gboutry opened this issue Feb 6, 2025 · 5 comments

Comments

@gboutry
Copy link

gboutry commented Feb 6, 2025

In sunbeam, some events require the restart of the wsgi application. This application is served by apache httpd. In some occurrences, the call to container.restart("wsgi-nova-api") fails to start because the httpd process is still running.

This is event has a low frequency rate in my test environment, but some of our users have a high frequency rate.

This behavior has been observed in at least 5 different charms using apache httpd, the following logs are for the nova-k8s charm:

pebble plan:

services:
    wsgi-nova-api:
        summary: nova-api wsgi
        startup: disabled
        override: replace
        command: /usr/sbin/apache2ctl -DFOREGROUND
checks:
    online:
        override: replace
        level: ready
        period: 10s
        timeout: 3s
        threshold: 3
        http:
            url: http://localhost:8774/
    up:
        override: replace
        level: alive
        period: 10s
        timeout: 3s
        threshold: 3
        exec:
            command: service apache2 status

pebble version:

client  v1.17.0
server  v1.17.0

Process tree:

root@nova-0:/# ps -ef --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root        1308       0  0 10:07 pts/0    00:00:00 bash
root        1572    1308  0 10:13 pts/0    00:00:00  \_ ps -ef --forest
root           1       0  0 09:34 ?        00:00:01 /charm/bin/pebble run --create-dirs --hold --http :38813 --verbose
root         586       1  0 09:52 ?        00:00:00 /bin/sh /usr/sbin/apache2ctl -DFOREGROUND
root         588     586  0 09:52 ?        00:00:00  \_ /usr/sbin/apache2 -DFOREGROUND
nova         589     588  0 09:52 ?        00:00:02      \_ (wsgi:nova-api)   -DFOREGROUND
nova         590     588  0 09:52 ?        00:00:02      \_ (wsgi:nova-api)   -DFOREGROUND
nova         591     588  0 09:52 ?        00:00:02      \_ (wsgi:nova-api)   -DFOREGROUND
nova         592     588  0 09:52 ?        00:00:02      \_ (wsgi:nova-api)   -DFOREGROUND
nova         593     588  0 09:52 ?        00:00:00      \_ (wsgi:nova-metada -DFOREGROUND
nova         594     588  0 09:52 ?        00:00:00      \_ (wsgi:nova-metada -DFOREGROUND
nova         601     588  0 09:52 ?        00:00:00      \_ (wsgi:nova-metada -DFOREGROUND
nova         602     588  0 09:52 ?        00:00:00      \_ (wsgi:nova-metada -DFOREGROUND
www-data     603     588  0 09:52 ?        00:00:00      \_ /usr/sbin/apache2 -DFOREGROUND
www-data     604     588  0 09:52 ?        00:00:00      \_ /usr/sbin/apache2 -DFOREGROUND

Debug logs:

unit-nova-0: 09:52:30 DEBUG unit.nova/0.juju-log amqp:25: ops 2.18.1 up and running.
unit-nova-0: 09:52:30 DEBUG unit.nova/0.juju-log amqp:25: Setting up ingress event handler
unit-nova-0: 09:52:33 DEBUG unit.nova/0.juju-log amqp:25: Setting up ingress event handler
unit-nova-0: 09:52:33 DEBUG unit.nova/0.juju-log amqp:25: Ingress-internal relation found, returning ingress_internal.url of: http://10.206.54.132/zaza-b37d23b93787-nova
unit-nova-0: 09:52:33 DEBUG unit.nova/0.juju-log amqp:25: Ingress-internal relation found, returning ingress_internal.url of: http://10.206.54.132/zaza-b37d23b93787-nova
unit-nova-0: 09:52:33 DEBUG unit.nova/0.juju-log amqp:25: Setting up Identity Service event handler
unit-nova-0: 09:52:34 DEBUG unit.nova/0.juju-log amqp:25: Setting up log forward event handler
unit-nova-0: 09:52:34 DEBUG unit.nova/0.juju-log amqp:25: Invalid Loki alert rules folder at /var/lib/juju/agents/unit-nova-0/charm/src/loki_alert_rules: directory does not exist
unit-nova-0: 09:52:34 DEBUG unit.nova/0.juju-log amqp:25: Setting up AMQP event handler
unit-nova-0: 09:52:34 DEBUG unit.nova/0.juju-log amqp:25: Setting up DB event handler
unit-nova-0: 09:52:34 DEBUG unit.nova/0.juju-log amqp:25: Setting up DB event handler
unit-nova-0: 09:52:34 DEBUG unit.nova/0.juju-log amqp:25: Setting up DB event handler
unit-nova-0: 09:52:35 DEBUG unit.nova/0.juju-log amqp:25: Setting up peer event handler
unit-nova-0: 09:52:35 DEBUG unit.nova/0.juju-log amqp:25: Cannot add handler for relation certificates, relation not present in charm metadata
unit-nova-0: 09:52:35 DEBUG unit.nova/0.juju-log amqp:25: Cannot add handler for relation identity-credentials, relation not present in charm metadata
unit-nova-0: 09:52:35 DEBUG unit.nova/0.juju-log amqp:25: Cannot add handler for relation ceph-access, relation not present in charm metadata
unit-nova-0: 09:52:35 DEBUG unit.nova/0.juju-log amqp:25: Setting up certificate transfer event handler
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Setting up cloud-compute event handler
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Setting up Nova service event handler
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Setting up ingress event handler
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Setting up ingress event handler
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Setting up service-ready event handler for placement
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: no relation on 'tracing': tracing not ready
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Emitting Juju event amqp_relation_changed.
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: RabbitMQRabbitMQRequires on_changed/departed
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Emitting custom event <RabbitMQReadyEvent via NovaOperatorCharm/RabbitMQRequires[amqp]/on/ready[292]>.
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: Found metadata secret in leader DB
unit-nova-0: 09:52:36 INFO unit.nova/0.juju-log amqp:25: Entering guarded section: 'Bootstrapping'
unit-nova-0: 09:52:36 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.IngressInternalHandler object at 0x761fb5cea8a0>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.IngressPublicHandler object at 0x761fb5ceb0e0>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.IdentityServiceRequiresHandler object at 0x761fb62c33b0>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.LogForwardHandler object at 0x761fb55d1b20>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.RabbitMQHandler object at 0x761fb55a4f20>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.DBHandler object at 0x761fb5564170>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.DBHandler object at 0x761fb55a69c0>
unit-nova-0: 09:52:37 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.DBHandler object at 0x761fb55a5eb0>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.BasePeerHandler object at 0x761fb5487800>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.CertificateTransferRequiresHandler object at 0x761fb54bc590>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <__main__.CloudComputeRequiresHandler object at 0x761fb537f3b0>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <__main__.NovaServiceProvidesHandler object at 0x761fb55c21e0>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.TraefikRouteHandler object at 0x761fb5362510>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.TraefikRouteHandler object at 0x761fb5363d40>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: send_requests not implemented for <ops_sunbeam.relation_handlers.ServiceReadinessRequiresHandler object at 0x761fb53a31a0>
unit-nova-0: 09:52:38 DEBUG unit.nova/0.juju-log amqp:25: Running init for nova-api
unit-nova-0: 09:52:41 DEBUG unit.nova/0.juju-log amqp:25: Wrote template /etc/nova/nova.conf in container nova-api.
unit-nova-0: 09:52:41 DEBUG unit.nova/0.juju-log amqp:25: Changes detected in ['/etc/nova/nova.conf']
unit-nova-0: 09:52:41 DEBUG unit.nova/0.juju-log amqp:25: /usr/local/share/ca-certificates/ca-bundle.pem in nova-api matches desired content.
unit-nova-0: 09:52:41 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:41 DEBUG unit.nova/0.juju-log amqp:25: /root/cell_create_wrapper.sh in nova-api matches desired content.
unit-nova-0: 09:52:41 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:42 DEBUG unit.nova/0.juju-log amqp:25: /etc/apache2/sites-available/wsgi-nova-api.conf in nova-api matches desired content.
unit-nova-0: 09:52:42 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:42 DEBUG unit.nova/0.juju-log amqp:25: Changes detected in ['/etc/nova/nova.conf']
unit-nova-0: 09:52:43 DEBUG unit.nova/0.juju-log amqp:25: Output from a2ensite: 
Site wsgi-nova-api already enabled

unit-nova-0: 09:52:43 DEBUG unit.nova/0.juju-log amqp:25: /etc/nova/nova.conf in nova-api matches desired content.
unit-nova-0: 09:52:43 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:43 DEBUG unit.nova/0.juju-log amqp:25: /usr/local/share/ca-certificates/ca-bundle.pem in nova-api matches desired content.
unit-nova-0: 09:52:43 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:43 DEBUG unit.nova/0.juju-log amqp:25: /root/cell_create_wrapper.sh in nova-api matches desired content.
unit-nova-0: 09:52:44 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:44 DEBUG unit.nova/0.juju-log amqp:25: /etc/apache2/sites-available/wsgi-nova-api.conf in nova-api matches desired content.
unit-nova-0: 09:52:44 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:44 DEBUG unit.nova/0.juju-log amqp:25: No file changes detected
unit-nova-0: 09:52:44 DEBUG unit.nova/0.juju-log amqp:25: Restarting wsgi-nova-api in nova-api
unit-nova-0: 09:52:49 ERROR unit.nova/0.juju-log amqp:25: Exception raised in section 'Bootstrapping': cannot perform the following tasks:
- Start service "wsgi-nova-api" (service start attempt: exited quickly with code 0, will restart)
----- Logs from task 1 -----
2025-02-06T09:52:49Z INFO Most recent service output:
    (...)
    2025-02-06 09:52:45.179727 2025-02-06 09:52:45.179 435 INFO nova.api.openstack.requestlog [None req-a920439a-d870-4845-b4d1-1f95ae41af84 - - - - - -] ::1 "GET /" status: 200 len: 367 microversion: - time: 0.000518\x1b[00m
    ::1 - - [06/Feb/2025:09:52:01 +0000] "GET / HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
    2025-02-06 09:52:46.199959 2025-02-06 09:52:46.199 438 INFO nova.api.openstack.requestlog [None req-c699da42-e427-4ddb-8f1f-7df527e762f8 - - - - - -] ::1 "GET /" status: 200 len: 367 microversion: - time: 0.001746\x1b[00m
    ::1 - - [06/Feb/2025:09:52:11 +0000] "GET / HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
    2025-02-06 09:52:46.270675 mod_wsgi (pid=437): Callback registration for signal 12 ignored.
    2025-02-06 09:52:46.278078   File "/usr/bin/nova-api-wsgi", line 52, in <module>
    2025-02-06 09:52:46.278304     application = init_application()
    2025-02-06 09:52:46.278374   File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/wsgi.py", line 20, in init_application
    2025-02-06 09:52:46.278405     return wsgi_app.init_application(NAME)
    2025-02-06 09:52:46.278445   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi_app.py", line 128, in init_application
    2025-02-06 09:52:46.278474     init_global_data(conf_files, name)
    2025-02-06 09:52:46.278516   File "/usr/lib/python3/dist-packages/nova/utils.py", line 1133, in wrapper
    2025-02-06 09:52:46.278548     return func(*args, **kwargs)
    2025-02-06 09:52:46.278569   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi_app.py", line 105, in init_global_data
    2025-02-06 09:52:46.278611     gmr.TextGuruMeditation.setup_autorun(
    2025-02-06 09:52:46.278693   File "/usr/lib/python3/dist-packages/oslo_reports/guru_meditation_report.py", line 160, in setup_autorun
    2025-02-06 09:52:46.278753     cls._setup_signal(signal.SIGUSR2,
    2025-02-06 09:52:46.278772   File "/usr/lib/python3/dist-packages/oslo_reports/guru_meditation_report.py", line 193, in _setup_signal
    2025-02-06 09:52:46.278811     signal.signal(signum,
    httpd (pid 434) already running
2025-02-06T09:52:49Z ERROR service start attempt: exited quickly with code 0, will restart
-----
unit-nova-0: 09:52:49 ERROR unit.nova/0.juju-log amqp:25: Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/guard.py", line 94, in guard
    yield
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/charm.py", line 347, in configure_charm
    self.configure_unit(event)
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 677, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/charm.py", line 692, in configure_unit
    self.init_container_services()
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 677, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/charm.py", line 657, in init_container_services
    ph.init_service(self.contexts())
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 677, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/container_handlers.py", line 505, in init_service
    self.start_wsgi(restart=True)
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 677, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/container_handlers.py", line 434, in start_wsgi
    self.start_all(restart=restart)
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 677, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 677, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/lib/ops_sunbeam/container_handlers.py", line 333, in start_all
    container.restart(service_name)
  File "/var/lib/juju/agents/unit-nova-0/charm/venv/ops/model.py", line 2400, in restart
    self._pebble.restart_services(service_names)
  File "/var/lib/juju/agents/unit-nova-0/charm/venv/ops/pebble.py", line 2219, in restart_services
    return self._services_action('restart', services, timeout, delay)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-nova-0/charm/venv/ops/pebble.py", line 2244, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "wsgi-nova-api" (service start attempt: exited quickly with code 0, will restart)
----- Logs from task 1 -----
2025-02-06T09:52:49Z INFO Most recent service output:
    (...)
    2025-02-06 09:52:45.179727 2025-02-06 09:52:45.179 435 INFO nova.api.openstack.requestlog [None req-a920439a-d870-4845-b4d1-1f95ae41af84 - - - - - -] ::1 "GET /" status: 200 len: 367 microversion: - time: 0.000518\x1b[00m
    ::1 - - [06/Feb/2025:09:52:01 +0000] "GET / HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
    2025-02-06 09:52:46.199959 2025-02-06 09:52:46.199 438 INFO nova.api.openstack.requestlog [None req-c699da42-e427-4ddb-8f1f-7df527e762f8 - - - - - -] ::1 "GET /" status: 200 len: 367 microversion: - time: 0.001746\x1b[00m
    ::1 - - [06/Feb/2025:09:52:11 +0000] "GET / HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
    2025-02-06 09:52:46.270675 mod_wsgi (pid=437): Callback registration for signal 12 ignored.
    2025-02-06 09:52:46.278078   File "/usr/bin/nova-api-wsgi", line 52, in <module>
    2025-02-06 09:52:46.278304     application = init_application()
    2025-02-06 09:52:46.278374   File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/wsgi.py", line 20, in init_application
    2025-02-06 09:52:46.278405     return wsgi_app.init_application(NAME)
    2025-02-06 09:52:46.278445   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi_app.py", line 128, in init_application
    2025-02-06 09:52:46.278474     init_global_data(conf_files, name)
    2025-02-06 09:52:46.278516   File "/usr/lib/python3/dist-packages/nova/utils.py", line 1133, in wrapper
    2025-02-06 09:52:46.278548     return func(*args, **kwargs)
    2025-02-06 09:52:46.278569   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi_app.py", line 105, in init_global_data
    2025-02-06 09:52:46.278611     gmr.TextGuruMeditation.setup_autorun(
    2025-02-06 09:52:46.278693   File "/usr/lib/python3/dist-packages/oslo_reports/guru_meditation_report.py", line 160, in setup_autorun
    2025-02-06 09:52:46.278753     cls._setup_signal(signal.SIGUSR2,
    2025-02-06 09:52:46.278772   File "/usr/lib/python3/dist-packages/oslo_reports/guru_meditation_report.py", line 193, in _setup_signal
    2025-02-06 09:52:46.278811     signal.signal(signum,
    httpd (pid 434) already running
2025-02-06T09:52:49Z ERROR service start attempt: exited quickly with code 0, will restart
-----

unit-nova-0: 09:52:49 DEBUG unit.nova/0.juju-log amqp:25: Starting nova scheduler service, pebble ready
unit-nova-0: 09:52:49 DEBUG unit.nova/0.juju-log amqp:25: Restarting nova-scheduler in nova-scheduler
unit-nova-0: 09:52:51 DEBUG unit.nova/0.juju-log amqp:25: Adding unit status BlockedStatus('(workload) Error in charm (see logs): cannot perform the following tasks:\n- Start service "wsgi-nova-api" (service start attempt: exited quickly with code 0, will restart)\n----- Logs from task 1 -----\n2025-02-06T09:52:49Z INFO Most recent service output:\n    (...)\n    2025-02-06 09:52:45.179727 2025-02-06 09:52:45.179 435 INFO nova.api.openstack.requestlog [None req-a920439a-d870-4845-b4d1-1f95ae41af84 - - - - - -] ::1 "GET /" status: 200 len: 367 microversion: - time: 0.000518\\x1b[00m\n    ::1 - - [06/Feb/2025:09:52:01 +0000] "GET / HTTP/1.1" 200 0 "-" "Go-http-client/1.1"\n    2025-02-06 09:52:46.199959 2025-02-06 09:52:46.199 438 INFO nova.api.openstack.requestlog [None req-c699da42-e427-4ddb-8f1f-7df527e762f8 - - - - - -] ::1 "GET /" status: 200 len: 367 microversion: - time: 0.001746\\x1b[00m\n    ::1 - - [06/Feb/2025:09:52:11 +0000] "GET / HTTP/1.1" 200 0 "-" "Go-http-client/1.1"\n    2025-02-06 09:52:46.270675 mod_wsgi (pid=437): Callback registration for signal 12 ignored.\n    2025-02-06 09:52:46.278078   File "/usr/bin/nova-api-wsgi", line 52, in <module>\n    2025-02-06 09:52:46.278304     application = init_application()\n    2025-02-06 09:52:46.278374   File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/wsgi.py", line 20, in init_application\n    2025-02-06 09:52:46.278405     return wsgi_app.init_application(NAME)\n    2025-02-06 09:52:46.278445   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi_app.py", line 128, in init_application\n    2025-02-06 09:52:46.278474     init_global_data(conf_files, name)\n    2025-02-06 09:52:46.278516   File "/usr/lib/python3/dist-packages/nova/utils.py", line 1133, in wrapper\n    2025-02-06 09:52:46.278548     return func(*args, **kwargs)\n    2025-02-06 09:52:46.278569   File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi_app.py", line 105, in init_global_data\n    2025-02-06 09:52:46.278611     gmr.TextGuruMeditation.setup_autorun(\n    2025-02-06 09:52:46.278693   File "/usr/lib/python3/dist-packages/oslo_reports/guru_meditation_report.py", line 160, in setup_autorun\n    2025-02-06 09:52:46.278753     cls._setup_signal(signal.SIGUSR2,\n    2025-02-06 09:52:46.278772   File "/usr/lib/python3/dist-packages/oslo_reports/guru_meditation_report.py", line 193, in _setup_signal\n    2025-02-06 09:52:46.278811     signal.signal(signum,\n    httpd (pid 434) already running\n2025-02-06T09:52:49Z ERROR service start attempt: exited quickly with code 0, will restart\n-----')
@dimaqq
Copy link
Contributor

dimaqq commented Feb 7, 2025

@benhoyt probably belongs to canonical/pebble.
It could be an issue with the workload though.

@benhoyt
Copy link
Collaborator

benhoyt commented Feb 7, 2025

Hi @gboutry, thanks for the report. As of Pebble v1.17.0, which happily is what you're using, Pebble will report that the error occurred (via the ChangeError that you're seeing) but keep trying to start the service in the background. You'll notice the "will restart" at the end of the error message, indicating it'll do that. So assuming Pebble >= v1.17.0, you can probably catch the ChangeError, log it, and expect Pebble to keep trying to auto-restart it in backoff mode.

However, it does beg the question: is the httpd server not shutting down correctly? It seems like it's not doing a graceful shutdown and the port is staying open, or a daemonised subprocess (pid 434) is still running. Ideally it would shut down gracefully/correctly. Or maybe there's a way to have httpd not daemonise? Not quite sure on the details here.

@dimaqq
Copy link
Contributor

dimaqq commented Feb 7, 2025

Off-topic: see https://modwsgi.readthedocs.io/en/latest/user-guides/application-issues.html#registration-of-signal-handlers to change the default signals behaviour, if oslo-reports responding to USR2 "meditation report" is intended.

@gboutry
Copy link
Author

gboutry commented Feb 7, 2025

Hey, thanks for the answer. Yes, the restart goes through at some point.

-DFOREGROUND makes the httpd daemon not fork, and from its docs, sending SIGTERM will stop all the child processes and itself.

From Pebble implementation, I was under the impression it waited for a grace period for the SIGTERM to stop the process and would send SIGKILL if too long. I thought there might be an issue in managing that group of processes or detecting if the process is down while not? Maybe it's all on httpd side.

@dimaqq
Copy link
Contributor

dimaqq commented Feb 7, 2025

Hard to say without the pstree snapshot before the attempted restart...

Could it be that apache2ctl is killed correctly, but it in turn fails to kill httpd correctly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants