Added basic liveness- & readinessProbe for awx-web & awx-task containers #927

Closed · wants to merge 1 commit
22 changes: 0 additions & 22 deletions roles/installer/tasks/install.yml
@@ -56,28 +56,6 @@
- name: Include resources configuration tasks
  include_tasks: resources_configuration.yml

- name: Check for pending migrations
  k8s_exec:
    namespace: "{{ ansible_operator_meta.namespace }}"
    pod: "{{ tower_pod_name }}"
    container: "{{ ansible_operator_meta.name }}-task"
    command: >-
      bash -c "awx-manage showmigrations | grep -v '[X]' | grep '[ ]' | wc -l"
  changed_when: false
  register: database_check

- name: Migrate the database if the K8s resources were updated. # noqa 305
  k8s_exec:
    namespace: "{{ ansible_operator_meta.namespace }}"
    pod: "{{ tower_pod_name }}"
    container: "{{ ansible_operator_meta.name }}-task"
    command: >-
      bash -c "awx-manage migrate --noinput"
  register: migrate_result
  when:
    - database_check is defined
    - (database_check.stdout|trim) != '0'

- name: Initialize Django
  include_tasks: initialize_django.yml

1 change: 1 addition & 0 deletions roles/installer/tasks/resources_configuration.yml
@@ -77,6 +77,7 @@
    apply: yes
    definition: "{{ lookup('template', 'deployment.yaml.j2') }}"
    wait: yes
    wait_timeout: 600
Member:

Please remove this. I feel like it's more likely to cause problems than help anything.

Author:

The default is 120s:
https://docs.ansible.com/ansible/latest/collections/kubernetes/core/k8s_module.html#parameter-wait_timeout
The initial migration of the AWX database takes longer than 120s, so the following tasks would run before the AWX pods are ready.
I cannot guess how long it takes in individual environments, as it depends on circumstances such as throughput, IO and so on.

Member:

Given it may be hard to guess, I wonder if we want to make this a variable we could set.
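
For illustration, a minimal sketch of that idea, assuming a hypothetical variable named deployment_wait_timeout (not part of this PR) and an abbreviated task name:

# roles/installer/defaults/main.yml (hypothetical variable)
deployment_wait_timeout: 600

# roles/installer/tasks/resources_configuration.yml
- name: Apply deployment resources  # task name abbreviated for illustration
  k8s:
    apply: yes
    definition: "{{ lookup('template', 'deployment.yaml.j2') }}"
    wait: yes
    wait_timeout: "{{ deployment_wait_timeout }}"
  register: tower_deployment_result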

  register: tower_deployment_result

- block:
69 changes: 67 additions & 2 deletions roles/installer/templates/deployment.yaml.j2
@@ -81,6 +81,56 @@ spec:
{% if init_container_extra_volume_mounts -%}
{{ init_container_extra_volume_mounts | indent(width=12, indentfirst=True) }}
{% endif %}
{% endif %}
        - name: 'database-migration'
          image: '{{ _image }}'
          imagePullPolicy: '{{ image_pull_policy }}'
          command:
            - '/bin/sh'
          args:
            - '-c'
            - '/usr/bin/awx-manage migrate --no-input || /usr/local/bin/wait-for-migrations'
          volumeMounts:
{% if bundle_ca_crt %}
            - name: "ca-trust-extracted"
              mountPath: "/etc/pki/ca-trust/extracted"
            - name: "{{ ansible_operator_meta.name }}-bundle-cacert"
              mountPath: /etc/pki/ca-trust/source/anchors/bundle-ca.crt
              subPath: bundle-ca.crt
              readOnly: true
{% endif %}
            - name: "{{ ansible_operator_meta.name }}-application-credentials"
              mountPath: "/etc/tower/conf.d/credentials.py"
              subPath: credentials.py
              readOnly: true
            - name: "{{ secret_key_secret_name }}"
              mountPath: /etc/tower/SECRET_KEY
              subPath: SECRET_KEY
              readOnly: true
            - name: {{ ansible_operator_meta.name }}-settings
              mountPath: /etc/tower/settings.py
              subPath: settings.py
              readOnly: true
{% if development_mode | bool %}
            - name: awx-devel
              mountPath: "/awx_devel"
{% endif %}
          env:
            - name: MY_POD_UID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
{% if development_mode | bool %}
            - name: AWX_KUBE_DEVEL
              value: "1"
{% endif %}
      containers:
        - image: '{{ _redis_image }}'
@@ -111,6 +161,17 @@
          args: {{ web_args }}
{% endif %}
          imagePullPolicy: '{{ image_pull_policy }}'
          livenessProbe:
            httpGet:
              path: /api/v2/ping/
              port: 8052
            initialDelaySeconds: 90
            timeoutSeconds: 2
          readinessProbe:
            httpGet:
              path: /api/v2/ping/
              port: 8052
            timeoutSeconds: 2
Contributor:

Shouldn't it be configured from outside, with predefined default values?

Contributor:

#1188 is something like that.
That way the user will have more control over the deployment if needed.
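
As a rough sketch of that idea, the probe settings could be templated from role defaults; the variable names below (web_liveness_initial_delay and friends) are made up for illustration and are not taken from this PR or #1188:

# roles/installer/defaults/main.yml (hypothetical defaults)
web_liveness_initial_delay: 90
web_liveness_timeout: 2
web_readiness_timeout: 2

# roles/installer/templates/deployment.yaml.j2
          livenessProbe:
            httpGet:
              path: /api/v2/ping/
              port: 8052
            initialDelaySeconds: {{ web_liveness_initial_delay }}
            timeoutSeconds: {{ web_liveness_timeout }}
          readinessProbe:
            httpGet:
              path: /api/v2/ping/
              port: 8052
            timeoutSeconds: {{ web_readiness_timeout }}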

Author:

@erz4
I agree, it just was not the aim of my PR.
I wanted this PR to be the basic minimum needed to add this functionality ASAP, as every change to the deployment causes downtimes & job failures.

But since this PR has now been open for nearly 8 months, that obviously did not go as desired and the problem still persists.

If your PR is merged, this can be closed. We just need these checks.

Contributor:

@moonrail
Did you test it locally with molecule?

My probes are failing in the docker-desktop environment but working on a k8s cluster.

Author:

I've tested with molecule and also in our test environment (k8s).
Both worked on the last try (several months ago).
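
For anyone who wants to reproduce this locally, a molecule run looks roughly like the sketch below; the scenario name is an assumption, so check the repository's molecule/ directory for the scenarios that actually exist:

# from the awx-operator repository root
pip install molecule ansible
molecule list                # shows the available scenarios
molecule test -s kind        # 'kind' is an assumed scenario name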

Contributor:

Yup, mine is failing in the migration phase, as mentioned above.

So what is the status now? Why is this PR stuck?

          ports:
            - containerPort: 8052
{% if ingress_type | lower == 'route' and route_tls_termination_mechanism | lower == 'passthrough' %}
@@ -195,6 +256,12 @@
        - image: '{{ _image }}'
          name: '{{ ansible_operator_meta.name }}-task'
          imagePullPolicy: '{{ image_pull_policy }}'
          readinessProbe:
            exec:
              command:
                - /usr/bin/awx-manage
                - check
            timeoutSeconds: 10
{% if task_privileged == true %}
          securityContext:
            privileged: true
@@ -260,8 +327,6 @@
          env:
            - name: SUPERVISOR_WEB_CONFIG_PATH
              value: "/etc/supervisord.conf"
            - name: AWX_SKIP_MIGRATIONS
              value: "1"
            - name: MY_POD_UID
              valueFrom:
                fieldRef: