Fact caching could lead to OOM on the controller nodes #15827
Comments
Do you know what method this happens from? I ask because saving facts post-run is batched to 100 hosts at a time (lines 138 to 140 in f377b5f).
So it would be helpful to know whether that batching is ineffective or whether 100 is simply too many. Or this could be more acute in the pre-run. Do you have any ballpark number for the size of facts? Like, 1.7k hosts with 250 MB of facts per host?
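For reference, a minimal sketch of what batching by 100 hosts buys in terms of memory. This is not the AWX code; `batched` is a hypothetical helper used only for illustration. The point is that batching bounds peak memory only if the host source is consumed lazily. If the full host list (with facts) is materialized first, the batch size does not help.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, pulling from `items` lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Only `size` items are held in memory per batch.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical usage: 250 hosts are processed as batches of 100, 100, and 50.
batch_sizes = [len(batch) for batch in batched(range(250), 100)]
```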
I've seen this in the past, and there is even an issue touching some aspect of the problem, #9403, and a more generic one in the Ansible project, ansible/ansible#73654.
Bug Summary
Hey, first, I'm not sure if this is a bug or a feature request.
Last week both of our controller nodes ran into OOM (out of memory). While investigating the root cause, we noticed that multiple jobs (with limits) had been started for a big inventory (~1700 hosts) where fact caching was enabled. Each job took ~1.7 GB of memory on the controller node. Digging further, I noticed that the facts for each host in the inventory are stored in memory.
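As a rough back-of-the-envelope check on those numbers, assuming (as an upper bound) that all of a job's memory goes to cached facts, the per-host size and the total for several concurrent jobs can be estimated like this. The function names and the job count of 4 are made up for illustration.

```python
def facts_mb_per_host(job_memory_mb: float, host_count: int) -> float:
    """Upper-bound estimate: treat all job memory as cached facts."""
    return job_memory_mb / host_count


def total_memory_mb(job_memory_mb: float, concurrent_jobs: int) -> float:
    """Memory needed if every concurrent job loads the full inventory."""
    return job_memory_mb * concurrent_jobs


# ~1.7 GB per job over ~1700 hosts is roughly 1 MB of facts per host.
per_host = facts_mb_per_host(1700, 1700)

# Hypothetical example: 4 such jobs at once would need about 6.8 GB.
total = total_memory_mb(1700, 4)
```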
AWX version
2024.6.1
Select the relevant components
Installation method
docker development environment
Modifications
yes
Ansible version
No response
Operating system
No response
Web browser
No response
Steps to reproduce
Expected results
Take the amount of memory that will be used by the fact caching into account when calculating the job impact and/or only cache facts for hosts that are within the job limit.
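To illustrate the second option, here is a simplified sketch of narrowing the host list to the job limit before any facts are loaded. It is not how AWX or Ansible resolve limits. `hosts_in_limit` is a hypothetical helper that handles only comma-separated hostname globs and ignores groups, exclusions (`!`), intersections (`&`), and regex patterns.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def hosts_in_limit(hostnames: Iterable[str], limit: str) -> List[str]:
    """Return the hostnames matching any comma-separated glob in `limit`.

    An empty limit means no restriction, so all hosts are returned.
    """
    patterns = [p.strip() for p in limit.split(",") if p.strip()]
    if not patterns:
        return list(hostnames)
    return [h for h in hostnames if any(fnmatch(h, p) for p in patterns)]


# Hypothetical inventory: only the web hosts would have facts cached.
selected = hosts_in_limit(["web1", "web2", "db1"], "web*")
```

With a limit that matches a handful of hosts out of ~1700, fact caching would then hold facts for only that handful.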
Actual results
The facts are cached for all hosts of the inventory, which led to the OOM of the system.
Additional information
There was already a PR that would have fixed that.
Also, it's kind of the same root cause as #5567.