Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate process management out of pool #70

Merged
merged 3 commits into from
Feb 14, 2025

Conversation

AlanCoding
Copy link
Member

This is pre-work before introducing use of "forkserver".

As we introduce that new method of forking, my plan is to preserve the current logic. Instead of replacing, this will be a configurable class.

For now, this enabling change attempts to quarantine all interactions with the python multiprocessing library into a new process module. This leaves pool as a module highly specialized on asyncio task management and balancing of capacity.

@AlanCoding AlanCoding marked this pull request as ready for review February 11, 2025 20:31
@AlanCoding AlanCoding added the refactoring code build-out and clean-up label Feb 11, 2025
@AlanCoding
Copy link
Member Author

Let me explain what the intent here is - yes, the purpose is so that we can later swap out the ProcessManager class for different classes that implement these things differently. What other implementations would we use? You might ask...

  1. As is well-established, the forkserver implementation described in AAP-37500 Excessive number of open file descriptors #39, research patch at ansible/awx@devel...AlanCoding:awx:forkserver, from the patch, you can see this mainly requires setting some multiprocessing global context settings
  2. Other proposal, which still stands, is using janus Investigate janus to make the main program less blocking and cleaner #3, experimental diff at main...AlanCoding:dispatcher:janus, and you can see from the patch that we're not changing anything about the Process object, only the Queue creation, and even then, in a way that's fairly trivial to swap out with this class structure. While this didn't work before, it might be due to the threading-related issues we were already warned about, and combining this with (1) might make it "just work".

Both of these are very viable with this class structure. I think we should add them, and test them. One, because we would change the default and that solves an urgent problem. But even after that, it's like driving around with a spare tire in your trunk. My memory is that celery had major issues that happened once it started playing with novel forking methodologies. Once this stabilizes, we should keep around some options, just in case.

Copy link
Collaborator

@Alex-Izquierdo Alex-Izquierdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, have you done some manual testing?

Alex-Izquierdo
Alex-Izquierdo previously approved these changes Feb 12, 2025
dispatcher/process.py Outdated Show resolved Hide resolved
@AlanCoding
Copy link
Member Author

I added some simple tests. I don't see a whole lot of value (almost trivial), but I like how this is informative, how it steps-through the layers of code construction.

Copy link
Collaborator

@pb82 pb82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. So ProcessProxy represents an individual process and ProcessManager is the interface for interacting with proesses (creating, reading results)

@AlanCoding AlanCoding merged commit 752c0c7 into ansible:main Feb 14, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring code build-out and clean-up
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants