-
Notifications
You must be signed in to change notification settings - Fork 3
Thinking about the queue
The heart of our build system is going to be a queue of tasks. This is easy to visualise, tasks get popped onto the queue by one thread and another loops over them, interogating each one to see if it should be run. However that last part brings with it a multitude of complications.
For many tasks things are relatively simple. They take a file and produce another file. In this case the test need only be "does the intput file exist?"
Things get more complicated from there.
Even apparently simple tasks like compiling a Fortran source file are more complex than they first seem. Although a Fortran compiler takes a Fortran source file and produces an object file it is not that simple. If any modules are used it also depends on those .mod
files existing. (Except where the compiler doesn't produce .mod
files but has some other mechanism for storing module information, such as the Cray compiler storing it in th eobject file itself)
Linking is another example of a task which requires a number (possibly a large one) of input files.
There are a number of potential solutions to tracking whether a task is fit to run, i.e. all its dependencies are fulfilled. In this page we present them and discuss their pros and cons.
In this model a completing task returns any generated files to the queue manager. The queue manager then runs the length of the queue offering these to all waiting tasks in turn. If the task is waiting for that file it ticks it as "ready". Then when the queue managaer gets to a particular task as a candidate for execution it already knows if it can run or not.
This model has good OO qualities of encapsulation. The task knows itself whether it can run or not so the queue manager does not need to concern itself with that logic. The down side is that each completed task involves a scan of the whole queue. This is likely to be complicated by the addition of multiple task running threads which we expect to implement at some point.
In this approach both the queue manager and the tasks meet at an agreed location to swap information. It has sometimes been referred to (by me) as the "Scoreboard" method.
When a task completes it returns the files it has generated. The queue manager hands these off to the "scoreboard" which notes them as created. Then, when a task is being considered for execution, the queue manager asks it for its prerequisites and compares those with what's on the scoreboard. If they are all present then you can go ahead and run the task, otherwise it is popped back on the queue.
The advantage here is obviously the lack of a full queue scan for each completed task. This approach will also work nicely with multiple worker threads. The down side is that it may not be quite as clean a design from an encapsulation point of view. I would argue that it is not as bad as all that since the task is still responsible for knowing if it can run (the list of prerequisites) just not of making the decission of if it will run.
Rather than using a queue we could handle this prerequisites problem by storing an actual task graph in memory. Then it becomes a matter of traversing the graph. It has to be done "backwards" in that the final link tasks will be far fewer than the mesh of interdependent compile tasks.
Although this approach makes resolving the final task order pretty easy building the graph is potentially quite fraught. Which is why we discounted it as an idea in the first place. It is always available for reconsideration though.
The queue could be split in two to give a queue which holds only runable tasks and another which holds tasks waiting to be scheduled. The logic for working out which tasks are ready to run is moved to this first queue.
This approach provides no improvement in performance. It is still necessary to poll through the scheduling queue however it does neetly separate the two functions.
Following on from the previous "splitting the queue" idea, instead of a queue a map between dependent files and tasks could be used. This would be an n-way map in that a task can be waiting on multiple files and a file can be waited on by multiple tasks. The mapping is file to task though.
In this scenario, when a file becomes available it is dereferenced through the map to find out what is waiting for it. These references are then removed from the map and each task scrutinised to determined how many other files it is waiting for. If the answer is "none" then it is ready to run and is popped on the running queue.
- Future Release
- vn1.0 Release, March 2023
- 0.11 Beta Release, Jan 2023
- 0.10 Beta Release, Oct 2022
- 0.9 Alpha Release, June 2022
- Phase 2
- Phase 3
- Phase 4
- Repository Management
- Development Process
- Development Environment
- Releasing Fab
- Coding Conventions
- Glossary
- Concerning the Database
- Unit Test Coverage
- Issues With the System Testing Framework