Condor_Removal
Matt Wagner edited this page Oct 25, 2012
As of Sept 13, 2011, we have decided to stop using Condor as the backend scheduler for Conductor. Although we do have Condor working, there are several reasons for this switch:
- Although Condor has many features that could be useful, our scheduling needs are currently very simple: the scheduler only has to accept jobs from the UI, launch the instances, and monitor them externally.
- Although Condor scales very well, our current scaling targets do not require that level of scalability. In addition, the developers believe that other large scaling bottlenecks will need to be addressed before the scheduler itself becomes a factor. None of this has been benchmarked, however.
- Relying on Condor means we have handed core Aeolus functionality to a third-party component. This has several consequences, including competing release schedules, slower feature implementation, and fewer people on the team who are knowledgeable about both Aeolus and Condor.
- The Aeolus team is composed largely of Ruby and Python developers, while Condor is written in C++. As a result, few people can successfully make changes to the Condor codebase. This becomes a development and support issue, because those few people become the bottleneck.
- Overall, Condor is more complex than our needs require. Because of this, it is hard for team members to debug what is going wrong when scheduling fails.
The proposed replacement works as follows:
- Instances are launched inline from the Conductor UI using the deltacloud-client Ruby library. This may cause long delays in the UI; if it does, we can fork a separate process for each Deltacloud call and use the instance state stored in the database to record whether the call succeeded.
- dbomatic switches from monitoring Condor to monitoring the instances directly. Every 60 seconds, dbomatic wakes up and walks through the list of running and pending instances in the database. For each instance whose state has changed, it updates the state in the database; it then goes back to sleep. dbomatic uses one process per provider account, so a single slow provider account cannot hold up state updates for the instances on all the other accounts.
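The dbomatic polling loop can be sketched roughly as follows. This is a minimal illustration, not the actual dbomatic code: `Instance`, `poll_account`, `run_dbomatic`, and the `fetch_state`/`save_state` callables are hypothetical names, where `fetch_state` stands in for a provider query through the deltacloud client and `save_state` for the database update. It shows the two ideas from the list above: a 60-second pass over pending and running instances that writes only changed states, and one forked process per provider account.

```ruby
# Hypothetical stand-in for a Conductor instance row.
Instance = Struct.new(:id, :provider_account, :state)

POLL_INTERVAL = 60 # seconds between passes
WATCHED_STATES = %w[pending running].freeze

# One pass over the instances of a single provider account.
# fetch_state: hypothetical callable asking the provider for an instance's
#   current state (in Conductor this would go through the deltacloud client).
# save_state: hypothetical callable writing a new state to the database.
# Returns the instances whose state changed.
def poll_account(instances, fetch_state, save_state)
  watched = instances.select { |inst| WATCHED_STATES.include?(inst.state) }
  watched.each_with_object([]) do |inst, changed|
    current = fetch_state.call(inst)
    next if current == inst.state # unchanged: nothing to write

    save_state.call(inst, current)
    inst.state = current
    changed << inst
  end
end

# Fork one long-running worker per provider account, so a slow account only
# delays its own instances. In this sketch accounts are read once at startup.
def run_dbomatic(load_instances, fetch_state, save_state)
  accounts = load_instances.call.map(&:provider_account).uniq
  pids = accounts.map do |account|
    fork do
      loop do
        mine = load_instances.call.select { |i| i.provider_account == account }
        poll_account(mine, fetch_state, save_state)
        sleep POLL_INTERVAL
      end
    end
  end
  pids.each { |pid| Process.wait(pid) }
end
```

Keeping `poll_account` free of timing and process handling makes a single pass easy to exercise on its own; `run_dbomatic` only adds the per-account processes and the sleep. A real implementation would also need to pick up provider accounts added after startup.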
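If inline launches do make the UI slow, the fallback mentioned above (forking a process per Deltacloud call and using the stored state as the success marker) might look like the sketch below. Everything here is an assumption for illustration: `StateStore` is a hypothetical file-per-instance stand-in for the instances table, `launch_async` is a hypothetical helper, the `launch` callable stands in for the real deltacloud-client call, and the state names `new`, `pending`, and `create_failed` are assumed rather than taken from Conductor.

```ruby
# Hypothetical stand-in for the instances table: one file per instance
# holding its current state.
class StateStore
  def initialize(dir)
    @dir = dir
  end

  def write(id, state)
    tmp = "#{path(id)}.tmp"
    File.write(tmp, state)
    File.rename(tmp, path(id)) # rename is atomic on POSIX filesystems
  end

  def read(id)
    File.read(path(id))
  end

  private

  def path(id)
    File.join(@dir, id.to_s)
  end
end

# Record the instance, then make the slow provider call in a child process so
# the UI request can return immediately. The child's only way of reporting
# back is the state it writes to the store.
def launch_async(store, instance_id, launch)
  store.write(instance_id, "new") # record exists before the call starts
  pid = fork do
    state =
      begin
        launch.call(instance_id) # stands in for the deltacloud-client call
        "pending"
      rescue StandardError
        "create_failed"
      end
    store.write(instance_id, state)
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  Process.detach(pid) # reaps the child; returns a Thread the caller may join
end
```

The UI would return as soon as `launch_async` does; dbomatic, or the next page load, then reads the stored state to learn whether the launch went through.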