Database Connection Test

In the course of implementing a queue system with Python's multiprocessing module we ran into the problem that certain objects cannot safely be shared amongst processes - in particular open file handles and live database connections. A solution was proposed as part of https://github.com/metomi/fab/pull/55, but we were conscious that creating the database connection "on demand" for each Task that uses the database, rather than holding a persistent connection, might come with a performance cost.
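As a rough illustration of the "on demand" pattern, here is a minimal sketch assuming an SQLite-backed store (the `DB_PATH`, the table, and `run_task` are hypothetical and not fab's actual API): each worker opens its own short-lived connection, so no live connection ever has to be pickled across a process boundary.

```python
import multiprocessing
import sqlite3

# Hypothetical path and schema, purely for illustration - fab's real
# state database is managed by its own classes.
DB_PATH = "state.db"

def run_task(filename):
    # Deferred approach: the connection is opened inside the worker
    # process, used, and closed again, so no live connection (or open
    # file handle) ever crosses a process boundary. The timeout guards
    # against transient "database is locked" errors under contention.
    connection = sqlite3.connect(DB_PATH, timeout=30)
    try:
        connection.execute("INSERT INTO artifacts (filename) VALUES (?)",
                           (filename,))
        connection.commit()
    finally:
        connection.close()

if __name__ == "__main__":
    setup = sqlite3.connect(DB_PATH)
    setup.execute("CREATE TABLE IF NOT EXISTS artifacts (filename TEXT)")
    setup.commit()
    setup.close()

    with multiprocessing.Pool(4) as pool:
        pool.map(run_task, ["a.f90", "b.f90", "c.f90"])
```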

To test this, some code was added in the Experimental section that can generate a big project: a script which, given a value N, will create a project containing N Fortran modules that all depend on one another in a single long chain. This can be used to benchmark the core mechanics of the tool; a minimal sketch of the idea is shown below.
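This sketch is illustrative only - it is not the actual script from the Experimental section, and the module naming and file layout are assumptions:

```python
from pathlib import Path

def generate_chain(n, target_dir):
    # Write n Fortran modules where module i "uses" module i-1,
    # so the whole project forms one long dependency chain.
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(n):
        lines = [f"module chain_{i}"]
        if i > 0:
            lines.append(f"    use chain_{i - 1}")
        lines.append("    implicit none")
        lines.append(f"end module chain_{i}")
        (target / f"chain_{i}.f90").write_text("\n".join(lines) + "\n")

generate_chain(1000, "big_project/source")
```

A project with N=1000 was generated in this way and used to test both master and the deferred_database branch mentioned in the above PR. The results for 3 runs of each were: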

With deferred database (deferred_database branch):

| Run | real | user | sys |
|-----|-----------|-----------|----------|
| 1 | 3m29.162s | 1m43.717s | 1m0.478s |
| 2 | 3m55.976s | 1m55.416s | 1m6.150s |
| 3 | 4m34.323s | 2m4.063s  | 1m8.497s |

Without deferred database (master):

| Run | real | user | sys |
|-----|-----------|-----------|----------|
| 1 | 4m20.084s | 1m56.331s | 1m9.836s |
| 2 | 3m44.814s | 1m48.041s | 1m2.433s |
| 3 | 3m51.490s | 1m50.300s | 1m4.856s |

The timings are fairly noisy (likely because most of the work being done is I/O related), but they do not suggest that the overhead of creating many individual connections is significantly worse than using a persistent connection. We would reasonably expect the difference to become even more negligible when the Tasks are doing a larger amount of work, as they will in most real examples.
