Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReturnTestDatabase does not wipe database before returning to pool #2

Closed
wouterh-dev opened this issue Dec 5, 2020 · 9 comments
Closed
Labels
bug Something isn't working documentation Improvements or additions to documentation

Comments

@wouterh-dev
Copy link

The documentation says

Marks the test database that it can be wiped early on pool limit overflow (or reused if true is submitted)

Which implies the behavior of this call depends on whether true is submitted, but I figure this out.

I was expecting this api call to drop the database and recreate it according to the template and then make it available again.

@majodev
Copy link
Member

majodev commented Dec 7, 2020

Hi wouterhund,

I see, our docs are misleading, we need additional clarification.

DELETE /templates/{hash}/tests/{test-database-id} is optional and not going to destroy the test-database. Instead, it just deletes the dirty lock, meaning it flags the test-database as dirty: false. Our use case is the following:

  • GET /templates/{hash}/tests The test-database is automatically in dirty: true state when it becomes selected
  • Although this test-database was already used, our test may have not actually modified it (you are responsible to ensure that it really wasn't modified within your test code). As an usage optimization, we can call DELETE /templates/{hash}/tests/{test-database-id}, leading to dirty: false again.
  • The next GET /templates/{hash}/tests can select this database directly without recreation again (as it's available for selection like it was in the initial creation).

So it's actually just an optimization for a very specific use case. Generally, we do not worry about whats happening to the test-databases after their usage. If the pool is exceeded, new test-databases are automatically recreated and assigned anyway.

Do you feel that we are missing your specific use case here?

@wouterh-dev
Copy link
Author

wouterh-dev commented Dec 7, 2020

Ah, I understand! That is an interesting use-case I hadn't considered, and might be able to use to further optimize my tests.

When do dirty databases in the pool get recycled? If IntegreSQL does this when the pool is empty, how does IntegreSQL know that the test runner has releases its lock on the database? It seems like it might be possible for an in-use database to be recycled? Or could that only happen if concurrent test runners > pool size?

In my testing I noticed that once the pool was empty, then GetTestDatabase started becoming blocking, likely creating the database synchronously. I was hoping I could help PostgreSQL recycle dirty databases early by informing it that they're now no longer used, and was expecting this api call to be used for that.

@majodev
Copy link
Member

majodev commented Dec 7, 2020

Hi, see /pkg/manager/manager.go createNextTestDatabase. We currently just start by recreating the first encountered dirty database after the pool is exceeded. We currently cannot and do not want to know if the test runner actually has released the lock and it shouldn't be its responsibility.

But you are absolutely right, I think we totally missed having buffered (defaults to 10) recreations after the pool has exceeded. -> Bug

Regarding early recycling, that's a pretty relevant use case, we will look into that. -> Feature Request

@majodev majodev added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Dec 7, 2020
@majodev
Copy link
Member

majodev commented Dec 7, 2020

After taking a closer look into the recreation mechanism, I can no longer confirm that this is a bug.

The buffer mechanism is based on the INTEGRESQL_TEST_INITIAL_POOL_SIZE env var, meaning after the template is finished (PUT /templates/{hash}), we:

  • immediately create the INTEGRESQL_TEST_INITIAL_POOL_SIZE number of test databases (default to 10) from the template.
  • automatically spawn another test database in the background on every GET /templates/{hash}/tests.

Therefore, we always try to have at least INTEGRESQL_TEST_INITIAL_POOL_SIZE test databases available while running.

However, what could have been happening is that we tried to delete a test database that is still in use, thus failing to free this particular one.

Following up on your question:

It seems like it might be possible for an in-use database to be recycled? Or could that only happen if concurrent test runners > pool size?

Yes, this can happen if your INTEGRESQL_TEST_MAX_POOL_SIZE is too low. I would recommend to set INTEGRESQL_TEST_INITIAL_POOL_SIZE to (approximately) your number of concurrent test runners and increase INTEGRESQL_TEST_MAX_POOL_SIZE at least by a factor of 4 to prevent any potential overlap. We typically run ~8 concurrent test runners (and use the default of 10) and simply start clear up databases when we hit 500 and it's been a quite pleasant experience.

We will still take a look at early recycling and the above edge-case anyways...

@majodev majodev removed the bug Something isn't working label Dec 7, 2020
@wouterh-dev
Copy link
Author

wouterh-dev commented Dec 7, 2020

Thank you for looking into this!

My main issue is about performance sometimes being non-ideal. For context, this is my setup:

I am currently using a single test runner, with the default pool size settings (INITIAL=10, MAX=500). But I've also tested with INITIAL=100 and the results are similar.

My test runner requires two databases, and thus provisions two templates. This means that before every test GetTestDatabase is called twice in rapid succession, once for each template.

Each test takes between 100-200ms, so that's two calls for GetTestDatabase every ~200ms. Occasionally, one of these calls appears to be blocking, taking about ~500ms to complete instead of the usual 1-2ms.

At this point I was assuming that the pool was depleted, and it was no longer creating new databases in the background and so I created this issue.

After digging further into it, I believe that instead IntegreSQL is correctly recycling databases in the background, but that it does so synchronously by locking on the template here:

template.Lock()

So my tests are running faster than IntegreSQL is able to create databases. Which makes sense, since creating a database from a template takes about ~500ms in my case and most of my tests are faster than that. And there's hundreds of tests, so it is quite likely that the pool will get depleted during a test run.

I have an in-house implementation that I was looking to replace with IntegreSQL. This implementation works similar to approach 3c described in the README, except I'm using 8 background worker threads to recycle dirty databases which doesn't seem to be an issue for PostgreSQL.

So, perhaps it is an option to let addTestDatabasesInBackground run concurrently (up to a limit)? I believe this might resolve my performance concern.

@majodev
Copy link
Member

majodev commented Dec 7, 2020

Thanks for your investigation,

I understand your concern that the locking mechanism used here is suboptimal for selecting the next ready test database.

Nevertheless, are you sure that you are properly closing connections to the pg database between your test runs? Can you check your logs from your PostgreSQL container and see if there are recurrent errors like the following:

ERROR:  database "pgtestpooloverflow_test_hashinghash_000" is being accessed by other users
postgres_1    | 2020-12-07 18:57:20.071 UTC [2024] DETAIL:  There is 1 other session using the database.

I think we generally hold on to the lock for too long as soon as we exceed INTEGRESQL_TEST_MAX_POOL_SIZE and (in combination which the above error) repeatedly try too hard to delete a old dirty database and finally swallow the error message:

// TODO log error somewhere instead of silently swallowing it?
_, _ = m.createNextTestDatabase(ctx, template)

What's additionally concerning is that if we fail to find an next ready test database, we synchronously return from createNextTestDatabase which may put additional pressure on the test database creation procedure.

if testDB == nil {
var err error
testDB, err = m.createNextTestDatabase(ctx, template)
if err != nil {
return nil, err
}
}

Will continue to investigate this, but it will potentially take some time (holidays incoming). Feel free to PR in the meantime.

@majodev majodev added bug Something isn't working and removed enhancement New feature or request labels Dec 7, 2020
@wouterh-dev
Copy link
Author

Nevertheless, are you sure that you are properly closing connections to the pg database between your test runs?

I am closing them, but only after provisioning the next database if that matters. So before each test run:

  1. Create a new test database
  2. Close all connections to the previous test database

I can also confirm that I'm not seeing any errors regarding the database being accessed by other users.

Will continue to investigate this, but it will potentially take some time (holidays incoming). Feel free to PR in the meantime.

Thank you! I'll keep digging into this as well, but the holidays will soon also take priority 😄

@majodev
Copy link
Member

majodev commented Dec 9, 2020

1. Create a new test database
2. Close all connections to the previous test database

Yep, this could actually have an impact, as we try to destroy database as soon as GET /templates/{hash}/tests gets called after hitting the pool limit (in the background through a separate goroutine, but still - this will lock m.templates[hash]). Could you retry by closing your database connection immediately after running the test?

Thanks for you patience, we are also very keen to squeeze out any potential performance gains as well and thus take this issue seriously.

@majodev
Copy link
Member

majodev commented Jan 30, 2024

Closed via #16

@majodev majodev closed this as completed Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants