Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redudant db commit when getting serialised service #4337

Closed
wants to merge 1 commit into from

Conversation

spatel033
Copy link
Contributor

@spatel033 spatel033 commented Jan 13, 2025

We have investigated P1 504 outage and came across following possible issue. This PR is to remove db commit which was added previously to close the transaction to improve the performance.

First we get redis error - kibana

then pg connection closed error - kibana

Possible scenario we found during the P1 504 outage:

  • Redis get disconnected
  • We have this service bombarding auth request (200-450 auth request per min) with incorrect token
  • All these requests now go to postgres
  • When getting serialised service to get api_key for auth, it also does unnecessary db.session.commit() ??
  • At the same time RDS has 66% commit queries

image
image
image
image

@spatel033 spatel033 force-pushed the remove-db-commit-serialised-service branch from bfb23f9 to 06c36fd Compare January 13, 2025 17:31
@leohemsted
Copy link
Contributor

leohemsted commented Jan 13, 2025

this was introduced here: 46dbcb7

Commit transactions as soon no longer needed
We think that holding open database transactions while we go and do something else is causing us to have poor performance.

Because we’re not serialising everything as soon as we pull it out of the database we can guarantee that we don’t need to go back to the database again.

So let’s see if explicitly closing the transaction helps with performance.

concurrent db connection count is not as big a concern for us right this second as it used to be

@spatel033 spatel033 marked this pull request as ready for review January 14, 2025 08:56
@quis
Copy link
Member

quis commented Jan 14, 2025

If it makes sense to remove this we should also remove the one for SerialisedTemplate here:

db.session.commit()

…and SerialisedAPIKeys here:

db.session.commit()

But (without a description in your PR) I’m not understanding the benefit/rationale for doing this now.

@spatel033 spatel033 force-pushed the remove-db-commit-serialised-service branch from 06c36fd to f582fb4 Compare January 14, 2025 11:27
@spatel033 spatel033 force-pushed the remove-db-commit-serialised-service branch from f582fb4 to c40242e Compare January 14, 2025 11:48
@leohemsted
Copy link
Contributor

i'm still a little skeptical about this:

  1. i dont understand why redis would fail - we havent had redis errors before/elsewhere
  2. the redis errors we see are EventletTimeoutErrors. There are two reasons an eventlet could time out:
  • the IO takes over thirty seconds
  • the IO completed on time but the eventlet process wasn't able to carry on with the code execution in under 30 seconds
  1. i dont see why this would be contained to one specific api-web instance
  2. 450 requests per min sounds like a lot but it's only 7.5 requests per second - whereas we have 35 concurrent instances with four processes each that can each handle 250 requests

@spatel033
Copy link
Contributor Author

i'm still a little skeptical about this:

  1. i dont understand why redis would fail - we havent had redis errors before/elsewhere
  2. the redis errors we see are EventletTimeoutErrors. There are two reasons an eventlet could time out:
  • the IO takes over thirty seconds
  • the IO completed on time but the eventlet process wasn't able to carry on with the code execution in under 30 seconds
  1. i dont see why this would be contained to one specific api-web instance
  2. 450 requests per min sounds like a lot but it's only 7.5 requests per second - whereas we have 35 concurrent instances with four processes each that can each handle 250 requests

Yes this PR will not solve the actual redis issue but it may stop 504 error when redis connection is closed

@spatel033
Copy link
Contributor Author

i'm still a little skeptical about this:

  1. i dont understand why redis would fail - we havent had redis errors before/elsewhere
  2. the redis errors we see are EventletTimeoutErrors. There are two reasons an eventlet could time out:
  • the IO takes over thirty seconds
  • the IO completed on time but the eventlet process wasn't able to carry on with the code execution in under 30 seconds
  1. i dont see why this would be contained to one specific api-web instance
  2. 450 requests per min sounds like a lot but it's only 7.5 requests per second - whereas we have 35 concurrent instances with four processes each that can each handle 250 requests

Yes this PR will not solve the actual redis issue but it may stop 504 error when redis connection is closed

Recent P1 stared with eventlet timeout for SerialisedAPIKeyCollection.from_service_id(self.id)

@spatel033 spatel033 closed this Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants