Ensure websocket connections persist until done on queue-proxy drain #15759
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: elijah-rou. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
Hi @elijah-rou. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #15759 +/- ##
==========================================
- Coverage 80.84% 80.74% -0.11%
==========================================
Files 222 222
Lines 18070 18096 +26
==========================================
+ Hits 14609 14611 +2
- Misses 3089 3114 +25
+ Partials 372 371 -1
☔ View full report in Codecov by Sentry.
Can we add an e2e test to validate that the failure is fixed by these changes?
Force-pushed from d6da0ea to 99dd71c
pkg/queue/sharedmain/main.go (outdated)
			break WaitOnPendingRequests
		}
	}
	time.Sleep(drainSleepDuration)
This adds a non-required overhead if there are no websocket connections, though. Could we avoid it for non-websocket workloads?
I suppose this is needed so that there is enough time for the app to notify the client by initiating a websocket close action? Note here that QP does not do the actual draining of the connection (due to the known reasons).
Another idea is to have something like:
done := make(chan struct{})
Server.RegisterOnShutdown(func() {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	logger.Infof("Drain: waiting for %d pending requests to complete", pendingRequests.Load())
WaitOnPendingRequests:
	for range ticker.C {
		if pendingRequests.Load() <= 0 {
			logger.Infof("Drain: all pending requests completed")
			break WaitOnPendingRequests
		}
	}
	// Wait some configurable time, e.g. WEBSOCKET_TERMINATION_DRAIN_DURATION_SECONDS.
	time.Sleep(webSocketTerminationDrainDuration) // hypothetical duration backed by that env var
	done <- struct{}{}
})
Server.Shutdown(...)
<-done
WEBSOCKET_TERMINATION_DRAIN_DURATION_SECONDS could be zero by default for regular workloads. Or use something similar for the sleep time above in this PR.
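For illustration, a minimal sketch of how such an env var could be read; the variable name comes from the suggestion above, but the helper and its behavior are assumptions, not shipped queue-proxy code:

import (
	"os"
	"strconv"
	"time"
)

// webSocketDrainDuration is a hypothetical helper: it reads the proposed
// WEBSOCKET_TERMINATION_DRAIN_DURATION_SECONDS env var and defaults to
// zero, so regular (non-websocket) workloads pay no extra drain cost.
func webSocketDrainDuration() time.Duration {
	n, err := strconv.Atoi(os.Getenv("WEBSOCKET_TERMINATION_DRAIN_DURATION_SECONDS"))
	if err != nil || n < 0 {
		return 0
	}
	return time.Duration(n) * time.Second
}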
Another thing (a bit of a hack, and thinking out loud) is whether we could detect hijacked connections via the Upgrade and Connection headers, which are mandatory 🤔 (not sure about wss, but I don't think we support it, do we?). cc @dprotaso
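To make the header idea concrete, a rough sketch of such a check; isWebSocketUpgrade is an illustrative name, not part of this PR:

import (
	"net/http"
	"strings"
)

// isWebSocketUpgrade reports whether a request looks like a websocket
// handshake: RFC 6455 requires "Connection: Upgrade" and
// "Upgrade: websocket" headers on the initial HTTP request.
func isWebSocketUpgrade(r *http.Request) bool {
	return strings.Contains(strings.ToLower(r.Header.Get("Connection")), "upgrade") &&
		strings.EqualFold(r.Header.Get("Upgrade"), "websocket")
}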
I think I may have moved that sleep to the wrong place; should probably just move the sleep to before the wait on pending requests, which will avoid the additional wait. drainSleepDuration should only be a minimum wait period.
As for avoiding the actual pending-request check: we may be able to, but then you'd probably have to check specifics about the connection before incrementing the counter to make it websocket-specific (so instead of just using the rev proxy we would probably have to inspect the connection ourselves). IMO likely not worth doing, since you're expecting a wait from .Shutdown already, and this would only spare .Shutdown having to do the work of waiting for non-hijacked connections.
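A sketch of that reordering, reusing the assumed names from the suggestion above (pendingRequests, drainSleepDuration): the sleep runs first as a minimum wait, and the poll afterwards adds no delay once the counter is already zero.

// Sleep first: drainSleepDuration acts purely as a minimum drain period.
time.Sleep(drainSleepDuration)

// Then poll until all pending requests (including hijacked websocket
// connections) have completed before allowing Shutdown to proceed.
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()
for range ticker.C {
	if pendingRequests.Load() <= 0 {
		break
	}
}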
Force-pushed from 99dd71c to 54db92b
add e2e for ws beyond queue drain; move sleep to appropriate loc; add ref to go issue
Force-pushed from 54db92b to e8dec25
Fixes: websockets closing abruptly when the queue-proxy undergoes drain.
Because hijacked connections in net/http are not respected when server.Shutdown is called, any active websocket connections currently end as soon as the queue-proxy calls .Shutdown. See gorilla/websocket#448 and golang/go#17721 for details. This patch fixes the issue by introducing an atomic counter of active requests, which increments as a request comes in and decrements as a request handler terminates. During drain, this counter must reach zero, or the revision timeout must elapse, before .Shutdown is called.
Release Note
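As a sketch of the counting approach the description outlines (handler and variable names are illustrative, not the actual patch):

import (
	"net/http"
	"sync/atomic"
)

var pendingRequests atomic.Int64

// trackPending wraps the proxy handler so the drain loop can observe
// in-flight work. For upgraded websocket connections proxied by
// httputil.ReverseProxy, ServeHTTP only returns once the tunnel closes,
// so the decrement marks the true end of the connection.
func trackPending(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		pendingRequests.Add(1)
		defer pendingRequests.Add(-1)
		next.ServeHTTP(w, r)
	})
}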