Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Human readable logging on timeout #15752

Open
marioloko opened this issue Feb 4, 2025 · 2 comments
Open

Human readable logging on timeout #15752

marioloko opened this issue Feb 4, 2025 · 2 comments
Labels
kind/feature Well-understood/specified features, ready for coding. triage/accepted Issues which should be fixed (post-triage)

Comments

@marioloko
Copy link

Describe the feature

When queue proxy timeouts it shows the following trace:

{
  "severity": "ERROR",
  "timestamp": "2025-02-04T18:01:55.621921326Z",
  "logger": "queueproxy",
  "caller": "network/error_handler.go:33",
  "message": "error reverse proxying request; sockstat: sockets: used 9\nTCP: inuse 1 orphan 1 tw 14 alloc 444 mem 2299\nUDP: inuse 0 mem 4352\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n",
  "commit": "6a27004",
  "knative.dev/key": "vista/helloworld-go-00004",
  "knative.dev/pod": "helloworld-go-00004-deployment-5578b648b-kldq5",
  "error": "context canceled",
  "stacktrace": "knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20241021183759-9b9d535af5ad/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:472\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3.2\n\tknative.dev/serving/pkg/queue/handler.go:65\nknative.dev/serving/pkg/queue.(*Breaker).Maybe\n\tknative.dev/serving/pkg/queue/breaker.go:155\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:63\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"
}

The error that it shows is:

error reverse proxying request; sockstat: sockets: used 9
TCP: inuse 1 orphan 1 tw 14 alloc 444 mem 2299
UDP: inuse 0 mem 4352
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

I guess that for most people, it is hard to correlate this error with a timeout due to timeoutSeconds. My proposal is to log or trace the timeout when it occurs in the following functions:

  1. tryTimeoutAndWriteError
  2. tryResponseStartTimeoutAndWriteError
  3. tryIdleTimeoutAndWriteError

In each of the function indicating that a timeout was triggered and which timeout was triggered. I think that it is a simple fix, that should not impact in performance as it should be quite infrequent and it can help people quite a log figuring out what is happening in their systems.

@marioloko marioloko added the kind/feature Well-understood/specified features, ready for coding. label Feb 4, 2025
@dprotaso
Copy link
Member

dprotaso commented Feb 6, 2025

/triage accepted

yeah this has caused confusion in the past

@knative-prow knative-prow bot added the triage/accepted Issues which should be fixed (post-triage) label Feb 6, 2025
@skonto
Copy link
Contributor

skonto commented Feb 7, 2025

Timeout correlation is one thing the other is that the output there is only helpful for getting the socket stats and the connection error just in case you hit some known issue e.g. low number of file descriptors, someone killed the connection (context cancelled) etc. Most of the time you need to check elsewhere than the qp tbh e.g. does your app take too long to process a request for example queued somewhere etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Well-understood/specified features, ready for coding. triage/accepted Issues which should be fixed (post-triage)
Projects
None yet
Development

No branches or pull requests

3 participants