Incorrect latency numbers for a Go demo application #1517
Comments
Hi @fstab, I reproduced this issue with beyla:latest, but I can't see it in beyla:main. I suspect it has to do with how we changed the parsing of incoming headers in Go. Previously we read the Go map, which was always tricky and could read bad data. Now we rely on a uprobe, which ensures the information comes directly from what the Go functions parsed (PR #1413). Can you please confirm that if you switch the Beyla Docker image to grafana/beyla:main you no longer see the problem?
I think I didn't look for the outliers hard enough. The bug is still there if you let things run for a while and search for spans > 1s.
Yes, I switched to https://github.com/fstab/fosdem-2025/blob/main/deploy/beyla.yaml#L50
I investigated whether this was related to the tracing SDK changes, but I believe the bug is older than that feature. Will keep looking.
It has to do with our goroutine request start timing. It seems that a goroutine can carry very early start information, and we then somehow use it. I'm not sure how this happens yet, so I've made a temporary fix (#1556) that works for HTTP but disables the Go request start timing for gRPC. With this, at least we won't report wrong information, but we should track down the actual cause. Once we know it, we can add a fix that re-enables the initial request time tracking for gRPC.
Thanks @grcevski! I ran the demo on my laptop for a couple of hours and all latency numbers are correct now.
@fstab we can close this, right?
I created a simple demo application for our FOSDEM talk https://github.com/fstab/fosdem-2025. It does something like this:
Now, looking at the latency numbers for the pricing service, there are some inexplicable spikes in the 98th percentile:
Searching for spans with a duration greater than 1s reveals that these are caused by incorrect queue times reported by Beyla.
Beyla reports that requests to the pricing service are "in queue" before the request to the inventory service is done, which can't be true.
To reproduce in a local kind cluster, see the README in https://github.com/fstab/fosdem-2025.