Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect latency numbers for a Go demo application #1517

Closed
fstab opened this issue Jan 13, 2025 · 7 comments
Closed

Incorrect latency numbers for a Go demo application #1517

fstab opened this issue Jan 13, 2025 · 7 comments
Assignees

Comments

@fstab
Copy link
Member

fstab commented Jan 13, 2025

I created a simple demo application for our FOSDEM talk https://github.com/fstab/fosdem-2025. It does something like this:

sequenceDiagram
    actor client
    participant product as product service
    participant inventory as inventory service
    participant pricing as pricing service
    client->>+product: search products
    product->>+inventory: get inventory items
    inventory-->>-product: list of inventory items
    par for each inventory item
        product->>+pricing: get price
        pricing-->>-product: price
    end
    product-->>-client: list of products with prices
Loading

Now, looking at the latency numbers for the pricing service, there are some inexplicable spikes in the 98th percentile:

Image

Searching for Spans with duration greater than 1s reveals that these are caused by incorrect queue times reported by Beyla

Image

Beyla reports that requests to the pricing service are "in queue" before the request to the inventory service is done, which can't be true.

To reproduce in a local kind cluster, see README in https://github.com/fstab/fosdem-2025.

@fstab fstab changed the title Wrong latency numbers for a Go demo application Incorrect latency numbers for a Go demo application Jan 13, 2025
@grcevski grcevski self-assigned this Jan 15, 2025
@marctc marctc assigned marctc and unassigned grcevski Jan 17, 2025
@grcevski
Copy link
Contributor

Hi @fstab, I reproduced this issue with beyla:latest but I can't see it in beyla:main. I suspect that it has to do with how we changed the parsing of the incoming headers in Go. Previously we read the go map, which was always tricky and it could read bad data. Now, we rely on a uprobe which ensures the information is directly from what's parsed by the Go functions. PR here #1413.

Can you please confirm that if you switch the Beyla Docker image to grafana/beyla:main you don't see the problem anymore?

@grcevski
Copy link
Contributor

I think I didn't look for the outliers hard enough, the bug is still there if you let things run for a while and search for spans > 1s.

@fstab
Copy link
Member Author

fstab commented Jan 20, 2025

the bug is still there if you let things run for a while

Yes, I switched to grafana/beyla:main, but that didn't fix it.

https://github.com/fstab/fosdem-2025/blob/main/deploy/beyla.yaml#L50

@marctc
Copy link
Contributor

marctc commented Jan 21, 2025

I investigated if this was related to changes of tracing SDK. But I believe that bug is older than that feature. Will keep looking.

@grcevski
Copy link
Contributor

It has to do with our go routine request start timing. It seems that a goroutine can have very early information on start and then somehow we use that. I'm not sure how this happens yet, so I've made a temporary fix which will work for HTTP but will disable the Go start request timing for gRPC #1556. I think with this at least we won't have wrong information, but we should track down the actual cause. Once we know the cause we can add the fix which will re-enable the initial request time tracking for gRPC.

@fstab
Copy link
Member Author

fstab commented Jan 22, 2025

Thanks @grcevski! I ran the demo on my laptop for a couple of hours and all latency numbers are correct now.

@marctc
Copy link
Contributor

marctc commented Feb 5, 2025

@fstab we can close this, right?

@marctc marctc closed this as completed Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants