Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration test harness that bypasses the FastAPI server to directly test the batcher #959

Closed
wants to merge 2 commits into from

Conversation

renxida
Copy link
Contributor

@renxida renxida commented Feb 12, 2025

Previously, to test concurrent generation, we quickly submit multiple requests.

This does not consistently result in the same batching, and timing differences across shortfin can cause e.g. 4 requests to go from being all batches together to boarding a batch of 3 and then a batch of 1. This caused some very difficult-to-replicate errors and accounted for some of the regressions we observed leading up to the v3.2.0 release.

This harness and it's associated tests will eliminate inconsistent batching as an untested point of failure, allowing us to detect and replicate future errors that depend on certain batching conditions to appear.

@renxida renxida force-pushed the test-generation-request branch from 2094338 to fa10955 Compare February 13, 2025 18:34
@renxida
Copy link
Contributor Author

renxida commented Feb 13, 2025

Note that this introduces some code duplication in server_management.py.

Will put in a follow up PR to dedupe the code there, and between model_management.py and sharktank's hf_datasets.py

@renxida
Copy link
Contributor Author

renxida commented Feb 15, 2025

Temporarily dropping this in favor of a micro-server like SDXL's.

@renxida renxida closed this Feb 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant