
Port Optimized Legacy Reduce-Scatter (Width Dim, Interleaved, Tile, Multi-Tile High Tensor) To Reduce-Scatter Async #18739

Open
SeanNijjar opened this issue Mar 6, 2025 · 0 comments
Labels: llm_t3000, P1, perf (for issues tracking performance problems/improvements)

@SeanNijjar

Known Cases

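| Model | Machine Type | In Shape | Out Shape | Dim | DF (data format) | Tensor | # Devices |
|---|---|---|---|---|---|---|---|
| Llama 8B | N300 | [128,4096] | [128,2048] | 3 | bfp8 | Interleaved, Tile | 2 |
| * Prefill | T3K | Y=128, 2048, 4096, 8192, 16384 | | 3 | fp16, bfp8 | Interleaved, Tile | 8 |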
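As a rough illustration of the shape arithmetic in the table (not the tt-metal implementation), the sketch below models a width-dim reduce-scatter in plain Python: every device holds a full-size input, the inputs are summed elementwise, and device i keeps the i-th slice along the width. It assumes the 2-D shapes listed correspond to the last dimension (dim 3 of a 4-D `[1, 1, H, W]` tensor) and a 32x32 tile size; `output_shape`, `tile_grid`, and `reduce_scatter_width` are hypothetical names chosen for this example.

```python
TILE = 32  # assumed tile edge length (32x32 tiles)


def output_shape(in_shape, num_devices):
    """Shape each device holds after reduce-scatter along the width (last) dim."""
    h, w = in_shape
    if w % num_devices:
        raise ValueError("width must divide evenly across devices")
    return (h, w // num_devices)


def tile_grid(shape):
    """(tile rows, tile columns) for a tile-layout tensor of this shape."""
    h, w = shape
    if h % TILE or w % TILE:
        raise ValueError("shape must be a multiple of the tile size")
    return (h // TILE, w // TILE)


def reduce_scatter_width(per_device):
    """per_device: one 2-D list (rows of numbers) per device, all the same shape.

    Returns one 2-D list per device: device i gets width slice i of the sum.
    """
    n = len(per_device)
    rows = len(per_device[0])
    _, chunk = output_shape((rows, len(per_device[0][0])), n)
    # Reduce: elementwise sum of row r across all device inputs.
    summed = [
        [sum(vals) for vals in zip(*(t[r] for t in per_device))]
        for r in range(rows)
    ]
    # Scatter: device i keeps columns [i*chunk, (i+1)*chunk).
    return [[row[i * chunk:(i + 1) * chunk] for row in summed] for i in range(n)]


# N300 case from the table: [128,4096] across 2 devices.
print(output_shape((128, 4096), 2))  # (128, 2048)
print(tile_grid((128, 2048)))        # (4, 64): 4 tiles high, i.e. multi-tile high
```

Under these assumptions, each device's output in the N300 case is 4 tiles high and 64 tiles wide, which is the "multi-tile high" shape the title refers to.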
SeanNijjar added the llm_t3000, P1, and perf labels Mar 6, 2025
SeanNijjar self-assigned this Mar 6, 2025