
[ROCM] Raise device memory cap for parallel GPU execution to 5GB #2840

Open · wants to merge 1 commit into base: develop-upstream
Conversation

amd-jianli12 (Collaborator):

The recently added unit test below requests more than 4 GB of memory to run:

TritonEmitterTest.FusionWithOutputContainingMoreThanInt32MaxElementsExecutesCorrectly

With the current memory cap, it fails with an out-of-memory error at run time:

W stream_executor.h:364] Not enough memory to allocate 4294967552 on device 0 within provided limit. limit=4294967296]
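For context, a quick back-of-the-envelope check of the numbers in the log line: the requested allocation exceeds the old 4 GiB cap by only 256 bytes. The sketch below assumes the raised cap in the PR title ("5GB") means 5 GiB; the exact value used in the patch may differ.

```python
# Sketch: compare the allocation from the OOM log against the old and new caps.
requested = 4294967552          # bytes requested, taken from the log line
old_limit = 4 * 1024**3         # old cap: 4 GiB = 4294967296 bytes (matches limit= in the log)
new_limit = 5 * 1024**3         # raised cap, assuming 5 GiB (hypothetical interpretation of "5GB")

overage = requested - old_limit
print(overage)                  # 256 -> the request is only 256 bytes over the old cap
print(requested <= old_limit)   # False -> OOM under the old cap
print(requested <= new_limit)   # True  -> fits under the raised cap
```

The overage is tiny, which is consistent with the test's output barely crossing the INT32_MAX-element boundary; any cap above the requested size would admit the allocation.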

amd-jianli12 (Collaborator, Author):

!gen-cache

okakarpa (Collaborator) commented Feb 6, 2025:

The disk cache generation for the cpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-nonpip-multi tests status: successfully finished

The disk cache generation for the XLA tests status: in progress


3 participants