PyTorch Support #88
A possible 'solution' could be to custom-build PyTorch with dynamic linking. The dynamically linked PyTorch CUDA calls would then be intercepted as expected, possibly making it easier to use with SCUDA. I used to do this with RWTH-ACS/cricket more than a year ago. This is also the approach used by nvwacloud/tensorlink (it's on GitHub but closed source): they supply a custom-built PyTorch 2.1.2 with a dynamically linked CUDA runtime library for use with their framework. Unexpectedly, to me at least, dynamically linked binaries of PyTorch seem to be virtually nonexistent online. Building it myself is a chore; it takes almost a day to compile on my computer. I am currently doing that with pytorch-v2.5.1, but it may be a few days before I get something workable. This is the build command I am using:
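A sketch of such a build follows; the flag names (`CMAKE_CUDA_RUNTIME_LIBRARY`, `CUDA_USE_STATIC_CUDA_RUNTIME`) are assumptions that depend on the PyTorch version and its CMake setup, so treat this as illustrative rather than the exact command:

```sh
# Illustrative only: build PyTorch against the shared CUDA runtime.
git clone --recursive --branch v2.5.1 https://github.com/pytorch/pytorch
cd pytorch

# Ask the build to link libcudart.so instead of libcudart_static.a.
# CUDA_USE_STATIC_CUDA_RUNTIME is the legacy FindCUDA switch; newer CMake
# versions use CMAKE_CUDA_RUNTIME_LIBRARY. Both names are assumptions here.
export USE_CUDA=1
export CUDA_USE_STATIC_CUDA_RUNTIME=OFF
export CMAKE_CUDA_RUNTIME_LIBRARY=Shared

python setup.py bdist_wheel
```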
If you try building a dynamically linked PyTorch yourself, make sure `nvcc` is actually being called with the option `--cudart=shared`. So I use the following ugly hack... This way I am sure the option gets passed.
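One common form of such a hack (a sketch; the wrapper approach itself and the paths are assumptions) is to shadow `nvcc` with a script that always appends the flag:

```sh
#!/bin/sh
# Sketch of an nvcc wrapper: rename the real compiler to nvcc.real,
# install this script in its place as nvcc, and every invocation is
# guaranteed to pick up --cudart=shared.
exec /usr/local/cuda/bin/nvcc.real --cudart=shared "$@"
```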
---
Tensorlink is suspended now. We have made a new project to support PyTorch on remote NVIDIA GPUs. For more information, please visit gpu.tf.
---
PyTorch statically links the CUDA Runtime API (normally shipped as the shared library `libcudart.so`), which exposes the functions defined in the `cuda_runtime.h` header. You can confirm this using the following Rust shared library code:
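A minimal reconstruction of such a `dlopen` interceptor (the crate setup and the log format are assumptions) could look like this:

```rust
// Build as a cdylib (crate-type = ["cdylib"] in Cargo.toml, with the
// `libc` crate as a dependency), then LD_PRELOAD the resulting .so.
use std::ffi::CStr;
use std::os::raw::{c_char, c_int, c_void};

#[no_mangle]
pub unsafe extern "C" fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void {
    // Log which library the process asked for.
    if filename.is_null() {
        eprintln!("dlopen(NULL, {flags})");
    } else {
        eprintln!("dlopen({:?}, {flags})", CStr::from_ptr(filename));
    }
    // Look up the real dlopen with RTLD_NEXT and forward the call.
    let real = libc::dlsym(libc::RTLD_NEXT, b"dlopen\0".as_ptr() as *const c_char);
    let real: extern "C" fn(*const c_char, c_int) -> *mut c_void = std::mem::transmute(real);
    real(filename, flags)
}
```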
The above shared library intercepts calls to `dlopen`, which should be called by PyTorch if it's dynamically loading a shared library. When we run this with:
```sh
LD_PRELOAD=dlopen_interceptor.so python3 -c "import torch; print(torch.cuda.is_available())"
```

Grepping through the resulting logs, it's clear that only `libcuda.so.1` is ever loaded. This is the Device Driver API, and I'm assuming it's loaded by the statically linked Runtime API (I'm pretty sure that PyTorch doesn't use the Device Driver API directly). It could be that such a simple program never actually invokes the Runtime API, so I wrote the following PyTorch script:
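A minimal stand-in for that script (its exact contents are an assumption; anything that allocates CUDA tensors and launches a kernel should exercise the Runtime API) would be:

```python
# Hypothetical test.py: force allocations and a kernel launch, which must
# go through the CUDA Runtime API (cudaMalloc, kernel launch, sync).
import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b                    # launches a CUDA matmul kernel
torch.cuda.synchronize()     # forces the queued work to actually run
print(c.sum().item())
```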
Running this as above (`LD_PRELOAD=dlopen_interceptor.so python3 test.py`), the script completes without an issue, and in the logs we can see that `libnvidia-ml.so.1` is opened, which I think is part of the NVML library (and not statically compiled into PyTorch). But no CUDA Runtime API library is ever loaded.
As far as I understand it, without recompiling PyTorch to link against a shared CUDA runtime library (instead of linking it statically), it shouldn't be possible to use SCUDA for GPU-over-IP with PyTorch without fully implementing the Device Driver API, including the parts where certain returned memory pointers are read from and written to directly by the host.
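To make the scale of that concrete, here is a hypothetical Rust stub for one of the many Driver API entry points a proxy would have to cover. The symbol name `cuMemAlloc_v2` and the error code come from the CUDA Driver API; the interception wiring is an assumption, and note that no stub can catch the plain loads and stores the host performs on returned mapped pointers:

```rust
// Hypothetical sketch: one Driver API entry point out of hundreds that a
// GPU-over-IP layer would need to implement. A real proxy would forward
// this call over the network and return the remote device pointer.
use std::os::raw::{c_uint, c_ulonglong};

type CUresult = c_uint;       // CUDA_SUCCESS = 0
type CUdeviceptr = c_ulonglong;

#[no_mangle]
pub unsafe extern "C" fn cuMemAlloc_v2(dptr: *mut CUdeviceptr, bytesize: usize) -> CUresult {
    eprintln!("cuMemAlloc_v2({bytesize} bytes) intercepted");
    // A stub can fake the device pointer, but it cannot observe the direct
    // host reads/writes that follow for host-mapped allocations.
    *dptr = 0;
    999 // CUDA_ERROR_UNKNOWN
}
```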