[BUG] UMAP fit_transform with spectral initialization fails with 2**18 + 1 records #6370

Closed
beckernick opened this issue Feb 25, 2025 · 3 comments · Fixed by rapidsai/raft#2597

@beckernick
Member

In the current 25.04 nightly (see environment below), UMAP with spectral initialization succeeds at 2**18 records but fails at 2**18 + 1.

from cuml.manifold import UMAP
from sklearn.datasets import make_blobs

N = 2**18 + 1
K = 20
C = 5

X, y = make_blobs(
    n_samples=N,
    n_features=K,
    centers=C,
)

clf = UMAP()
X_t = clf.fit_transform(X[:2**18])
print("2**18 succeeds with spectral init")

clf = UMAP(init="random")
X_t = clf.fit_transform(X[:2**18+1])
print("2**18 +1 succeeds with random init")

clf = UMAP()
X_t = clf.fit_transform(X[:2**18+1]) # 2**18 +1 fails with spectral init
[2025-02-25 13:47:49.467] [CUML] [info] Building knn graph using nn descent
2**18 succeeds with spectral init
[2025-02-25 13:47:51.002] [CUML] [info] Building knn graph using nn descent
2**18 +1 succeeds with random init
[2025-02-25 13:47:52.374] [CUML] [info] Building knn graph using nn descent
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 23
     20 print("2**18 +1 succeeds with random init")
     22 clf = UMAP()
---> 23 X_t = clf.fit_transform(X[:2**18+1]) # 2**18 +1 fails with spectral init

File /raid/nicholasb/miniforge3/envs/cuml-25.04/lib/python3.12/site-packages/cuml/internals/api_decorators.py:193, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    190     set_api_output_dtype(output_dtype)
    192 if process_return:
--> 193     ret = func(*args, **kwargs)
    194 else:
    195     return func(*args, **kwargs)

File /raid/nicholasb/miniforge3/envs/cuml-25.04/lib/python3.12/site-packages/cuml/internals/api_decorators.py:416, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    414 if hasattr(self, "dispatch_func"):
    415     func_name = gpu_func.__name__
--> 416     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    417 else:
    418     return gpu_func(self, *args, **kwargs)

File /raid/nicholasb/miniforge3/envs/cuml-25.04/lib/python3.12/site-packages/cuml/internals/api_decorators.py:195, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    193         ret = func(*args, **kwargs)
    194     else:
--> 195         return func(*args, **kwargs)
    197 return cm.process_return(ret)

File base.pyx:762, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:767, in cuml.manifold.umap.UMAP.fit_transform()

File /raid/nicholasb/miniforge3/envs/cuml-25.04/lib/python3.12/site-packages/cuml/internals/api_decorators.py:193, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    190     set_api_output_dtype(output_dtype)
    192 if process_return:
--> 193     ret = func(*args, **kwargs)
    194 else:
    195     return func(*args, **kwargs)

File /raid/nicholasb/miniforge3/envs/cuml-25.04/lib/python3.12/site-packages/cuml/internals/api_decorators.py:416, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    414 if hasattr(self, "dispatch_func"):
    415     func_name = gpu_func.__name__
--> 416     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    417 else:
    418     return gpu_func(self, *args, **kwargs)

File /raid/nicholasb/miniforge3/envs/cuml-25.04/lib/python3.12/site-packages/cuml/internals/api_decorators.py:195, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    193         ret = func(*args, **kwargs)
    194     else:
--> 195         return func(*args, **kwargs)
    197 return cm.process_return(ret)

File base.pyx:762, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:704, in cuml.manifold.umap.UMAP.fit()

RuntimeError: CUDA error encountered at: file=/raid/nicholasb/miniforge3/envs/cuml-25.04/include/raft/linalg/detail/coalesced_reduction-inl.cuh line=544:

conda list | grep "cuml\|cuvs\|raft"
# packages in environment at /raid/nicholasb/miniforge3/envs/cuml-25.04:
cuml                      25.04.00a76     cuda12_py312_250225_g790c80dfc_76    rapidsai-nightly
libcuml                   25.04.00a76     cuda12_250225_g790c80dfc_76    rapidsai-nightly
libcumlprims              25.04.00a10     cuda12_250225_g276c12f_10    rapidsai-nightly
libcuvs                   25.04.00a86     cuda12_250225_ga2a6a67_86    rapidsai-nightly
libraft                   25.04.00a37     cuda12_250225_gcb6fe7c7_37    rapidsai-nightly
libraft-headers           25.04.00a37     cuda12_250225_gcb6fe7c7_37    rapidsai-nightly
libraft-headers-only      25.04.00a37     cuda12_250225_gcb6fe7c7_37    rapidsai-nightly
pylibraft                 25.04.00a37     cuda12_py312_250225_gcb6fe7c7_37    rapidsai-nightly
raft-dask                 25.04.00a37     cuda12_py312_250225_gcb6fe7c7_37    rapidsai-nightly
@beckernick beckernick added the ? - Needs Triage and bug labels Feb 25, 2025
@wphicks
Contributor

wphicks commented Feb 27, 2025

This is because of an invalid launch configuration for this kernel call. Setting the maximum here to 1024 instead of 65535 resolves the issue, but it's not immediately obvious to me why we need that low of a limit.
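
For readers following along, here is a minimal, hypothetical sketch (not the RAFT kernel) of why a 1024 cap matters: CUDA limits threads per block to 1024, while grid dimensions can be much larger, so a value that is perfectly legal as a grid size becomes an invalid launch configuration if it lands in the block-size slot. The 256-row tile below is an assumption chosen only so the boundary falls at 2**18.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
  // Hypothetical sizes: 2**18 + 1 rows tiled into 256-row chunks gives 1025 chunks.
  int rows   = (1 << 18) + 1;
  int chunks = (rows + 255) / 256;  // 1025 at 2**18 + 1 rows, 1024 at exactly 2**18

  noop<<<chunks, 256>>>();  // fine: 1025 is a valid grid dimension
  printf("grid=%d block=256 -> %s\n", chunks, cudaGetErrorString(cudaGetLastError()));

  noop<<<256, chunks>>>();  // invalid: 1025 threads per block exceeds the 1024 cap
  printf("grid=256 block=%d -> %s\n", chunks, cudaGetErrorString(cudaGetLastError()));
  return 0;
}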

@jcrist
Member

jcrist commented Feb 28, 2025

I pushed a PR up to RAFT (rapidsai/raft#2597) and can confirm that with that patch the above test case passes fine.

@wphicks
Contributor

wphicks commented Feb 28, 2025

Confirmed that the fix is correct and approved the PR.

AyodeAwe pushed a commit to rapidsai/raft that referenced this issue Feb 28, 2025
…2597)

These were incorrectly ordered before, leading to errors on larger data in `cuml.UMAP`.

Fixes rapidsai/cuml#6370.
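
As an aside, here is a hedged sketch (not the actual RAFT change) of one way such a mix-up can be caught early: validate a (grid, block) pair against the device limits before launching, so a swapped pair fails loudly rather than surfacing as a CUDA error deep inside a reduction. The helper name and sizes are hypothetical.

#include <cstdio>
#include <cuda_runtime.h>

// Returns true if (grid, block) is a legal launch configuration on device 0.
bool launch_config_ok(dim3 grid, dim3 block) {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
  unsigned threads = block.x * block.y * block.z;
  if (threads == 0 || threads > (unsigned)prop.maxThreadsPerBlock) return false;
  if (grid.x == 0 || grid.x > (unsigned)prop.maxGridSize[0]) return false;
  if (grid.y == 0 || grid.y > (unsigned)prop.maxGridSize[1]) return false;
  if (grid.z == 0 || grid.z > (unsigned)prop.maxGridSize[2]) return false;
  return true;
}

int main() {
  dim3 grid(1025), block(256);  // hypothetical sizes matching the sketch above
  printf("as intended: %s\n", launch_config_ok(grid, block) ? "ok" : "invalid");
  printf("swapped:     %s\n", launch_config_ok(block, grid) ? "ok" : "invalid");
  return 0;
}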