[RELEASE] rmm v24.02 #1440

Merged
merged 31 commits into main from branch-24.02 on Feb 12, 2024

Conversation

raydouglass
Member

❄️ Code freeze for branch-24.02 and v24.02 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.02 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-24.02 into main for the release

raydouglass and others added 29 commits November 9, 2023 16:27
Forward-merge branch-23.12 to branch-24.02
Forward-merge branch-23.12 to branch-24.02
Forward-merge branch-23.12 to branch-24.02
Forward-merge branch-23.12 to branch-24.02
Forward-merge branch-23.12 to branch-24.02
…#1395)

Replaces #1394; this is targeted for 24.02.

fixes #1393

In Spark, with the Spark RAPIDS accelerator using a cudf 23.12 snapshot, we have an application that reads ORC files, does some light processing, and then writes ORC files. It consistently fails during the ORC write with:

```
terminate called after throwing an instance of 'rmm::logic_error'
  what():  RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-594-cuda11/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/arena_memory_resource.hpp:238: allocation not found
```

The underlying issue is that Spark with the RAPIDS accelerator uses the arena allocator with per-thread default streams enabled. cuDF recently added its own stream pool, which is used in addition to the per-thread default streams.
It's now possible to use per-thread default streams along with another pool of streams. This means memory can move from a thread or stream arena back into the global arena during defragmentation, and then down into another arena type, for instance thread arena -> global arena -> stream arena. If this happens and there was an allocation made while the memory belonged to a thread arena, we now have to check whether the allocation is part of a stream arena.

I added a test here that tries to make sure all the allocations end up in stream arenas; if there is a better way to do this, please let me know.
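
For context, a minimal sketch (not the PR's actual test) of the configuration involved: an `arena_memory_resource` used with the per-thread default stream plus a separate stream pool, so allocations can land in both thread arenas and stream arenas.

```cpp
#include <rmm/cuda_stream_pool.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/arena_memory_resource.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>

int main()
{
  rmm::mr::cuda_memory_resource upstream;
  rmm::mr::arena_memory_resource<rmm::mr::cuda_memory_resource> arena{&upstream};

  // An allocation on the per-thread default stream is served from a thread arena.
  void* from_thread_arena = arena.allocate(1024, rmm::cuda_stream_per_thread);

  // An allocation on a pool stream is served from a stream arena. Defragmentation
  // can migrate memory between arena kinds via the global arena, which is the
  // path this PR fixes.
  rmm::cuda_stream_pool stream_pool{2};
  auto const stream = stream_pool.get_stream();
  void* from_stream_arena = arena.allocate(1024, stream);

  arena.deallocate(from_stream_arena, 1024, stream);
  arena.deallocate(from_thread_arena, 1024, rmm::cuda_stream_per_thread);
  return 0;
}
```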

Authors:
  - Thomas Graves (https://github.com/tgravescs)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)
  - Rong Ou (https://github.com/rongou)
  - Mark Harris (https://github.com/harrism)

URL: #1395
Updates the `RMM_CUDA_TRY_ALLOC` macro in `detail/error.hpp` to eliminate a clang-tidy error.
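
For reference, a hedged usage sketch of the macro (not the clang-tidy change itself): it wraps a CUDA allocation call and converts failures into RMM exceptions rather than raw error codes.

```cpp
#include <rmm/detail/error.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>

// Allocation failures (cudaErrorMemoryAllocation) raise rmm::out_of_memory;
// any other CUDA error raises rmm::bad_alloc.
void* alloc_device(std::size_t bytes)
{
  void* ptr{nullptr};
  RMM_CUDA_TRY_ALLOC(cudaMalloc(&ptr, bytes));
  return ptr;
}
```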

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Rong Ou (https://github.com/rongou)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1391
This PR updates to fmt 10.1.1 and spdlog 1.12.

Depends on rapidsai/rapids-cmake#473.

Closes #1356

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1374
Forward-merge branch-23.12 to branch-24.02
Forward-merge branch-23.12 to branch-24.02
Forward-merge branch-23.12 to branch-24.02
Some minor simplification in advance of the scikit-build-core migration to better align wheel and non-wheel Python builds.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1401
This PR updates cuda-python. The CUDA 11 build was locked to an outdated version (11.7.1). The update matches the specifications in `dependencies.yaml` and the cudf recipes.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #1406
Contributes to rapidsai/build-planning#2

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1287
This PR moves the definition of `python>=3.9,<3.11` into the `py_version` dependency list, under the empty (fallback) matrix. This change aligns RMM's `dependencies.yaml` with other RAPIDS repositories.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1409
Since pytorch/pytorch#91398, the signature of the pluggable allocate and deallocate functions must accept the device id. The current version only accepts a device id for allocate, which means that when using a stream-ordered allocator with devices other than device zero, we pass an invalid stream into the deallocation function. To fix this, adapt the signature to match the one pytorch expects.

Now, since we have the device available during allocation and deallocation, we would like to use that device to obtain the appropriate memory resource.

Unfortunately, since RMM's `cuda_device_id` does not have a nullary constructor, we can't use it in Cython without some hacky workarounds.

However, since we don't actually need to build a Python module, but rather just a single shared library that offers two extern "C" functions, let's just write our allocator hooks directly in C++.

- Closes #1405
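
For illustration, a sketch of hooks matching the new signatures, where both functions receive the device id (the PR's actual file may differ in details such as switching to the target device first):

```cpp
#include <rmm/cuda_device.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>

// Signatures match what pytorch's pluggable allocator expects after
// pytorch/pytorch#91398: the device id is passed to both hooks.
extern "C" void* allocate(std::size_t size, int device, void* stream)
{
  auto* mr = rmm::mr::get_per_device_resource(rmm::cuda_device_id{device});
  return mr->allocate(size, rmm::cuda_stream_view{static_cast<cudaStream_t>(stream)});
}

extern "C" void deallocate(void* ptr, std::size_t size, int device, void* stream)
{
  auto* mr = rmm::mr::get_per_device_resource(rmm::cuda_device_id{device});
  mr->deallocate(ptr, size, rmm::cuda_stream_view{static_cast<cudaStream_t>(stream)});
}
```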

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1407
We are dropping Pascal support in 24.02 (see rapidsai/rapids-cmake#482) 

This PR changes the way we document GPU support in RMM to explain what is tested and supported rather than what is required (since it may work on earlier hardware than we test/support).

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1413
We no longer require separate librmm doc builds since they are incorporated into the Sphinx build now.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #1415
This PR updates RMM to CCCL 2.2.0. Do not merge until all of RAPIDS is ready to update.

Depends on rapidsai/rapids-cmake#495.

Replaces #1247.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1404
This PR updates `dependencies.yaml` so that generic CUDA 12.* dependencies can be specified with a glob, like `cuda: "12.*"`. This feature requires `rapids-dependency-file-generator>=1.8.0`, so the pre-commit hook has been updated.

I have not yet added support for a specific CUDA version like 12.1 or 12.2. That can be done separately.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1414
Removes remaining references to `setup.py` in documentation.

This project no longer has a `setup.py` as of its switch to `pyproject.toml` + `scikit-build-core` (see #1287, #1300).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Mark Harris (https://github.com/harrism)

URL: #1420
This is a follow-up PR to #1414. I thought some more about how to separate `cuda-version` pinnings (which control the CUDA version we use to build and test in conda) from actual CUDA Toolkit package dependencies (which we can handle according to only the major version 11/12). I discussed this PR on a call with @jameslamb in the context of upgrading to CUDA 12.2 (rapidsai/build-planning#6). This set of changes is mostly important for conda builds/tests, since `cuda-version` only controls conda. The pip wheel build/test process is unchanged, since its CUDA versions are controlled by the `shared-workflows` CI images.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - https://github.com/jakirkham
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1422
Reference: rapidsai/ops#2766

Replace rapids-env-update with rapids-configure-conda-channels,
rapids-configure-sccache, and rapids-date-string.

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1423
…ings out of detail namespace (#1417)

Fixes #1416. 

 - ~Deprecates existing ctors of `pool_memory_resource` that provide an optional parameter for the initial pool size.~
 - Adds new ctors that require an explicit initial pool size.
 - We don't yet deprecate anything in this PR because that would break builds of some RAPIDS libraries. We will follow up with PRs to cuDF, cuGraph and anything else needed to remove deprecated usages after this PR is merged.
 - Adds a new utility `fraction_of_available_device_memory` that calculates the specified fraction of free memory on the current CUDA device. This is now used in tests to provide an explicit pool size, and it can be used to reproduce the previous behavior of `pool_memory_resource` for consumers of the library (see the sketch after this list).
 - Moves `available_device_memory` from a detail header to `cuda_device.hpp` so it is now publicly usable, along with the above utility.
 - Temporarily adds `detail::available_device_memory` as an alias of the above in order to keep cudf and cugraph building until we can update them.
 - Duplicates alignment functions in `rmm::detail` that are commonly used externally into the public `rmm` namespace. The detail versions will be removed after cuDF and cuGraph are updated to not use them.
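
A minimal sketch (assuming the utilities named above) of constructing a pool with an explicit initial size, reproducing the previous implicit default of half the free device memory:

```cpp
#include <rmm/aligned.hpp>
#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
  rmm::mr::cuda_memory_resource upstream;

  // available_device_memory() returns {free, total} for the current device.
  auto const free         = rmm::available_device_memory().first;
  auto const initial_size = rmm::align_down(free / 2, rmm::CUDA_ALLOCATION_ALIGNMENT);

  // New ctor: the initial pool size is required, not optional.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{&upstream, initial_size};

  void* ptr = pool.allocate(1024);
  pool.deallocate(ptr, 1024);
  return 0;
}
```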

Authors:
  - Mark Harris (https://github.com/harrism)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Lawrence Mitchell (https://github.com/wence-)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #1417
…ilities, and optional pool_memory_resource initial size (#1424)

Follow-on to #1417, this PR deprecates the following (a short sketch of the replacements appears after the list):

 - `rmm::detail::available_device_memory` in favor of `rmm::available_device_memory`
 - `rmm::detail::is_aligned`, `rmm::detail::align_up`, and related alignment utility functions in favor of the versions in the top-level `rmm::` namespace.
 - The `rmm::pool_memory_resource` constructors that take an optional initial size parameter.
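
A short sketch of the renames, assuming the public headers introduced by #1417 (`rmm/cuda_device.hpp`, plus an `rmm/aligned.hpp`-style header for the alignment utilities):

```cpp
#include <rmm/aligned.hpp>      // header path assumed
#include <rmm/cuda_device.hpp>

bool half_of_free_fits()
{
  // was rmm::detail::available_device_memory()
  auto const [free, total] = rmm::available_device_memory();

  // were rmm::detail::align_up / rmm::detail::is_aligned
  auto const padded = rmm::align_up(free / 2, rmm::CUDA_ALLOCATION_ALIGNMENT);
  return rmm::is_aligned(padded, rmm::CUDA_ALLOCATION_ALIGNMENT) && padded <= total;
}
```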

Should be merged after the following:
 - rapidsai/cugraph#4086
 - rapidsai/cudf#14741
 - rapidsai/raft#2088

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Rong Ou (https://github.com/rongou)

URL: #1424
…ool_memory_resource`. (#1392)

Depends on #1417

Adds a new `host_pinned_memory_resource` that implements the new `cuda::mr::memory_resource` and `cuda::mr::async_memory_resource` concepts, which makes it usable as an upstream MR for `rmm::mr::device_memory_resource`.

Also tests a pool made with this new MR as the upstream.

Note that the tests explicitly set the initial and maximum pool sizes, as using the defaults does not currently work. See #1388.

Closes #618
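
A hedged sketch of the combination (class and header names assumed; the merged code may spell this `pinned_host_memory_resource`), with explicit pool sizes per the note above:

```cpp
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/pinned_host_memory_resource.hpp>  // header path assumed

#include <cstddef>

int main()
{
  rmm::mr::pinned_host_memory_resource pinned_mr;

  // Explicit initial (2 MiB) and maximum (64 MiB) pool sizes; the defaults
  // do not currently work with this upstream (see #1388).
  rmm::mr::pool_memory_resource<rmm::mr::pinned_host_memory_resource> pool{
    &pinned_mr, std::size_t{2} << 20, std::size_t{64} << 20};

  void* ptr = pool.allocate(1024);
  pool.deallocate(ptr, 1024);
  return 0;
}
```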

Authors:
  - Mark Harris (https://github.com/harrism)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Alessandro Bellina (https://github.com/abellina)
  - Lawrence Mitchell (https://github.com/wence-)
  - Jake Hemstad (https://github.com/jrhemstad)
  - Bradley Dice (https://github.com/bdice)

URL: #1392
…nfo() nonvirtual. Remove derived implementations and calls in RMM (#1430)

Closes #1426

As part of #1388, this PR contributes to deprecating and removing all `get_mem_info` functionality from memory resources. This first PR makes these methods optional without deprecating them (a brief sketch of the resulting behavior follows the list).

 - Makes `rmm::mr::device_memory_resource::supports_get_mem_info()` nonvirtual (it now always returns false).
 - Makes `rmm::mr::device_memory_resource::do_get_mem_info()` nonvirtual (it now always returns `{0, 0}`).
 - Removes all derived implementations of the above.
 - Removes all calls to the above.
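
A brief sketch of the resulting behavior (`cuda_memory_resource` is just an example; any derived resource now behaves the same):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>

int main()
{
  rmm::mr::cuda_memory_resource mr;

  // Nonvirtual after this change: every resource reports false and {0, 0},
  // regardless of the derived type.
  bool const supported     = mr.supports_get_mem_info();                   // false
  auto const [free, total] = mr.get_mem_info(rmm::cuda_stream_view{});     // {0, 0}

  return (supported || free != 0 || total != 0) ? 1 : 0;
}
```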

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1430
@raydouglass raydouglass requested a review from a team as a code owner January 26, 2024 19:18
@raydouglass raydouglass requested review from a team as code owners January 26, 2024 19:18
@raydouglass raydouglass requested review from cwharris and vyasr January 26, 2024 19:18
@github-actions github-actions bot added labels CMake, Python (Related to RMM Python API), conda, cpp (Pertains to C++ code), and ci on Jan 26, 2024
@raydouglass raydouglass merged commit f9f1bee into main Feb 12, 2024
4 of 5 checks passed