[RELEASE] rmm v24.02 #1440
Merged
Conversation
Forward-merge branch-23.12 to branch-24.02
…#1395) Replaces #1394; this is targeted for 24.02. Fixes #1393. In Spark with the Spark RAPIDS accelerator, using a cudf 23.12 snapshot, we have an application that reads ORC files, does some light processing, and then writes ORC files. It consistently fails while doing the ORC write with:

```
terminate called after throwing an instance of 'rmm::logic_error'
what(): RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-594-cuda11/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/arena_memory_resource.hpp:238: allocation not found
```

The underlying issue arises because Spark with the RAPIDS accelerator uses the ARENA allocator with per-thread default streams enabled. cuDF recently added its own stream pool that is used in addition to the per-thread default streams, so allocations may now come from per-thread default streams alongside another pool of streams. This means memory can move from a thread or stream arena back into the global arena during a defragmentation and then down into another arena type, for instance thread arena -> global arena -> stream arena. If this happens after an allocation was made while the memory belonged to a thread arena, deallocation now has to check whether the allocation is part of a stream arena. I added a test that tries to verify all the allocations end up in stream arenas; if there is a better way to do this, please let me know. Authors: - Thomas Graves (https://github.com/tgravescs) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - Rong Ou (https://github.com/rongou) - Mark Harris (https://github.com/harrism) URL: #1395
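To make the failure mode concrete, here is a minimal self-contained sketch of a deallocation path with this fallback order. It is in the spirit of, not copied from, RMM's `arena_memory_resource`; the `arena`, `thread_arenas_`, and `stream_arenas_` names are hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <stdexcept>

// Toy arena: records only which pointers it currently owns.
struct arena {
  std::set<void*> owned;
  bool deallocate(void* ptr) { return owned.erase(ptr) > 0; }
};

class arena_resource {
  std::map<std::size_t, arena> thread_arenas_;  // hypothetical: keyed by thread id
  std::map<int, arena> stream_arenas_;          // hypothetical: keyed by stream id
  arena global_arena_;
  std::mutex mtx_;

 public:
  void do_deallocate(void* ptr, std::size_t thread_id)
  {
    std::lock_guard<std::mutex> lock{mtx_};
    // 1. The common case: the calling thread's own arena.
    auto it = thread_arenas_.find(thread_id);
    if (it != thread_arenas_.end() && it->second.deallocate(ptr)) { return; }
    // 2. Defragmentation may have returned the block to the global arena.
    if (global_arena_.deallocate(ptr)) { return; }
    // 3. The fix for #1393: the block may have moved onward into a stream
    //    arena (thread arena -> global arena -> stream arena), so check those.
    for (auto& entry : stream_arenas_) {
      if (entry.second.deallocate(ptr)) { return; }
    }
    throw std::logic_error{"allocation not found"};
  }
};
```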
Updates code for the `RMM_CUDA_TRY_ALLOC` macro in `detail/error.hpp` to eliminate a clang-tidy error. Authors: - Mark Harris (https://github.com/harrism) Approvers: - Rong Ou (https://github.com/rongou) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1391
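For context, `RMM_CUDA_TRY_ALLOC` is the error-checking macro RMM wraps around CUDA allocation calls: it distinguishes out-of-memory failures from other CUDA errors. A simplified sketch of its behavior (not the exact RMM source; local stand-in exception types replace `rmm::bad_alloc` and `rmm::out_of_memory`):

```cpp
#include <cuda_runtime_api.h>

#include <stdexcept>
#include <string>

// Stand-ins for rmm::bad_alloc / rmm::out_of_memory from rmm/detail/error.hpp.
struct bad_alloc : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct out_of_memory : bad_alloc {
  using bad_alloc::bad_alloc;
};

// Run a CUDA call; on failure, clear the error state and throw the
// appropriate exception with a file:line message.
#define RMM_CUDA_TRY_ALLOC(_call)                                              \
  do {                                                                         \
    cudaError_t const error = (_call);                                         \
    if (cudaSuccess != error) {                                                \
      cudaGetLastError(); /* clear CUDA's sticky error state */                \
      std::string msg{std::string{__FILE__} + ":" + std::to_string(__LINE__) + \
                      ": " + cudaGetErrorName(error) + " " +                   \
                      cudaGetErrorString(error)};                              \
      if (cudaErrorMemoryAllocation == error) { throw out_of_memory{msg}; }    \
      throw bad_alloc{msg};                                                    \
    }                                                                          \
  } while (0)
```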
This PR updates to fmt 10.1.1 and spdlog 1.12. Depends on rapidsai/rapids-cmake#473. Closes #1356 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #1374
Forward-merge branch-23.12 to branch-24.02
Some minor simplification in advance of the scikit-build-core migration to better align wheel and non-wheel Python builds. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Ray Douglass (https://github.com/raydouglass) URL: #1401
This PR updates cuda-python. The CUDA 11 build was locked to an outdated version (11.7.1). This matches the specifications in dependencies.yaml and also cudf recipes. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #1406
Contributes to rapidsai/build-planning#2 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1287
This PR moves the definition of `python>=3.9,<3.11` into the `py_version` dependency list, under the empty (fallback) matrix. This change aligns RMM's `dependencies.yaml` with other RAPIDS repositories. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Ray Douglass (https://github.com/raydouglass) URL: #1409
Since pytorch/pytorch#91398, the signature of the pluggable allocate and deallocate functions must accept the device id. The current version only accepts a device id for allocate, which means that when using a stream ordered allocator with devices other than device zero, we pass an invalid stream into the deallocation function. To fix this, adapt the signature to match the one pytorch expects. Now, since we have the device available during allocation and deallocation, we would like to use that device to obtain the appropriate memory resource. Unfortunately, since RMM's cuda_device_id does not have a nullary constructor, we can't use it in Cython without some hacky workarounds. However, since we don't actually need to build a Python module, but rather just a single shared library that offers two extern "C" functions, let's just write our allocator hooks directly in C++. - Closes #1405 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1407
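For reference, the resulting hooks look roughly like this. This is a sketch based on the description above, close to, but not necessarily identical to, RMM's actual C++ allocator shim:

```cpp
// Sketch of the two extern "C" hooks described above.
#include <rmm/cuda_device.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <cuda_runtime_api.h>

#include <cstddef>

// PyTorch's CUDAPluggableAllocator loads these two symbols from a shared
// library. Since pytorch/pytorch#91398, *both* receive the device id, so the
// deallocation hook can look up the memory resource for the right device.
extern "C" void* allocate(std::size_t size, int device, void* stream)
{
  auto* mr = rmm::mr::get_per_device_resource(rmm::cuda_device_id{device});
  return mr->allocate(size, rmm::cuda_stream_view{static_cast<cudaStream_t>(stream)});
}

extern "C" void deallocate(void* ptr, std::size_t size, int device, void* stream)
{
  auto* mr = rmm::mr::get_per_device_resource(rmm::cuda_device_id{device});
  mr->deallocate(ptr, size, rmm::cuda_stream_view{static_cast<cudaStream_t>(stream)});
}
```

On the Python side, a shared library built from such a file can then be wired up with `torch.cuda.memory.CUDAPluggableAllocator(<path>, "allocate", "deallocate")` followed by `torch.cuda.memory.change_current_allocator(...)`.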
We are dropping Pascal support in 24.02 (see rapidsai/rapids-cmake#482). This PR changes the way we document GPU support in RMM to explain what is tested and supported rather than what is required (since it may work on earlier hardware than we test/support). Authors: - Mark Harris (https://github.com/harrism) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1413
We no longer require separate librmm doc builds since they are incorporated into the Sphinx build now. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #1415
This PR updates RMM to CCCL 2.2.0. Do not merge until all of RAPIDS is ready to update. Depends on rapidsai/rapids-cmake#495. Replaces #1247. Authors: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1404
This PR updates `dependencies.yaml` so that generic CUDA 12.* dependencies can be specified with a glob, like `cuda: "12.*"`. This feature requires `rapids-dependency-file-generator>=1.8.0`, so the pre-commit hook has been updated. I have not yet added support for a specific CUDA version like 12.1 or 12.2. That can be done separately. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Mark Harris (https://github.com/harrism) - AJ Schmidt (https://github.com/ajschmidt8) URL: #1414
Removes remaining references to `setup.py` in documentation. This project no longer has a `setup.py` as of its switch to `pyproject.toml` + `scikit-build-core` (see #1287, #1300). Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Mark Harris (https://github.com/harrism) URL: #1420
This is a follow-up PR to #1414. I thought some more about how to separate `cuda-version` pinnings (which control the CUDA version we use to build and test in conda) from actual CUDA Toolkit package dependencies (which we can handle according to only the major version 11/12). I discussed this PR on a call with @jameslamb in the context of upgrading to CUDA 12.2 (rapidsai/build-planning#6). This set of changes is mostly important for conda builds/tests, since `cuda-version` only controls conda. The pip wheel build/test process is unchanged, since its CUDA versions are controlled by the `shared-workflows` CI images. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - https://github.com/jakirkham - Vyas Ramasubramani (https://github.com/vyasr) - Ray Douglass (https://github.com/raydouglass) URL: #1422
Reference: rapidsai/ops#2766 Replace rapids-env-update with rapids-configure-conda-channels, rapids-configure-sccache, and rapids-date-string. Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #1423
…ings out of detail namespace (#1417) Fixes #1416.
- ~Deprecates existing ctors of `pool_memory_resource` that provide an optional parameter for the initial pool size.~
- Adds new ctors that require an explicit initial pool size (a usage sketch follows after this list).
- We don't yet deprecate anything in this PR because that would break builds of some RAPIDS libraries. We will follow up with PRs to cuDF, cuGraph, and anything else needed to remove deprecated usages after this PR is merged.
- Adds a new utility `fraction_of_available_device_memory` that calculates the specified fraction of free memory on the current CUDA device. This is now used in tests to provide an explicit pool size and can be used to produce the previous behavior of `pool_memory_resource` for consumers of the library.
- Moves `available_device_memory` from a detail header to `cuda_device.hpp` so it is now publicly usable, along with the above utility.
- Temporarily adds `detail::available_device_memory` as an alias of the above in order to keep cudf and cugraph building until we can update them.
- Duplicates commonly externally used alignment functions that are currently in `rmm::detail` to the public `rmm` namespace. The detail versions will be removed after cuDF and cuGraph are updated to not use them.

Authors: - Mark Harris (https://github.com/harrism) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Michael Schellenberger Costa (https://github.com/miscco) - Lawrence Mitchell (https://github.com/wence-) - Jake Hemstad (https://github.com/jrhemstad) URL: #1417
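A minimal usage sketch under these changes, assuming the now-public helpers mirror the former `rmm::detail` versions (`rmm::align_down` and `rmm::CUDA_ALLOCATION_ALIGNMENT` here are assumptions based on the commit message, not verified signatures):

```cpp
// Sketch: construct a pool with an explicit initial size, as the new
// constructors require. Helper names may differ slightly in released headers.
#include <rmm/aligned.hpp>
#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
  rmm::mr::cuda_memory_resource cuda_mr;

  // available_device_memory() is now public: returns {free, total} bytes
  // for the current CUDA device.
  auto const free_mem = rmm::available_device_memory().first;

  // Choose an explicit initial size (half of free memory), aligned using the
  // now-public alignment utilities instead of the rmm::detail:: versions.
  auto const initial_size =
    rmm::align_down(free_mem / 2, rmm::CUDA_ALLOCATION_ALIGNMENT);

  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{&cuda_mr,
                                                                    initial_size};
  return 0;
}
```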
…ilities, and optional pool_memory_resource initial size (#1424) Follow-on to #1417, this PR deprecates the following:
- `rmm::detail::available_device_memory` in favor of `rmm::available_device_memory`.
- `rmm::detail::is_aligned`, `rmm::detail::align_up`, and related alignment utility functions in favor of the `rmm::` top-level namespace versions.
- The `rmm::pool_memory_resource` constructors that take an optional initial size parameter.

Should be merged after the following:
- rapidsai/cugraph#4086
- rapidsai/cudf#14741
- rapidsai/raft#2088

Authors: - Mark Harris (https://github.com/harrism) Approvers: - Michael Schellenberger Costa (https://github.com/miscco) - Rong Ou (https://github.com/rongou) URL: #1424
…ool_memory_resource`. (#1392) Depends on #1417. Adds a new `host_pinned_memory_resource` that implements the new `cuda::mr::memory_resource` and `cuda::mr::async_memory_resource` concepts, which makes it usable as an upstream MR for `rmm::mr::device_memory_resource`. Also tests a pool made with this new MR as the upstream. Note that the tests explicitly set the initial and maximum pool sizes, as using the defaults does not currently work. See #1388. Closes #618 Authors: - Mark Harris (https://github.com/harrism) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Michael Schellenberger Costa (https://github.com/miscco) - Alessandro Bellina (https://github.com/abellina) - Lawrence Mitchell (https://github.com/wence-) - Jake Hemstad (https://github.com/jrhemstad) - Bradley Dice (https://github.com/bdice) URL: #1392
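The pattern this enables looks roughly like the following. A sketch only: the header path, the class spelling (the released headers may spell it `pinned_host_memory_resource`), and the reference-taking pool constructor are assumptions based on the commit message, and explicit sizes are set because the defaults do not yet work, per #1388:

```cpp
// Sketch: a device-style pool whose upstream is pinned host memory.
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/pinned_host_memory_resource.hpp>  // assumed header location

#include <cstddef>

int main()
{
  // Satisfies cuda::mr::memory_resource and cuda::mr::async_memory_resource,
  // so it can act as the upstream of a pool.
  rmm::mr::pinned_host_memory_resource pinned_mr;

  // Explicit initial and maximum pool sizes (64 MiB / 256 MiB) -- the
  // defaults do not currently work for this upstream (see #1388).
  rmm::mr::pool_memory_resource<rmm::mr::pinned_host_memory_resource> pool{
    pinned_mr, std::size_t{1} << 26, std::size_t{1} << 28};
  return 0;
}
```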
…nfo() nonvirtual. Remove derived implementations and calls in RMM (#1430) Closes #1426. As part of #1388, this PR contributes to deprecating and removing all `get_mem_info` functionality from memory resources. This first PR makes these methods optional without deprecating them.
- Makes `rmm::mr::device_memory_resource::supports_get_mem_info()` nonvirtual (and always return `false`).
- Makes `rmm::mr::device_memory_resource::do_get_mem_info()` nonvirtual (and always return `{0, 0}`).
- Removes all derived implementations of the above.
- Removes all calls to the above.

Authors: - Mark Harris (https://github.com/harrism) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1430
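The resulting shape of the base class is roughly the following (paraphrased for illustration; the real methods also take a `cuda_stream_view`, omitted here to keep the sketch self-contained):

```cpp
#include <cstddef>
#include <utility>

// Paraphrased shape of rmm::mr::device_memory_resource after this change
// (not the verbatim RMM header).
class device_memory_resource {
 public:
  // Previously a virtual query; now nonvirtual and always false.
  [[nodiscard]] bool supports_get_mem_info() const noexcept { return false; }

  // Previously dispatched to a virtual do_get_mem_info(); derived
  // implementations have been removed, so this is now fixed at {0, 0}.
  [[nodiscard]] std::pair<std::size_t, std::size_t> get_mem_info() const
  {
    return do_get_mem_info();
  }

 private:
  [[nodiscard]] std::pair<std::size_t, std::size_t> do_get_mem_info() const
  {
    return {0, 0};
  }
};
```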
harrism approved these changes on Jan 26, 2024
❄️ Code freeze for branch-24.02 and v24.02 release

What does this mean?
Only critical/hotfix level issues should be merged into branch-24.02 until release (merging of this PR).

What is the purpose of this PR?
It merges branch-24.02 into main for the release.