Changes to support the v8 interface of the NCCL plugin interface #365

Merged
merged 3 commits into from
Mar 30, 2024

Conversation

rajachan
Member

A more succinct version of #345

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

rajachan and others added 2 commits March 21, 2024 18:31
Sync with headers from upstream NCCL 2.20.5 and pull in the v8
definitions.

Signed-off-by: Raghu Raja <raghunch@amazon.com>
Commit 10383ce ("Split api and core source files") introduced
nccl_net_ofi_regMr_sizet() with exactly the same code as
nccl_net_ofi_regMr(), except that it receives the size argument as
size_t rather than int.

NCCL net_plugin v8 changes the regMr() API to take a size_t size
argument, similar to nccl_net_ofi_regMr_sizet().

Change the standard nccl_net_ofi_regMr() to take a size_t size argument
and introduce a v4-v7 compatible API that takes an int size argument
and simply calls the standard API after casting the size argument.

Remove nccl_net_ofi_regMr_sizet() from the code, as it is the same as
the v4-v7 compatible regMr() API.

Signed-off-by: Liran Alon <liran@amazon.com>
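
The wrapper pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `sketch_regMr`, `sketch_regMr_v7`, and `sketch_mr_handle_t` are hypothetical names, the argument list is simplified, and the negative-size check is an added assumption that the commit message does not mention.

```c
#include <stddef.h>

/* Hypothetical stand-in for the plugin's memory-registration handle. */
typedef struct {
    size_t registered_bytes;
} sketch_mr_handle_t;

/* Standard (v8-style) entry point: the size is a size_t. */
int sketch_regMr(void *comm, void *data, size_t size, int type,
                 sketch_mr_handle_t *handle)
{
    (void)comm;   /* unused in this sketch */
    (void)type;   /* unused in this sketch */
    if (data == NULL || handle == NULL) {
        return -1;
    }
    /* A real implementation would register the buffer with the provider;
     * this sketch only records the requested size. */
    handle->registered_bytes = size;
    return 0;
}

/* v4-v7 compatible entry point: the size is an int, so cast and forward. */
int sketch_regMr_v7(void *comm, void *data, int size, int type,
                    sketch_mr_handle_t *handle)
{
    /* Assumption: reject negative sizes before casting, since a negative
     * int would otherwise become a very large size_t. */
    if (size < 0) {
        return -1;
    }
    return sketch_regMr(comm, data, (size_t)size, type, handle);
}
```

Keeping a single size_t implementation and a thin int wrapper means the older interface versions share one code path with v8 instead of carrying a duplicated function body.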
@rajachan rajachan requested review from bwbarrett and a team as code owners March 21, 2024 19:00
@rajachan rajachan force-pushed the move-up-to-v8 branch 2 times, most recently from 6bc417d to 8b1777b Compare March 21, 2024 19:14
Two key changes:
- The regMr size argument changed from int to size_t.
- A new `regIsGlobal` property, which NCCL uses to determine support
  for user registrations. The plugin now determines this from the
  mr_mode bit that providers set to define the scope of an MR
  (domain-level or endpoint-level).

Signed-off-by: Raghu Raja <raghunch@amazon.com>
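
One way to picture the `regIsGlobal` decision described above is the sketch below. It is illustrative only: `SKETCH_MR_ENDPOINT`, `SKETCH_MR_LOCAL`, `sketch_properties_t`, and `sketch_set_reg_is_global` are hypothetical names, and the bit values are made up. It assumes the relevant provider bit is an endpoint-scope flag (libfabric's `FI_MR_ENDPOINT` is presumably the bit meant) and that domain-level registrations are the ones reported as global; the commit message does not spell out either detail.

```c
#include <stdint.h>

/* Hypothetical mode bits standing in for a provider's mr_mode flags. */
#define SKETCH_MR_ENDPOINT (UINT64_C(1) << 0) /* MRs are scoped to an endpoint */
#define SKETCH_MR_LOCAL    (UINT64_C(1) << 1) /* unrelated bit, for contrast */

/* Hypothetical subset of the properties structure reported to NCCL. */
typedef struct {
    int regIsGlobal;
} sketch_properties_t;

/* Assumption: registrations are global only when MRs are domain-level,
 * that is, when the endpoint-scope bit is not set. */
void sketch_set_reg_is_global(uint64_t mr_mode, sketch_properties_t *props)
{
    props->regIsGlobal = (mr_mode & SKETCH_MR_ENDPOINT) ? 0 : 1;
}
```

With this mapping, a provider that sets only unrelated mode bits would still report `regIsGlobal` as 1, while any provider requiring endpoint-level MRs would report 0.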
@rajachan added the BuildTriggerRequest label (triggers a CI build when set) Mar 21, 2024
@rauteric removed and re-added the BuildTriggerRequest label Mar 26, 2024
@bwbarrett removed and re-added the BuildTriggerRequest label Mar 27, 2024
@rajachan removed and re-added the BuildTriggerRequest label Mar 28, 2024
@rauteric removed and re-added the BuildTriggerRequest label Mar 29, 2024
@bwbarrett bwbarrett merged commit a04d366 into aws:master Mar 30, 2024
13 checks passed
@rajachan rajachan deleted the move-up-to-v8 branch April 1, 2024 15:59
4 participants