fix(rdma): Set minimum size of the eager message freelist to one
After the freelists for eager and data messages were separated, the eager
freelist must use a buffer size of at least one byte to avoid an assertion
failure on eager messages when running with OFI_NCCL_EAGER_MAX_SIZE=0.

Signed-off-by: Arun Karthik <akkart@amazon.com>
arunkarthik-akkart committed Feb 20, 2025
1 parent 775d15a commit 5721fbf
Showing 1 changed file with 5 additions and 1 deletion.
src/nccl_ofi_rdma.c: 6 changes (5 additions, 1 deletion)
@@ -6258,7 +6258,11 @@ static inline int init_rx_buffers(nccl_net_ofi_rdma_ep_t *ep)
 		return ret;
 	}
 
-	ret = nccl_ofi_freelist_init_mr(ep->eager_rx_buff_size,
+	/* Set the eager freelist buffer size to at least the maximum of EAGER_RX_BUFFER_ALIGNMENT
+	 * and eager_rx_buff_size. This ensures the freelist maintains a minimum size equal to
+	 * EAGER_RX_BUFFER_ALIGNMENT even when OFI_NCCL_EAGER_MAX_SIZE is set to 0.
+	 */
+	ret = nccl_ofi_freelist_init_mr(NCCL_OFI_MAX(EAGER_RX_BUFFER_ALIGNMENT, ep->eager_rx_buff_size),
 					ofi_nccl_rdma_min_posted_bounce_buffers(), 16, 0,
 					freelist_regmr_host_fn, freelist_deregmr_host_fn,
 					ep, EAGER_RX_BUFFER_ALIGNMENT, &ep->eager_rx_buff_fl);
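The effect of the change is a simple clamp on the requested entry size. The sketch below isolates that clamp outside the plugin. It assumes, as the diff comment implies, that eager_rx_buff_size is 0 when OFI_NCCL_EAGER_MAX_SIZE=0. SKETCH_EAGER_RX_BUFFER_ALIGNMENT, SKETCH_MAX and eager_freelist_entry_size are hypothetical stand-ins: the real EAGER_RX_BUFFER_ALIGNMENT value and NCCL_OFI_MAX definition are not shown in this commit, and 16 is an assumed value used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch: the real EAGER_RX_BUFFER_ALIGNMENT
 * value and the NCCL_OFI_MAX macro are defined elsewhere in the plugin and
 * are not part of this commit. 16 is an assumed value. */
#define SKETCH_EAGER_RX_BUFFER_ALIGNMENT ((size_t)16)
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: the entry size to request for the eager freelist.
 * It is the configured eager receive buffer size, raised to the alignment
 * when it is smaller (including when it is 0). */
static size_t eager_freelist_entry_size(size_t eager_rx_buff_size)
{
	size_t entry_size = SKETCH_MAX(SKETCH_EAGER_RX_BUFFER_ALIGNMENT,
				       eager_rx_buff_size);

	/* The clamp guarantees a non-empty entry, which is the property the
	 * commit restores for OFI_NCCL_EAGER_MAX_SIZE=0. */
	assert(entry_size >= 1);
	return entry_size;
}
```

Under the assumed alignment, eager_freelist_entry_size(0) returns 16, while eager_freelist_entry_size(8192) returns 8192 unchanged. Note that the diff clamps to EAGER_RX_BUFFER_ALIGNMENT rather than to a literal 1 byte, so the smallest eager freelist entry is one alignment unit.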
