rdma: (fix) memory leak: don't alloc recv_conn_resp_req on EAGAIN #388

Merged: 1 commit, Apr 13, 2024
16 changes: 8 additions & 8 deletions src/nccl_ofi_rdma.c
@@ -3800,10 +3800,6 @@ static int accept(nccl_net_ofi_listen_comm_t *listen_comm,
/* Reset request state for connect response message */
prepare_send_conn_resp_req(l_comm);

- l_comm->stage = COMM_SEND_CONN;
-
- case COMM_SEND_CONN:
-
/* Initialize connect response message */
ret = prepare_conn_resp(ep, l_comm, dev_id);
if (ret != 0) {
@@ -3816,6 +3812,10 @@ static int accept(nccl_net_ofi_listen_comm_t *listen_comm,
/* Send r_comm's remote comm ID */
conn_msg->remote_comm_id = r_comm->remote_comm_id;

+ l_comm->stage = COMM_SEND_CONN;
+
+ case COMM_SEND_CONN:
+
/* COMM_SEND_CONN: Send connect response message to remote */
ret = post_send_conn_resp(r_comm, conn_msg, device, ep, req);
if (ret == -FI_EAGAIN) {
@@ -5169,17 +5169,17 @@ static int connect(nccl_net_ofi_ep_t *base_ep,
}
comm_state->req = &req->base;

- comm_state->stage = COMM_SEND_CONN;
-
- case COMM_SEND_CONN:
-
/* Prepare request to receive connect response message */
s_comm->conn_resp_req = prepare_recv_conn_resp_req(s_comm);
if (OFI_UNLIKELY(s_comm->conn_resp_req == NULL)) {
send_close(s_comm);
return -EINVAL;
}

+ comm_state->stage = COMM_SEND_CONN;
+
+ case COMM_SEND_CONN:
+
/* COMM_SEND_CONN: Post a connect message to send peer connections */
ret = post_send_conn(s_comm, device, ep, req);
if (ret == -FI_EAGAIN) {
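The reordering in these hunks can be illustrated with a minimal, standalone sketch. It assumes the caller re-invokes the function while the send reports EAGAIN and that each call resumes at the saved stage, which is what the `switch` on `stage` in the diff suggests. All names below (`conn_state`, `connect_step`, `fake_post_send`, `run_demo`) are hypothetical and do not come from `nccl_ofi_rdma.c`; `-EAGAIN` from `<errno.h>` stands in for `-FI_EAGAIN`, and `malloc` stands in for `prepare_recv_conn_resp_req`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the plugin's real types. */
enum stage { STAGE_START, STAGE_SEND };

struct conn_state {
	enum stage stage;
	void *resp_req;  /* stands in for s_comm->conn_resp_req */
	int busy_polls;  /* how many times the fake send reports EAGAIN */
	int allocs;      /* number of allocations performed so far */
};

/* Fake non-blocking send: returns -EAGAIN until busy_polls is used up. */
static int fake_post_send(struct conn_state *st)
{
	if (st->busy_polls > 0) {
		st->busy_polls--;
		return -EAGAIN;
	}
	return 0;
}

/* Resumable step: the caller invokes it again after -EAGAIN. */
static int connect_step(struct conn_state *st)
{
	switch (st->stage) {
	case STAGE_START:
		/* One-time setup sits BEFORE the retried case label,
		 * so a retry never reaches it again. */
		st->resp_req = malloc(64);
		if (st->resp_req == NULL)
			return -ENOMEM;
		st->allocs++;

		st->stage = STAGE_SEND;
		/* fall through */
	case STAGE_SEND:
		/* Only this part is re-entered on each retry. */
		return fake_post_send(st);
	}
	return -EINVAL;
}

/* Drive the state machine to completion; return the allocation count. */
int run_demo(int busy_polls)
{
	struct conn_state st = { STAGE_START, NULL, busy_polls, 0 };

	while (connect_step(&st) == -EAGAIN)
		;  /* caller retries, as a polling caller would */

	free(st.resp_req);
	return st.allocs;
}
```

With this ordering, `run_demo(3)` performs a single allocation even though the send is retried three times. If the allocation were placed after `case STAGE_SEND:` instead, each retry would overwrite `resp_req` with a fresh allocation and the earlier ones would be leaked, which is the pattern the PR title describes.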