-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move rdma control channel traffic to its own endpoint #321
Conversation
cfe79c6
to
da3fb7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't finish reviewing yet but left comments on the last commit.
da3fb7c
to
c2143e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of changes, so I'd need to read through again to digest it all, but the primary change of moving to rail 0 for ctrl and the ofiutil changes looks good to me.
c2143e6
to
16784f0
Compare
rebased against the master branch and resolved conflicts from the id pool changes. |
16784f0
to
8a22c74
Compare
8a22c74
to
91d3d8e
Compare
Fixed an |
/* look for control messages and then retry the message search | ||
to avoid unnecessary polling / queueing. */ | ||
if (OFI_UNLIKELY(!polled_cq && !have_ctrl)) { | ||
ofi_process_cq_rail(ep, &ep->control_rail); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check return value
@@ -4659,6 +4661,14 @@ static int send(nccl_net_ofi_send_comm_t *send_comm, void *data, int size, int t | |||
goto error; | |||
} | |||
|
|||
/* look for control messages and then retry the message search | |||
to avoid unnecessary polling / queueing. */ | |||
if (OFI_UNLIKELY(!polled_cq && !have_ctrl)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this unlikely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's going to take forever, so optimize for the fast path.
91d3d8e
to
0300d4e
Compare
Converting this PR to a draft. #344 includes all but the last two patches, which actually create the control QP/CQ. Those two patches impact performance in a way that needs more work. Leaving this PR so that we don't lose the work, but in the mean time, we'll get the refactoring code in as part of #344. |
0300d4e
to
a088255
Compare
Please merge master. |
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
For the long message RDMA protocol, we want to make sure that we never starve the sender for data to move, which means prioritizing control messages from the receiver to the sender. This patch moves both the communicator setup and recv control messages to a new endpoint, which is always on device rail 0. Future patches will optimize polling of the control message cq in the send path and setting priority bits on the control cq. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
If a match isn't found for the current send, poll the control cq to see if the match can be found. While this extends the current send() call, it potentially lowers the time until data transfer starts. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
a088255
to
cd4a393
Compare
This is now so hopelessly out of date that I'm going to close the PR. @rajachan is working on refactoring to work with HEAD of master. |
This is a series of cleanup patches that includes splitting some of the initialization of Libfabric between the send/recv and rdma transports, with the end goal of making it easier to move the control channel data from the data endpoint to its own endpoint (always on "NIC 0"), so that we can eventually increase the priority level of control messages over data messages (when Libfabric/rdma-core supports the service level feature through to the card).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.