-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1D Fabric launch via device init #18712
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: do we need dedicated/unique sender and receiver test kernels between 1D and 2D fabric if they aren't directly interacting with fabric? Can the 1D just be a simplified version (e.g. some compile time args to disable mcasting in 4 directions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have common kernels for testing once we have some abstraction layer that unifies the two? currently the setup is bit different, so I created a new file to avoid confusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a git mv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was initially, but then I split it to move the LineOpInterface into a helper file in ttnn itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial comments. I'm not familiar with topology so I'll spend a little time later looking at it but would obviously defer to the other experts on the team for that design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git mv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to omit all the reformatting from this PR? It'll simplify review and keep it more incremental
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt update the clang format file, so I think this is the result of that. I will see if this can be reverted.
FORCE_INLINE bool all_eth_packets_completed() const { | ||
return this->local_rdptr.is_caught_up_to(this->local_wrptr); | ||
[[nodiscard]] FORCE_INLINE bool connection_is_live() const { | ||
return *connection_live_semaphore == tt::fabric::EdmToEdmSender<0>::open_connection_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this grabbed from a branch of mine or something? Unsure why it's needed in the PR but I think I did make this change elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how this change made it here, will have to dig a bit. I certainly did not pull anything from your branch though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the order of functions was changed probably because of clang formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the reason we need this here? What's the purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this in ttnn would keep the current tests intact. We do not want this in metal since the fabric init has moved to device init. So until we port the tests, this can exist in ttnn as a helper to launch 1d fabric using sub device
87e09c3
to
265a94d
Compare
|
||
// Device ID -> EDM Builders | ||
std::unordered_map<size_t, std::vector<tt::tt_fabric::FabricEriscDatamoverBuilder>> edm_builders_forward_direction; | ||
std::unordered_map<size_t, std::vector<tt::tt_fabric::FabricEriscDatamoverBuilder>> edm_builders_backward_direction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible to make private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a good chance that this file will be deprecated soon once the tests are migrated to the new init flow
// Device ID -> link index | ||
std::unordered_map<size_t, size_t> next_forward_direction_edm_available; | ||
std::unordered_map<size_t, size_t> next_backward_direction_edm_available; | ||
|
||
std::vector<tt::tt_metal::IDevice*> device_sequence; | ||
std::vector<tt::tt_metal::Program*> programs; | ||
|
||
size_t num_links; | ||
size_t buffer_size_bytes; | ||
size_t firmware_context_switch_interval = | ||
tt::tt_fabric::FabricEriscDatamoverBuilder::default_firmware_context_switch_interval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add suffix _
for data members
size_t num_links; | ||
size_t buffer_size_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initialize to 0 to avoid UB
for (size_t i = 0; i < device_sequence.size(); i++) { | ||
if (device_sequence[i] == device) { | ||
return i; | ||
} | ||
} | ||
TT_THROW("Device {} not found in device sequence of line fabric", device->id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible to move to.cpp?
95af811
to
a97630c
Compare
if (routing_direction == direction) { | ||
return eth_chans; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be safer to assert here instead? Or are callers of get_active_fabric_eth_channels_in_direction
responsible for checking for no eth channels in direction
@@ -83,4 +85,12 @@ class FabricFixture : public ::testing::Test { | |||
} | |||
}; | |||
|
|||
class FabricFixture : public BaseFabricFixture { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we call this Fabric2DFixture
?
Ticket
Link to Github Issue
Problem description
1D Fabric (EDM Fabric) is currently launched as a subdevice. This change moves the initialization of 1D Fabric to device init.
What's changed
tt::tt_fabric
namespace and got rid oftt::fabric
Things to be covered in follow-up PR(s):
Checklist