-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#18006: Add push router support to tt-fabric. #18564
Conversation
How does the programming model change when you use the push method in this? Do we need need to allocate space for the headers? |
6394ef9
to
865a5cd
Compare
It is the same in this PR. You need to allocate packet header space in the data buffer. |
@@ -37,7 +40,649 @@ inline uint64_t get_timestamp() { | |||
return (((uint64_t)timestamp_high) << 32) | timestamp_low; | |||
} | |||
|
|||
typedef struct fvc_consumer_state { | |||
typedef struct fvc_outbound_push_state { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about having base struct and inherit for both push and pull. There are same variables and copy&paste. and also #ifdef
around code could be removed by having big #ifdef
in this file.
#ifdef FVC_MODE_PULL | ||
fvc_outbound_pull_state_t fvc_outbound_state __attribute__((aligned(16))); // replicate for each fvc | ||
#else | ||
// fvc_inbound_push_state_t fvc_inbound_state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
FABRIC_ROUTER_DATA_BUF_START + (fvc_data_buf_size_words * PACKET_WORD_SIZE_BYTES / 2), | ||
fvc_data_buf_size_words / 2); | ||
#else | ||
fvc_outbound_state[0].init( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be for loop
router_addr += direction * 8; | ||
// stream register to receive router buffer space available updates. | ||
uint64_t xy_local_addr = get_noc_addr(0); | ||
noc_inline_dw_write(router_addr, (STREAM_REG_ADDR(0, STREAM_REMOTE_DEST_BUF_SPACE_AVAILABLE_UPDATE_REG_INDEX))); | ||
noc_inline_dw_write(router_addr + 4, xy_local_addr >> 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these magic number. 8 and 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update comments. NOC addresses are 64-bits, 8 bytes. To calculate respective offset for a direction, its direction * 8.
line 187 is writing lower/upper 4 bytes of the 8 bytes with two noc inline dword writes.
tt_metal/fabric/hw/inc/tt_fabric.h
Outdated
// set the stream auto increment to use on remote router according to | ||
// this router's direction. | ||
router_push_addr = | ||
(STREAM_REG_ADDR(6 + my_direction, STREAM_REMOTE_DEST_BUF_SPACE_AVAILABLE_UPDATE_REG_INDEX)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic number 6?
df3fff8
to
532011d
Compare
@@ -1410,6 +1413,7 @@ int main(int argc, char **argv) { | |||
bool fixed_async_wr_notif_addr = test_args::has_command_option(input_args, "--fixed_async_wr_notif_addr"); | |||
|
|||
benchmark_mode = test_args::has_command_option(input_args, "--benchmark"); | |||
push_mode = test_args::has_command_option(input_args, "--push_router"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add tests for the push model to CI?
We have:
Post Commit on N300: tests/scripts/run_cpp_fabric_tests.sh
T3k unit tests: tests/scripts/t3000/run_t3000_unit_tests.sh
TG unit tests: tests/scripts/tg/run_tg_unit_tests.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests to CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made some comments to ease uplift to BH
532011d
to
691bada
Compare
packet_word_0 = (volatile uint32_t*)get_local_buffer_read_addr(); | ||
packet_word_1 = | ||
reinterpret_cast<tt_l1_ptr uint32_t*>(buffer_start + (out_rdptr_inc<fvc_mode>(1) * PACKET_WORD_SIZE_BYTES)); | ||
packet_word_2 = | ||
reinterpret_cast<tt_l1_ptr uint32_t*>(buffer_start + (out_rdptr_inc<fvc_mode>(2) * PACKET_WORD_SIZE_BYTES)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these 0,1,2 for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are pointers to first 3 packet words which point to packet header. It accounts for the case where the header might be wrapping at the end of the buffer.
tt_metal/fabric/hw/inc/tt_fabric.h
Outdated
uint32_t temp = packet_word_0[0]; | ||
sender_buffer_index = temp >> 30; | ||
temp &= 0x3FFFFFFF; | ||
packet_words_remaining = (temp + PACKET_WORD_SIZE_BYTES - 1) >> 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this 'temp' looks having more meaning than temporal value. What is this?
} | ||
|
||
uint32_t get_next_hop_router_noc_xy() { | ||
uint32_t dst_mesh_id = packet_word_0[1] & 0xFFFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this indexing and masking?
Looks like this packet_word_* is accessing header values? If so I want more explicit accessor. getter or casting
loop_count = 0; | ||
} else if ( | ||
*fvc_outbound_state[next_outbound_buffer].noc_word_credits || | ||
*fvc_outbound_state[next_outbound_buffer].sender_words_cleared) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use getter? get_num_words_cleared
or wrap these two check in one method?
} else if ( | ||
*fvc_outbound_state[next_outbound_buffer].noc_word_credits || | ||
*fvc_outbound_state[next_outbound_buffer].sender_words_cleared) { | ||
curr_outbound_buffer = next_outbound_buffer; | ||
next_outbound_buffer = (next_outbound_buffer + 1) & 0x3; | ||
} else { | ||
next_outbound_buffer = (next_outbound_buffer + 1) & 0x3; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else {
if (*fvc_outbound_state[next_outbound_buffer].noc_word_credits ||
*fvc_outbound_state[next_outbound_buffer].sender_words_cleared) {
curr_outbound_buffer = next_outbound_buffer;
}
next_outbound_buffer = (next_outbound_buffer + 1) & 0x3;
}
} | ||
} | ||
|
||
FORCE_INLINE void get_words_pushed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getter should return something.
e295ef9
to
2cc6974
Compare
if constexpr (fvc_mode == FVC_MODE_ROUTER) { | ||
tt::tt_fabric::chan_id_t my_chan = routing_table->intra_mesh_table.dest_entry[device_id]; | ||
uint32_t my_direction; | ||
if (routing_table->port_direction.directions[eth_chan_directions::EAST] == my_chan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are duplicate code. How about having function
next_port = routing_table->intra_mesh_table.dest_entry[dst_dev_id]; | ||
} | ||
|
||
if (routing_table->port_direction.directions[eth_chan_directions::EAST] == next_port) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dup
next_port = routing_table[routing_plane].intra_mesh_table.dest_entry[dst_dev_id]; | ||
} | ||
|
||
if (routing_table[routing_plane].port_direction.directions[eth_chan_directions::EAST] == next_port) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dup
inline void init(uint32_t buffer_id, uint32_t data_buf_start, uint32_t data_buf_size_words) { | ||
uint32_t words = sizeof(fvc_outbound_push_state) / 4; | ||
uint32_t* ptr = (uint32_t*)this; | ||
for (uint32_t i = 0; i < words; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed? This is zeroing the struct but we are reassigning what was just set to zero below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its initially clearing all the structure data to 0.
Later members are initialized to respective values.
Not all the members end up with non-zero values.
inline void init(uint32_t data_buf_start, uint32_t data_buf_size_words) { | ||
uint32_t words = sizeof(fvc_inbound_push_state) / 4; | ||
uint32_t* ptr = (uint32_t*)this; | ||
for (uint32_t i = 0; i < words; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not all structure data members get set later in the init code.
continue; | ||
} | ||
uint32_t forwarding_channel = routing_table->port_direction.directions[i]; | ||
if (forwarding_channel == INVALID_DIRECTION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check for this INVALID_DIRECTION
?
If the direction was truly invalid as the name implies, a corrupt value like 123, wouldn't be caught because the forwarding channel is a uint32_t
and it won't match to any of the enum values.
If INVALID_DIRECTION
means to skip this direction then consider renaming it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of this commit. So we can update this later.
control plane sets the port directions when its initialized.
In this code, we are only consuming the entries in the routing table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the initializations will eventually move to host code and removed from kernel.
template <uint8_t fvc_mode = FVC_MODE_ROUTER> | ||
void register_with_routers(uint32_t device_id, uint32_t mesh_id = 0) { | ||
if constexpr (fvc_mode == FVC_MODE_ROUTER) { | ||
tt::tt_fabric::chan_id_t my_chan = routing_table->intra_mesh_table.dest_entry[device_id]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where & how is this routing_table initialized?
globally declared and defined by each includer?
tt_metal/fabric/hw/inc/tt_fabric.h
Outdated
STREAM_ID_NOC_WORDS_RECEIVED + my_direction, STREAM_REMOTE_DEST_BUF_SPACE_AVAILABLE_UPDATE_REG_INDEX)); | ||
remote_buffer_start = FABRIC_ROUTER_DATA_BUF_START + my_direction * FABRIC_ROUTER_OUTBOUND_BUF_SIZE; | ||
|
||
for (uint32_t i = 0; i < eth_chan_directions::COUNT; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i = eth_chan_directions::EAST
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yeah, meant to to that.
if (dst_mesh_id != routing_table->my_mesh_id) { | ||
uint32_t next_port = routing_table->inter_mesh_table.dest_entry[dst_mesh_id]; | ||
remote_wrptr_direction = port_direction_table[next_port]; | ||
return eth_chan_to_noc_xy[noc_index][next_port]; | ||
} else { | ||
uint32_t dst_device_id = packet_word_0[1] >> 16; | ||
uint32_t next_port = routing_table->intra_mesh_table.dest_entry[dst_device_id]; | ||
remote_wrptr_direction = port_direction_table[next_port]; | ||
return eth_chan_to_noc_xy[noc_index][next_port]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32_t next_port;
if (dst_mesh_id != routing_table->my_mesh_id) {
next_port = routing_table->inter_mesh_table.dest_entry[dst_mesh_id];
} else {
uint32_t dst_device_id = packet_word_0[1] >> 16;
next_port = routing_table->intra_mesh_table.dest_entry[dst_device_id];
}
remote_wrptr_direction = port_direction_table[next_port];
return eth_chan_to_noc_xy[noc_index][next_port];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These if else looks really common. How about having private inlined get_next_port
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this code hurts performance.
It is nice to look at, but not good for performance.
local scope variable and return allow the code to not maintain register variables across the if/else block.
} | ||
|
||
template <AsyncWriteMode mode = AsyncWriteMode::ALL> | ||
inline void fabric_async_write( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the client_interface type to the template as well so that we have just one definition of the async write API?
client_interface->buffer_size = FABRIC_ROUTER_OUTBOUND_BUF_SIZE / PACKET_WORD_SIZE_BYTES; | ||
client_interface->wr_ptr = 0; | ||
client_interface->buffer_start = FABRIC_ROUTER_DATA_BUF_START + direction * FABRIC_ROUTER_OUTBOUND_BUF_SIZE; | ||
client_interface->router_push_addr = (STREAM_REG_ADDR( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not in this PR, but should we add all the STREAM REG handling in a separate utility file? If we have all the handling in a single place we could also keep a track of what registers are in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can have it in a separate stream register utility file in a different PR.
Guys, In general, if there are any hazards or missing functionality, please comment on that. |
4dd37e4
to
e5cd1af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new tests should work if you remove TT_METAL_SLOW_DISPATCH_MODE=1
and use fast dispatch?
I'll check that. |
7a59971
to
d8ec54a
Compare
Ran successful 1000 run loop for https://github.com/tenstorrent/tt-metal/actions/runs/13724561170 |
d8ec54a
to
c85e0fd
Compare
Use get_noc_addr_helper( ) when making NOC address. Roll sender buffer space register address in to a loop. Add tests to CI Fix bug where ethernet receiver was returning extra credits to ethernet sender. Hook up device init for push router.
c85e0fd
to
192720c
Compare
#18006
Adding support for push buffers in TT-Fabric routers.