Kernel profiler based NoC event tracing #16541
base: main
Thanks for the clean up on logging functions.
Could you please make sure the following are green:
Device Perf: https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml
uBenchmark: https://github.com/tenstorrent/tt-metal/actions/workflows/metal-run-microbenchmarks.yaml
multi device (T3K) profiling: https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-profiler-tests.yaml
@@ -69,7 +69,12 @@ constexpr static std::uint32_t PROFILER_L1_BUFFER_SIZE = PROFILER_L1_VECTOR_SIZE

} // namespace kernel_profiler

#if defined(TRACY_ENABLE) && defined(DEVICE_PROFILER_OP_SUPPORT_COUNT_OVERRIDE)
Nice !! Thanks.
if constexpr (requires { device_operation_t::get_type_name(operation_attributes); }) {
    opName = device_operation_t::get_type_name(operation_attributes);
}
runtime_id_to_opname.emplace({device_id, program.get_runtime_id()}, opName);
Depending on how get_type_name is implemented for the op, we might incur a considerable perf hit on cached runs.
Can we move the emplace to right before the return and look up opName from cached_ops? It will be filled by the end of this function. We would need a local return variable to do it this way.
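One way to read this suggestion is sketched below. It is only an illustration: ExampleOpTracker, expensive_type_name, the program_hash key, and the layout of cached_ops are all hypothetical stand-ins, not the actual profiler code. The point is that the potentially expensive name computation runs only on a cache miss, and the emplace happens once, right before the return, using the cached value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical key: {device_id, runtime_id}.
using RuntimeKey = std::pair<int, std::uint64_t>;

struct ExampleOpTracker {
    // Hypothetical cache layout: program hash -> op name.
    std::map<std::uint64_t, std::string> cached_ops;
    std::map<RuntimeKey, std::string> runtime_id_to_opname;
    int name_computations = 0;  // counts calls to the expensive path

    // Stand-in for a potentially slow get_type_name implementation.
    std::string expensive_type_name(std::uint64_t program_hash) {
        ++name_computations;
        return "Op_" + std::to_string(program_hash);
    }

    const std::string& record_op(int device_id, std::uint64_t runtime_id, std::uint64_t program_hash) {
        auto it = cached_ops.find(program_hash);
        if (it == cached_ops.end()) {
            // Cache miss: pay for the name computation exactly once.
            it = cached_ops.emplace(program_hash, expensive_type_name(program_hash)).first;
        }
        // Local result variable, looked up from the cache.
        const std::string& op_name = it->second;
        assert(!op_name.empty());
        // The emplace happens last, right before the return.
        runtime_id_to_opname.emplace(RuntimeKey{device_id, runtime_id}, op_name);
        return op_name;
    }
};
```

In this sketch, two calls with the same program_hash but different runtime IDs produce two entries in runtime_id_to_opname while computing the name only once.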
}

void DeviceProfiler::logPacketDataToCSV(
    const IDevice* device,
I think if getPhysicalAddressFromVirtual takes device_id instead of IDevice*, we don't need to pass IDevice down from readRiscProfilerResults.
Problem Description
This PR adds support for recording detailed traces of all NoC activity initiated by Tensix worker cores. This data is then written to a JSON file. These JSON traces can be either analyzed directly (as a sort of log file) or consumed by other flows (e.g. used to verify a software NoC performance estimator).
What's Changed
NoC tracing builds on top of the timestamped data packets (PacketType::TS_DATA) feature in the kernel profiler.
NoC tracing can be enabled on top of normal kernel profiling by setting the environment variable TT_METAL_DEVICE_PROFILER_NOC_EVENTS=1.
A timestamped packet is recorded on each worker core for each call to the dataflow_api.h:noc_async_* functions. All arguments to the dataflow_api.h function are bit-packed into the TS_DATA payload. This is done by adding new stripped-by-default macros to each noc_async_* call of interest in dataflow_api.h.
For situations where it is impossible or undesirable for noc_async_* calls to be profiled, template arguments have been added to all affected sites so that NoC tracing can be disabled at the call site. This is done within the kernel profiler itself to avoid a circular dependency.
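The bit-packing and call-site opt-out described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the NocEvent field layout, pack_noc_event, unpack_noc_event, and example_noc_async_write are hypothetical names, and a compile-time template flag stands in for the stripped-by-default macros.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event layout; the real TS_DATA payload encoding is not shown in this description.
enum class NocEventType : std::uint8_t { Read = 1, Write = 2 };

struct NocEvent {
    NocEventType type;
    std::uint8_t dst_x;       // destination core X coordinate
    std::uint8_t dst_y;       // destination core Y coordinate
    std::uint8_t vc;          // virtual channel
    std::uint32_t num_bytes;  // transfer size
};

// Pack every field into one 64-bit payload word: 4 x 8-bit fields, then 32 bits of size.
constexpr std::uint64_t pack_noc_event(const NocEvent& e) {
    return (static_cast<std::uint64_t>(e.type) << 56) |
           (static_cast<std::uint64_t>(e.dst_x) << 48) |
           (static_cast<std::uint64_t>(e.dst_y) << 40) |
           (static_cast<std::uint64_t>(e.vc) << 32) |
           static_cast<std::uint64_t>(e.num_bytes);
}

// Inverse of pack_noc_event, as a trace consumer would apply it.
constexpr NocEvent unpack_noc_event(std::uint64_t p) {
    return NocEvent{static_cast<NocEventType>((p >> 56) & 0xFF),
                    static_cast<std::uint8_t>((p >> 48) & 0xFF),
                    static_cast<std::uint8_t>((p >> 40) & 0xFF),
                    static_cast<std::uint8_t>((p >> 32) & 0xFF),
                    static_cast<std::uint32_t>(p & 0xFFFFFFFFu)};
}

// Stand-in for a noc_async_* call. Passing <false> disables tracing at the call site.
template <bool trace_enabled = true>
void example_noc_async_write(std::vector<std::uint64_t>& trace,
                             std::uint8_t dst_x, std::uint8_t dst_y,
                             std::uint8_t vc, std::uint32_t num_bytes) {
    if constexpr (trace_enabled) {
        trace.push_back(pack_noc_event({NocEventType::Write, dst_x, dst_y, vc, num_bytes}));
        // Debug-only sanity check that the packing is lossless.
        assert(unpack_noc_event(trace.back()).num_bytes == num_bytes);
    }
    // ... the actual NoC write would be issued here ...
}
```

Calling example_noc_async_write(trace, 3, 5, 1, 2048) appends one packed word to trace, while example_noc_async_write<false>(trace, 3, 5, 1, 2048) leaves it untouched, which is the kind of opt-out a profiler would need for its own NoC calls.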
Concerns
kernel_profiler::quick_push() and supporting code have been modified to compile for BRISC and NCRISC, even if dispatch is false.
kernel_profiler::timestampedData() has been modified to check bufferHasRoom; if there is no room, quick_push() is automatically called to flush the buffer to make space.
As far as I can tell (informal testing), regular kernel profiling still works as expected (with or without NoC event profiling enabled).
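The flush-on-full behavior described for timestampedData() can be sketched as below. This is a host-side illustration under assumed names and sizes: ExampleProfilerBuffer, its 8-word capacity, and the drained vector are hypothetical, and the real code writes to an L1 buffer rather than a std::vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-core profiler buffer; sizes are illustrative only.
class ExampleProfilerBuffer {
public:
    static constexpr std::size_t kWordsPerEntry = 2;  // timestamp + payload
    static constexpr std::size_t kCapacityWords = 8;  // room for 4 entries

    bool buffer_has_room(std::size_t words_needed) const {
        return count_ + words_needed <= words_.size();
    }

    // Flush everything recorded so far and reset the write position.
    void quick_push() {
        drained_.insert(drained_.end(), words_.begin(),
                        words_.begin() + static_cast<std::ptrdiff_t>(count_));
        count_ = 0;
        ++flush_count_;
    }

    // Record one timestamped payload, flushing first when the buffer is full.
    void timestamped_data(std::uint64_t timestamp, std::uint64_t payload) {
        if (!buffer_has_room(kWordsPerEntry)) {
            quick_push();
        }
        assert(buffer_has_room(kWordsPerEntry));  // holds after a flush
        words_[count_++] = timestamp;
        words_[count_++] = payload;
    }

    std::size_t flush_count() const { return flush_count_; }
    std::size_t pending_words() const { return count_; }
    const std::vector<std::uint64_t>& drained() const { return drained_; }

private:
    std::array<std::uint64_t, kCapacityWords> words_{};
    std::size_t count_ = 0;
    std::size_t flush_count_ = 0;
    std::vector<std::uint64_t> drained_;  // stand-in for the flush destination
};
```

With a capacity of four entries, the fifth call to timestamped_data triggers exactly one flush, so no event is dropped and the caller never has to check for space itself.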
Despite the overlap with existing profiler-based features, there is meaningful differentiation here. Capturing all of the details of each NoC transaction (packet size, virtual channel, timestamp) is useful (probably to many different teams) and currently not accomplished by other methods.