Commit
!wip feat(tracing): add nvtx provider
Hook nvtx_push()/nvtx_pop() on existing lttng macros.

WIP: Currently, this builds correctly but is untested. It will remain WIP until we figure out how to structure this so that the NCCL_OFI_TRACE_POP() calls that nvtx requires line up with cases like NCCL_OFI_TRACE_SEND_WRITE_SEG COMPLETE/START. Probably, we should just redo all the lttng macros so that each one wraps a workload, rather than today, where the majority just signal that an event took place.

I would also like to support a separate coarse-grained type of probe definition within this module.

lttng and nvtx are best suited for fine-grained/range-based eventing around program behavior (not quite what we have today, but where we want to get: things like wrapping an entire event and/or supplying rich metadata around that event). For this, we need to support:

1. NVTX, because of the ecosystem this plugin exists in.
2. Something that's cheaper than userspace uprobe (see bpftime below) and in-process or nearly so. Some candidates: perfetto, redoing the existing lttng support, etc.

Separate from this, we should also support builds with coarse entry/exit USDT probes for basically all nontrivial functions. This can be a lot more useful for development and for building debug tools. Some tooling that this would enable:

+ very generic and allows for cross-dependency analysis
+ https://github.com/eunomia-bpf/bpftime
+ bpftrace or bcc makes this cheap
+ certain `linux perf` calls can benefit from this.
+ potential to profile kernel via kprobes in the same script.
+ offcpu analysis

These are just nop sleds and have effectively zero runtime overhead while no tracer is attached, so they can be enabled on default/release builds. (See [1] for how others use this.)

It's surprisingly difficult to do this in a way that does not require code changes.
Can potentially do this with a small out-of-tree llvm pass (and/or a gcc equivalent, see "gcc python plugin" on github) that piggy-backs on -finstrument-functions's __cyg_profile_func_enter and __cyg_profile_func_exit calls. Putting the USDT probe in the __cyg_profile_func_exit impl itself is not viable. Need to dig more.

[1]: https://www.brendangregg.com/Slides/reInvent2019_BPF_Performance_Analysis/
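
For reference, a minimal sketch of what "macros that wrap a workload" could look like, so that every push is guaranteed a matching pop even on early return. This is not the code in this commit. trace_push(), trace_pop() and TRACE_SCOPE() are hypothetical names, the counters exist only to make the sketch checkable, and it assumes the GCC/Clang `cleanup` attribute is acceptable. A real NVTX build would call nvtxRangePushA()/nvtxRangePop() where the stand-ins are.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the provider hooks. A real NVTX build would
 * call nvtxRangePushA(name) and nvtxRangePop() here instead. */
static int trace_depth = 0;      /* current nesting level */
static int trace_max_depth = 0;  /* deepest nesting seen */
static int trace_pushes = 0;
static int trace_pops = 0;

static void trace_push(const char *name)
{
    trace_depth++;
    trace_pushes++;
    if (trace_depth > trace_max_depth)
        trace_max_depth = trace_depth;
    fprintf(stderr, "%*spush %s\n", 2 * (trace_depth - 1), "", name);
}

static void trace_pop(void)
{
    trace_depth--;
    trace_pops++;
}

/* Cleanup handler: the compiler calls this when the guard leaves scope. */
static void trace_scope_end(int *guard)
{
    (void)guard;
    trace_pop();
}

#define TRACE_CONCAT_(a, b) a##b
#define TRACE_CONCAT(a, b) TRACE_CONCAT_(a, b)

/* Opens a range now and closes it when the enclosing block exits.
 * Relies on the GCC/Clang cleanup attribute (not standard C). */
#define TRACE_SCOPE(name)                                        \
    int TRACE_CONCAT(trace_guard_, __LINE__)                     \
        __attribute__((cleanup(trace_scope_end), unused)) =      \
        (trace_push(name), 0)

/* Hypothetical workload: the early return still pops the range. */
static int send_segment(int fail_early)
{
    TRACE_SCOPE("send_segment");
    if (fail_early)
        return -1;
    {
        TRACE_SCOPE("write_seg");  /* nested range, popped at the brace */
    }
    return 0;
}
```

With this shape, START/COMPLETE pairs that straddle function boundaries (the NCCL_OFI_TRACE_SEND_WRITE_SEG case) still do not fit a lexical scope, which is the open structuring question above.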
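
For context on the -finstrument-functions idea, a rough sketch of the two hooks the compiler calls when that flag is set. The hook names and signatures are the ones GCC and Clang document. The counters and workload() are hypothetical and only there to show the mechanism. One likely reason a USDT probe inside the hook body is not viable (my reading, not verified): it would be a single probe site shared by every function, so per-function probes would have to be emitted at each instrumentation site by a compiler pass instead.

```c
/* Build with: cc -finstrument-functions file.c
 * Without the flag the hooks are simply never called. */

static unsigned long enter_count = 0;  /* hypothetical bookkeeping */
static unsigned long exit_count = 0;

/* The hooks must not be instrumented themselves, or they would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;    /* address of the function being entered */
    (void)call_site;  /* address it was called from */
    enter_count++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_count++;
}

/* Hypothetical workload: sums 0..n-1. When instrumented, the compiler
 * inserts an enter call at the top and an exit call before the return. */
static int workload(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

Every instrumented call pays for two extra function calls, unlike the nop-based USDT probes, so this is a starting point for a pass to rewrite rather than something to ship as-is.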