Currently the `PFC_START`/`PFC_END` macros have a fixed behavior: they read exactly the first 7 counters (3 fixed, 4 programmable).
Sometimes you only want to read 1 or 2 counters, and sometimes you want to read up to 11 (e.g., with hyperthreading off, all 8 programmable counters are available to a single thread, which with the 3 fixed counters makes 11).
Probably with some macro-programming tricks we can generate the appropriate code to read different numbers of counters. Not particularly urgent (if necessary one can always add the specific variant desired via copy-paste).
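As a rough illustration of the kind of macro trick meant here, the sketch below unrolls one read per counter for a count chosen at compile time. Everything prefixed `demo_`/`DEMO_` is hypothetical and not part of libpfc: `demo_read_counter` is a software stand-in for a real counter read so that the example runs anywhere, and the subtract-on-start, add-on-end accumulation is an assumed convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_COUNTERS 11  /* 3 fixed + 8 programmable, as discussed above */

/* Hypothetical stand-in for a hardware counter read: every read of
 * counter idx advances it by (idx + 1) ticks and returns the new value. */
static uint64_t demo_ticks[DEMO_MAX_COUNTERS];
static inline uint64_t demo_read_counter(unsigned idx) {
    return demo_ticks[idx] += idx + 1u;
}

/* DEMO_REP(n, M) expands to M(0) M(1) ... M(n-1); n must be a literal 1..11. */
#define DEMO_REP_1(M)  M(0)
#define DEMO_REP_2(M)  DEMO_REP_1(M) M(1)
#define DEMO_REP_3(M)  DEMO_REP_2(M) M(2)
#define DEMO_REP_4(M)  DEMO_REP_3(M) M(3)
#define DEMO_REP_5(M)  DEMO_REP_4(M) M(4)
#define DEMO_REP_6(M)  DEMO_REP_5(M) M(5)
#define DEMO_REP_7(M)  DEMO_REP_6(M) M(6)
#define DEMO_REP_8(M)  DEMO_REP_7(M) M(7)
#define DEMO_REP_9(M)  DEMO_REP_8(M) M(8)
#define DEMO_REP_10(M) DEMO_REP_9(M) M(9)
#define DEMO_REP_11(M) DEMO_REP_10(M) M(10)
#define DEMO_REP(n, M) DEMO_REP_##n(M)

/* Per-counter bodies; demo_b_ is the local pointer declared below. */
#define DEMO_SUB(i) demo_b_[i] -= demo_read_counter(i);
#define DEMO_ADD(i) demo_b_[i] += demo_read_counter(i);

/* Fully unrolled at each call site: no loop and no branch on n. */
#define DEMO_START(n, buf) \
    do { uint64_t *demo_b_ = (buf); DEMO_REP(n, DEMO_SUB) } while (0)
#define DEMO_END(n, buf) \
    do { uint64_t *demo_b_ = (buf); DEMO_REP(n, DEMO_ADD) } while (0)

/* Example: measure a region with only the first two counters. */
static inline void demo_example(uint64_t out[DEMO_MAX_COUNTERS]) {
    assert(out != NULL);
    DEMO_START(2, out);
    /* ... code under test would go here ... */
    DEMO_END(2, out);
}
```

The count has to be a literal because it is token-pasted into the macro name; a real version would replace `demo_read_counter` with the actual inline asm and would need the same treatment for the bias-removal code.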
Yes, this might be valuable... The reason I always read 3+4 counters in a specific order and with no dynamic customization was to maximize the predictability of the PFC_START/PFC_END macros and minimize the number of counters that they bump (e.g. branch mispredict), so that pfcRemoveBias() would work well (and work well it does, in general).
I recently refactored those macros, so if it were somehow possible to generate the inline asm start/end/biasremove blobs you needed, that would be great. Could also consider JITting that code at the price of a small call/ret overhead.
Indeed, but creating the more specific versions shouldn't cause any additional variance, since you'd also generate the matching version of pfcRemoveBias in the same way. This all happens at compile time, so it shouldn't change the predictability at all: the macros are purely a compile-time construct and get injected directly, in their entirety, at each call site.
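To make that concrete, here is a minimal sketch in which the bias-removal step is generated from the same count as the start/end pair, so the overhead it subtracts comes from exactly the code being corrected. All `bias_`/`BIAS_` names are hypothetical, the counter is simulated in software, and this is not how libpfc's `pfcRemoveBias` is implemented; it only illustrates the "same generator, same width" idea.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated counters (an assumption for the demo): each read of counter i
 * costs (i + 1) ticks, which plays the role of measurement overhead. */
static uint64_t bias_ticks[4];
static inline uint64_t bias_read(unsigned i) { return bias_ticks[i] += i + 1u; }

/* Simulated workload: advances every counter by `work` ticks. */
static inline void bias_do_work(uint64_t work) {
    for (unsigned i = 0; i < 4; i++) bias_ticks[i] += work;
}

/* BIAS_REP_n(M) expands to M(0) ... M(n-1); kept to 4 for brevity. */
#define BIAS_REP_1(M) M(0)
#define BIAS_REP_2(M) BIAS_REP_1(M) M(1)
#define BIAS_REP_3(M) BIAS_REP_2(M) M(2)
#define BIAS_REP_4(M) BIAS_REP_3(M) M(3)

#define BIAS_SUB(i) bias_b_[i] -= bias_read(i);
#define BIAS_ADD(i) bias_b_[i] += bias_read(i);
#define BIAS_FIX(i) bias_b_[i] -= bias_e_[i];

#define BIAS_START(n, buf) \
    do { uint64_t *bias_b_ = (buf); BIAS_REP_##n(BIAS_SUB) } while (0)
#define BIAS_END(n, buf) \
    do { uint64_t *bias_b_ = (buf); BIAS_REP_##n(BIAS_ADD) } while (0)

/* Times an empty START/END pair of the same width n, then subtracts that
 * overhead from buf. The correction is unrolled at compile time too. */
#define BIAS_REMOVE(n, buf)                            \
    do {                                               \
        uint64_t bias_e_[4] = {0};                     \
        BIAS_START(n, bias_e_);                        \
        BIAS_END(n, bias_e_);                          \
        uint64_t *bias_b_ = (buf);                     \
        BIAS_REP_##n(BIAS_FIX)                         \
    } while (0)

/* Example: a 2-counter measurement of 100 ticks of simulated work. */
static inline void bias_example(void) {
    uint64_t buf[4] = {0};
    BIAS_START(2, buf);
    bias_do_work(100);
    BIAS_END(2, buf);
    BIAS_REMOVE(2, buf);
    assert(buf[0] == 100 && buf[1] == 100);
}
```

In this simulation the overhead is perfectly deterministic, so the correction is exact; on real hardware it would only be as good as the repeatability of the empty pair.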