Enhancement: Allow reading less than "max" counters #13

Open
travisdowns opened this issue Aug 5, 2017 · 2 comments

@travisdowns
Contributor

Currently the PFC_START/PFC_END macros always read exactly the first 7 counters (3 fixed, 4 programmable).

Sometimes you only want to read 1 or 2 counters, and sometimes you want to read up to 11 (e.g., because I have hyperthreading off, I actually have all 8 programmable counters available to a single thread, in addition to the 3 fixed ones).

With some macro-programming tricks we can probably generate the appropriate code to read different numbers of counters. Not particularly urgent: if necessary, one can always add the specific variant desired via copy-paste.
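As a rough illustration of the kind of macro trick meant here, the sketch below uses repetition macros plus token pasting to emit one fully unrolled reader per counter count. Everything in it is hypothetical: `demo_read_counter`, `REPEAT_n`, and `DEFINE_READER` are made-up names, not part of libpfc, and the counter read is a plain C stub rather than the inline `rdpmc` asm the real macros use, so that it compiles and runs anywhere.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware counter read. The real macros use
 * inline asm, which needs kernel-side enablement; this stub returns a value
 * derived from the index so the sketch can run in user space. */
static inline uint64_t demo_read_counter(unsigned idx)
{
    return 1000u + idx;
}

/* Repetition helpers: REPEAT_n(M) expands to M(0) M(1) ... M(n-1). */
#define REPEAT_1(M)  M(0)
#define REPEAT_2(M)  REPEAT_1(M)  M(1)
#define REPEAT_3(M)  REPEAT_2(M)  M(2)
#define REPEAT_4(M)  REPEAT_3(M)  M(3)
#define REPEAT_5(M)  REPEAT_4(M)  M(4)
#define REPEAT_6(M)  REPEAT_5(M)  M(5)
#define REPEAT_7(M)  REPEAT_6(M)  M(6)
#define REPEAT_8(M)  REPEAT_7(M)  M(7)
#define REPEAT_9(M)  REPEAT_8(M)  M(8)
#define REPEAT_10(M) REPEAT_9(M)  M(9)
#define REPEAT_11(M) REPEAT_10(M) M(10)

/* One unrolled step: store counter i into slot i of the output array. */
#define READ_ONE(i) out[i] = demo_read_counter(i);

/* DEFINE_READER(n) emits demo_read_n(), which reads counters 0..n-1 with
 * no loop and no branch; the count is fixed when the code is compiled. */
#define DEFINE_READER(n) \
    static inline void demo_read_##n(uint64_t *out) { REPEAT_##n(READ_ONE) }

DEFINE_READER(1)
DEFINE_READER(2)
DEFINE_READER(7)
DEFINE_READER(11)
```

A caller would then pick the variant by name, e.g. `demo_read_2(out)` with `uint64_t out[2]`. Only the variants that are actually instantiated with `DEFINE_READER` exist, which keeps each one a fixed, branch-free sequence.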

@obilaniu
Owner

obilaniu commented Aug 5, 2017

Yes, this might be valuable... The reason I always read 3+4 counters in a specific order, with no dynamic customization, was to maximize the predictability of the PFC_START/PFC_END macros and to minimize the number of events they themselves trigger (e.g. branch mispredictions), so that pfcRemoveBias() would work well (and work well it does, in general).

I recently refactored those macros, so if it were somehow possible to generate the inline asm start/end/biasremove blobs you needed, that would be great. Could also consider JITting that code at the price of a small call/ret overhead.

@travisdowns
Contributor Author

Indeed, but creating the more specific versions shouldn't cause any additional variance, since you'd also generate the matching version of pfcRemoveBias in the same way. This all happens at compile time, so it shouldn't change the predictability at all: the macros are purely a compile-time construct and get injected in their entirety at each call site.
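To make that concrete, here is a minimal sketch in which a single macro emits the start, end, and bias-removal routines for a given counter count, so the three always agree. All names (`sk_read`, `SK_REP_n`, `SK_DEFINE_VARIANT`, `sk_fake`) are hypothetical, the subtract-at-start/add-at-end scheme is an assumption about how such macros could be built rather than a copy of libpfc, and the counters are simulated: each read of counter `i` advances it by `i + 1` to mimic measurement overhead.

```c
#include <stdint.h>

#define SK_MAX 11
static uint64_t sk_fake[SK_MAX];   /* simulated counters, not real hardware */

/* Hypothetical counter read; the read itself costs (i + 1) events. */
static inline uint64_t sk_read(unsigned i)
{
    sk_fake[i] += i + 1;
    return sk_fake[i];
}

/* SK_REP_n(M) expands to M(0) ... M(n-1). */
#define SK_REP_1(M) M(0)
#define SK_REP_2(M) SK_REP_1(M) M(1)
#define SK_REP_3(M) SK_REP_2(M) M(2)
#define SK_REP_4(M) SK_REP_3(M) M(3)

#define SK_SUB(i) cnt[i] -= sk_read(i);   /* start: subtract current value */
#define SK_ADD(i) cnt[i] += sk_read(i);   /* end: add current value        */
#define SK_FIX(i) cnt[i] -= bias[i];      /* subtract the measured bias    */

/* Emit start/end/remove-bias for exactly n counters. The bias routine
 * times an empty start/end pair built from the same n-counter code. */
#define SK_DEFINE_VARIANT(n)                                              \
    static inline void sk_start_##n(uint64_t *cnt) { SK_REP_##n(SK_SUB) } \
    static inline void sk_end_##n(uint64_t *cnt)   { SK_REP_##n(SK_ADD) } \
    static inline void sk_remove_bias_##n(uint64_t *cnt)                  \
    {                                                                     \
        uint64_t bias[n] = {0};                                           \
        sk_start_##n(bias);                                               \
        sk_end_##n(bias);                                                 \
        SK_REP_##n(SK_FIX)                                                \
    }

SK_DEFINE_VARIANT(1)
SK_DEFINE_VARIANT(2)
SK_DEFINE_VARIANT(4)
```

Usage would be `sk_start_2(cnt); /* measured code */ sk_end_2(cnt); sk_remove_bias_2(cnt);`. Because the bias measurement is expanded from the same per-counter steps as the measurement itself, a 2-counter variant is corrected by a 2-counter empty run, never by the 7-counter one.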
