Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cpu/windows] Switch to performance counters #192

Merged
merged 30 commits into from
Dec 10, 2024

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Nov 18, 2024

Add conditional switching to performance counters for cpu/core metricset collections
This PR add an option WithPerformanceCounter, which uses performance counter for CPU data collection, when enabled.
Default behaviour will be the existing implementation.

How to use performance counters?

It's simple. Something like following:

cpu, _ := metrics.New(WithPerformanceCounter())
data, err := cpu.Fetch()

Why performance counters?

From elastic/beats#40926,

On Windows, Metricbeat measures CPU use via the Windows API call GetSystemTimes. Each metrics interval, it fetches the CPU numbers, and compares them to the previous measurement to determine CPU load during that interval. On most systems this includes CPU time "including all threads in all processes, on all processors". However, on systems with more than 64 cores, it returns only the data for the current processor group of up to 64 cores.

This is a major limitation for such systems and it leads to inconsistent data.

Testing

Test results are reported on the linked issue.

Fixes elastic/beats#40926

Note:

WithPerformanceCounter option is only effective for windows and is ineffective if used by other OS'.

@VihasMakwana VihasMakwana force-pushed the pdh-windows-integrated branch 6 times, most recently from fd55440 to 7e067ac Compare November 18, 2024 08:05
go.mod Outdated
@@ -4,7 +4,7 @@ go 1.22.8

require (
github.com/docker/docker v26.1.5+incompatible
github.com/elastic/elastic-agent-libs v0.9.13
github.com/elastic/elastic-agent-libs v0.17.3-0.20241112062438-5ba501c7ca8c
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be updated once elastic/elastic-agent-libs#252 is merged and released

@VihasMakwana VihasMakwana force-pushed the pdh-windows-integrated branch from 7e067ac to fdb61e0 Compare November 18, 2024 08:07
@VihasMakwana VihasMakwana changed the title [DRAFT] Perf counters windows integrated [cpu/windows] Switch to performance counters Nov 22, 2024
@VihasMakwana VihasMakwana marked this pull request as ready for review November 22, 2024 11:38
@VihasMakwana VihasMakwana requested a review from a team as a code owner November 22, 2024 11:38
@VihasMakwana VihasMakwana requested review from mauri870 and khushijain21 and removed request for a team November 22, 2024 11:38
@VihasMakwana VihasMakwana self-assigned this Nov 22, 2024
@VihasMakwana VihasMakwana added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 22, 2024
@VihasMakwana VihasMakwana force-pushed the pdh-windows-integrated branch 2 times, most recently from 442bd63 to b9277da Compare November 22, 2024 11:40
@VihasMakwana VihasMakwana force-pushed the pdh-windows-integrated branch from b9277da to 695ac10 Compare November 22, 2024 11:41
@VihasMakwana
Copy link
Contributor Author

PR is up for review!

@VihasMakwana VihasMakwana requested a review from cmacknz November 27, 2024 18:17

type OptionFunc func(*option)

func WithPerformanceCounter() OptionFunc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume performance counters is just a windows thing? If so, we could add a comment at the top of this function that states that it only applies to windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. That's better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting Windows in the option name would be even more obvious.

Also, consider that we are probably going to want to make this the default once we are satisfied with it. Probably just deleting the option would work once we do that but something to consider for the near future.

Copy link
Member

@mauri870 mauri870 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed again, LGTM. I'm learning quite a bit about Windows internals through your PR chain.

metric/cpu/metrics_windows.go Outdated Show resolved Hide resolved
VihasMakwana added a commit that referenced this pull request Dec 3, 2024
This PR is a split from
#192.
It aims to simplify the query building for performance counters.

Using a global variable might create problems if multiple modules (`cpu`
and `core`) are trying to initialise it at a same time.
`sync.Once` adds unnecessary CPU overhead (makes it 4x slower). 

Instead, we can initialise the performance counters in `New(...)` and
return error to the caller (i.e. metricbeat).
@VihasMakwana VihasMakwana requested a review from cmacknz December 5, 2024 13:49
Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, optional refactor.

Comment on lines 42 to 44
if m.query == nil {
return defaultGet()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Just to improve readability and testing. I think it would be nice if Get was something like:

func Get(m *Monitor) (CPUMetrics, error) {
    if m.query == nil {
        return getUsingSystemTimes()
    }
    return getUsingPerformanceCounters(m)
}

That way it is explicit that the only thing Get does is pick which implementation to use, and it is easy to test the 2 implementations independently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CPU metrics are incorrect on Windows machines with more than 64 cores
5 participants