Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specs - profiling integration: Make host.id in registration message o… #900

Merged
merged 4 commits into from
Jan 3, 2025

Conversation

florianl
Copy link
Member

@florianl florianl commented Dec 12, 2024

…ptional

host.id is a not well and uniquely defined attribute, see open-telemetry/semantic-conventions#581 for example. In particular on containerized environments profiling agents do see a different host.id than APM-agents, which makes it harder to correlate information. To being able to correlate profiling and APM information, container.id was identified to fit the use case best. As profiling as well as APM agents already collect and send out container.id with their respective data. For non containerized environment host.id still can be used and in such a use cases profiling agents and APM-agents should have the same understanding of host.id.

For backwards compatibility reasons just make the argument for host-id in the registration message optional.

  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Merge after 2 business days passed without objections
    To auto-merge the PR, add /schedule 2024-12-20 to the PR description.

…ptional

`host.id` is a not well and uniquely defined attribute, see open-telemetry/semantic-conventions#581 for example. In particular on containerized environments profiling agents do see a different `host.id` than APM-agents, which makes it harder to correlate information.
To being able to correlate profiling and APM information, `container.id` was identified to fit the use case best. As profiling as well as APM agents already collect and send out `container.id` with their respective data. For non containerized environment `host.id` still can be used and in such a use cases profiling agents and APM-agents should have the same understanding of `host.id`.

For backwards compatibility reasons just make the argument for `host-id` in the registration message optional.
* *samples-delay-ms*: A sane upper bound of the usual time taken in milliseconds by the profiling host agent between the collection of a stacktrace and it being written to the apm-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if a span ended at least the provided duration ago. Note that this value doesn't need to be a hard a guarantee, but it should be the 99% case so that profiling data isn't distorted in the expected case.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) used for the profiling data by this profiling host agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the host agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving it from the profiler host agent to ensure aforementioned correlation features work correctly.
* *samples-delay-ms*: A sane upper bound of the usual time taken in milliseconds by the profiling agent between the collection of a stacktrace and it being written to the apm-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if a span ended at least the provided duration ago. Note that this value doesn't need to be a hard a guarantee, but it should be the 99% case so that profiling data isn't distorted in the expected case.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) is an optional argument used to correlate profiling data by the profiling agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the profiling agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an APM-agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving it from the profiling agent to ensure aforementioned correlation features work correctly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When host.id is optional and no longer used for correlation, is logging a message still required (MUST)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the porposed optional host.id is sent by the profiling agent and APM agent is already sending a host.id and both host.IDs are different, then yes APM agent MUST log a warning. Event if the argument is optional, a collision of host.id values should be detected and trigger a warning.

specs/agents/universal-profiling-integration.md Outdated Show resolved Hide resolved
specs/agents/universal-profiling-integration.md Outdated Show resolved Hide resolved
florianl and others added 2 commits December 12, 2024 11:26
Co-authored-by: Jonas Kunz <jonas.kunz@elastic.co>
Co-authored-by: Jonas Kunz <jonas.kunz@elastic.co>
@florianl florianl marked this pull request as ready for review December 12, 2024 10:28
@florianl florianl requested review from a team as code owners December 12, 2024 10:28
@florianl florianl removed request for a team December 19, 2024 08:27
@florianl
Copy link
Member Author

/schedule 2024-12-20

Comment on lines +182 to +183
Whenever the profiling agent starts communicating for the first time with a process running an APM Agent, it MUST send this message.
This message is used to let the APM-agent know that a profiler is actually active on the current host. Note that an APM-agent may receive this message zero, one or several times: this may happen if no profiling agent is active, if one is active or if a profiling agent is restarted during the lifetime of the APM-agent respectively.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, alternatively feel free to use APM-Agent everywhere

Suggested change
Whenever the profiling agent starts communicating for the first time with a process running an APM Agent, it MUST send this message.
This message is used to let the APM-agent know that a profiler is actually active on the current host. Note that an APM-agent may receive this message zero, one or several times: this may happen if no profiling agent is active, if one is active or if a profiling agent is restarted during the lifetime of the APM-agent respectively.
Whenever the profiling agent starts communicating for the first time with a process running an APM Agent, it MUST send this message.
This message is used to let the APM agent know that a profiler is actually active on the current host. Note that an APM agent may not receive this message or receive it once or several times: this may happen if no profiling agent is active, if one is active or if a profiling agent is restarted during the lifetime of the APM agent respectively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specification uses a mix of apm-agent, APM agent and APM-agent. Streamlining wording for the whole specification just for the part that got changes with this PR feels not right. So I will keep the current mix of wording to not mix the streamlining with the purpose of this change.

specs/agents/universal-profiling-integration.md Outdated Show resolved Hide resolved
* *samples-delay-ms*: A sane upper bound of the usual time taken in milliseconds by the profiling host agent between the collection of a stacktrace and it being written to the apm-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if a span ended at least the provided duration ago. Note that this value doesn't need to be a hard a guarantee, but it should be the 99% case so that profiling data isn't distorted in the expected case.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) used for the profiling data by this profiling host agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the host agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving it from the profiler host agent to ensure aforementioned correlation features work correctly.
* *samples-delay-ms*: A sane upper bound of the usual time taken in milliseconds by the profiling agent between the collection of a stacktrace and it being written to the apm-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if a span ended at least the provided duration ago. Note that this value doesn't need to be a hard a guarantee, but it should be the 99% case so that profiling data isn't distorted in the expected case.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) is an optional argument (the string may have a length of zero) used to correlate profiling data by the profiling agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the profiling agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an APM-agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving a non-empty `host.id` from the profiling agent to ensure aforementioned correlation features work correctly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) is an optional argument (the string may have a length of zero) used to correlate profiling data by the profiling agent. If an APM-agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the profiling agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an APM-agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving a non-empty `host.id` from the profiling agent to ensure aforementioned correlation features work correctly.
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) is an optional argument (the string may have a length of zero) used to correlate profiling data by the profiling agent. If an APM agent is already sending a `host.id` it MUST print a warning if the `host.id` is different and otherwise ignore the value received by the profiling agent. A mismatch will lead to certain correlation features (e.g. cost and CO2 consumption) not working. If an APM agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving a non-empty `host.id` from the profiling agent to ensure aforementioned correlation features work correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not including this change for the same reason as with https://github.com/elastic/apm/pull/900/files#r1901586158.

Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants