Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTEP: Span Event API deprecation plan #4430

Open
wants to merge 55 commits into
base: main
Choose a base branch
from

Conversation

trask
Copy link
Member

@trask trask commented Feb 24, 2025

Based on number of comments in related PRs and issues, please go to the "Files changed" tab and add your comment somewhere there (instead of leaving the comment directly on the conversation), so that we can have threaded discussions. Thank you ❤️.

Related to #4333, #4393, #4429, #4414

@trask trask force-pushed the span-event-migration-plan branch from e08d0b4 to 0d0204a Compare February 24, 2025 21:04
@trask trask changed the title Span event migration plan Span event deprecation plan Feb 24, 2025
@trask trask force-pushed the span-event-migration-plan branch 3 times, most recently from e09c0f3 to 7c9ed89 Compare February 24, 2025 21:13
@trask trask force-pushed the span-event-migration-plan branch from 7c9ed89 to db7164d Compare February 24, 2025 21:16
@trask trask added the OTEP OpenTelemetry Enhancement Proposal (OTEP) label Feb 24, 2025
@trask trask changed the title Span event deprecation plan OTEP: Span event deprecation plan Feb 24, 2025
trask and others added 2 commits February 25, 2025 07:42
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits, but looks good overall. Will wait for un-draft for another review. Thanks!

Copy link

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @trask. Left some comments.

TLDR of my feedback is that:

  • Both the OTEL Collector and SDKs should be able to act as a backward compatibility shim with relatively little configuration or negative impact.
  • A decision needs to be made as to what the story is for users that currently use only tracing and have no plans to ingest logs.

Co-authored-by: Robert Pająk <pellared@hotmail.com>
@HaloFour
Copy link

The vision of OpenTelemetry has always been to unify multiple streams of data into a single, coherent telemetry pipeline. This model was established in the documentation and specifications 5-6 years ago. However, some use cases are now being prioritized based on specific backend systems and query languages, which deviates from the original vision.

I would argue that OpenTelemetry is not a single, coherent telemetry pipeline today. It's three disparate SDKs with APIs aligned to their telemetry silos, and taking opinions as to which signals should be emitted to which silo. Perhaps a new abstraction above those SDKs would better encapsulate that vision? My instrumentation just emits "events" and where they go is entirely driven by configuration and convention to best fit my backend and use cases?

This mechanism SHOULD be implemented as follows (see
[prototype](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/80adbe1cf8de647afa32c68f921aef2bbd4dfd71/processors/README.md#event-to-spanevent-bridge)):

- An SDK-based log processor that converts Event records to Span Events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you mention the this does require LogSDK (i.e A valid LoggerProvider) to be setup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on the language and whether the language has declarative config, this may/may-not be code changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you mention the this does require LogSDK (i.e A valid LoggerProvider) to be setup.

I'm not sure what this adds to the existing text? maybe you could make a specific suggestion?

depending on the language and whether the language has declarative config, this may/may-not be code changes.

declarative config is covered in the next bullet


### Via the SDK

There MUST be a way to send (log-based) Events as Span Events

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose I configure this. Then some time later I want to start using log-events that aren't emitted within the context of a span. So I need to configure logs to actually be sent via OTLP. Does this mean I also have to start sending new logs that were previously attached to a span?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean I also have to start sending new logs that were previously attached to a span?

I'm not sure I follow this question, can you elaborate? thanks

Copy link

@alexmojaki alexmojaki Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to imagine how one would start using practically start using logs 'normally' without changing what happens to logs that are attached to spans. I guess the answer may be just be to use two different logging providers. #4430 (comment) makes this a bit hard to think about.

Copy link
Contributor

@lmolkova lmolkova Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A agree with @alexmojaki here - there are a lot of logs reported in scope of a span, and only a few (if any) would be reported as span events.

By deprecating Span Events, we remove the ability to say that 'this event belongs to that span, its a good idea to sample and visualize this event along with this span'.

It can be addressed in a few different ways:

  1. a convention: E.g. if Logger instrumentation scope is the same as Tracer instrumentation scope, these logs and spans came from the same source, it can be used as a signal to attach logs to span.
  2. explicit API parameter on the LoggerProvider.getLogger allowing to request logger that emits this special class of events associated with a span.
  3. explicit API parameter when emitting a log advising that it belongs with a span (it may also be useful on backends to highlight or prioritize showing them along with a span)
  4. a more complicated convention: span adding a reference to the event via an attribute. E.g. when recording exception, log record is enriched with exception.id and the span gets the same exception.id attribute. This covers some cases, but not a general solution.

My preference would be to start with 1. I believe that backends that don't support logs is a temporary problem.

Having a convention allows us to solve the temporary problem without introducing anything new. If it turns our that I'm wrong we can always add new APIs at that point.

- In the instrumentation's current major version
- It SHOULD continue to use these
- In the instrumentation's next major version
- It SHOULD stop using these,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been the case when modelling before, right?

What's an example of these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you meant to comment on the line below?

instead recording additional details about spans (that don't need a timestamp) as spans attributes

if so, hopefully this helps: b7d37fd

reyang
reyang previously approved these changes Mar 19, 2025
Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

API deprecation is clarified here https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0143-versioning-and-stability.md#deprecation.

Code is only marked as deprecated when the replacement becomes stable. Deprecated code still abides by the same support guarantees as stable code. Deprecated APIs remain stable and backwards compatible.

[prototype](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/80adbe1cf8de647afa32c68f921aef2bbd4dfd71/processors/README.md#event-to-spanevent-bridge)):

- An SDK-based log processor that converts Event records to Span Events
and attaches them to the current span, whose behavior and configuration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fit well with separately configured non-global tracer/logger providers. Logs will simply attach to the current span regardless of which tracer that span came from. In particular this means that tracers can no longer provide flexible utilities for automatically recording spans on their own. Either tracer providers must be given a logger provider or they must use the global one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular this means that tracers can no longer provide flexible utilities for automatically recording spans on their own.

yeah, this can be a nice convenience function, Java has been considering adding something like this, but it's still under incubation: https://github.com/open-telemetry/opentelemetry-java/blob/main/api/incubator/src/main/java/io/opentelemetry/api/incubator/trace/ExtendedSpanBuilder.java#L37-L49

potentially this could be solved by having the Trace SDK depend on the Logs API (so at least we don't need to couple the APIs)

or better yet: open-telemetry/oteps#165 😢

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially this could be solved by having the Trace SDK depend on the Logs API (so at least we don't need to couple the APIs)

This sounds like coupling to me, are you just saying at least the Trace API won't depend on the Logs API?

Copy link
Contributor

@carlosalberto carlosalberto Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't LogRecords simply use the span in their provided Context? IIRC it is from this very Context that their proto trace_id and span_id are populated from.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can. The question is how to create that record in the first place when all you have is a span/tracer. Which logger (provider) do you use? Does it have to be the global one?

Copy link
Contributor

@lmolkova lmolkova Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this otep suggests that someone who used to write

tracer = get_local_or_global_trace_provider().get("test")

with tracer.start_span(...) as span:
    ...
    span.record_exception(ex)

would instead (after a major version bump in that instrumentation) write

tracer = get_local_or_global_trace_provider().get("test")
logger = get_local_or_global_log_provider().get("test")

with tracer.start_span(...) as span:
    ...
    logger.emit(severity=appropriate_severity, event_name="foo_operation_failed", exception=ex, context=get_context(span))

(there is a lot of convenience that should be done around that logger.emit, but I hope you've got the idea)

and then users who want to return the old behavior back, would configure the log-> span event processor in the SDK (with declarative config or programmatically)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with tracer.start_span records unhandled exceptions by itself. There wouldn't usually be span.record_exception(ex) unless you also wanted to record a handled exception, which IIUC is already discouraged.

Copy link
Contributor

@lmolkova lmolkova Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I keep forgetting that python records exceptions by default. I believe it means that python OTel SDK will keep doing span.record_exception(ex) for many years to come (until there is OTel SDK 2.0).
Tracing SDK should not start logging and we should probably document that tracing SDK should not record exceptions without user asking for it.

@trask do you think it would be useful to mention that when it comes to OTel tracing SDKs, if they ever have v2, they should stop calling into span.RecordException by default (if they do it today like python) and MUST(?) NOT start calling into logger.emit

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova combined with the resistance to adopt an alternative like #4429 what I'm hearing you saying is that conveniences which automatically record span-ending exceptions anywhere (whether in attributes, span events, or logs) are undesirable. Am I misunderstanding?

Copy link
Contributor

@lmolkova lmolkova Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, I'm saying that the Trace SDK should not do it by default - #4455.

Update: I don't believe this OTEP prevents something like setException on the span to happen in the future.

This will allow recording exceptions and events using the Logs API,
instead of recording them using the Span Event API.

2. Mark [Span RecordException](../specification/trace/api.md#record-exception)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion in #4429 (comment), the implication here is that for users who haven't configured logging, exceptions (particularly stacktraces) which were previously recorded on spans will instead not be recorded at all, as code moves away from RecordException.

Moreover, 'typical' configuration of logs won't attach them to spans, so even when exceptions are recorded they will be moved away from the spans that they ended.

These seem like major decisions that should be explained and justified in this OTEP.

@trask trask dismissed stale reviews from jack-berg, pellared, austinlparker, alexmojaki, and lmolkova March 19, 2025 20:22

dismissing existing reviews due to significant changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog.opentelemetry.io OTEP OpenTelemetry Enhancement Proposal (OTEP)
Projects
None yet
Development

Successfully merging this pull request may close these issues.