-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTEP: Span Event API deprecation plan #4430
base: main
Are you sure you want to change the base?
Conversation
e08d0b4
to
0d0204a
Compare
e09c0f3
to
7c9ed89
Compare
7c9ed89
to
db7164d
Compare
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits, but looks good overall. Will wait for un-draft for another review. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @trask. Left some comments.
TLDR of my feedback is that:
- Both the OTEL Collector and SDKs should be able to act as a backward compatibility shim with relatively little configuration or negative impact.
- A decision needs to be made as to what the story is for users that currently use only tracing and have no plans to ingest logs.
Co-authored-by: Robert Pająk <pellared@hotmail.com>
I would argue that OpenTelemetry is not a single, coherent telemetry pipeline today. It's three disparate SDKs with APIs aligned to their telemetry silos, and taking opinions as to which signals should be emitted to which silo. Perhaps a new abstraction above those SDKs would better encapsulate that vision? My instrumentation just emits "events" and where they go is entirely driven by configuration and convention to best fit my backend and use cases? |
This mechanism SHOULD be implemented as follows (see | ||
[prototype](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/80adbe1cf8de647afa32c68f921aef2bbd4dfd71/processors/README.md#event-to-spanevent-bridge)): | ||
|
||
- An SDK-based log processor that converts Event records to Span Events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you mention the this does require LogSDK (i.e A valid LoggerProvider) to be setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depending on the language and whether the language has declarative config, this may/may-not be code changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you mention the this does require LogSDK (i.e A valid LoggerProvider) to be setup.
I'm not sure what this adds to the existing text? maybe you could make a specific suggestion?
depending on the language and whether the language has declarative config, this may/may-not be code changes.
declarative config is covered in the next bullet
|
||
### Via the SDK | ||
|
||
There MUST be a way to send (log-based) Events as Span Events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose I configure this. Then some time later I want to start using log-events that aren't emitted within the context of a span. So I need to configure logs to actually be sent via OTLP. Does this mean I also have to start sending new logs that were previously attached to a span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean I also have to start sending new logs that were previously attached to a span?
I'm not sure I follow this question, can you elaborate? thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to imagine how one would start using practically start using logs 'normally' without changing what happens to logs that are attached to spans. I guess the answer may be just be to use two different logging providers. #4430 (comment) makes this a bit hard to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A agree with @alexmojaki here - there are a lot of logs reported in scope of a span, and only a few (if any) would be reported as span events.
By deprecating Span Events, we remove the ability to say that 'this event belongs to that span, its a good idea to sample and visualize this event along with this span'.
It can be addressed in a few different ways:
- a convention: E.g. if Logger instrumentation scope is the same as Tracer instrumentation scope, these logs and spans came from the same source, it can be used as a signal to attach logs to span.
- explicit API parameter on the
LoggerProvider.getLogger
allowing to request logger that emits this special class of events associated with a span. - explicit API parameter when emitting a log advising that it belongs with a span (it may also be useful on backends to highlight or prioritize showing them along with a span)
- a more complicated convention: span adding a reference to the event via an attribute. E.g. when recording exception, log record is enriched with
exception.id
and the span gets the sameexception.id
attribute. This covers some cases, but not a general solution.
My preference would be to start with 1. I believe that backends that don't support logs is a temporary problem.
Having a convention allows us to solve the temporary problem without introducing anything new. If it turns our that I'm wrong we can always add new APIs at that point.
- In the instrumentation's current major version | ||
- It SHOULD continue to use these | ||
- In the instrumentation's next major version | ||
- It SHOULD stop using these, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been the case when modelling before, right?
What's an example of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you meant to comment on the line below?
instead recording additional details about spans (that don't need a timestamp) as spans attributes
if so, hopefully this helps: b7d37fd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
API deprecation is clarified here https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0143-versioning-and-stability.md#deprecation.
Code is only marked as deprecated when the replacement becomes stable. Deprecated code still abides by the same support guarantees as stable code. Deprecated APIs remain stable and backwards compatible.
[prototype](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/80adbe1cf8de647afa32c68f921aef2bbd4dfd71/processors/README.md#event-to-spanevent-bridge)): | ||
|
||
- An SDK-based log processor that converts Event records to Span Events | ||
and attaches them to the current span, whose behavior and configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't fit well with separately configured non-global tracer/logger providers. Logs will simply attach to the current span regardless of which tracer that span came from. In particular this means that tracers can no longer provide flexible utilities for automatically recording spans on their own. Either tracer providers must be given a logger provider or they must use the global one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular this means that tracers can no longer provide flexible utilities for automatically recording spans on their own.
yeah, this can be a nice convenience function, Java has been considering adding something like this, but it's still under incubation: https://github.com/open-telemetry/opentelemetry-java/blob/main/api/incubator/src/main/java/io/opentelemetry/api/incubator/trace/ExtendedSpanBuilder.java#L37-L49
potentially this could be solved by having the Trace SDK depend on the Logs API (so at least we don't need to couple the APIs)
or better yet: open-telemetry/oteps#165 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially this could be solved by having the Trace SDK depend on the Logs API (so at least we don't need to couple the APIs)
This sounds like coupling to me, are you just saying at least the Trace API won't depend on the Logs API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't LogRecord
s simply use the span in their provided Context
? IIRC it is from this very Context
that their proto trace_id
and span_id
are populated from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can. The question is how to create that record in the first place when all you have is a span/tracer. Which logger (provider) do you use? Does it have to be the global one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this otep suggests that someone who used to write
tracer = get_local_or_global_trace_provider().get("test")
with tracer.start_span(...) as span:
...
span.record_exception(ex)
would instead (after a major version bump in that instrumentation) write
tracer = get_local_or_global_trace_provider().get("test")
logger = get_local_or_global_log_provider().get("test")
with tracer.start_span(...) as span:
...
logger.emit(severity=appropriate_severity, event_name="foo_operation_failed", exception=ex, context=get_context(span))
(there is a lot of convenience that should be done around that logger.emit
, but I hope you've got the idea)
and then users who want to return the old behavior back, would configure the log-> span event processor in the SDK (with declarative config or programmatically)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with tracer.start_span
records unhandled exceptions by itself. There wouldn't usually be span.record_exception(ex)
unless you also wanted to record a handled exception, which IIUC is already discouraged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, I keep forgetting that python records exceptions by default. I believe it means that python OTel SDK will keep doing span.record_exception(ex)
for many years to come (until there is OTel SDK 2.0).
Tracing SDK should not start logging and we should probably document that tracing SDK should not record exceptions without user asking for it.
@trask do you think it would be useful to mention that when it comes to OTel tracing SDKs, if they ever have v2, they should stop calling into span.RecordException
by default (if they do it today like python) and MUST(?) NOT start calling into logger.emit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, I'm saying that the Trace SDK should not do it by default - #4455.
Update: I don't believe this OTEP prevents something like setException
on the span to happen in the future.
This will allow recording exceptions and events using the Logs API, | ||
instead of recording them using the Span Event API. | ||
|
||
2. Mark [Span RecordException](../specification/trace/api.md#record-exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion in #4429 (comment), the implication here is that for users who haven't configured logging, exceptions (particularly stacktraces) which were previously recorded on spans will instead not be recorded at all, as code moves away from RecordException
.
Moreover, 'typical' configuration of logs won't attach them to spans, so even when exceptions are recorded they will be moved away from the spans that they ended.
These seem like major decisions that should be explained and justified in this OTEP.
dismissing existing reviews due to significant changes
Dismiss my own approval due to the non-trivial changes after I approved.
Based on number of comments in related PRs and issues, please go to the "Files changed" tab and add your comment somewhere there (instead of leaving the comment directly on the conversation), so that we can have threaded discussions. Thank you ❤️.
Related to #4333, #4393, #4429, #4414