-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add meta_struct support on traces and spans #4334
base: master
Are you sure you want to change the base?
Conversation
b3dc5c2
to
0836e11
Compare
Datadog ReportBranch report: ✅ 0 Failed, 22095 Passed, 1476 Skipped, 5m 3.14s Total Time |
BenchmarksBenchmark execution time: 2025-02-06 12:54:53 Comparing candidate commit 65dd37b in PR branch Found 0 performance improvements and 4 performance regressions! Performance is the same for 27 metrics, 2 unstable metrics. scenario:tracing - 1 span trace - no writer
scenario:tracing - 10 span trace - no network
scenario:tracing - 10 span trace - no writer
scenario:tracing - 100 span trace - no writer
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4334 +/- ##
==========================================
- Coverage 97.72% 97.71% -0.02%
==========================================
Files 1368 1370 +2
Lines 82998 83106 +108
Branches 4220 4227 +7
==========================================
+ Hits 81113 81209 +96
- Misses 1885 1897 +12 ☔ View full report in Codecov by Sentry. |
faf7819
to
fcda7ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work 👏🏼
I think we can run a round of light improvements over the naming and interfaces, rest looks solid!
# @public_api | ||
module MetaStruct | ||
def set_meta_struct(meta_struct) | ||
self.meta_struct.merge!(meta_struct) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name prefix set_
implies that any data prior to that call will be overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -113,6 +113,9 @@ def to_msgpack(packer = nil) | |||
packer.write(span.meta) | |||
packer.write('metrics') | |||
packer.write(span.metrics) | |||
packer.write('meta_struct') | |||
# We encapsulate the resulting msgpack in a binary msgpack | |||
packer.write(span.meta_struct.transform_values(&:to_msgpack)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to encapsulate, then the logic of the values transformation should not be exposed, instead to_msgpack
should be called on entire meta_struct
packer.write(span.meta_struct.to_msgpack)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, to_msgpack
is implicitly called by packer.write
, if present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But packer.write(span.meta_struct.transform_values(&:to_msgpack))
is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to_msgpack is implicitly called by packer.write but we need to call it a second time, so that the values are encoded as byte array. Or else they would be maps or arrays, but the backend wants byte arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe my comment is not descriptive enough. We encapsulate the values, not the whole hash, in a message pack (which is why we use transform values) which will give a String starting with something like \xDE or \xDF. Then this String will be converted to byte array because of packer.write. span.metastruct
is just a Hash, so using span.metastruct.to_msgpack
would give the wrong result as we only want to convert the values to message pack, not the keys. Do you think it would be preferable to use an external method, like convert_values_to_msgpack(span.metastruct)
?
sig/datadog/tracing/span.rbs
Outdated
@@ -6,13 +6,14 @@ module Datadog | |||
attr_accessor end_time: (Time | nil) | |||
attr_accessor id: Integer | |||
attr_accessor meta: Hash[String, String] | |||
attr_accessor meta_struct: Hash[String, untyped] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be a type alias inside the MetaStruct
rbs instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f36e147
@@ -23,7 +23,7 @@ module Datadog | |||
attr_writer sampled: untyped | |||
attr_writer service: untyped | |||
|
|||
def initialize: (?agent_sample_rate: untyped?, ?events: untyped?, ?hostname: untyped?, ?id: untyped?, ?max_length: untyped, ?name: untyped?, ?origin: untyped?, ?parent_span_id: untyped?, ?rate_limiter_rate: untyped?, ?resource: untyped?, ?rule_sample_rate: untyped?, ?sample_rate: untyped?, ?sampled: untyped?, ?sampling_priority: untyped?, ?service: untyped?, ?tags: untyped?, ?metrics: untyped?, ?remote_parent: untyped?) -> void | |||
def initialize: (?agent_sample_rate: untyped?, ?events: untyped?, ?hostname: untyped?, ?id: untyped?, ?max_length: untyped, ?name: untyped?, ?origin: untyped?, ?parent_span_id: untyped?, ?rate_limiter_rate: untyped?, ?resource: untyped?, ?rule_sample_rate: untyped?, ?sample_rate: untyped?, ?sampled: untyped?, ?sampling_priority: untyped?, ?service: untyped?, ?tags: untyped?, ?meta_struct: untyped?, ?metrics: untyped?, ?remote_parent: untyped?) -> void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the new class you can use a type alias from MetaStruct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f36e147
self.meta_struct.merge!(meta_struct) | ||
end | ||
|
||
def get_meta_struct(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now sure about naming, but if I call get_meta_struct
I would expect the whole meta-struct object in return. I think we should somehow highlight that it's a part inside the meta-struct.
Maybe names like
get_meta_struct_value
get_meta_struct_field
or something way better (see ⬇️)
module Metadata | ||
# Adds complex structures tagging behavior through meta_struct | ||
# @public_api | ||
module MetaStruct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about naming, we call Metadata
as a single word, but MetaStruct
as two words. Does it make sense if we call it Metastruct
? We can avoid long names on reading too
get_metastruct_value("key")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 7d76a8e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes due to mismatch between no change log entry claimed in PR description and the new API being marked public.
# Adds complex structures tagging behavior through meta_struct | ||
# @public_api | ||
module MetaStruct | ||
def set_meta_struct(meta_struct) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Strech why not def meta_struct=
here and def meta_struct(key)
for reading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, because it's merge eh... this is in interesting api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, how is this API going to be used? By key? In batch? Is the "read" ever invoked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is used in two places:
- during TraceOperation initialisation (itself used in fork_clone) which just sets metastruct to what is given in initialize. metastruct is empty in this case and the behaviour is corresponding to metastruct=.
- in TraceFormatter, when we merge the metastruct contained in the TraceOperation with the one contained in the root span. And in our case, we don't want to replace an existing field, but merge it too, so effectively we want a deep_merge (which is not the behaviour we have right now).
So I'm going to split that method in two, metastruct=
and deep_merge_metastruct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about that ? 0066df9
merger = proc do |_, v1, v2| | ||
if v1.is_a?(Hash) && v2.is_a?(Hash) | ||
v1.merge(v2, &merger) | ||
elsif v1.is_a?(Array) && v2.is_a?(Array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚪ Code Quality Violation
Consider using Array() to ensure the type is that of an array (...read more)
The rule "Use Array()
to ensure your variable is an array" is important for ensuring your code behaves as expected, regardless of the type of data it receives. It is common in Ruby to need to iterate through an array of items. However, if the variable is not an array, this can lead to unexpected behavior or errors.
The Array()
method in Ruby is a Kernel method that converts its argument to an Array. If the argument is already an Array, it returns the argument. If the argument is nil, it returns an empty Array. This can be used to ensure that a variable is an array before trying to iterate over it, preventing potential errors or unexpected behavior.
By using Array(foos)
, you can ensure that foos
is an array before you try to iterate over it with each
. This prevents the need to check if foos
is an array with foos.is_a?(Array)
and makes your code cleaner and easier to understand.
0c00cb0
to
7d76a8e
Compare
7d76a8e
to
9b62558
Compare
Split set_metastruct into deep_merge_metastruct! and metastruct=
9b62558
to
52d5a77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the tracing team, but I've left a few notes following our discussion about the metastruct implementation
metastruct[key] | ||
end | ||
|
||
protected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with private if we don't expose it outside, protected doesn't make much sense in Ruby.
def metastruct=(second) | ||
@metastruct = second | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method used? I can't find, should we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in TraceOperation initialisation, which itself is used by fork_clone
def get_metastruct_field(key) | ||
metastruct[key] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know at the first iteration we can keep it simple, but honestly it feels like the prefixing is needed only because we don't give it as an object, just to showcase it:
mst = meta.metastruct # => Metadata::MetastructObject
mst['my-key'] # => {...}
mst['my-key'] = {...} # => {...}
mst.deep_merge!({...}) # => Metadata::MetastructObject
And then we can do for MetastructObject
all sorts of encapsulation to hide the logic
class MetastructObject
def []=(key)
# ...
end
end
We have discussed it, but just to give you perspective of how the interactions and naming could be eased with an object instead of prefixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about that ? 65dd37b
b29e3fb
to
a771a6d
Compare
a771a6d
to
65dd37b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing move, I think it's good to go! Solid work 👏🏼
P.S I left few suggestions, but they are not blockers
def initialize(metastruct = nil) | ||
@metastruct = metastruct | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small suggestion, you can get rid of metastruct
method if you guarantee in the initializer the success of the storage (metastruct)
def initialize(metastruct = nil) | |
@metastruct = metastruct | |
end | |
def initialize(metastruct = nil) | |
@metastruct = metastruct || {} | |
end |
v2 | ||
end | ||
end | ||
metastruct.merge!(second.to_h, &merger) # steep:ignore BlockTypeMismatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that we need to convert second into Hash before merging?
it do | ||
expect(trace_op.send(:metastruct).to_h).to eq({}) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it do | |
expect(trace_op.send(:metastruct).to_h).to eq({}) | |
end | |
it { expect(trace_op.send(:metastruct).to_h).to eq({}) } |
What does this PR do?
This PR adds meta_struct support to traces and spans. meta_struct is a way to send complex structures to the backend. It is sent as a Map<String, ByteArray> with the ByteArray being the structure encoded in messagepack.
Motivation:
meta_struct is required for stack trace collection, which is required for exploit prevention. But this can also be used by other products.
Change log entry
None.
Additional Notes:
How to test the change?