Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APR-205] dogstatsd: add support for a configurable tag interceptor for filtering/augmenting metadata via tags #132

Merged
merged 2 commits into from
Jul 26, 2024

Conversation

tobz
Copy link
Member

@tobz tobz commented Jul 24, 2024

Context

A lot of code exists at the moment surrounding context resolving and the sharing of tags, including code to then try and allow us to modify tags in resolved contexts and minimize how much we have to clone/allocate, and so on and so forth.

The extent to which we work around this is due entirely to the need to modify tags after decoding. While we certainly aspire to allow for such free-form transformation of metrics, the current need to do so stems from origin enrichment, where we extract relevant tags that may or may not be present and use them to augment the metric metadata.

We consciously chose to do this in order to attempt to get the DogStatsD codec somewhat decoupled from being a one-to-one rewrite of the equivalent code in the Agent. In essence, though, our lives would be simpler if we could both exclude certain tags from the context resolving logic, as well as visit those tags as we build the Metric in order to update the metadata from them accordingly.

Solution

This PR introduces a new trait, TagMetadataInterceptor, which serves two purposes:

  • evaluate individual tags to determine whether they should be passed through, dropped, or "intercepted"
  • when a tag is intercepted, define how we use that tag to augment the metadata for the metric

Most of the code involved here is boilerplate to fit how we've structured decoding to avoid allocations when building ContextRef<'a, T>... so we've like TagSplitter<'a>/TagIter<'a>, we now have TagFilterer<I, TMI>/TagFiltererIter<I, TMI>.

We handle this in two phases:

  • wrap our normal TagSplitter<'a> to get TagFilterer<TagSplitter<'a>, TMI>, where TMI is the interceptor implementation to use (this can be cloned, and serves purely as a way to filter out intercepted/dropped tags when doing context stuff)
  • call a new function, update_metadata_from_tags, which handles updating the MetricMetadata we got from decoding with any tags that get identified as needing to be intercepted

For now, we've pushed this into DogstatsdCodec as a new generic parameter (TMI) with a default of (). All the relevant traits implementations are in place such that existing code works exactly as it did before, and no tags are filtered out or do anything to modify the metric metadata. In a subsequent PR, we'll add a Agent-like interceptor implementation specifically for handling some of the tags we currently handle in the origin_enrichment transform, such as dd.internal.entity_id, dd.internal.jxm_check_name, and dd.internal.card.

@tobz tobz requested review from a team as code owners July 24, 2024 19:22
@github-actions github-actions bot added the area/io General I/O and networking. label Jul 24, 2024
@tobz tobz added the type/enhancement An enhancement in functionality or support. label Jul 24, 2024
fn intercept(&self, tag: &str, metadata: &mut MetricMetadata) {
if let Some((key, value)) = tag.split_once(':') {
if key == "host" {
metadata.set_hostname(Arc::from(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated question: why is hostname the only field the needs to be wrapped in an Arcfor MetricMetadata?

Copy link
Member Author

@tobz tobz Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only string field, and to make MetricMetadata cloneable, at least cheaply... we wrap the hostname in Arc<T>.

Realistically, it should actually be MetaString so that we can intern it or inline it where possible, skipping a heap allocation entirely.

@pr-commenter
Copy link

pr-commenter bot commented Jul 25, 2024

Regression Detector (Saluki)

Regression Detector Results

Run ID: 1762dee2-9b50-45f6-897f-00b6d70f2ebb

Baseline: a133936
Comparison: d3f7fe6

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

Significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

perf experiment goal Δ mean % Δ mean % CI links
dsd_uds_500mb_3k_contexts ingress throughput -9.71 [-9.82, -9.61]

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI links
dsd_uds_100mb_3k_contexts_distributions_only memory utilization +0.60 [+0.44, +0.77]
dsd_uds_100mb_250k_contexts ingress throughput +0.19 [-0.03, +0.42]
dsd_uds_512kb_3k_contexts ingress throughput +0.03 [-0.06, +0.13]
dsd_uds_50mb_10k_contexts_no_inlining ingress throughput -0.00 [-0.01, +0.01]
dsd_uds_50mb_10k_contexts_no_inlining_no_allocs ingress throughput -0.01 [-0.06, +0.04]
dsd_uds_100mb_3k_contexts ingress throughput -0.01 [-0.02, +0.00]
dsd_uds_1mb_3k_contexts ingress throughput -0.01 [-0.08, +0.05]
dsd_uds_10mb_3k_contexts ingress throughput -0.01 [-0.16, +0.13]
dsd_uds_1mb_50k_contexts ingress throughput -0.03 [-0.11, +0.04]
dsd_uds_1mb_50k_contexts_memlimit ingress throughput -1.50 [-3.94, +0.94]
dsd_uds_500mb_3k_contexts ingress throughput -9.71 [-9.82, -9.61]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@pr-commenter
Copy link

pr-commenter bot commented Jul 25, 2024

Regression Detector (DogStatsD)

Regression Detector Results

Run ID: f93c8284-704a-4d77-b192-82c08b54dd56

Baseline: 7.52.0
Comparison: 7.52.1

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI links
dsd_uds_100mb_3k_contexts_distributions_only memory utilization +0.95 [+0.74, +1.15]
dsd_uds_10mb_3k_contexts ingress throughput +0.02 [-0.02, +0.07]
dsd_uds_1mb_50k_contexts ingress throughput +0.02 [-0.00, +0.04]
dsd_uds_1mb_50k_contexts_memlimit ingress throughput +0.02 [-0.00, +0.04]
dsd_uds_500mb_3k_contexts ingress throughput +0.00 [-0.00, +0.01]
dsd_uds_1mb_3k_contexts ingress throughput -0.00 [-0.04, +0.04]
dsd_uds_100mb_250k_contexts ingress throughput -0.01 [-0.04, +0.03]
dsd_uds_100mb_3k_contexts ingress throughput -0.02 [-0.03, -0.01]
dsd_uds_512kb_3k_contexts ingress throughput -0.05 [-0.10, +0.01]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@tobz tobz merged commit 906979a into main Jul 26, 2024
12 of 13 checks passed
@tobz tobz deleted the tobz/dsd-codec-tag-extractor-filter branch December 13, 2024 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/io General I/O and networking. type/enhancement An enhancement in functionality or support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants