Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Workload Management] Add rule schema for workload management #17238

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ruai0511
Copy link
Contributor

@ruai0511 ruai0511 commented Feb 3, 2025

Description

This PR introduces the schema for Rule object used in the workload management feature. A single rule will only have one feature but could have multiple attributes.
For the example below, the feature is query_group, and corresponds to the tag dev_query_group_id

{
    "_id": "fwhbuib397u4o03=="
    "attribute_1": ["logs123"],
    "attribute_2": ["xxxx", "yyyy],
    "query_group": "dev_query_group_id",
    "updated_at": "01-10-2025T21:23:456Z"
}

Related issues:
RFC: #16797
#16813

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Copy link
Contributor

github-actions bot commented Feb 4, 2025

✅ Gradle check result for f6a4a28: SUCCESS

Copy link

codecov bot commented Feb 4, 2025

Codecov Report

Attention: Patch coverage is 82.55034% with 26 lines in your changes missing coverage. Please review.

Project coverage is 72.42%. Comparing base (5b2692f) to head (e152d95).

Files with missing lines Patch % Lines
...src/main/java/org/opensearch/autotagging/Rule.java 82.55% 10 Missing and 16 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##               main   #17238    +/-   ##
==========================================
  Coverage     72.41%   72.42%            
- Complexity    65531    65577    +46     
==========================================
  Files          5291     5292     +1     
  Lines        304335   304484   +149     
  Branches      44181    44211    +30     
==========================================
+ Hits         220395   220513   +118     
- Misses        65838    65903    +65     
+ Partials      18102    18068    -34     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ruai0511 ruai0511 added the backport 2.x Backport to 2.x branch label Feb 7, 2025
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Copy link
Contributor

github-actions bot commented Feb 7, 2025

❕ Gradle check result for bb8d06c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Comment on lines 37 to 38
* "_id": "fwehf8302582mglfio349==",
* "index_pattern": ["logs123", "user*"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have customer friendly - name/description fields for a rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i do think we can add a name field for better user readability. Especially when they get all the rules in the system, and only looking at the id and attributes could be overwhelming.

Comment on lines 166 to 169
QUERY_GROUP("query_group", Set.of(RuleAttribute.INDEX_PATTERN));

private final String name;
private final Set<RuleAttribute> allowedAttributes;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design of using same rules for multiple features becomes really tricky if they have different allowed attribute values. Having more attributes present than the ones that apply to a rule can be very confusing and if we are not allowing that, the possible attributes is the intersection of allowed attributes for all the features.

Comment on lines 39 to 41
* "query_group": "dev_query_group_id",
* "updated_at": "01-10-2025T21:23:21.456Z"
* }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruai0511 / @kaushalmahi12 - Can we add some more examples of how these rules will look if we have multiple features and multiple attributes tomorrow? Especially when the allowed attributes for those features are not common

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single Rule entity will always map to a single feature. Hence each feature will define their set of allowed attributes.

For example since the slow logs are defined at index level. It doesn't make sense for s.ow log feature to consider index_pattern as attribute. It can define next layer of attributes to provide more specific details about the slow logs,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a single rule will only have one feature but could have multiple attributes.
For example,

{
    "_id": "fwhbuib397u4o03=="
    "attribute_1": ["logs123", "user*"],
    "attribute_2": ["logs123", "user*"],
    "query_group": "dev_query_group_id",
    "updated_at": "01-10-2025T21:23:456Z"
}

Here the feature is query_group, and corresponds to the tag dev_query_group_id

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this file under a new package e,g; auto_tagging since the construct is meant to be used in generic context but the structure hints a specific use.

* This RuleAttribute enum contains the attribute names for a rule.
* @opensearch.experimental
*/
public enum RuleAttribute {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename this enum to Attribute since it is already under Rule class and should be referred to as Rule.Attribute.

Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Copy link
Contributor

✅ Gradle check result for e152d95: SUCCESS

@kaushalmahi12
Copy link
Contributor

LGTM! apart from the name field I think we should rename it to description

Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Copy link
Contributor

❌ Gradle check result for aa1f8b7: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Collaborator

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I feel we should move the enum Feature into enum FeatureType, and introduce Feature class that includes FeatureType, label (essentially nothing but value for feature type), and attributeMap (as allowed attributes are governed by FeatureType). Rule itself being generic in nature should be agnostic of coupling itself with those validations, etc.

private final Map<Attribute, Set<String>> attributeMap;
private final Feature feature;
private final String description;
private final String label;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label is corresponding to the feature right? Does it make sense to include as part of Feature?

out.writeMap(attributeMap, Attribute::writeTo, StreamOutput::writeStringCollection);
out.writeString(label);
out.writeString(updatedAt);
out.writeString(feature.getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be delegated to Feature class

for (Map.Entry<Attribute, Set<String>> entry : attributeMap.entrySet()) {
builder.array(entry.getKey().getName(), entry.getValue().toArray(new String[0]));
}
builder.field(feature.getName(), label);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be delegated to the feature class

* @opensearch.experimental
*/
public enum Attribute {
INDEX_PATTERN("index_pattern");
Copy link
Contributor

@sgup432 sgup432 Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if an external plugin like queryInsights or other want to use this schema and eventually this Attribute class?
Will they add another enum for their usecase? That doesn't seem right. As core should be agnostic of plugin level logic.

Same for Feature class above.

String label,
String updatedAt,
Feature feature,
ValidationException validationException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing validationException as a method param doesn't seem a very nice way to me.

Why not do below?

public static Optional<ValidationException> validateRuleInputs() {}

Here you just pass an optional exception in case u encounter any issues. The caller can additionally check if any exceptions are present or not, and accordingly handle it.

Same for below method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this, instead of passing all these params, should we create an internal object which can be passed instead? And same can be used at Rule class level above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think creating a Validator class with all of the attributes which needs to be validated as member variables will be more suitable.

we can create something like following

static class RuleValidator {
      String description,
        Map<Attribute, Set<String>> attributeMap;
        String label;
        String updatedAt;
        Feature feature;
....

     Optional<ValidationException> validate() {
               validateAttributes();
               validateFeature();
         ....
    }
...
}

Copy link
Contributor Author

@ruai0511 ruai0511 Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it as a method param to reuse the same ValidationException object and throw all the errors together after all the checks. If we use public static Optional<ValidationException> validateRuleInputs() {} then the internal error-checking functions like validateAttributeMap can't reuse the ValidationException object

* This enum enumerates the features that can use the Rule Based Auto-tagging
* @opensearch.experimental
*/
public enum Feature {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NIT] Better to create a different file for this IMO. Same for below Attribute

@kaushalmahi12
Copy link
Contributor

Overall, I feel we should move the enum Feature into enum FeatureType, and introduce Feature class that includes FeatureType, label (essentially nothing but value for feature type), and attributeMap (as allowed attributes are governed by FeatureType). Rule itself being generic in nature should be agnostic of coupling itself with those validations, etc.

I might be missing some context here but this seems unnecessary at this point. The feature here is used only by the rule entity.
Why I believe creating a separate class doesn't make sense.

  1. Class itself means a blueprint for something to create similar types of things. Here we will still duplicate the FeatureType a lot. attributes are a again a static list of constants for a feature
  2. Feature here is a constant and it doesn't necessarily needs to carry the label for itself since label is rule's property.

I agree with @sgup432 comments, Feature and Feature supported attributes are specific to plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants