
refactor bwc test suite to re-use existing resources between tests #1171

Closed

Conversation

will-hwang
Contributor

@will-hwang will-hwang commented Feb 5, 2025

Description

Currently, models and pipelines are re-created and re-deployed for every test, leading to redundant model loads and pipeline creations. This change removes that redundancy by reusing the created resources across test cases.
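The reuse pattern described above can be sketched roughly as follows (class and method names are hypothetical, not the PR's actual helpers): a resource is created at most once per key, and later tests get the cached id back instead of triggering another upload or deployment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical illustration of the reuse pattern: upload a model (or create a
// pipeline) at most once per key, then hand the cached id to later tests.
class SharedTestResources {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // computeIfAbsent invokes the creator only when no id is cached yet.
    static String getOrCreate(String key, Supplier<String> creator) {
        return CACHE.computeIfAbsent(key, k -> creator.get());
    }

    public static void main(String[] args) {
        String first = getOrCreate("text_embedding_model", () -> "model-id-1");
        // The second supplier is never invoked; the cached id is returned.
        String second = getOrCreate("text_embedding_model", () -> "model-id-2");
        System.out.println(first + " " + second); // prints "model-id-1 model-id-1"
    }
}
```

A `getOrUploadTextEmbeddingModel`-style helper would wrap the real upload call in the supplier, so repeated tests in the same suite share one deployed model.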

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@will-hwang will-hwang force-pushed the refactor_bwc_test_suite branch from dbb1174 to 69edfc7 on February 5, 2025 06:18

codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.55%. Comparing base (e8ed3a4) to head (3e0f337).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1171      +/-   ##
============================================
- Coverage     81.72%   81.55%   -0.17%     
+ Complexity     2494     1244    -1250     
============================================
  Files           186       93      -93     
  Lines          8426     4213    -4213     
  Branches       1428      714     -714     
============================================
- Hits           6886     3436    -3450     
+ Misses         1000      506     -494     
+ Partials        540      271     -269     


@vibrantvarun
Member

Why do we need this change? Each test should run independently in a dedicated environment with fresh, uncontaminated data. That prevents interference from other tests and ensures each test accurately evaluates the component under test in isolation.

@vibrantvarun
Member

We have run into issues with this before: during a test run, shared names caused the pipeline to be deleted or wrong information to be shared between tests.

@@ -28,9 +28,9 @@ public void testSemanticSearch_E2EFlow() throws Exception {
waitForClusterHealthGreen(NODES_BWC_CLUSTER, 90);
switch (getClusterType()) {
case OLD:
modelId = uploadTextEmbeddingModel();
modelId = getOrUploadTextEmbeddingModel(getIngestionPipeline(PIPELINE_NAME), TEXT_EMBEDDING_PROCESSOR);
Member


In the OLD case it will always upload, so why is it changed to getOrUpload?

@@ -373,11 +373,11 @@ private static int getHitCount(final Map<String, Object> searchResponseAsMap) {
}

public static String getModelId(Map<String, Object> pipeline, String processor) {
assertNotNull(pipeline);
Member


This condition validates that the pipeline created earlier has the model id. At line 377 we fetch information from the pipeline, so we need to ensure it is not null.

Member


Also, it is a validation condition

@@ -373,11 +373,11 @@ private static int getHitCount(final Map<String, Object> searchResponseAsMap) {
}

public static String getModelId(Map<String, Object> pipeline, String processor) {
assertNotNull(pipeline);
if (pipeline == null) return null;
Member


In any case, it will not be null.

@heemin32
Collaborator

heemin32 commented Feb 5, 2025

I suggest reusing only the model. Pipeline is cheap to create. We likely need just one model per type: text, text_embedding, and sparse. There should be a common method to retrieve these models based solely on the type, without requiring the caller to specify anything else. This will ensure the models can be shared correctly without errors.

@vibrantvarun The reason for reusing the model is that during BWC tests, multiple tests deploy the same model simultaneously, leading to model deployment failures during node upgrades. This, in turn, causes flaky test failures.
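The per-type lookup suggested above could look roughly like this (a sketch under assumed names, not code from this PR): each model type maps to a single shared model that is deployed lazily on first use, so callers specify only the type.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: one shared model per type, deployed lazily on first use.
class SharedModels {
    enum ModelType { TEXT, TEXT_EMBEDDING, SPARSE }

    private static final Map<ModelType, String> DEPLOYED = new EnumMap<>(ModelType.class);

    // Callers specify only the type; deployment happens at most once per type.
    static synchronized String getModel(ModelType type) {
        return DEPLOYED.computeIfAbsent(type, SharedModels::deployModel);
    }

    // Placeholder standing in for the real upload-and-deploy helper.
    private static String deployModel(ModelType type) {
        return "model-" + type.name().toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(SharedModels.getModel(ModelType.TEXT_EMBEDDING)); // prints "model-text_embedding"
    }
}
```

Because the lookup is keyed by type alone, concurrent tests cannot trigger the duplicate deployments that cause the flaky failures described here.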

@vibrantvarun
Member

@heemin32 I think test resources created for any test should be meant only to run that test. The root cause of the model deployment failure is not the redeployment of the model; it must be something else.

@heemin32
Collaborator

heemin32 commented Feb 5, 2025

I think test resources created for any test should be just meant to run it.

Our goal isn’t to test model deployment—that should be covered by ml-common. For us, model deployment is merely a prerequisite for running our feature tests.

Model deployment is resource-intensive, and if each test deploys its own model, the test cluster might not be large enough to handle them all at once. Sharing models across tests shouldn’t affect test coverage or validity in any way.

The primary objective here is to avoid flaky test failures caused by model deployment issues. We also want to ensure that problems in ml-commons (issues with model deployment) don't impact our test velocity. To further minimize test flakiness related to model deployment, we could even consider using lightweight or mock models.

@vibrantvarun
Member

@heemin32 in BWC we do test that a model deployed on an older-version node still exists on a newer-version node. Because of that test we are always able to find critical issues with ml-commons just before the release.

@vibrantvarun
Member

FYI: ml-commons to date does not have a BWC framework in place. We in neural-search catch those issues in our BWC tests.

@vibrantvarun
Member

vibrantvarun commented Feb 5, 2025

@heemin32 I am aligned with you on reducing test flakiness, but we need to find the actual root cause. Here we are trying to reduce our own overhead by doing fewer model deployments, at the cost of removing the model deployment test from BWC. Model deployment is part of each feature we release.

@will-hwang will-hwang force-pushed the refactor_bwc_test_suite branch 5 times, most recently from bb2012d to 14cf7f9 on February 6, 2025 17:50
@will-hwang will-hwang force-pushed the refactor_bwc_test_suite branch from 92cb543 to 2d0f6a1 on February 6, 2025 18:41
@heemin32
Collaborator

@heemin32 I am aligned with you on reducing test flakiness but we need to find the actual root cause. Here we are trying to reduce overhead on us by doing less model deployment at the cost of removing model deployment test in bwc. Model deployment is part of each feature we release.

The main objective of BWC tests is to ensure backward compatibility, not to conduct load testing. Currently, we are deploying the same text embedding model multiple times—once for each test. This approach doesn’t align with real customer scenarios, as they typically wouldn’t deploy the same model repeatedly. Reusing the model across tests will not compromise the purpose of the BWC tests but will help stabilize them, allowing us to focus on what truly matters.

@martin-gaievski
Member

@will-hwang are you still working on this PR after neural got migrated to 3.0-alpha?

@will-hwang will-hwang closed this Feb 20, 2025