Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MDS to msearch #9361

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

wanglam
Copy link
Contributor

@wanglam wanglam commented Feb 10, 2025

Description

This PR is for adding MDS for _internal/msearch. A data_source_id can be passed to the msearch API to determine which data source should be used to execute msearch with search requests.

Issues Resolved

#2174

Screenshot

No UI changes

Testing the changes

  • Clone branch code and run yarn osd bootstrap --single-version loose
  • Add below configs in config/opensearch_dashboards.yml
data_source.enabled: true
uiSettings:
  overrides:
    'courier:batchSearches': true
  • Run yarn start --no-base-path
  • Login with admin user
  • Visit /app/management/opensearch-dashboards/dataSources
  • Add an OpenSearch data source connection
  • Visit /app/home#/tutorial_directory
  • Click "Add data" on the "Sample eCommerce orders" card
  • Click "View data" on the "Sample eCommerce orders" card
  • Dashboards data in external data source should be showed

Changelog

  • feat: Add MDS to msearch

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

Signed-off-by: Lin Wang <wonglam@amazon.com>
Copy link

codecov bot commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.72%. Comparing base (6cf56f0) to head (2eef6da).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
src/plugins/data/server/search/routes/msearch.ts 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9361   +/-   ##
=======================================
  Coverage   61.71%   61.72%           
=======================================
  Files        3817     3817           
  Lines       91862    91868    +6     
  Branches    14552    14557    +5     
=======================================
+ Hits        56697    56702    +5     
  Misses      31509    31509           
- Partials     3656     3657    +1     
Flag Coverage Δ
Linux_1 28.99% <14.28%> (-0.01%) ⬇️
Linux_2 56.45% <15.00%> (-0.02%) ⬇️
Linux_3 39.20% <95.23%> (+<0.01%) ⬆️
Linux_4 28.89% <14.28%> (-0.01%) ⬇️
Windows_1 29.01% <14.28%> (-0.01%) ⬇️
Windows_2 56.40% <15.00%> (-0.02%) ⬇️
Windows_3 39.20% <95.23%> (+<0.01%) ⬆️
Windows_4 28.89% <14.28%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wanglam wanglam requested a review from huyaboo as a code owner February 11, 2025 06:19
Comment on lines 87 to 91
const key = dataSourceId ?? '';

const i = requestsToFetch.length;
requestsToFetch = [...requestsToFetch, request];
requestOptions = [...requestOptions, options];
const i = requestsToFetch[key]?.length ?? 0;
requestsToFetch[key] = [...(requestsToFetch[key] ?? []), restRequest];
requestOptions[key] = [...(requestOptions[key] ?? []), options];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: better to add some comments here to elaborate that we are grouping request by data source.

Comment on lines +70 to +72
query: schema.object({
data_source_id: schema.maybe(schema.string()),
}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very sure if the validation will give error if the request does not have any query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested with data source disabled case. The msearch API works as expected without the data source id query.

@@ -83,20 +83,27 @@ async function delayedFetch(
if (ms === 0) {
return callClient([request], [options], fetchHandlers)[0];
}
const { dataSourceId, ...restRequest } = request;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is straightforward to add unit test for this function but it will be great that we have one, the logic is kind of complex.

SuZhou-Joe
SuZhou-Joe previously approved these changes Feb 11, 2025
@SuZhou-Joe
Copy link
Member

I thought it would be a simple several lines fix. Thanks for this amazing PR.

Signed-off-by: Lin Wang <wonglam@amazon.com>
@@ -67,16 +67,23 @@ export function registerMsearchRoute(router: IRouter, deps: SearchRouteDependenc
})
),
}),
query: schema.object({
data_source_id: schema.maybe(schema.string()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested this yet, but I see a potential issue:

Scenario:

  • Index pattern foo* saved on datasourceA
    • Matches: foo-1, foo-2, etc.
  • Index pattern bar* also saved on datasourceA
    • Matches: bar-1, bar-2, etc.
  • bar* indices actually exist on datasourceB

Won't this lead to 404s when querying bar* since we're looking in datasourceA but the indices are in datasourceB?

To me i almost think multisearch and multiple data sources is almost fundamentally conflicting ideas. How do we fetch the data from multiple data sources with a single query. this essentially feels like cross cluster searching. unless im missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, trying to search for an index pattern in a different data source than where it exists will cause a 404 error.

To solve this, we separate the search requests by data source on the client-side before sending them. The fetchSoon function groups the requests based on their data source, creating multiple msearch requests, one for each data source.

This way, each msearch request is sent to the correct data source, avoiding 404 errors for indices that are in a different data source.

The internal msearch API needs to call msearch in different OpenSearch clients. They are also expensive when combined in the same request. That's why we only support one data source per request.

Copy link
Member

@zhongnansu zhongnansu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened an issue for this before. Could you take a look and see if your PR would address the problem there? Thanks #2174 @wanglam

@wanglam
Copy link
Contributor Author

wanglam commented Feb 17, 2025

I have opened an issue for this before. Could you take a look and see if your PR would address the problem there? Thanks #2174 @wanglam

Hi @zhongnansu , I've looked at the issue you mentioned (#2174). This PR can solve the problem described there. We will add a new parameter called "data source id" to the internal msearch API.

The way we manage queued search requests will also be improved in this PR. We will store the search requests, parameters, and ongoing promises separately for each data source.

Feel free to review this PR. If anything needs to be changed or improved, I will update the code accordingly.

@@ -36,7 +36,8 @@ import { ISearchOptions } from '../../index';
export function callClient(
searchRequests: SearchRequest[],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the search request have the index pattern ? Or data source info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search requests interface is not being updated. The data source ID is only present on the client side. It was added at line 359 and removed at line 86. The msearch API's search requests interface remains unchanged. The only modification to the internal msearch API was the addition of a new data_source_id query parameter. This parameter will trigger a msearch call on the specified data source.

@SuZhou-Joe SuZhou-Joe added the multiple datasource multiple datasource project label Feb 26, 2025
@@ -39,9 +39,10 @@ import { LegacyFetchHandlers } from '../../../common/search/search_source';
* @internal
*/
export function getCallMsearch({ http }: { http: HttpStart }): LegacyFetchHandlers['callMsearch'] {
return async ({ body, signal }) => {
return async ({ body, signal, dataSourceId }) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this require modifying the msearch search params? if we have some?

@@ -67,16 +67,23 @@ export function registerMsearchRoute(router: IRouter, deps: SearchRouteDependenc
})
),
}),
query: schema.object({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we actually just shove the data source id into each of the searches.body? and parse out that value to get the client?

Copy link
Member

@kavilla kavilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me.

A couple thoughts:

  • I'm a bit hesitant about modifying the existing callMsearch API. Maybe we could stuff the dataSourceId into each search.body within searches instead and parse it out on the server side? This could mitigate modifying existing APIs.
  • Alternatively, what about adding a new _enhanced msearch route instead of changing the original? That could keep things more isolated.

But I can be convinced that this ok to pass by query param. Make sure we have solid testing coverage, especially for both MDS and non-MDS scenarios. Leveraging the existing Cypress tests from @ananzh and @angle943 could be a good starting point.

Otherwise, nice work! Let me know what you think about those suggestions. Looking forward to getting this merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants