-
Notifications
You must be signed in to change notification settings - Fork 968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MDS to msearch #9361
base: main
Are you sure you want to change the base?
Add MDS to msearch #9361
Conversation
Signed-off-by: Lin Wang <wonglam@amazon.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9361 +/- ##
=======================================
Coverage 61.71% 61.72%
=======================================
Files 3817 3817
Lines 91862 91868 +6
Branches 14552 14557 +5
=======================================
+ Hits 56697 56702 +5
Misses 31509 31509
- Partials 3656 3657 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Lin Wang <wonglam@amazon.com>
const key = dataSourceId ?? ''; | ||
|
||
const i = requestsToFetch.length; | ||
requestsToFetch = [...requestsToFetch, request]; | ||
requestOptions = [...requestOptions, options]; | ||
const i = requestsToFetch[key]?.length ?? 0; | ||
requestsToFetch[key] = [...(requestsToFetch[key] ?? []), restRequest]; | ||
requestOptions[key] = [...(requestOptions[key] ?? []), options]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: better to add some comments here to elaborate that we are grouping request by data source.
query: schema.object({ | ||
data_source_id: schema.maybe(schema.string()), | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very sure if the validation will give error if the request does not have any query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested with data source disabled case. The msearch
API works as expected without the data source id query.
@@ -83,20 +83,27 @@ async function delayedFetch( | |||
if (ms === 0) { | |||
return callClient([request], [options], fetchHandlers)[0]; | |||
} | |||
const { dataSourceId, ...restRequest } = request; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it is straightforward to add unit test for this function but it will be great that we have one, the logic is kind of complex.
I thought it would be a simple several lines fix. Thanks for this amazing PR. |
Signed-off-by: Lin Wang <wonglam@amazon.com>
Signed-off-by: Lin Wang <wonglam@amazon.com>
@@ -67,16 +67,23 @@ export function registerMsearchRoute(router: IRouter, deps: SearchRouteDependenc | |||
}) | |||
), | |||
}), | |||
query: schema.object({ | |||
data_source_id: schema.maybe(schema.string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tested this yet, but I see a potential issue:
Scenario:
- Index pattern foo* saved on datasourceA
- Matches: foo-1, foo-2, etc.
- Index pattern bar* also saved on datasourceA
- Matches: bar-1, bar-2, etc.
- bar* indices actually exist on datasourceB
Won't this lead to 404s when querying bar* since we're looking in datasourceA but the indices are in datasourceB?
To me i almost think multisearch and multiple data sources is almost fundamentally conflicting ideas. How do we fetch the data from multiple data sources with a single query. this essentially feels like cross cluster searching. unless im missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, trying to search for an index pattern in a different data source than where it exists will cause a 404 error.
To solve this, we separate the search requests by data source on the client-side before sending them. The fetchSoon
function groups the requests based on their data source, creating multiple msearch requests, one for each data source.
This way, each msearch request is sent to the correct data source, avoiding 404 errors for indices that are in a different data source.
The internal msearch API needs to call msearch in different OpenSearch clients. They are also expensive when combined in the same request. That's why we only support one data source per request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @zhongnansu , I've looked at the issue you mentioned (#2174). This PR can solve the problem described there. We will add a new parameter called "data source id" to the internal msearch API. The way we manage queued search requests will also be improved in this PR. We will store the search requests, parameters, and ongoing promises separately for each data source. Feel free to review this PR. If anything needs to be changed or improved, I will update the code accordingly. |
@@ -36,7 +36,8 @@ import { ISearchOptions } from '../../index'; | |||
export function callClient( | |||
searchRequests: SearchRequest[], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the search request have the index pattern ? Or data source info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The search requests interface is not being updated. The data source ID is only present on the client side. It was added at line 359 and removed at line 86. The msearch API's search requests interface remains unchanged. The only modification to the internal msearch API was the addition of a new data_source_id
query parameter. This parameter will trigger a msearch
call on the specified data source.
@@ -39,9 +39,10 @@ import { LegacyFetchHandlers } from '../../../common/search/search_source'; | |||
* @internal | |||
*/ | |||
export function getCallMsearch({ http }: { http: HttpStart }): LegacyFetchHandlers['callMsearch'] { | |||
return async ({ body, signal }) => { | |||
return async ({ body, signal, dataSourceId }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this require modifying the msearch search params? if we have some?
@@ -67,16 +67,23 @@ export function registerMsearchRoute(router: IRouter, deps: SearchRouteDependenc | |||
}) | |||
), | |||
}), | |||
query: schema.object({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we actually just shove the data source id into each of the searches.body? and parse out that value to get the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good to me.
A couple thoughts:
- I'm a bit hesitant about modifying the existing
callMsearch
API. Maybe we could stuff thedataSourceId
into eachsearch.body
withinsearches
instead and parse it out on the server side? This could mitigate modifying existing APIs. - Alternatively, what about adding a new
_enhanced
msearch route instead of changing the original? That could keep things more isolated.
But I can be convinced that this ok to pass by query param. Make sure we have solid testing coverage, especially for both MDS and non-MDS scenarios. Leveraging the existing Cypress tests from @ananzh and @angle943 could be a good starting point.
Otherwise, nice work! Let me know what you think about those suggestions. Looking forward to getting this merged.
Description
This PR is for adding MDS for
_internal/msearch
. Adata_source_id
can be passed to the msearch API to determine which data source should be used to execute msearch with search requests.Issues Resolved
#2174
Screenshot
No UI changes
Testing the changes
yarn osd bootstrap --single-version loose
config/opensearch_dashboards.yml
yarn start --no-base-path
/app/management/opensearch-dashboards/dataSources
/app/home#/tutorial_directory
Changelog
Check List
yarn test:jest
yarn test:jest_integration