-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a bug to unflatten the doc with list of map with multiple entries correctly #1204
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1204 +/- ##
============================================
- Coverage 81.77% 81.71% -0.06%
+ Complexity 2512 1255 -1257
============================================
Files 190 95 -95
Lines 8564 4282 -4282
Branches 1436 718 -718
============================================
- Hits 7003 3499 -3504
+ Misses 1003 506 -497
+ Partials 558 277 -281 ☔ View full report in Codecov by Sentry. |
looks good to me. signoff your commit |
Please fix the failed DCO checks. |
few questions here:
|
text embedding processor and sparse encoding processor would be affected by this change since they both extend from InferenceProcessor, where unflatten is used for pre-processing the ingest document. |
@bzhangam an alternative to consider is: we could treat
as a single item to be pushed to stack, and and modify unflattenSingleItem. That will allow the existing logic to work as is. I'll leave it up to you to decide, as this will require additional changes |
Yeah all the inference processors are impacted by this bug. If we run into the bug the doc will lose some data and the missing data will not be stored to the source or index. The missing data is removed from the ingestDoc. |
I think that one line fix should be good enough. I feel it should be done that way. The bug is more like a typo thing. |
… correctly. Signed-off-by: Bo Zhang <bzhangam@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixing, LGTM.
Description
Fix a bug to unflatten the doc with list of map with multiple entries correctly.
Let's say we have a doc like:
If we define an ingest pipeline to create the embedding for the product_description we will try to unflatten the doc first to ensure the . in the field name will be handled properly. Before my fix the doc will be unflattened like:
We lose the product_description because when we handle the map in a list we incorrectly do
targetList.set(targetList.size() - 1, tempMap);
. With my fix the doc will be unflattened correctly:Related Issues
There is no related issue.
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.