-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce neural highlighter framework #1193
base: main
Are you sure you want to change the base?
Introduce neural highlighter framework #1193
Conversation
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
1bc7505
to
9704f6c
Compare
public static final String NAME = "neural"; | ||
private static final String MODEL_ID_FIELD = "model_id"; | ||
|
||
private static MLCommonsClientAccessor mlCommonsClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This static pattern is not ideal for unit test. Can we receive accessor thought constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
|
||
@Override | ||
public boolean canHighlight(MappedFieldType fieldType) { | ||
// TODO: Implement actual condition check in subsequent PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be better to restrict it to TEXT field type and expand it later in subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
|
||
Map<String, Object> options = fieldContext.field.fieldOptions().options(); | ||
String modelId = getModelId(options); | ||
log.info("Using model ID: {}", modelId); // Will be replaced with actual model loading logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log will be flooded and obscure other important logs. Let't move it to debug level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to add a temp placeholder here, since client api function was planed in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc and a comment of all the important steps this method is trying to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vibrantvarun sure
log.info("Using model ID: {}", modelId); // Will be replaced with actual model loading logic | ||
log.info("Using ML client: {}", mlCommonsClient); // Will be replaced with actual model loading logic | ||
|
||
// TODO: Implement actual highlighting logic in subsequent PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all these TODO, it seems like this PR should not be merged to main now.
Let me stop here and review it when the code is fully completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. My original purpose was trying to make the PR more clear by different component(highlighter framework, ml client, text processing logic...). I could also combine all code in one big PR.
|
||
private String formatHighlight(String text) { | ||
// TODO: Implement user provided format options in subsequent PR | ||
return "<em>" + text + "</em>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use String format for safer concatenation
} | ||
} | ||
|
||
return query.toString().replaceAll("\\w+:", "").replaceAll("\\s+", " ").trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comment describing what you are trying to extract from string here
// For now, return a basic highlight of the field text | ||
Text[] fragments = new Text[] { new Text(formatHighlight(fieldText)) }; | ||
return new HighlightField(fieldContext.fieldName, fragments); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to narrow cases when we can have exceptions? Having global level try/catch is not that great for performance on search path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, let me try to adjust to narrow exception catch cases.
} | ||
|
||
private String getFieldText(FieldHighlightContext fieldContext) { | ||
Object value = fieldContext.hitContext.sourceLookup().extractValue(fieldContext.fieldName, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this second argument a fallback in case requested key is missing? If yes can't we return empty string here and avoid the null check on a line below?
|
||
Map<String, Object> options = fieldContext.field.fieldOptions().options(); | ||
String modelId = getModelId(options); | ||
log.info("Using model ID: {}", modelId); // Will be replaced with actual model loading logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc and a comment of all the important steps this method is trying to do.
Map<String, Object> options = fieldContext.field.fieldOptions().options(); | ||
String modelId = getModelId(options); | ||
log.info("Using model ID: {}", modelId); // Will be replaced with actual model loading logic | ||
log.info("Using ML client: {}", mlCommonsClient); // Will be replaced with actual model loading logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.info("Using ML client: {}", mlCommonsClient); // Will be replaced with actual model loading logic | |
log.debug("Using ML client: {}", mlCommonsClient); // Will be replaced with actual model loading logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why do we need this log this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment along with this line code
|
||
private String extractOriginalQuery(Query query) { | ||
if (query instanceof NeuralKNNQuery neuralQuery) { | ||
String originalText = neuralQuery.getOriginalQueryText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String originalText = neuralQuery.getOriginalQueryText(); | |
String originalQuery = neuralQuery.getOriginalQueryText(); |
this.knnQuery = knnQuery; | ||
this.originalQueryText = originalQueryText; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add BWC test to validate that it does not break anything in bwc environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will have a check on bwc side
@@ -107,6 +109,11 @@ public Builder rescoreContext(RescoreContext rescoreContext) { | |||
return this; | |||
} | |||
|
|||
public Builder originalQueryText(String originalQueryText) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc. We already have an issue for reducing the warning logs #1095
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible please add the javadoc on all public methods of this class.
For reference in this PR, in ml-commons we have PR for sentence highlighting model support : |
Description
This PR introduces the Neural Highlighter framework from OpenSearch core Highlighter, enhancing the semantic highlighting capabilities. Key changes include:
PR opensearch-project/ml-commons#3600 adds the functionality in ml-commons to use sentence highlighting model.
Related Issues
Part of #1182
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.