You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently we are facing an issue from OS core while testing #1191 where document field updates are not resulting in regenerating embeddings when using TextImageEmbeddingProcessor. The root cause seems to be the ingestion isn't hitting the processor at all.
In this case the root cause is OS core, but we should have tests for TextEmbeddingProcessor/TextImageEmbeddingProcessor to verify that embeddings are updated correctly as well to catch and verified these problems earlier
public void testBulkUpdateUpdatesEmbeddings() throws Exception {
// Upload and load the model
String modelId = uploadTextEmbeddingModel();
loadModel(modelId);
// Create pipeline with text embedding processor that maps "text" to "text_embedding"
String pipelineConfig = "{\n"
+ " \"description\": \"text embedding pipeline for bulk update test\",\n"
+ " \"processors\": [\n"
+ " {\n"
+ " \"text_embedding\": {\n"
+ " \"model_id\": \"%s\",\n"
+ " \"field_map\": {\n"
+ " \"text\": \"text_embedding\"\n"
+ " }\n"
+ " }\n"
+ " }\n"
+ " ]\n"
+ "}\n";
createPipelineProcessor(String.format(pipelineConfig, modelId), PIPELINE_NAME, modelId, null);
// Create index with pipeline
createIndexWithPipeline(INDEX_NAME, "IndexMappings.json", PIPELINE_NAME);
// Index original document
String documentId = "bulk-update-test";
String originalText = "{ \"text\": \"hello world\" }";
ingestDocument(INDEX_NAME, originalText, documentId);
// Get the original embedding
Map<String, Object> originalDoc = (Map<String, Object>) getDocById(INDEX_NAME, documentId).get("_source");
List<Double> originalEmbedding = (List<Double>) originalDoc.get("text_embedding");
assertNotNull(originalEmbedding);
// Update document using _bulk API with pipeline parameter
String updatedText = "bye world";
String bulkRequest = "{ \"update\": { \"_index\": \""
+ INDEX_NAME
+ "\", \"_id\": \""
+ documentId
+ "\", \"pipeline\": \""
+ PIPELINE_NAME
+ "\" } }\n"
+ "{ \"doc\" : { \"text\": \""
+ updatedText
+ "\" } }\n";
Map<String, String> params = new HashMap<>();
params.put("refresh", "true");
makeRequest(
client(),
"POST",
"_bulk",
params,
toHttpEntity(bulkRequest),
ImmutableList.of(new BasicHeader(HttpHeaders.USER_AGENT, "Kibana"))
);
// Get the updated embedding
Map<String, Object> updatedDoc = (Map<String, Object>) getDocById(INDEX_NAME, documentId).get("_source");
List<Double> updatedEmbedding = (List<Double>) updatedDoc.get("text_embedding");
assertNotNull(updatedEmbedding);
// Verify the text was updated
assertEquals(updatedText, updatedDoc.get("text"));
// Verify the embedding has changed
assertNotEquals(originalEmbedding, updatedEmbedding);
}
Test fails with
java.lang.AssertionError: Values should be different. Actual: [0.44230038, ....
What solution would you like?
Adding additional checks like the above in reindex integration tests for TextEmbeddingProcessor and TextImageEmbeddingProcessor added in #1075 to verify document embeddings are updated when the documents are updated
What alternatives have you considered?
Not having these tests, adding tests in core instead
Do you have any additional context?
Will update with core issue when it is cut or resolved
The text was updated successfully, but these errors were encountered:
martin-gaievski
changed the title
[FEATURE] Add ITs to check embeddings are updated when using TextEmbeddingProcessors in document reindex/update scenarios
Add ITs to check embeddings are updated when using TextEmbeddingProcessors in document reindex/update scenarios
Feb 28, 2025
Is your feature request related to a problem?
Currently we are facing an issue from OS core while testing #1191 where document field updates are not resulting in regenerating embeddings when using TextImageEmbeddingProcessor. The root cause seems to be the ingestion isn't hitting the processor at all.
In this case the root cause is OS core, but we should have tests for TextEmbeddingProcessor/TextImageEmbeddingProcessor to verify that embeddings are updated correctly as well to catch and verified these problems earlier
@junqiu-lei verified issue with the following:
Test fails with
What solution would you like?
Adding additional checks like the above in reindex integration tests for TextEmbeddingProcessor and TextImageEmbeddingProcessor added in #1075 to verify document embeddings are updated when the documents are updated
What alternatives have you considered?
Not having these tests, adding tests in core instead
Do you have any additional context?
Will update with core issue when it is cut or resolved
The text was updated successfully, but these errors were encountered: