[BUG] Sometimes aggregations are empty with terminate_after #13288
Comments
@sandeshkr419 do you want to take a stab at verifying this issue and confirming whether it's a bug?
Hi @Rmaan, Thanks for reporting this.
Thanks for taking the time to reply.
I put `terminate_after=1` but zero documents were processed. There are no buckets. This would happen with much higher numbers as well. Is this the intended behavior? Can we process FEWER documents than `terminate_after`?
I'm sure that was the behavior; it's clearly stated in the ES v7.10 docs. Also, my team tested this and it was the behavior until OpenSearch 2.9.
Sorry, I didn't get this part: does it mean that if I put `terminate_after` at 1000 docs, it might terminate before 1000 docs? Like it might stop after 500 docs?
Ideally, we should not be processing fewer documents than the `terminate_after` field; I'm still checking whether I missed a change in 2.10. @Rmaan, I also wanted to check whether concurrent search is enabled for the index the queries are run on. I vaguely remember some known bugs with concurrent search being reported and fixed in later iterations: https://github.com/opensearch-project/OpenSearch/pulls?q=is%3Apr+terminate_after+
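A minimal sketch of the intended semantics, assuming a plain `_search` call (the index and field names below are placeholders, not taken from this report): each shard is expected to collect up to `terminate_after` matching documents before stopping, and the response flags early termination.

```sh
# terminate_after caps the number of documents collected per shard;
# the question above is whether a shard may stop before reaching that cap.
curl -s 'http://localhost:9200/my-index/_search' \
  -H 'Content-Type: application/json' -d '
{
  "terminate_after": 1000,
  "query": { "match_all": {} },
  "aggs": {
    "by_keyword": { "terms": { "field": "some_keyword_field" } }
  }
}'
```

When early termination kicks in, the response carries `"terminated_early": true`, so the number of documents actually collected can be cross-checked against `hits.total.value`.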
I was mistaken, you are right. Striking off my responses above to avoid further confusion.
I was talking about #11643, where in certain cases it would read aggregation values directly from the Lucene index structure and not iterate over documents. In those cases (segments with no deletes, no …
I was also suspicious of concurrent search, but I checked and it was off in the settings, and as I understood when I set … So do you think this is a bug? For now we downgraded to OpenSearch 2.9 and the issue is gone, but we'd like to have access to new features such as hybrid search 😅 If you need any help with reproducing, our team can work on providing a sample document set and query.
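For reference, a sketch of how the concurrent segment search setting can be checked or turned off, assuming the `search.concurrent_segment_search.enabled` cluster setting exposed in recent 2.x releases (whether it applies to the affected 2.10/2.11 cluster here is an assumption):

```sh
# Show the setting (include defaults so an unset value is still visible)
curl -s 'http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true' \
  | grep concurrent_segment_search

# Explicitly disable it cluster-wide
curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search.concurrent_segment_search.enabled": false}}'
```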
Thanks for the details @Rmaan. Yeah, please help me with a sample document set & test query. I can try reproducing it on 2.10 and maybe a later 2.x version (if you already ran it on 2.13 with the same results, let me know as well; I won't spend time checking the differences then).
We reproduced it on 2.10, 2.11, and 2.13, but on 2.9 it works. I will try to provide a reproduction; as I understand it needs a fair number of docs, but I'll give it a go.
Hello, we made a reproducible pack for it. It's Go code that generates 5K documents and then runs the problematic search that results in no buckets. Download it here: opensearch_bug_proof.tar.gz. To run it, just do `docker-compose up` and then you will see the result.
It has no buckets in it, even though we had some docs matching. You can also connect it to your local OpenSearch by putting credentials at the top of the Go file. Can you also please open the ticket?
@sandeshkr419 Kind reminder. By the way, we also reproduced it with fewer complications: simply indexing a couple of docs and aggregating a keyword field with `terminate_after=1` gives no buckets, i.e. it terminates before processing even 1 doc per shard. This doesn't happen with integer fields, for example.
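A sketch of that minimal reproduction as described, with illustrative index and field names (not taken from the actual data set):

```sh
# A keyword field to aggregate on
curl -s -X PUT 'http://localhost:9200/repro' -H 'Content-Type: application/json' -d '
{ "mappings": { "properties": { "material": { "type": "keyword" } } } }'

# Index a couple of docs
curl -s -X POST 'http://localhost:9200/repro/_doc?refresh=true' \
  -H 'Content-Type: application/json' -d '{ "material": "wood" }'
curl -s -X POST 'http://localhost:9200/repro/_doc?refresh=true' \
  -H 'Content-Type: application/json' -d '{ "material": "steel" }'

# Aggregate on the keyword field with terminate_after=1;
# on the affected versions the buckets array comes back empty.
curl -s 'http://localhost:9200/repro/_search' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "terminate_after": 1,
  "aggs": { "by_material": { "terms": { "field": "material" } } }
}'
```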
Thanks @Rmaan for the reproduce SOP.
Hi @Rmaan - sorry for the delay in getting back to this. This was root-caused and fixed in OS 2.15; you can find details on the cause/fix here: #14208. I have verified this using the scripts you provided (thanks again for making this easier for me). Here is the search result in 2.13 with no buckets:
Here is the search result in 2.15 with buckets as expected (fixed):
I'm resolving this issue as it is fixed in 2.15. Kindly reopen if there are further concerns.
Thanks a lot. Cool, we will switch from 2.9 to 2.15 soon! Have a nice weekend.
Describe the bug
We found weird bugs in our search faceting after moving from Elasticsearch to OpenSearch 2.11. It seems that when `terminate_after` is passed, sometimes the returned buckets are completely empty (even though all processed docs should have a bucket) and sometimes it's far less than `terminate_after * primary_shard_count`, although the search is terminated early and all items have a value for the aggregation. We couldn't reproduce this issue with OpenSearch 2.9, but 2.10 was affected.
Related component
Search:Aggregations
To Reproduce
Exact reproduction is hard; it seems we need to have a couple of segments to see the problem. The reported issues occur when we aggregate on a keyword field while filtering on some integer field. When we aggregate on that same integer field, the issue doesn't happen.
Sample request:
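The shape of the request, sketched from the description above (a `materials_facet` terms aggregation on a keyword field, a filter on an integer field, and `terminate_after=1`); the index and field names are assumptions:

```sh
curl -s 'http://localhost:9200/products/_search' -H 'Content-Type: application/json' -d '
{
  "terminate_after": 1,
  "query": {
    "bool": { "filter": [ { "range": { "price": { "gte": 0 } } } ] }
  },
  "aggs": {
    "materials_facet": { "terms": { "field": "material" } }
  }
}'
```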
Sample response:
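Per the description, the response has roughly this shape (the values here are illustrative): hits are returned and the search terminates early, yet the `buckets` array is empty:

```json
{
  "terminated_early": true,
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "hits": [ { "_source": { "material": "wood", "price": 10 } } ]
  },
  "aggregations": {
    "materials_facet": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}
```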
Expected behavior
We should see a bucket in `aggregations.materials_facet.buckets`. As you can see, we have `terminate_after=1`, which means each shard should process at least 1 document; we have 3 shards, so in total 3 docs should be processed. This can be verified in `hits.total.value` and in the `hits` array. But as you can see, the aggregations don't match the documents visible in `hits`.

The issue goes away if we remove `terminate_after`, but that will hurt performance because we have a high number of documents. Terminating after 100K items is enough for us.

Additional Details
Host/Environment: