Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Missing bucket in terms aggregation with missing value #17391

Closed
noCharger opened this issue Feb 19, 2025 · 14 comments · Fixed by #17418
Closed

[BUG] Missing bucket in terms aggregation with missing value #17391

noCharger opened this issue Feb 19, 2025 · 14 comments · Fixed by #17418
Assignees
Labels
bug Something isn't working Search:Aggregations v3.0.0 Issues and PRs related to version 3.0.0

Comments

@noCharger
Copy link
Contributor

noCharger commented Feb 19, 2025

Describe the bug

When executing a terms aggregation with a specified missing value, the expected bucket for documents without the specified field is not appearing in the result. This is identified via an SQL IT comparing v3.0.0 alpha1 to non-alpha1 snapshot.

Related component

No response

To Reproduce

Example docs

{
  "_shards": {
    "total": 1,
    "failed": 0,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "hits": [
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "gender": "F",
          "name": {
            "ofHerName": 1,
            "firstname": "Daenerys",
            "lastname": "Targaryen"
          },
          "nickname": "Daenerys \"Stormborn\"",
          "titles": [
            "motherOfDragons",
            "queenOfTheAndals",
            "breakerOfChains",
            "Khaleesi"
          ],
          "house": "Targaryen",
          "parents": {
            "mother": "Rhaella",
            "father": "Aerys"
          }
        },
        "_id": "1",
        "_score": 1
      },
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "gender": "M",
          "name": {
            "firstname": "Eddard",
            "ofHisName": 1,
            "lastname": "Stark"
          },
          "titles": [
            "lordOfWinterfell",
            "wardenOfTheNorth",
            "handOfTheKing"
          ],
          "house": "Stark",
          "parents": {
            "mother": "Lyarra",
            "father": "Rickard"
          }
        },
        "_id": "2",
        "_score": 1
      },
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "gender": "M",
          "name": {
            "firstname": "Brandon",
            "ofHisName": 4,
            "lastname": "Stark"
          },
          "titles": [
            "princeOfWinterfell"
          ],
          "house": "Stark",
          "parents": {
            "mother": "Catelyn",
            "father": "Eddard"
          },
          "@wolf": "Summer"
        },
        "_id": "3",
        "_score": 1
      },
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "gender": "M",
          "name": {
            "firstname": "Jaime",
            "ofHisName": 1,
            "lastname": "Lannister"
          },
          "titles": [
            "kingSlayer",
            "lordCommanderOfTheKingsguard",
            "Ser"
          ],
          "house": "Lannister",
          "parents": {
            "mother": "Joanna",
            "father": "Tywin"
          }
        },
        "_id": "4",
        "_score": 1
      },
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "seat": "Dragonstone",
          "sigil": "Dragon",
          "words": "fireAndBlood",
          "hname": "Targaryen"
        },
        "_id": "5",
        "_score": 1
      },
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "seat": "Winterfell",
          "sigil": "direwolf",
          "words": "winterIsComing",
          "hname": "Stark"
        },
        "_id": "6",
        "_score": 1
      },
      {
        "_index": "opensearch-sql_test_index_game_of_thrones",
        "_source": {
          "seat": "CasterlyRock",
          "sigil": "lion",
          "words": "hearMeRoar",
          "hname": "Lannister"
        },
        "_id": "7",
        "_score": 1
      }
    ],
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 1
  },
  "took": 57,
  "timed_out": false
}

Example Query

{
  "from": 0,
  "size": 0,
  "_source": {
    "includes": [
      "COUNT"
    ],
    "excludes": []
  },
  "aggregations": {
    "nick": {
      "terms": {
        "field": "nickname",
        "missing": "no_nickname",
        "size": 10,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "COUNT_0": {
          "value_count": {
            "field": "_index"
          }
        }
      }
    }
  }
}

Example result on 3.0.0 alpha1

 1> Query Result:
  1> {
  1>   "_shards": {
  1>     "total": 1,
  1>     "failed": 0,
  1>     "successful": 1,
  1>     "skipped": 0
  1>   },
  1>   "hits": {
  1>     "hits": [],
  1>     "total": {
  1>       "value": 7,
  1>       "relation": "eq"
  1>     },
  1>     "max_score": null
  1>   },
  1>   "took": 78,
  1>   "timed_out": false,
  1>   "aggregations": {"nick": {
  1>     "doc_count_error_upper_bound": 0,
  1>     "sum_other_doc_count": 0,
  1>     "buckets": [
  1>       {
  1>         "COUNT(*)": {"value": 1},
  1>         "doc_count": 1,
  1>         "key": "daenerys"
  1>       },
  1>       {
  1>         "COUNT(*)": {"value": 1},
  1>         "doc_count": 1,
  1>         "key": "stormborn"
  1>       }
  1>     ]
  1>   }}
  1> }

Expected behavior

Expect result (from non-alpha1)

    {
      "_shards": {
        "total": 1,
        "failed": 0,
        "successful": 1,
        "skipped": 0
      },
      "hits": {
        "hits": [],
        "total": {
          "value": 7,
          "relation": "eq"
        },
        "max_score": null
      },
      "took": 76,
      "timed_out": false,
      "aggregations": {"nick": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "COUNT(*)": {"value": 6},
            "doc_count": 6,
            "key": "no_nickname"
          },
          {
            "COUNT(*)": {"value": 1},
            "doc_count": 1,
            "key": "daenerys"
          },
          {
            "COUNT(*)": {"value": 1},
            "doc_count": 1,
            "key": "stormborn"
          }
        ]
      }}
    }

Additional Details

No response

@peterzhuamazon
Copy link
Member

Hi @andrross @msfroh @prudhvigodithi could you help take a look at this?
As this is blocking the SQL bump to 3.0.0.0-alpha1.

Thanks!

@sandeshkr419
Copy link
Contributor

@jainankitk Can you take first stab at this?

@peterzhuamazon
Copy link
Member

Hi @sandeshkr419 @jainankitk could you help with this issue?

Thanks!

@kkewwei
Copy link
Contributor

kkewwei commented Feb 20, 2025

@peterzhuamazon @sandeshkr419 @noCharger I have tested with the newest main branch, it seems ok.

{
   "took": 74,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
   },
   "hits": {
      "total": {
         "value": 7,
         "relation": "eq"
      },
      "max_score": null,
      "hits": []
   },
   "aggregations": {
      "nick": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "no_nickname",
               "doc_count": 6,
               "COUNT_0": {
                  "value": 6
               }
            },
            {
               "key": "Daenerys \"Stormborn\"",
               "doc_count": 1,
               "COUNT_0": {
                  "value": 1
               }
            }
         ]
      }
   }
}
``

@andrross
Copy link
Member

@noCharger Can you take a look at this again? Are you able to reproduce it? If so, please share the exact commit that you're running. If you're able to reproduce it as a failing integration test or rest-spec test that would be great too.

@peterzhuamazon
Copy link
Member

My test on latest core main:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "nick": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "no_nickname",
          "doc_count": 6,
          "COUNT_0": {
            "value": 6
          }
        },
        {
          "key": "Daenerys \"Stormborn\"",
          "doc_count": 1,
          "COUNT_0": {
            "value": 1
          }
        }
      ]
    }
  }
}

@noCharger
Copy link
Contributor Author

noCharger commented Feb 20, 2025

@kkewwei @andrross @peterzhuamazon This issue reproduced from alpha1 snapshot https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.0.0-alpha1/10829/linux/x64/tar/builds/opensearch/dist/opensearch-min-3.0.0-alpha1-linux-x64.tar.gz

Steps:

  1. create index (This mapping is definied in sql plugin and no change so far https://github.com/search?q=repo%3Aopensearch-project%2Fsql%20nickname&type=code)
curl -X PUT "http://localhost:9200/opensearch-sql_test_index_game_of_thrones" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "properties": {
      "nickname": {
        "type": "text",
        "fielddata": true
      },
      "name": {
        "properties": {
          "firstname": {
            "type": "text",
            "fielddata": true
          },
          "lastname": {
            "type": "text",
            "fielddata": true
          },
          "ofHerName": {
            "type": "integer"
          },
          "ofHisName": {
            "type": "integer"
          }
        }
      },
      "house": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "gender": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}'
  1. Ingest docs
curl -X POST "http://localhost:9200/_bulk" -H "Content-Type: application/json" -d '{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"1"}}
{"gender":"F","name":{"ofHerName":1,"firstname":"Daenerys","lastname":"Targaryen"},"nickname":"Daenerys \"Stormborn\"","titles":["motherOfDragons","queenOfTheAndals","breakerOfChains","Khaleesi"],"house":"Targaryen","parents":{"mother":"Rhaella","father":"Aerys"}}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"2"}}
{"gender":"M","name":{"firstname":"Eddard","ofHisName":1,"lastname":"Stark"},"titles":["lordOfWinterfell","wardenOfTheNorth","handOfTheKing"],"house":"Stark","parents":{"mother":"Lyarra","father":"Rickard"}}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"3"}}
{"gender":"M","name":{"firstname":"Brandon","ofHisName":4,"lastname":"Stark"},"titles":["princeOfWinterfell"],"house":"Stark","parents":{"mother":"Catelyn","father":"Eddard"},"@wolf":"Summer"}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"4"}}
{"gender":"M","name":{"firstname":"Jaime","ofHisName":1,"lastname":"Lannister"},"titles":["kingSlayer","lordCommanderOfTheKingsguard","Ser"],"house":"Lannister","parents":{"mother":"Joanna","father":"Tywin"}}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"5"}}
{"seat":"Dragonstone","sigil":"Dragon","words":"fireAndBlood","hname":"Targaryen"}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"6"}}
{"seat":"Winterfell","sigil":"direwolf","words":"winterIsComing","hname":"Stark"}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"7"}}
{"seat":"CasterlyRock","sigil":"lion","words":"hearMeRoar","hname":"Lannister"}
'
  1. search
 curl -X GET "http://localhost:9200/opensearch-sql_test_index_game_of_thrones/_search?pretty" -H "Content-Type: application/json" -d '
{           
  "from": 0,
  "size": 0,  
  "_source": {   
    "includes": [
      "COUNT"
    ],            
    "excludes": []
  },               
  "aggregations": {
    "nick": {   
      "terms": {            
        "field": "nickname",     
        "missing": "no_nickname",
        "size": 10,        
        "min_doc_count": 1,      
        "shard_min_doc_count": 0,          
        "show_term_doc_count_error": false,
        "order": [
          {                 
            "_count": "desc"
          },
          {              
            "_key": "asc"
          }
        ]
      },               
      "aggregations": {
        "COUNT_0": {      
          "value_count": {   
            "field": "_index"
          }
        }
      }
    }
  }
}
'                       

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nick" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "daenerys",
          "doc_count" : 1,
          "COUNT_0" : {
            "value" : 1
          }
        },
        {
          "key" : "stormborn",
          "doc_count" : 1,
          "COUNT_0" : {
            "value" : 1
          }
        }
      ]
    }
  }
}

@peterzhuamazon
Copy link
Member

peterzhuamazon commented Feb 20, 2025

We just built another new build:
https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.0.0-alpha1/10835/linux/x64/tar/builds/opensearch/dist/opensearch-min-3.0.0-alpha1-linux-x64.tar.gz


{
  "took" : 72,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nick" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "daenerys",
          "doc_count" : 1,
          "COUNT_0" : {
            "value" : 1
          }
        },
        {
          "key" : "stormborn",
          "doc_count" : 1,
          "COUNT_0" : {
            "value" : 1
          }
        }
      ]
    }
  }
}

@peterzhuamazon
Copy link
Member

Changing index mapping from text to keyword somehow can query back the no_nickname bucket:

curl -X PUT "http://localhost:9200/opensearch-sql_test_index_game_of_thrones" -H "Content-Type: application/json" -d '{
  "mappings": {
    "properties": {
      "gender": { "type": "keyword" },
      "name": {
        "properties": {
          "firstname": { "type": "keyword" },
          "lastname": { "type": "keyword" },
          "ofHerName": { "type": "integer" },
          "ofHisName": { "type": "integer" }
        }
      },
      "nickname": { "type": "keyword" },
      "titles": { "type": "keyword" },
      "house": { "type": "keyword" },
      "parents": {
        "properties": {
          "mother": { "type": "keyword" },
          "father": { "type": "keyword" }
        }
      },
      "seat": { "type": "keyword" },
      "sigil": { "type": "keyword" },
      "words": { "type": "keyword" },
      "hname": { "type": "keyword" },
      "@wolf": { "type": "keyword" }
    }
  }
}'

@andrross
Copy link
Member

I'm able to reproduce this with a simplified setup:

curl -X PUT "http://localhost:9200/opensearch-sql_test_index_game_of_thrones" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "properties": {
      "nickname": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}'
curl -X POST "http://localhost:9200/_bulk" -H "Content-Type: application/json" -d '
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"1"}}
{"name":"Danerys","nickname":"Daenerys \"Stormborn\""}
{"index":{"_index":"opensearch-sql_test_index_game_of_thrones","_id":"2"}}
{"name":"Eddard"}
'
curl -X GET "http://localhost:9200/opensearch-sql_test_index_game_of_thrones/_search?pretty" -H "Content-Type: application/json" -d '
{           
  "from": 0,
  "size": 0,  
  "aggregations": {
    "nick": {   
      "terms": {            
        "field": "nickname",     
        "missing": "no_nickname"
      }
    }
  }
}
'

I do not get the "no_nickname" bucket. It appears this was introduced with the upgrade to Lucene 10 in commit 7c46f8f. If I go to the previous commit on main then I get the "no_nickname" bucket in the results as expected.

@peterzhuamazon
Copy link
Member

Can also confirm when change back to non-lucene10 commit the results are up.

@andrross
Copy link
Member

After some experimenting, it seems that if I take the second word out of the nickname field (change "nickname":"Daenerys \"Stormborn\"" to "nickname":"Daenerys") then I get the expected result.

@kkewwei
Copy link
Contributor

kkewwei commented Feb 21, 2025

After some experimenting, it seems that if I take the second word out of the nickname field (change "nickname":"Daenerys \"Stormborn\"" to "nickname":"Daenerys") then I get the expected result.

@andrross I'm debugging, can you assign it to me?

@kkewwei
Copy link
Contributor

kkewwei commented Feb 21, 2025

In GlobalOrdinalsStringTermsAggregator, we change the code: when the count > 0, we will collect the Id, but for the missing filed doc, the count must be 0, so we ignore the doc, which iolates the usage of missing

Image

I will pull request to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search:Aggregations v3.0.0 Issues and PRs related to version 3.0.0
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants