Searching with exact matches like page titles doesn't always return the right result in llamaindex-db #26

Open
k-allagbe opened this issue May 9, 2024 · 4 comments

@k-allagbe
Member

Description

If a page is indexed and we search for an exact match of a portion of its content, we should expect to receive the page in the results. However, we observe that this is not always the case.

image

Notebook: link

This is worth investigating.
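
One way to quantify the problem is to query every indexed page with its own title and list the pages that do not come back. The sketch below is a minimal, library-free illustration of that check. The `SearchFn` signature, the `find_exact_match_misses` helper, and the stand-in search function are all hypothetical and are not taken from the notebook.

```python
from typing import Callable, Dict, List

# Hypothetical signature: a search function takes a query string and
# returns the ids of the retrieved pages, best match first.
SearchFn = Callable[[str], List[str]]


def find_exact_match_misses(
    pages: Dict[str, str], search: SearchFn, top_k: int = 10
) -> List[str]:
    """Return ids of pages whose own title is not found in the top_k results."""
    misses = []
    for page_id, title in pages.items():
        results = search(title)[:top_k]
        if page_id not in results:
            misses.append(page_id)
    return misses


# Stand-in search used only to exercise the check: it ignores the query
# and always returns the same page, so every other page is a miss.
def always_first_page(query: str) -> List[str]:
    return ["page-1"]


pages = {"page-1": "First title", "page-2": "Second title"}
misses = find_exact_match_misses(pages, always_first_page)
```

Replacing `always_first_page` with a wrapper around the real search would give a list of the pages affected by this issue.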

@k-allagbe k-allagbe added this to Finesse May 9, 2024
@k-allagbe k-allagbe moved this to Todo in Finesse May 9, 2024
@ibrahim-kabir

@leejaeka any updates on this?

@leejaeka

Work in progress. Writing a custom retriever for a hybrid search solution.

@leejaeka

Built a new keyword index and a custom retriever for hybrid search. Hybrid search did not solve issue #26: it also failed to retrieve the page when queried with its exact title. I'm going to look into this a little more, but issue #26 may require a cs solution that matches the title against the query. Will discuss further with Guy once he is back.
MicrosoftTeams-image (7)
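
As an illustration of what matching the title against the query could look like, the sketch below puts pages whose normalized title equals the query ahead of the merged keyword and vector results. This is only one possible approach under stated assumptions, not the retriever from this thread: `normalize`, `hybrid_retrieve`, and the simple interleaving of the two rankings are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def hybrid_retrieve(
    query: str,
    titles: Dict[str, str],
    keyword_ids: List[str],
    vector_ids: List[str],
) -> List[str]:
    """Put pages whose title equals the query first, then merge both rankings."""
    wanted = normalize(query)
    merged = [pid for pid, title in titles.items() if normalize(title) == wanted]
    # Interleave the two ranked lists, skipping ids that were already added.
    for pair in zip_longest(keyword_ids, vector_ids):
        for pid in pair:
            if pid is not None and pid not in merged:
                merged.append(pid)
    return merged


titles = {"a": "Food Safety Action Plan", "b": "Annual Report", "c": "Inspection Guide"}
result = hybrid_retrieve(
    "food safety  action plan",  # same title, different case and spacing
    titles,
    keyword_ids=["b", "c"],  # neither ranking contains page "a"
    vector_ids=["c", "b"],
)
```

Here page `a` is returned first even though neither the keyword ranking nor the vector ranking contains it, which is the behaviour this issue asks for.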

@leejaeka

leejaeka commented Jun 4, 2024

2024-06-03 update

  • Found a solution: MetadataFilters, a filter class from LlamaIndex, lets you match on any metadata attached to a document before the index is built.
  • The code is pushed to the 17-include-metadata-in-embedding branch, in llamaindex-hybrid-search.ipynb.
  • The key lines of code are the following:

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Only keep documents whose "title" metadata equals this string exactly.
filters = MetadataFilters(
    filters=[
        ExactMatchFilter(
            key="title",
            value="Audit of the Project Management of the Food Safety Action Plan - Canadian Food Inspection Agency",
        ),
    ]
)

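and

from llama_index.core import Document

# curr is one page record (indexable by "content", "id", "title" and "subtitle").
# The metadata must be attached here, before the index is built.
node = Document(
    text=curr["content"],
    metadata={"id_": curr["id"], "title": curr["title"], "subtitle": curr["subtitle"]},
)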
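
To make the semantics of an exact-match metadata filter concrete, here is a library-free sketch of the idea: candidates are restricted to documents whose metadata value equals the requested value before any ranking happens. `Page` and `filter_exact` are hypothetical stand-ins and not LlamaIndex APIs. In LlamaIndex itself, a `MetadataFilters` object like the one above is typically passed when creating the retriever, for example `index.as_retriever(filters=filters)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    """Hypothetical stand-in for an indexed document with metadata."""

    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_exact(pages: List[Page], key: str, value: str) -> List[Page]:
    """Keep only pages whose metadata[key] equals value exactly."""
    return [p for p in pages if p.metadata.get(key) == value]


pages = [
    Page("Audit findings", {"title": "Audit of Project Management"}),
    Page("Yearly summary", {"title": "Annual Report"}),
]
matches = filter_exact(pages, key="title", value="Audit of Project Management")
# In this sketch the comparison is plain string equality, so a query that
# differs only in letter case does not match.
near_miss = filter_exact(pages, key="title", value="audit of project management")
```

Because the comparison in this sketch is strict equality, the query has to reproduce the stored title character for character, so any normalization of titles would need to happen before the metadata is attached.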
