Add score normalization and combination documentation #4985

Merged Sep 22, 2023 (36 commits)
Changes from 1 commit
Commits
6ab7d57
Add search phase results processor
kolchfa-aws Aug 28, 2023
79e9597
Add hybrid query
kolchfa-aws Aug 29, 2023
69d4274
Normalization processor additions
kolchfa-aws Sep 6, 2023
06dcb26
Add more details
kolchfa-aws Sep 6, 2023
f0d1667
Continue writing
kolchfa-aws Sep 7, 2023
0ff381f
Add more query then fetch details and diagram
kolchfa-aws Sep 7, 2023
32f7a6e
Small rewording
kolchfa-aws Sep 7, 2023
8b0bb3d
Leaner left nav headers
kolchfa-aws Sep 7, 2023
76e5164
Tech review feedback
kolchfa-aws Sep 7, 2023
2fe3464
Add semantic search tutorial
kolchfa-aws Sep 10, 2023
c353572
Reworded prerequisites
kolchfa-aws Sep 11, 2023
9cff096
Removed comma
kolchfa-aws Sep 11, 2023
7ee90cd
Rewording advanced prerequisites
kolchfa-aws Sep 11, 2023
7f360ba
Changed searching for ML model to shorter request
kolchfa-aws Sep 11, 2023
a898585
Update task type in register model response
kolchfa-aws Sep 11, 2023
6e1a73c
Changing example
kolchfa-aws Sep 12, 2023
b842fcf
Added huggingface prefix to model names
kolchfa-aws Sep 12, 2023
d7971cb
Change example responses
kolchfa-aws Sep 12, 2023
6ca775f
Added note about huggingface prefix
kolchfa-aws Sep 12, 2023
b16de8d
Update _ml-commons-plugin/semantic-search.md
kolchfa-aws Sep 12, 2023
f7bc213
Implemented doc review comments
kolchfa-aws Sep 12, 2023
c605b5a
List weights under parameters
kolchfa-aws Sep 12, 2023
1f89522
Remove one-shard warning for normalization processor
kolchfa-aws Sep 12, 2023
1bbb929
Apply suggestions from code review
kolchfa-aws Sep 13, 2023
e42f8ad
Implemented editorial comments
kolchfa-aws Sep 13, 2023
76a893b
Editorial comments and resolve merge conflicts
kolchfa-aws Sep 13, 2023
e126508
Change links
kolchfa-aws Sep 13, 2023
0c7b587
More editorial feedback
kolchfa-aws Sep 13, 2023
6d48caf
Change model-serving framework to ML framework
kolchfa-aws Sep 13, 2023
838b42f
Use get model API to check model status
kolchfa-aws Sep 13, 2023
9ead908
Implemented tech review comments
kolchfa-aws Sep 13, 2023
8f292f1
Added neural search description and diagram
kolchfa-aws Sep 14, 2023
6fd7468
More editorial comments
kolchfa-aws Sep 15, 2023
20cb3df
Add link to profile API
kolchfa-aws Sep 15, 2023
0c3f589
Addressed more tech review comments
kolchfa-aws Sep 18, 2023
76036c4
Implemented editorial comments on changes
kolchfa-aws Sep 18, 2023
12 changes: 9 additions & 3 deletions _ml-commons-plugin/semantic-search.md
@@ -19,7 +19,13 @@ In this tutorial, you'll learn how to:
It's helpful to understand the following terms before starting this tutorial:

- _Semantic search_: Employs neural search in order to determine the intention of the user's query in the search context and improve search relevance.
-- _Neural search_: When indexing documents containing text, neural search uses language models to generate vector embeddings from that text. When you then use a _neural query_, the query text is passed through a language model, and the resulting vector embeddings are compared with the document text vector embeddings to find the most relevant results.
+- _Neural search_: Facilitates vector search at ingestion time and at search time:
+- At ingestion time, neural search uses language models to generate vector embeddings from the text fields in the document. The documents containing both the original text field and the vector embedding of the field are then indexed in a k-NN index, as shown in the following diagram.
+
+![Neural search at ingestion time diagram]({{site.url}}{{site.baseurl}}/images/neural-search-ingestion.png)
+- At search time, when you then use a _neural query_, the query text is passed through a language model, and the resulting vector embeddings are compared with the document text vector embeddings to find the most relevant results, as shown in the following diagram.
+
+![Neural search at search time diagram]({{site.url}}{{site.baseurl}}/images/neural-search-query.png)
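
The two phases described above can be sketched in a few lines of Python. This is only an illustration, not how OpenSearch implements neural search: `toy_embed` is a hypothetical stand-in for a language model (it hashes words into a small vector), and the in-memory list stands in for a k-NN index. The names `DIM`, `toy_embed`, `ingest`, and `neural_query` are assumptions made for this sketch.

```python
import math
import re

DIM = 64  # toy dimension; the model used in this tutorial produces 768 dimensions


def toy_embed(text):
    """Hypothetical stand-in for a language model: hash words into a fixed-size vector."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z]+", text.lower()):
        # Deterministic bucket per word (built-in hash() is salted per process)
        bucket = sum(ord(c) for c in word) % DIM
        vec[bucket] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(docs):
    """Ingestion time: store each document's text together with its embedding."""
    return [{"text": d, "embedding": toy_embed(d)} for d in docs]


def neural_query(index, query_text, k=1):
    """Search time: embed the query and rank documents by vector similarity."""
    query_vec = toy_embed(query_text)
    ranked = sorted(index, key=lambda doc: cosine(query_vec, doc["embedding"]), reverse=True)
    return [doc["text"] for doc in ranked[:k]]


index = ingest(["wild west rodeo", "deep sea diving"])
print(neural_query(index, "west rodeo"))
```

A real language model places semantically related text close together even when no words are shared; the word-hashing stand-in here only matches on shared words, which is enough to show where the embedding step sits in each phase.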

## OpenSearch components for semantic search

@@ -97,15 +103,15 @@ Neural search requires a language model in order to generate vector embeddings f

### Step 1(a): Choose a language model

-For this tutorial, you'll use the [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert) model from Hugging Face. It is one of the pretrained sentence transformer models available in OpenSearch. You'll need the name, version, and dimension of the model to register it. You can find this information in the [pretrained model table]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) by selecting the `config_url` link corresponding to the model's TorchScript artifact:
+For this tutorial, you'll use the [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert) model from Hugging Face. It is one of the pretrained sentence transformer models available in OpenSearch that has shown one of the best results in benchmarking tests (for details, see [this blog](https://opensearch.org/blog/semantic-science-benchmarks/)). You'll need the name, version, and dimension of the model to register it. You can find this information in the [pretrained model table]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) by selecting the `config_url` link corresponding to the model's TorchScript artifact:
Review comment (Collaborator): Use "some" instead of "one" of the best results, and "see this blog post" instead of "see this blog".


- The model name is `huggingface/sentence-transformers/msmarco-distilbert-base-tas-b`.
- The model version is `1.0.1`.
- The number of dimensions for this model is `768`.
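
As a rough sketch of how the three values listed above might be carried into a registration request, the following Python builds a request body without sending it. The endpoint path, the `model_group_id` and `model_format` fields, and the helper name `build_register_request` are assumptions for illustration; check them against the ML Commons API reference for your OpenSearch version.

```python
import json

# Values taken from the list above
MODEL_NAME = "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b"
MODEL_VERSION = "1.0.1"
MODEL_DIMENSION = 768  # not sent here; needed later when creating the k-NN index


def build_register_request(model_group_id):
    """Return (method, path, body) for registering the pretrained model.

    The path and body field names are assumptions to verify against the
    ML Commons documentation; nothing is sent over the network.
    """
    body = {
        "name": MODEL_NAME,
        "version": MODEL_VERSION,
        "model_group_id": model_group_id,  # placeholder supplied by the caller
        "model_format": "TORCH_SCRIPT",  # assumed label for the TorchScript artifact
    }
    return "POST", "/_plugins/_ml/models/_register", body


method, path, body = build_register_request("<your-model-group-id>")
print(method, path)
print(json.dumps(body, indent=2))
```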

#### Advanced: Using a different model

-Alternatively, you can choose to use one of the [pretrained language models provided by OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) or your own custom model. For instructions on how to set up a custom model, see [ML framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).
+Alternatively, you can choose to use one of the [pretrained language models provided by OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) or your own custom model. For information about choosing a model, see [Further reading](#further-reading). For instructions on how to set up a custom model, see [ML framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/).

Take note of the dimensionality of the model because you'll need it when you set up a k-NN index.
{: .important}
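
To illustrate why the dimensionality matters, here is a minimal sketch that builds a k-NN index body whose vector field uses the model's dimension and checks that an embedding fits it. The field name `passage_embedding` and both helper functions are hypothetical, and the mapping shows only the `knn_vector` type and `dimension` setting; a real index may need additional method parameters.

```python
MODEL_DIMENSION = 768  # dimension of the model chosen in this tutorial


def build_knn_index_body(embedding_field, dimension):
    """Build an index body with a text field and a knn_vector field of the given dimension."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                embedding_field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def embedding_fits(index_body, embedding_field, vector):
    """Return True if the vector length matches the dimension declared in the mapping."""
    expected = index_body["mappings"]["properties"][embedding_field]["dimension"]
    return len(vector) == expected


index_body = build_knn_index_body("passage_embedding", MODEL_DIMENSION)
print(embedding_fits(index_body, "passage_embedding", [0.0] * 768))
print(embedding_fits(index_body, "passage_embedding", [0.0] * 384))
```

If you switch to a model with a different dimension, the `dimension` in the mapping has to change with it; otherwise the generated embeddings will not match the field.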
Binary file added images/neural-search-ingestion.png
Binary file added images/neural-search-query.png