DOCS-1186 AI Accelerator updates for 2.1.1 #6445

Open: djw-m wants to merge 3 commits into develop from DOCS-1186-ai-accelerator-2-1-1-release
+384 −102
...acy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/completions.mdx (104 additions, 0 deletions)

@@ -0,0 +1,104 @@
---
title: "Completions"
navTitle: "Completions"
description: "Completions is a text completion model that enables use of any OpenAI API compatible text generation model."
---

Model name: `completions`

Model aliases:

* `openai_completions`
* `nim_completions`

## About Completions

Completions is a text completion model that enables use of any OpenAI API compatible text generation model.
> **Review comment:** Since this is an interface for a common API format, I would probably drop the first half of this sentence and say something like:
It is suitable for chat/text transforms, text completion, and other text generation tasks.

Depending on the name of the model, the model provider will set defaults accordingly.

When invoked as `completions` or `openai_completions`, the model provider will default to using the OpenAI API.

When invoked as `nim_completions`, the model provider will default to using the NVIDIA NIM API.
## Supported aidb operations

* decode_text
* decode_text_batch

## Supported models

* Any text generation model that is supported by the provider.

## Supported OpenAI models

See a list of supported OpenAI models [here](https://platform.openai.com/docs/models#models-overview).

## Supported NIM models

* [ibm/granite-guardian-3.0-8b](https://build.nvidia.com/ibm/granite-guardian-3_0-8b)
* [ibm/granite-3.0-8b-instruct](https://build.nvidia.com/ibm/granite-3_0-8b-instruct)
* [ibm/granite-3.0-3b-a800m-instruct](https://build.nvidia.com/ibm/granite-3_0-3b-a800m-instruct)
* [meta/llama-3.3-70b-instruct](https://build.nvidia.com/meta/llama-3_3-70b-instruct)
* [meta/llama-3.2-3b-instruct](https://build.nvidia.com/meta/llama-3.2-3b-instruct)
* [meta/llama-3.2-1b-instruct](https://build.nvidia.com/meta/llama-3.2-1b-instruct)
* [meta/llama-3.1-405b-instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct)
* [meta/llama-3.1-70b-instruct](https://build.nvidia.com/meta/llama-3_1-70b-instruct)
* [meta/llama-3.1-8b-instruct](https://build.nvidia.com/meta/llama-3_1-8b-instruct)
* [meta/llama3-70b-instruct](https://build.nvidia.com/meta/llama3-70b)
* [meta/llama3-8b-instruct](https://build.nvidia.com/meta/llama3-8b)
* [nvidia/llama-3.1-nemotron-70b-instruct](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct)
* [nvidia/llama-3.1-nemotron-51b-instruct](https://build.nvidia.com/nvidia/llama-3_1-nemotron-51b-instruct)
* [nvidia/nemotron-mini-4b-instruct](https://build.nvidia.com/nvidia/nemotron-mini-4b-instruct)
* [nvidia/nemotron-4-340b-instruct](https://build.nvidia.com/nvidia/nemotron-4-340b-instruct)
* [google/shieldgemma-9b](https://build.nvidia.com/google/shieldgemma-9b)
* [google/gemma-7b](https://build.nvidia.com/google/gemma-7b)
* [google/codegemma-7b](https://build.nvidia.com/google/codegemma-7b)
## Creating the default model

There is no default model for Completions. You can create any supported model using the `aidb.create_model` function.

## Creating an OpenAI model

You can create any supported OpenAI model using the `aidb.create_model` function.

In this example, we are creating a GPT-4o model with the name `my_openai_model`:

```sql
SELECT aidb.create_model(
    'my_openai_model',
    'openai_completions',
    '{"model": "gpt-4o"}'::JSONB,
    '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
);
```
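Once the model exists, text generation goes through the `decode_text` operation listed above. The call below is only a sketch: it assumes `aidb.decode_text` takes the model name followed by a prompt, which is an assumption about the signature rather than something stated in this page, so check the aidb operations reference before relying on it.

```sql
-- Hypothetical invocation; assumes aidb.decode_text(model_name, prompt).
SELECT aidb.decode_text(
    'my_openai_model',
    'Summarize the benefits of connection pooling in one sentence.'
);
```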
## Creating a NIM model

```sql
SELECT aidb.create_model(
    'my_nim_completions',
    'nim_completions',
    '{"model": "meta/llama-3.2-1b-instruct"}'::JSONB,
    credentials=>'{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
);
```
## Model configuration settings

The following configuration settings are available for OpenAI models:

* `model` - The model to use.
* `url` - The URL of the model to use. This is optional and can be used to specify a custom model URL.
    * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`.
    * If `nim_completions` is the `model`, `url` defaults to `https://integrate.api.nvidia.com/v1/chat/completions`.
* `max_concurrent_requests` - The maximum number of concurrent requests to make to the OpenAI model. Defaults to `25`.

## Model credentials

The following credentials are required for these models:

* `api_key` - The API key to use for authentication.
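As a rough illustration of what the alias defaults above amount to, the sketch below shows the shape of the OpenAI-compatible chat completions request each alias resolves to. This is plain Python, not part of aidb, and the `build_request` helper is hypothetical; only the endpoint URLs and the OpenAI request shape come from the documentation above.

```python
import json

# Default endpoint per model alias, as documented in the configuration settings above.
DEFAULT_URLS = {
    "completions": "https://api.openai.com/v1/chat/completions",
    "openai_completions": "https://api.openai.com/v1/chat/completions",
    "nim_completions": "https://integrate.api.nvidia.com/v1/chat/completions",
}

def build_request(alias: str, model: str, prompt: str) -> tuple[str, str]:
    """Return the (url, body) of the OpenAI-style request an alias would send."""
    url = DEFAULT_URLS[alias]
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

url, body = build_request("nim_completions", "meta/llama-3.2-1b-instruct", "Hello")
print(url)  # https://integrate.api.nvidia.com/v1/chat/completions
```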
advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/nim_clip.mdx (54 additions, 0 deletions)

@@ -0,0 +1,54 @@
---
title: "CLIP"
navTitle: "CLIP"
description: "CLIP (Contrastive Language-Image Pre-training) is a model that learns visual concepts from natural language supervision."
---

Model name: `nim_clip`

## About CLIP

CLIP (Contrastive Language-Image Pre-training) is a model that learns visual concepts from natural language supervision. It is a zero-shot learning model that can be used for a wide range of vision and language tasks.

This specific model runs on NVIDIA NIM. More information about CLIP on NIM can be found [here](https://build.nvidia.com/nvidia/nvclip).
## Supported aidb operations

* encode_text
* encode_text_batch
* encode_image
* encode_image_batch

## Supported models

### NVIDIA NGC

* nvidia/nvclip (default)
## Creating the default model

```sql
SELECT aidb.create_model(
    'my_nim_clip_model',
    'nim_clip',
    credentials=>'{"api_key": "<API_KEY_HERE>"}'::JSONB
);
```

There is only one model, the default `nvidia/nvclip`, so we do not need to specify the model in the configuration.
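With the model created, the encode operations listed above produce embeddings. The call below is a sketch under the assumption that `aidb.encode_text` takes the model name and the text to embed; the exact signature is not given on this page, so verify it against the aidb operations reference.

```sql
-- Hypothetical invocation; assumes aidb.encode_text(model_name, text).
SELECT aidb.encode_text('my_nim_clip_model', 'a photo of a cat');
```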
## Model configuration settings

The following configuration settings are available for CLIP models:

* `model` - The NIM model to use. The default is `nvidia/nvclip` and is the only model available.
* `url` - The URL of the model to use. This is optional and can be used to specify a custom model URL. Defaults to `https://integrate.api.nvidia.com/v1/embeddings`.
* `dimensions` - Model output vector size. Defaults to `1024`.

## Model credentials

The following credentials are required if executing inside NVIDIA NGC:

* `api_key` - The NVIDIA Cloud API key to use for authentication.
...y_docs/edb-postgres-ai/ai-accelerator/models/supported-models/nim_reranking.mdx (48 additions, 0 deletions)

@@ -0,0 +1,48 @@
---
title: "Reranking (NIM)"
navTitle: "reranking"
description: "Reranking is a method in text search that sorts results by relevance to make them more accurate."
---

Model name: `nim_reranking`

## About Reranking

Reranking is a method in text search that sorts results by relevance to make them more accurate. It gives scores to documents using cross-attention mechanisms, improving the initial search results.

## Supported aidb operations

* rerank_text

## Supported models

### NVIDIA NGC

* nvidia/llama-3.2-nv-rerankqa-1b-v2 (default)
## Creating the default model

```sql
SELECT aidb.create_model(
    'my_nim_reranker',
    'nim_reranking',
    credentials=>'{"api_key": "<API_KEY_HERE>"}'::JSONB
);
```

There is only one model, the default `nvidia/llama-3.2-nv-rerankqa-1b-v2`, so we do not need to specify the model in the configuration.
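Once created, the model serves the `rerank_text` operation listed above. The following is only a sketch: it assumes `aidb.rerank_text` takes the model name, a query, and an array of candidate texts, which is an assumed signature rather than one documented on this page.

```sql
-- Hypothetical invocation; assumes aidb.rerank_text(model_name, query, texts[]).
SELECT aidb.rerank_text(
    'my_nim_reranker',
    'Which database supports vectors?',
    ARRAY['Postgres with pgvector stores embeddings.',
          'A filesystem stores files.']
);
```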
||
## Model configuration settings | ||
|
||
The following configuration settings are available for CLIP models: | ||
|
||
* `model` - The NIM model to use. The default is `nvidia/llama-3.2-nv-rerankqa-1b-v2` and is the only model available. | ||
* `url` - The URL of the model to use. This is optional and can be used to specify a custom model URL. Defaults to `https://ai.api.nvidia.com/v1/retrieval`. | ||
|
||
## Model credentials | ||
|
||
The following credentials are required if executing inside NVIDIA NGC: | ||
|
||
* `api_key` - The NVIDIA Cloud API key to use for authentication. |
> **Review comment:** I believe we need to add a warning somewhere after this example about the pgvector limitation, since it doesn't support storing over 2000 vectors. Otherwise, this example can be misleading.

> **Reply:** Trying to remember how I got to 8192... will bring the local box up and see about shrinking it.

> **Reply:** The code may indicate the maximum is 8192, but there is a non-obvious technical limitation of 2000.