
Unable to use llm.Chat (OpenAI) with only one existing embedder for OpenAI #372

Closed
zivkovicn opened this issue Nov 29, 2023 · 9 comments

@zivkovicn
Contributor

zivkovicn commented Nov 29, 2023

Description

code example (chat LLM through Azure) with vectorstore:

package main

import (
	"context"
	"os"

	openAIChatEmbedder "github.com/tmc/langchaingo/embeddings/openai/openaichat"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/vectorstores"
	"github.com/tmc/langchaingo/vectorstores/pinecone"
)

type Config struct {
	ApiKey          string // Azure OpenAI key 1 or (key 2)
	BaseUrl         string // Azure OpenAI base url
	Model           string // Azure OpenAI model
	EmbeddingsModel string // Azure OpenAI embeddings model
}

func main() {
	ctx := context.Background()
	config := Config{
		ApiKey:          "",
		BaseUrl:         "",
		Model:           "",
		EmbeddingsModel: "",
	}

	chatLLM, err := openai.NewChat(
		openai.WithAPIType(openai.APITypeAzure),
		openai.WithBaseURL(config.BaseUrl),
		openai.WithToken(config.ApiKey),
		openai.WithModel(config.Model),
		openai.WithEmbeddingModel(config.EmbeddingsModel),
	)
	if err != nil {
		print(err)
		os.Exit(1)
	}

	emb, err := openAIChatEmbedder.NewChatOpenAI(openAIChatEmbedder.WithClient(chatLLM)) // we cannot do this anymore
	if err != nil {
		print(err)
		os.Exit(1)
	}

	vectorDB, err := pinecone.New(ctx, pinecone.WithEmbedder(emb))
	if err != nil {
		print(err)
		os.Exit(1)
	}

	retriever := vectorstores.ToRetriever(
		vectorDB,
		1,
		vectorstores.WithNameSpace("namespace"),
		vectorstores.WithEmbedder(emb), // even here we could pass in the embedder separately
	)
	
	// ... all other logic here, not that important ...
}

The code above shows how we can use llm.Chat with its embedder.

The whole handicap is that now, without "github.com/tmc/langchaingo/embeddings/openai/openaichat" (or the equivalent for vertexai), we cannot create the embedder needed for llm.Chat, or pass it in via the WithClient LLM option.

@tmc @eliben, continuing the discussion from the merged PR above.

This is the main blocker for us and for anyone who wants to use the chat LLM.

@eliben
Collaborator

eliben commented Nov 29, 2023

A chat just defers to its underlying client to do an embedding: https://github.com/tmc/langchaingo/blob/51a3a0a0f54a/llms/openai/openaillm_chat.go#L127

There's a huge amount of duplication between Chat and LLM already.

There are multiple approaches to solving this without adding even more duplication. For example, we could add an interface for types that provide CreateEmbedding, and then the code in embeddings could use that. How does that sound? I can try to prototype something later today or tomorrow.
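
For illustration, a minimal sketch of that idea (the interface name and exact signature here are assumptions, not a final design):

package embeddings

import "context"

// EmbedderClient is the proposed abstraction: any type that can
// create embeddings (including both the OpenAI LLM and Chat clients)
// would satisfy it implicitly.
type EmbedderClient interface {
	CreateEmbedding(ctx context.Context, texts []string) ([][]float32, error)
}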

@eliben
Collaborator

eliben commented Nov 29, 2023

... another option is just to let each Chat expose its underlying Client, and then that can be used for the embeddings package.
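
Something like this hypothetical accessor, for example (not in the library today; the openaiclient package and field names are assumptions):

// Hypothetical accessor exposing Chat's underlying client, so the
// embeddings package could call CreateEmbedding on it directly.
func (c *Chat) Client() *openaiclient.Client {
	return c.client
}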

@zivkovicn
Contributor Author

Yeah, that could resolve the issue. We should revert PR #362, and then once we have a better approach to satisfy the needs, we can do the cleanup (and remove the duplication), because at the moment everyone who was using llm.Chat is blocked from updating to the newer changes from the lib. cc @tmc

@tmc
Owner

tmc commented Nov 29, 2023

I can help with this a bit later today. If we don't have a simple enough forward fix, I'm open to reverting for now.

@eliben
Collaborator

eliben commented Nov 30, 2023

PTAL at #374 - this is a sketch of what I mean. It makes the EmbedderClient interface public (we can still move it back to internal later) and adds NewEmbedder, which takes an EmbedderClient and returns a type that itself implements Embedder.

Both LLM and Chat implement EmbedderClient implicitly, so @zivkovicn you can take a Chat you created with openai.NewChat and pass it to NewEmbedder. Its return value can then be passed into pinecone.WithEmbedder.
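
To make that concrete, a minimal sketch of the described flow, assuming the #374 API as just described (the Azure options from the original example are elided):

package main

import (
	"context"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/vectorstores/pinecone"
)

func main() {
	ctx := context.Background()

	// Same Azure configuration as in the original example, elided here.
	chatLLM, err := openai.NewChat(openai.WithAPIType(openai.APITypeAzure))
	if err != nil {
		log.Fatal(err)
	}

	// Chat implements EmbedderClient implicitly, so it can be passed
	// straight to NewEmbedder; no chat-specific embedder package needed.
	emb, err := embeddings.NewEmbedder(chatLLM)
	if err != nil {
		log.Fatal(err)
	}

	vectorDB, err := pinecone.New(ctx, pinecone.WithEmbedder(emb))
	if err != nil {
		log.Fatal(err)
	}
	_ = vectorDB
}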

Please try it and let me know if it works for you.


This approach can help reduce a lot of the duplication that currently exists in the embeddings package; happy to get to that once we hash out the basic design, which is using interfaces to decouple things. What the embeddings package really does is take lower-level clients that have CreateEmbedding and adorn them with some additional options (like stripping newlines and batching) to produce implementations of the more full-featured Embedder interface. #374 implements this with interfaces instead of duplicating the same code over and over and over.

@zivkovicn
Contributor Author

@eliben yeah, this is great! Thanks for putting this together! @tmc do you mind reviewing #374 and merging? IMO this abstraction solves both problems (usage of the Chat LLM, and the duplication).

@tmc
Owner

tmc commented Nov 30, 2023

Merged! Can we consider this resolved?

@zivkovicn
Contributor Author

I think that we are good. Will resolve this issue. Thanks both!

@eliben
Collaborator

eliben commented Nov 30, 2023

SGTM, I created #379 to track the followup cleanup and refactoring work.
