diff --git a/docs/context-segmentation.png b/docs/context-segmentation.png new file mode 100644 index 0000000..f7351c1 Binary files /dev/null and b/docs/context-segmentation.png differ diff --git a/docs/initial-schema-scope.md b/docs/old/initial-schema-scope.md similarity index 97% rename from docs/initial-schema-scope.md rename to docs/old/initial-schema-scope.md index 02aa65c..5f0d927 100644 --- a/docs/initial-schema-scope.md +++ b/docs/old/initial-schema-scope.md @@ -1,5 +1,7 @@ # Initial Schema Scope +This is the initial scope for AgencyAI. This project reuses the codebase for AgencyAI, so this documentation may still be relevant. + This is a SaaS app which is a simple UI for managing and building multiagent AI systems. Here is a rough outline of the how the app works technically: diff --git a/docs/message-management-scope.md b/docs/old/message-management-scope.md similarity index 95% rename from docs/message-management-scope.md rename to docs/old/message-management-scope.md index 0490155..85163d2 100644 --- a/docs/message-management-scope.md +++ b/docs/old/message-management-scope.md @@ -1,5 +1,7 @@ # Message Management +This is the initial scope for AgencyAI. This project reuses the codebase for AgencyAI, so this documentation may still be relevant. + This is a UI for building and managing multiagent AI. There are multiple agents which will be talking to each other. diff --git a/docs/rag-with-memory.png b/docs/rag-with-memory.png new file mode 100644 index 0000000..c1950ef Binary files /dev/null and b/docs/rag-with-memory.png differ diff --git a/docs/setup.md b/docs/setup.md new file mode 100644 index 0000000..2415123 --- /dev/null +++ b/docs/setup.md @@ -0,0 +1,52 @@ +# Setup + +0. Open this `readme.md` +1. `yarn` +2. `cp src/backend/secrets/.env.defaults src/backend/secrets/.env` +3. Add environment variables to `.env` + 1. Use `openssl rand -base64 32` to generate secrets. +4. `npm run clean-sql` +5. `npm start` +6. 
Mark `./build`, `./src/backend/build`, `./src/common/build`, `./src/web/build` as excluded from IDE analysis. +7. Mark `./src/build/secrets` as excluded from IDE analysis. + +### Deploying + +1. Set your custom domain name in `serverless.yml`. +2. Create a certificate for your domain in [AWS Certificate Manager](https://us-east-1.console.aws.amazon.com/acm/home). +3. `npm run deploy` + +### Installing MySQL (MacOS) + +1. `brew install mysql` + +### rebuild-aio + +`rebuild-aio` is used for several purposes: +* Building and bundling files into a build folder or artifact which will be zipped and uploaded when deployed. +* Files can be transpiled during the build process using a custom `scripts/transformer.js` script. +* During local development, it will monitor file changes, rebuild, and restart the app. +* It handles graceful restarts and clean shutdowns. See the [`rebuild-aio`](https://www.npmjs.com/package/rebuild-aio) documentation for more details. +* It allows multiple entry points, which is useful for starting the main app and also for generating other files. + +### Entry Points + +There are multiple entry points: +* serverlessServer.js - The main entry point when the app is deployed. +* localServer.js - The main entry point during local development. +* generateSql.js - Used for generating a `.sql` script for creating tables on fresh startup. +* generateSchema.js - Generates a `schema.graphql` file in the monorepo root so that IDE tools can be aware of the schema. +* mysql/localClean.js - This deletes the database so that the next start is fresh. + +It is worth noting that the generateSchema.js entry point does not cause the MySQL database to start running. +In general, it should be possible to import the executable schema from schema.js without causing the database to start. +The database only starts whenever the first DB query is made.
+When the server.js entry points are used, one of the first things they do is import `migrate.js`, and this is the first DB query. + +## Workflows + +### Changing the schema + +### Creating a resolver + +### Creating a MySQL query diff --git a/readme.md b/readme.md index 2415123..2946ea4 100644 --- a/readme.md +++ b/readme.md @@ -1,52 +1,52 @@ -# Setup +# RAG with Memory -0. Open this `readme.md` -1. `yarn` -2. `cp src/backend/secrets/.env.defaults src/backend/secrets/.env` -3. Add environment variables to `.env` - 1. Use `openssl rand -base64 32` to generate secrets. -4. `npm run clean-sql` -5. `npm start` -6. Mark `./build`, `./src/backend/build`, `./src/common/build`, `./src/web/build` as excluded from IDE analysis. -7. Mark `./src/build/secrets` as excluded from IDE analysis. +This project uses a RAG technique to implement long term memory. There is also an implementation for short term memory. Together, these enable a chatbot which only requires a single chat window for interaction. There are no new chats. Users can simply open the app and begin talking. -### Deploying +## Use Cases -1. Set your custom domain name in `serverless.yml`. -2. Create a certificate for your domain in [AWS Certificate Manager](https://us-east-1.console.aws.amazon.com/acm/home). -3. `npm run deploy` +This is useful as a convenience in some cases, and in others, it may be crucial. -### Installing MySQL (MacOS) +### No Chat Management -1. `brew install mysql` +Some chatbots allow users to create new chats. Each chat is independent of the other chats, and this can be useful to prevent the chatbot from getting confused with irrelevant information. However, this kind of feature requires the end user to spend time doing chat management. They might have a chat which is dedicated to a certain topic, and they need to dig through past chats to find it. Chat management is a bookkeeping problem.
-### rebuild-aio +As a convenience, a chatbot which only requires a single chat window for interaction is useful. Chat management is no longer a concern for the end user. This can be seen as offloading the bookkeeping work to the AI system. -`rebuild-aio` is used for several purposes: -* Building and bundling files into a build folder or artifact which will be zipped and uploaded when deployed. -* Files can be transpiled during the build process using a custom `scripts/transformer.js` script. -* During local development, it will monitor file changes, rebuild, and restart the app. -* It handles graceful restarts and clean shutdowns. See [`rebuild-aio`](https://www.npmjs.com/package/rebuild-aio) documentation for more details. -* It allows multiple entry points which is useful for starting the main app, but also potentially generating other files. +The benefit new chats provide, that is, starting with a fresh context to avoid confusing the bot, can still be achieved with a single chat window. The user could simply say, "Let's change topics", and the memory system will automatically adjust the context appropriately. -### Entry Points +### Smart Speakers -There are multiple entry points: -* serverlessServer.js - The main entry point when the app is deployed. -* localServer.js - The main entry point during local development. -* generateSql.js - Used for generating a `.sql` script for creating tables on fresh startup. -* generateSchema.js - Generates a `schema.graphql` file in the monorepo root so that IDE tools can be aware of the schema. -* mysql/localClean.js - This deletes the database so that the next start is fresh. +For smart speaker devices, there is no UI the user can access to find what they are looking for. Providing a means for interaction in a way that is natural for the user is crucial for enabling the AI to provide services such as knowledge base management, brainstorming, and providing tailored suggestions.
-It is worth noting that the generateSchema.js entry point does not cause the MySQL database to start running. -In general, it should be possible to import the executable schema from schema.js without causing the database to start. -The database only starts whenever the first DB query is made. -When the server.js entry points are used, one of the first things they do is import `migrate.js`, and this is the first DB query. +## Privacy: Open Source and Edge Computing -## Workflows +The ability of long term memory AI systems to collect user data is a privacy concern. A proprietary system which collects mass amounts of user data over many years could be a gold mine for a company but a privacy nightmare for the user, enabling things such as sentiment analysis, ad targeting, health predictions, or imitation. -### Changing the schema +One way for users to benefit from AI without worrying about data collection is an open source AI system which runs on edge devices. Such a system runs on the user's device, and the user has full control over the data. The AI system is privacy aligned and careful not to send personal information to a server, and the user can inspect the source code and verify that the AI system is not distributing data. -### Creating a resolver +## Short Term Memory -### Creating a MySQL query +By giving the chatbot short term memory, limitations imposed by the token limit can be overcome. For comparison, an implementation of a chatbot that uses a sliding context window where past messages are simply truncated causes the bot to forget the beginning of the conversation, leading to inaccurate responses. Important instructions established at the beginning of the conversation may be lost, leading the AI to suddenly behave as if it had never received those instructions.
+ +Here, a sliding window is still used, but truncated messages are summarized into a section of the context window dedicated to the short term memory summary. If we can fine-tune a model to perform short term memory summarization well, then the AI will be able to remember the important details or instructions of the conversation, even if the messages where those details were initially introduced have been truncated. + +## Long Term Memory + +To support features such as knowledge base management, brainstorming, and providing tailored suggestions, a long term memory system is needed. This is a system which can remember information from past conversations and use that information to inform future conversations. + +The RAG technique is used to implement long term memory. All user inputs are annotated with vector embeddings, and then when a new user input is received, a section in the context window dedicated to long term memory is updated with relevant information. This section is then used to inform the AI's response. + +## Chat Iteration + +A single chat iteration is a loop starting with the user inputting a new message, and ending with the AI outputting a response. + +When a new message is sent to the chatbot, it will: +1. Store the message in the chat logs +2. Prepare the context window +3. Generate a response + +A background process will eventually annotate the newly stored chat message with vector embeddings, making it available to long term retrieval.
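The three steps above can be sketched roughly as follows. This is a toy illustration only, not the project's actual code: all names here are hypothetical, `retrieveRelevant` stands in for vector-embedding retrieval, and `summarizeTruncated` stands in for the fine-tuned short term memory summarizer.

```javascript
// Toy sketch of a single chat iteration (all names are hypothetical).
// The context window is segmented into: system prompt, long term memory,
// short term memory summary, and recent messages.

const RECENT_LIMIT = 8 // stand-in for a real token budget

const chatLog = [] // all messages ever stored

// 1. Store in chat logs.
function storeMessage(message) {
  chatLog.push(message)
}

// Stand-in for vector-embedding retrieval: rank older messages by word overlap.
function retrieveRelevant(input, limit = 2) {
  const words = new Set(input.toLowerCase().split(/\s+/))
  return chatLog
    .slice(0, -1) // exclude the message just stored
    .map((m) => ({
      text: m.text,
      score: m.text
        .toLowerCase()
        .split(/\s+/)
        .filter((w) => words.has(w)).length,
    }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.text)
}

// Stand-in for the fine-tuned summarizer of truncated messages.
function summarizeTruncated(messages) {
  return messages.map((m) => m.text).join('; ')
}

// 2. Prepare the context window, segment by segment.
function prepareContext(input) {
  const recent = chatLog.slice(-RECENT_LIMIT)
  const truncated = chatLog.slice(0, -RECENT_LIMIT)
  return [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'system', content: `Long term memory: ${retrieveRelevant(input).join(' | ')}` },
    { role: 'system', content: `Short term summary: ${summarizeTruncated(truncated)}` },
    ...recent.map((m) => ({ role: m.role, content: m.text })),
  ]
}

function chatIteration(input) {
  storeMessage({ role: 'user', text: input })
  // 3. Generate a response: the prepared context would be streamed to the model here.
  return prepareContext(input)
}
```

In this codebase, the prepared segments roughly correspond to the `context` array that gets passed to `InferenceRest.relayChatCompletionStream`, and real retrieval runs against the embeddings produced by the background annotation process.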
+ +![Context Segmentation](/docs/context-segmentation.png) + +![RAG with Memory](/docs/rag-with-memory.png) diff --git a/src/backend/src/agent/AgentMind.js b/src/backend/src/agent/AgentMind.js index b22bb5d..84d5936 100644 --- a/src/backend/src/agent/AgentMind.js +++ b/src/backend/src/agent/AgentMind.js @@ -35,9 +35,9 @@ import type { MessageSQL, } from '../schema/Message/MessageSchema.js' import MessageInterface from '../schema/Message/MessageInterface.js' -import type { GPTMessage } from '../rest/ChatGPTRest.js' -import ChatGPTRest from '../rest/ChatGPTRest.js' -import type { ChatCompletionsResponse } from '../rest/ChatGPTRest.js' +import type { GPTMessage } from '../rest/InferenceRest.js' +import InferenceRest from '../rest/InferenceRest.js' +import type { ChatCompletionsResponse } from '../rest/InferenceRest.js' import { MessageRole, MessageType } from '../schema/Message/MessageSchema.js' import type { AppendMessageOutput, @@ -47,12 +47,12 @@ import AgentInterface from '../schema/Agent/AgentInterface.js' import AgentConversationInterface from '../schema/AgentConversation/AgentConversationInterface.js' import { getCallbacks } from '../websocket/callbacks.js' import { dequeue, enqueue } from './mindQueue.js' +import type { UserSQL } from '../schema/User/UserSchema.js' export default class AgentMind { static chatIteration( // These inputs should be verified for ownership already: - userId: string, - openAiKey: string, + user: UserSQL, agencyId: number, currentAgentVersionId: number, agencyConversationId: string, @@ -62,14 +62,10 @@ export default class AgentMind { onMessageFromAgent: (output: AppendMessageOutput) => any, onUpdateMessage: (output: UpdateMessageOutput) => any, ): void { + const userId = user.userId + Promise.resolve().then(async () => { try { - // console.debug('chatIteration') - - // todo: whenever an agent receives a new message, - // if it is already in the middle of a chatIteration, - // then the incoming message is queued. 
- const agents = await AgentInterface.getAll(agencyId) const agentMap: { [agentVersionId: number]: AgentSQL } = {} let managerAgent @@ -89,186 +85,135 @@ export default class AgentMind { throw new Error('Agent not found.') } + // See readme.md for an overview of the chat iteration process. + // Get all messages for this conversation: const startingMessages = await MessageInterface.getAll( agentConversationId, ) - /* - The following section contains extra messages we append to the end of every user message. - This can be usful for nullifying any instruction by the end user to change behavior, - and also to reinforce desired behavior. - * */ - // This extra message was needed to get gpt-3.5-turbo to start responding in JSON: - // originalGptMessages.push({ - // role: 'user', - // content: 'Respond in JSON format.', - // }) - - const toList = await generateToListWithRetry( - openAiKey, - currentAgent.model, - startingMessages, - agentConversationId, + const context: Array = [] + + let newIterationWillBeStarted = false + + const envokeMind: ?EnvokeMind = await generateResponse( + user, + agencyId, currentAgent, managerAgent, - agentMap, - 3, - ) - - const toListMessage = await MessageInterface.insert( - currentAgent.agentId, + agencyConversationId, agentConversationId, - MessageRole.SYSTEM, - { - fromAgentId: currentAgent.versionId, - text: JSON.stringify(toList), + context, + (output: AppendMessageOutput) => onMessageFromAgent(output), + (output: UpdateMessageOutput) => { + onUpdateMessage(output) }, - ) - - const messagesWithToList = [...startingMessages, toListMessage] - - let newIterationWillBeStarted = false - // Create a blank Message row for each agent in the toList. - for (const nextAgentVersionId of toList.to) { - if (nextAgentVersionId === currentAgent.versionId) { - // Agent's aren't supposed to send messages to themselves. - console.error('Agent is sending a message to itself.') - // todo: send this error to the agent as a system message. 
Maybe the agent will learn not to repeat this mistake. - continue - } - if (nextAgentVersionId !== null && !agentMap[nextAgentVersionId]) { - console.error('Agent is sending an agent that does not exist.') - continue - } - - // $FlowFixMe - console.log( - `Building response: ${currentAgent.versionId} -> ${ - nextAgentVersionId ?? 'end user' - }`, - ) - - // For each blank message, each going to a different party, - // stream a completion for the content of the response. - - const envokeMind: ?EnvokeMind = await generateResponse( - openAiKey, - agencyId, - currentAgent, - managerAgent, - agencyConversationId, - agentConversationId, - messagesWithToList, - nextAgentVersionId, - (output: AppendMessageOutput) => onMessageFromAgent(output), - (output: UpdateMessageOutput) => { - onUpdateMessage(output) - }, - ).then(async (response: MessageSQL): Promise => { - const nextPrompt = JSON.parse(response.data.text).text - - if (nextAgentVersionId === null) { - console.log('Response to end user.') - } - - // When an iteration is complete, we need to start a new iteration for each agent in the toList. - if (nextAgentVersionId !== null) { - const nextAgent = agentMap[nextAgentVersionId] - if (!nextAgent) { - console.error( - `Unable to find agent given (agencyId, nextAgentVersionId): (${agencyId}, ${nextAgentVersionId})`, - ) - return - } - - const nextAgentConversation = - await AgentConversationInterface.get( - agencyConversationId, - nextAgent.agentId, - ) - if (!nextAgentConversation) { - console.error( - `Unable to find agentConversation for agent ${nextAgent.versionId}`, - ) - return - } - - // Send nextPrompt to next agent as a GetToList message. 
- const userMessageData: MessageData = { - fromAgentId: currentAgent.versionId, - toAgentId: nextAgentVersionId, - text: JSON.stringify({ - type: MessageType.GetToList, - messages: [ - { - from: currentAgent.versionId, - text: nextPrompt, - }, - ], - }), - } - const userMessage: MessageSQL = await MessageInterface.insert( - nextAgent.agentId, - nextAgentConversation.agentConversationId, - // When an agent sends a message to another agent, it is a user message. - // Responses from agents are assistant messages. - MessageRole.USER, - userMessageData, - ) - - await MessageInterface.linkMessages( - response.messageId, - userMessage.messageId, - ) - userMessage.linkedMessageId = response.messageId - response.linkedMessageId = userMessage.messageId - - // Update the client for the newly linkedMessageId: - onUpdateMessage({ - agencyId, - chatId: agencyConversationId, - managerAgentId: managerAgent.agentId, - agentId: currentAgent.agentId, - message: response, - }) - - // Send this message to the client so that it can be displayed. - const output: AppendMessageOutput = { - agencyId: agencyId, - chatId: agencyConversationId, - managerAgentId: managerAgent.agentId, - message: userMessage, - } - onMessageFromAgent(output) - - console.log( - `Queued message: ${currentAgent.versionId} -> ${nextAgentVersionId}`, - ) - return () => - AgentMind.chatIteration( - userId, - openAiKey, - agencyId, - nextAgentVersionId, - agencyConversationId, - nextAgentConversation.agentConversationId, - nextPrompt, - onMessageFromAgent, - onUpdateMessage, - ) - } - }) - if (envokeMind) { - await enqueue(envokeMind) - newIterationWillBeStarted = true - } - } - - const nextEnvocation = await dequeue() - if (nextEnvocation) { - nextEnvocation() - } + ).then(async (response: MessageSQL): Promise => { + // The following is stuff to do after the response has been generated. + // It was from AgencyAI, where agents can talk to other agents. + // This is not needed for this project. 
+ // const nextPrompt = JSON.parse(response.data.text).text + // if (nextAgentVersionId === null) { + // console.log('Response to end user.') + // } + // When an iteration is complete, we need to start a new iteration for each agent in the toList. + // if (nextAgentVersionId !== null) { + // const nextAgent = agentMap[nextAgentVersionId] + // if (!nextAgent) { + // console.error( + // `Unable to find agent given (agencyId, nextAgentVersionId): (${agencyId}, ${nextAgentVersionId})`, + // ) + // return + // } + // + // const nextAgentConversation = + // await AgentConversationInterface.get( + // agencyConversationId, + // nextAgent.agentId, + // ) + // if (!nextAgentConversation) { + // console.error( + // `Unable to find agentConversation for agent ${nextAgent.versionId}`, + // ) + // return + // } + // + // // Send nextPrompt to next agent as a GetToList message. + // const userMessageData: MessageData = { + // fromAgentId: currentAgent.versionId, + // toAgentId: nextAgentVersionId, + // text: JSON.stringify({ + // type: MessageType.GetToList, + // messages: [ + // { + // from: currentAgent.versionId, + // text: nextPrompt, + // }, + // ], + // }), + // } + // const userMessage: MessageSQL = await MessageInterface.insert( + // nextAgent.agentId, + // nextAgentConversation.agentConversationId, + // // When an agent sends a message to another agent, it is a user message. + // // Responses from agents are assistant messages. 
+ // MessageRole.USER, + // userMessageData, + // ) + // + // await MessageInterface.linkMessages( + // response.messageId, + // userMessage.messageId, + // ) + // userMessage.linkedMessageId = response.messageId + // response.linkedMessageId = userMessage.messageId + // + // // Update the client for the newly linkedMessageId: + // onUpdateMessage({ + // agencyId, + // chatId: agencyConversationId, + // managerAgentId: managerAgent.agentId, + // agentId: currentAgent.agentId, + // message: response, + // }) + // + // // Send this message to the client so that it can be displayed. + // const output: AppendMessageOutput = { + // agencyId: agencyId, + // chatId: agencyConversationId, + // managerAgentId: managerAgent.agentId, + // message: userMessage, + // } + // onMessageFromAgent(output) + // + // console.log( + // `Queued message: ${currentAgent.versionId} -> ${nextAgentVersionId}`, + // ) + // return () => + // AgentMind.chatIteration( + // user, + // agencyId, + // nextAgentVersionId, + // agencyConversationId, + // nextAgentConversation.agentConversationId, + // nextPrompt, + // onMessageFromAgent, + // onUpdateMessage, + // ) + // } + }) + + // The following envokeMind queueing is used to start the next iteration, + // which was something that was used in AgencyAI to enable back and forth communication + // between agents. This is not needed in the current implementation. 
+ // if (envokeMind) { + // await enqueue(envokeMind) + // newIterationWillBeStarted = true + // } + // const nextEnvocation = await dequeue() + // if (nextEnvocation) { + // nextEnvocation() + // } // todo: When using Lambdas and SQS, // the connection back to the client is held by a master server, @@ -285,11 +230,6 @@ export default class AgentMind { } } -type ToList = { - type: 'ToList', - to: Array, -} - class RetryError extends Error { constructor(message: any) { super(message) @@ -297,196 +237,29 @@ class RetryError extends Error { } } -async function generateToListWithRetry( - openAiKey: string, - model: string, - messages: Array, - agentConversationId: string, - currentAgent: AgentSQL, - managerAgent: AgentSQL, - agentMap: { [agentId: number]: AgentSQL }, - retryCount: number, -): Promise { - if (retryCount <= 0) { - throw new Error('Retry count exceeded.') - } - - try { - return await generateToList( - openAiKey, - model, - messages, - currentAgent, - managerAgent, - agentMap, - ) - } catch (err) { - console.error(err) - - if (err instanceof RetryError) { - console.debug('retrying') - - const systemMessageData: MessageData = { - correctionInstruction: true, - text: err.message, - } - - const systemMessage = await MessageInterface.insert( - currentAgent.agentId, - agentConversationId, - MessageRole.SYSTEM, - systemMessageData, - ) - - return await generateToListWithRetry( - openAiKey, - model, - [...messages, systemMessage], - agentConversationId, - currentAgent, - managerAgent, - agentMap, - retryCount - 1, - ) - } - - throw err - } -} - -async function generateToList( - openAiKey: string, - model: string, - messages: Array, - currentAgent: AgentSQL, - managerAgent: AgentSQL, - agentMap: { [agentId: number]: AgentSQL }, -): Promise { - const isCurrentManager = currentAgent.agentId === managerAgent.agentId - - // Format messages for /chat/completions: - const gptMessages: Array = messages.map((m) => ({ - role: m.role.toLowerCase(), - content: m.data.text, - 
})) - - const lastMessage = gptMessages.findLast( - (message) => message.role !== 'system', - ) - if (!lastMessage) { - throw new Error('No last non-system message found.') - } - const lastMessageContent = JSON.parse(lastMessage.content) - if (lastMessageContent.type !== MessageType.GetToList) { - throw new Error('The last message must be a GetToList message.') - } - - console.debug('generate ToList') - const res = await ChatGPTRest.chatCompletion(openAiKey, model, gptMessages) - - const finishReason = res.choices[0]?.finish_reason - - if (finishReason === 'length') { - throw new Error('The maximum number of tokens was reached.') - } else if (finishReason === 'content_filter') { - throw new Error('A content filter flag was raised.') - } else if (finishReason === 'tool_calls') { - throw new Error('Tool calls are not yet implemented.') - } else if (finishReason !== 'stop') { - throw new Error('Unknown finish reason.') - } - - const text = res.choices[0]?.message?.content || '' - - console.log('ToList:', text) - - let parsedText - try { - parsedText = JSON.parse(text) - } catch (err) { - console.error(text) - throw new RetryError('The response could not be parsed into JSON.') - } - - if (parsedText.type !== MessageType.ToList) { - console.error(text) - throw new RetryError(`The response must be a ToList.`) - } - - if (!Array.isArray(parsedText.to)) { - console.error(text) - throw new RetryError(`The response must include a "to" array.`) - } - - for (const agentId of parsedText.to) { - if (agentId !== null && typeof agentId !== 'number') { - console.error(text) - throw new RetryError(`The "to" array must contain only numbers or null.`) - } - - if (agentId !== null && !agentMap[agentId]) { - throw new RetryError( - `The "to" array contained an agentId that does not exist.`, - ) - } - - if (!isCurrentManager && agentId === null) { - throw new RetryError( - `The "to" array contained null, signifying the intent to send a message to the end user, but only the manager agent 
can do this.`, - ) - } - } - - return parsedText -} - function generateResponse( - openAiKey: string, + user: UserSQL, agencyId: number, agent: AgentSQL, managerAgent: AgentSQL, agencyConversationId: string, agentConversationId: string, - messages: Array, - forAgentVersionId: number | null, + context: Array, onAppendMessage: (output: AppendMessageOutput) => any, onUpdateMessage: (output: UpdateMessageOutput) => any, ): Promise { return new Promise(async (resolve, reject) => { try { - const getResponseMessageData: MessageData = { - internalInstruction: true, - text: JSON.stringify({ - type: 'GetResponse', - for: forAgentVersionId, - }), - } - - const getResponseMessage = await MessageInterface.insert( - agent.agentId, - agentConversationId, - MessageRole.SYSTEM, - getResponseMessageData, - ) - - // Format messages for /chat/completions: - const gptMessages: Array = [ - ...messages, - getResponseMessage, - ].map((m) => ({ - role: m.role.toLowerCase(), - content: m.data.text, - })) - - const intro = `{"type":"Response","text":"` - - // MessageData uses the versionId rather than the agentId + // In this mod of AgencyAI, all messages go to the end user. + // There is also no JSON mode. const responseMessageData: MessageData = { fromAgentId: agent.versionId, - toAgentId: forAgentVersionId === null ? null : forAgentVersionId, - toApi: forAgentVersionId === null, - text: intro + `"}`, + toAgentId: null, + toApi: true, + text: '', } + + // The response is already added to the database before streaming starts as an empty message. const responseMessage = await MessageInterface.insert( agent.agentId, agentConversationId, @@ -494,6 +267,8 @@ function generateResponse( responseMessageData, ) responseMessage.id = responseMessage.messageId.toString() + + // Similarly the response is sent to the client as an empty message to start. 
const output: AppendMessageOutput = { agencyId, chatId: agencyConversationId, @@ -518,20 +293,19 @@ function generateResponse( timeout = setTimeout(() => { // todo: If timeout, // then retry without system message. - // But maybe this isn't needed because ChatGPTRest.relayChatCompletionStream already does 3 retries. + // But maybe this isn't needed because InferenceRest.relayChatCompletionStream already does 3 retries. stop = true const error = new Error('Event stream timeout.') reject(error) - }, 10000) // 10 seconds + }, 30000) // 30 seconds } startTimeout() // Send a /chat/completions call - ChatGPTRest.relayChatCompletionStream( - openAiKey, - agent.model, - gptMessages, + InferenceRest.relayChatCompletionStream( + user, + context, (res: ChatCompletionsResponse) => { if (stop) { return @@ -559,18 +333,21 @@ function generateResponse( throw error } - // todo: check if finish_reason=stop always has an empty delta. Do not send a token update if it does. + // todo: check if finish_reason=stop always has an empty delta. + // Do not send a token update if it does as this would be redundant. const text = res?.choices[0]?.delta?.content || '' buffer += text - if (buffer.length < intro.length) { - // The buffer is not long enough to contain the intro. - return - } + // Used for JSON mode: + // if (buffer.length < intro.length) { + // // The buffer is not long enough to contain the intro. 
+ // return + // } + // const autocompletion = JSON.stringify(parseIncompleteJSON(buffer)) - const autocompletion = JSON.stringify(parseIncompleteJSON(buffer)) + const autocompletion = buffer const messageChanged = autocompletion !== previousAutocompletion previousAutocompletion = autocompletion diff --git a/src/backend/src/mutation/CreateAgentResolver.js b/src/backend/src/mutation/CreateAgentResolver.js index 661bc6f..f92936c 100644 --- a/src/backend/src/mutation/CreateAgentResolver.js +++ b/src/backend/src/mutation/CreateAgentResolver.js @@ -3,7 +3,7 @@ import type { InstructionSQL } from '../schema/Instruction/InstructionSchema.js' import gql from 'graphql-tag' import { unpackSession } from '../utils/Token.js' -import ChatGPTRest from '../rest/ChatGPTRest.js' +import InferenceRest from '../rest/InferenceRest.js' import UserInterface from '../schema/User/UserInterface.js' import AgentInterface from '../schema/Agent/AgentInterface.js' import InstructionInterface from '../schema/Instruction/InstructionInterface.js' diff --git a/src/backend/src/mutation/TestOpenAiKeyResolver.js b/src/backend/src/mutation/TestOpenAiKeyResolver.js index bafc5ae..2405fd6 100644 --- a/src/backend/src/mutation/TestOpenAiKeyResolver.js +++ b/src/backend/src/mutation/TestOpenAiKeyResolver.js @@ -2,7 +2,7 @@ import gql from 'graphql-tag' import { unpackSession } from '../utils/Token.js' -import ChatGPTRest from '../rest/ChatGPTRest.js' +import InferenceRest from '../rest/InferenceRest.js' type TestOpenAiKeyInput = { sessionToken: string, @@ -34,7 +34,7 @@ export async function resolver( if (openAiKey) { // If this call does not error out, then the key is good. 
-    const models = await ChatGPTRest.getAvailableModels(openAiKey)
+    const models = await InferenceRest.getAvailableModels(openAiKey)
   }

   return {
diff --git a/src/backend/src/mutation/UpdateSettingsResolver.js b/src/backend/src/mutation/UpdateSettingsResolver.js
index 30a6c90..a89f4f9 100644
--- a/src/backend/src/mutation/UpdateSettingsResolver.js
+++ b/src/backend/src/mutation/UpdateSettingsResolver.js
@@ -8,7 +8,7 @@ import nonMaybe from 'non-maybe'
 import Phone from '../utils/Phone.js'
 import Security from '../utils/Security.js'
 import validator from 'validator'
-import ChatGPTRest from '../rest/ChatGPTRest.js'
+import InferenceRest from '../rest/InferenceRest.js'
 import createViewer from '../utils/createViewer.js'
 import Config from 'common/src/Config.js'
 import Email from '../rest/Email.js'
diff --git a/src/backend/src/mysql/database.js b/src/backend/src/mysql/database.js
index 6eaa4e1..4f72e24 100644
--- a/src/backend/src/mysql/database.js
+++ b/src/backend/src/mysql/database.js
@@ -37,7 +37,7 @@ async function query(queryObject: {
     await importPool()
   }

-  console.debug(mysql.format(queryObject.sql, queryObject.values))
+  // console.debug(mysql.format(queryObject.sql, queryObject.values))

   const res = await new Promise((resolve, reject) => {
     _pool.query(queryObject, (err, res) => {
diff --git a/src/backend/src/rest/ChatGPTRest.js b/src/backend/src/rest/InferenceRest.js
similarity index 81%
rename from src/backend/src/rest/ChatGPTRest.js
rename to src/backend/src/rest/InferenceRest.js
index cda56b5..a3b6478 100644
--- a/src/backend/src/rest/ChatGPTRest.js
+++ b/src/backend/src/rest/InferenceRest.js
@@ -3,6 +3,7 @@ import axios from 'axios'
 import parseAxiosError from '../utils/parseAxiosError.js'
 import Config from 'common/src/Config.js'
+import type { UserSQL } from '../schema/User/UserSchema.js'

 export type GPTMessage = {
   role: string,
@@ -59,7 +60,7 @@ export const standardModels = [
   // 'gpt-4',
 ]

-export default class ChatGPTRest {
+export default class InferenceRest {
   static async getAvailableModels(authToken: ?string): Promise<Array<string>> {
     if (!authToken) return []
@@ -123,22 +124,25 @@ export default class ChatGPTRest {
   }

   static relayChatCompletionStream(
-    authToken: string,
-    model: string,
+    user: UserSQL,
     messages: Array<GPTMessage>,
     onData: (data: ChatCompletionsResponse) => any,
     onError?: () => any,
-    options?: ?{
-      responseFormat?: 'json_object' | 'text',
-    },
   ): void {
-    const responseFormat = canUseJSON(model)
-      ? {
-          type: options?.responseFormat ?? 'json_object',
-        }
-      : {
-          type: options?.responseFormat || 'text',
-        }
+    const apiBase = user.inferenceServerConfig?.apiBase
+    const apiKey = user.inferenceServerConfig?.apiKey
+
+    if (!apiBase) {
+      throw new Error('apiBase is required')
+    }
+
+    // const responseFormat = canUseJSON(model)
+    //   ? {
+    //       type: options?.responseFormat ?? 'json_object',
+    //     }
+    //   : {
+    //       type: options?.responseFormat || 'text',
+    //     }

     // console.debug('GPT', messages[messages.length - 1])
@@ -148,26 +152,33 @@ export default class ChatGPTRest {
     // console.debug('start openai relay')

+    let headers = {}
+    if (apiKey) {
+      headers = {
+        Authorization: `Bearer ${apiKey}`,
+      }
+    }
+
+    // todo: As long as completionOptions is configured by the end user,
+    // it should be safe to pass them through without checking what they are.
+    const data: any = user.inferenceServerConfig?.completionOptions ?? {}
+    data.messages = messages
+    data.stream = true
+
     makeRequestWithRetry(
       () =>
         axios({
           method: 'POST',
-          url: 'https://api.openai.com/v1/chat/completions',
-          headers: {
-            Authorization: `Bearer ${authToken}`,
-          },
-          data: {
-            model,
-            messages,
-            response_format: responseFormat,
-            stream: true,
-          },
+          url: `${apiBase}/v1/chat/completions`,
+          headers,
+          data,
           responseType: 'stream',
           timeout: 10000,
         })
           .then((response) => {
             response.data.on('data', (chunk) => {
               const data = chunk.toString()
+              dataLog.push(data)
               buffer += data
@@ -233,34 +244,37 @@ export default class ChatGPTRest {
   }

   static async chatCompletion(
-    authToken: string,
-    model: string,
+    user: UserSQL,
     messages: Array<GPTMessage>,
-    options?: ?{
-      responseFormat?: 'json_object' | 'text',
-    },
   ): Promise {
-    const responseFormat = canUseJSON(model)
-      ? {
-          type: options?.responseFormat ?? 'json_object',
-        }
-      : {
-          type: options?.responseFormat || 'text',
-        }
+    const apiBase = user.inferenceServerConfig?.apiBase
+    const apiKey = user.inferenceServerConfig?.apiKey
+
+    if (!apiBase) {
+      throw new Error('apiBase is required')
+    }
+
+    // const responseFormat = canUseJSON(model)
+    //   ? {
+    //       type: options?.responseFormat ?? 'json_object',
+    //     }
+    //   : {
+    //       type: options?.responseFormat || 'text',
+    //     }

     // console.debug('GPT', messages[messages.length - 1])

+    // todo: As long as completionOptions is configured by the end user,
+    // it should be safe to pass them through without checking what they are.
+    const data: any = user.inferenceServerConfig?.completionOptions ?? {}
+    data.messages = messages
+    data.stream = false
+
     const res = await send(
       'POST',
-      'https://api.openai.com/v1/chat/completions',
-      {
-        model,
-        messages,
-        response_format: responseFormat,
-        // frequency_penalty: -0.1,
-        // presence_penalty: -0.1
-      },
-      authToken,
+      `${apiBase}/v1/chat/completions`,
+      data,
+      apiKey,
     )

     if (res.error) {
@@ -290,7 +304,7 @@ async function send(
   method: 'GET' | 'POST',
   url: string,
   data: ?{ [string]: any },
-  authToken: string,
+  authToken: ?string,
 ): any {
   const config: { url: string, headers?: { ... }, ... } = {
     method: method.toLowerCase(),
diff --git a/src/backend/src/rest/internal/InternalChatApi.js b/src/backend/src/rest/internal/InternalChatApi.js
index afd83a5..f8577e1 100644
--- a/src/backend/src/rest/internal/InternalChatApi.js
+++ b/src/backend/src/rest/internal/InternalChatApi.js
@@ -10,7 +10,7 @@ import AgentConversationInterface from '../../schema/AgentConversation/AgentConv
 import AgencyConversationInterface from '../../schema/AgencyConversation/AgencyConversationInterface.js'
 import AgentInterface from '../../schema/Agent/AgentInterface.js'
 import MessageInterface from '../../schema/Message/MessageInterface.js'
-import ChatGPTRest from '../ChatGPTRest.js'
+import InferenceRest from '../InferenceRest.js'
 import InstructionInterface from '../../schema/Instruction/InstructionInterface.js'
 import { MessageRole, MessageType } from '../../schema/Message/MessageSchema.js'
 import type { UserSQL } from '../../schema/User/UserSchema.js'
@@ -290,29 +290,21 @@ export async function generateName(
   agencyConversationId: string,
   managerAgentId: number,
   user: UserSQL,
-  openAiKey: string,
   originalName: string,
   userPrompt: string,
 ): Promise {
   console.debug('generating a name')

-  const nameRes = await ChatGPTRest.chatCompletion(
-    openAiKey,
-    user.gptModels[0] || 'gpt-3.5-turbo',
-    [
-      {
-        role: 'system',
-        content:
-          'You are a writer tasked with generating a title for some text. The title should be short. The following message is the text.',
-      },
-      {
-        role: 'user',
-        content: userPrompt,
-      },
-    ],
+  const nameRes = await InferenceRest.chatCompletion(user, [
     {
-      responseFormat: 'text',
+      role: 'system',
+      content:
+        'You are a writer tasked with generating a title for some text. The title should be short. The following message is the text.',
     },
-  )
+    {
+      role: 'user',
+      content: userPrompt,
+    },
+  ])

   if (nameRes?.choices?.[0]?.finish_reason === 'stop') {
     const updatedName =
       nameRes?.choices?.[0]?.message?.content?.replace(/^"(.*)"$/, '$1') || ''
diff --git a/src/backend/src/schema/schema.js b/src/backend/src/schema/schema.js
index ae9d733..f1581eb 100644
--- a/src/backend/src/schema/schema.js
+++ b/src/backend/src/schema/schema.js
@@ -25,7 +25,7 @@ import * as Mutation from '../mutation/Mutation.js'
 import type { ResolverDefs } from '../utils/resolver.js'
 import resolver from '../utils/resolver.js'
 import createViewer from '../utils/createViewer.js'
-import ChatGPTRest from '../rest/ChatGPTRest.js'
+import InferenceRest from '../rest/InferenceRest.js'
 import UserInterface from './User/UserInterface.js'

 const { idDirectiveTransformer } = createIdDirective('id')
@@ -93,7 +93,9 @@ export const resolvers: ResolverDefs = {
         if (user?.openAiKey) {
           // Sometimes the OpenAI call to get models fails, so this is why it is denormalized
           // and in try-catch block.
-          const models = await ChatGPTRest.getAvailableModels(user.openAiKey)
+          const models = await InferenceRest.getAvailableModels(
+            user.openAiKey,
+          )
           await UserInterface.updateGptModels(user.userId, models)
         }
       }
diff --git a/src/backend/src/utils/Token.js b/src/backend/src/utils/Token.js
index 1c8f832..18f48cf 100644
--- a/src/backend/src/utils/Token.js
+++ b/src/backend/src/utils/Token.js
@@ -2,7 +2,7 @@
 import * as jose from 'jose'
 import Config from 'common/src/Config.js'
-import ChatGPTRest from '../rest/ChatGPTRest.js'
+import InferenceRest from '../rest/InferenceRest.js'
 import UserInterface from '../schema/User/UserInterface.js'

 export const OAuthProviders = {
diff --git a/src/backend/src/websocket/loadChatHandler.js b/src/backend/src/websocket/loadChatHandler.js
index bf05457..04c98d2 100644
--- a/src/backend/src/websocket/loadChatHandler.js
+++ b/src/backend/src/websocket/loadChatHandler.js
@@ -16,10 +16,22 @@ import AgentInterface from '../schema/Agent/AgentInterface.js'

 async function loadChatHandler(user: UserSQL, data: LoadChatInput) {
   console.debug('loadChat', data)

-  const openAiKey = user?.openAiKey
-  if (!openAiKey) {
+  // const openAiKey = user?.openAiKey
+  // if (!openAiKey) {
+  //   throw new Error('Unauthorized')
+  // }
+  const apiBase = user?.inferenceServerConfig?.apiBase
+  if (!apiBase) {
     throw new Error('Unauthorized')
   }
+  const apiKey = user?.inferenceServerConfig?.apiKey
+  const isOpenAi =
+    user?.inferenceServerConfig?.apiBase === 'https://api.openai.com'
+  if (isOpenAi) {
+    if (!apiKey) {
+      throw new Error('Unauthorized')
+    }
+  }

   if (!data.agencyId) {
     throw new Error('JSON body missing agencyId')
diff --git a/src/backend/src/websocket/newChatHandler.js b/src/backend/src/websocket/newChatHandler.js
index 84eb070..dc15170 100644
--- a/src/backend/src/websocket/newChatHandler.js
+++ b/src/backend/src/websocket/newChatHandler.js
@@ -15,10 +15,22 @@ const newChatHandler = async (
   data: NewChatInput,
 ): Promise => {
   console.debug('newChat', data)
-  const openAiKey = user?.openAiKey
-  if (!openAiKey) {
+  // const openAiKey = user?.openAiKey
+  // if (!openAiKey) {
+  //   throw new Error('Unauthorized')
+  // }
+  const apiBase = user?.inferenceServerConfig?.apiBase
+  if (!apiBase) {
     throw new Error('Unauthorized')
   }
+  const apiKey = user?.inferenceServerConfig?.apiKey
+  const isOpenAi =
+    user?.inferenceServerConfig?.apiBase === 'https://api.openai.com'
+  if (isOpenAi) {
+    if (!apiKey) {
+      throw new Error('Unauthorized')
+    }
+  }

   if (!data.agencyId) {
     throw new Error('JSON body missing agencyId')
@@ -66,7 +78,6 @@ const newChatHandler = async (
       newChatOutput.chatId,
       managerAgentId,
       user,
-      openAiKey,
       'New Chat',
       data.userPrompt,
     )
@@ -81,8 +92,7 @@ const newChatHandler = async (
   // This can fire off additional iterations,
   // so this starts off long running asynchronous process:
   AgentMind.chatIteration(
-    user.userId,
-    openAiKey,
+    user,
     data.agencyId,
     managerVersionId,
     newChatOutput.chatId,
diff --git a/src/backend/src/websocket/newMessageHandler.js b/src/backend/src/websocket/newMessageHandler.js
index 2725b8a..b89ce04 100644
--- a/src/backend/src/websocket/newMessageHandler.js
+++ b/src/backend/src/websocket/newMessageHandler.js
@@ -15,10 +15,22 @@ import { getCallbacks } from './callbacks.js'

 async function newMessageHandler(user: UserSQL, data: NewMessageInput) {
   console.debug('newMessage', data)

-  const openAiKey = user?.openAiKey
-  if (!openAiKey) {
+  // const openAiKey = user?.openAiKey
+  // if (!openAiKey) {
+  //   throw new Error('Unauthorized')
+  // }
+  const apiBase = user?.inferenceServerConfig?.apiBase
+  if (!apiBase) {
     throw new Error('Unauthorized')
   }
+  const apiKey = user?.inferenceServerConfig?.apiKey
+  const isOpenAi =
+    user?.inferenceServerConfig?.apiBase === 'https://api.openai.com'
+  if (isOpenAi) {
+    if (!apiKey) {
+      throw new Error('Unauthorized')
+    }
+  }

   if (!data.agencyId) {
     throw new Error('JSON body missing agencyId')
@@ -122,8 +134,7 @@ async function newMessageHandler(user: UserSQL, data: NewMessageInput) {
   // Start the next chat iteration with the AgentMind, and stream token updates to client:
   AgentMind.chatIteration(
-    user.userId,
-    openAiKey,
+    user,
     data.agencyId,
     managerAgentConversation.agentId,
     data.chatId,
diff --git a/src/backend/src/websocket/setupWebsockets.js b/src/backend/src/websocket/setupWebsockets.js
index 457431f..40f6202 100644
--- a/src/backend/src/websocket/setupWebsockets.js
+++ b/src/backend/src/websocket/setupWebsockets.js
@@ -134,37 +134,38 @@ export default function setupWebsockets(app: any): any {
             session.userId,
           )
         } else if (socket.handshake.query.accessKey) {
-          const agencyId = socket.handshake.query.agencyId
-          if (!agencyId) {
-            throw new Error(
-              'You must specify which agencyId you want to listen to.',
-            )
-          }
-
-          const authKey = await AuthTokenInterface.getByToken(
-            socket.handshake.query.accessKey,
-          )
-          if (!authKey) {
-            throw new Error('Unauthorized')
-          }
-          const agencyVersions = await AgencyInterface.getActiveVersions(
-            authKey.agencyVersionId,
-          )
-          // This verifies the agency version associated with the auth token contains the agency id.
-          listenToAgency = agencyVersions.find(
-            (agency) => agency.agencyId === agencyId,
-          )
-          if (!listenToAgency) {
-            throw new Error('Unauthorized')
-          }
-          user = await UserInterface.getUser(listenToAgency?.userId)
-
-          const isTrialKey = user?.openAiKey === Config.openAiPublicTrialKey
-          if (isTrialKey) {
-            throw new Error(
-              'When using the trial key, you cannot use the publish API with permanent keys.',
-            )
-          }
+          throw new Error('accessKey is no longer supported')
+          // const agencyId = socket.handshake.query.agencyId
+          // if (!agencyId) {
+          //   throw new Error(
+          //     'You must specify which agencyId you want to listen to.',
+          //   )
+          // }
+          //
+          // const authKey = await AuthTokenInterface.getByToken(
+          //   socket.handshake.query.accessKey,
+          // )
+          // if (!authKey) {
+          //   throw new Error('Unauthorized')
+          // }
+          // const agencyVersions = await AgencyInterface.getActiveVersions(
+          //   authKey.agencyVersionId,
+          // )
+          // // This verifies the agency version associated with the auth token contains the agency id.
+          // listenToAgency = agencyVersions.find(
+          //   (agency) => agency.agencyId === agencyId,
+          // )
+          // if (!listenToAgency) {
+          //   throw new Error('Unauthorized')
+          // }
+          // user = await UserInterface.getUser(listenToAgency?.userId)
+          //
+          // const isTrialKey = user?.openAiKey === Config.openAiPublicTrialKey
+          // if (isTrialKey) {
+          //   throw new Error(
+          //     'When using the trial key, you cannot use the publish API with permanent keys.',
+          //   )
+          // }
         } else {
           throw new Error('Unauthorized')
         }
@@ -177,10 +178,22 @@ export default function setupWebsockets(app: any): any {
       throw new Error('Unauthorized')
     }

-    const openAiKey = user?.openAiKey
-    if (!openAiKey) {
+    // const openAiKey = user?.openAiKey
+    // if (!openAiKey) {
+    //   throw new Error('Unauthorized')
+    // }
+    const apiBase = user?.inferenceServerConfig?.apiBase
+    if (!apiBase) {
       throw new Error('Unauthorized')
     }
+    const apiKey = user?.inferenceServerConfig?.apiKey
+    const isOpenAi =
+      user?.inferenceServerConfig?.apiBase === 'https://api.openai.com'
+    if (isOpenAi) {
+      if (!apiKey) {
+        throw new Error('Unauthorized')
+      }
+    }

     socket.custom = {
       listenToEverything: listenToEverything,
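Taken together, this diff replaces hard-coded OpenAI calls with a user-configured, OpenAI-compatible inference server: the request URL is built from `inferenceServerConfig.apiBase`, the bearer token becomes optional, and user-supplied `completionOptions` are passed through verbatim. A minimal sketch of that request-building logic follows; `buildCompletionRequest` is a hypothetical helper name for illustration, not a function in the diff, and it assumes the `inferenceServerConfig` shape shown above.

```javascript
// Sketch of how InferenceRest assembles a chat-completion request from the
// user's inferenceServerConfig, mirroring the logic in the diff.
// `buildCompletionRequest` is a hypothetical name used only for illustration.
function buildCompletionRequest(user, messages, stream) {
  const apiBase = user.inferenceServerConfig?.apiBase
  const apiKey = user.inferenceServerConfig?.apiKey

  if (!apiBase) {
    throw new Error('apiBase is required')
  }

  // Bearer auth is optional: only sent when the user configured an apiKey
  // (e.g. a local server may not require one).
  let headers = {}
  if (apiKey) {
    headers = { Authorization: `Bearer ${apiKey}` }
  }

  // User-configured completion options (model, temperature, ...) are passed
  // through as-is; messages and stream are then set by the caller. The sketch
  // copies the options instead of mutating them, unlike the diff.
  const data = { ...(user.inferenceServerConfig?.completionOptions ?? {}) }
  data.messages = messages
  data.stream = stream

  return { url: `${apiBase}/v1/chat/completions`, headers, data }
}

// Example: a user pointing at a local OpenAI-compatible server.
const user = {
  inferenceServerConfig: {
    apiBase: 'http://localhost:8000',
    apiKey: 'sk-local',
    completionOptions: { model: 'llama-3-8b' },
  },
}
const req = buildCompletionRequest(
  user,
  [{ role: 'user', content: 'hi' }],
  false,
)
```

Because any `/v1/chat/completions`-compatible endpoint works with this shape, the same request object serves both the streaming relay (`stream: true` with `responseType: 'stream'`) and the blocking `chatCompletion` path.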