-
Notifications
You must be signed in to change notification settings - Fork 604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support AWS plugin for TTS, STT and LLM #1302
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: c2e1384 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
return credentials.access_key, credentials.secret_key | ||
|
||
|
||
TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this to another file. Check how we do it for other TTS/STT
|
||
response = await client.synthesize_speech(**_strip_nones(params)) | ||
|
||
if "AudioStream" in response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit avoid the extra indent here
except Exception as e: | ||
logger.exception(f"an error occurred while streaming inputs: {e}") | ||
|
||
handler = TranscriptEventHandler(stream.output_stream, self._event_ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we create a separate class?
self, | ||
*, | ||
voice: str | None = "Ruth", | ||
language: TTS_LANGUAGE | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language: TTS_LANGUAGE | None = None, | |
language: TTS_LANGUAGE | None = None, |
We should always allow a str too here, we can't guarantee we will update the languages quickly
*, | ||
voice: str | None = "Ruth", | ||
language: TTS_LANGUAGE | None = None, | ||
output_format: TTS_OUTPUT_FORMAT = "pcm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to expose the output format. we only support pcm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do support mp3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah true!
|
||
# If API key and secret are provided, create a session with them | ||
if api_key and api_secret: | ||
session = boto3.Session( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this making network calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boto3.Session()
doesn’t make network calls, but session.get_credentials() does if API keys and secrets aren’t cached., but we’re calling it during initialization, it’s a one-time operation.
try: | ||
async for frame in self._input_ch: | ||
if isinstance(frame, rtc.AudioFrame): | ||
await stream.input_stream.send_audio_event( | ||
audio_chunk=frame.data.tobytes() | ||
) | ||
await stream.input_stream.end_stream() | ||
|
||
except Exception as e: | ||
logger.exception(f"an error occurred while streaming inputs: {e}") | ||
|
||
async def handle_transcript_events(): | ||
try: | ||
async for event in stream.output_stream: | ||
if isinstance(event, TranscriptEvent): | ||
self._process_transcript_event(event) | ||
except Exception as e: | ||
logger.exception( | ||
f"An error occurred while handling transcript events: {e}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using try — finally. We have an utility for it here
finally: | ||
await utils.aio.gracefully_cancel(*tasks) | ||
except Exception as e: | ||
logger.exception(f"An error occurred while streaming inputs: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is swallowing exceptions? In this case the baseclass will not try to reconnect on failure
def get_client(self): | ||
"""Returns a client creator context.""" | ||
return self._session.create_client( | ||
"polly", | ||
region_name=self._opts.speech_region, | ||
aws_access_key_id=self._api_key, | ||
aws_secret_access_key=self._api_secret, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we hide this?
from typing import Any, Callable | ||
|
||
import aiohttp | ||
from aiobotocore.session import AioSession, get_session # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't support types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't have official support for types. I found this thread boto/boto3#2213 and this library https://github.com/youtype should we include it?
Getting the following error when calling a function: |
Hey @jayeshp19 i was going thorugh the error and wanted to see if these solutions could help
and the last one for converse_stream would we be able to use invoke_model_with_response_stream in _run method. |
Good work so far. I am looking forward to this one. |
Just tested this PR. Overall, amazing speeds when using aws.LLM(), aws.TTS(), and aws.STT() !! Here are a couple of errors from amazon_transcribe: 2025-02-06 17:05:51,236 - ERROR livekit.plugins.aws - Error in handle_transcript_events |
Also, seeing the same error as @jenfic when using the function calls with aws.LLM(): ValidationException:http://internal.amazon.com/coral/com.amazon.bedrock/ Not sure if pytest could/should catch these errors. |
@jayeshp19 let me know if you need further help testing. I can help spin-up all the services needed in AWS... (wink, wink). |
I tried using this AWS plugin for TTS but could not hear any audio.
but did not hear any audio. Below are the details sent to aws while creating the TTS object: |
This PR implements AWS plugin for TTS and STT