This is a simple version of OpenAI's voice functionality using free APIs. This demo lets you talk, listen, and converse with LLMs.
The original blog post is here: Blog Post. The YouTube video explainer is here: YouTube Video.
Feel free to play around!
- LLM Host: Groq
- LLM: LLAMA 3
- TTS: DeepGram
- STT: SpeechRecognition API
- Web Framework: NextJS (React front-end, Express API)
To run it locally:
- Download the repo.
- Install dependencies with npm i.
- Set up .env.local with DEEPGRAM_API_KEY and GROQ_API_KEY.
- Start the dev server with npm run dev.
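If the app starts but every request fails, a missing key is a likely culprit. Here is a minimal sketch of a startup check for the two keys named above. The helper name `missingEnvKeys` is hypothetical and not part of the repo.

```typescript
// Key names come from the setup steps; the helper itself is an illustration.
const REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "GROQ_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required keys that are absent or blank.
function missingEnvKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key]?.trim());
}
```

On the server you could call it with `process.env` and log or throw if the returned list is not empty.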
You might want to edit all the prompts to change the tone of the response.
The architecture is simple: Voice -> Text -> LLM -> Text -> Voice. RAG and all sorts of other creative techniques can be used to spice up the LLM.
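The Voice -> Text -> LLM -> Text -> Voice flow can be sketched as three injected stages run in order. This is an illustration, not the repo's code: the names `VoiceStages` and `runVoiceTurn` are hypothetical, and in the demo itself the first stage is the browser's SpeechRecognition, which listens to the microphone directly rather than taking bytes.

```typescript
// Hypothetical stage signatures; each one is injected so the flow can be
// exercised with stubs instead of a microphone, Groq, or DeepGram.
interface VoiceStages {
  speechToText: (audio: Uint8Array) => Promise<string>;
  askLlm: (userText: string) => Promise<string>;
  textToSpeech: (replyText: string) => Promise<Uint8Array>;
}

interface VoiceTurn {
  userText: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// One conversational turn: Voice -> Text -> LLM -> Text -> Voice.
async function runVoiceTurn(
  audio: Uint8Array,
  stages: VoiceStages
): Promise<VoiceTurn> {
  const userText = await stages.speechToText(audio);
  const replyText = await stages.askLlm(userText);
  const replyAudio = await stages.textToSpeech(replyText);
  return { userText, replyText, replyAudio };
}
```

Keeping the stages behind plain function types means a RAG step, for example, could be added inside `askLlm` without touching the rest of the turn.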
You'll probably want to switch out SpeechRecognition for Whisper if you need support for non-Chrome browsers or something more stable.
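One way to make that switch gradual is to detect whether the browser exposes the Web Speech API and fall back otherwise. Below is a minimal sketch; `pickSttBackend` and the `"server-whisper"` label are hypothetical names, and the Whisper endpoint itself is assumed to exist elsewhere.

```typescript
// Shape of the browser globals we care about; Chrome exposes the
// webkit-prefixed constructor.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

type SttBackend = "web-speech" | "server-whisper";

// Picks the in-browser recogniser when present, otherwise a server-side
// fallback ("server-whisper" is a made-up label for a Whisper endpoint).
function pickSttBackend(globals: SpeechGlobals): SttBackend {
  const hasWebSpeech =
    globals.SpeechRecognition !== undefined ||
    globals.webkitSpeechRecognition !== undefined;
  return hasWebSpeech ? "web-speech" : "server-whisper";
}
```

In the browser you would pass the `window` object (possibly with a cast) and branch on the result.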
Handling state properly in the AudioPlayer would take a lot of work, which isn't necessary for this demo.
Playing with the prompts and context going to Groq is the key to personalisation.
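Groq's chat endpoint accepts OpenAI-style messages with `system`, `user`, and `assistant` roles, so personalisation mostly comes down to what goes into that list. Here is a minimal sketch of assembling it; `buildMessages` and the `maxHistory` default are assumptions for illustration, not the repo's code.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical helper: system prompt first, then the most recent turns,
// then the new utterance from the user.
function buildMessages(
  systemPrompt: string,
  history: ChatMessage[],
  userText: string,
  maxHistory = 6
): ChatMessage[] {
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  return [
    { role: "system", content: systemPrompt },
    ...recent,
    { role: "user", content: userText },
  ];
}
```

Changing `systemPrompt` alters the tone of every reply, while `maxHistory` trades context for latency and cost.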
Contact me for feedback!
I built a demo where you can:
- Talk into the browser using the Web Speech API (SpeechRecognition).
- Stream the transcribed text to Groq for processing.
- Stream the response from Groq to DeepGram for text-to-speech conversion.
- Play the generated audio response in the browser.
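When streaming LLM output into text-to-speech, one common approach is to buffer tokens and hand over whole sentences so the voice does not break mid-phrase. The sketch below shows that idea only. It is an assumption about how the hand-off could work, not a description of how this demo does it, and `SentenceBuffer` is a hypothetical name.

```typescript
// Accumulates streamed text and releases complete sentences.
class SentenceBuffer {
  private pending = "";

  // Adds a streamed fragment and returns any sentences it completed.
  push(fragment: string): string[] {
    this.pending += fragment;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]\s+/;
    let match = boundary.exec(this.pending);
    while (match !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.pending.slice(0, end).trim());
      this.pending = this.pending.slice(match.index + match[0].length);
      match = boundary.exec(this.pending);
    }
    return sentences;
  }

  // Returns whatever is left once the stream ends.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}
```

Each returned sentence could be sent to the TTS service as soon as it is complete, with `flush()` called when the LLM stream closes. The boundary rule is deliberately naive: abbreviations such as "Dr. Smith" will be split early.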
- NextJS: ★★★★★ - Wonderful technology, simplifies client and server-side development.
- Groq: ★★★★★ - New benchmarks in speed and cost.
- Llama3: ★★★★☆ - Noticeable difference from GPT-4o, great for cheap requests and demos.
- DeepGram: ★★★☆☆ - Generous starting credits, good latency. Still green as a tech.
- Demo: AI Voice Generation Demo
- GitHub Repository: GitHub
- Video: YouTube Video
- Blog: Blog Post
Edward Ejb503, Tying Shoelaces Blog