A powerful TypeScript library for real-time voice streaming in React, built for AI-powered voice apps, real-time transcription, and audio processing.
- 🎙️ Real-time voice streaming with configurable audio processing
- 🔇 Automatic silence detection and handling
- ⚡ Configurable sample rate and buffer size
- 🔊 Base64 encoded audio chunks for easy transmission
- 🛠️ TypeScript support with full type definitions
- 📦 Zero dependencies (except for React)
```bash
yarn add voice-stream
# or
npm install voice-stream
```
- React 18 or higher
- Modern browser with Web Audio API support (see the quick feature check below)
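Because the hook relies on microphone capture and the Web Audio API, it can be worth feature-detecting support before rendering recording controls. A minimal sketch; the `isVoiceStreamSupported` helper is illustrative, not part of the library:

```ts
// Illustrative helper, not part of voice-stream: checks for the browser
// APIs the library relies on before you render recording UI.
export function isVoiceStreamSupported(): boolean {
  return (
    typeof window !== "undefined" &&
    typeof window.AudioContext !== "undefined" &&
    typeof navigator !== "undefined" &&
    !!navigator.mediaDevices?.getUserMedia
  );
}
```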
```tsx
import { useVoiceStream } from "voice-stream";

function App() {
  const { startStreaming, stopStreaming, isStreaming } = useVoiceStream({
    onStartStreaming: () => {
      console.log("Streaming started");
    },
    onStopStreaming: () => {
      console.log("Streaming stopped");
    },
    onAudioChunked: (chunkBase64) => {
      // Handle the audio chunk
      console.log("Received audio chunk");
    },
  });

  return (
    <div>
      <button onClick={startStreaming} disabled={isStreaming}>
        Start Recording
      </button>
      <button onClick={stopStreaming} disabled={!isStreaming}>
        Stop Recording
      </button>
    </div>
  );
}
```
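Each chunk arrives as a base64 string, ready to send over HTTP or a WebSocket. If you need the raw samples on the client, you can decode it back into bytes; this sketch assumes the payload is 16-bit PCM, which is common at a 16 kHz target rate (verify against the library's actual chunk format):

```ts
// Decode a base64 chunk into 16-bit PCM samples. The PCM assumption is
// ours, not a documented guarantee of voice-stream.
function decodeChunk(chunkBase64: string): Int16Array {
  const binary = atob(chunkBase64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Int16Array(bytes.buffer);
}
```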
The `useVoiceStream` hook accepts several configuration options for advanced use cases:
```tsx
const options = {
  // Basic callbacks
  onStartStreaming: () => {},
  onStopStreaming: () => {},
  onAudioChunked: (base64Data: string) => {},
  onError: (error: Error) => {},

  // Audio processing options
  targetSampleRate: 16000, // Default: 16000
  bufferSize: 4096, // Default: 4096

  // Silence detection options
  enableSilenceDetection: true, // Default: false
  silenceThreshold: -50, // Default: -50 (dB)
  silenceDuration: 1000, // Default: 1000 (ms)
  autoStopOnSilence: true, // Default: false

  // Audio routing
  includeDestination: true, // Default: true - routes audio to speakers
};
```
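To build intuition for `silenceThreshold`, it helps to see how a dB figure relates to raw samples. The sketch below computes the RMS level of an audio frame in dBFS; it illustrates the scale and is not the library's internal detector:

```ts
// Compute the RMS level of an audio frame in dBFS. A frame quieter than
// roughly -50 dBFS would fall below the default silenceThreshold.
// Illustrative only; voice-stream's internal detector may differ.
function frameLevelDb(samples: Float32Array): number {
  let sumSquares = 0;
  for (const s of samples) {
    sumSquares += s * s;
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  // Guard against log(0) for an all-zero frame
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}
```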
Real-time speech-to-text using OpenAI's Whisper API:
```tsx
import { useState } from "react";
import { useVoiceStream } from "voice-stream";

function WhisperTranscription() {
  const [transcript, setTranscript] = useState("");

  const { startStreaming, stopStreaming } = useVoiceStream({
    targetSampleRate: 16000, // Whisper's preferred sample rate
    onAudioChunked: async (base64Data) => {
      // The transcriptions endpoint expects multipart/form-data with an
      // audio file, not JSON. This assumes the chunk is in a container
      // Whisper accepts (e.g. WAV); raw PCM would need a header first.
      const bytes = Uint8Array.from(atob(base64Data), (c) => c.charCodeAt(0));
      const formData = new FormData();
      formData.append("file", new Blob([bytes], { type: "audio/wav" }), "chunk.wav");
      formData.append("model", "whisper-1");
      formData.append("response_format", "text");

      const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
        method: "POST",
        // In production, proxy this through your backend rather than
        // exposing the API key in the browser.
        headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },
        body: formData,
      });
      const text = await response.text();
      setTranscript((prev) => prev + text);
    },
  });

  return (
    // ... UI implementation
  );
}
```
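Uploading every chunk individually produces many small requests. A common refinement is to buffer chunks and flush them when recording stops (or on a timer); a minimal sketch, where `sendToWhisper` is a hypothetical stand-in for the upload logic above:

```tsx
import { useRef } from "react";
import { useVoiceStream } from "voice-stream";

// sendToWhisper is a hypothetical callback standing in for the upload
// logic from the previous example.
function BatchedTranscription({
  sendToWhisper,
}: {
  sendToWhisper: (chunks: string[]) => void;
}) {
  const pending = useRef<string[]>([]);

  const { startStreaming, stopStreaming } = useVoiceStream({
    targetSampleRate: 16000,
    // Accumulate chunks instead of uploading each one immediately
    onAudioChunked: (base64Data) => {
      pending.current.push(base64Data);
    },
    // Flush everything that accumulated once recording stops
    onStopStreaming: () => {
      if (pending.current.length > 0) {
        sendToWhisper(pending.current);
        pending.current = [];
      }
    },
  });

  return (
    // ... UI implementation
  );
}
```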
Streaming microphone audio to ElevenLabs' Conversational AI over its WebSocket API:
```tsx
import { useEffect, useRef } from "react";
import { useVoiceStream } from "voice-stream";

function ElevenLabsStreaming() {
  const ws = useRef<WebSocket | null>(null);

  const { startStreaming, stopStreaming } = useVoiceStream({
    targetSampleRate: 16000, // Match your agent's configured input format (16 kHz PCM is a common default)
    onAudioChunked: (base64Data) => {
      // Forward each microphone chunk as a user_audio_chunk message
      if (ws.current?.readyState === WebSocket.OPEN) {
        ws.current.send(JSON.stringify({ user_audio_chunk: base64Data }));
      }
    },
  });

  useEffect(() => {
    // AGENT_ID is a placeholder for your ElevenLabs agent's ID
    ws.current = new WebSocket(
      `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${AGENT_ID}`
    );
    return () => {
      ws.current?.close();
    };
  }, []);

  return (
    // ... UI implementation
  );
}
```
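The agent replies on the same socket, with audio delivered base64-encoded inside JSON events. A sketch of playing those responses; the field names follow ElevenLabs' Conversational AI events, and playback through an `Audio` element assumes the agent is configured for MP3 output (raw PCM output would need manual decoding through the Web Audio API):

```ts
// Illustrative handler for audio events coming back over the socket.
function playAudioMessage(event: MessageEvent<string>) {
  const message = JSON.parse(event.data);
  const base64 = message.audio_event?.audio_base_64;
  if (!base64) return;
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  const url = URL.createObjectURL(new Blob([bytes], { type: "audio/mpeg" }));
  void new Audio(url).play();
}

// Inside the effect above: ws.current.onmessage = playAudioMessage;
```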
Implement voice activity detection with automatic silence handling:
```tsx
import { useVoiceStream } from "voice-stream";

function VoiceActivityDetection() {
  const { startStreaming, stopStreaming } = useVoiceStream({
    enableSilenceDetection: true,
    silenceThreshold: -50,
    silenceDuration: 1000,
    autoStopOnSilence: true,
    onStartStreaming: () => console.log("Voice detected"),
    onStopStreaming: () => console.log("Silence detected"),
  });

  return (
    // ... UI implementation
  );
}
```
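With `autoStopOnSilence`, streaming ends after each utterance. For hands-free, utterance-by-utterance capture, one pattern is to restart streaming from `onStopStreaming` while a listening flag is set. A sketch; the flag and restart logic are application code, not library features:

```tsx
import { useRef } from "react";
import { useVoiceStream } from "voice-stream";

function ContinuousListening() {
  // Tracks whether the user still wants hands-free capture
  const listening = useRef(false);

  const { startStreaming, stopStreaming } = useVoiceStream({
    enableSilenceDetection: true,
    autoStopOnSilence: true,
    onStopStreaming: () => {
      // Each silence-triggered stop ends one utterance; start the next
      if (listening.current) {
        void startStreaming();
      }
    },
  });

  // Wire these to your start/stop buttons
  const start = () => {
    listening.current = true;
    void startStreaming();
  };
  const stop = () => {
    listening.current = false;
    stopStreaming();
  };

  return (
    // ... UI implementation
  );
}
```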
The hook returns:

- `startStreaming: () => Promise<void>` - Function to start voice streaming
- `stopStreaming: () => void` - Function to stop voice streaming
- `isStreaming: boolean` - Current streaming status

The options object accepts:

- `onStartStreaming?: () => void` - Called when streaming starts
- `onStopStreaming?: () => void` - Called when streaming stops
- `onAudioChunked?: (chunkBase64: string) => void` - Called with each audio chunk
- `onError?: (error: Error) => void` - Called when an error occurs
- `targetSampleRate?: number` - Target sample rate for audio processing
- `bufferSize?: number` - Size of the audio processing buffer
- `enableSilenceDetection?: boolean` - Enable silence detection
- `silenceThreshold?: number` - Threshold for silence detection in dB
- `silenceDuration?: number` - Duration of silence before triggering in ms
- `autoStopOnSilence?: boolean` - Automatically stop streaming on silence
- `includeDestination?: boolean` - Route audio to speakers
We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please feel free to reach out or submit a pull request.
- Fork the repository
- Install dependencies: `yarn install`
- Run tests: `yarn test`
MIT