Scriberr is a self-hostable AI audio transcription app. It uses OpenAI's open-source Whisper models through the high-performance WhisperX transcription engine to transcribe audio files locally on your hardware. Scriberr can also summarize transcripts using Ollama or OpenAI's ChatGPT API, with your own custom prompts. As of v0.2.0, Scriberr supports offline speaker diarization with significant improvements to the diarization pipeline.
Note: This app is under active development, and this release includes breaking changes. You will lose your old data. Please read the installation instructions carefully.
- Features
- Demo and Screenshots
- Installation
- Contributing
- License
- Acknowledgments
- Fast Local Transcription: Transcribe audio files locally using WhisperX for high performance.
- Hardware Acceleration: Supports both CPU and GPU (NVIDIA) acceleration.
- Customizable Compute Settings: Configure the number of threads, cores, and model size.
- Speaker Diarization: Improved speaker identification with HuggingFace models.
- Multilingual Support: Supports all languages that the Whisper model supports.
- Customizable Summarization: Optionally summarize transcripts with ChatGPT or Ollama using custom prompts.
- API Access: Exposes API endpoints for automation and integration.
- User-Friendly Interface: New UI with glassmorphism design.
- Mobile Ready: Responsive design suitable for mobile devices.
Note:
The demo was run locally on a MacBook Air M2 using Docker. Performance depends on the size of the model used and the number of cores and threads assigned. The demo was running in development mode, so it may be slower than a production deployment.
Demo video: CleanShot.2024-10-04.at.14.55.46.mp4
- Docker and Docker Compose installed on your system. Install Docker.
- NVIDIA GPU (optional): If you plan to use GPU acceleration, ensure you have an NVIDIA GPU and the NVIDIA Container Toolkit installed.
- HuggingFace API Key (required for speaker diarization): You'll need a free API key from HuggingFace to download diarization models.
git clone https://github.com/rishikanthc/Scriberr.git
cd Scriberr
Copy the example `.env` file and adjust the settings as needed:
cp env.example .env
Edit the `.env` file to set your desired configuration, including:
- `ADMIN_USERNAME` and `ADMIN_PASSWORD` for accessing the web interface.
- `OPENAI_API_KEY` if you plan to use OpenAI's GPT models for summarization.
- `HARDWARE_ACCEL` set to `gpu` if you have an NVIDIA GPU.
- Other configurations as needed.
To run Scriberr without GPU acceleration:
docker-compose up -d
This command uses the `docker-compose.yml` file and builds the Docker image using the `Dockerfile`.
To run Scriberr with GPU acceleration:
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
This command uses both the `docker-compose.yml` and `docker-compose.gpu.yml` files and builds the Docker image using `Dockerfile-gpu`.
Note: Ensure that you have the NVIDIA Container Toolkit installed and properly configured.
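If you are unsure whether GPU passthrough is working, a quick sanity check is to run `nvidia-smi` inside a throwaway container. The CUDA image tag below is only an example; use any tag available on your system.

```bash
# Should print your GPU table if the NVIDIA Container Toolkit is configured correctly.
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```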
Once the containers are up and running, access the Scriberr web interface at `http://localhost:3000` (or the port you specified in the `.env` file).
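To confirm the web interface is reachable without opening a browser, you can probe the port from the host (assuming the default port 3000):

```bash
# Expect an HTTP response once the containers have finished starting.
curl -I http://localhost:3000
```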
If you wish to build the Docker images yourself, you can use the provided `Dockerfile` and `Dockerfile-gpu`.
docker build -t scriberr:latest -f Dockerfile .
docker build -t scriberr:latest-gpu -f Dockerfile-cuda128 .
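As a rough sketch only (the port mapping and the idea of running the image standalone are assumptions; in practice the Compose files also start supporting services such as PostgreSQL), a locally built image could be launched with:

```bash
# Hypothetical standalone run of the locally built image; the Compose setup above is the supported path.
docker run -d \
  --name scriberr \
  --env-file .env \
  -p 3000:3000 \
  scriberr:latest
```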
The application can be customized using the following environment variables in your `.env` file:
- `ADMIN_USERNAME`: Username for the admin user in the web interface.
- `ADMIN_PASSWORD`: Password for the admin user.
- `AI_MODEL`: Default model to use for summarization (e.g., `"gpt-3.5-turbo"`).
- `OLLAMA_BASE_URL`: Base URL of your OpenAI API-compatible server if not using OpenAI (e.g., your Ollama server).
- `OPENAI_API_KEY`: Your OpenAI API key if using OpenAI for summarization (or Ollama, if `OLLAMA_BASE_URL` is set).
- `DIARIZATION_MODEL`: Default model for speaker diarization (e.g., `"pyannote/speaker-diarization@3.1"`).
- `MODELS_DIR`, `WORK_DIR`, `AUDIO_DIR`: Directories for models, temporary files, and uploads.
- `BODY_SIZE_LIMIT`: Maximum request body size (e.g., `"1G"`).
- `HARDWARE_ACCEL`: Set to `gpu` for GPU acceleration (NVIDIA GPU required); defaults to `cpu`.
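Putting these together, a filled-in `.env` might look roughly like the following. The values are illustrative placeholders only; include just the variables you need and keep any defaults from `env.example` that are not shown here.

```env
# Illustrative values only; adapt to your setup.
ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
AI_MODEL=gpt-3.5-turbo
OPENAI_API_KEY=sk-your-key-here
# OLLAMA_BASE_URL=http://localhost:11434   # if using an Ollama / OpenAI-compatible server instead
DIARIZATION_MODEL=pyannote/speaker-diarization@3.1
BODY_SIZE_LIMIT=1G
HARDWARE_ACCEL=cpu
```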
The application requires access to the following Hugging Face models:
- pyannote/speaker-diarization-3.1
- pyannote/segmentation-3.0
- Create a free account at HuggingFace if you don’t already have one.
- Generate an API token at HuggingFace Tokens.
- Accept user conditions for the required models on Hugging Face:
- Visit pyannote/speaker-diarization-3.1 and accept the conditions.
- Visit pyannote/segmentation-3.0 and accept the conditions.
- Enter the API token in the setup wizard when prompted. The token is only used during initial setup and is not stored permanently.

Storage and Usage
The diarization models are downloaded once and stored locally, so you won’t need to provide the API key again after the initial setup.
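If you want to confirm your token is valid before running the setup wizard, you can query the Hugging Face Hub's `whoami` endpoint. This check is optional and is not part of Scriberr itself; replace the placeholder with your actual token.

```bash
# A JSON response with your account details indicates the token is valid.
curl -s -H "Authorization: Bearer hf_your_token_here" https://huggingface.co/api/whoami-v2
```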
Important: This release includes breaking changes and is not backward compatible with previous versions. You will lose your existing data. Please back up your data before proceeding.
Changes include:
- Performance Improvements: The rewrite takes advantage of Svelte 5 reactivity features.
- Transcription Engine Change: Switched from Whisper.cpp to WhisperX.
- Improved Diarization: Significant improvements to the diarization pipeline.
- Simplified Setup: Streamlined setup process with improved wizard.
- New UI: Implemented a new UI design with glassmorphism.
- Multilingual Support: Transcription and diarization now support all languages that Whisper models support.
- Database Connection Issues: Ensure that the PostgreSQL container is running and accessible.
- GPU Not Detected: Ensure that the NVIDIA Container Toolkit is installed and that Docker is configured correctly.
- Permission Issues: Running Docker commands may require root permissions or membership in the `docker` group.
- Diarization Model Download Failure: Make sure you've entered a valid HuggingFace API key during setup.
- CUDA failed with error out of memory: Ensure that your GPU has enough memory to run the models. You can try reducing the batch size by adding `WHISPER_BATCH_SIZE=<value>` to your `.env` file (see the sketch below). The default is 16; you can reduce it to 8, 4, 2, and so on. (For reference, large-v2 processed a 1.5-hour audio file on a 3070 with 8 GB of VRAM only with a batch size of 1; anything higher ran out of memory.)
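For example, to lower the batch size and recreate the containers so the new value takes effect (adjust the value to whatever your GPU tolerates):

```bash
# Append a smaller batch size to .env, then recreate the containers.
echo "WHISPER_BATCH_SIZE=8" >> .env
docker-compose up -d --force-recreate
```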
Check the logs for more details:
docker-compose logs -f
If you encounter issues or have questions, feel free to open an issue.
Contributions are welcome! Feel free to submit pull requests or open issues.
- Fork the Repository: Create a personal fork of the repository on GitHub.
- Clone Your Fork: Clone your forked repository to your local machine.
- Create a Feature Branch: Make a branch for your feature or fix.
- Commit Changes: Make your changes and commit them.
- Push to Your Fork: Push your changes to your fork on GitHub.
- Submit a Pull Request: Create a pull request to merge your changes into the main repository.
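In terms of commands, the usual flow looks roughly like this (replace the placeholder username and branch name with your own):

```bash
git clone https://github.com/<your-username>/Scriberr.git
cd Scriberr
git checkout -b my-feature
# ...make and test your changes...
git add -A
git commit -m "Describe your change"
git push origin my-feature
```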
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI Whisper
- WhisperX
- HuggingFace
- PyAnnote Speaker Diarization
- Ollama
- Community contributors who have submitted great PRs and helped the app evolve.
Thank you for your patience, support, and interest in the project. Looking forward to any and all feedback.