Scriberr - Self-hosted AI Transcription App

About

Scriberr is a self-hostable AI audio transcription app. It leverages the open-source Whisper models from OpenAI, using the high-performance WhisperX transcription engine to transcribe audio files locally on your hardware. Scriberr can also summarize transcripts using Ollama or OpenAI's ChatGPT API with your own custom prompts. As of v0.2.0, Scriberr supports offline speaker diarization, with significant improvements over earlier releases.

Note: This app is under active development, and this release includes breaking changes. You will lose your old data. Please read the installation instructions carefully.

Build Status

Main branch: Docker and CUDA Docker image builds

Nightly branch: Docker and CUDA Docker image builds

Features

  • Fast Local Transcription: Transcribe audio files locally using WhisperX for high performance.
  • Hardware Acceleration: Supports both CPU and GPU (NVIDIA) acceleration.
  • Customizable Compute Settings: Configure the number of threads, cores, and model size.
  • Speaker Diarization: Improved speaker identification with HuggingFace models.
  • Multilingual Support: Supports all languages that the Whisper model supports.
  • Customize Summarization: Optionally summarize transcripts with ChatGPT or Ollama using custom prompts.
  • API Access: Exposes API endpoints for automation and integration.
  • User-Friendly Interface: New UI with glassmorphism design.
  • Mobile Ready: Responsive design suitable for mobile devices.

Demo and Screenshots

Note:
The demo was run locally on a MacBook Air M2 using Docker. Performance depends on the size of the model used and the number of cores and threads assigned. The demo was also running in development mode, so performance may be slower than in production.

[Demo video and app screenshots]

Installation

Requirements

  • Docker and Docker Compose installed on your system. Install Docker.
  • NVIDIA GPU (optional): If you plan to use GPU acceleration, ensure you have an NVIDIA GPU and the NVIDIA Container Toolkit installed.
  • HuggingFace API Key (required for speaker diarization): You'll need a free API key from HuggingFace to download diarization models.
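
A quick way to confirm the basics are in place before continuing (on newer Docker installs the second command may be docker compose version instead):

docker --version
docker-compose --version
nvidia-smi   # only relevant if you plan to use GPU acceleration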

Quick Start

Clone the Repository

git clone https://github.com/rishikanthc/Scriberr.git
cd Scriberr

Configure Environment Variables

Copy the example .env file and adjust the settings as needed:

cp env.example .env

Edit the .env file to set your desired configuration, including:

  • ADMIN_USERNAME and ADMIN_PASSWORD for accessing the web interface.
  • OPENAI_API_KEY if you plan to use OpenAI's GPT models for summarization.
  • HARDWARE_ACCEL set to gpu if you have an NVIDIA GPU.
  • Other configurations as needed.
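
As an example, a minimal .env for a CPU-only quick start might look like the following (all values here are placeholders):

ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
HARDWARE_ACCEL=cpu
# Only needed if you want ChatGPT-based summarization
OPENAI_API_KEY=sk-your-key-here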

Running with Docker Compose (CPU Only)

To run Scriberr without GPU acceleration:

docker-compose up -d

This command uses the docker-compose.yml file and builds the Docker image using the Dockerfile.

Running with Docker Compose (GPU Support)

To run Scriberr with GPU acceleration:

docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

This command uses both docker-compose.yml and docker-compose.gpu.yml files and builds the Docker image using the Dockerfile-gpu.

Note: Ensure that you have the NVIDIA Container Toolkit installed and properly configured.

Access the Application

Once the containers are up and running, access the Scriberr web interface at http://localhost:3000 (or the port you specified in the .env file).
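
If the page does not load, you can confirm that the containers are up and that the app responds on the expected port (3000 is assumed here):

docker-compose ps
curl -I http://localhost:3000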

Building Docker Images Manually

If you wish to build the Docker images yourself, you can use the provided Dockerfile and Dockerfile-gpu.

CPU Image

docker build -t scriberr:latest -f Dockerfile .

GPU Image

docker build -t scriberr:latest-gpu -f Dockerfile-cuda128 .
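
Note that the compose files also start supporting services (such as the PostgreSQL database mentioned under Troubleshooting), so running a manually built image on its own is mainly useful for quick checks. A rough sketch, assuming the app listens on port 3000 inside the container and reads its settings from .env:

# CPU image
docker run -d --name scriberr --env-file .env -p 3000:3000 scriberr:latest

# GPU image (requires the NVIDIA Container Toolkit)
docker run -d --name scriberr-gpu --gpus all --env-file .env -p 3000:3000 scriberr:latest-gpu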

Advanced Configuration

The application can be customized using the following environment variables in your .env file.

  • ADMIN_USERNAME: Username for the admin user in the web interface.
  • ADMIN_PASSWORD: Password for the admin user.
  • AI_MODEL: Default model to use for summarization (e.g., "gpt-3.5-turbo").
  • OLLAMA_BASE_URL: Base URL of your OpenAI API-compatible server if not using OpenAI (e.g., your Ollama server).
  • OPENAI_API_KEY: API key used for summarization. This is your OpenAI key, or the key for the server at OLLAMA_BASE_URL if that variable is set.
  • DIARIZATION_MODEL: Default model for speaker diarization (e.g., "pyannote/speaker-diarization@3.1").
  • MODELS_DIR, WORK_DIR, AUDIO_DIR: Directories for models, temporary files, and uploads.
  • BODY_SIZE_LIMIT: Maximum request body size (e.g., "1G").
  • HARDWARE_ACCEL: Set to gpu for GPU acceleration (NVIDIA GPU required), defaults to cpu.
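
Putting these together, a fuller .env might look like the sketch below. The directory paths and the Ollama URL are illustrative only; match them to your own volume mounts and server setup.

ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
HARDWARE_ACCEL=gpu

# Summarization (optional)
AI_MODEL=gpt-3.5-turbo
OPENAI_API_KEY=sk-your-key-here
# Uncomment to use an OpenAI API-compatible server such as Ollama instead
# OLLAMA_BASE_URL=http://localhost:11434

# Speaker diarization
DIARIZATION_MODEL=pyannote/speaker-diarization@3.1

# Storage locations (match these to the volumes in your compose file)
MODELS_DIR=/app/models
WORK_DIR=/app/work
AUDIO_DIR=/app/audio

# Maximum request body size for uploads
BODY_SIZE_LIMIT=1G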

Speaker Diarization Setup

Required Models

The application requires access to the following Hugging Face models:

  • pyannote/speaker-diarization-3.1
  • pyannote/segmentation-3.0

Setup Steps

  1. Create a free account at HuggingFace if you don’t already have one.
  2. Generate an API token at HuggingFace Tokens.
  3. Accept user conditions for the required models on Hugging Face:
    • Visit pyannote/speaker-diarization-3.1 and accept the conditions.
    • Visit pyannote/segmentation-3.0 and accept the conditions.
  4. Enter the API token in the setup wizard when prompted. The token is only used during initial setup and is not stored permanently.
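
If you want to confirm up front that your token can access the gated models, one optional check is the huggingface_hub CLI (the setup wizard downloads the models itself, so this step can be skipped):

pip install -U "huggingface_hub[cli]"
huggingface-cli login --token hf_your_token_here
huggingface-cli download pyannote/speaker-diarization-3.1
huggingface-cli download pyannote/segmentation-3.0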

Storage and Usage

The diarization models are downloaded once and stored locally, so you won’t need to provide the API key again after the initial setup.

Updating from Previous Versions

Important: This release includes breaking changes and is not backward compatible with previous versions. You will lose your existing data. Please back up your data before proceeding.

Changes include:

  • Performance Improvements: The rewrite takes advantage of Svelte 5 reactivity features.
  • Transcription Engine Change: Switched from Whisper.cpp to WhisperX.
  • Improved Diarization: Significant improvements to the diarization pipeline.
  • Simplified Setup: Streamlined setup process with improved wizard.
  • New UI: Implemented a new UI design with glassmorphism.
  • Multilingual Support: Transcription and diarization now support all languages that Whisper models support.

Troubleshooting

  • Database Connection Issues: Ensure that the PostgreSQL container is running and accessible.
  • GPU Not Detected: Ensure that the NVIDIA Container Toolkit is installed and that Docker is configured correctly (see the quick check after this list).
  • Permission Issues: Running Docker commands may require root permissions or being part of the docker group.
  • Diarization Model Download Failure: Make sure you've entered a valid HuggingFace API key during setup.
  • CUDA failed with error out of memory: Ensure that your GPU has enough memory to run the chosen model. You can reduce the batch size by adding, for example, WHISPER_BATCH_SIZE=8 to your .env file; the default is 16, and you can step down to 8, 4, 2, or 1. (For reference, large-v2 ran on a 1.5-hour audio file on an RTX 3070 with 8 GB of VRAM only with a batch size of 1; anything higher ran out of memory.)
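
For the GPU Not Detected case, a quick way to confirm that Docker can see your GPU at all is to run nvidia-smi inside a CUDA base container (the image tag below is only an example; pick one that matches your driver):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If this fails, fix the NVIDIA Container Toolkit installation before debugging Scriberr itself.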

Check the logs for more details:

docker-compose logs -f

Need Help?

If you encounter issues or have questions, feel free to open an issue.

Contributing

Contributions are welcome! Feel free to submit pull requests or open issues.

  • Fork the Repository: Create a personal fork of the repository on GitHub.
  • Clone Your Fork: Clone your forked repository to your local machine.
  • Create a Feature Branch: Make a branch for your feature or fix.
  • Commit Changes: Make your changes and commit them.
  • Push to Your Fork: Push your changes to your fork on GitHub.
  • Submit a Pull Request: Create a pull request to merge your changes into the main repository.
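
If you are new to this flow, a typical sequence of commands looks like the following (replace <your-username> and the branch name with your own):

git clone https://github.com/<your-username>/Scriberr.git
cd Scriberr
git checkout -b my-feature
# ...make and test your changes...
git add -A
git commit -m "Describe your change"
git push -u origin my-feature

Then open a pull request from your fork's branch on GitHub.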

For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments


Thank you for your patience, support, and interest in the project. Looking forward to any and all feedback.