
kyryl-opens-ml/no-ocr


No OCR

A simple tool for exploring documents with AI, no fancy text extraction required. Just upload your files, then quickly search or ask questions about content across multiple collections.

Release blog with details

Here is a blog with release details about this project: No-OCR Product

Demo

Here's a quick GIF demonstrating the basic flow of using No OCR:

[GIF: No OCR Flow]

Table of Contents

  1. Overview
  2. Key Features
  3. Architecture
  4. Flow
  5. Roadmap
  6. Prerequisites
  7. Dev Installation

Overview

The core purpose of "No OCR" is to simplify AI-based PDF processing:

  • Process and store PDF pages without relying on OCR.
  • Perform text and/or visual queries using modern embeddings.
  • Use open source models for advanced question-answering on document-based diagrams, text, and more.

Key technologies:

  • React-based front end (no-ocr-ui) for uploading, managing, and searching documents.
  • Python-based API (no-ocr-api) that coordinates ingestion, indexing, and searching.
  • Qdrant for efficient vector search and retrieval.
  • ColPali and Qwen2-VL for model inference (both text- and vision-based).

Key Features

  • Create and manage PDF/document collections, also referred to as "cases".
  • Automated ingestion to build Hugging Face-style datasets (HF_Dataset).
  • Vector-based search over PDF pages (and relevant images) in Qdrant.
  • Visual question-answering on images and diagrams via Qwen2-VL.
  • Deployable via Docker for both the backend (Python) and UI (React).

Architecture

Below is a high-level workflow overview:

[Figure: Architecture]

Flow

Create case:

sequenceDiagram
    participant User
    participant no-ocr-ui (CreateCase)
    participant no-ocr-api
    participant HF_Dataset
    participant IngestClient
    participant Qdrant

    User->>no-ocr-ui (CreateCase): Upload PDFs & specify case name
    no-ocr-ui (CreateCase)->>no-ocr-api: POST /create_case with PDFs
    no-ocr-api->>no-ocr-api: Save PDFs to local storage
    no-ocr-api->>no-ocr-api: Spawn background task (process_case)
    no-ocr-api->>HF_Dataset: Convert PDFs to HF dataset
    HF_Dataset-->>no-ocr-api: Return dataset
    no-ocr-api->>IngestClient: Ingest dataset
    IngestClient->>Qdrant: Create collection & upload points
    Qdrant-->>IngestClient: Acknowledge ingestion
    IngestClient-->>no-ocr-api: Done ingestion
    no-ocr-api->>no-ocr-api: Mark case status as 'done'
    no-ocr-api-->>no-ocr-ui (CreateCase): Return creation response
    no-ocr-ui (CreateCase)-->>User: Display success message
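
The background step in the diagram above (convert PDFs to a dataset, ingest it, then mark the case as done) can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Case` class, the `"processing"` initial status, and the `to_dataset` / `ingest` callables are hypothetical stand-ins for the HF_Dataset conversion and the IngestClient.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Page = Dict[str, object]


@dataclass
class Case:
    """Hypothetical case record; only the 'done' status comes from the README."""
    name: str
    pdf_paths: List[str]
    status: str = "processing"  # assumed initial status, for illustration only


def process_case(
    case: Case,
    to_dataset: Callable[[List[str]], List[Page]],
    ingest: Callable[[str, List[Page]], None],
) -> Case:
    """Run the create-case pipeline: PDFs -> dataset -> vector store -> 'done'."""
    dataset = to_dataset(case.pdf_paths)  # stand-in for the HF dataset conversion
    ingest(case.name, dataset)            # stand-in for IngestClient -> Qdrant
    case.status = "done"
    return case


if __name__ == "__main__":
    store: Dict[str, List[Page]] = {}

    def fake_to_dataset(paths: List[str]) -> List[Page]:
        # Pretend every PDF has exactly one page.
        return [{"pdf": path, "page": 0} for path in paths]

    def fake_ingest(name: str, dataset: List[Page]) -> None:
        # Pretend the collection is an in-memory dict keyed by case name.
        store[name] = list(dataset)

    case = process_case(Case("contracts", ["a.pdf", "b.pdf"]), fake_to_dataset, fake_ingest)
    print(case.status, len(store["contracts"]))
```

Passing the conversion and ingestion steps in as callables keeps the sketch runnable without Qdrant or any model; in a real deployment they would be replaced by the API's own components.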

Search:

sequenceDiagram
    participant User
    participant no-ocr-ui
    participant SearchClient
    participant Qdrant
    participant HF_Dataset
    participant VLLM

    User->>no-ocr-ui: Enter search query and select case
    no-ocr-ui->>SearchClient: Search images by text
    SearchClient->>Qdrant: Query collection with text embedding
    Qdrant-->>SearchClient: Return search results
    SearchClient-->>no-ocr-ui: Provide search results
    no-ocr-ui->>HF_Dataset: Load dataset for collection
    HF_Dataset-->>no-ocr-ui: Return dataset
    no-ocr-ui->>VLLM: Process images with VLLM
    VLLM-->>no-ocr-ui: Return VLLM output
    no-ocr-ui-->>User: Display search results and VLLM output
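
To make the "query collection with text embedding" step more concrete: ColPali-style retrievers represent the query and each page as a set of vectors and score a page by late interaction (MaxSim), summing, for every query vector, its best match among the page's vectors. The sketch below shows that scoring in plain Python with toy 2-dimensional vectors. It assumes the collection stores multi-vector page embeddings; the function names and the sample data are illustrative only, and in practice the vector store would do this ranking.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector take its best page match, then sum.

    Assumes page_vecs is non-empty.
    """
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (page_id, score) pairs, best score first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        "page_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query vectors
        "page_b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first one
    }
    print(rank_pages(query, pages))
```

The top-ranked pages are what the flow above then passes, as images, to the vision-language model for answering.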

Roadmap

  • Better models for reasoning and retrieval: 72B and QVQ.
  • Agentic workflows - go beyond search and toward complete pieces of work.
  • Training models per case - turn your workflow into a data moat and train unique models.
  • UI/UX improvement - simplify, simplify, simplify.

Prerequisites

  • Python 3.x
  • Node.js 18.x
  • Docker (optional for containerized deployments)
  • Supabase
    • Create an account at https://app.supabase.io/
    • Create a .env file in the no-ocr-ui directory
    • Add the following variables to the .env file:
      VITE_SUPABASE_URL=""
      VITE_SUPABASE_ANON_KEY=""
      VITE_REACT_APP_API_URI=""
      
  • Modal
    • Create an account at https://modal.com/
    • Deploy models:
      pip install modal
      modal setup
      
      modal run no-ocr-llms/llm_serving_load_models.py --model-name Qwen/Qwen2-VL-7B-Instruct --model-revision 51c47430f97dd7c74aa1fa6825e68a813478097f
      modal run no-ocr-llms/llm_serving_load_models.py --model-name vidore/colqwen2-v1.0-merged --model-revision 364a4f5df97231e233e15cbbaf0b9dbe352ba92c
      
      
      modal deploy no-ocr-llms/llm_serving.py
      modal deploy no-ocr-llms/llm_serving_colpali.py
    • Create a .env file in the no-ocr-api directory
    • Update the environment variables.

Dev Installation

  1. Clone the repository and enter it (each step below starts from the repository root):

    git clone https://github.com/kyryl-opens-ml/no-ocr
    cd no-ocr
  2. (API) Install dependencies:

    cd no-ocr-api
    pip install -r requirements.txt
  3. (API) Run server:

    cd no-ocr-api
    fastapi dev api.py
  4. (UI) Install dependencies:

    cd no-ocr-ui
    npm install
  5. (UI) Run UI:

    cd no-ocr-ui
    npm run dev
  6. (Qdrant) Run Qdrant:

    docker run -p 6333:6333 qdrant/qdrant:v1.12.5
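
Once the container is running, a quick way to confirm that Qdrant is reachable on the mapped port is to call its REST API. The snippet below is a small standard-library sketch, not part of the repository: the helper name `qdrant_is_up` is made up for illustration, and it assumes the default `http://localhost:6333` address from the `docker run` command above.

```python
import json
import urllib.error
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if a Qdrant REST endpoint answers at base_url, False otherwise."""
    try:
        # GET /collections lists the existing collections on a running instance.
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```

If this prints `False`, check that the container is still running and that port 6333 is not already in use by another process.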