Image
Hyper-fast and safe image manipulation library for Python. Powered by Rust.
Rembg is a tool to remove image backgrounds
Tesseract Open Source OCR Engine (main repository)
A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Text detection with Python | Tesseract vs Easyocr vs AWS Textract | What is the best OCR?
OCR, layout analysis, reading order, table recognition in 90+ languages
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools, supports training and…
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
Easily train or fine-tune SOTA computer vision models with one open source training library. The home of YOLO-NAS.
Functions that create PNG and animated PNG files from NumPy arrays.
Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.
Community-maintained fork of pdfminer - we fathom PDF
A Python library for reading and writing PDF, powered by QPDF
PyExifTool (active PyPI project) - A Python library to communicate with an instance of Phil Harvey's ExifTool command-line application. Runs one process with the special -stay_open flag, and pipes data…