
Support for PDF Grafting to overlay OCR results #281

Open
hackslashX opened this issue Mar 11, 2025 · 1 comment

@hackslashX

Hi,

The OCR feature is awesome, since Tesseract results aren't always good enough. I was wondering whether PDF grafting could be added: take the word bounding boxes returned by Azure Document Intelligence or Google Document AI and overlay the recognized text as a text layer on the resulting PDF. (LLM-based OCR won't work here, since LLMs don't return bounding boxes.) This could be challenging, since Paperless also does a lot of PDF processing of its own, so I'm mainly asking whether it's feasible and worth discussing. I'd also be happy to contribute to this feature.
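The core geometric step of grafting is mapping each OCR box onto the PDF page. Below is a minimal sketch of that step. It assumes the engine reports word polygons as a flat `[x1, y1, x2, y2, ...]` list in inches with a top-left origin, which is how Azure Document Intelligence reports PDF input as far as I know. PDF user space, by contrast, uses points (1/72 inch) with a bottom-left origin. The names `Box`, `polygon_to_box` and `ocr_box_to_pdf_rect` are hypothetical and are not part of Paperless or any OCR SDK.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# PDF user space defaults to 72 units (points) per inch, origin at bottom-left.
POINTS_PER_INCH = 72.0


@dataclass
class Box:
    """Axis-aligned box with a top-left origin, in the OCR engine's units."""
    left: float
    top: float
    width: float
    height: float


def polygon_to_box(polygon: Sequence[float]) -> Box:
    """Collapse a flat [x1, y1, x2, y2, ...] polygon (assumed format) into its bounding box."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    left, top = min(xs), min(ys)
    return Box(left, top, max(xs) - left, max(ys) - top)


def ocr_box_to_pdf_rect(
    box: Box, page_height_pt: float, points_per_unit: float = POINTS_PER_INCH
) -> Tuple[float, float, float, float]:
    """Map a top-left-origin OCR box to a PDF rect (x0, y0, x1, y1) in points.

    PDF y grows upward, so the box's top edge maps to the larger y value.
    points_per_unit defaults to inches; for pixel units pass 72 / dpi instead.
    """
    x0 = box.left * points_per_unit
    x1 = (box.left + box.width) * points_per_unit
    y1 = page_height_pt - box.top * points_per_unit
    y0 = page_height_pt - (box.top + box.height) * points_per_unit
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    # A word box 1 inch from the top-left corner of a US Letter page (792 pt tall).
    word = polygon_to_box([1.0, 1.0, 3.0, 1.0, 3.0, 1.5, 1.0, 1.5])
    print(ocr_box_to_pdf_rect(word, page_height_pt=792.0))  # (72.0, 684.0, 216.0, 720.0)
```

This ignores page rotation and MediaBox/CropBox origins other than (0, 0), both of which a real implementation would need to handle. Each resulting rect could then receive the recognized word drawn with PDF text render mode 3 (invisible), so the text is selectable and searchable without covering the scanned image.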

Thanks!

@gardar
Contributor

gardar commented Mar 12, 2025

There is a discussion on this topic in #135, and some initial work has been done in #212.
