BUG: Some PDF files cannot be ingested since Aleph 3.15.6 #4163

Open
eddie1113 opened this issue Mar 25, 2025 · 1 comment
Labels: bug, triage

Comments

@eddie1113


Describe the bug
Since Aleph 3.15.6, some PDF files can no longer be ingested.
Each of the problematic files contains about 50 scanned pages.
The logs suggest the problem lies in the ingest-file component.

To Reproduce
Steps to reproduce the behavior:

  1. Install Aleph 3.15.6
  2. Create a new project
  3. Upload multiple documents containing 50 scanned pages each
  4. Try to view the documents

Expected behavior
All uploaded documents should be viewable.

Aleph version
Last working version: Aleph 3.15.5 (ingest-file 3.19.3)
Broken version: Aleph 3.15.6 (ingest-file 3.20.3)

Last working ingest-file version: 3.20.0
Broken ingest-file version: 3.20.1

Additional context
In version 3.15.5, the first document is unviewable (possibly a separate bug), but subsequent documents are fine.
In version 3.15.6, all documents are unviewable.

Aleph_3.15.5.log
Aleph_3.15.6.log

@stchris (Contributor) commented Mar 25, 2025

Hi @eddie1113, can you check whether #4002 (comment) fixes this issue?
