Failed to read JPX header #2904

Closed
myhloli opened this issue Dec 18, 2023 · 3 comments
Labels
not a bug (not a bug / user error / unable to reproduce), Waiting for information

Comments

myhloli commented Dec 18, 2023

Description of the bug

When I call page.get_image_rects, it raises RuntimeError: Failed to read JPX header

How to reproduce the bug

JFMA_15_11.pdf
page_id: 75 (page_id starts from 0)

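import fitz  # PyMuPDF

# pdf_bytes holds the raw bytes of JFMA_15_11.pdf
pdf_docs = fitz.open("pdf", pdf_bytes)
for page_id, page in enumerate(pdf_docs):
    page_imgs = page.get_images()
    for img in page_imgs:
        # raises RuntimeError on page_id 75
        recs = page.get_image_rects(img, transform=True)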

My code looks like this demo.

recs = page.get_image_rects(img, transform=True)
           │    │               └ (1555, 0, 300, 28, 1, 'Indexed', '', 'Im1', 'JPXDecode')
           │    └ <function get_image_rects at 0x000001B37F685E40>
           └ page 75 of <memory, doc# 1>
pix = Pixmap(page.parent, xref)  # make pixmap of the image to compute MD5
          │      │    │       └ 1555
          │      │    └ <weakproxy at 0x000001B301178630 to Document at 0x000001B37E8D5910>
          │      └ page 75 of <memory, doc# 1>
          └ <class 'fitz.fitz.Pixmap'>
_fitz.Pixmap_swiginit(self, _fitz.new_Pixmap(*args))
    │     │               │     │     │           └ (<weakproxy at 0x000001B301178630 to Document at 0x000001B37E8D5910>, 1555)
    │     │               │     │     └ <built-in function new_Pixmap>
    │     │               │     └ <module 'fitz._fitz' from 
    │     │               └ <unprintable Pixmap object>
    │     └ <built-in function Pixmap_swiginit>
    └ <module 'fitz._fitz' from '

This is the stack trace I get when the error occurs.

PyMuPDF version

1.23.7

Operating system

Windows

Python version

3.11

@julian-smith-artifex-com
Collaborator

Thanks for the reproducer.

It looks like the image with page=75 (zero-based), image=3 (zero-based) and xref=1555 is corrupted. But MuPDF isn't coping particularly well with this: we end up getting this image's error for all images on page 75.

I'll mark this as an upstream bug and keep it open here.

julian-smith-artifex-com added the "upstream bug" (bug outside this package) label Dec 18, 2023
@julian-smith-artifex-com
Collaborator

Note that so far I've been testing this with PyMuPDF built with MuPDF master. It looks like MuPDF master may have a regression where a single corrupt JPX image causes PyMuPDF to return an error for all images on the same page.

But the current release of PyMuPDF-1.23.8 (which is built with MuPDF-1.23.7) appears to be handling things ok. The image with xref=1555 is corrupt, and it's returning an error for just that image. I think this is correct behaviour.

So I think this is not actually a bug in the current release after all.
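
For callers who want to keep processing the other images on a page when one image is corrupt, a minimal sketch follows. It assumes the error surfaces as RuntimeError, as in the report above; other PyMuPDF versions may raise a different exception type. The helper name collect_image_rects is hypothetical and not part of PyMuPDF. It relies only on page.get_images() and page.get_image_rects(), and on the image xref being the first item of each get_images() entry, as in the stack trace above.

```python
def collect_image_rects(page):
    """Return (rects_by_xref, failures) for one page.

    `page` is assumed to behave like a PyMuPDF Page: get_images() yields
    tuples whose first item is the image xref, and get_image_rects() raises
    RuntimeError for an image that cannot be decoded (assumption taken from
    the error reported in this issue).
    """
    rects_by_xref = {}
    failures = []  # (image index on the page, xref, error message)
    for index, img in enumerate(page.get_images()):
        xref = img[0]
        try:
            rects_by_xref[xref] = page.get_image_rects(img, transform=True)
        except RuntimeError as exc:
            # Corrupt image, e.g. "Failed to read JPX header": record it
            # and continue with the remaining images on the page.
            failures.append((index, xref, str(exc)))
    return rects_by_xref, failures
```

With the reproducer above this could be called as collect_image_rects(pdf_docs[75]). The failures list then identifies each failing image by its index on the page and its xref, which makes it easy to skip or log just that image.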

@julian-smith-artifex-com
Collaborator

Given that the image in question is corrupt, I think this isn't a bug after all, so I'll close it sometime soon.

Please comment here if you think there's still something wrong with PyMuPDF's behaviour.
