
Table extraction not working properly when the contrast changes between the title and the rows #3668

Closed
sreeram1658 opened this issue Jul 9, 2024 · 2 comments

Comments

@sreeram1658

Description of the bug

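I am trying to extract a table from my PDF document using fitz (PyMuPDF):

```python
import fitz  # PyMuPDF

doc = fitz.open("sample_table.pdf")
page = doc[4]  # fifth page (0-based index)
tabs = page.find_tables(horizontal_strategy="lines", vertical_strategy="lines")
tab = tabs[0]  # first table found on the page
df = tab.to_pandas()  # requires pandas to be installed
print(df)
```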

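My document:
[screenshot of the table in the PDF, not reproduced here]

The output looks something like this:
[screenshot of the extracted DataFrame, not reproduced here]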

  • The cells in the rows that are not highlighted are clearly not captured. How can I fix this?
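One possible explanation, not confirmed in this thread because no sample file was supplied: the "lines" strategy derives cell borders from vector graphics on the page, so rows without drawn borders or background fills may give it nothing to split on. PyMuPDF's `find_tables` also accepts `"text"` for `horizontal_strategy` and `vertical_strategy`, which infers boundaries from word positions instead. As a rough illustration of that idea only (it is not PyMuPDF's algorithm), the hypothetical helper `group_words_into_rows` below groups the tuples returned by `page.get_text("words")` into rows by their vertical position. The 3-point tolerance is an arbitrary assumption.

```python
from typing import List, Optional, Sequence, Tuple

# Shape of one entry returned by PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]


def group_words_into_rows(words: Sequence[Word], tol: float = 3.0) -> List[List[str]]:
    """Group words into table rows by vertical position, ignoring ruling lines.

    Hypothetical helper for illustration; NOT PyMuPDF's table algorithm.
    A word joins the current row if its vertical centre lies within `tol`
    points of that row's first word; otherwise it starts a new row.
    """
    # Sort top to bottom by vertical centre, then left to right by x0.
    ordered = sorted(words, key=lambda w: ((w[1] + w[3]) / 2, w[0]))
    rows: List[List[Word]] = []
    row_centre: Optional[float] = None
    for w in ordered:
        centre = (w[1] + w[3]) / 2
        if row_centre is None or abs(centre - row_centre) > tol:
            rows.append([w])  # vertical gap too large: start a new row
            row_centre = centre
        else:
            rows[-1].append(w)  # same visual line: extend the current row
    # Keep only the text, ordered left to right within each row.
    return [[item[4] for item in sorted(row, key=lambda item: item[0])] for row in rows]
```

Because the grouping looks only at where the words sit, a row with no highlight or border is treated the same as a highlighted one. On a real page you would likely restrict the words to the table area first, for example via the `clip` argument of `page.get_text`.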

How to reproduce the bug

Already explained above

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.9

@JorjMcKie
Collaborator

This post cannot be accepted as an issue yet because a reproducing file has not been supplied.

@JorjMcKie
Collaborator

Closed because of an extended period without a reaction from the user.
