table extraction not working properly - when there is a change in contrast between Title and rows #3668

sreeram1658 · 2024-07-09T08:10:11Z

Description of the bug

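I am trying to extract a table from my PDF document using fitz (PyMuPDF):

import fitz  # PyMuPDF

doc = fitz.open("sample_table.pdf")
page = doc[4]  # fifth page (0-based index)
# detect tables from the drawn ruling lines in both directions
tabs = page.find_tables(horizontal_strategy="lines", vertical_strategy="lines")
tab = tabs[0]  # first table found on the page
df = tab.to_pandas()
df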

My document: (screenshot not included)

The output looks something like this: (screenshot not included)

Clearly, the cells in the rows that are not highlighted are not captured here. How can I rectify this?

How to reproduce the bug

Already explained above

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.9

JorjMcKie · 2024-07-09T14:58:15Z

This post cannot be accepted as an issue yet because a reproducing file has not been supplied.

JorjMcKie · 2024-07-15T12:07:21Z

Closed because of an extended period without a response from the user.

JorjMcKie added the "example required" and "Waiting for information" labels Jul 9, 2024

JorjMcKie closed this as completed Jul 15, 2024
