Extracting tables which is continued/split across multiple pages #3208

Closed
rahulgoel11 opened this issue Feb 28, 2024 · 1 comment
Labels
enhancement wontfix no intention to resolve

Comments

@rahulgoel11

Hello,

How do we extract a table that continues across multiple pages when the main table header is not repeated on the following pages?

For example, the table below is split across multiple pages:

image

Can we have a solution for such scenarios? Table extraction currently works at page level only (find_tables operates on a single page object).

@JorjMcKie
Collaborator

This situation has no general solution and must be solved by your own code.

First of all, this only works on the basis of pandas DataFrames or the Python lists returned by table.extract(). Then, the tables must be confirmed to have equal column counts.
Then, header repetition must be confirmed or excluded.
In addition, the data types of the table cells may need to be checked for compatibility.
Etc.

We currently have no plans to address this functionality. The utilities repository has an example script for joining pandas DataFrames. You should be able to adapt it for your specific needs.
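
The checks listed above can be sketched on the plain Python lists that table.extract() returns. This is a minimal illustration, not the example script from the utilities repository. The function names join_page_tables and extract_joined_table are hypothetical, and the sketch assumes that the first table on every page is a fragment of the same logical table and that the first fragment starts with the header row. Cell data type checks are left out.

```python
from typing import List, Optional, Sequence

Row = List[Optional[str]]


def join_page_tables(page_tables: Sequence[List[Row]]) -> List[Row]:
    """Join per-page row lists into one table (hypothetical helper).

    Assumption: the first non-empty fragment starts with the header row.
    """
    joined: List[Row] = []
    header: Optional[Row] = None
    for rows in page_tables:
        if not rows:
            continue  # nothing extracted for this page
        if header is None:
            # The first fragment supplies the header and the first data rows.
            header = rows[0]
            joined.extend(rows)
            continue
        # Equality of column counts must be assured.
        if any(len(row) != len(header) for row in rows):
            raise ValueError("column count differs from the first fragment")
        # Header repetition must be confirmed or excluded.
        start = 1 if rows[0] == header else 0
        joined.extend(rows[start:])
    return joined


def extract_joined_table(pdf_path: str) -> List[Row]:
    """Collect the first table of each page and join the fragments.

    Assumption: each page holds at most one fragment of the table of interest.
    """
    import fitz  # PyMuPDF; imported here so join_page_tables runs without it

    fragments: List[List[Row]] = []
    doc = fitz.open(pdf_path)
    for page in doc:
        tabs = page.find_tables()  # page-level table detection
        if tabs.tables:
            fragments.append(tabs.tables[0].extract())
    doc.close()
    return join_page_tables(fragments)
```

Comparing the first row of each later fragment with the header handles both cases: a continuation page that repeats the header and one that does not. A document with several tables per page, or with a continuation fragment whose column detection differs, needs additional logic to decide which fragments belong together.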

@JorjMcKie added the wontfix (no intention to resolve) label on Feb 28, 2024