Further improve datafusion-cli memory usage when a huge number is set for maxrows #14810

Open
zhuqi-lucas opened this issue Feb 21, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@zhuqi-lucas
Contributor

Is your feature request related to a problem or challenge?

This is a follow-up to the comment below:

#14766 (comment)

Describe the solution you'd like

Make datafusion-cli stream its output, printing each batch as it arrives.

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Contributor

alamb commented Feb 21, 2025

Basically, the idea of this ticket is to print rows as they arrive in batches rather than buffering them all up first.

I think this will take some non-trivial work, as the formatter wants to know the width of all cells up front.

I believe Postgres does something like "compute column widths based on the first 1000 cells" and then just has a crappy display if the rows after that happen to have wider columns
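
A rough sketch of that "fix the widths from a preview, then stream" idea, written against plain strings rather than Arrow record batches. Everything here is hypothetical: `StreamingPrinter`, `preview_rows`, `push_batch`, and `finish` are illustration names, not datafusion-cli's actual code or API.

```rust
use std::io::{self, Write};

/// Hypothetical streaming table printer (not datafusion-cli's real formatter).
/// It buffers at most `preview_rows` rows, fixes the column widths from them,
/// and then writes every later row immediately using those widths.
struct StreamingPrinter<W: Write> {
    out: W,
    preview_rows: usize,
    buffered: Vec<Vec<String>>,
    widths: Option<Vec<usize>>,
}

impl<W: Write> StreamingPrinter<W> {
    fn new(out: W, preview_rows: usize) -> Self {
        Self { out, preview_rows, buffered: Vec::new(), widths: None }
    }

    /// Accept one batch of rows; memory use is bounded by `preview_rows`.
    fn push_batch(&mut self, batch: Vec<Vec<String>>) -> io::Result<()> {
        for row in batch {
            if self.widths.is_some() {
                // Widths are already fixed: print without buffering.
                self.write_row(&row)?;
            } else {
                self.buffered.push(row);
                if self.buffered.len() >= self.preview_rows {
                    self.flush_preview()?;
                }
            }
        }
        Ok(())
    }

    /// Flush anything still buffered (input shorter than the preview).
    fn finish(mut self) -> io::Result<W> {
        if self.widths.is_none() {
            self.flush_preview()?;
        }
        Ok(self.out)
    }

    /// Compute column widths from the buffered rows, then print them.
    fn flush_preview(&mut self) -> io::Result<()> {
        let ncols = self.buffered.iter().map(|r| r.len()).max().unwrap_or(0);
        let mut widths = vec![0; ncols];
        for row in &self.buffered {
            for (i, cell) in row.iter().enumerate() {
                widths[i] = widths[i].max(cell.chars().count());
            }
        }
        self.widths = Some(widths);
        let rows = std::mem::take(&mut self.buffered);
        for row in &rows {
            self.write_row(row)?;
        }
        Ok(())
    }

    /// Pad each cell to its fixed width; wider cells are not truncated,
    /// so they simply push the rest of the row out of alignment.
    fn write_row(&mut self, row: &[String]) -> io::Result<()> {
        let widths = self.widths.as_ref().expect("widths are fixed before writing");
        let cells: Vec<String> = row
            .iter()
            .enumerate()
            .map(|(i, cell)| {
                let w = widths.get(i).copied().unwrap_or(0);
                format!("{cell:<w$}")
            })
            .collect();
        writeln!(self.out, "| {} |", cells.join(" | "))
    }
}
```

With `preview_rows = 2`, the first two rows decide the widths. A third row whose first cell is wider than both is still printed right away, just misaligned, which is the trade-off described above.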

@zhuqi-lucas
Contributor Author

Thank you @alamb for the great idea.

Besides this improvement, I also found a bug in the unlimited case, which we are missing for the buffer. I have filed a ticket for it:

#14814
