Add DataFrame fill_nan #14770

Open
kosiew opened this issue Feb 19, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@kosiew
Contributor

kosiew commented Feb 19, 2025

Is your feature request related to a problem or challenge?

Libraries such as PySpark provide a common operation that fills NaN values across an entire DataFrame, or only in selected columns. It would be useful to have a similar feature in DataFusion and datafusion-python.

Describe the solution you'd like

If I have a DataFrame with NaN values in several columns, I want to replace all NaNs in those columns with the provided value if it can be cast to the column's type. Otherwise the column should be left unchanged (a no-op). The user should also be able to limit which columns this applies to.
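
To make the requested semantics concrete, here is a minimal sketch in plain Rust. It does not use the DataFusion API: the Column enum and the fill_nan function are hypothetical stand-ins for Arrow arrays and a DataFrame method. It also assumes that nulls (None) are distinct from NaN and are left alone, and that non-float columns are skipped because they cannot hold NaN.

```rust
// Hypothetical, simplified column model; DataFusion itself works on Arrow arrays.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

/// Replace NaN with `value` in the columns named in `subset`,
/// or in every column when `subset` is `None`.
/// Assumption: nulls (`None`) are not NaN and are never touched.
pub fn fill_nan(columns: &mut [(String, Column)], value: f64, subset: Option<&[&str]>) {
    for (name, column) in columns.iter_mut() {
        // Skip columns the caller did not ask for.
        if let Some(names) = subset {
            if !names.contains(&name.as_str()) {
                continue;
            }
        }
        // Only floating-point columns can hold NaN; every other type is a no-op.
        if let Column::Float64(values) = column {
            for v in values.iter_mut() {
                if let Some(x) = v {
                    if x.is_nan() {
                        *x = value;
                    }
                }
            }
        }
    }
}
```

Calling fill_nan(&mut columns, 0.0, Some(subset)) with subset naming columns a and c would replace NaN in a (if it is a float column), leave c unchanged if it is an integer column, and leave every unlisted column alone.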

Describe alternatives you've considered

Additional context

This is a repost from apache/datafusion-python#922, prompted by this PR comment

kosiew added the enhancement (New feature or request) label Feb 19, 2025
@niebayes
Contributor

A possible implementation is to design a set of fill UDFs, such as fill_value, fill_prev, fill_linear, etc., each of which acts on a column expression in the select list.
Users can then write SELECT fill_value(a, 1.0), fill_prev(b) FROM table to fill the selected columns with the chosen fill strategy.
Alternatively, they can call DataFrame::new(..).fill(vec![(Column, FillStrategy), (Column, FillStrategy)]) to apply filling to specified columns of a data frame.
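
As a rough illustration of what such strategies could compute, here is a std-only Rust sketch over a plain slice of f64. The FillStrategy enum and the fill function are hypothetical names modelled on the suggestion above, not existing DataFusion items. The behaviour at the edges is an assumption: NaNs with no earlier value (Prev) or without a valid neighbour on both sides (Linear) are left as NaN.

```rust
// Hypothetical strategy enum mirroring fill_value, fill_prev and fill_linear.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FillStrategy {
    Value(f64), // replace every NaN with a constant
    Prev,       // carry the last non-NaN value forward
    Linear,     // interpolate linearly (by index) between valid neighbours
}

/// Return a copy of `values` with NaNs filled according to `strategy`.
pub fn fill(values: &[f64], strategy: FillStrategy) -> Vec<f64> {
    let mut out = values.to_vec();
    match strategy {
        FillStrategy::Value(v) => {
            for x in out.iter_mut() {
                if x.is_nan() {
                    *x = v;
                }
            }
        }
        FillStrategy::Prev => {
            let mut last: Option<f64> = None;
            for x in out.iter_mut() {
                if x.is_nan() {
                    // Leading NaNs have no previous value and stay NaN.
                    if let Some(p) = last {
                        *x = p;
                    }
                } else {
                    last = Some(*x);
                }
            }
        }
        FillStrategy::Linear => {
            let mut prev: Option<usize> = None; // index of the last valid value
            for i in 0..values.len() {
                if values[i].is_nan() {
                    continue;
                }
                if let Some(p) = prev {
                    // Fill the gap between the two valid points p and i.
                    let step = (values[i] - values[p]) / (i - p) as f64;
                    for j in (p + 1)..i {
                        out[j] = values[p] + step * (j - p) as f64;
                    }
                }
                prev = Some(i);
            }
        }
    }
    out
}
```

A DataFrame-level fill taking (Column, FillStrategy) pairs could then apply a function like this to each listed column and pass the remaining columns through unchanged.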
