Is your feature request related to a problem or challenge?
Libraries such as PySpark offer a common operation to fill null/NaN values across an entire DataFrame (or only in selected columns). It would be useful to have a similar feature in DataFusion and datafusion-python.
Describe the solution you'd like
If I have a DataFrame with null (or NaN) values in several columns, I would want to replace all of them in those columns with the provided value if it can be cast to the column's type. Otherwise the column should be left unchanged (a no-op). The user should also be able to limit which columns this applies to.
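To make the requested semantics concrete, here is a minimal plain-Python sketch. It is not DataFusion or datafusion-python API: the function name `fill_nulls`, the dict-of-lists table model, and the choice to treat both `None` and float NaN as missing are assumptions made for illustration, as is inferring a column's type from its first non-missing entry.

```python
import math


def is_missing(v):
    # Assumption of this sketch: both None and float NaN count as missing.
    return v is None or (isinstance(v, float) and math.isnan(v))


def fill_nulls(columns, value, subset=None):
    """Hypothetical model of the requested behavior on a dict of column lists.

    Missing entries are replaced by `value` cast to the column's type.
    Columns outside `subset`, or columns the value cannot be cast to,
    are returned unchanged (no-op).
    """
    result = {}
    for name, values in columns.items():
        if subset is not None and name not in subset:
            result[name] = list(values)
            continue
        # Infer the column type from the first non-missing entry.
        col_type = next((type(v) for v in values if not is_missing(v)), None)
        if col_type is None:
            result[name] = list(values)
            continue
        try:
            cast_value = col_type(value)
        except (TypeError, ValueError):
            # Value cannot be cast to this column's type: leave it untouched.
            result[name] = list(values)
            continue
        result[name] = [cast_value if is_missing(v) else v for v in values]
    return result


data = {"a": [1, None, 3], "b": ["x", None], "c": [None, 2.5]}
filled_all = fill_nulls(data, "n/a")            # only "b" accepts a string
filled_a = fill_nulls(data, 0, subset=["a"])    # only "a" is considered
```

In this sketch, filling with `"n/a"` changes only the string column, because the value cannot be cast to the integer or float columns, and passing `subset=["a"]` restricts the fill to column `a`.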
A possible implementation is to provide a set of fill UDFs, such as fill_value, fill_prev, fill_linear, etc., each of which acts on a column expression in the select list.
Users could then write SELECT fill_value(a, 1.0), fill_prev(b) FROM table to fill selected columns with the available fill strategies.
Alternatively, they could call DataFrame::new(..).fill(vec![(Column, FillStrategy), (Column, FillStrategy)]) to apply filling to specified columns of a DataFrame.
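As a rough illustration of what these strategies might compute, here is a plain-Python sketch over a single column represented as a list with `None` for nulls. The function names mirror the proposed UDFs, but their exact behavior is an assumption: in particular, leaving leading gaps unfilled in fill_prev and leaving leading and trailing gaps unfilled in fill_linear are choices made for this example, not decided semantics.

```python
def fill_value(values, value):
    """Replace every null with a constant."""
    return [value if v is None else v for v in values]


def fill_prev(values):
    """Replace each null with the most recent non-null value (forward fill).

    Nulls before the first non-null value stay null.
    """
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def fill_linear(values):
    """Linearly interpolate nulls between neighboring non-null values.

    Nulls before the first or after the last non-null value stay null.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


constant = fill_value([1.0, None, 3.0], 1.0)
forward = fill_prev([None, 2, None, None, 5])
interpolated = fill_linear([1.0, None, None, 4.0, None])
```

A per-column FillStrategy in the DataFrame API could then map to one of these functions for each (Column, FillStrategy) pair.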
Describe alternatives you've considered
Additional context
This is a repost from apache/datafusion-python#922, prompted by this PR comment