Subgrounds should offer dataframe helper functions for GraphQL data for multiple libraries, not just pandas. Currently the only dataframe utility functions are for pandas, found here
Subgrounds is moving towards a multi-client world. One alternative to the base client would be a client that uses `polars` instead of `pandas` dataframes. However, `dataframe_utils.py` currently only offers pandas helper functions, which makes it harder to use polars with Subgrounds.
To use subgrounds with polars, helper functions such as `fmt_dict_cols` and `fmt_arr_cols` have to be redefined by hand in every project.
`fmt_dict_cols` - required to convert nested GraphQL JSON objects into separate polars dataframe columns
`fmt_arr_cols` - required to split GraphQL JSON fields that contain arrays into individual polars dataframe columns.
Example code:
```python
import polars as pl


def fmt_dict_cols(df: pl.DataFrame) -> pl.DataFrame:
    """
    Formats dictionary cols, which are 'structs' in a polars df,
    into separate columns and renames accordingly.
    """
    for column in df.columns:
        if isinstance(df[column][0], dict):
            col_names = df[column][0].keys()
            # rename struct fields
            struct_df = df.select(
                pl.col(column).struct.rename_fields([f"{column}_{c}" for c in col_names])
            )
            struct_df = struct_df.unnest(column)
            # add struct_df columns to df and drop the original column
            df = df.with_columns(struct_df.get_columns()).drop(column)
    return df


def fmt_arr_cols(df: pl.DataFrame) -> pl.DataFrame:
    """
    Formats lists, which are arrays in a polars df, into separate columns and renames accordingly.
    Since there isn't a direct way to convert array -> new columns, we convert
    the array to a struct and then unnest the struct into new columns.
    """
    for column in df.columns:
        # use this logic if column is a list (rows show up as pl.Series)
        if isinstance(df[column][0], pl.Series):
            # convert array to struct (`.list` was called `.arr` in older polars releases)
            struct_df = df.select([pl.col(column).list.to_struct()])
            # rename struct fields, one name per struct field
            n_fields = len(struct_df[column].struct.fields)
            struct_df = struct_df.select(
                pl.col(column).struct.rename_fields([f"{column}_{i}" for i in range(n_fields)])
            )
            # unnest struct fields into their own columns
            struct_df = struct_df.unnest(column)
            # add struct_df columns to df and drop the original column
            df = df.with_columns(struct_df.get_columns()).drop(column)
    return df
```
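The column naming used by the two helpers (`{column}_{key}` for dictionary fields, `{column}_{i}` for array positions) can also be illustrated without any dataframe library. The following is only a sketch of the idea on plain Python rows: `flatten_row` and the sample field names are hypothetical and not part of Subgrounds, and only one level of nesting is handled.

```python
from typing import Any


def flatten_row(row: dict[str, Any]) -> dict[str, Any]:
    """Flatten one level of nesting in a GraphQL-style JSON row.

    dict values become `{column}_{key}` entries and list values become
    `{column}_{i}` entries, mirroring the naming of the polars helpers.
    """
    flat: dict[str, Any] = {}
    for column, value in row.items():
        if isinstance(value, dict):
            # one new entry per dictionary key
            for key, inner in value.items():
                flat[f"{column}_{key}"] = inner
        elif isinstance(value, list):
            # one new entry per array position
            for i, inner in enumerate(value):
                flat[f"{column}_{i}"] = inner
        else:
            # scalar values are kept unchanged
            flat[column] = value
    return flat


# Hypothetical row shaped like a subgraph query result (field names are made up).
row = {
    "id": "0xabc",
    "token": {"symbol": "WETH", "decimals": 18},
    "reserves": [1.5, 2.5],
}
flat_row = flatten_row(row)
print(flat_row)
```

A list of rows flattened this way could then be handed to whichever dataframe library a client prefers, since both `pl.DataFrame` and `pd.DataFrame` accept a list of dictionaries. Whether Subgrounds should flatten before or after building the dataframe is a design question this sketch does not settle.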