feat: Add support for renaming and retaining columns in data preprocessor #466
Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
This looks good to me. It might require some investigation with iterable datasets after it is merged.
Should we add one more test in …
Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
Apart from this small nit, LGTM! Thanks, Dushyant!
Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
Branch updated from 3f67a3c to 9e2f75f.
Description of the change
Add two flags to the `data_config`, under the `dataset` definition:
- `rename_columns`, which allows users to rename columns by passing a dict.
- `retain_columns`, which allows users to specify which columns to retain; all other columns are dropped.

This is separate from the `remove_columns` argument to data handlers, because users might want this functionality without invoking data handlers. It is especially needed when interleaving multiple datasets that have different features.
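To illustrate why these flags help with interleaving, here is a minimal sketch of the intended semantics in plain Python, operating on lists of dicts rather than the project's actual dataset objects. The helper names `rename_columns`, `retain_columns` and `interleave`, the sample records, and the choice to apply the rename before the retain step are all hypothetical assumptions; the description above does not state the order in which the two flags are applied.

```python
from itertools import zip_longest


def rename_columns(rows, mapping):
    """Hypothetical helper: rename keys per `mapping` (old name -> new name)."""
    out = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            new_key = mapping.get(key, key)  # unmapped columns keep their name
            if new_key in new_row:
                raise ValueError(f"renaming produces duplicate column {new_key!r}")
            new_row[new_key] = value
        out.append(new_row)
    return out


def retain_columns(rows, keep):
    """Hypothetical helper: keep only the listed columns and drop all others."""
    keep = list(keep)
    # Raises KeyError if a requested column is missing from a row.
    return [{key: row[key] for key in keep} for row in rows]


def interleave(*datasets):
    """Simple round-robin interleave; exhausted datasets are skipped."""
    sentinel = object()
    return [
        row
        for group in zip_longest(*datasets, fillvalue=sentinel)
        for row in group
        if row is not sentinel
    ]


# Two hypothetical datasets whose features differ.
qa = [{"question": "2+2?", "answer": "4", "source": "web"}]
chat = [
    {"input": "Hi", "output": "Hello", "id": 7},
    {"input": "Bye", "output": "See you", "id": 8},
]

# Assumed order: rename first, then retain, so both share one schema.
qa = retain_columns(rename_columns(qa, {"question": "input", "answer": "output"}),
                    ["input", "output"])
chat = retain_columns(chat, ["input", "output"])

mixed = interleave(qa, chat)
print(mixed)
```

Once both sources expose only `input` and `output`, mixing them no longer fails on mismatched features. In Hugging Face `datasets`, the analogous per-dataset operations are `Dataset.rename_columns` and `Dataset.select_columns`; whether this PR is built on them is not stated here.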
Related issue number
How to verify the PR
Was the PR tested