-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: model_selection
module docstrings
#732
base: main
Are you sure you want to change the base?
Conversation
model_selection
module docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @mkalimeri!
The only concern I have is related to the part of code in TimeGapSplit
summary. @koaning WDYT?
We can keep it as a follow up, but that's certainly not the right assumption to make in this context
### Start date 2024-01-01 00:00:00 | ||
### End date 2024-01-01 00:00:09 | ||
### Period 0 days 00:00:09 | ||
### Unique days 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this returning 10 days? 🤔
Edit: Looking into the code, this is fairly random assumption that the number of unique dates are in days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should not be unique days but maybe timepoints? Shall we leave this as is for now and open another issue to generate additional/different information in the summary? Happy to pick that up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a nit that would be nice to tackle. I would also totally be open to doing that in this PR, less waiting for merges that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can do that. I will come back to you with a plan of what information I would include. Updating this would require changes in tests as well, so it will be more than one file. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I got to take a look at this today. As @FBruzzesi mentioned, we can't assume that the number of days is relevant to all use cases. We could keep the field and correct the calculation, to give an idea of the frequency of data, but it might not be relevant if the granularity of the time series is lower. Another idea is to add the granularity of the time series as a field of the summary. What are your thoughts?
# Create dataset | ||
np.random.seed(1) | ||
num_rows = 30 | ||
df = pd.DataFrame(np.random.randn(num_rows, 4)).rename(columns={0: 'c1', 1: 'c2', 2: 'c3', 3: 'c4'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass the column names directly:
df = pd.DataFrame(np.random.randn(num_rows, 4)).rename(columns={0: 'c1', 1: 'c2', 2: 'c3', 3: 'c4'}) | |
df = pd.DataFrame(np.random.randn(num_rows, 4), columns=["c1", "c2", "c3", "c4"]) |
Description
Related to #596
Examples added for
Type of change
Checklist:
If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.