docs: `model_selection` module docstrings #732

mkalimeri · 2025-01-22T12:50:51Z

Description

Related to #596

Examples added for

model_selection.TimeGapSplit
model_selection.GroupTimeSeriesSplit
model_selection.ClusterFoldValidation (I see that KlusterFoldValidation has been renamed to ClusterFoldValidation)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (ruff)
I have commented my code, particularly in hard-to-understand areas
[NA] I have made corresponding changes to the documentation (also to the readme.md)
[NA] I have added tests that prove my fix is effective or that my feature works
[NA] I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.

FBruzzesi

Thanks for the PR @mkalimeri!

My only concern is the part of the code in the TimeGapSplit summary. @koaning WDYT?
We can leave it as a follow-up, but that is certainly not the right assumption to make in this context.

FBruzzesi · 2025-01-24T20:35:22Z

sklego/model_selection.py

+    ### Start date     2024-01-01 00:00:00
+    ### End date       2024-01-01 00:00:09
+    ### Period             0 days 00:00:09
+    ### Unique days                     10


How is this returning 10 days? 🤔

Edit: Looking into the code, this rests on a fairly arbitrary assumption that the number of unique dates is measured in days.

Yes, this should not be unique days but maybe timepoints? Shall we leave this as is for now and open another issue to generate additional/different information in the summary? Happy to pick that up!

Yeah, this is a nit that would be nice to tackle. I would also totally be open to doing that in this PR, less waiting for merges that way.

Sure, I can do that. I will come back to you with a plan of what information I would include. Updating this would require changes in tests as well, so it will be more than one file. Is that ok?

Hi, I got to take a look at this today. As @FBruzzesi mentioned, we can't assume that the number of days is relevant to all use cases. We could keep the field and correct the calculation, to give an idea of the frequency of data, but it might not be relevant if the granularity of the time series is lower. Another idea is to add the granularity of the time series as a field of the summary. What are your thoughts?
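To make the options above concrete, here is a minimal sketch of what a corrected summary could compute for the ten one-second timestamps from the docstring example. It is not the current sklego implementation: the helper `describe_dates` and the field names "Unique timepoints" and "Granularity" are hypothetical, and taking the smallest step between consecutive unique timestamps is just one possible definition of granularity.

```python
import numpy as np
import pandas as pd

# Ten timestamps one second apart, matching the docstring output under review.
dates = pd.Series(pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(10), unit="s"))


def describe_dates(dates: pd.Series) -> dict:
    """Hypothetical helper (not part of sklego) sketching possible summary fields."""
    # Steps between consecutive unique timestamps, used to estimate granularity.
    unique_sorted = pd.Series(dates.unique()).sort_values()
    steps = unique_sorted.diff().dropna()
    return {
        "Start date": dates.min(),
        "End date": dates.max(),
        "Period": dates.max() - dates.min(),
        # Counts distinct timestamps, whatever their resolution.
        "Unique timepoints": dates.nunique(),
        # Counts distinct calendar days by dropping the time-of-day part first.
        "Unique days": dates.dt.normalize().nunique(),
        # Assumption: granularity is the smallest observed step.
        "Granularity": steps.min() if not steps.empty else pd.NaT,
    }


summary = describe_dates(dates)
for field, value in summary.items():
    print(f"{field:<20}{value}")
```

With this split, the timestamp count and the calendar-day count are reported separately, so the summary no longer labels ten seconds of data as ten days.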

FBruzzesi · 2025-01-24T20:36:06Z

sklego/model_selection.py

+    # Create dataset
+    np.random.seed(1)
+    num_rows = 30
+    df = pd.DataFrame(np.random.randn(num_rows, 4)).rename(columns={0: 'c1', 1: 'c2', 2: 'c3', 3: 'c4'})


You can pass the column names directly:

Suggested change

df = pd.DataFrame(np.random.randn(num_rows, 4)).rename(columns={0: 'c1', 1: 'c2', 2: 'c3', 3: 'c4'})

df = pd.DataFrame(np.random.randn(num_rows, 4), columns=["c1", "c2", "c3", "c4"])
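For reference, a quick check that the suggested form builds the same frame as the original one. This is a small sketch outside the PR itself; the variable names `renamed` and `direct` are illustrative only, and the seed is reset before each draw so both frames see the same random numbers.

```python
import numpy as np
import pandas as pd

num_rows = 30

# Original form: build with default integer column labels, then rename them.
np.random.seed(1)
renamed = pd.DataFrame(np.random.randn(num_rows, 4)).rename(
    columns={0: "c1", 1: "c2", 2: "c3", 3: "c4"}
)

# Suggested form: pass the column names directly to the constructor.
np.random.seed(1)
direct = pd.DataFrame(np.random.randn(num_rows, 4), columns=["c1", "c2", "c3", "c4"])

# Both frames should hold the same values under the same column labels.
same_frame = renamed.equals(direct)
print(same_frame)
```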

Update model_selection.py

a0d197a

FBruzzesi changed the title ~~Update model_selection.py~~ docs: model_selection module docstrings Jan 24, 2025

FBruzzesi reviewed Jan 24, 2025


mkalimeri commented Jan 22, 2025 • edited by FBruzzesi

FBruzzesi left a comment

FBruzzesi Jan 24, 2025

mkalimeri Jan 28, 2025

koaning Jan 28, 2025

mkalimeri Jan 28, 2025

mkalimeri Feb 14, 2025 • edited
