Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global Refactoring #92

Merged
merged 75 commits into from
Jan 15, 2025
Merged

Global Refactoring #92

merged 75 commits into from
Jan 15, 2025

Conversation

brash6
Copy link
Collaborator

@brash6 brash6 commented Aug 1, 2024

I am opening this PR so that you can start to see the work I have done until now.
This PR is still messy and not clean.

What I have done (a lot is inspired from the work initiated by @houssamzenati and what we have defined in the issue #90 :

  • Modularize estimators
  • Add nuisances functions in a nuisances folder / Add useful utils functions
  • Create an experiment.py file to test modularized estimators with the data generation from med_bench

TO DO :

  • Ensure every estimator is working as expected
  • Explicitly decide which set of parameters to be used in every scenario (data generation, classifiers, regressors, every other hyper parameters)
  • Create tests files covering every defined scenario
  • Adapt CI to new tests files
  • Add clean docstrings everywhere
  • Clean files (remove unused files / functions / variables, enhance / unify naming, strongtyping, etc.)
  • Add an example file in which each estimator is run

FYI @judithabk6 @bthirion @houssamzenati

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 89.22801% with 60 lines in your changes missing coverage. Please review.

Please upload report for BASE (develop@8530e26). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/med_bench/utils/decorators.py 23.68% 29 Missing ⚠️
src/med_bench/estimation/mediation_mr.py 81.53% 12 Missing ⚠️
src/med_bench/estimation/base.py 93.44% 8 Missing ⚠️
src/med_bench/utils/utils.py 73.91% 6 Missing ⚠️
src/med_bench/estimation/mediation_dml.py 91.83% 4 Missing ⚠️
..._bench/estimation/mediation_coefficient_product.py 96.77% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop      #92   +/-   ##
==========================================
  Coverage           ?   88.53%           
==========================================
  Files              ?       11           
  Lines              ?      663           
  Branches           ?        0           
==========================================
  Hits               ?      587           
  Misses             ?       76           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a big step.
I just provided a few thoughts. Hope it helps.

"""
self.rng = np.random.RandomState(123)

self.mediator_type = mediator_type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curation needed on mediator_type

Copy link
Owner

@judithabk6 judithabk6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional comment here, there is a whole nuisances folder and a nuisances.py files in utils, should be reorganized.


self._crossfit = 0
self._procedure = procedure
self._density_estimation_method = density_estimation_method
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like density_estimation_method is not used?

def fit_score_nuisances(self, t, m, x, y, *args, **kwargs):
""" Fits the score of the nuisance parameters
"""
clf_param_grid = {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't pass params to the grid then? not sure I get it


class CoefficientProduct(Estimator):

def __init__(self, clip : float, trim : float, regularize : bool, **kwargs):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree they are not needed here

m = m.reshape(-1, 1)
if len(t.shape) == 1:
t = t.reshape(-1, 1)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be common to all estimators? or at least a function, like _check_inputs from src/med_bench/utils/utils.py

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for y.ravel()

self._clip = clip
self._trim = trim
self._normalized = normalized
self._sample_split = sample_split
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameters needs to be documented

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like sample_split is not used?

coef_prod_estimator.fit(t_train, m_train, x_train, y_train)
causal_effects = coef_prod_estimator.estimate(t_test, m_test, x_test, y_test)

r_risk_score = coef_prod_estimator.score(t_test, m_test, x_test, y_test, causal_effects['total_effect'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be implemented in utils maybe, with a metrics module. Just saw that it existed, so need to import from there. Not sure about the naming scores vs metrics


from med_bench.utils.utils import _get_train_test_lists, _get_interactions

def estimate_cross_conditional_mean_outcome_discrete(m, x, y, f, regressors):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about a separate file from estimate_conditional_mean_outcome

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it would be fine having all nuisances in one module IMHO


return f_t0, f_t1

def estimate_mediator_probability(t, m, x, y, crossfit, clf_m,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should harmonize and explain more propensity vs probability vs density

reg = RidgeCV(alphas=alphas, cv=CV_FOLDS)
else:
reg = RandomForestRegressor(n_estimators=100, min_samples_leaf=10)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the objective of the refactor is to generalize to any method beyond ridge and forests, and not hardcode parameters

@@ -0,0 +1,45 @@
import numpy as np
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docstrings and refs

Copy link
Collaborator

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LMK whenever you want a review

@brash6
Copy link
Collaborator Author

brash6 commented Oct 25, 2024

LMK whenever you want a review

Thank you @bthirion, probably after our next iteration with @houssamzenati on next Tuesday

@brash6
Copy link
Collaborator Author

brash6 commented Nov 14, 2024

@judithabk6 @bthirion
We just addressed all the comments from Judith with @houssamzenati, however, some issues still remain :

  • GComputation estimator still has large discrepencies with expected values in the tolerance tests : Do we increase the tolerance thresholds or we try to understand the problem deeper ?
  • TMLE is still not tested
  • Documentations has to be enhanced (docstrings and online doc)
  • Some optimizations can still be performed in a further PR (DML removal, some functions merging, etc.)

@bthirion
Copy link
Collaborator

FOr GComputation, ideally, we should localize the issue. If it takes too long, merge as is but open in parallel an issue together with a code that outllines this issue so that we can investigate it later.

@bthirion
Copy link
Collaborator

can't we have a test on TMLE similar to that of MultiplyRobust (actually the same test ?)

@bthirion
Copy link
Collaborator

For code and doc improvements, the good way to proceed is to open a bunch of issues so that we don't lose track of the post-refactoring changes.

@judithabk6
Copy link
Owner

@bthirion I'm on it, I agree with you that more exploration is necessary to figure out exactly what is going on with Gcomputation, and add tests for TMLE, at least, and improve the docstring and naming of the methods of the base estimator

@judithabk6 judithabk6 mentioned this pull request Dec 6, 2024
Copy link
Collaborator

@houssamzenati houssamzenati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version seems ready to my mind to be merged. Should we do so and move on to the documentation?

Copy link
Collaborator

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bthirion
Copy link
Collaborator

The current version seems ready to my mind to be merged. Should we do so and move on to the documentation?

+1 for merging now.

@judithabk6 judithabk6 merged commit 08a997d into develop Jan 15, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants