-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Global Refactoring #92
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## develop #92 +/- ##
==========================================
Coverage ? 88.53%
==========================================
Files ? 11
Lines ? 663
Branches ? 0
==========================================
Hits ? 587
Misses ? 76
Partials ? 0 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a big step.
I just provided a few thoughts. Hope it helps.
src/med_bench/estimation/base.py
Outdated
""" | ||
self.rng = np.random.RandomState(123) | ||
|
||
self.mediator_type = mediator_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curation needed on mediator_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional comment here, there is a whole nuisances
folder and a nuisances.py files in utils, should be reorganized.
src/med_bench/estimation/dml.py
Outdated
|
||
self._crossfit = 0 | ||
self._procedure = procedure | ||
self._density_estimation_method = density_estimation_method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like density_estimation_method
is not used?
src/med_bench/estimation/base.py
Outdated
def fit_score_nuisances(self, t, m, x, y, *args, **kwargs): | ||
""" Fits the score of the nuisance parameters | ||
""" | ||
clf_param_grid = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can't pass params to the grid then? not sure I get it
|
||
class CoefficientProduct(Estimator): | ||
|
||
def __init__(self, clip : float, trim : float, regularize : bool, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree they are not needed here
m = m.reshape(-1, 1) | ||
if len(t.shape) == 1: | ||
t = t.reshape(-1, 1) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be common to all estimators? or at least a function, like _check_inputs
from src/med_bench/utils/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for y.ravel()
src/med_bench/estimation/dml.py
Outdated
self._clip = clip | ||
self._trim = trim | ||
self._normalized = normalized | ||
self._sample_split = sample_split |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameters needs to be documented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like sample_split is not used?
src/med_bench/experiment.py
Outdated
coef_prod_estimator.fit(t_train, m_train, x_train, y_train) | ||
causal_effects = coef_prod_estimator.estimate(t_test, m_test, x_test, y_test) | ||
|
||
r_risk_score = coef_prod_estimator.score(t_test, m_test, x_test, y_test, causal_effects['total_effect']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be implemented in utils maybe, with a metrics module. Just saw that it existed, so need to import from there. Not sure about the naming scores vs metrics
|
||
from med_bench.utils.utils import _get_train_test_lists, _get_interactions | ||
|
||
def estimate_cross_conditional_mean_outcome_discrete(m, x, y, f, regressors): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about a separate file from estimate_conditional_mean_outcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it would be fine having all nuisances in one module IMHO
src/med_bench/nuisances/density.py
Outdated
|
||
return f_t0, f_t1 | ||
|
||
def estimate_mediator_probability(t, m, x, y, crossfit, clf_m, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should harmonize and explain more propensity vs probability vs density
src/med_bench/nuisances/utils.py
Outdated
reg = RidgeCV(alphas=alphas, cv=CV_FOLDS) | ||
else: | ||
reg = RandomForestRegressor(n_estimators=100, min_samples_leaf=10) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the objective of the refactor is to generalize to any method beyond ridge and forests, and not hardcode parameters
src/med_bench/utils/scores.py
Outdated
@@ -0,0 +1,45 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs docstrings and refs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LMK whenever you want a review
Thank you @bthirion, probably after our next iteration with @houssamzenati on next Tuesday |
@judithabk6 @bthirion
|
FOr GComputation, ideally, we should localize the issue. If it takes too long, merge as is but open in parallel an issue together with a code that outllines this issue so that we can investigate it later. |
can't we have a test on TMLE similar to that of MultiplyRobust (actually the same test ?) |
For code and doc improvements, the good way to proceed is to open a bunch of issues so that we don't lose track of the post-refactoring changes. |
@bthirion I'm on it, I agree with you that more exploration is necessary to figure out exactly what is going on with Gcomputation, and add tests for TMLE, at least, and improve the docstring and naming of the methods of the base estimator |
… tests to debug, and refactor tolerance to make it useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current version seems ready to my mind to be merged. Should we do so and move on to the documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
+1 for merging now. |
I am opening this PR so that you can start to see the work I have done until now.
This PR is still messy and not clean.
What I have done (a lot is inspired from the work initiated by @houssamzenati and what we have defined in the issue #90 :
TO DO :
FYI @judithabk6 @bthirion @houssamzenati