Strategy Evaluation #3
Update to the batch metrics calculated for each iteration:
Please add any others you can think of.
These look comprehensive to me. Do we want to track how well the classifier performed in the last round or how well-calibrated it is? We could evaluate how it performed on the last batch of compounds by comparing its activity prediction and confidence with the new experimental screening data. This is slightly related to Prof. Raschka's idea to make sure the classifier doesn't get worse over the iterations and start making mistakes on examples it correctly classified previously.
@agitter Do you mean like this:
Alternative model evaluation:
The above model evaluation is what I had in mind. Prof. Raschka's idea is less clear to me. It may involve tracking overall model quality as you train on more and more data, but how to do that is ambiguous. You could do something like cross-validation on batches 0,...,i-1 at each batch, but those metrics would be for different size datasets and not easily comparable. We can probably drop this idea for now.
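As a concrete illustration of the per-batch model evaluation discussed above, here is a minimal sketch. It assumes we have the experimental activity labels from the latest screen and the probabilities the classifier assigned to those compounds at selection time; the function and variable names (`evaluate_last_batch`, `y_true`, `y_prob`) are illustrative and not part of the existing codebase.

```python
# Hypothetical sketch of the per-batch evaluation discussed above: after a new
# batch of compounds is screened, compare the classifier's predictions and
# confidences (recorded before screening) against the new experimental labels.
import numpy as np
from sklearn.metrics import precision_score, recall_score, brier_score_loss


def evaluate_last_batch(y_true, y_prob, threshold=0.5):
    """Score the previous model's predictions on the newly screened batch.

    y_true: experimental activity labels (0/1) from the latest screen.
    y_prob: the model's predicted probability of activity for those compounds,
            recorded at selection time.
    """
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        # Brier score as a simple calibration measure: lower is better.
        "brier": brier_score_loss(y_true, y_prob),
    }
```

Tracking these numbers across iterations would also give a rough check that the classifier is not degrading on previously well-handled examples.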
See strategy at #1.
Currently, strategy evaluation looks at 3 main metrics for each selected batch:
- `norm_hits_ratio`: hits found in the selected batch, normalized by `max_hits`, which is capped at `batch_size`; i.e. if we have 100 remaining unlabeled hits and `batch_size=96`, then `max_hits=min(96, 100)`.
- `norm_cluster_hits_ratio`: the analogous ratio computed over hit clusters rather than individual hits.
- `novel_n_hits = w * norm_hits_ratio + (1-w) * norm_cluster_hits_ratio`, a weighted combination of the two ratios above.
These metrics can then be used to compare different strategies.
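For reference, a rough sketch of how these batch metrics could be computed is below. Only the `max_hits` capping and the `novel_n_hits` formula are taken directly from this issue; the exact definitions of `norm_hits_ratio` and `norm_cluster_hits_ratio` in the codebase may differ, and the function name and arguments here are hypothetical.

```python
# Rough sketch of the three per-batch strategy metrics described above.
def batch_metrics(batch_hits, batch_hit_clusters, n_remaining_hits,
                  n_remaining_hit_clusters, batch_size=96, w=0.5):
    """Compute per-batch strategy metrics.

    batch_hits: number of hits found in the selected batch.
    batch_hit_clusters: number of distinct hit clusters found in the batch.
    n_remaining_hits: unlabeled hits still available before this batch.
    n_remaining_hit_clusters: unlabeled hit clusters still available.
    w: weight balancing raw hit recovery vs. cluster novelty.
    """
    # max_hits is capped at batch_size, e.g. min(96, 100) = 96.
    max_hits = min(batch_size, n_remaining_hits)
    max_cluster_hits = min(batch_size, n_remaining_hit_clusters)

    norm_hits_ratio = batch_hits / max_hits if max_hits else 0.0
    norm_cluster_hits_ratio = (
        batch_hit_clusters / max_cluster_hits if max_cluster_hits else 0.0
    )

    # Weighted combination from the issue description.
    novel_n_hits = w * norm_hits_ratio + (1 - w) * norm_cluster_hits_ratio
    return {
        "norm_hits_ratio": norm_hits_ratio,
        "norm_cluster_hits_ratio": norm_cluster_hits_ratio,
        "novel_n_hits": novel_n_hits,
    }
```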