
Strategy Evaluation #3

Open
Malnammi opened this issue Jan 9, 2019 · 5 comments

Malnammi commented Jan 9, 2019

See strategy at #1.

Currently, strategy evaluation looks at five main metrics for each selected batch:

  1. n_hits: the number of hits in the selected batch.
  2. norm_hits_ratio: the ratio of the number of hits to the maximum number of hits possible for this batch. Max hits is a function of the remaining unlabeled instances and the batch_size; e.g., if we have 100 remaining unlabeled hits and batch_size=96, then max_hits=min(96, 100).
  3. n_cluster_hits: the number of unique clusters with actives. Note that this does not necessarily count newly found clusters (clusters not in the training set); it merely computes the number of unique clusters with hits in the selected batch.
  4. norm_cluster_hits_ratio: similar to norm_hits_ratio but for cluster hits.
  5. novelty_n_hits: a metric @agitter and I discussed in the past. The formula is
    novelty_n_hits = w * norm_hits_ratio + (1-w) * norm_cluster_hits_ratio

These metrics can then be used to compare different strategies; a rough sketch of how they might be computed is below.
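A minimal sketch, assuming hit labels and cluster IDs come in as numpy arrays; the function and argument names are illustrative (not existing code), and the denominator for the cluster ratio is my guess by analogy with max_hits:

```python
import numpy as np

def compute_batch_metrics(batch_hits, batch_clusters, n_unlabeled_hits,
                          n_unlabeled_hit_clusters, batch_size, w=0.5):
    """Batch metrics for one selected batch.

    batch_hits: boolean array, True where a selected compound is a hit.
    batch_clusters: cluster ID of each selected compound.
    n_unlabeled_hits: hits remaining in the unlabeled pool before selection.
    n_unlabeled_hit_clusters: clusters with hits remaining in the pool.
    w: weight trading off the hit ratio against the cluster-hit ratio.
    """
    n_hits = int(np.sum(batch_hits))
    max_hits = min(batch_size, n_unlabeled_hits)
    norm_hits_ratio = n_hits / max_hits if max_hits > 0 else 0.0

    # Unique clusters among the hits in this batch.
    n_cluster_hits = len(np.unique(batch_clusters[batch_hits]))
    max_cluster_hits = min(batch_size, n_unlabeled_hit_clusters)
    norm_cluster_hits_ratio = (n_cluster_hits / max_cluster_hits
                               if max_cluster_hits > 0 else 0.0)

    # Weighted combination of the two normalized ratios.
    novelty_n_hits = w * norm_hits_ratio + (1 - w) * norm_cluster_hits_ratio

    return {"n_hits": n_hits,
            "norm_hits_ratio": norm_hits_ratio,
            "n_cluster_hits": n_cluster_hits,
            "norm_cluster_hits_ratio": norm_cluster_hits_ratio,
            "novelty_n_hits": novelty_n_hits}
```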


Malnammi commented Jan 26, 2019

Update to the batch metrics calculated for each iteration:

  1. n_hits: the number of hits in the selected batch.
  2. norm_hits_ratio: the ratio of the number of hits to the maximum number of hits possible for this batch. Max hits is a function of the remaining unlabeled instances and the batch_size; e.g., if we have 100 remaining unlabeled hits and batch_size=96, then max_hits=min(96, 100).
  3. n_cluster_hits: the number of unique clusters with actives. Note that this does not necessarily count newly found clusters (clusters not in the training set); it merely computes the number of unique clusters with hits in the selected batch.
  4. norm_cluster_hits_ratio: similar to norm_hits_ratio but for cluster hits.
  5. novelty_n_hits: this takes into account two cluster ID sets: training_hit_clusters and batch_hit_clusters.
    novelty_n_hits = |batch_hit_clusters SETDIFF training_hit_clusters|
    This is the number of newly identified clusters with hits (see the sketch after this list).
  6. batch_size: This is recorded in the form of two metrics:
    batch_size = exploitation_batch_size + exploration_batch_size
    so that we can keep track of the algorithm's exploit-explore budget allocation at every iteration.
  7. computation_time: The computation time taken to select this iteration's batch. While this is machine-dependent, it will still be helpful to keep these in our records.
  8. batch_cost: The cost of the selected batch.
  9. screening_time: An estimate of the time taken to cherry-pick the selected batch, schedule it, physically screen it, and retrieve the data for the next computation iteration. This will be added to the general config files as an estimate for each screen. Later on, we can define this in more detail at the plate or molecule level. Also see issue Incorporating cost #2.
    screening_time = cherry_picking_time_per_cpd * batch_size + screening_time_per_plate
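
A small sketch of items 5, 6, and 9 as plain functions, assuming the cluster IDs are available as iterables and the timing constants come from the config files (all names here are placeholders):

```python
def novelty_n_hits(batch_hit_clusters, training_hit_clusters):
    # Item 5: number of newly identified clusters with hits.
    return len(set(batch_hit_clusters) - set(training_hit_clusters))

def total_batch_size(exploitation_batch_size, exploration_batch_size):
    # Item 6: record both terms so the exploit/explore split stays visible.
    return exploitation_batch_size + exploration_batch_size

def screening_time(batch_size, cherry_picking_time_per_cpd,
                   screening_time_per_plate):
    # Item 9: rough wall-clock estimate built from config-level constants.
    return cherry_picking_time_per_cpd * batch_size + screening_time_per_plate
```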

Please add any others you can think of.


agitter commented Jan 28, 2019

These look comprehensive to me. Do we want to track how well the classifier performed in the last round or how well-calibrated it is? We could evaluate how it performed on the last batch of compounds by comparing its activity prediction and confidence with the new experimental screening data.

This is slightly related to Prof. Raschka's idea to make sure the classifier doesn't get worse over the iterations and start making mistakes on examples it correctly classified previously.
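
For concreteness, the first part could look something like the sketch below, assuming we save the predicted hit probabilities when a batch is selected and score them once the experimental labels come back (the function name and variables are placeholders; the sklearn metrics are just one possible choice):

```python
from sklearn.metrics import roc_auc_score, brier_score_loss

def score_last_batch(y_true, y_prob):
    """Score the stored predictions for the batch that was just screened.

    y_true: experimental hit/no-hit labels (0/1) for the screened batch.
    y_prob: predicted hit probabilities saved when the batch was selected.
    """
    return {"roc_auc": roc_auc_score(y_true, y_prob),     # ranking quality
            "brier": brier_score_loss(y_true, y_prob)}    # calibration
```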

Malnammi commented:

@agitter Do you mean like this:

  1. At iteration i, given data for batch i-1, evaluate batch i-1 using AL metrics.
  2. Train model using batches 0, 1, ..., i-1.
  3. Evaluate trained model on batch i-2 using appropriate model metrics.
  4. Select batch i.
  5. Pass the selected batch to the screening facility (sketched below).
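
In pseudocode, with every helper passed in as a callable since none of this is existing code:

```python
import numpy as np

def run_iteration(i, batches, labels, model, evaluate_batch, evaluate_model,
                  select_batch, submit_to_facility):
    # 1. Evaluate the just-screened batch i-1 with the AL batch metrics.
    al_metrics = evaluate_batch(batches[i - 1], labels[i - 1])

    # 2. Retrain on all labeled batches 0, 1, ..., i-1.
    model.fit(np.concatenate(batches[:i]), np.concatenate(labels[:i]))

    # 3. Evaluate the retrained model on batch i-2 with model metrics.
    model_metrics = evaluate_model(model, batches[i - 2], labels[i - 2])

    # 4. Select batch i and 5. pass it to the screening facility.
    batch_i = select_batch(model)
    submit_to_facility(batch_i)

    return al_metrics, model_metrics, batch_i
```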

Malnammi commented:

Alternative model evaluation:

  1. Train on batches 0, 1, ..., i-2.
  2. Evaluate model quality on batch i-1 data.
  3. Record suitable model quality metrics (still to be determined; a rough sketch with placeholder metrics is below).
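
A rough sketch, assuming a scikit-learn-style classifier and using ROC AUC and average precision as stand-ins for the metrics we still need to pick (all names are placeholders):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def held_out_batch_evaluation(model, batches, labels, i):
    # Fit on batches 0, 1, ..., i-2 only.
    model.fit(np.concatenate(batches[:i - 1]), np.concatenate(labels[:i - 1]))

    # Score the held-out batch i-1.
    y_prob = model.predict_proba(batches[i - 1])[:, 1]
    return {"roc_auc": roc_auc_score(labels[i - 1], y_prob),
            "avg_precision": average_precision_score(labels[i - 1], y_prob)}
```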


agitter commented Jan 30, 2019

The above model evaluation is what I had in mind.

Prof. Raschka's idea is less clear to me. It may involve tracking overall model quality as you train on more and more data, but how to do that is ambiguous. You could do something like cross-validation on batches 0,...,i-1 at each iteration, but those metrics would be computed on datasets of different sizes and would not be easily comparable. We can probably drop this idea for now.
