Ordinal regression #7

Open
agitter opened this issue Jan 28, 2019 · 1 comment

Comments

@agitter
Member

agitter commented Jan 28, 2019

Prof. Raschka suggested prioritizing clusters instead of individual compounds, using ordinal regression to predict the number of actives in each cluster. This would require featurizing clusters with a consensus fingerprint or another summary of their members' features.
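A minimal sketch of the idea, assuming each cluster is featurized by one consensus fingerprint and its active count is treated as an ordinal label with a few levels (for example binned counts 0, 1, 2, 3+). It uses the Frank and Hall style decomposition into one binary logistic model per threshold, predicting P(y > k). That is only one of several ordinal regression formulations and not necessarily what Prof. Raschka had in mind. The data are synthetic and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, t, lr=0.5, n_iter=2000):
    """Batch gradient-descent logistic regression for binary targets t; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - t  # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ err) / len(t)
        b -= lr * err.mean()
    return w, b

def fit_ordinal(X, y, n_levels):
    """One binary model per threshold k = 0..n_levels-2, each predicting whether y > k."""
    return [fit_logistic(X, (y > k).astype(float)) for k in range(n_levels - 1)]

def predict_ordinal(models, X):
    """Score = sum_k P(y > k), i.e. the expected level if the thresholds are consistent."""
    return np.sum([sigmoid(X @ w + b) for w, b in models], axis=0)

# Synthetic stand-in: one consensus fingerprint per cluster (rows) and an ordinal
# "number of actives" label that here depends only on the first three bits.
rng = np.random.default_rng(0)
X_clusters = rng.integers(0, 2, size=(200, 16)).astype(float)
y = X_clusters[:, :3].sum(axis=1).astype(int)  # levels 0..3

models = fit_ordinal(X_clusters, y, n_levels=4)
scores = predict_ordinal(models, X_clusters)
priority = np.argsort(-scores)  # clusters ordered from most to least promising
```

Because the per-threshold models are fit independently, their probabilities are not guaranteed to be monotone in k; cumulative-link (proportional odds) models avoid that by sharing one weight vector across all thresholds.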

We do not want to change our approach, but we should weigh the pros and cons of this idea so that we understand the strengths of our approach.

@Malnammi
Collaborator

This is somewhat related to computing a consensus fingerprint for a cluster. The cluster-based-selector now supports an option to compute cluster dissimilarity from consensus fingerprints rather than by comparing every instance within each cluster. The consensus fingerprint of cluster `ci` is computed as:

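```python
import numpy as np
ci_instances = np.where(clusters == ci)[0]  # row indices of the instances in cluster ci
# set a bit if at least half of the cluster's instances have that bit set
X_consensus = ((np.sum(X[ci_instances, :], axis=0) / ci_instances.shape[0]) >= 0.5).astype(float)
```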

In words, a bit is set in the consensus fingerprint if at least half of the cluster's instances have that bit set (the `>= 0.5` threshold means ties count as set). Applying this dissimilarity computation to 20 randomly chosen dense clusters gives results close to those of the instance-by-instance method: within ±0.04 in most cases, with a few cases differing by up to ±0.1.
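To make the comparison concrete, here is a small sketch that computes both quantities for a pair of clusters. It assumes the dissimilarity is Tanimoto (Jaccard) distance on binary fingerprints and that the instance-by-instance method averages it over all cross-cluster pairs; the selector's actual measure may differ, and the helper names are hypothetical. The toy data has only two instances per cluster, so it illustrates the tie rule rather than the agreement reported above.

```python
import numpy as np

def consensus_fingerprint(X, clusters, ci):
    """Same rule as above: set a bit if at least half the cluster's instances set it."""
    ci_instances = np.where(clusters == ci)[0]
    return ((np.sum(X[ci_instances, :], axis=0) / ci_instances.shape[0]) >= 0.5).astype(float)

def tanimoto_dissimilarity(A, B):
    """1 - Tanimoto similarity between every row of A (n x d) and every row of B (m x d)."""
    inter = A @ B.T
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    # Treat two all-zero fingerprints as identical (similarity 1).
    sim = np.divide(inter, union, out=np.ones_like(inter), where=union > 0)
    return 1.0 - sim

def instance_dissimilarity(X, clusters, ca, cb):
    """Average dissimilarity over all instance pairs across clusters ca and cb."""
    return tanimoto_dissimilarity(X[clusters == ca], X[clusters == cb]).mean()

def consensus_dissimilarity(X, clusters, ca, cb):
    """Dissimilarity between the two clusters' consensus fingerprints."""
    fa = consensus_fingerprint(X, clusters, ca)[None, :]
    fb = consensus_fingerprint(X, clusters, cb)[None, :]
    return tanimoto_dissimilarity(fa, fb)[0, 0]

X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]], dtype=float)
clusters = np.array([0, 0, 1, 1])

print(instance_dissimilarity(X, clusters, 0, 1))   # 0.75
print(consensus_dissimilarity(X, clusters, 0, 1))  # 0.5
```

The instance-level version builds an |A| x |B| matrix for every pair of clusters, while the consensus version compares a single row per cluster.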

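This consensus method should reduce overall memory costs, since each cluster is summarized by a single fingerprint instead of being compared instance by instance.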
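A rough back-of-the-envelope sketch of the memory argument, assuming the instance-level method would materialize pairwise float64 dissimilarities between instances. It builds every cluster's consensus fingerprint in one vectorized pass and compares the storage for a k x d consensus matrix plus a k x k cluster matrix against an n x n instance matrix. The sizes and random bits are made up for illustration, and `all_consensus_fingerprints` is a hypothetical helper, not part of the cluster-based-selector.

```python
import numpy as np

def all_consensus_fingerprints(X, clusters):
    """Consensus fingerprint for every cluster in one pass; returns (labels, k x d matrix)."""
    labels, inverse, counts = np.unique(clusters, return_inverse=True, return_counts=True)
    sums = np.zeros((labels.shape[0], X.shape[1]))
    np.add.at(sums, inverse.ravel(), X)  # add each instance's bits to its cluster's row
    return labels, ((sums / counts[:, None]) >= 0.5).astype(float)

# Hypothetical sizes and random bits, for illustration only.
rng = np.random.default_rng(1)
n, d, k = 1000, 512, 50
X = rng.integers(0, 2, size=(n, d)).astype(float)
clusters = rng.integers(0, k, size=n)

labels, C = all_consensus_fingerprints(X, clusters)
n_clusters = labels.shape[0]

# float64 storage: pairwise instance matrix vs. consensus matrix + cluster-level matrix
instance_pairs_bytes = n * n * 8
consensus_bytes = C.nbytes + n_clusters * n_clusters * 8
print(instance_pairs_bytes, consensus_bytes)
```

The consensus storage scales with the number of clusters rather than the number of instances, which is where a saving would come from; how much is actually saved in the selector depends on how its instance-level comparison is implemented.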
