PyMolSAR aims to provide a generalizable open-source tool for calculating 759 molecular descriptors and testing several supervised learning algorithms, in order to build the most appropriate Quantitative Structure-Activity Relationship (QSAR) classification or regression model for accurately predicting the chemical properties or activities of small molecules.
Using a conda environment
git clone https://github.com/BeckResearchLab/small-molecule-design-toolkit.git
cd small-molecule-design-toolkit
python setup.py install
Two good tutorials to get started are Melting Point Prediction and Blood-Brain Barrier Permeability. Follow along with them to see how to predict the properties of molecules using machine learning.
- A column containing SMILES strings.
- A column containing an experimental measurement.
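As a minimal sketch of the two-column layout listed above, the snippet below parses a small in-memory CSV with Python's standard `csv` module. The column names `smiles` and `melting_point`, the helper `load_dataset`, and the sample rows (approximate melting points in °C) are assumptions for illustration only; they are not taken from the toolkit or its tutorials.

```python
import csv
import io

# Hypothetical dataset: column names and rows are illustrative placeholders.
RAW_CSV = """smiles,melting_point
CCO,-114.1
c1ccccc1,5.5
CC(=O)O,16.6
"""

def load_dataset(text, smiles_col="smiles", target_col="melting_point"):
    """Return (smiles_list, targets) parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    smiles, targets = [], []
    for row in reader:
        smiles.append(row[smiles_col])
        targets.append(float(row[target_col]))  # measurements must be numeric
    return smiles, targets

smiles, targets = load_dataset(RAW_CSV)
```

Keeping the structures and the measurements in two parallel lists makes it straightforward to pass them on to a descriptor calculation and a model later.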
Most machine learning algorithms require that input data form vectors.
However, input data for cheminformatics and drug discovery datasets routinely come in the format of lists of molecules and associated experimental readouts. To transform lists of molecules into vectors,
we need to calculate a set of molecular descriptors using smdt.molecular_descriptors.getAllDescriptors()
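To make the idea of turning molecules into vectors concrete, here is a deliberately crude sketch that maps each SMILES string to a fixed-length vector of character counts. The `featurize` function and its `FEATURES` list are hypothetical stand-ins, not descriptors computed by `getAllDescriptors()`, whose exact signature and output are not shown here.

```python
# Toy stand-in for real molecular descriptors: count selected SMILES characters.
# It ignores two-letter elements such as Cl or Br, so it is for illustration only.
FEATURES = ["C", "N", "O", "=", "#"]  # hypothetical feature set

def featurize(smiles):
    """Map one SMILES string to a fixed-length list of character counts."""
    upper = smiles.upper()  # treat aromatic 'c', 'n', 'o' like 'C', 'N', 'O'
    return [upper.count(symbol) for symbol in FEATURES]

molecules = ["CCO", "c1ccccc1", "CC(=O)O", "C#N"]
X = [featurize(s) for s in molecules]

for smi, row in zip(molecules, X):
    print(smi, row)
```

Every molecule, whatever its size, ends up as a row of the same length, which is the property most learning algorithms rely on.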
smdt can build and evaluate different classification and regression models built on top of sklearn.
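The following is a minimal sketch of comparing several regression models on a descriptor matrix using scikit-learn directly. It does not call smdt's own model-building functions, whose names are not given here. The synthetic data from `make_regression` stands in for real descriptors and measurements, and the two candidate models are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a descriptor matrix X and a measured property y.
X, y = make_regression(n_samples=120, n_features=10, noise=5.0, random_state=0)

# Candidate models to compare (an arbitrary, illustrative selection).
candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("best:", best)
```

Cross-validated scores give a fairer comparison than a single train/test split, especially on the small datasets common in cheminformatics.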
A model report is generated to help the user choose the most appropriate Quantitative Structure-Activity Relationship (QSAR) or
Quantitative Structure-Property Relationship (QSPR) model.