This repository hosts the source code for ChrNet, a Keras-based machine learning model. ChrNet stands for Chromosome-based 1D-CNN network.
The model is flexible at both the input and output ends and can be retrained.
Input end: accepts different versions of the reference genome.
Output end: accepts different sets of output cell types.
The code requires Python 3. When the dataset is large, you can force CPU execution in main.py for training or prediction; this is recommended for prediction.
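One common way to force CPU execution with a TensorFlow-backed Keras install is to hide the CUDA devices before the framework is imported. The sketch below assumes that backend; the helper name `force_cpu` is hypothetical, and the actual switch in main.py may work differently.

```python
import os


def force_cpu():
    """Hide all CUDA devices so a TensorFlow-backed Keras falls back to the CPU."""
    # "-1" matches no GPU index, so no GPU is visible to the process.
    # This must run before TensorFlow/Keras is imported to take effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu()
# Import Keras / TensorFlow only after this point.
```

Setting the variable inside the script rather than in the shell keeps the choice next to the training and prediction code, at the cost of having to place it above the framework imports.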
By default, the model takes 24 chromosomes as input (1-22, X and Y). The input can be customized by supplying a different reference_bed in main.py and modifying the sizes in chr_list in model.py accordingly.
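When switching to another reference BED file, the per-chromosome sizes have to be recomputed. The following is a minimal sketch under the assumption that each entry of chr_list is the number of BED records on one chromosome; the actual meaning of chr_list in model.py may differ. The function name, the chromosome naming (`chr1` ... `chrY`) and the example records are made up for illustration.

```python
from collections import Counter


def count_features_per_chromosome(bed_lines, chromosomes):
    """Count BED records per chromosome, returned in the given chromosome order."""
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # The first BED column is the chromosome name.
        chrom = line.split()[0]
        counts[chrom] += 1
    # Chromosomes absent from the file get a count of 0.
    return [counts[c] for c in chromosomes]


# Default layout described above: chromosomes 1-22, X and Y.
default_chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

# Tiny made-up BED excerpt (chrom, start, end, name), for illustration only.
example_bed = [
    "chr1\t11868\t14409\tGENE_A",
    "chr1\t14361\t29370\tGENE_B",
    "chrX\t100627108\t100639991\tGENE_C",
]
sizes = count_features_per_chromosome(example_bed, default_chromosomes)
```

With a real file, `bed_lines` can be an open file handle, since the function only iterates over lines.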
The default output cell types are:
"Dendritic cell", "NK cell", "B cell", "CD4+ T cell", "CD8+ T cell", "CD14+ monocyte" and "Other".
We accept customized output cell types for retraining. In data.py, change the variable dict_label to your own label dictionary to alter the output cell types.
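As an illustration, a label dictionary for the default cell types could look like the sketch below. It assumes dict_label maps each cell type name to an integer class index; the structure actually expected by data.py may differ, and the `one_hot` helper is hypothetical.

```python
# Default output cell types listed above, in a fixed order.
cell_types = [
    "Dendritic cell",
    "NK cell",
    "B cell",
    "CD4+ T cell",
    "CD8+ T cell",
    "CD14+ monocyte",
    "Other",
]

# Assumed layout: cell type name -> integer class index.
dict_label = {name: index for index, name in enumerate(cell_types)}


def one_hot(label, mapping):
    """Encode a cell type name as a one-hot vector using the label mapping."""
    vector = [0] * len(mapping)
    vector[mapping[label]] = 1
    return vector
```

If the number of cell types changes, the size of the model's output layer generally has to change with it, which is why customized labels go together with retraining.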
The IntegratedGradients function was acquired from here.
findMetaFeature follows the procedure from "Improving interpretability of deep learning models: splicing codes as a case study" (Anupama Jha, Joseph K. Aicher, Deependra Singh, Yoseph Barash, 2019).
.
├── data.py
├── dataset/
│   ├── testing_tpm.tsv
│   └── training_tpm.tsv
├── front.png
├── main.py
├── model.py
├── IntegratedGradients.py
├── pre_trained/
│   └── ChrNet.hdf5
├── README.md
└── reference/
    └── hg19.sorted.bed
data.py: Stores all functions and tools.
dataset: Stores the training and testing sets.
main.py: The main function for ChrNet.
model.py: Stores the model.
pre_trained: Stores the pre-trained weights for ChrNet.
reference: Stores the reference BED file for the model.