Merge branch 'revised'
Juychen committed Jun 18, 2022
2 parents 2ce1e6f + 651d6fe commit c0e9936
Showing 9 changed files with 1,227 additions and 11,085 deletions.
34 changes: 23 additions & 11 deletions README.md
Expand Up @@ -6,8 +6,7 @@ Deep Transfer Learning of Drug Sensitivity by Integrating Bulk and Single-cell R
2. Migrate the source of testing data from the FTP to OneDrive.

## System requirements

Following packages are required to run the program:
The following packages are required to run the program:

- conda 4.8.4
- python 3.7.3
Expand All @@ -21,24 +20,24 @@ This software is developed and tested in the following software environment:
Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
```

This software is developed and tested on the hardware envrionments of:
This software is developed and tested on the hardware environments of:
- CPU: HexaCore Intel Core i7-9750H, 4100 MHz (41 x 100)
- RAM: 32GB
- GPU: nVIDIA GeForce GTX 1660 Ti with Max-Q Design

No non-standard hardware are required for the software.
No non-standard hardware is required for the software.

## Installation guide
### Instructions
The software is a stand-alone python scirpt package. It can be downloaded and installed with this github repository:
The software is a stand-alone python script package. It can be downloaded and installed with this github repository:

```
git clone https://github.com/OSU-BMBL/scDEAL.git
```

### Typical install time
Or download the .zip file form the repository/ acuire the .zip file then decompress .zip file.
The download time depends on the network speed of user. No extra compile or installation time is required to run our main script. The installation time of dependencies is normally 1 minute per package.
Or download the .zip file from the repository/ acquire the .zip file then decompress .zip file.
The download time depends on the network speed of the user. No extra compile or installation time is required to run our main script. The installation time of dependencies is normally 1 minute per package.

## Data preparation
### Data download
Expand Down Expand Up @@ -101,9 +100,9 @@ Folders in our package will store the corresponding contents:

- root: python scripts to run the program and README.md
- data: datasets required for the learning
- saved/logs: log and error files that recording running status.
- saved/logs: log and error files that record running status.
- saved/figures & figures: figures generated through the run.
- saved/models: models triained through the run.
- saved/models: models trained through the run.
- saved/adata: results AnnData outputs.
- DaNN: python scripts describe the model.
- scanpypip: python scripts of utilities.
Expand All @@ -122,7 +121,7 @@ python bulkmodel.py --drug 'I-BET-762' --dimreduce 'AE' --encoder_h_dims "256,25

This step takes the expression profile of bulk RNA-Seq and the drug response annotations as input. It will train a drug sensitivity predictor for the drug 'I-BET-762.' The output model will be stored in the directory "saved/models." The prefix of the model's file name will be 'bulk_predictor_ae_' and its full name will depend on the parameters that users supply. In this case, the file name of the bulk model will be "bulk_predictor_AEI-BET-762.pkl". For all available drug names, please refer to the column names of the file ALL_label_binary_wf.csv.

For the transfer learning, we provide a built-in testing case of acute myeloid leukemia cells [Bell et al.](https://doi.org/10.1038/s41467-019-10652-9) accessed from Gene Expression Omnibus (GEO) accession [GSE110894](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110894). The training time of the cases are:
For the transfer learning, we provide a built-in testing case of acute myeloid leukemia cells [Bell et al.](https://doi.org/10.1038/s41467-019-10652-9) accessed from Gene Expression Omnibus (GEO) accession [GSE110894](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110894). The training time for the cases are:


```
Expand All @@ -136,7 +135,7 @@ This step trains the scDEAL model and predicts the sensitivity of I-BET-762 of t
The training time of the test case including bulk-level and single-cell-level training on the testing computer was 4 minutes.

### Expected output
The expected output format of scDEAL is the [AnnData](https://anndata.readthedocs.io/en/latest/anndata.AnnData.html) object (.h5ad) applied by the scanpy package. The file will be stored in the directory "scDEAL/saved/adata/data/". The prediction of sensitivity will be stored in adata.obs["sens_label"] (if you load your AnnDdata object named as adata) where 0 represents resistance and 1 represents sensitivity respectively. Further analysis for the output can be processed by the package [Scanpy](https://scanpy.readthedocs.io/en/stable/). The object can be loaded in to python through the function: [scanpy.read_h5ad](https://scanpy.readthedocs.io/en/latest/generated/scanpy.read_h5ad.html#scanpy-read-h5ad).
The expected output format of scDEAL is the [AnnData](https://anndata.readthedocs.io/en/latest/anndata.AnnData.html) object (.h5ad) used by the scanpy package. The file will be stored in the directory "scDEAL/saved/adata/data/". The prediction of sensitivity will be stored in adata.obs["sens_label"] (if you load your AnnData object as adata), where 0 represents resistance and 1 represents sensitivity. Further analysis of the output can be done with the package [Scanpy](https://scanpy.readthedocs.io/en/stable/). The object can be loaded into python through the function: [scanpy.read_h5ad](https://scanpy.readthedocs.io/en/latest/generated/scanpy.read_h5ad.html#scanpy-read-h5ad).
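
As a rough illustration of reading the prediction described above, the sketch below loads an output file with `scanpy.read_h5ad` and counts how many cells were labeled resistant (0) and sensitive (1). The helper names `load_sens_labels` and `summarize_sens_labels` and the file name in the commented-out call are hypothetical and not part of scDEAL; only `scanpy.read_h5ad` and the `sens_label` column come from the text above.

```python
from collections import Counter


def load_sens_labels(h5ad_path):
    """Read an scDEAL output file and return the predicted labels as a list.

    Hypothetical helper: assumes the file holds adata.obs["sens_label"].
    """
    import scanpy as sc  # imported lazily so the summary helper works without scanpy

    adata = sc.read_h5ad(h5ad_path)
    return adata.obs["sens_label"].tolist()


def summarize_sens_labels(labels):
    """Count resistant (0) and sensitive (1) predictions."""
    counts = Counter(int(label) for label in labels)
    return {"resistant": counts.get(0, 0), "sensitive": counts.get(1, 0)}


# Hypothetical file name; substitute the .h5ad file produced by your own run.
# labels = load_sens_labels("saved/adata/data/your_output.h5ad")
print(summarize_sens_labels([0, 1, 1, 0, 1]))  # {'resistant': 2, 'sensitive': 3}
```

Keeping the counting step separate from file loading makes it possible to check the summary on a plain list before running it on a real output file.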

The expected output of a successful run includes:

Expand Down Expand Up @@ -172,6 +171,19 @@ Formats for your own drug resistance table and your bulk level expression should

For more detailed parameter settings of the two scripts, please refer to the documentation section.


## * Appendix
### * Appendix A: case studies
The folder named "casestudy" contains Jupyter notebook templates of selected case studies in the paper. You can follow the introduction within each notebook to create analysis results for scDEAL. For example, run bulkmodel.py and scmodel.py with user-defined parameters:

```
python bulkmodel.py --drug [*Your selected drug*] --data [*Your own bulk level expression*] --label [*Your own bulk level drug resistance table*] ... --printgene 'T'
python scmodel.py --sc_data [*Your own data path*] ... --printgene 'T'
```
- [casestudy/analysis_criticalgenes.ipynb](casestudy/analysis_criticalgenes.ipynb): critical gene identification by integrated gradient matrix;
- [casestudy/analysis_tarined_anndata.ipynb](casestudy/analysis_tarined_anndata.ipynb): umap, gene score, and regression plot of single-cell level prediction.

## Documentation
* Command: python bulkmodel.py
```