LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life
By NTU Plants Systems Biology and Evolution Laboratory
The code is hosted in this GitHub repository, with an accompanying paper here and a preprint version here. Please create pull requests for issues, bugs, and feature requests, and contact me with feedback or bug reports.
A. First local setup of the pipeline
- This segment only needs to be run once, when the repository is first set up on your local machine or server.
B. Initialization for each session
- These commands need to be run every time the pipeline is accessed from a new terminal session. They load the Python environment with the installed packages and add the ascp and kallisto commands to the global $PATH. If kallisto or ascp (Aspera CLI) has not been downloaded yet, it is downloaded as well.
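The PATH handling described above can be illustrated with a short Python sketch. This is not the pipeline's own initialization script, which is not reproduced here: it assumes the tools were unpacked into a local directory (the `bin` location is hypothetical), and it only reports missing tools instead of downloading them.

```python
import os
import shutil

# Tool names are taken from the README; everything else here is an assumption.
REQUIRED_TOOLS = ("kallisto", "ascp")


def prepend_to_path(dirs, path=None):
    """Return a PATH string with `dirs` placed in front of `path`.

    If `path` is None, the current PATH environment variable is used.
    """
    current = os.environ.get("PATH", "") if path is None else path
    parts = [d for d in dirs if d]
    if current:
        parts.append(current)
    return os.pathsep.join(parts)


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that shutil.which cannot find on `path` (default: PATH)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    # Hypothetical location where the tools might have been unpacked.
    tool_dir = os.path.abspath("bin")
    os.environ["PATH"] = prepend_to_path([tool_dir])
    for tool in missing_tools():
        # The real pipeline downloads missing tools; this sketch only reports them.
        print(f"{tool} not found on PATH")
```

Prepending (rather than appending) the tool directory means a locally downloaded kallisto or ascp takes precedence over any older system-wide copy.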
C. Bulk Download
- The steps to run a download job for multiple species are outlined here.
D. Small download job
- The steps to run a download job for a single species are outlined here.
- This segment provides an overview of the directory structure of the main scripts directory plants-pipeline and the data directory pipeline-data.
- After the download job has completed, these are the steps needed to generate the TPM matrices and perform quality control. They include:
- Generating TPM matrices
- Quality control
- Performing coexpression analysis to count the number of ribosomal gene neighbours of every gene
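The TPM and coexpression steps listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the pipeline's implementation: it assumes per-sample estimated counts and effective lengths like those in kallisto's abundance.tsv (kallisto already reports TPM there, so `tpm` only shows the formula), uses Pearson correlation on raw TPM values, and defines a gene's neighbourhood as its top-k most correlated genes. The function names, `k`, and the neighbourhood definition are hypothetical; the pipeline's actual coexpression measure and cut-offs may differ. Quality control is not shown.

```python
import math


def tpm(est_counts, eff_lengths):
    """TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6; zero-length targets get rate 0."""
    rates = {
        t: est_counts[t] / eff_lengths[t] if eff_lengths[t] > 0 else 0.0
        for t in est_counts
    }
    total = sum(rates.values())
    return {t: (r / total * 1e6 if total > 0 else 0.0) for t, r in rates.items()}


def tpm_matrix(samples):
    """samples: {sample_id: {gene: tpm}} -> (genes, sample_ids, rows of values).

    Genes missing from a sample are filled with 0.0.
    """
    sample_ids = sorted(samples)
    genes = sorted({g for s in samples.values() for g in s})
    rows = [[samples[s].get(g, 0.0) for s in sample_ids] for g in genes]
    return genes, sample_ids, rows


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ribosomal_neighbour_counts(genes, rows, ribosomal, k=2):
    """For each gene, count ribosomal genes among its k most correlated other genes."""
    counts = {}
    for i, g in enumerate(genes):
        scores = sorted(
            ((pearson(rows[i], rows[j]), genes[j]) for j in range(len(genes)) if j != i),
            reverse=True,
        )
        counts[g] = sum(1 for _, h in scores[:k] if h in ribosomal)
    return counts
```

The pairwise loop is quadratic in the number of genes and is only meant to show the logic; a real atlas would use a vectorized correlation routine.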
- The F1 scores for the benchmark in the paper are generated using the scripts here.
- This segment describes how the annotation accuracy and coverage reported in the publication were derived.
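The evaluation metrics mentioned in the last two segments can be illustrated as follows. F1 is the standard harmonic mean of precision and recall computed from true positive, false positive, and false negative counts. The definitions of annotation accuracy and coverage below are assumptions for illustration only (coverage as the fraction of samples that received an annotation, accuracy as the fraction of annotated samples whose label matches a reference); the publication's exact definitions may differ, and the function names are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy_and_coverage(predicted, truth):
    """Assumed definitions (hypothetical, may differ from the paper):

    coverage = samples with any predicted annotation / all reference samples
    accuracy = correctly annotated samples / annotated samples
    """
    total = len(truth)
    annotated = [s for s in truth if predicted.get(s) is not None]
    correct = sum(1 for s in annotated if predicted[s] == truth[s])
    coverage = len(annotated) / total if total else 0.0
    accuracy = correct / len(annotated) if annotated else 0.0
    return accuracy, coverage
```

Keeping accuracy and coverage separate makes the trade-off visible: a stricter annotation rule typically labels fewer samples (lower coverage) but labels them more reliably (higher accuracy).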
The figures in the publication were prepared with the code in this IPython notebook.