The GeneQC package requires Python 3 and runs in a Linux environment. It includes the BLAST+ and SAMtools libraries. GeneQC takes a reference genome, an annotation file, and a read alignment (mapped by HISAT2 and compressed to a BAM file by SAMtools) as input; the steps below describe how to obtain these preliminary files. GeneQC generates feature extraction results and modeling results (D-score).
The package contains:
A minimal BLAST package providing makeblastdb and blastn
Samtools package version 1.2.1
GeneQC Python code
A special script "" that transforms the genome sequences into transcript sequences and produces the corresponding annotation file.
Hardware requirements depend on the size of the reference genome and annotation file. A high-performance cluster is preferred.
GeneQC requires Python 3. We recommend using Anaconda 5 to run GeneQC. Most high-performance clusters already have Anaconda installed, so users can load it directly with the following commands.
- Check available modules in the cluster:
module avail
- Find anaconda (4 or above) in the list of available modules and load it (for example, if anaconda5/5.0.0-3.6 is listed):
module load anaconda5/5.0.0-3.6
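After loading the module, it can be worth confirming that the active interpreter really is Python 3. The following is a minimal sketch; the helper name python_is_supported is hypothetical and is not part of GeneQC.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 0)):
    """Return True when the interpreter version is at least `minimum`.

    Hypothetical helper for illustration; GeneQC itself only states
    that Python 3 is required.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the version of the interpreter that the loaded module provides.
    status = "OK" if python_is_supported() else "too old for GeneQC"
    print("Python", sys.version.split()[0], status)
```

Saving this to a file and running it with python right after module load shows which interpreter the cluster selected.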
The source code of GeneQC is freely available at:
To install GeneQC, first download the zip file manually from GitHub, or use the commands below in Unix:
cd your_folder_path
Unzip the file:
Sample data for seven species (reference genome, annotation file, and read alignment mapped by HISAT2 and compressed to a BAM file by SAMtools) can be downloaded from our website. Please follow these instructions to download the data.
For example, to download the RNA-Seq data for Homo sapiens:
cd your_folder_path_of_"GeneQC_Python"
gunzip -c Humo_raw-data.tar.gz | tar xopf -
To run GeneQC on plant data, three inputs are required: a reference genome, an annotation file, and a SAM or BAM file. Upload them to the "GeneQC_Python" folder under the GeneQC-master folder on the cluster.
Move to the "GeneQC_Python" folder:
cd your_folder_path_of_"GeneQC_Python"
module load anaconda(choose_your_version)
Run GeneQC:
python [1] [reference genome] [standard gff annotation file] [sam or bam file]
Example: A. thaliana
python 1 Athaliana_167_TAIR9.fa Athaliana_167_TAIR10.gene.gff3 ERR1297323.bam
The outputs are generated in the same folder. ERR1297323_out.txt contains the feature extraction results and ERR1297323_out.csv contains the D-score results.
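Before launching a run, a short pre-flight check can catch a missing input and tell you which output files to look for afterwards. The sketch below is illustrative only: the function names are hypothetical, and the output naming rule (alignment file name without its extension, plus _out.txt and _out.csv) is an assumption inferred from the ERR1297323 example, not a documented GeneQC guarantee.

```python
from pathlib import Path


def missing_inputs(paths, folder="."):
    """Return the input file names that are not present in `folder`."""
    return [p for p in paths if not (Path(folder) / p).is_file()]


def expected_outputs(alignment_file):
    """Derive the two output names from the alignment file name.

    Assumed rule, inferred from the example in this README:
    ERR1297323.bam -> ERR1297323_out.txt and ERR1297323_out.csv.
    """
    stem = Path(alignment_file).stem
    return f"{stem}_out.txt", f"{stem}_out.csv"


if __name__ == "__main__":
    # File names taken from the A. thaliana example.
    inputs = [
        "Athaliana_167_TAIR9.fa",
        "Athaliana_167_TAIR10.gene.gff3",
        "ERR1297323.bam",
    ]
    print("missing inputs:", missing_inputs(inputs))
    print("expected outputs:", expected_outputs(inputs[-1]))
```

Run it from inside the "GeneQC_Python" folder; an empty "missing inputs" list means all three files are in place.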
To run GeneQC on animal data, three inputs are required: a reference genome, an annotation file, and a FASTQ file. Upload them to the "GeneQC_Python" folder under the GeneQC-master folder on the cluster.
Move to the "GeneQC_Python" folder:
cd your_folder_path_of_"GeneQC_Python"
module load anaconda(choose_your_version)
Step 1: Create the newly defined transcripts and their annotation file:
python [reference genome] [standard gff annotation file] [new defined transcripts sequence file] [new defined transcripts gff annotation file]
Example: Homo sapiens
python GCF_000001405.37_GRCh38.p11_genomic.fna GCF_000001405.37_GRCh38.p11_genomic.gff human_transcripts_seq.fa human_transcripts_seq.gff
Step 2: Map the RNA-seq reads to the newly defined transcripts using the following commands (the example uses the HISAT2 aligner with paired-end reads):
module load hisat2
hisat2-build -f human_transcripts_seq.fa ./hisatindex/Humo
hisat2 -x ./hisatindex/Humo -k 10 -p 40 -1 SRR6029567_1.fastq -2 SRR6029567_2.fastq -S SRR6029567.sam
Step 3: Run GeneQC:
python 2 human_transcripts_seq.fa human_transcripts_seq.gff SRR6029567.sam
For single-end reads, Step 2 uses the following commands instead (again with the HISAT2 aligner):
module load hisat2
hisat2-build -f human_transcripts_seq.fa ./hisatindex/Humo
hisat2 -x ./hisatindex/Humo -k 10 SRR491087.fastq -S SRR491087.sam
Step 3: Run GeneQC on the single-end alignment:
python 2 human_transcripts_seq.fa human_transcripts_seq.gff SRR491087.sam
The outputs are generated in the same folder. For the paired-end example, SRR6029567_out.txt contains the feature extraction results and SRR6029567_out.csv contains the D-score results.
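The paired-end and single-end HISAT2 calls in Step 2 differ only in how the reads are passed. As a rough sketch, a small helper can assemble either command from the same parameters. The function hisat2_command is hypothetical and not part of GeneQC. It uses only HISAT2 options that appear in the commands above (-x, -k, -p, -1, -2, -S), plus -U for unpaired reads, which is an assumption on our part: the single-end command above passes the FASTQ file positionally.

```python
import shlex


def hisat2_command(index, sam_out, reads, mates=None, k=10, threads=None):
    """Build a hisat2 argument list (hypothetical helper).

    reads   -- single-end FASTQ, or mate 1 when `mates` is given
    mates   -- mate 2 FASTQ for paired-end data, else None
    k       -- value for -k, as in the README examples
    threads -- value for -p, omitted when None
    """
    cmd = ["hisat2", "-x", index, "-k", str(k)]
    if threads is not None:
        cmd += ["-p", str(threads)]
    if mates is not None:
        # Paired-end: mate files go to -1 and -2.
        cmd += ["-1", reads, "-2", mates]
    else:
        # Single-end: -U is used here instead of a positional argument.
        cmd += ["-U", reads]
    cmd += ["-S", sam_out]
    return cmd


if __name__ == "__main__":
    paired = hisat2_command(
        "./hisatindex/Humo", "SRR6029567.sam",
        "SRR6029567_1.fastq", "SRR6029567_2.fastq", threads=40,
    )
    single = hisat2_command("./hisatindex/Humo", "SRR491087.sam", "SRR491087.fastq")
    # Print the commands; pass the lists to subprocess.run to execute them.
    print(shlex.join(paired))
    print(shlex.join(single))
```

Building the command as a list keeps file names with spaces safe if the list is later handed to subprocess.run.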