ceccarellilab/ANTARES

ANTARES

The ANTigens from Repetitive ElementS (ANTARES) pipeline detects high-quality antigens derived from expressed tumor-associated transposable elements (TEs).

The pipeline consists of the following steps:

tumor_associated_TE.R: TE expression is quantified in each tumor sample using featureCounts, with parameters chosen for counting reads that map to repetitive genomic regions. Tumor-specific TEs are then selected by differential expression analysis.

potential_peptide-coding_TE.R: The second step of the workflow detects potential peptide-coding TEs. Briefly, the nucleotide sequence of each tumor-associated TE is translated in all six reading frames, and only putative ORFs of at least 10 amino acids are retained.
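
The six-frame translation described above can be sketched in shell with awk. This is a minimal illustration under stated assumptions, not the code in potential_peptide-coding_TE.R: the function name six_frame_orfs, the output header layout, and the definition of a putative ORF as any stop-free stretch (no start codon required) are all hypothetical choices, and the standard genetic code is assumed.

```shell
# Sketch only: six-frame translation of FASTA sequences read from stdin.
# Assumptions (not taken from the ANTARES scripts): standard genetic code,
# and a putative ORF is any stretch between stop codons of >= min_aa residues.
six_frame_orfs() {
    awk -v min_aa="${1:-10}" '
    BEGIN {
        # Build the standard codon table from the canonical TCAG ordering.
        bases = "TCAG"
        aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
        n = 0
        for (i = 1; i <= 4; i++)
            for (j = 1; j <= 4; j++)
                for (k = 1; k <= 4; k++) {
                    n++
                    codon[substr(bases, i, 1) substr(bases, j, 1) substr(bases, k, 1)] = substr(aas, n, 1)
                }
        comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
    }
    # Reverse complement; any non-ACGT base becomes N.
    function revcomp(s,   r, p, b) {
        r = ""
        for (p = length(s); p >= 1; p--) {
            b = substr(s, p, 1)
            r = r ((b in comp) ? comp[b] : "N")
        }
        return r
    }
    # Translate s starting at a 0-based offset; unknown codons become X.
    function translate(s, offset,   p, c, prot) {
        prot = ""
        for (p = 1 + offset; p + 2 <= length(s); p += 3) {
            c = substr(s, p, 3)
            prot = prot ((c in codon) ? codon[c] : "X")
        }
        return prot
    }
    # Print every stop-free stretch of at least min_aa residues in 3 frames.
    function emit(id, s, strand,   f, prot, parts, m, q) {
        for (f = 0; f < 3; f++) {
            prot = translate(s, f)
            m = split(prot, parts, "[*]")
            for (q = 1; q <= m; q++)
                if (length(parts[q]) >= min_aa)
                    printf ">%s|%s%d|orf%d\n%s\n", id, strand, f + 1, q, parts[q]
        }
    }
    function flush_record() {
        if (id != "") {
            emit(id, seq, "+")
            emit(id, revcomp(seq), "-")
        }
    }
    /^>/ { flush_record(); id = substr($1, 2); seq = ""; next }
    { seq = seq toupper($0) }
    END { flush_record() }
    '
}

# Hypothetical usage (file names are placeholders):
# six_frame_orfs 10 < tumor_associated_TE.fasta > putative_orfs.fasta
```

Each output header records the source sequence, the strand and frame, and the index of the stretch within that frame, so a retained peptide can be traced back to its TE.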

false_positive_filter.R: The third step reduces the number of false-positive TE-derived peptides by aligning their sequences with blastp against two pre-built databases of known proteins (one specific to LTR and one to LINE TE families).

The database files required for BLAST can be obtained as follows:

Methionine ORF FASTA file for LTR can be downloaded from: http://geve.med.u-tokai.ac.jp/download/
ORF1 and ORF2 FASTA files for LINE can be downloaded from the NCBI protein database

conda create -n blast -c conda-forge -c bioconda -c defaults blast
conda activate blast
makeblastdb -dbtype prot -in LINE_ORF.fasta -out db_LINE

Only conserved sequences (identity > 90% and e-value < 0.01) are retained and considered as high-quality TE-derived peptides.
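
The thresholds above can be applied to BLAST tabular output with a short awk filter. This is a sketch that assumes the alignments were written with blastp `-outfmt 6`, whose default columns put percent identity in column 3 and e-value in column 11. The source does not state which output format false_positive_filter.R uses, and the function name filter_blast_hits is hypothetical.

```shell
# Sketch only: keep hits with identity > 90% and e-value < 0.01.
# Assumes default blastp -outfmt 6 columns (tab-separated):
#   3 = percent identity, 11 = e-value
filter_blast_hits() {
    awk -F '\t' -v min_ident="${1:-90}" -v max_evalue="${2:-0.01}" \
        '($3 + 0) > (min_ident + 0) && ($11 + 0) < (max_evalue + 0)'
}

# Hypothetical usage (file names are placeholders):
# filter_blast_hits 90 0.01 < blast_LINE.tsv > blast_LINE_filtered.tsv
```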

Putative antigens derived from the selected peptides are then predicted with netMHCpan. Given a file listing the HLA types of each patient, netMHCpan can be run as follows:

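# Each whitespace-separated entry in this file is passed to netMHCpan -a
sample_list_file="HLA_per_patient.txt"
netMHCpan_executable="netMHCpan-4.1/netMHCpan"
fasta_file="predicted_peptides_filtered.fasta"

# Launch one netMHCpan job per entry, in the background
for sample in $(cat "$sample_list_file"); do
    output_file="netMHpan_predictions_${sample}.txt"
    "$netMHCpan_executable" -f "$fasta_file" -a "$sample" > "$output_file" &
done

# Wait for all background jobs to finish before reading the output files
wait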

antigen_prediction_prioritization.R: Putative antigens are prioritized by both MHC-I binding affinity and recognition potential scores.
