This repository contains trained reference sets that can be used with the Ribosomal Database Project classifier (Wang et al., 2007) to taxonomically assign vertebrate 12S mitochondrial gene sequences. The latest releases can be downloaded from https://github.com/terrimporter/12SvertebrateClassifier/releases
This classifier is suitable for classifying vertebrate 12S mitochondrial gene sequences to species rank. Reference sequences were obtained from the NCBI nucleotide database [accessed July 2021] and MitoFish [accessed March 2020]. Taxonomy is based on the NCBI taxonomy database.
You can cite this repository directly:
Teresita M. Porter. (2021). terrimporter/12SvertebrateClassifier: 12S Vertebrate Classifier v2.0.0. Zenodo. https://zenodo.org/badge/latestdoi/391459819
If you use a version of the classifier that contains sequences from MitoFish please also cite: Sato, Y., Miya, M., Fukunaga, T., Sado, T., Iwasaki, W., Kumar, S. (2018) MitoFish and MiFish Pipeline: A mitochondrial genome database of fish with an analysis pipeline for environmental DNA metabarcoding. Molecular Biology and Evolution, 35: 1553.
If you use this reference set with the RDP classifier please also cite the naive Bayesian classification tool:
Wang et al. (2007) Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73: 5261.
Decompress the tar.gz file:
tar -xvzf FileName.tar.gz
Use with the RDP classifier:
java -Xmx8g -jar /path/to/rdp_classifier_2.13/dist/classifier.jar classify -t /path/to/mydata_trained/rRNAClassifier.properties -o ClassifiedQueryFilename QueryFilename
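When several query files need to be classified, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the function names `build_rdp_command` and `classify_all`, the `*.fasta` file pattern, and the `.rdp.out` output suffix are hypothetical. It uses only the options shown in the command above.

```python
import subprocess
from pathlib import Path


def build_rdp_command(classifier_jar, properties_file, query_fasta,
                      output_file, max_heap="8g"):
    """Assemble the RDP classifier call as an argument list."""
    return [
        "java",
        f"-Xmx{max_heap}",          # Java heap size, 8g as in the command above
        "-jar", str(classifier_jar),
        "classify",
        "-t", str(properties_file),  # rRNAClassifier.properties of the trained set
        "-o", str(output_file),
        str(query_fasta),
    ]


def classify_all(query_dir, classifier_jar, properties_file, run=False):
    """Build one command per *.fasta file in query_dir; run them if run=True."""
    commands = []
    for fasta in sorted(Path(query_dir).glob("*.fasta")):
        output_file = fasta.with_suffix(".rdp.out")  # hypothetical naming scheme
        command = build_rdp_command(classifier_jar, properties_file,
                                    fasta, output_file)
        if run:
            subprocess.run(command, check=True)  # raises if java exits non-zero
        commands.append(command)
    return commands
```

Leaving `run=False` returns the commands without executing them, which is a convenient way to check paths before starting a long job.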
This version was created from the NCBI nucleotide database [accessed March 9, 2023]. It contains 3,706 reference sequences and 4,007 taxa in total, including 1,868 species. The set was filtered to include only sequences of at least 500 bp, sequences with no ambiguous bases, and records with Linnean binomial species names. Non-vertebrate outgroup sequences were added. Sequences were checked for human and non-vertebrate contaminants. This version of the database only includes GenBank records from North America. For a reference set that includes global species, use v2.0.0.
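To make the three filtering criteria concrete, here is a minimal sketch of how such a filter could look. It is an illustration only and not the procedure actually used to build the reference set: the function name `keep_record` is hypothetical, "no ambiguous bases" is assumed to mean only A, C, G, and T, and a Linnean binomial is approximated by a simple pattern (capitalized genus plus lowercase epithet).

```python
import re

# Assumption: only A, C, G and T count as unambiguous bases.
UNAMBIGUOUS = re.compile(r"[ACGT]+")

# Assumption: a binomial is a capitalized genus followed by one lowercase
# epithet. Names such as "Salmo sp." or hyphenated epithets are rejected.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+")


def keep_record(sequence, species_name, min_length=500):
    """Return True if a record passes all three filters."""
    seq = sequence.upper()
    return (
        len(seq) >= min_length
        and UNAMBIGUOUS.fullmatch(seq) is not None
        and BINOMIAL.fullmatch(species_name) is not None
    )
```

A real pipeline would likely need a more permissive name check, for example to accept hyphenated epithets.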
Rank | 500 bp+ | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0.2 |
Class | 0 | 0 | 0.2 | 0.3 | 0.3 |
Order | 0.3 | 0.4 | 0.4 | 0.4 | 0.4 |
Family | 0.6 | 0.5 | 0.6 | 0.6 | 0.5 |
Genus | 0.9 | 0.9 | 0.9 | 0.8 | 0.9 |
Species | NA | NA | NA | NA | NA |
NA = No cutoff available that will result in 99% correct assignments
Rank | 500 bp+ | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 0 |
Order | 0 | 0 | 0 | 0 | 0.2 |
Family | 0 | 0 | 0.1 | 0.3 | 0.2 |
Genus | 0.1 | 0.2 | 0.4 | 0.3 | 0.4 |
Species | 0.8 | 0.8 | 0.8 | 0.8 | 0.9 |
NA = No cutoff available that will result in 95% correct assignments
Rank | 500 bp+ | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 0 |
Order | 0 | 0 | 0 | 0 | 0 |
Family | 0 | 0 | 0 | 0 | 0 |
Genus | 0 | 0 | 0 | 0 | 0.1 |
Species | 0 | 0 | 0.2 | 0.3 | 0.4 |
These files are ready to be used with the RDP Classifier. Created from the NCBI nucleotide database [accessed July 26, 2020]. Additional 12S sequences were added from the 12S fish classifier (available from https://github.com/terrimporter/12SfishClassifier), which includes sequences from MitoFish [March 2020]. This version contains 19,654 reference sequences and 15,007 taxa at all ranks, including 9,564 species. This version of the classifier is recommended if an emphasis on fish is needed; otherwise, v1.0.0 performs slightly better and needs lower bootstrap support cutoffs to ensure accuracy.
These files were used to train the RDP Classifier. Provided here for reference only. Includes a FASTA file and a taxonomy file.
Rank | 500 bp+ | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 20 |
Order | 10 | 20 | 30 | 30 | 40 |
Family | 50 | 50 | 50 | 50 | 50 |
Genus | NA | NA | NA | NA | 95 |
Species * | NA | NA | NA | NA | NA |
NA = No cutoff available that will result in 99% correct assignments
Rank | Full | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 0 |
Order | 0 | 0 | 0 | 0 | 0 |
Family | 0 | 0 | 0 | 0 | 10 |
Genus | 30 | 40 | 40 | 40 | 50 |
Species * | NA | NA | 95 | 95 | NA |
NA = No cutoff available that will result in 95% correct assignments
Rank | Full | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 0 |
Order | 0 | 0 | 0 | 0 | 0 |
Family | 0 | 0 | 0 | 0 | 0 |
Genus | 0 | 0 | 0 | 0 | 20 |
Species * | 60 | 60 | 60 | 60 | 80 |
These files are ready to be used with the RDP Classifier. Created from the NCBI nucleotide database [accessed July 26, 2020]. This version contains 16,801 reference sequences and 11,633 taxa at all ranks, including 7,277 species.
These files were used to train the RDP Classifier. Provided here for reference only. Includes a FASTA file and a taxonomy file.
Taxonomic assignment results should be filtered according to their bootstrap support values to reduce false positive assignments. Cutoffs are based on leave-one-sequence-out testing of non-singleton species. Here we recommend MINIMUM bootstrap cutoffs according to query length and assignment rank. Assuming your query sequences are represented in the reference set, using the cutoffs presented in the first table below should ensure 99% accuracy. If you wish to cast a wider net, you can use the second and third tables below for 95% and 90% accuracy, respectively.
Rank | 500 bp+ | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 20 |
Order | 20 | 20 | 30 | 30 | 30 |
Family | 50 | 40 | 50 | 50 | 50 |
Genus | 95 | 90 | 90 | 90 | 90 |
Species * | NA | NA | NA | NA | NA |
NA = No cutoff available that will result in 99% correct assignments
Rank | Full | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 0 |
Order | 0 | 0 | 0 | 0 | 0 |
Family | 0 | 0 | 0 | 0 | 10 |
Genus | 0 | 0 | 0 | 20 | 30 |
Species * | 95 | 90 | 90 | 95 | NA |
NA = No cutoff available that will result in 95% correct assignments
Rank | Full | 400 bp | 300 bp | 200 bp | 100 bp |
---|---|---|---|---|---|
Superkingdom | 0 | 0 | 0 | 0 | 0 |
Kingdom | 0 | 0 | 0 | 0 | 0 |
Phylum | 0 | 0 | 0 | 0 | 0 |
Class | 0 | 0 | 0 | 0 | 0 |
Order | 0 | 0 | 0 | 0 | 0 |
Family | 0 | 0 | 0 | 0 | 0 |
Genus | 0 | 0 | 0 | 0 | 0 |
Species * | 20 | 40 | 50 | 50 | 70 |
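
The cutoff tables above can be applied to classifier output with a short script. The sketch below rests on several assumptions that should be checked against your own output file: that each line is tab-separated with the query ID first, a strand field second, and then repeating taxon/rank/bootstrap triplets; that bootstrap values are reported on a 0-1 scale (so the percent cutoffs are divided by 100); and that rank labels match the table row names. The function names are hypothetical. The example dictionary copies the 200 bp column of the first (99%) table in this section.

```python
# Minimum bootstrap cutoffs in percent for ~200 bp queries (99% table).
# None stands for "NA": no cutoff reaches the target accuracy at that rank.
CUTOFFS_200BP_99 = {
    "superkingdom": 0, "kingdom": 0, "phylum": 0, "class": 0,
    "order": 30, "family": 50, "genus": 90, "species": None,
}


def parse_rdp_line(line):
    """Split one output line into (query_id, [(taxon, rank, bootstrap), ...]).

    Assumed layout: query ID, strand, then tab-separated triplets.
    """
    fields = line.rstrip("\n").split("\t")
    query_id, triplet_fields = fields[0], fields[2:]
    lineage = []
    for i in range(0, len(triplet_fields) - 2, 3):
        taxon, rank, bootstrap = triplet_fields[i:i + 3]
        lineage.append((taxon, rank.lower(), float(bootstrap)))
    return query_id, lineage


def trusted_lineage(lineage, cutoffs):
    """Keep ranks from the top down, stopping at the first rank that fails."""
    kept = []
    for taxon, rank, bootstrap in lineage:
        if rank not in cutoffs:
            continue  # ranks without a table row (e.g. a root node) are skipped
        cutoff = cutoffs[rank]
        if cutoff is None or bootstrap < cutoff / 100:
            break  # no usable cutoff, or support is below the minimum
        kept.append((taxon, rank, bootstrap))
    return kept
```

Stopping at the first failing rank is a design choice of this sketch: an assignment at a finer rank is not reported once a coarser rank has been rejected. Choose the cutoff column that matches your query length and the table that matches the accuracy you need.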
Sato, Y., Miya, M., Fukunaga, T., Sado, T., Iwasaki, W., Kumar, S. (2018) MitoFish and MiFish Pipeline: A mitochondrial genome database of fish with an analysis pipeline for environmental DNA metabarcoding. Molecular Biology and Evolution, 35: 1553.
Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73(16), 5261–5267. Available from https://sourceforge.net/projects/rdp-classifier/
We acknowledge support from the Canadian federal Genomics Research & Development Initiative (GRDI), Metagenomics-Based Ecosystem Biomonitoring (Ecobiomics) project.
Last updated: May 26, 2023