Codon Aversion Motifs for Alignment-free Phylogenies
ridgelab/cam
##########################

Codon Aversion Motif Distance Matrix
cam.py
Created By: Justin Miller
Email: justin.miller@uky.edu

##########################

Purpose: Create a distance matrix using codon aversion.

##########################

ARGUMENT OPTIONS:

  -h, --help                show this help message and exit
  -i [INPUT [INPUT ...]]    Input Fasta Files
  -id INPUTDIR              Input Directory with Fasta Files
  -o OUTPUT                 Output File
  -t THREADS                Number of Cores
  -p PERCENT                Percent of codon aversion tuples that must overlap between species
  -w WRITE                  Immediately writes the distance matrix. Header will be on the bottom.
  -rna                      Flag for RNA sequences
  -a                        Flag for amino acid sequences (Not Recommended)
  -aa                       Flag to convert DNA/RNA to amino acid protein sequence (Not Recommended)

##########################

REQUIREMENTS:

cam.py uses Python version 3.5.

Python libraries that must be available include:
  1. sys
  2. multiprocessing
  3. argparse
  4. re
  5. collections
  6. itertools
  7. os (optional if using -id option)
  8. gzip (optional if using gzip files)
  9. Bio.Seq (biopython)
  10. Bio.Alphabet (biopython)

Items 1 through 8 are part of the Python standard library and need no separate installation. Bio.Seq and Bio.Alphabet are provided by the biopython package.

If any of those libraries is not currently in your Python Path, use the following command:

  pip install --user [library_name]

to install the library to your path.

##########################

Input Files:
This algorithm requires two or more fasta files, or a directory which contains two or more fasta files.

Output File:
An output file is not required. If an output file is not supplied, the distance matrix will be written to standard out.

##########################

USAGE:

Typical usage requires the -i or the -id option. When using -i, input file names are space separated, with two or more files required.

The algorithm will compare all sequences in all files to find all codon aversion motifs.

By default, all possible threads are used. If you want to change that, use the -t option.

By default, DNA sequences are expected.
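To illustrate what "finding codon aversion motifs" can mean, here is a minimal sketch and not the code in cam.py. It assumes a motif is the set of codons that never appear in-frame in a sequence, and that two species are compared by the fraction of motifs they share relative to the smaller motif set. The names aversion_motif and motif_distance, and the distance formula itself, are hypothetical and chosen only for this example.

```python
from itertools import product

# All 64 DNA codons (assumes the DNA alphabet A, C, G, T).
ALL_CODONS = frozenset("".join(c) for c in product("ACGT", repeat=3))


def aversion_motif(seq):
    """Return the sorted tuple of codons that never occur in-frame in seq.

    Hypothetical helper: trailing bases that do not fill a codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    used = {seq[i:i + 3] for i in range(0, usable, 3)}
    return tuple(sorted(ALL_CODONS - used))


def motif_distance(motifs_a, motifs_b):
    """Assumed distance: 1 - (shared motifs / size of the smaller motif set).

    This formula is an illustration only; cam.py may compute it differently.
    """
    if not motifs_a or not motifs_b:
        return 1.0
    shared = len(motifs_a & motifs_b)
    return 1.0 - shared / min(len(motifs_a), len(motifs_b))


# Two toy "species", each with two short made-up genes.
species_a = {aversion_motif("ATGAAATAA"), aversion_motif("ATGCCCTAA")}
species_b = {aversion_motif("ATGAAATAG"), aversion_motif("ATGCCCTAA")}
distance = motif_distance(species_a, species_b)
```

In this toy case the two species share one of their two motifs, so the assumed distance is 0.5. Real genes are much longer, so real motifs contain far fewer absent codons.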
By default, the distance matrix is first stored in memory and then written to the output file or standard out. This allows the header line to be at the top.

Example usage:

  python cam.py -id testFiles/mammals/ -o output
  python cam.py -i testFiles/mammals/* > output

Running one of the above commands will produce a single output file called output in the current directory. This command should take under a minute on a single core. If your machine allows multithreading, it should finish much faster.

##########################

Thank you, and happy researching!!