-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
85 lines (57 loc) · 2.53 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
##########################
Codon Aversion Motif Distance Matrix
cam.py
Created By: Justin Miller
Email: justin.miller@uky.edu
##########################
Purpose: Create a distance matrix using codon aversion.
##########################
ARGUMENT OPTIONS:
-h, --help show this help message and exit
-i [INPUT [INPUT ...]]Input Fasta Files
-id INPUTDIR Input Directory with Fasta Files
-o OUTPUT Output File
-t THREADS Number of Cores
-p PERCENT Percent of codon aversion tuples that must overlap between species
-w WRITE Immediately writes the distance matrix. Header will be on the bottom.
-rna Flag for RNA sequences
-a Flag for amino acid sequences (Not Recommended)
-aa Flag to convert DNA/RNA to amino acid protein sequence (Not Recommended)
##########################
REQUIREMENTS:
cam.py uses Python version 3.5
Python libraries that must be installed include:
1. sys
2. multiprocessing
3. argparse
4. re
5. collections
6. itertools
7. os (optional if using -id option)
8. gzip (optional if using gzip files)
9. Bio.Seq (biopython)
10. Bio.Alphabet (biopython)
If any of those libraries is not currently in your Python Path, use the following command:
pip install --user [library_name]
to install the library to your path.
##########################
Input Files:
This algorithm requires two or more fasta files, or a directory which contains two or more fasta files.
Output File:
An output file is not required. If an output file is not supplied, the distance matrix will be written to standard out.
##########################
USAGE:
Typical usage requires the -i or the -id option. When using -i, input file names are space separated,
with two or more files required. The algorithm will compare all sequences in all files to find
all codon aversion motifs.
By default, all possible threads are used. If you want to change that, use the -t option.
By default, DNA sequences are expected.
By default, the distance matrix is first stored in memory and then written to the output file
or standard out. This allows the header line to be at the top.
Example usage:
python cam.py -id testFiles/mammals/ -o output
python cam.py -i testFiles/mammals/* > output
Running one of the above commands will produce a single output called output in the current directory.
This command should take under a minute on a single core. If your machine allows multithreading, it should finish much faster.
##########################
Thank you, and happy researching!!