Here are some Python scripts I wrote. Most of them process fasta/fastq/gb files.
From now on, I will add some descriptions for each program.
You can type
python3 pyfile -h
or
python3 pyfile
to print the usage of each program.
You can ask me any question about these programs via wpwupingwp@outlook.com.
Be sure to install Python 3 rather than Python 2.7. Some scripts call subprocess.run(), which requires Python 3.5 or later.
Note that all scripts have only been tested on Linux, although in theory they should also work on Windows.
Many of the programs in this repository support batch mode. See the examples below. Here "*.fasta" matches the files you want to process, and i is a loop variable that you can rename freely. The parameters of the program are omitted.
On Windows (cmd):
for %i in (*.fasta) do python program.py %i
On Linux:
for i in *.fasta; do python3 program.py "$i"; done
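The same loop can also be written in Python, which avoids the differences between the Windows and Linux shells. This is only a minimal sketch and not part of the repository: build_commands and run_batch are hypothetical helper names, and "program.py" stands for whichever script you want to run.

```python
import glob
import subprocess
import sys


def build_commands(pattern, program):
    """Return one command (as an argument list) per file matching pattern."""
    # sys.executable is the interpreter running this script, so it does not
    # matter whether it is installed as "python" or "python3".
    return [[sys.executable, program, name]
            for name in sorted(glob.glob(pattern))]


def run_batch(pattern, program):
    """Run program once per matching file, one after another."""
    for command in build_commands(pattern, program):
        # check=True stops the batch at the first run that fails.
        subprocess.run(command, check=True)
```

For example, run_batch("*.fasta", "program.py") would process every fasta file in the current folder.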
Just type:
python3 program.py -h
Run other programs in parallel.
python3 parallel.py "command %i" "file"
Make sure you do not omit the quotation marks.
The "%i" in "command" is replaced by the filename. You can use a glob pattern in "file".
python3 parallel.py "python3 gb2fasta.py %i" "*.gb"
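The idea of replacing "%i" and running the resulting commands side by side can be sketched as follows. This is an illustration and not the code of parallel.py: the function names and the default of 4 workers are assumptions, and the quoting done by shlex.quote assumes a POSIX shell.

```python
import glob
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def expand_commands(command, file_pattern):
    """Replace "%i" in command with each file name matching file_pattern."""
    return [command.replace("%i", shlex.quote(name))
            for name in sorted(glob.glob(file_pattern))]


def run_parallel(command, file_pattern, workers=4):
    """Run the expanded commands concurrently and return their exit codes."""
    commands = expand_commands(command, file_pattern)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread only waits for its child process, so threads are enough.
        results = pool.map(lambda c: subprocess.run(c, shell=True), commands)
        return [result.returncode for result in results]
```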
Split a fasta or fastq file into smaller files, each holding the number of sequences given by "-s".
python3 split.py -i input_file -s 10000000 -o output_path
It only supports fasta and fastq files. The option "-s" sets how many sequences you want in each output file; the default value is 100000. You can change the output folder with "-o".
python3 split.py -i pe150.fastq -s 100000 -o pe150_split
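Splitting by record count can be sketched like this. It is a simplified illustration that handles fasta only and is not the code of split.py; read_fasta and split_records are hypothetical names.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, size):
    """Yield lists of at most size records, keeping the original order."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch would then be written to its own file in the output folder.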
Convert file format.
python3 convert.py old_file_name old_format new_file_name new_format
python3 convert.py Zea.nex nexus Zea.fasta fasta
Convert an xml format BLAST result to fasta format and output a result table.
- BLAST your sequences.
- Download xml format result.
- Run
python3 xml2fasta.py BlastResult.xml
python3 xml2fasta.py BlastResult.xml -s
python3 xml2fasta.py BlastResult.xml -ss
If you use the option "-s", it will only process the first hsp of each hit for every query sequence.
If you use the option "-ss", it will only process the first hsp of the first hit for each query sequence.
Note that on Microsoft Windows you may need to replace "python3" with "python".
- fasta file. The first sequence is your query sequence, and the others are the fragments that matched the query.
- tsv file. A table for simple analysis.
- NotFound.log. Lists the query sequences for which BLAST found no match.
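The difference between no option, "-s" and "-ss" can be illustrated with plain dictionaries. This sketch does not parse BLAST xml and is not the code of xml2fasta.py; the data layout {query: {hit: [hsp, ...]}}, the mode names, and the assumption that running without an option keeps every hsp are all mine.

```python
def select_hsps(queries, mode="all"):
    """Pick hsps from {query: {hit: [hsp, ...]}} according to mode.

    "all"       keeps every hsp of every hit (assumed default)
    "first_hsp" keeps the first hsp of each hit (like "-s")
    "first_hit" keeps the first hsp of the first hit (like "-ss")
    """
    selected = {}
    for query, hits in queries.items():
        picked = []
        for hit, hsps in hits.items():
            kept = hsps if mode == "all" else hsps[:1]
            picked.extend((hit, hsp) for hsp in kept)
            if mode == "first_hit":
                break  # ignore every hit after the first one
        selected[query] = picked
    return selected
```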
Trim a fragment from each sequence in the given fasta file, or replace the trimmed bases with 'N'.
python3 trim.py input.fasta from:to
Here from and to are integers that define the region you want to cut off. If you want to cut the tail of a sequence and do not know its exact length, you can use a negative from with a large to. For instance, "-20:10000" cuts the last 20 bases, assuming that every sequence you give is shorter than 10000.
- Cut middle
python3 trim.py rbcL.fasta 100:150
- Cut head
python3 trim.py rbcL.fasta 1:24
- Cut tail
python3 trim.py rbcL.fasta "-5:1000000"
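How a "from:to" region could be applied to one sequence is sketched below. This is not the code of trim.py. It assumes 1-based, inclusive positions, a negative from counted from the end of the sequence, and a to that is clipped to the sequence length; the real script may treat the boundaries differently.

```python
def parse_region(text):
    """Turn "from:to" into a pair of integers."""
    start, end = text.split(":")
    return int(start), int(end)


def trim(sequence, start, end, mask=False):
    """Cut the region start..end out of sequence, or mask it with "N"."""
    length = len(sequence)
    if start < 0:
        begin = max(length + start, 0)  # count from the end
    else:
        begin = max(start - 1, 0)       # assumed 1-based position
    stop = min(end, length)             # clip an oversized "to"
    if stop <= begin:
        return sequence                 # nothing to cut
    middle = "N" * (stop - begin) if mask else ""
    return sequence[:begin] + middle + sequence[stop:]
```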
Remove identical sequences in the given fasta/nexus file. The new file will be written with the suffix ".new", in the same format as the input file.
Duplicated sequences will be printed on screen.
python3 no_same.py input_file
python3 no_same.py cbs.fasta
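Removing identical sequences comes down to remembering which sequences were already seen. The sketch below is an illustration and not the code of no_same.py; it assumes that the first copy is kept and that upper and lower case are treated as the same base.

```python
def remove_identical(records):
    """Split (name, sequence) pairs into unique records and duplicates.

    Each duplicate is reported as (name, name of the first identical record).
    """
    seen = {}
    unique, duplicates = [], []
    for name, sequence in records:
        key = sequence.upper()  # assumed: comparison ignores case
        if key in seen:
            duplicates.append((name, seen[key]))
        else:
            seen[key] = name
            unique.append((name, sequence))
    return unique, duplicates
```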
Expand a given table according to range.
Input table (CSV format) looks like this:
A,B,C
It will generate a new table:
D,E
where D is expanded from range(B, C) and E is derived from A.
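The expansion can be sketched as follows. This is only an illustration under two assumptions: range(B, C) behaves like Python's range, so C itself is excluded, and E is simply A copied unchanged. The function name expand_table is hypothetical.

```python
import csv
import io


def expand_table(text):
    """Turn each row "A,B,C" into one row "D,E" per value D in range(B, C)."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        label, start, stop = row[0], int(row[1]), int(row[2])
        for value in range(start, stop):
            writer.writerow([value, label])  # assumed: E is A unchanged
    return output.getvalue()
```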
Rename fasta files in one directory according to the gene info provided by the first record in each file.
Pick fasta records according to an id list.
Screen sequences assembled by SPAdes according to the sequence length and coverage info in the sequence id.
Warning: This program uses regular expressions to recognize this information, so it may generate wrong output when it is used on sequences with a different id format.
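A regular expression for this kind of id could look like the sketch below. It assumes ids of the form "NODE_1_length_1500_cov_25.7", which is how SPAdes usually names contigs; the function name and the thresholds are hypothetical, and this is not the pattern used by the program itself.

```python
import re

# Assumed id layout: NODE_<number>_length_<length>_cov_<coverage>
SPADES_ID = re.compile(r"NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def passes(seq_id, min_length, min_coverage):
    """Return True if the id reports enough length and coverage."""
    match = SPADES_ID.search(seq_id)
    if match is None:
        # Fail loudly instead of silently giving wrong output.
        raise ValueError("unrecognized id: " + seq_id)
    length, coverage = int(match.group(1)), float(match.group(2))
    return length >= min_length and coverage >= min_coverage
```

Raising an error for an unrecognized id is one way to avoid the wrong output mentioned in the warning.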
Remove illegal characters in sequence ids for MrBayes.
Only the nexus format is supported. Sequence ids longer than 90 characters will be truncated.
python3 nex_for_mb.py nexus_file_name
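Cleaning an id can be sketched in a few lines. Which characters count as illegal is an assumption here: everything except letters, digits and "_" is replaced by "_". The program itself may use a different set, and clean_id is a hypothetical name.

```python
import re


def clean_id(seq_id, max_length=90):
    """Replace characters outside A-Z, a-z, 0-9 and "_", then truncate."""
    return re.sub(r"[^A-Za-z0-9_]", "_", seq_id)[:max_length]
```

Note that two different ids can become identical after cleaning or truncation, so a real tool would also need to check for such collisions.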
Combine fasta files into one nexus file with partition information.
python3 fasta2nexus.py input_files -o output_filename
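The partition information is essentially a list of position ranges, one per gene, in the order the alignments are joined. A minimal sketch of how such "charset" lines could be computed is shown below; it is not the code of fasta2nexus.py, and the gene names and lengths in the example are made up.

```python
def charset_lines(genes):
    """Return nexus "charset" lines for (name, length) pairs joined in order."""
    lines, start = [], 1
    for name, length in genes:
        end = start + length - 1
        lines.append("charset {} = {}-{};".format(name, start, end))
        start = end + 1  # the next gene begins right after this one
    return lines
```

For example, charset_lines([("rbcL", 1400), ("matK", 1500)]) gives "charset rbcL = 1-1400;" and "charset matK = 1401-2900;".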
Some old code.
Some programs to deal with genbank files, most of them chloroplast records.
Use matplotlib to draw figures for my master's thesis.
Some code to analyze data from microreader, for the Cystathionine beta-synthase inhibitor project.
Some useful code fragments.
Programs for 1kp.