Load leaders
- Important! Identifying the leader regions is needed to get properly oriented (by strand) spacer sequences for spacer blasting. So, do this before attempting spacer blasting if you want to investigate PAMs, spacer-protospacer mismatches, or anything else that depends on the orientation (strand) of the protospacer.
- Only use this workflow if you know where on the genome(s) the leaders are found.
- Based on the positions you provide, the regions will be pulled from the genome(s) and written to a fasta file. You can then upload the leader regions (just as you would if you did not know the leaders to begin with).
The locations file is tab-delimited with 5 columns and no header row:
- taxon_name (need this or taxon_id)
- taxon_id (need this or taxon_name)
- scaffold ('CLDB_ONE_CHROMOSOME' if scaffold was not provided in the loci.txt table)
- region_start (greater than region_end if the leader is on the negative strand)
- region_end
If region_start > region_end, the leader (and its associated CRISPR array) is assumed to be on the negative strand.
CLdb_getLeaderRegions.pl -d CLdb.sqlite -location locations.txt > leaders.fna
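To make the locations format and the strand rule concrete, here is a minimal Python sketch. It is not part of CLdb and does not reproduce how CLdb_getLeaderRegions.pl works internally. The `LeaderLocation` class, the `parse_locations` and `extract_region` helpers, and the example rows are all hypothetical. It also assumes that coordinates are 1-based and inclusive, that an unused taxon column is left empty, and that negative-strand regions are reverse-complemented; none of these is confirmed by the text above.

```python
import csv
import io
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class LeaderLocation:
    taxon_name: str
    taxon_id: str
    scaffold: str
    start: int
    end: int

    @property
    def strand(self) -> str:
        # Rule from the text: region_start > region_end means negative strand.
        return "-" if self.start > self.end else "+"


def parse_locations(handle):
    """Parse a 5-column, header-less, tab-delimited locations table."""
    records = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, found {len(row)}")
        taxon_name, taxon_id, scaffold, start, end = (f.strip() for f in row)
        if not taxon_name and not taxon_id:
            raise ValueError(f"line {line_no}: need taxon_name or taxon_id")
        records.append(
            LeaderLocation(taxon_name, taxon_id, scaffold, int(start), int(end))
        )
    return records


def extract_region(sequence, loc):
    """Slice a region (assumed 1-based, inclusive) and orient it by strand."""
    low, high = sorted((loc.start, loc.end))
    region = sequence[low - 1:high]
    if loc.strand == "-":
        # Assumption: negative-strand regions are reverse-complemented.
        region = region.translate(COMPLEMENT)[::-1]
    return region


# Hypothetical example rows; taxon names, IDs, and coordinates are invented.
example = (
    "Ecoli_K12\t\tCLDB_ONE_CHROMOSOME\t1\t6\n"
    "\t511145\tscaffold_2\t14\t11\n"
)
scaffolds = {
    "CLDB_ONE_CHROMOSOME": "AAAACCCCGGGGTTTT",
    "scaffold_2": "AAAACCCCGGGGTTTT",
}

locations = parse_locations(io.StringIO(example))
for loc in locations:
    label = loc.taxon_name or loc.taxon_id
    print(label, loc.scaffold, loc.strand, extract_region(scaffolds[loc.scaffold], loc))
```

Running a check like this on locations.txt before calling the CLdb script can catch rows with the wrong column count or a swapped start and end.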
You can now check the conservation of your identified leader regions. You could also align them (e.g. mafft --adjustdirection) if needed.
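One rough way to check conservation after aligning is to score each alignment column by how many sequences share its most common base. The sketch below is only an illustration: `read_fasta` and `column_conservation` are hypothetical helpers, not CLdb or mafft functions, and the three short sequences are invented.

```python
from collections import Counter


def read_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def column_conservation(aligned):
    """Per column: fraction of sequences sharing the most common non-gap base."""
    rows = [seq.upper() for seq in aligned.values()]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*rows):
        bases = [base for base in column if base != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(column))
    return scores


# Invented toy alignment standing in for an aligned leaders file.
toy_alignment = """>leader_a
ACGT-A
>leader_b
ACGTTA
>leader_c
ACCTTA
"""
scores = column_conservation(read_fasta(toy_alignment))
print([round(score, 2) for score in scores])
```

Columns scoring close to 1.0 are well conserved; long runs of low scores at one end may indicate that the region extends past the real leader.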
If you did not need to modify the identified leader regions:
CLdb_loadLeaders.pl -d CLdb.sqlite leaders.fna leaders.fna
If you did align the leaders, provide both the aligned and the unaligned files. Both the aligned and unaligned sequences are needed because mafft can alter sequence orientation during alignment (--adjustdirection).
CLdb_loadLeaders.pl -d CLdb.sqlite possible_leaders.fna possible_leaders_aln.fna
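The following sketch illustrates why the unaligned file still carries information once the sequences are aligned: comparing each degapped aligned sequence with its unaligned original shows which ones were reverse-complemented. It is a hypothetical illustration, not a description of what CLdb_loadLeaders.pl does internally. The `orientation_changes` helper and the example sequences are invented. It relies on mafft prefixing the names of sequences it flips with `_R_` when run with --adjustdirection.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orientation_changes(unaligned, aligned):
    """Map each unaligned sequence name to 'same', 'reversed', or 'unknown'."""
    result = {}
    for name, seq in unaligned.items():
        # Look the sequence up under its own name or the "_R_"-prefixed name.
        aln = aligned.get(name, aligned.get("_R_" + name))
        if aln is None:
            result[name] = "unknown"
            continue
        degapped = aln.replace("-", "").upper()
        if degapped == seq.upper():
            result[name] = "same"
        elif degapped == reverse_complement(seq.upper()):
            result[name] = "reversed"
        else:
            result[name] = "unknown"
    return result


# Invented sequences: leader2 was flipped during alignment.
unaligned = {"leader1": "AAACCG", "leader2": "CGGTTA"}
aligned = {"leader1": "AAACCG-", "_R_leader2": "TAACCG-"}

changes = orientation_changes(unaligned, aligned)
print(changes)
```

Without the unaligned originals, a flipped sequence in the alignment could not be told apart from one that was extracted in that orientation to begin with.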
If you did align the leaders and need to trim off, say, the 50 bp furthest from the array:
CLdb_loadLeaders.pl -d CLdb.sqlite -t 50 possible_leaders.fna possible_leaders_aln.fna