Sequence comparison using dotlet
Dotlet is a simple-to-use program that implements the dotplot algorithm with a substitution matrix for protein comparison. This paper describes the Dotlet software.
Open https://dotlet.vital-it.ch in a browser. You should see a window like Figure 1.
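To make the dotplot idea concrete, here is a minimal sketch in Python. It is not Dotlet's own code: the function name `dotplot`, the window length, and the simple identity scoring (1 for a match, 0 for a mismatch) are assumptions chosen for illustration. A real protein comparison would take the per-residue scores from a substitution matrix instead.

```python
def dotplot(seq_a, seq_b, window=3, match=1, mismatch=0):
    """Score every pair of windows; cell [i][j] compares the windows
    starting at seq_a[i] and seq_b[j] (0-based indices).

    Identity scoring is an assumption made for this sketch; a substitution
    matrix lookup would replace the match/mismatch values.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    scores = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            scores[i][j] = sum(
                match if seq_a[i + k] == seq_b[j + k] else mismatch
                for k in range(window)
            )
    return scores


# Toy sequences (not from the exercise): they share the stretch "BCD".
plot = dotplot("ABCDE", "XBCDY", window=3)
```

High-scoring cells that line up along a diagonal are what appear as a diagonal line in the plot.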
Using Dotlet
The important controls are located in a row of buttons (Fig 2).
Click the input button, and you will see a window like Fig 3.
Here you can enter a sequence and a name for the sequence. IMPORTANT! Dotlet only recognizes the RAW format, i.e. only amino acid or nucleotide letters and no other characters.
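If a sequence comes from a file with a header line, line breaks, or position numbers, it has to be reduced to plain letters first. The helper below is a small sketch of one way to do that; the name `to_raw` and the rule of dropping lines that start with ">" are assumptions for illustration, not part of Dotlet.

```python
def to_raw(text):
    """Strip header lines, whitespace, digits and punctuation so that
    only the sequence letters remain, in upper case."""
    letters = []
    for line in text.splitlines():
        if line.startswith(">"):  # assumed FASTA-style header line: skip it
            continue
        letters.extend(ch for ch in line if ch.isalpha())
    return "".join(letters).upper()


# Toy input with a header, a space and position numbers.
raw = to_raw(">seq\nGLLN QHV\n123 pep")
```

The resulting string can be pasted directly into the input window.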
Enter the sequence below and name it “c” (as in Fig 8).
GLLNQHVPEPRVEVWNYNMNTLHCANINVPIVPEWYAYDGGEIWHLNLTSFDTRNMHEV
YFDRHGCWETWVFLFTWWVRASDMSCHNYDVTVWATNVALQIVDWRIDPMHHRNYQRF
DMWWCHHNNFEDATGCEFNWVWKIFAWNSEEERHPNQERTYQTAACTENICSPDIVRMV
TPENCMNICGNYNGQENFKNAFFYMLFIRSVSMMTLMIVCAVQSPCMIFSYDYHDMMLWF
RSMPCFNISRVNIWFGLELYNAVPGLEEISRCNLGWVYEGRI
Click on input again, add the sequence below, and name it “d” (Fig 9).
QNEMLHGYCWSVWDRALSQSSPCRFQSYFKACEQMWIGWILKDRMNQRNKTMLQQGSI
RWLHWVWKIFAWNSEEERHPNQERTYQTAACTENICSPDIVRMVTPENCMNICGNSYWNS
LYHQRLSSHNMAVEFSDYVAIHDAMNQICHCITPYFKEGVMACLHEPTLTHQNMWFDNLP
RHTHWTQNYPMHRYFRYEYQLFAWTSQYRNASFPRNVNAKYIDHAWDHYQPLNHLD
You should see an image similar to Fig 10, with a faint diagonal band representing a region of homology.
Now make the picture clearer:
Move the lower slider of the histogram to 36% and the upper slider to 37%.
Click “compute”.
Put the mouse at the upper left corner of the long diagonal line and click.
You should now see an image like Fig 11.
What you have done is make all scores below the lower threshold (36%) appear black and all scores above the upper threshold (37%) appear white. Before, each pixel (the alignment between two amino acids) was colored on a grayscale to indicate the score.
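The effect of the two sliders can be sketched as a simple mapping from score to gray level. This is an illustration under assumptions, not Dotlet's actual implementation: the function name `to_gray`, the 0 to 255 gray range, and the linear ramp between the two thresholds are all hypothetical.

```python
def to_gray(score, low, high):
    """Map a score to a gray level: 0 (black) at or below `low`,
    255 (white) at or above `high`, linear in between.
    Assumes high > low."""
    if score <= low:
        return 0
    if score >= high:
        return 255
    return round(255 * (score - low) / (high - low))


# With thresholds 0 and 5, a score of 1 lands one fifth of the way up.
levels = [to_gray(s, 0, 5) for s in (-2, 1, 5, 9)]
```

Moving the two thresholds close together, as in the step above, removes almost all intermediate grays, so the plot becomes nearly pure black and white.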
The position indicated by the blue cross above is
C: 133
D: 59
Now, click on the lower right corner of the diagonal (Fig 7). The position indicated by the blue cross in Fig 7 is
C: 190
D: 116
This means that sequences C and D share homology approximately between positions 133 and 190 in C and positions 59 and 116 in D.
The sequences are actually identical in the regions:
C 136 - 187
D 62 - 113
Reading positions from Dotlet is somewhat subjective because the user places the blue cross by hand. The values will always differ a bit between users working with the same data.
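Because positions read from the cross are approximate, an exactly identical stretch can also be located programmatically. The sketch below finds the longest identical run shared by two sequences and reports 1-based positions. The function name and the toy sequences are assumptions for illustration; this is a plain longest-common-substring search, not something Dotlet provides.

```python
def longest_identical_region(a, b):
    """Return ((start_a, end_a), (start_b, end_b)), 1-based and inclusive,
    for the longest stretch that is identical in both sequences,
    or None if they share no residue."""
    best_len, end_a, end_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the identical run that ended at (i-1, j-1).
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, end_a, end_b = cur[j], i, j
        prev = cur
    if best_len == 0:
        return None
    return (end_a - best_len + 1, end_a), (end_b - best_len + 1, end_b)


# Toy sequences (not the exercise data): both contain "WVWKIF".
region = longest_identical_region("AAAWVWKIFGG", "TTWVWKIFCC")
```

Running the same function on the pasted sequences gives a reference against which positions read from the plot can be compared.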
Question 1:
Redo the analysis above using sequences A and B below:
A
LNRPHEVTNFWIGMINNEEEIPTQCKEYYAMVIRPKETLVRSLPWCQDKTVWWEVKNALDKAMDWLNCLEHMADEIYWHPTHHRQKWTKRYIEVNYNMCGRHYTIYDHSAMIFRTPPLRCPWAFICKVWQHNEWNTGCQMCEVPAFKLQQRLWLIVCQWWEWWWAYKWAVEVKPPFERRCHCMEKFREWTWQGPCMQQCKFRIIQWSWMYRNTWKGCYTRWQHMTILSLDKVNWKDHKFDGNKTMIDPGCNDRRWYLGFWVLKVGCYKLSNSAYINIHNNCQWTSYEIPNGNTLVTQV
B
LEWLEFAWKWVVFEEQIHKWCGFAKPEAHIFLGLTAFHDVRKTHECTEWFMYWYAEYHEPEVQMFHQKFHLYSLPRWEVPAFKLQQRLWLIVCQWWEWWWAYKWAVEVKPPFERRCHCMEKFREWTWQGPCMQQCKFRIIQWSWMYRNTWKGCYTRWQHMTILSYSTDFPEIEMDLHQQINTYEWWLYDMKVPMGNTFHLTCDFGVLINWIWEYHVSIVMYQRDTELDCLDPPRSDYHEFKNNMTEYWPDYPLDICCDDPNPRCWPDEEATNYSEIDLWIPKSYEHHVY
What are the positions in each sequence where the similarity between the two sequences starts and ends?
Question 2:
Use Dotlet to find the region of low complexity in Sequence_E. This means that you have to compare the sequence to itself (Fig 14). You also have to change the zoom to 1:2 in order to see the whole sequence. Fig 9 shows useful settings.
What are the positions in the sequence where the low complexity region starts and ends?
E
FSLFEFFHRQYWHHGHSVDCYTNGAYWKYKRSMPLGRDVGEFVIHQWSTENDVCDAVTGSRQGITCVDVMWCMDYPSTNWDDQYVDFCPCSMTHKQTTPRQNTCWEYFNTAKIPHLYKVKMAQMTRCRKDHSSSTPAQSWPNYGHKYVGKYEWCCMYRVLCSRFCCAIAAMQHPWWEIGGVMVQTMHPFHIDMYTLAWQVRRVGNLSGNGIDWTRVQGWAHHRYPWFVHKLFGMSRTIDWWMVTFGCVHRPSVDFRLHNWDECFINPRTQITSCVNAGQEVMGSTFSCLKMRSKHNVQEASWMCDVAMTAWFVHVHPQKPVMSADWVFRITMVFNRVSPMVIQQSSTLAMHCDHPLEFTMNNHSSRINFMRVMPCYCTQITQQITQQITQQITQQITQQITQQITQQITQQITQQITQQITQQITQRGNDAQIYPIHMTIDSACRMIPGVNKYKCYRPLKRVEKEYMEMRDSRGQKNFSVSQSRHDCGLNCRKRSHQCITQKDTCQLQQWTYEKWMNFFFYMFAKDWCYDVMNKNMQYVNSRSSIFNEIQGNNWCNCDREYAMMEISKIHHFTVNDPCEPAKNVTEPCALVLSRMQHRFMVETCPKKQDRFNAYANGFSWALCINYFSGSITHGESIMENRFLVHHFQILPTYWQWPYLDDKGGTHWTRDDHMWYRVYKTQCDRCWNPNRLAQQGPQTKVCWIYLYWWSGESRALKDRPRAHWSNDKGFLNMWLEVRCLHVGNQKDFEKMNSPQMITFLRAPCSGQQVGVPVDIGDLDYYGMGDLFLHNDERFSDMDLHDCRRAPGVEHKCFDQSNWNHLAGQYHNFQLWYEHGWNYMATDKWWIVIVC
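As a cross-check on what the self-comparison shows, low complexity can also be estimated by counting how many different residue types occur in a sliding window. This is a swapped-in, much cruder technique than the dotplot, and the function name, window length and cutoff below are assumptions chosen for illustration.

```python
def low_complexity_windows(seq, window=12, max_distinct=4):
    """Return merged 1-based (start, end) ranges covered by windows that
    contain at most `max_distinct` different residue types."""
    regions = []
    for i in range(len(seq) - window + 1):
        if len(set(seq[i:i + window])) <= max_distinct:
            start, end = i + 1, i + window
            if regions and start <= regions[-1][1] + 1:
                # Overlaps or touches the previous range: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy sequence (not Sequence_E): a GS-rich stretch between varied flanks.
toy = "MKVLHRDECWFY" + "GSGSGSGSGSGS" + "TNPQAIMKVLHR"
found = low_complexity_windows(toy, window=6, max_distinct=2)
```

The window length and cutoff need tuning for real data; values that are too permissive will flag ordinary sequence as well.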
Question 3:
Use Dotlet to find the positions of the three repetitive sequences in Sequence_F.
F
WFEGYVGQEIMGMVRFCTVVRMQREIDSISIIDYWEKECFHEWNSYEKAMTMEWKANWEEWYAYEYFCFWPMLFRQKLPKSIGKGCRAPCGDRPESMHWFKDMRRMAYWSGFEQEKRIKVCCGWYVPDPKALEEYFGQMLAYDYFCFWPMLFRQKLPKPQVWWIDRPREVPISECTDFCKFCGHATNRGVFYQDNGIVNKDGLFESYECTRVKPKCMQFKCTCQICMGEQYFCFWPMLFRQKLPKGCSWVLYYIENMMRGYGVINPARLEYKLTFRALRHCRMARMWSMRIQCRMISREIVNNKQSNMLFRKDVLQHQKMPYCC
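Repeated segments that are exactly identical can also be found by indexing every short word (k-mer) in the sequence and keeping those that occur several times. The sketch below is one simple way to do this; the function name, the word length `k`, and the toy sequence are assumptions for illustration.

```python
from collections import defaultdict


def repeated_kmers(seq, k, min_count=3):
    """Map each k-letter word that occurs at least `min_count` times
    to the list of its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}


# Toy sequence (not Sequence_F): "HEAG" occurs three times.
repeats = repeated_kmers("AAHEAGKKHEAGCCHEAGTT", k=4, min_count=3)
```

A long repeat shows up as a series of overlapping words whose start positions increase together, so the first and last word give its approximate boundaries.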
Finding tandem repeats:
Retrieve the protein with accession number Q9P255 from GenBank.
Use zoom 1:2 and move the window until you can clearly see a set of diagonals. The number of diagonals above or below the main diagonal tells us how many repeats there are.
Question 4:
How many repeats can you find?
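The link between diagonals and repeats can be checked on a toy example. In a self-comparison, a diagonal at offset k means the sequence matches itself when shifted by k positions. The sketch below lists the offsets that carry a run of at least `min_run` consecutive identical residues; the function name, the run length and the toy sequence are assumptions for illustration.

```python
def diagonal_offsets(seq, min_run):
    """Return the offsets k >= 1 at which seq[i] == seq[i + k] holds for
    at least `min_run` consecutive positions i (one diagonal per offset)."""
    offsets = []
    for k in range(1, len(seq) - min_run + 1):
        run = 0
        for i in range(len(seq) - k):
            run = run + 1 if seq[i] == seq[i + k] else 0
            if run >= min_run:
                offsets.append(k)
                break
    return offsets


# Toy sequence: four perfect tandem copies of "HEAGW" between short flanks.
toy = "MKV" + "HEAGW" * 4 + "RND"
offsets = diagonal_offsets(toy, min_run=5)
```

In this idealized case, four perfect copies give three diagonals on each side of the main diagonal, at offsets of one, two and three unit lengths. Real repeats are imperfect, so the diagonals in Dotlet will be fainter and may be broken.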
Retrieve the protein with accession number P21997 from GenBank.
Copy it into Dotlet as before.
Use zoom 1:2 and move the window until you can clearly see a black “box” in the middle of the sequence.
Question 5:
Which amino acid is the most common in the “box”?
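Once the boundaries of the box have been read from the plot, the composition of that region can be counted directly. The sketch below uses Python's standard `collections.Counter`; the function name, the toy sequence and the positions are assumptions for illustration.

```python
from collections import Counter


def most_common_residue(seq, start, end):
    """Return (residue, count) for the most frequent residue between
    1-based positions `start` and `end`, inclusive."""
    region = seq[start - 1:end]
    residue, count = Counter(region).most_common(1)[0]
    return residue, count


# Toy sequence (not P21997): positions 4 to 11 are rich in Q.
top = most_common_residue("MKVQQQPQQAQRND", 4, 11)
```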