TP06
Genbank is a major resource for biological sequence data and is available online through a web interface.
This guide describes how to download a sequence file, or a part of a sequence file, from Genbank, given either an accession number alone or an accession number together with start and stop positions.
The public search interface of Genbank is called “Entrez”. Search on Google for “Entrez”, then click on the first search result (Fig 1) to enter the Genbank search page.
Search Entrez for the accession number AJ937350 as depicted in Fig 2.
The search result page should look a bit like the page shown in Fig 3. The results are grouped by category (three under “Literature”, one under “Proteins” and one under “Genomes”). Click on the result under “Genomes” called “Nucleotide”.
You should see a page similar to the one shown in Fig 4.
The file shown in Fig 4 is the Genbank file describing the gene for a sugar transporter protein.
Click on the “send” button, then select “complete record” and “file” (Fig 4, Fig 5). Click “Create file” (Fig 5). There should now be a file on your computer called “sequence.gb”. Open this file with a text editor such as Notepad (Fig 6).
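The sequence itself sits at the end of the downloaded file, in the section that starts with the word ORIGIN and ends with a line containing "//". As a minimal sketch (not part of the original exercise), the following Python function pulls the sequence out of such a file. The function name `sequence_from_genbank` and the tiny example record are made up for illustration; the example is not the real AJ937350 entry.

```python
def sequence_from_genbank(text: str) -> str:
    """Return the sequence found in the ORIGIN section of a GenBank flat file."""
    letters = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if in_origin and line.startswith("//"):
            break  # "//" marks the end of the record
        if in_origin:
            # Sequence lines look like "       61 acgtacgtac ...":
            # keep the letters, drop the position numbers and spaces.
            letters.extend(ch for ch in line if ch.isalpha())
    return "".join(letters).upper()


# Made-up miniature record, for illustration only.
EXAMPLE = """LOCUS       EXAMPLE                   24 bp    DNA     linear
ORIGIN
        1 atggctagct agctaggatc ctaa
//
"""

print(sequence_from_genbank(EXAMPLE))  # ATGGCTAGCTAGCTAGGATCCTAA
```

For the file saved above, the same function could be applied to the file contents, for example `sequence_from_genbank(open("sequence.gb").read())`.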
Question 1:
The first five characters of the SEGUID of the sequence in Fig 6 are nRdsz. What are the last five? Replicate the steps described in Fig 3 to Fig 5. The accession number is AJ937350.
Downloading a part of a large Genbank sequence from Genbank
For some large Genbank files, the sequence is initially hidden due to its large size. The file with the accession number NC_001133 describes the Saccharomyces cerevisiae S288C chromosome I, which is the smallest of the sixteen chromosomes of this organism but is still over 200 000 bp long (Fig 7).
The gene FUN48 is located on chromosome I between positions 37464 and 38972. In order to download this sequence, enter the start and stop positions in the gray box on the right side of the screen (Fig 8) and then click the “Update View” button.
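A SEGUID is a checksum of the sequence: the SHA-1 digest of the upper-case sequence, encoded in base64 with the trailing "=" removed. The sketch below computes it with the Python standard library only. It assumes the sequence string contains letters only (no spaces or digits). It also assumes, as the default, the URL-safe base64 alphabet ("-" and "_" instead of "+" and "/"), because one of the checksums quoted in this guide contains an underscore; whether the course tool uses exactly this variant is an assumption, so the standard alphabet is available through the `urlsafe` argument.

```python
import base64
import hashlib


def seguid(sequence: str, urlsafe: bool = True) -> str:
    """Return a SEGUID-style checksum (27 characters) for a sequence.

    Assumption: `sequence` holds only sequence letters; case is ignored.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    encode = base64.urlsafe_b64encode if urlsafe else base64.b64encode
    # A 20-byte SHA-1 digest encodes to 28 characters ending in one "=".
    return encode(digest).decode("ascii").rstrip("=")


checksum = seguid("atggctagctagctaggatcctaa")
print(checksum[:5], checksum[-5:])  # first five and last five characters
```

Comparing the first five characters with the value given in the question is a quick way to check that the right sequence was downloaded before reporting the last five.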
Now you should see a screen like the one in Fig 9.
Scroll down to the end of the page and you will be able to see the sequence for the FUN48 gene (Fig 10).
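The same region can also be requested without the web page, through the NCBI E-utilities `efetch` service. The sketch below only builds the request URL and checks the expected length of the region; the helper names `efetch_url`, `region_length` and `download` are made up for this example. Positions are 1-based and include both ends, so the region has stop - start + 1 nucleotides.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, start=None, stop=None, rettype="gb"):
    """Build an efetch URL for a nucleotide record or a part of it."""
    params = {"db": "nuccore", "id": accession,
              "rettype": rettype, "retmode": "text"}
    if start is not None and stop is not None:
        params["seq_start"] = start  # 1-based, inclusive
        params["seq_stop"] = stop    # 1-based, inclusive
    return EFETCH + "?" + urlencode(params)


def region_length(start, stop):
    """Number of nucleotides between two inclusive 1-based positions."""
    return stop - start + 1


def download(url):
    """Fetch the text behind a URL (needs an internet connection)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


print(efetch_url("NC_001133", 37464, 38972))
print(region_length(37464, 38972))  # 1509
```

The length 1509 is a multiple of three, as expected for a protein-coding gene. Calling `download(...)` on the printed URL should return the record as text; `rettype="fasta"` asks for the plain sequence instead of the Genbank format.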
Question 2:
The first five characters of the SEGUID for the sequence described in Fig 10 are s6gYO
What are the last five characters?
The gene ACS1 is located on the same chromosome as the FUN48 gene, between positions 42881 and 45022, but on the strand complementary to the one stored in the database. Click on “show reverse complement” and then “Update View” (Fig 11) to show the gene in the correct orientation, where the first three nucleotides are the start codon.
The resulting sequence should be similar to the one in Fig 12.
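What “show reverse complement” does can be illustrated in a few lines of Python: every base is replaced by its partner (A with T, C with G) and the result is read backwards. This is a minimal sketch that handles only the four standard bases; any other character, such as N, is left unchanged. The short input sequence is made up for illustration.

```python
# Translation table mapping each base to its complement, in both cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


# A gene on the complementary strand appears "backwards" in the database:
# here the database strand ends with CAT, the reverse complement of ATG.
print(reverse_complement("TTACGCCAT"))  # ATGGCGTAA
```

After the operation the sequence starts with the start codon ATG, which is the check suggested for Fig 12.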
Question 3:
The first five characters of the SEGUID for the sequence in Fig 12 are Uc_MA
What are the last five characters of the SEGUID?
Question 4:
This is an individual question for each student. Follow this link that points to a Google Spreadsheet. You should find your name in the leftmost column. There are four columns called ACCESSION, start, stop and Watson/Crick. Download the sequence described by your row in the four columns and calculate the SEGUID checksum for the sequence. IMPORTANT! If your entry has “crick” instead of “watson”, it means that the correct sequence is the reverse complement of the sequence in the database. Use the settings in Fig 11 “Show reverse complement” to fix this. Put this SEGUID code in the column marked “SEGUID”. Please answer with only the SEGUID code as indicated for the first example student "Max Maximus". This will speed up correction. If your name is *not* in the list, please inform your instructor.
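For those who prefer to script this step, a spreadsheet row can be translated into request parameters for the NCBI E-utilities `efetch` service, which accepts a `strand` parameter (1 for the strand stored in the database, 2 for its reverse complement). This is only a sketch: the function name `efetch_params` is made up, the dictionary keys are assumed to match the column names above, and the example row is not taken from the spreadsheet but reuses the ACS1 coordinates given earlier in this guide.

```python
# "watson" is the strand as stored in the database, "crick" its reverse complement.
STRAND_CODES = {"watson": 1, "crick": 2}


def efetch_params(row):
    """Translate one spreadsheet row (a dict) into efetch query parameters."""
    label = row["Watson/Crick"].strip().lower()
    if label not in STRAND_CODES:
        raise ValueError(f"unknown strand label: {label!r}")
    return {
        "db": "nuccore",
        "id": row["ACCESSION"].strip(),
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": int(row["start"]),
        "seq_stop": int(row["stop"]),
        "strand": STRAND_CODES[label],
    }


# Illustrative row using the ACS1 example from this guide.
example_row = {"ACCESSION": "NC_001133", "start": "42881",
               "stop": "45022", "Watson/Crick": "crick"}
print(efetch_params(example_row))
```

The web interface settings shown in Fig 11 remain the reference procedure; the result of any scripted download can be checked against them.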
© Björn Johansson 2013 - 2024