Tuesday, April 6, 2010

Submitting to GenBank

Anyone who does much DNA sequencing or analysis eventually has to deposit sequences in GenBank. For a sequence or two, the online 'BankIt' submission system is relatively simple to use. For larger batches, however, it becomes necessary to take advantage of the batch functions available in the 'Sequin' program. Getting to the point of being able to submit large batches, each containing multiple 'features', involves quite a steep learning curve (especially if you are teaching yourself). Unfortunately, the web pages on the NCBI website do not seem quite sufficient to make submission a simple process. While submitting sequences for a number of recent papers (e.g., Hodkinson & Lutzoni 2009, Hodkinson & Lendemer 2010, Lendemer & Hodkinson 2009, 2010), I wrote myself a tutorial on how to submit RNA-encoding sequences (rRNA, introns, transcribed spacers, etc.) to GenBank. Most of this will apply to all sequence types, but getting the information for protein-coding sequences correct might still require some extra assistance. The 16-step outline below emphasizes the most difficult and problematic aspect (i.e., annotating multiple sequence features) and gives greater detail in that area.

GenBank Submission Using SEQUIN:
1) Make a FASTA+GAP file with bracketed modifiers for all basic info that varies between sequences (e.g., organism, isolate, specimen-voucher, etc.; for basic formatting see http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm#AlignmentFormats; for appropriate modifiers see http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm#DefinitionLine).
2) Run SEQUIN and type in the authorship, contact, and citation information that applies to all sequences.
3) Import the file into SEQUIN as a 'Phylogenetic Study' set in 'FASTA+GAP' format.
4) Click 'Edit' 'Alignment Assistant...'.
5) Click 'Features' 'Apply To Alignment >' 'RNA'.
6) For the feature that you wish to annotate, check the box indicating that the 5' or 3' end is partial, if either one is.
7) Type in the alignment coordinates of the particular feature that you are annotating.
8) In the 'RNA Type' box, pick the type of feature (e.g., 'misc_RNA' for ITS1 and ITS2, or 'rRNA' for 18S, 5.8S, or 28S).
9) In the field next to 'RNA Name', put in the specific type of RNA (18S ribosomal RNA, internal transcribed spacer 1, etc.).
10) Click 'Accept.'
11) Repeat steps 5-10 for each section of RNA in the sequence set.
12) In the 'Alignment Assistant' window, go to the 'File' menu and click 'Close'.
13) To check/edit your work: next to 'Target Sequence' choose 'ALL SEQUENCES' and next to 'Format' choose 'Graphic' (double-click on any particular feature annotation to see details and/or make changes; if a particular annotation is entirely erroneous, highlight the annotation and go to 'Edit' then 'Clear').
14) Click 'Done' on the main viewing window.
15) When it asks 'Are you ready to save the record?' click 'Yes'.
16) Save the file to the hard drive and email it to 'gb-sub@ncbi.nlm.nih.gov'.
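
For step 1, the FASTA+GAP file can be typed by hand, but for a large batch it may be easier to generate it from a table of sequences. Below is a minimal Python sketch of one way to do that, not an official NCBI tool. The function names (format_defline, format_records), the sequence IDs, the isolate and voucher values, and the short sequences are all made up for illustration; check the Sequin Quick Guide pages linked in step 1 for the modifiers that are actually appropriate for your data.

```python
def format_defline(seq_id, modifiers):
    """Build a FASTA definition line with bracketed [name=value] modifiers."""
    if not seq_id or any(ch.isspace() for ch in seq_id):
        raise ValueError("sequence ID must be non-empty and contain no spaces")
    parts = [">" + seq_id]
    for name, value in modifiers.items():
        value = str(value).strip()
        if "[" in value or "]" in value:
            raise ValueError("modifier values must not contain brackets")
        if value:  # skip modifiers that were left blank
            parts.append("[{}={}]".format(name, value))
    return " ".join(parts)


def format_records(records, width=60):
    """Return FASTA+GAP text for (seq_id, modifiers, aligned_sequence) tuples.

    Gaps are written as '-'. All sequences must have the same aligned length.
    """
    lengths = {len(seq) for _, _, seq in records}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all be the same length")
    lines = []
    for seq_id, modifiers, seq in records:
        lines.append(format_defline(seq_id, modifiers))
        # Wrap the sequence so that no line is longer than `width` characters.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example data; replace with your own alignment.
    example = [
        ("Seq1", {"organism": "Cladonia sp.", "isolate": "ISO1",
                  "specimen-voucher": "Herbarium 0001"}, "ACGT--ACGTAC"),
        ("Seq2", {"organism": "Cladonia sp.", "isolate": "ISO2",
                  "specimen-voucher": "Herbarium 0002"}, "ACGTTTACG-AC"),
    ]
    print(format_records(example), end="")
```

The length check is there because every sequence in an aligned set should have the same number of columns (gaps included), and a mismatch is easier to catch before import than after. Save the printed text to a file and import it as described in step 3.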

I have been told that this protocol is helpful in getting Sequin to 'work'.  I hope that posting it here will help even more people!

- Brendan


[If you found this post to be a useful guide, and employed the information found here as part of the GenBank submission process, please cite this work as follows:

Hodkinson, B. P. 2010. Submitting to GenBank. Squamules Unlimited, Durham, NC. http://squamules.blogspot.com/2010/04/submitting-to-genbank.html

Many thanks!]



Works Cited:

Hodkinson, B. P., and F. Lutzoni. 2009. A microbiotic survey of lichen-associated bacteria reveals a new lineage from the Rhizobiales. Symbiosis 49: 163-180.

Hodkinson, B. P., and J. C. Lendemer. In press. Molecular analyses reveal semi-cryptic species in Xanthoparmelia tasmanica. Bibliotheca Lichenologica.

Lendemer, J. C., and B. P. Hodkinson. 2009. The Wisdom of Fools: new molecular and morphological insights into the North American apodetiate species of Cladonia. Opuscula Philolichenum 7: 79-100.

Lendemer, J. C., and B. P. Hodkinson. 2010. A new perspective on Punctelia subrudecta in North America: previously-rejected morphological characters corroborate molecular phylogenetic evidence and provide insight into an old problem. The Lichenologist 42(4): 405-421.
