DNA barcoding is an important technique for identifying many kinds of animals, insects, and plants. In this technique, PCR is used to amplify a short (650 base) region of the MT-COI gene from mitochondrial DNA. The DNA sequence is then determined from the PCR product. If this sequence has been found before, it can be used to identify the type of organism that contributed the DNA. If a barcode sequence has not been found before, one can still identify related species by comparing the new sequence to other sequences in a database such as GenBank or BOLD.
The bioinformatics steps in DNA barcoding involve identifying high quality regions in the trace files generated from DNA sequencing instruments, extracting the DNA sequences from those files, assembling the sequences, and identifying the most likely source of the DNA by comparing the assembled DNA sequence to a database of DNA sequences.
The trace files supplied here were obtained from the NCBI. These data come from actual sequencing experiments that were submitted to this public database.
Using DNA barcoding to identify unknown samples
Before you begin, download FinchTV and install it on your computer. FinchTV is a free program that will show base calls and quality values (if available).
A. Get the data
1. Download the trace files for an unknown organism by clicking one of the sample links in the data set.
2. Unzip the archive to obtain the individual files.
B. Review the data quality
1. Open the file in FinchTV.
2. Select the high quality region with your mouse. High quality bases have well-resolved peaks, with quality values that fall above the dotted line on the histogram plot (these values are 20 or higher).
3. Export the FASTA-formatted DNA sequence to a text file as shown below.
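The manual selection in step 2 can also be approximated in a script. A quality value Q corresponds to an estimated error probability of 10^(-Q/10), so a value of 20 means roughly one miscalled base in 100. The Python sketch below assumes the base calls and quality values have already been read out of a trace file into a string and a list. The function names and the example read are made up for illustration and are not part of FinchTV. The sketch finds the longest run of bases at or above the cutoff and formats it as FASTA.

```python
def phred_to_error_probability(q):
    """Convert a quality value to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def longest_high_quality_region(qualities, cutoff=20):
    """Return (start, end) of the longest run with every quality >= cutoff.

    The end index is exclusive; (0, 0) means no base passed the cutoff.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(qualities):
        if q >= cutoff:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None  # a low-quality base ends the current run
    return best_start, best_end


def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA text: a '>' header, then wrapped lines."""
    lines = [">" + name]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Made-up example read: one quality value per base call.
bases = "NNACGTACGTTAGCNN"
quals = [5, 8, 30, 35, 40, 42, 41, 39, 38, 36, 12, 33, 34, 35, 9, 4]

start, end = longest_high_quality_region(quals)
trimmed = bases[start:end]
print(to_fasta("sample_read_trimmed", trimmed))
```

This is stricter than selecting by eye, because a single low-quality base splits the region. A person reviewing the trace may reasonably keep an isolated weak base that sits inside otherwise clean peaks.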
C. Assemble the FASTA sequences into a contig
1. When you have obtained all of your sequences in FASTA format, go to the CAP3 assembly site.
2. Open each sequence file in a text editor.
3. Copy and paste each sequence into the box on the assembly page. Be sure to include the line that begins with “>”.
4. Click Submit.
5. When the assembly is complete, select the Contigs link.
6. Copy the contig sequence and paste it into a text file.
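If you have many reads, the copy-and-paste in steps 2 and 3 can be prepared with a short script. The Python sketch below checks that every record has its “>” header line and joins the records into one block of text ready to paste. The function names and the two example reads are hypothetical, chosen only for this illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            # Sequence letters with no header: the '>' line was left out.
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_for_submission(fasta_texts):
    """Join several FASTA texts into one, one sequence line per record."""
    parts = []
    for text in fasta_texts:
        for header, seq in parse_fasta(text):
            parts.append(">" + header + "\n" + seq + "\n")
    return "".join(parts)


# Made-up reads standing in for the exported text files.
read_1 = ">read_1\nACGTAC\nGTTAGC\n"
read_2 = ">read_2\nTTAGCAGGT\n"

print(combine_for_submission([read_1, read_2]))
```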
A “contig” is a sequence that is produced by putting shorter sequences together. Each letter in the sequence represents a base in a strand of DNA. Assembling sequences into a contig is another way to check the quality of the DNA sequences and see if all the sequences are alike. If there are differences between sequences you may wish to review the trace file and determine if the base was misidentified.
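To make the idea of a contig concrete, here is a minimal Python sketch that joins two reads by their longest exact overlap. It is an illustration under simplifying assumptions, not a description of how CAP3 works: the reads and the function names are invented for this example, and a real assembler also tolerates mismatches, uses quality values, and handles many reads at once. Because a sample may be sequenced from both strands, the sketch reverse-complements the second read before merging.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_length` bases exists.
    """
    max_len = min(len(left), len(right))
    for n in range(max_len, min_length - 1, -1):
        if left[len(left) - n:] == right[:n]:
            return n
    return 0


def merge_reads(left, right, min_length=3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    n = longest_overlap(left, right, min_length)
    if n == 0:
        return None
    return left + right[n:]


# Made-up reads: the second one comes from the opposite strand.
forward_read = "ATGGCATTCGGA"
reverse_read = "TTAACGTCCGAA"

right_read = reverse_complement(reverse_read)
contig = merge_reads(forward_read, right_read)
print(contig)
```

Requiring an exact match in the overlap is the main simplification here. When two real reads disagree at a position, that disagreement is the cue to go back to the trace files, as described above.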
D. Identify the most likely source of the DNA
Now, you’ll be a DNA detective. You’ll use a program called BLAST to compare your contig sequence to a database of sequences (the nucleotide collection, nr/nt) at the NCBI. You may be able to identify your organism, and determine which types of organisms are most closely related to yours, by looking for the database sequence that matches best. In the case of DNA barcoding, look for the DNA sequence (or sequences) that matches most of your contig and has the highest percentage of identical bases.
1. Go to the blastn page at the NCBI.
2. Paste your contig sequence into the text box on the BLAST page.
3. Choose the nucleotide collection database as shown below.
4. Click BLAST.
5. Use the BLAST results to identify the potential source of the DNA in your sample.
Multiple sequences will probably be similar to your entry. The organisms that are most closely related to yours are the ones whose DNA sequences are most similar to your contig. The best match will be the sequence that matches over the longest region of DNA, with the greatest number of identical bases, and the lowest E value.
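The ranking rule just described can be written as a sort. The Python sketch below assumes each hit has been reduced to four values: a description, the number of query bases covered by the alignment, the percent identity, and the E value. The hit records, the query length, and the function names are all invented for illustration; they are not real BLAST output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    description: str         # name shown for the database sequence
    aligned_length: int      # number of query bases covered by the alignment
    percent_identity: float  # percentage of identical bases in the alignment
    e_value: float           # expected number of chance matches this good


def query_coverage(hit, query_length):
    """Percentage of the query (your contig) covered by the alignment."""
    return 100.0 * hit.aligned_length / query_length


def rank_hits(hits, query_length):
    """Sort hits: longest coverage first, then highest identity, then lowest E value."""
    return sorted(
        hits,
        key=lambda h: (
            -query_coverage(h, query_length),
            -h.percent_identity,
            h.e_value,
        ),
    )


# Made-up hits for a 650-base contig.
QUERY_LENGTH = 650
hits = [
    Hit("hypothetical species A", 400, 99.5, 1e-150),
    Hit("hypothetical species B", 648, 99.8, 0.0),
    Hit("hypothetical species C", 648, 92.1, 1e-180),
]

for h in rank_hits(hits, QUERY_LENGTH):
    print(
        f"{h.description}: {query_coverage(h, QUERY_LENGTH):.1f}% coverage, "
        f"{h.percent_identity}% identity, E = {h.e_value:g}"
    )
```

Sorting by coverage first is a design choice made for this sketch. It keeps a short but near-perfect match, such as species A here, from outranking a match that spans almost the whole contig.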
E. Learn more about your organism
1. Select the accession number for the best matching sequence to see where the sequence came from and to get the scientific name of the organism. Some entries will have the entire scientific name (genus and species). Some entries might only have the name of a taxonomic group (e.g., the order, family, or genus).
2. Use the scientific name to search the Encyclopedia of Life.
3. Use the scientific name to search Google.
4. What can you learn about this creature and where it lives?
Funding for this project was provided by the National Science Foundation, through grant DRL-0833779, as part of a collaboration between Digital World Biology, the Northwest Association for Biomedical Research, and the EdLab Group.