Friday, October 26, 2007 - 07:20
We have lots of DNA samples from bacteria that were isolated from dirt. Now it's time to do our own metagenomics project and figure out what they are. Our class project is on a much smaller scale than the honeybee metagenomics project that I wrote about yesterday, but we're using many of the same principles. The general process is this:
- 1. We sort the chromatogram data to identify good data and separate it from bad data. Informatics can help you determine if data is good, and measure how good it is, but it cannot turn bad data into good data. And, there's no point in wasting time with crappy data.
- 2. We use FinchTV to take a closer look at the data and determine whether our sequences come from a pure culture.
- 3. We use BLAST to find the best matches in GenBank.
- 4. We evaluate the results and decide on the genus for our sample.
- 5. We edit the read in FinchTV and blast again, if need be, to see if we can improve the match.
- 6. We enter the genus of our sample and the biome where it was found (see the overview) in the FinchTV comment field, then save our results back to the iFinch database.
- 7. We will use SQL to query the iFinch database and determine which bacteria were found, compare the bacteria in the different biomes, and compare the bacteria from different years.
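Step 1 is where a little arithmetic helps. Base-calling software gives each base in a chromatogram a Phred quality value, Q = -10 log10(P), where P is the chance that the base call is wrong. So Q20 means a 1 in 100 chance of error and Q30 means 1 in 1,000. One common way to score a read is to count its Q20 bases. Here's a minimal Python sketch, assuming we already have the per-base quality values as a list of integers. The function names and the 400-base cutoff are made up for illustration; they are not how FinchTV or iFinch actually sort reads.

```python
def phred_to_error(q: int) -> float:
    """Probability that a base call with Phred quality q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def count_q20(qualities: list[int]) -> int:
    """Number of bases with quality >= 20, i.e. an error rate of 1 in 100 or better."""
    return sum(1 for q in qualities if q >= 20)


def expected_errors(qualities: list[int]) -> float:
    """Expected number of miscalled bases in the read (sum of per-base error probabilities)."""
    return sum(phred_to_error(q) for q in qualities)


def classify_read(qualities: list[int], min_q20: int = 400) -> str:
    """Label a read 'good' or 'bad'.

    The 400-base cutoff is a hypothetical classroom threshold,
    not a FinchTV or iFinch setting.
    """
    return "good" if count_q20(qualities) >= min_q20 else "bad"


if __name__ == "__main__":
    # Placeholder quality values standing in for one short chromatogram.
    read = [12, 18, 25, 32, 40, 40, 38, 22, 15, 9]
    print(count_q20(read))                   # 6
    print(classify_read(read, min_q20=5))    # good
```

Counting Q20 bases rewards long stretches of clean sequence, while the expected-error sum tells you roughly how many mistakes a read carries. Neither one can rescue a bad read; they just help you decide which ones to skip.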
I'll write more about each of the steps as we go along.
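For step 7, here's a rough idea of the kinds of queries we'll run. This Python sketch uses the built-in sqlite3 module and a hypothetical `reads` table with `year`, `biome`, and `genus` columns. The real iFinch schema is different, so the table and column names are assumptions, and the rows in the demo database are placeholders, not our results.

```python
import sqlite3

# Hypothetical table layout; the real iFinch schema will differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reads (
    read_id TEXT PRIMARY KEY,
    year    INTEGER,
    biome   TEXT,
    genus   TEXT
)
"""

# Count reads per genus, grouped either by biome or by year,
# listing the most common genus first within each group.
QUERIES = {
    "biome": """SELECT biome, genus, COUNT(*) AS n FROM reads
                GROUP BY biome, genus ORDER BY biome, n DESC, genus""",
    "year": """SELECT year, genus, COUNT(*) AS n FROM reads
               GROUP BY year, genus ORDER BY year, n DESC, genus""",
}


def genus_counts(conn: sqlite3.Connection, by: str = "biome") -> list[tuple]:
    """Return (group, genus, count) rows; `by` must be 'biome' or 'year'."""
    return conn.execute(QUERIES[by]).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database filled with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO reads VALUES (?, ?, ?, ?)",
        [
            ("r1", 2005, "biome_A", "GenusX"),
            ("r2", 2006, "biome_A", "GenusX"),
            ("r3", 2007, "biome_A", "GenusY"),
            ("r4", 2007, "biome_B", "GenusY"),
        ],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    for row in genus_counts(db, "biome"):
        print(row)
    for row in genus_counts(db, "year"):
        print(row)
```

Grouping by `biome` answers "which bacteria turned up where," and swapping in `year` compares one class's samples with another's.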
You're all welcome to do a few samples yourself and help us out, especially since I have three years' worth of data. If you're a teacher and you want a data set to use with your class, you're welcome to log in and download the data from iFinch.
Write to me at digitalbio at gmail dot com and I'll send you information for logging in to iFinch and playing along. Of course, you could always get a trial account anyway, but this way you get to play with our data and be part of our project.