Thursday, September 6, 2007 - 05:16
One of my readers asked: Why does genome sequencing cost so much?

My short answer is: because it's big. But I thought it would be fun to give a better answer to this question, especially since I'm sure many of you are wondering the same thing.

Okay, so let's do some math. Don't worry, this math isn't very complicated and I'll explain where most of the numbers come from.

Estimating costs from salaries

First, we'll take the easy route. My experience with grant budgets has taught me that the greatest cost for any project comes from salaries. If we look at the PLoS paper with Craig Venter's genome sequence, we can see that there are 31 authors. That's a lot of people! And they probably all got paid.

I think it probably took at least a year to do the sequencing and analyze the data. So, let's say that we paid all 31 of those people for one year. If we say that their average salary was $50,000 per year (of course, JCV and Robert Strausberg probably made much more and any graduate students made much less, but on average I think this is close), and that their benefits were 25% of their salary, the cost in human labor would be:

31 x [$50,000 + (0.25 x $50,000)] = $1,937,500.

Aren't you forgetting overhead?

Oh, yeah. Well, I tried, but I'm never able to get away with that. In grants, universities and non-profit institutions charge additional costs for overhead. Some universities and non-profit research institutions charge as much as 90% of salaries; some places charge less. If we're conservative and say that overhead costs were 50% of salaries, this would be another $775,000 [this comes from 0.5 x $50,000 x 31].

Now, the cost for doing Craig's genome is up to $2,712,500 and we haven't even bought supplies. We would still need to factor in the cost of the facilities, sequencing instruments, software, computers, reagents, laboratory instruments, autoclaves, robots, gel boxes, and consumables like plastic pipette tips, microfuge tubes, and 96-well plates.
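If you'd like to play with the salary numbers yourself, here is a minimal Python sketch of the arithmetic above. The constants are just my guesses from this post (31 people, $50,000 average salary, 25% benefits, 50% overhead), and the function names are made up for illustration, so treat it as a back-of-the-envelope calculator rather than a real budget.

```python
# Back-of-the-envelope salary estimate, using the guesses from this post.
HEADCOUNT = 31          # authors on the PLoS paper
AVG_SALARY = 50_000     # assumed average salary, dollars per year
BENEFITS_RATE = 0.25    # assumed benefits, as a fraction of salary
OVERHEAD_RATE = 0.50    # assumed overhead, as a fraction of salary


def labor_cost(headcount, salary, benefits_rate):
    """Salary plus benefits for everyone, for one year."""
    return headcount * (salary + benefits_rate * salary)


def overhead_cost(headcount, salary, overhead_rate):
    """Institutional overhead, charged as a fraction of salaries only."""
    return overhead_rate * salary * headcount


labor = labor_cost(HEADCOUNT, AVG_SALARY, BENEFITS_RATE)
overhead = overhead_cost(HEADCOUNT, AVG_SALARY, OVERHEAD_RATE)
total = labor + overhead

print(f"Labor:    ${labor:,.0f}")
print(f"Overhead: ${overhead:,.0f}")
print(f"Total:    ${total:,.0f}")
```

Changing OVERHEAD_RATE to 0.90 shows what happens at the institutions that charge the most.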
[Washington University made a great movie that shows the inner workings of a DNA sequencing operation and all the stuff that they use.]

But do you really think all those people worked full-time on the project for a year? Why would it take so many?

No. I think many of the 31 authors were probably working on other projects in addition to doing the genome sequencing, putting the sequence together, analyzing the sequence, and writing the paper.

Estimating the costs from reads

So, let's try calculating costs another way. Lots of scientists outsource their DNA sequencing to core facilities. Core labs come up with pricing models that reflect their costs for personnel, reagents, equipment maintenance, robots, etc.

What do the core labs charge for DNA sequencing? I looked at the web pages for a few university core labs to find out. The University of Michigan DNA Sequencing Core seems pretty typical. They charge $4 per sample, and presumably each sample would be good enough to produce a chromatogram and give us a read. [A read, by the way, is a sequence of bases that has been derived from a chromatogram.] This price is based on the current sequencing technologies, which were the methods used for JCV's genome. I have no idea what it costs for next-generation sequencing methods.

Alright, so at $4 a read, what's the total cost? First, we need to know how many reads it took to sequence JCV's genome. I was all set to estimate the number of reads, based on the Lander-Waterman tables, when I realized that Amit had posted a very handy link to the Venter Institute's info on JCV's genome. From there, I found a PDF fact sheet that listed the number of reads that were generated as part of this project.

The fact sheet states that they used 32 million reads. It would be really, really unusual if all their reads were usable. I would estimate that at least 10% probably weren't. But we'll use the 32 million value for now.
So, now we have:

32 million reads x $4 per read = $128 million.

And that's just the sequencing. That wouldn't include the cost of assembling the sequence, computers, software, or analysis.

If it really only took $2 million to sequence JCV's genome, as Chris wrote, I'd say this sequence was quite a bargain. And now I wonder how they got it so cheap.
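Here is the read-based estimate as a small Python sketch, in case you want to try other prices. The 32 million reads and the $4 per read come from the fact sheet and the core lab price list mentioned above. The variable and function names are mine, and the "implied price" line simply asks what each read would have had to cost if the $2 million figure covered nothing but sequencing, which is an assumption on my part.

```python
# Read-based estimate, using the numbers quoted in this post.
READS = 32_000_000             # reads listed on the JCV fact sheet
PRICE_PER_READ = 4             # dollars per sample at a typical core lab
REPORTED_COST = 2_000_000      # the $2 million figure Chris wrote about


def sequencing_cost(reads, price_per_read):
    """Total cost if every read is bought at the core lab price."""
    return reads * price_per_read


core_lab_total = sequencing_cost(READS, PRICE_PER_READ)

# Assumption: the reported cost paid for sequencing reads and nothing else.
implied_price_per_read = REPORTED_COST / READS
price_gap = core_lab_total / REPORTED_COST

print(f"Core lab estimate:      ${core_lab_total:,}")
print(f"Implied price per read: ${implied_price_per_read:.4f}")
print(f"Core lab estimate is {price_gap:.0f}x the reported cost")
```

Under that assumption, each read would have cost a little over six cents, which gives a sense of how much cheaper a dedicated sequencing center would have to be than a typical core lab.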