|Date||Cost per Mb||Cost per Genome|
Based on data collected by NHGRI from the Institute’s funded genome-sequencing groups, the cost to generate a high-quality ‘draft’ human genome sequence had dropped to ~$14 million by 2006. Hypothetically, it would likely have cost upwards of $20-25 million to generate a ‘finished’ human genome sequence – expensive, but still considerably less than the cost of generating the first reference human genome sequence.
A primer about genome sequencing
A genome consists of all of the DNA contained in a cell’s nucleus. DNA is composed of four chemical building blocks or “bases” (for simplicity, abbreviated G, A, T, and C), with the biological information encoded within DNA determined by the order of those bases. Diploid organisms, like humans and all other mammals, contain duplicate copies of almost all of their DNA (i.e., pairs of chromosomes; with one chromosome of each pair inherited from each parent). The size of an organism’s genome is generally considered to be the total number of bases in one representative copy of its nuclear DNA. In the case of diploid organisms (like humans), that corresponds to the sum of the sizes of one copy of each chromosome pair.
Organisms generally differ in their genome sizes. For example, the genome of E. coli (a bacterium that lives in your gut) is ~5 million bases (or ~5 megabases, where one megabase is one million bases), that of a fruit fly is ~123 million bases, and that of a human is ~3,000 million bases (or ~3 billion bases). There are also some surprising extremes, such as with the loblolly pine tree – its genome is ~23 billion bases in size, over seven times larger than ours. Obviously, the cost to sequence a genome depends on its size. The discussion below is focused on the human genome; keep in mind that a single ‘representative’ copy of the human genome is ~3 billion bases in size, whereas a given person’s actual (diploid) genome is ~6 billion bases in size.
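To make these size comparisons concrete, the following minimal Python sketch converts the approximate genome sizes quoted above into megabases and expresses each relative to the human genome. The dictionary and function names are illustrative assumptions, and the diploid figure simply doubles the haploid size, which is an approximation because diploid cells duplicate almost all, not all, of their DNA.

```python
# Approximate genome sizes in bases, taken from the figures quoted in the text.
GENOME_SIZES = {
    "E. coli": 5_000_000,
    "fruit fly": 123_000_000,
    "human": 3_000_000_000,
    "loblolly pine": 23_000_000_000,
}

BASES_PER_MEGABASE = 1_000_000


def to_megabases(bases):
    """Convert a base count to megabases (1 Mb = 1 million bases)."""
    return bases / BASES_PER_MEGABASE


def relative_to_human(name):
    """Return a genome's size as a multiple of the human genome size."""
    return GENOME_SIZES[name] / GENOME_SIZES["human"]


def diploid_size(haploid_bases):
    """Approximate a diploid genome as two copies of the representative genome."""
    return 2 * haploid_bases


for name, size in GENOME_SIZES.items():
    print(f"{name}: {to_megabases(size):,.0f} Mb, "
          f"{relative_to_human(name):.3f}x the human genome")

print(f"human diploid genome: ~{diploid_size(GENOME_SIZES['human']):,} bases")
```

Under these rounded inputs, the loblolly pine genome comes out at a little under eight times the human genome, consistent with the "over seven times larger" comparison.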
Genomes are large and, at least with today’s methods, their bases cannot be ‘read out’ in order (i.e., sequenced) end-to-end in a single step. Rather, to sequence a genome, its DNA must first be broken down into smaller pieces, with each resulting piece then subjected to chemical reactions that allow the identity and order of its bases to be deduced. The established base order derived from each piece of DNA is often called a ‘sequence read,’ and the resulting set of sequence reads (often numbering in the billions) is then computationally assembled back together to deduce the sequence of the starting genome. Sequencing human genomes is nowadays aided by the availability of ‘reference’ sequences of the human genome, which play an important role in the computational assembly process. Historically, the process of breaking down genomes, sequencing the individual pieces of DNA, and then reassembling the individual sequence reads to generate a sequence of the starting genome was called ‘shotgun sequencing’ (although this terminology is used less frequently today). When an entire genome is being sequenced, the process is called ‘whole-genome sequencing.’ See Figure 2 for a comparison of human genome sequencing methods during the time of the Human Genome Project and circa 2016.
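The break-apart-and-reassemble idea can be illustrated with a toy Python sketch. It is not how production sequencing pipelines work: the 24-base "genome", the read length, and the function names are all made up for illustration; the reads are error-free and come from one strand only; and the assembly step is a simple greedy overlap merge without a reference sequence, whereas real assemblers must also handle sequencing errors, repeated regions, and billions of reads.

```python
def fragment(genome, read_len, step):
    """Cut overlapping reads of length read_len from genome (toy 'shotgun' step)."""
    starts = list(range(0, len(genome) - read_len + 1, step))
    last_start = len(genome) - read_len
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the genome is covered
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical toy genome; real genomes are millions to billions of bases long.
GENOME = "ATGCGTACGTTAGCCTAGGATCCA"
reads = fragment(GENOME, read_len=10, step=5)
# Reverse the reads so the assembler cannot rely on their original order.
contigs = greedy_assemble(reads[::-1])
print(reads)
print(contigs)
```

The sketch succeeds here only because the toy sequence has no long repeats; repeated regions are one reason genome assembly is computationally difficult in practice.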
Note that such cost-accounting does not typically include activities such as quality assurance/quality control (QA/QC), alignment of generated sequence to a reference human genome, sequence assembly, genomic variant calling, or annotation.
Based on the data collected from NHGRI-funded genome-sequencing groups, the cost to generate a high-quality ‘draft’ whole human genome sequence in mid-2015 was just above $4,000; by late in 2015, that figure had fallen below $1,500.
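As a rough illustration of the scale of this decline, the short Python sketch below computes fold reductions from the ~$14 million draft-genome figure for 2006 to the 2015 figures. Because the 2015 costs are described only as "just above $4,000" and "below $1,500", the rounded values used here are assumptions and the results are approximate; the names are illustrative.

```python
# Approximate cost (USD) of a high-quality draft human genome sequence,
# using rounded values from the figures quoted in the text.
COSTS_USD = [
    ("2006", 14_000_000),
    ("mid-2015", 4_000),    # text says "just above $4,000"
    ("late 2015", 1_500),   # text says "below $1,500"
]


def fold_reduction(earlier_cost, later_cost):
    """How many times cheaper the later cost is than the earlier one."""
    return earlier_cost / later_cost


baseline_label, baseline_cost = COSTS_USD[0]
for label, cost in COSTS_USD[1:]:
    fold = fold_reduction(baseline_cost, cost)
    print(f"{baseline_label} -> {label}: roughly {fold:,.0f}-fold cheaper")
```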