Genome Sequencing

Whole genome sequencing (also known as full genome sequencing, complete genome sequencing, or entire genome sequencing) is a laboratory process that determines the complete DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast. Almost any biological sample containing a full copy of the DNA—even a very small amount of DNA or ancient DNA—can provide the genetic material necessary for full genome sequencing. Such samples may include saliva, epithelial cells, bone marrow, hair (as long as the hair contains a hair follicle), seeds, plant leaves, or anything else that has DNA-containing cells. Because the sequence data that is produced can be quite large (for example, there are approximately six billion base pairs in each human diploid genome), genomic data is stored electronically and requires a large amount of computing power and storage capacity. Full genome sequencing would have been nearly impossible before the advent of the microprocessor, computers, and the Information Age.

Unlike full genome sequencing, DNA profiling only determines the likelihood that genetic material came from a particular individual or group; it does not contain additional information on genetic relationships, origin or susceptibility to specific diseases.[1] Also unlike full genome sequencing, SNP genotyping covers less than 0.1% of the genome. Almost all truly complete genomes are of microbes; the term "full genome" is thus sometimes used loosely to mean "greater than 95%". The remainder of this article focuses on nearly complete human genomes.

In general, knowing the complete DNA sequence of an individual's genome does not, on its own, provide useful clinical information, but this may change over time as a large number of scientific studies continue to be published detailing clear associations between specific genetic variants and disease.[2][3]

The first nearly complete human genomes sequenced were J. Craig Venter's (Caucasian at 7.5-fold average coverage),[4][5][6] James Watson's (Caucasian male at 7.4-fold),[7][8][9] a Han Chinese (YH at 36-fold),[10] a Yoruban from Nigeria (at 30-fold),[11] a female leukemia patient (at 33 and 14-fold coverage for tumor and normal tissues),[12] and Seong-Jin Kim (Korean at 29-fold).[13] As of June 2012, there are 69 nearly complete human genomes publicly available.[14] Steve Jobs also had his genome sequenced for $100,000.[15] Commercialization of full genome sequencing is in an early stage and growing rapidly.

Early techniques

Sequencing of nearly an entire human genome was first accomplished in 2000 partly through the use of shotgun sequencing technology. While full genome shotgun sequencing for small (4000–7000 base pair) genomes was already in use in 1979,[16] broader application benefited from pairwise end sequencing, known colloquially as double-barrel shotgun sequencing. As sequencing projects began to take on longer and more complicated genomes, multiple groups began to realize that useful information could be obtained by sequencing both ends of a fragment of DNA. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences were oriented in opposite directions and were about the length of a fragment apart from each other was valuable in reconstructing the sequence of the original target fragment.
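The value of reading both ends of one fragment can be illustrated with a minimal Python sketch. The helper names `reverse_complement` and `paired_end_reads`, the toy 20-base fragment, and the read length of 5 are hypothetical choices made for illustration only; real sequencing and assembly pipelines are far more involved.

```python
# Complement of each DNA base, used to build the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_end_reads(fragment, read_length):
    """Read both ends of one fragment (hypothetical helper).

    The forward read is taken from the start of the fragment. The reverse
    read is taken from the start of the opposite strand, so the two reads
    point toward each other.
    """
    forward = fragment[:read_length]
    reverse = reverse_complement(fragment)[:read_length]
    return forward, reverse


fragment = "ATGCGTACGTTAGCCGATAA"  # toy 20-base fragment (assumption)
read_length = 5                    # toy read length (assumption)
forward, reverse = paired_end_reads(fragment, read_length)

# Because the fragment length is roughly known, so is the size of the
# unsequenced gap separating the two reads.
gap = len(fragment) - 2 * read_length
print(forward, reverse, gap)
```

An assembler that finds the forward read in one place and the reverse read in another can use the opposite orientation and the known gap to decide how the two regions are positioned relative to each other.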

The first published description of the use of paired ends was in 1990 as part of the sequencing of the human HPRT locus,[17] although the use of paired ends was limited to closing gaps after the application of a traditional shotgun sequencing approach. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was in 1991.[18] In 1995 Roach et al. introduced the innovation of using fragments of varying sizes,[19] and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets. The strategy was subsequently adopted by The Institute for Genomic Research (TIGR) to sequence the entire genome of the bacterium Haemophilus influenzae in 1995,[20] and then by Celera Genomics to sequence the entire fruit fly genome in 2000,[21] and subsequently the entire human genome. Applied Biosystems, now called Life Technologies, manufactured the automated capillary sequencers utilized by both Celera Genomics and The Human Genome Project.

While capillary sequencing was the first approach to successfully sequence a nearly full human genome, it is still too expensive and takes too long for commercial purposes. Because of this, shotgun sequencing technology, although still relatively new, has since 2005 been increasingly displaced by technologies such as pyrosequencing, SMRT sequencing, and nanopore technology.[22]

Current research

One possible way to achieve the cost-effective high-throughput sequencing necessary for full genome sequencing is nanopore technology, which is patented by Harvard University and Oxford Nanopore Technologies and licensed to biotechnology companies.[23] To facilitate their full genome sequencing initiatives, Illumina licensed nanopore sequencing technology from Oxford Nanopore Technologies and Sequenom licensed the technology from Harvard University.[24][25]

Another possible way to accomplish cost-effective high-throughput sequencing is by utilizing fluorophore technology. Pacific Biosciences is currently using this approach in their SMRT (single molecule real time) DNA sequencing technology.[26]

Complete Genomics has developed DNA Nanoball (DNB) technology that arranges DNA on self-assembling arrays. Its sequencing technology combines these DNB arrays with its proprietary cPAL read technology.

Pyrosequencing is a method of DNA sequencing based on the sequencing by synthesis principle.[28] The technique was developed by Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm in 1996,[29][30][31] and is currently being used by 454 Life Sciences as a basis for a full genome sequencing platform.[32]


A number of public and private companies are competing to develop a full genome sequencing platform that is commercially robust for both research and clinical use,[33] including Illumina,[34] Knome,[35] Sequenom,[36] 454 Life Sciences,[37] Pacific Biosciences,[38] Complete Genomics,[39] Helicos Biosciences,[40] GE Global Research (General Electric), Affymetrix, AQI Sciences, Base4, Callida Genomics, CrackerBio, Dover Systems, Electron Optica, Electronic BioSciences, Full Genomes, Genia Technologies, Genizon Biosciences, GenoVoxx, GnuBio, Halcyon Molecular, IBM, LaserGen, Li-Cor, LingVitae, Life Technologies, LightSpeed Genomics, Mobious Genomics, NABsys, Nanophotonics Biosciences, NetBio, Noblegen Biosciences, Oxford Nanopore Technologies, Population Genetics, Reveo, Seirad, PathoGenetix, VisiGen Biotechnologies, and ZS Genetics. These companies are heavily financed and backed by venture capitalists, hedge funds, and investment banks.[43][44]


In October 2006, the X Prize Foundation, working in collaboration with the J. Craig Venter Science Foundation, established the Archon X Prize for Genomics,[45] intending to award US$10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 1,000,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $1,000 per genome".[46] An error rate of 1 in 1,000,000 bases, out of a total of approximately six billion bases in the human diploid genome, would mean about 6,000 errors per genome. The error rates required for widespread clinical use, such as in predictive medicine,[47] are currently set by over 1,400 clinical single-gene sequencing tests[48] (for example, errors in the BRCA1 gene for breast cancer risk analysis). As of September 2013, the Archon X Prize for Genomics has been cancelled.[49]
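The error estimate for the prize criteria is simple arithmetic, sketched here in Python. The constant and function names are illustrative, and the genome size is the approximate six billion bases quoted in the text rather than an exact figure.

```python
GENOME_BASES = 6_000_000_000  # approximate diploid human genome (from the text)
BASES_PER_ERROR = 1_000_000   # prize limit: at most one error per million bases


def expected_errors(genome_bases, bases_per_error):
    """Number of errors allowed across the genome at the given error rate."""
    return genome_bases // bases_per_error


errors = expected_errors(GENOME_BASES, BASES_PER_ERROR)
print(errors)  # 6000
```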


In 2007, Applied Biosystems started selling a new type of sequencer called the SOLiD System.[50] The technology allowed users to sequence 60 gigabases per run.[51]


In early February 2009, Complete Genomics released a full sequence of a human genome that was sequenced using their service. The data indicates that Complete Genomics' full genome sequencing service accuracy is just under 99.999%, meaning that just one in every one hundred thousand variants was called incorrectly. This means that each full human genome sequence will contain approximately 80,000–100,000 false positive errors. However, this accuracy rate was based on a Complete Genomics sequence that was completed at a 90× depth of coverage (each base in the genome was sequenced 90 times), while their commercialized sequence is reported to be only 40×. This accuracy rate may be acceptable for research purposes, but clinical use would require confirmation of any reportable alleles by other methods.[52][53] Complete Genomics announced in December 2010 that for the last 500 complete human genomes that it had sequenced, an average of over 98 percent of the genome was read at 10-fold or greater coverage. In addition, its software made high-confidence calls for an average of over 95 percent of the genome and over 94 percent of the exome.
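As a rough consistency check, the false positive counts quoted above can be converted back into an accuracy figure. The Python sketch below assumes, for illustration only, that errors are counted per base against the approximately six billion bases of a diploid human genome; the function name is hypothetical and the source does not spell out the exact denominator.

```python
GENOME_BASES = 6_000_000_000  # approximate diploid human genome (assumption)


def implied_accuracy(errors, genome_bases=GENOME_BASES):
    """Fraction of bases called correctly, given a count of wrong calls."""
    return 1 - errors / genome_bases


low = implied_accuracy(100_000)   # upper end of the quoted error range
high = implied_accuracy(80_000)   # lower end of the quoted error range
print(f"{low:.5%} to {high:.5%}")
```

Under this assumption both values fall slightly below 99.999%, which agrees with the "just under 99.999%" figure reported for the 90× genome.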

In March 2009, it was announced that Complete Genomics had signed a deal with the Broad Institute to sequence cancer patients' genomes, starting with five full genomes.[54] In April 2009, Complete Genomics announced that it planned to sequence 1,000 full genomes between June 2009 and the end of the year and to be able to sequence one million full genomes per year by 2013.[55] In practice, Complete Genomics sequenced 50 genomes in 2009. It has since significantly increased the throughput of its genome sequencing center and was able to sequence and analyze 300 complete human genomes in Q3 2010. The company had planned to launch officially in June 2009; it announced its R&D human genome sequencing service in October 2008 and its commercial sequencing service in May 2010. The company does not produce clinical data, and as such its genome center does not require CLIA certification.

In June 2009, NABsys announced their goal of full genome sequencing for under US$100 per genome with a turnaround time of less than 15 minutes.[56]

In June 2009, Illumina announced that they were launching their own Personal Full Genome Sequencing Service at a depth of 30× for $48,000 per genome.[57] This is still expensive for widespread consumer use, but the price may decrease substantially over the next few years as they realize economies of scale and given the competition with other companies such as Complete Genomics.[58][59] Jay Flatley, Illumina's President and CEO, stated that "during the next five years, perhaps markedly sooner," the price point for full genome sequencing will fall from $48,000 to under $1,000.[60] Illumina has already signed agreements to supply full genome sequencing services to multiple direct-to-consumer personal genomics companies.

In August 2009, the founder of Helicos Biosciences, Dr. Stephen Quake, stated that using the company's Heliscope Single Molecule Sequencer he sequenced his own full genome for less than $50,000. He stated that he expects the cost to decrease to the $1,000 range within the next two to three years.[61]

In August 2009, Pacific Biosciences secured an additional $68 million in new financing, bringing their total capitalization to $188 million.[62] Pacific Biosciences said they are going to use this additional investment in order to prepare for the upcoming launch of their full genome sequencing service in 2010.[63] Complete Genomics followed by securing another $45 million in a fourth round venture funding during the same month.[64] Complete Genomics has also made the claim that it will sequence 10,000 full genomes by the end of 2010.[65]

GE Global Research is also part of this race to commercialize full genome sequencing as they have been working on creating a service that will deliver a full genome for $1,000 or less.[66][67]

In September 2009, the President of Halcyon Molecular announced that they will be able to provide full genome sequencing in under 10 minutes for less than $100 per genome.[68] This is, to date, the most ambitious promise of any full genome sequencing company.

In October 2009, IBM announced that they were also in the heated race to provide full genome sequencing for under $1,000, with their ultimate goal being able to provide their service for US$100 per genome.[69] IBM's full genome sequencing technology, which uses nanopores, is known as the "DNA Transistor".[70]

In November 2009, Complete Genomics published a peer-reviewed paper in Science demonstrating its ability to sequence a complete human genome for $1,700.[71] If true, this would mean the cost of full genome sequencing has fallen steeply within just a single year, from around $100,000 to $50,000 and now to $1,700. This consumables cost was clearly detailed in the Science paper.[72] However, Complete Genomics has previously released statements that it was unable to follow through on. For example, the company stated it would officially launch and release its service during the "summer of 2009", provide a "$5,000" full genome sequencing service by the "summer of 2009", and "sequence 1,000 genomes between June 2009 and the end of 2009", none of which had occurred as of November 2009.[53][55][73]

Also in November 2009, Complete Genomics announced that it was beginning a large-scale human genome sequencing study of Huntington’s disease (up to 100 genomes) with the Institute for Systems Biology.


In March 2010, researchers from the Medical College of Wisconsin announced the first successful use of genome-wide sequencing to change the treatment of a patient.[74] This story was later retold in a Pulitzer Prize-winning article[75] and touted as a significant accomplishment in Nature[76] and by the director of the NIH in presentations at Congress.

In March 2010, Pacific Biosciences said they have raised more than $256 million in venture capital money and that they will be shipping their first ten full genome sequencing machines by the end of 2010. The company reported that the market initially will be researchers and academic institutions and then will rapidly turn into clinical applications that will be applicable to every single person in the world. Pacific Biosciences also stated that their second-generation machine, which is scheduled for release in 2015, will be capable of providing a full genome sequence for a person in just 15 minutes for less than $100. Several other technologies have similar goals. Meanwhile, full genome sequencing might revolutionize medicine at even current prices by providing a clinician with a full genome for each one of his or her patients. However, some critics have stated that even if they are supplied with a full genome sequence of a patient, they would not know how to analyze or make use of that data.[77] Since then, new resources have begun to address this.[78][79]

Also in March 2010, Complete Genomics’ customers began publishing papers describing research breakthroughs that they have made using data it has provided. Examples included the Institute for Systems Biology’s project to sequence a family of four and verify the gene responsible for Miller syndrome, a rare craniofacial disorder[x] and Genentech’s work to sequence and compare a patient’s primary lung tumor and adjacent normal tissue[y].

In June 2010, Illumina lowered the cost of its individual sequencing service to $19,500 from $48,000. The company is offering a discounted price of $9,500 for people with serious medical conditions who could potentially benefit from having their genomes decoded.


Knome[80] provides full genome sequencing (98%) services for US$39,500 for consumers, or $29,500 for researchers (depending on their requirements).[81][82]

Complete Genomics charges approximately $10,000 to sequence a complete human genome (less for large orders).

In May 2011, Illumina lowered its Full Genome Sequencing service to $5,000 per human genome, or $4,000 if ordering 50 or more.[83] Helicos Biosciences, Pacific Biosciences, Complete Genomics, Illumina, Sequenom, ION Torrent Systems, Halcyon Molecular, NABsys, IBM, and GE Global appear to all be going head to head in the race to commercialize full genome sequencing.[22][67]


In January 2012, Life Technologies introduced a sequencer to decode a human genome in one day for $1,000.[84] A UK firm spun out from Oxford University has come up with a DNA sequencing machine (the MinION) the size of a USB memory stick which costs $900 and can sequence simple genomes (but not full human genomes).[85] (While Oxford Nanopore stated in February that they would have a sequencer for sale by the end of 2012, this did not occur.)

In November 2012, Gene by Gene, Ltd started offering whole genome sequencing at an introductory price of $5,495 (with a minimum requirement of 3 samples per order). Currently the price is $6,995 and the minimum requirement has been removed.[86][87][88]

A series of publications in 2012 showed the utility of SMRT sequencing from Pacific Biosciences in generating full genome sequences with de novo assembly.[89] Some of these papers reported automated pipelines that could be used for generating these whole-genome assemblies.[90] Other papers demonstrated how PacBio sequence data could be used to upgrade draft genomes to complete genomes.[91]

Disruptive technology

Full genome sequencing provides information on a genome that is orders of magnitude larger than that provided by the previous leader in genotyping technology, DNA arrays. For humans, DNA arrays currently provide genotypic information on up to one million genetic variants,[92][93][94] while full genome sequencing will provide information on all six billion bases in the human genome, or 3,000 times more data. Because of this, full genome sequencing is considered disruptive to the DNA array markets, as the accuracy of both ranges from 99.98% to 99.999% (in non-repetitive DNA regions) and its consumables cost of $5,000 per 6 billion base pairs is competitive (for some applications) with DNA arrays ($500 per 1 million base pairs).[37] Agilent, another established DNA array manufacturer, is working on targeted (selective region) genome sequencing technologies.[95] It is thought that Affymetrix, the pioneer of array technology in the 1990s, has fallen behind due to significant corporate and stock turbulence and is currently not working on any known full genome sequencing approach.[96][97][98] It is unknown what will happen to the DNA array market once full genome sequencing becomes commercially widespread, especially as companies and laboratories providing this disruptive technology start to realize economies of scale. It is postulated, however, that this new technology may significantly diminish the total market size for arrays and any other sequencing technology once it becomes commonplace for individuals and newborns to have their full genomes sequenced.[99]
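The cost comparison above can be made concrete by dividing each quoted consumables cost by the number of positions it covers. The Python sketch below uses only the figures given in the text; the function name is illustrative, and treating one sequenced base pair and one array position as equivalent units is a simplifying assumption.

```python
def cost_per_unit(total_cost_usd, units):
    """Consumables cost in US dollars per base pair (or per array position)."""
    return total_cost_usd / units


# Figures quoted in the text.
sequencing = cost_per_unit(5_000, 6_000_000_000)  # full genome sequencing
array = cost_per_unit(500, 1_000_000)             # DNA array

# How many times cheaper sequencing is per position, under the assumption above.
ratio = array / sequencing
print(f"sequencing: ${sequencing:.2e}, array: ${array:.2e}, ratio: {ratio:.0f}x")
```

On a per-position basis the sequencing figure comes out several hundred times lower, although the total price per sample is still ten times higher, which is why the text describes the cost as competitive only for some applications.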

Sequencing versus analysis

In principle, full genome sequencing can provide raw data on all six billion nucleotides in an individual's DNA. However, it does not provide an analysis of what that information means or how it might be utilized in various clinical applications, such as in medicine to help prevent disease. As of 2010, the companies that are working on providing full genome sequencing provide clinical CLIA-certified data (Illumina) and analytical services for the interpretation of the full genome data (Knome), with only one institution offering sequencing and analysis in a clinical setting.[100] Nevertheless, there is plenty of room for researchers or companies to improve such analyses and make them useful to physicians and patients.[77][78][79]

Societal impact

Further information: Personal genomics

Inexpensive, time-efficient full genome sequencing will be a major accomplishment not only for the field of genomics, but for the entire human civilization because, for the first time, individuals will be able to have their entire genome sequenced. Utilizing this information, it is speculated that health care professionals, such as physicians and genetic counselors, will eventually be able to use genomic information to predict what diseases a person may get in the future and attempt to either minimize the impact of that disease or avoid it altogether through the implementation of personalized, preventive medicine. Full genome sequencing will allow health care professionals to analyze the entire human genome of an individual and therefore detect all disease-related genetic variants, regardless of the genetic variant's prevalence or frequency. This will enable the rapidly emerging medical fields of predictive medicine and personalized medicine and will mark a significant leap forward for the clinical genetic revolution. 
Full genome sequencing is clearly of great importance for research into the basis of genetic disease and has shown significant benefit to a subset of individuals with rare disease in the clinical setting.[101][102][103][104] Illumina's CEO, Jay Flatley, stated in February 2009 that "A complete DNA read-out for every newborn will be technically feasible and affordable in less than five years, promising a revolution in healthcare" and that "by 2019 it will have become routine to map infants' genes when they are born".[105] This potential use of genome sequencing is highly controversial, as it runs counter to ethical norms for predictive genetic testing of asymptomatic minors that have been well established in the fields of medical genetics and genetic counseling.[106][107][108][109] The traditional guidelines for genetic testing have been developed over the course of several decades since it first became possible to test for genetic markers associated with disease, prior to the advent of cost-effective, comprehensive genetic screening. It is established that norms, such as those in the sciences and the field of genetics, are subject to change and evolve over time.[110][111] It is unknown whether traditional norms practiced in medical genetics today will be altered by new technological advancements such as full genome sequencing.

Today, parents have the legal authority to obtain testing of any kind for their children. Currently available newborn screening for childhood diseases allows detection of rare disorders that can be prevented or better treated by early detection and intervention. Specific genetic tests are also available to determine an etiology when a child's symptoms appear to have a genetic basis. Full genome sequencing, in addition, has the potential to reveal a large amount of information (such as carrier status for autosomal recessive disorders, genetic risk factors for complex adult-onset diseases, and other predictive medical and non-medical information) that is currently not completely understood, may not be clinically useful to the child during childhood, and may not necessarily be wanted by the individual upon reaching adulthood.[112] In addition to predicting disease risk in childhood, genetic testing may have other benefits (such as discovery of non-paternity) but may also have potential downsides (genetic discrimination, loss of anonymity, and psychological impacts).[113] Many publications regarding ethical guidelines for predictive genetic testing of asymptomatic minors may therefore have more to do with protecting minors and preserving the individual's privacy and autonomy to know or not to know their genetic information than with the technology that makes the tests themselves possible.[114]

Ethical concerns

The majority of ethicists insist that the privacy of individuals undergoing genetic testing must be protected under all circumstances.[115] Data obtained from whole genome sequencing can reveal a lot of information not only about the individual who is the source of the DNA, but also a lot of probabilistic information about the DNA sequences of close genetic relatives.[116] Furthermore, the data obtained from whole genome sequencing can also reveal a lot of useful predictive information about the relatives' present and future health risks.[117] This raises important questions about what obligations, if any, are owed to the family members of the individuals who are undergoing genetic testing. In Western/European society, tested individuals are usually encouraged to share important information on the genetic diagnosis with their close relatives, since the importance of the genetic diagnosis for offspring and other close relatives is usually one of the reasons for seeking genetic testing in the first place.[118] Nevertheless, Sijmons et al. (2011) also mention that a major ethical dilemma can develop when patients refuse to share information on a diagnosis of a serious genetic disorder that is highly preventable and where there is a high risk of relatives carrying the same disease mutation.[119] Under such circumstances, the clinician may suspect that the relatives would rather know of the diagnosis, and hence the clinician can face a conflict of interest with respect to patient-doctor confidentiality.[120]

Another major privacy concern is the scientific need to put information on patients' genotypes and phenotypes into public scientific databases, such as the locus-specific databases.[121] Although only anonymous patient data are submitted to the locus-specific databases, patients might still be identifiable by their relatives in the case of a rare disease or a rare missense mutation.[122]


External links

  • Archon X Prize for Genomics
  • James Watson's Personal Genome Sequence
  • AAAS/Science: Genome Sequencing Poster
  • Outsmart Your Genes: Book that discusses full genome sequencing and its impact upon health care and society
  • Whole genome linkage analysis
