New Generation Sequencing Technology

First-generation sequencing, represented by the Sanger method, took sequencing technology from nothing to a working tool. With it, researchers completed the draft human genome as well as the genomes of plants, animals, microorganisms, and other model organisms. First-generation sequencing has since matured and been automated, but its throughput is low, its workflow is long, and its cost is high, which makes it difficult to commercialize. New sequencing technologies have therefore developed rapidly.

Next-generation sequencing (NGS) began around 2005 with the Roche 454 platform, which was followed by Illumina GA/HiSeq, Life SOLiD/Ion Torrent, and PacBio RS. Together these technologies brought rapid increases in sequencing throughput and dramatic reductions in sequencing cost. This article introduces Illumina sequencing, Pacific Biosciences SMRT sequencing, Oxford Nanopore sequencing, and the DNBSEQ™ sequencing technology of the BGI platform.

Illumina sequencing is currently the most widely used next-generation genome sequencing platform. It uses sequencing-by-synthesis (SBS) to achieve massively parallel sequencing. Because it was originally developed by Solexa, it is also known as Solexa sequencing. Illumina instruments span a wide range: the ultra-high-throughput HiSeq series suits sequencing centers and companies, the MiSeq and NextSeq 500 suit small and medium-sized laboratories and hospital-based medical diagnostic sequencing, and the MiniSeq is likewise aimed at medical diagnostic sequencing. The sequencing modes are high-throughput mode and fast mode. The high-throughput mode is compatible with a wider range of sample sizes and applications, while the fast mode is faster and less expensive per run.

The principle of Solexa sequencing is explained in detail in articles by the Zhihu user "Starry Idealist" [2] and by Z_Y_H [3].

The following is only a brief summary. The first step is sequencing library preparation (Figure 1). The genome is broken into small fragments, the ends are repaired, and an A tail is added at each 3' end. An Illumina-specific adapter with a T overhang at one end is then ligated on. The adapter contains an enzyme cleavage site within a non-complementary loop, so the ligated adapter initially has a closed ("spherical") form. Cutting at the cleavage site opens the loop into a "Y-shaped" end, which allows the subsequent step-by-step replication. Sequences complementary to the later amplification primers are then added to both ends of the double strand. This takes two rounds of primer-driven replication, with different primer sequences at the two ends (P5 and P7), and a barcode is placed on the inner side of each primer, with a different barcode at each end. If necessary, Taq polymerase can fill the ends in to make them blunt. After these steps the library is purified to remove impurities such as primers and enzymes, so that only fragments with correctly attached adapters enter the subsequent reactions. Purification can be done for the entire library at once.
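The layout of a finished library molecule can be sketched in a few lines of Python. This is only an illustration of the arrangement described above (P5 and P7 at the outer ends, a barcode just inside each, the genomic fragment in the middle). The sequences `P5` and `P7`, the class `LibraryMolecule`, and the helper functions are hypothetical placeholders, not real Illumina sequences or software, and the read-primer regions of a real library are left out.

```python
from dataclasses import dataclass

# Placeholder strings only: these are NOT the real Illumina P5/P7 sequences.
P5 = "AAAAACCCCC"
P7 = "GGGGGTTTTT"


@dataclass
class LibraryMolecule:
    """Simplified library fragment: P5 - barcode - insert - barcode - P7."""
    p5: str
    barcode1: str
    insert: str
    barcode2: str
    p7: str

    def sequence(self) -> str:
        return self.p5 + self.barcode1 + self.insert + self.barcode2 + self.p7


def fragment(genome: str, size: int) -> list[str]:
    """Cut the genome into consecutive pieces of at most `size` bases."""
    return [genome[i:i + size] for i in range(0, len(genome), size)]


def build_library(genome: str, size: int, barcode1: str, barcode2: str) -> list[LibraryMolecule]:
    """Attach the same pair of barcodes and the P5/P7 ends to every fragment."""
    return [
        LibraryMolecule(P5, barcode1, piece, barcode2, P7)
        for piece in fragment(genome, size)
    ]


library = build_library("ACGTACGTAC", size=4, barcode1="TT", barcode2="CC")
for molecule in library:
    print(molecule.sequence())
```

Because every molecule from one sample carries the same barcode pair, reads from several pooled samples can later be assigned back to their sample by barcode.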

After purification, the library undergoes cluster generation, a process designed to amplify the sequencing signal. The denatured library is loaded onto a flow cell with eight lanes, where the fragments bind to complementary oligonucleotides (P5', P7') immobilized on the lane surface. After one round of replication (5'->3'), in which the immobilized oligonucleotides are extended into complementary strands, the original template is removed and washed away. Bridge amplification then copies each bound fragment to about 1000 copies with the same DNA sequence, forming a cluster. The final amplification products are denatured into single strands. Paired-end sequencing could start from these strands directly, but it is common practice to first cut either the forward or the reverse strands away from their adapters, using the selective cleavage of 8-oxo-G by formamidopyrimidine glycosylase (Fpg). This prevents closely spaced complementary strands from interfering with base incorporation. If the reverse strands are cut away first, then after the first read the amplification is repeated up to the selective-cutting step, the forward strands are cut away, and the reverse read is sequenced.
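As a rough piece of arithmetic, the figure of about 1000 copies per cluster corresponds to roughly ten rounds of doubling. The sketch below assumes an idealized model in which every cycle multiplies the copy number by `1 + efficiency`; the function name and the efficiency parameter are illustrative assumptions, not values from the Illumina protocol.

```python
def cycles_to_reach(target_copies: int, efficiency: float = 1.0) -> int:
    """Number of amplification cycles needed for one template to reach
    at least `target_copies`, if each cycle multiplies the copy number
    by (1 + efficiency). efficiency=1.0 means perfect doubling."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    copies = 1.0
    cycles = 0
    while copies < target_copies:
        copies *= 1 + efficiency
        cycles += 1
    return cycles


print(cycles_to_reach(1000))        # perfect doubling: 2**10 = 1024 copies
print(cycles_to_reach(1000, 0.8))   # less efficient cycles need more rounds
```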

The third step is sequencing itself, which on the Illumina platform uses SBS with 3'-blocked reversible terminators. These are fluorescently labeled nucleotides whose 3'-hydroxyl group is chemically blocked, so that each DNA strand incorporates only one nucleotide per synthesis cycle. After the fluorescent image is recorded, the blocking group is cleaved off, the 3' end regains its ability to be extended, and the cycle repeats. Because the denatured library initially contains both a template and its complementary strand, which are labeled indistinguishably and sequenced in the same run, the complementary strands must be identified when single-read results are collected. Otherwise the reads would be assembled chaotically. Paired-end sequencing in the forward and reverse directions greatly improves sequencing efficiency.
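The cycle of incorporate, image, and cleave can be mimicked with a toy loop. The sketch below is a deliberately idealized model, assuming error-free chemistry and a single template; the function name and the convention that the template is given in the order the polymerase reads it are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Idealized SBS: one base is read per cycle.

    `template` is given in the order the polymerase reads it.
    Returns the bases called, i.e. the newly synthesized strand.
    """
    read = []
    for position in range(min(cycles, len(template))):
        # Incorporation: the 3' block lets exactly one nucleotide be added,
        # and it must be complementary to the current template base.
        incorporated = COMPLEMENT[template[position]]
        # Imaging: the fluorescent label identifies the incorporated base.
        read.append(incorporated)
        # Cleavage: the label and the 3' block are removed, so the next
        # loop iteration can extend the strand by one more base.
    return "".join(read)


print(sequence_by_synthesis("ATGC", cycles=3))
```

The read length is set by the number of cycles, not by the template length, which is why a run with more cycles yields longer reads.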

Because Illumina sequencing relies on cluster amplification to boost the signal, its read length cannot reach that of first-generation sequencing. Once a read extends to several hundred bases, base incorporation within a cluster falls out of synchrony because of variations in nucleotide concentration, enzyme activity, and similar factors. The signals from the strands in the cluster then become mixed, and the read becomes inaccurate.
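The loss of synchrony can be illustrated with a simple model. Assume, purely for illustration, that in every cycle each strand has a fixed probability `p_slip` of falling out of step with the rest of its cluster and never recovering. The fraction of strands still in phase after n cycles is then (1 - p_slip)^n. The value 0.005 used below is an arbitrary example, not a measured Illumina figure.

```python
def in_phase_fraction(p_slip: float, cycles: int) -> float:
    """Fraction of strands in a cluster still synchronized after `cycles`,
    if each strand independently slips with probability `p_slip` per cycle."""
    return (1.0 - p_slip) ** cycles


def max_usable_cycles(p_slip: float, min_fraction: float) -> int:
    """Largest cycle count at which the in-phase fraction is still
    at least `min_fraction`."""
    cycles = 0
    while in_phase_fraction(p_slip, cycles + 1) >= min_fraction:
        cycles += 1
    return cycles


P_SLIP = 0.005  # assumed per-cycle slip probability, for illustration only
for n in (50, 100, 200, 400):
    print(n, round(in_phase_fraction(P_SLIP, n), 3))
print("cycles with at least half the cluster in phase:",
      max_usable_cycles(P_SLIP, 0.5))
```

Even a small per-cycle slip rate compounds geometrically, which is consistent with usable read lengths stalling at a few hundred bases.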

Pacific Biosciences' single-molecule real-time (SMRT) DNA sequencing is also based on the principle of sequencing by synthesis. The technology is characterized by long reads, with an average read length of more than 10 kb and a maximum read length of 40 kb. It uses the SMRT Cell as the sequencing carrier: a 100-nm-thick metal sheet that carries 150,000 holes (as of 2014) on one side, each tens of nanometers in diameter. These holes are known as zero-mode waveguides (ZMWs), or nanoholes. For sequencing, the system places the sequencing library, DNA polymerase, and dNTPs carrying different fluorescent labels at the bottom of the nanoholes, where the DNA synthesis reaction takes place. Typically, one nanohole holds one DNA polymerase molecule and one DNA template.

A distinctive feature of this method is that the sequencing library does not need to be amplified: the zero-mode waveguide largely eliminates background fluorescence, so the fluorescence released by a single-molecule reaction can be recorded accurately. In addition, in SMRT sequencing the fluorescent group is attached not to the base of the deoxyribonucleotide but to the third (terminal) phosphate of its 5' triphosphate. When a dNTP pairs with the sequencing template and is joined to the growing primer strand, the fluorescent group is released together with the pyrophosphate, which eliminates the need for a separate cleavage step.

In 2014, Oxford Nanopore introduced several sequencers based on nanopore sequencing technology: MinION, PromethION, and GridION. The basic principle is that when a voltage is applied across a nanoscale pore filled with electrolyte, the current through the pore can be measured easily. The pore is only wide enough for one nucleotide to pass at a time. A passing nucleotide partially blocks the pore, and the current through it weakens. Because the four nucleotide bases differ in spatial conformation, each attenuates the current to a different degree. Real-time sequencing can therefore be achieved by detecting the changes in current as DNA passes through the nanopore.
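The idea of reading bases from current levels can be sketched as a nearest-level lookup. The current values in `LEVELS` are invented for illustration, and the one-base-one-level model is a strong simplification of the principle described above; real base calling works on noisy signals influenced by several neighboring bases at once.

```python
# Hypothetical mean currents (in picoamperes) while each base blocks the pore.
LEVELS = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}


def call_bases(currents: list[float]) -> str:
    """Assign each measured current to the base with the closest level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - current))
        for current in currents
    )


measured = [51.2, 79.0, 68.5, 61.0]  # one made-up measurement per base
print(call_bases(measured))
```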

The method also eliminates the need for library amplification, and the long, real-time, single-molecule reads can greatly reduce sequencing costs.

DNBSEQ™ technology originated at Complete Genomics and was improved by BGI. Below is a brief description of how the method works, based on an article on the BGI website [4].

Following the process in Figure 2, BGI Genomics amplifies DNA using DNA nanoball technology. The DNA is first split into fragments of about 100 base pairs, and the first adapter, adapter 1, is joined to both ends of each fragment. The adapter-bearing fragment is amplified and then circularized by joining its two adapter ends. This circular DNA is cut open to insert the second adapter, adapter 2, and the amplification and circularization are repeated. Once all four adapters have been incorporated (adapters 1, 2, 3, and 4 differ in sequence: some are oligonucleotides complementary to primers, some are markers, and so on), the final circular DNA is amplified by rolling circle amplification to produce a DNA nanoball (DNB). Rolling circle amplification does not cut off individual copies. Instead, the newly synthesized DNA is displaced from the template without interrupting extension, so the copies stay joined in a single multicopy nanoball. The nanoballs are immobilized in a grid of small wells on the chip.

Sequencing uses combinatorial probe-anchor synthesis (cPAS) together with multiple displacement amplification paired-end sequencing (MDA-PE) to increase the read length, allowing paired-end reads of 100 bp or 150 bp. So far, the details of cPAS have not been fully disclosed. In MDA-PE, after the first strand (forward strand) has been sequenced by replicative extension, a high-fidelity polymerase with strand-displacement activity removes the first-strand template while synthesizing the second strand (reverse strand), and the second strand is then sequenced from a DNA molecular anchor.

In contrast to other second-generation methods, DNBSEQ™ sequencing is based on DNBs, in which every copy is made from the original template. Unlike the exponential growth of PCR, where copies serve as templates for further copies, this means that errors introduced during amplification do not accumulate.
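The difference in error accumulation can be made concrete with a small model. Assume that every newly synthesized copy acquires an error at a given position with probability `per_copy_error`, that PCR doubles perfectly each cycle, and that an error is never reverted. In PCR a copy inherits the errors of its template, so the erroneous fraction f grows each cycle as f_next = f + (1 - f) * e / 2. In rolling circle amplification every copy comes from the original circle, so the fraction stays at e. The function names and the error rate are illustrative assumptions.

```python
def pcr_error_fraction(per_copy_error: float, cycles: int) -> float:
    """Fraction of strands carrying an error at one position after
    `cycles` of ideal PCR doubling, starting from an error-free template."""
    fraction = 0.0
    for _ in range(cycles):
        # A new copy is wrong if its template was wrong or a new error occurs.
        new_copies = fraction + (1.0 - fraction) * per_copy_error
        # The pool is now half old strands and half new copies.
        fraction = (fraction + new_copies) / 2.0
    return fraction


def rca_error_fraction(per_copy_error: float, copies: int) -> float:
    """Rolling circle amplification: every copy is read from the original
    circle, so the erroneous fraction does not depend on `copies`."""
    return per_copy_error


E = 1e-4  # assumed per-copy error probability, for illustration only
print("PCR, 30 cycles :", pcr_error_fraction(E, 30))
print("RCA, 1000 copies:", rca_error_fraction(E, 1000))
```

Under these assumptions the PCR error fraction after 30 cycles is roughly 15 times the per-copy rate, while the rolling-circle fraction stays at the per-copy rate.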

[1] Haofeng Chen. Next Generation Genome Sequencing Technologies [M]. First Edition. Beijing: Science Press, 2016.

[2] Detailed analysis of the principles of second-generation sequencing! - Starry Idealist's article - Zhihu.

[3] illumina double-end sequencing - Z_Y_H's article.

[4] BGI Tech - Product Services - Whole Exome Sequencing - Product Introduction.

[5] Demystifying the Human Genome: Next Generation Sequencing - (I) - Jing Zhou's article - LEXOLOGY.