Deep learning enables the use of ultra-high-density array in DNBSEQ

DNBSEQ workflow

We conducted DNBSEQ sequencing experiments on UHD arrays, generating four DNB image datasets (datasets 1-4) from single-end 50 bp (SE50) sequencing and four DNB image datasets (datasets 5-8) from paired-end 100 bp (PE100) sequencing. The SE50 read length is sufficient for RNA-seq profiling or counting experiments, such as the noninvasive prenatal test NIFTY, while the PE100 read length is typical for whole-genome sequencing (WGS), which requires deep coverage.

Library preparation and sequencing

Standard reference E. coli DNA was used to construct a single-strand circular DNA library following the single-strand circularization protocol of DNBSEQ26. The insert fragment of the single-strand circular DNA was approximately 170 bp. To make DNBs, the single-strand circular DNAs were hybridized with primers. Next, dNTPs and DNA polymerase were introduced into the reaction mixture to initiate rolling circle amplification for 30 min, which was then terminated with DNB stopping buffer. Finally, the DNB concentration was quantified using a Qubit ssDNA Assay Kit and a Qubit Fluorometer and confirmed to be greater than 10 ng/µL.

To load DNBs, we added 1/3 volume of DNB loading buffer to the DNB product from the previous step. The mixture was then placed into the loader device and automatically loaded onto the UHD array, followed by a 30 min incubation to ensure that the DNBs settled adequately on the chip. Finally, post-loading reagents were added to further stabilize the DNBs.

For sequencing by synthesis, DNBs were hybridized with primers complementary to the adapter region. During each cycle, a mixture of four individually labeled dNTPs was added and incorporated into the elongating complementary strands under the catalysis of DNA polymerase. Following the synthesis step, unbound dNTPs were removed with washing buffer, and the DNBs were protected from laser-induced damage by imaging buffer. Upon laser excitation, the DNBs emitted fluorescence, and a fluorescence microscopic system captured WF DNB images of the four base channels. Afterwards, the fluorochrome and termination groups were removed, and the next cycle began. This synthesis-imaging-cleavage cycle was repeated until the designed read length was reached.

Base-calling

Zebracall (base-calling software provided by the DNBSEQ manufacturer MGI Inc.) was used to call bases from the DNB images. The general workflow is shown in Fig. 3. First, the background was removed for each image. Next, four channel images of the same cycle were aligned with a grid pattern template, and the raw intensities were extracted from each DNB site. The raw intensities were subjected to normalization, crosstalk correction, and phasing correction to remove spatial and temporal signal crosstalk. The crosstalk among channels is caused by imperfect wavelength filtering. Phasing correction is required because incorporations are out of phase among the hundreds of DNA copies within a single DNB, thereby degrading the purity of the DNB signal in the wavelength domain. After all correction steps, the bases with the highest probability were called based on the final intensities across the four channels. The possible error rate of each base was subsequently determined using a previously established probability model, and the corresponding quality score was calculated by

$$Q=-10\,{\log}_{10}P, \tag{1}$$

where $P$ denotes the possible error rate and $Q$ is the quality score. After base-calling for all cycles was completed, a binary file containing the raw reads and the quality score of each base was converted to FASTQ format with Phred+33 quality encoding.
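As a concrete illustration of Eq. (1) and the Phred+33 conversion, here is a minimal Python sketch (ours, not Zebracall's implementation):

```python
import math

def phred_quality(p: float) -> int:
    """Eq. (1): quality score Q = -10 * log10(P), rounded to an integer."""
    return round(-10 * math.log10(p))

def phred33_char(q: int) -> str:
    """Encode an integer quality score as a Phred+33 ASCII character."""
    return chr(q + 33)

# Example: an estimated error rate of 0.001 gives Q30, encoded as '?'.
p = 0.001
q = phred_quality(p)        # 30
print(q, phred33_char(q))   # 30 ?
```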

Fig. 3

The general workflow of the base-calling process, which converts cyclic raw DNB images collected from four wavelength channels into sequential reads. During each cycle, images from the four channels are registered based on a grid pattern template, and the base identity of each DNB spot is determined based on its relative intensities in the four channels. In this way, reads are called sequentially. Green, yellow, orange, and blue sites indicate fluorescence corresponding to A, T, G, and C bases, respectively, while black sites indicate no fluorescence.

To assess the quality of the reads extracted from DNB images, global quality metrics such as Q30, mapping rate (MR), and effective spot rate (ESR) were further calculated. Q30 is the proportion of bases, across all reads, with an estimated error rate of no more than 0.001 and serves as a measure of overall read quality. MR is the proportion of reads successfully mapped to the reference genome and serves as a measure of read accuracy; BWA27 was employed to calculate it. ESR is the proportion of effective reads remaining after filtering and reflects the productivity of the array. Bases with a quality score below 20 were considered bad bases. In SE50 sequencing, reads in which more than 20% of bases were bad were filtered out. In PE100 sequencing, a read was filtered out when the proportion of bad bases exceeded 30% in the first read or 35% in the second read.
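These metrics and filters can be sketched directly from FASTQ quality strings (Phred+33). A hedged Python illustration, with function names and structure of our own choosing:

```python
def q30_fraction(quality_strings):
    """Fraction of bases with Q >= 30, i.e., estimated error rate <= 0.001."""
    total = good = 0
    for qs in quality_strings:
        total += len(qs)
        good += sum(1 for c in qs if ord(c) - 33 >= 30)
    return good / total

def passes_filter(quality_string, max_bad_fraction=0.20, bad_q=20):
    """Read filter: bases with Q < 20 are 'bad'. The default 20% threshold
    matches SE50; for PE100 use 0.30 (first read) or 0.35 (second read)."""
    bad = sum(1 for c in quality_string if ord(c) - 33 < bad_q)
    return bad / len(quality_string) <= max_bad_fraction
```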

Training dataset preparation

To conduct deep learning for DNB image SR, it is necessary to generate a training dataset containing pairs of LR-HR DNB images. Although it is feasible to generate LR images with a regular WF microscopic system and HR images with an SR microscopic system, co-registering such image pairs is nontrivial. Moreover, different microscopic systems may not share the same optical specifications, which hinders convenient model transfer. Therefore, we generated the training dataset by simulating pairs of LR-HR DNB images.

We simulated DNB images over multiple cycles. Each cycle consisted of four pairs of LR and HR images corresponding to the A, T, G, and C base channels. The simulation involved three stages: DNB preparation, placement, and imaging. First, sequence templates were sampled from the human reference genome and assigned different copy numbers to simulate DNBs. The DNBs were then randomly placed at the binding sites of the simulated array, which had a diameter of 130 nm and a pitch of 360 nm.

Before imaging, the point spread function (PSF) of the microscopic system was modeled as a Gaussian function determined by two key parameters: the emission wavelength of the fluorochrome and the NA of the collection objective. The center emission wavelengths of the fluorochromes for the A, T, G, and C bases were 560 nm, 611 nm, 685 nm, and 728 nm, respectively. The objective NA was set to 0.8 for LR images, following the imaging system's specification, and to 1.39 for HR images, to clearly separate adjacent DNBs. During the imaging simulation of each cycle, numerous seed points were selected at 1 nm intervals within each DNB. For each base channel, the fluorescence intensity of every seed point was calculated from a series of parameters, such as the DNA sequence, copy number, and fluorochrome distribution, and the corresponding PSF was used to sequentially generate images of all seed points, which were superimposed to obtain the final LR and HR nanoscale images. The nanoscale images were digitized and converted to grayscale with a pixel size of 130 nm. Gaussian noise was introduced as a global background to both the LR and HR images. Figure 4 illustrates the imaging simulation for a single DNB, and Fig. 5 shows a sample of simulated LR and HR image pairs.
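To illustrate the imaging stage, the sketch below renders seed points with a Gaussian PSF and bins the result down to 130 nm pixels. The width formula σ ≈ 0.21λ/NA is a common wide-field approximation, not necessarily the paper's exact PSF parameterization, and the helper names are ours:

```python
import numpy as np

# Center emission wavelengths (nm) from the text; pixel size after
# digitization is 130 nm.
WAVELENGTH_NM = {"A": 560, "T": 611, "G": 685, "C": 728}
PIXEL_NM = 130

def gaussian_sigma_nm(wavelength_nm: float, na: float) -> float:
    # Common wide-field approximation; the paper's PSF model may differ.
    return 0.21 * wavelength_nm / na

def render_patch(points_nm, intensities, shape_px, na, base="A"):
    """Superimpose the Gaussian PSFs of all seed points on a 1 nm grid,
    then bin down to 130 nm pixels (digitization). Small patches only."""
    sigma = gaussian_sigma_nm(WAVELENGTH_NM[base], na)
    h, w = shape_px
    yy, xx = np.mgrid[0:h * PIXEL_NM, 0:w * PIXEL_NM].astype(float)
    img = np.zeros((h * PIXEL_NM, w * PIXEL_NM))
    for (y, x), amp in zip(points_nm, intensities):
        img += amp * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma**2))
    return img.reshape(h, PIXEL_NM, w, PIXEL_NM).sum(axis=(1, 3))

# Two point emitters one pitch (360 nm) apart, rendered at both NAs:
pts, amps = [(520, 390), (520, 750)], [1.0, 1.0]
lr = render_patch(pts, amps, (8, 8), na=0.8)    # LR: blurred together
hr = render_patch(pts, amps, (8, 8), na=1.39)   # HR: clearly separated
```

Under this approximation, the A channel gives σ ≈ 147 nm at NA 0.8 versus ≈ 85 nm at NA 1.39, which is why DNBs at the 360 nm pitch blur together in the LR images but separate in the HR images.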

Fig. 4

The process of imaging simulation for a single DNB. (a) Numerous seed points are selected for imaging. (b) The seed points are imaged independently and then superimposed to generate a nanoscale image. (c) The nanoscale image is digitized and converted into a grayscale image.

Fig. 5

The simulated LR (a) and HR (b) image pairs.

DNBSRN structure

Inspired by the framework of RFDN, we propose the efficient network DNBSRN, which consists of one global residual connection, one coarse feature extraction (CFE) layer, three shallow residual blocks (SRBs), and three intermediate information collection (IIC) layers. Given a preprocessed LR image $I_{LR}$ as input, the SR image $I_{SR}$ is generated by

$$I_{SR}=H_{DNBSRN}\left(I_{LR}\right), \tag{2}$$

where $H_{DNBSRN}\left(\cdot\right)$ denotes DNBSRN. The network is optimized with the smooth L1 loss28. Given a training set of $N$ LR-HR image pairs with a pixel size of $w\times h$, the loss function of DNBSRN can be expressed as

$$L_{DNBSRN}\left(\theta\right)=\frac{1}{N}\sum_{n=1}^{N}\left(\frac{1}{w\times h}\sum_{i=1}^{w\times h}loss_{i}^{n}\right), \tag{3}$$

where $\theta$ represents the learnable parameters of DNBSRN and $loss_{i}^{n}$ denotes the loss value of the $i$-th pixel in the $n$-th image, expressed as

$$loss_{i}=\begin{cases}\dfrac{0.5\left(\widehat{Y}_{i}-Y_{i}\right)^{2}}{\beta}, & \text{if}\ \left|\widehat{Y}_{i}-Y_{i}\right|<\beta\\ \left|\widehat{Y}_{i}-Y_{i}\right|-0.5\beta, & \text{otherwise,}\end{cases} \tag{4}$$

where $\widehat{Y}_{i}$ and $Y_{i}$ denote the $i$-th pixel values of the predicted HR image and the ground-truth HR image, respectively. $\beta$ is set to 1 in this study.
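Equation (4) is the standard smooth L1 (Huber-style) loss, available in PyTorch as nn.SmoothL1Loss; a quick check that the built-in matches Eq. (4) with β = 1:

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss(beta=1.0)  # beta = 1 as in this study

pred = torch.tensor([0.2, 1.5, -0.3])
target = torch.zeros(3)

# Manual evaluation of Eq. (4), averaged over pixels
diff = (pred - target).abs()
manual = torch.where(diff < 1.0, 0.5 * diff**2 / 1.0, diff - 0.5 * 1.0).mean()
print(torch.allclose(criterion(pred, target), manual))  # True
```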

The detailed structure of DNBSRN is shown in Fig. 6. Initially, the CFE, which contains a convolutional layer, extracts coarse features from the input $I_{LR}$:

$$F_{0}=H_{CFE}\left(I_{LR}\right), \tag{5}$$

where $H_{CFE}\left(\cdot\right)$ represents the convolutional operation for coarse feature extraction and $F_{0}$ is the extracted coarse feature. Then, three SRBs are employed for deeper feature extraction. This process can be described as

$$F_{k}=H_{SRB_{k}}\left(F_{k-1}\right),\quad k=1,2,3, \tag{6}$$

where $H_{SRB_{k}}$ denotes the $k$-th SRB and $F_{k}$ denotes the feature extracted by the $k$-th SRB. Following each SRB, an IIC branch containing a convolutional layer collects the intermediate information extracted by the preceding layers. This process can be described as

$$P_{k}=H_{IIC_{k}}\left(F_{k}\right),\quad k=1,2,3, \tag{7}$$

where $H_{IIC_{k}}$ denotes the $k$-th IIC following the $k$-th SRB and $P_{k}$ denotes the information collected by the $k$-th IIC. Finally, we add the collected information to the input $I_{LR}$ to obtain the SR image $I_{SR}$:

$$I_{SR}=I_{LR}+\sum_{k=1}^{3}P_{k}. \tag{8}$$

Fig. 6

DNBSRN structure. "Conv-7" denotes a convolutional layer with a kernel size of 7 × 7. The number displayed on the right side of each layer indicates the number of output channels.

The highlights of the proposed DNBSRN can be summarized as follows:

  1. Given that LR and HR DNB images have the same pixel size and similar features at corresponding positions, we establish a direct global residual connection between the input LR and the output HR. This allows the main body of DNBSRN to focus on learning the disparity between LR and HR, considerably reducing the complexity of DNB image SR.

  2. The body of DNBSRN uses three SRBs to progressively extract deeper features. Each SRB consists of a convolutional layer, a ReLU activation layer, and a local residual connection, taking advantage of residual learning while maintaining efficiency.

  3. An IIC branch is constructed after each SRB to collect the intermediate information, which is subsequently used to hierarchically modify the pixel values of the LR image to produce the HR image. Each IIC is composed solely of a single convolutional layer and fully utilizes the extracted features.

  4. Since a single DNB in LR images typically occupies no more than 7 × 7 pixels, we use a 7 × 7 kernel for all convolutional layers. This enables the extracted feature maps to effectively perceive the information from each DNB (a minimal implementation sketch follows this list).
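To make the architecture concrete, below is a minimal PyTorch sketch of DNBSRN assembled from Eqs. (5)-(8) and the highlights above. The channel width (64) and the placement of the ReLU relative to the local residual add (here following RFDN's SRB) are assumptions; see Fig. 6 for the actual per-layer channel counts.

```python
import torch
import torch.nn as nn

class SRB(nn.Module):
    """Shallow residual block: 7x7 conv, ReLU, and a local residual
    connection. The ReLU-after-add ordering follows RFDN's SRB; the
    paper's exact ordering may differ."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=7, padding=3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x) + x)

class DNBSRN(nn.Module):
    """Eqs. (5)-(8): CFE, then 3 x (SRB + IIC branch), global residual."""
    def __init__(self, channels: int = 64):  # width is an assumption
        super().__init__()
        self.cfe = nn.Conv2d(1, channels, kernel_size=7, padding=3)    # Eq. (5)
        self.srbs = nn.ModuleList([SRB(channels) for _ in range(3)])   # Eq. (6)
        # Each IIC maps features back to one channel so it can be added
        # directly to the LR input, as required by Eq. (8).
        self.iics = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=7, padding=3) for _ in range(3)]
        )                                                              # Eq. (7)

    def forward(self, i_lr):
        f = self.cfe(i_lr)
        i_sr = i_lr                      # global residual connection
        for srb, iic in zip(self.srbs, self.iics):
            f = srb(f)
            i_sr = i_sr + iic(f)         # accumulate P_k per Eq. (8)
        return i_sr
```

Because every IIC projects back to a single channel, the three collected maps $P_{k}$ can be summed directly onto $I_{LR}$, so no upscaling tail is needed.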

Image preprocessing

For preprocessing the training dataset, the LR and HR images were normalized and subsequently fed into the network for training. This can be expressed as

$$Input_{1}=\frac{I_{1}}{percentile\left(I_{1},99\right)\times random\left(0.9,1.1\right)}, \tag{9}$$

where $I_{1}$ denotes an original image from the training dataset, $percentile\left(I_{1},99\right)$ calculates the 99th percentile intensity value of image $I_{1}$, and $random\left(0.9,1.1\right)$ returns a random number between 0.9 and 1.1, used to augment the training data. $Input_{1}$ represents the normalized image that is fed into the network.
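A minimal NumPy sketch of Eq. (9); the function name is ours:

```python
import numpy as np

def preprocess_training(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Eq. (9): normalize by the 99th percentile, with random scale
    jitter in [0.9, 1.1] as training-time augmentation."""
    scale = np.percentile(image, 99) * rng.uniform(0.9, 1.1)
    return image / scale

rng = np.random.default_rng(0)
img = rng.random((512, 512), dtype=np.float32)  # stand-in training crop
norm = preprocess_training(img, rng)
```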

In the training dataset, images of all cycles exhibit exceptional quality, with strong DNB signals and high contrast. However, in real experiments, only the images captured during the first few cycles exhibit quality comparable to the training dataset; image quality progressively deteriorates due to incomplete biochemical reactions and DNB degradation, as shown in Fig. 7. The degradation is evident between cycles 10 and 50. This widens the gap between the real data distribution and that of the training dataset, thereby diminishing the quality of the reconstructed SR images. To address this problem, we introduce a histogram matching (HM) approach for preprocessing real images. HM allows us to improve the image quality of later cycles without altering the pixel-to-pixel size relationship. The approach adheres to the same fundamental principle as the conventional HM algorithm29, which maps the gray-level histogram of the input image to that of the reference image and adjusts the input pixel values accordingly.

Fig. 7

The image quality degradation over cycles in the real experiment.

For preprocessing the real images, the images from the first cycle are used as the reference. Images from subsequent cycles are adjusted to match the histogram of the reference, then normalized and fed into the network for SR reconstruction. This can be expressed as

$$Input_{2}=\frac{HM\left(I_{2}\right)}{percentile\left(HM\left(I_{2}\right),99\right)}, \tag{10}$$

where $I_{2}$ denotes an original real image, $HM\left(\cdot\right)$ denotes the HM operation, and the HM result is normalized by its 99th percentile intensity. $Input_{2}$ denotes the preprocessed image fed into the network.
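A hedged sketch of Eq. (10), substituting scikit-image's match_histograms for the paper's HM implementation (both follow the conventional algorithm):

```python
import numpy as np
from skimage.exposure import match_histograms

def preprocess_real(image: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Match a later-cycle image to the first-cycle reference histogram,
    then normalize by its own 99th percentile intensity (Eq. 10)."""
    matched = match_histograms(image, reference)
    return matched / np.percentile(matched, 99)
```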

Modification to other networks for comparison

We selected six representative image SR networks (EDSR, RDN, RCAN, IMDN, RFDN, and RLFN) for comparison with the proposed DNBSRN. These networks were designed to improve the resolution of photorealistic RGB images; they therefore have three input and three output channels, along with an upscale module at the tail. Because DNB images are grayscale with a single channel and do not require upscaling for SR, these networks were adapted to the DNB image SR task: the number of input and output channels was reduced from three to one, and the upscale module was removed. It is worth emphasizing that the core structure of each network remained unchanged.
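The adaptation can be sketched generically. The attribute names (head, tail) are illustrative, since the layer naming varies across the six implementations:

```python
import torch.nn as nn

def adapt_for_dnb(net: nn.Module, body_channels: int = 64) -> nn.Module:
    """Replace the 3-channel RGB head with a 1-channel head and swap the
    upscaling (e.g., PixelShuffle) tail for a plain 1-channel output
    convolution, leaving the core body untouched."""
    net.head = nn.Conv2d(1, body_channels, kernel_size=3, padding=1)
    net.tail = nn.Conv2d(body_channels, 1, kernel_size=3, padding=1)
    return net
```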

Network training and real image reconstruction

Seven networks, including DNBSRN, EDSR, RDN, RCAN, IMDN, RFDN, and RLFN, were trained and evaluated. Additionally, five modified structures of DNBSRN were trained to conduct ablation experiments. A total of 50 cycles of simulated images were used to train the networks. The initial 40 cycles were allocated for training, and the remaining 10 cycles were allocated for validation. During each epoch, a total of 3200 sub-images with a size of 512 × 512 pixels (128 × 128 pixels for EDSR, RDN, and RCAN) were randomly cropped from training images (with a size of 2200 × 2200 pixels), as well as 800 sub-images from validation images. These sub-images were then shuffled, subjected to preprocessing, and fed into the network with a batch size of 8. The networks were trained for 70 epochs, and the final selected model was the one with the lowest loss in the validation set. The smooth L1 was employed as the loss function. The learning rate was set to 0.0001 throughout the training process. The networks were optimized using the Adam optimizer30 with β1 set to 0.9, β2 set to 0.999, and eps set to 10–8. All networks were trained using identical configurations and the same random seed, except for the differences in sub-images. The real images of the eight experimental datasets were also subjected to identical preprocessing steps before being fed into different networks for reconstruction. The training and reconstruction were performed on a computer workstation equipped with an Intel(R) Xeon(R) Gold 5115 CPU @ 2.40 GHz 2.40 GHz (2 processors) and an NVIDIA Tesla A100 GPU (80 GB). The networks were implemented using Python v.3.11.3 and PyTorch v.2.0.0.