
Cost and time-efficient construction of a 3′-end mRNA library from unpurified bulk RNA in a single tube – Experimental & Molecular Medicine

Bulk transcriptome profiling of cell lysate in a single pot—BOLT-seq library preparation and workflow

As a method for gene expression profiling, BOLT-seq is streamlined, inexpensive, and time-efficient: it uses a 96-well microtiter plate format, requires no intermediate purification steps, and does not rely on commercially available kits. The workflow and sequential steps of BOLT-seq library preparation are shown schematically in Fig. 1. In brief, after cells were lysed in the wells of a 96-well plate, reverse transcription (RT) was performed using in-house purified Moloney murine leukemia virus (M-MuLV) reverse transcriptase and bead-anchored oligo-dT primers [Supplementary Table 2]. Using in-house purified M-MuLV RT and in-house prepared reaction buffers substantially reduced costs. To ensure optimal RT efficiency, the activity of the in-house purified M-MuLV RT was tested with different reaction buffers14. After RT, the RNA/DNA hybrid duplexes were used directly for Tn5 transposase-mediated tagmentation, so second-strand cDNA synthesis could be omitted16. Because BOLT-seq eliminates the conversion of RNA transcripts to double-stranded DNA, both the experimental cost and the time needed for library preparation are reduced. Further savings were achieved by purifying Tn5 transposase and preparing its reaction buffer in-house according to the protocol published by Picelli et al.15. The tagmentation products were then used without purification in the subsequent gap-filling and PCR amplification steps. Thus, BOLT-seq reduces the total time for NGS library construction to 4 h and the cost to US $1.40 per well.
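Because every library occupies one well, the reported per-well cost scales linearly with the number of samples. The sketch below, assuming the US $1.40 figure covers all library-preparation reagents for one well and excludes sequencing and labor, converts a well count into total reagent cost and the number of 96-well plates required; the function names are illustrative only.

```python
import math

# Assumption: US $1.40 per well covers all BOLT-seq library-preparation
# reagents; sequencing and labor are not included.
COST_PER_WELL_CENTS = 140  # integer cents avoid floating-point rounding
WELLS_PER_PLATE = 96       # 96-well microtiter plate format


def library_cost_cents(n_wells: int) -> int:
    """Total reagent cost, in cents, for n_wells BOLT-seq libraries."""
    if n_wells < 0:
        raise ValueError("n_wells must be non-negative")
    return n_wells * COST_PER_WELL_CENTS


def plates_needed(n_wells: int) -> int:
    """Number of 96-well plates needed to hold n_wells reactions."""
    return math.ceil(n_wells / WELLS_PER_PLATE)


def format_usd(cents: int) -> str:
    """Render a non-negative cent amount as a dollar string, e.g. 13440 -> '$134.40'."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    n = 96
    print(n, "wells:", format_usd(library_cost_cents(n)), "on", plates_needed(n), "plate(s)")
```

Under these assumptions, one full plate of 96 libraries comes to $134.40 in reagents.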

Fig. 1: Overall BOLT-seq scheme.

Bulk transcriptOme profiling of cell Lysate in a single poT—BOLT-seq. The steps of BOLT-seq are shown schematically. Sequential reactions were carried out in a single well of a 96-well microtiter plate with no intermediate purification steps. After the final library amplification step, products are purified from each well and pooled as desired.

Performance of BOLT-seq

To compare BOLT-seq with traditional RNA-seq methods, it was first evaluated under several experimental conditions. The variables initially tested were the number of input cells per well and the presence or absence of a crowding agent, polyethylene glycol 8000 (PEG8000). HEK293T and A549 cell lines were used to compare each condition. To test the robustness of BOLT-seq, initial experiments were performed with 100, 500, or 1000 cells per well, and in-house purified Tn5 transposase was used for RNA/DNA hybrid tagmentation. For all comparisons, the number of index PCR cycles was fixed at 18.

As previously reported, the efficiency of an earlier protocol decreased when more than 1000 cells were lysed in a single well29. Therefore, the BOLT-seq libraries in this study were derived from no more than 1000 cells per well, and performance was compared using 100, 500, and 1000 cells per well. For libraries derived from 100 cells, many sequencing reads had to be discarded because they were too short (Supplementary Fig. 2a), so no further experiments were performed with 100 cells per well. More genes were detected in libraries prepared from 1000 cells than in those prepared from 500 cells, although the difference was not significant (Supplementary Fig. 2b). For consistency and comparability across reaction controls, all subsequent experiments used 1000 cells per well.
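Discarding reads that are too short is a simple length filter applied before alignment. The sketch below shows how the retained fraction could be computed; the 20-nt cutoff is a hypothetical value, since the minimum read length used in this study is not stated here.

```python
# Hypothetical cutoff: the minimum read length used in the study is not stated.
MIN_READ_LENGTH = 20


def filter_short_reads(reads, min_length=MIN_READ_LENGTH):
    """Keep reads of at least min_length nt; return (kept_reads, fraction_kept)."""
    kept = [read for read in reads if len(read) >= min_length]
    fraction_kept = len(kept) / len(reads) if reads else 0.0
    return kept, fraction_kept


if __name__ == "__main__":
    toy_reads = ["ACGT" * 10, "ACGTAC", "TTTTGGGGCCCCAAAATTTT"]  # 40, 6 and 20 nt
    kept, frac = filter_short_reads(toy_reads)
    print(f"kept {len(kept)} of {len(toy_reads)} reads ({frac:.0%})")
```

A library in which this fraction is low, as for the 100-cell input, loses much of its sequencing depth before any genes are counted.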

PEG8000 is frequently added to RT reactions as a molecular crowding agent, and it has also been used to induce conformational change and facilitate Tn5 transposase-mediated tagmentation of RNA/DNA hybrid duplexes16. In this study, gene detection did not differ significantly between BOLT-seq reactions performed with 0% and 9% PEG8000 (Supplementary Fig. 2c). However, the DNA yield after the final PCR step was significantly higher with 9% PEG8000. Therefore, all subsequent BOLT-seq reactions were performed with 9% PEG8000.
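Reaching a 9% final PEG8000 concentration is a standard dilution calculation, C1 × V1 = C2 × V2. The sketch below assumes a hypothetical 50% (w/v) PEG8000 stock and a hypothetical 20 µL reaction; neither value is given in this section.

```python
def stock_volume_ul(final_percent: float, reaction_volume_ul: float, stock_percent: float) -> float:
    """Volume of stock (µL) to add so the reaction reaches final_percent.

    Uses C1 * V1 = C2 * V2, i.e. V1 = C2 * V2 / C1.
    """
    if not 0 < final_percent <= stock_percent:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_percent * reaction_volume_ul / stock_percent


if __name__ == "__main__":
    # Hypothetical values: 50% (w/v) PEG8000 stock, 20 µL reaction, 9% final.
    v = stock_volume_ul(9.0, 20.0, 50.0)
    print(f"add {v:.1f} µL of 50% PEG8000 stock per 20 µL reaction")
```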

The reproducibility of BOLT-seq was evaluated by counting the expressed genes detected in common across three independent replicate experiments in each of two cell lines. In total, 11,150 genes were detected in all three HEK293T replicates and 10,823 genes in all three A549 replicates, representing 56.5% and 56.4%, respectively, of all genes detected in each cell line (Fig. 2a). In addition, normalized gene read counts were compared pairwise among the three replicates; correlation values ranged from 0.980 to 0.994 in HEK293T cells and from 0.988 to 0.993 in A549 cells (Fig. 2b).
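Both reproducibility measures can be sketched in a few lines. The code below assumes that the percentages refer to genes detected in every replicate divided by genes detected in any replicate, and that the pairwise values are Pearson correlations of normalized counts listed in the same gene order; whether the counts were log-transformed first is not stated, so that step is left to the caller.

```python
import math
from itertools import combinations


def shared_genes(gene_sets):
    """Genes detected in every replicate, and their share of genes detected in any replicate."""
    common = set.intersection(*gene_sets)
    detected_any = set.union(*gene_sets)
    share = len(common) / len(detected_any) if detected_any else 0.0
    return common, share


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(replicates):
    """replicates: name -> normalized counts in a shared gene order; returns r for every pair."""
    return {
        (a, b): pearson(replicates[a], replicates[b])
        for a, b in combinations(sorted(replicates), 2)
    }


if __name__ == "__main__":
    reps = [{"GAPDH", "ACTB", "MYC"}, {"GAPDH", "ACTB"}, {"GAPDH", "ACTB", "FOS"}]
    common, share = shared_genes(reps)
    print(sorted(common), f"{share:.1%}")
```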

Fig. 2: Performance of BOLT-seq.

Reproducibility of the BOLT-seq method. All experimental replicates were subsampled to 1 M reads. a Venn diagrams showing overlapping subsets of genes detected in each of three replicate BOLT-seq samples using 1000 HEK293T (left) or A549 (right) cells per well in a 96-well microtiter plate. b Pairwise comparisons of normalized gene read counts from the three replicate experiments per cell line shown in (a). Correlation values (r) for each pairwise comparison are shown in the upper left corner of each panel. Upper panels, HEK293T cells; lower panels, A549 cells. c Correlation between unique normalized ERCC read counts in two experimental replicates with HEK293T cells (1000 cells per well for BOLT-seq). Left panel, NEBNext; right panel, BOLT-seq. d Correlation between observed and expected ERCC read counts using NEBNext (left) or BOLT-seq (right). e Relationship between ERCC concentration and probe length using NEBNext (left), BOLT-seq with ERCC spiked in at 1/30 (middle), and BOLT-seq with ERCC spiked in at 1/100 (right). Undetected probes (dropouts) are indicated by black circles.
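Subsampling every replicate to the same depth (read here as 1 million reads) removes sequencing depth as a confounder when comparing gene detection. The sketch below draws reads uniformly without replacement from a gene-level count table; it is one common way to subsample, not necessarily the tool used in the study, and the function name is hypothetical.

```python
import bisect
import random
from itertools import accumulate


def subsample_counts(counts, depth, seed=0):
    """Draw `depth` reads uniformly without replacement from a gene -> read-count table."""
    total = sum(counts.values())
    if not 0 <= depth <= total:
        raise ValueError(f"depth must be between 0 and {total}")
    genes = list(counts)
    # Cumulative read totals: read index i belongs to the first gene whose bound exceeds i.
    bounds = list(accumulate(counts[g] for g in genes))
    rng = random.Random(seed)
    sampled = {}
    for read_index in rng.sample(range(total), depth):
        gene = genes[bisect.bisect_right(bounds, read_index)]
        sampled[gene] = sampled.get(gene, 0) + 1
    return sampled


if __name__ == "__main__":
    library = {"GAPDH": 1_500_000, "ACTB": 900_000, "ERCC-00002": 600_000}
    print(subsample_counts(library, 1_000_000, seed=42))
```

Genes with few reads may drop to zero after subsampling, which is why detection counts are compared only at a common depth.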

BOLT-seq data were also validated using variable amounts of a spike-in reference standard developed by the External RNA Controls Consortium (ERCC), which is sponsored by the US National Institute of Standards and Technology. For each experimental replicate in HEK293T and A549 cells, observed ERCC read counts were compared with expected ERCC amounts, and BOLT-seq data were compared with data from the NEBNext Ultra II RNA Kit. Normalized ERCC counts were highly correlated between replicates (r > 0.999) for both BOLT-seq and NEBNext (Fig. 2c, Supplementary Fig. 3a), indicating that both methods are highly reproducible. Observed ERCC counts were also highly correlated with expected ERCC amounts (Fig. 2d, Supplementary Fig. 3b), with correlation values ranging from 0.942 to 0.976 for NEBNext and from 0.924 to 0.950 for BOLT-seq. These results show that BOLT-seq quantifies spike-in transcripts nearly as accurately as NEBNext. Transcript detection efficiency was also evaluated by plotting ERCC probe concentration against probe length for each method; the results indicate that detection does not depend on probe length. Dropouts (black circles in Fig. 2e) occurred at low ERCC probe concentrations, independent of probe length. In BOLT-seq replicates, the ERCC probe mix was spiked in at dilutions of 1:30 or 1:100, and low-concentration ERCC probes were detected and sequenced more reliably at 1:30 than at 1:100. These results indicate that neither probe length nor concentration introduces systematic sequencing bias in BOLT-seq, apart from the expected dropout of the least abundant probes. Thus, BOLT-seq performs comparably to traditional mRNA-seq methods while being simpler, less labor intensive, and less expensive.
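The observed-versus-expected comparison and the dropout calls can be expressed together. The sketch below assumes that correlation is computed on log2-transformed values over detected probes only and that a probe with zero reads is a dropout; the study's exact transformation is not specified here, and the probe IDs in the test are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ercc_concordance(expected, observed):
    """Compare observed ERCC read counts with expected spike-in amounts.

    expected: ERCC id -> expected amount; observed: ERCC id -> read count.
    Returns (Pearson r of log2 values over detected probes, sorted list of dropouts).
    """
    dropouts = sorted(k for k in expected if observed.get(k, 0) == 0)
    detected = [k for k in expected if observed.get(k, 0) > 0]
    if len(detected) < 2:
        raise ValueError("need at least two detected probes")
    x = [math.log2(expected[k]) for k in detected]
    y = [math.log2(observed[k]) for k in detected]
    return pearson(x, y), dropouts
```

Lowering the spike-in dilution from 1:100 to 1:30 raises every expected amount, so fewer probes fall to zero reads and the dropout list shrinks.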

Characterization of M-MuLV reverse transcriptase purified in-house

BOLT-seq uses in-house purified M-MuLV RT instead of a commercially available RT to lower the cost of library preparation. The quality of the in-house enzyme was assessed by comparing the number of differentially expressed (DE) genes detected by BOLT-seq with in-house M-MuLV RT, commercially available Maxima™-H RT, or SuperScript™ IV (SSIV) RT. These data were also compared with data generated with NEBNext. Venn diagrams of the DE genes detected under each condition are shown in Supplementary Fig. 4a. In-house M-MuLV RT detected fewer DE genes than SSIV RT but more than Maxima™-H RT. Therefore, although in-house M-MuLV RT is not the best-performing enzyme, it performs comparably to commercially available M-MuLV RT preparations. We also compared the log2-fold changes (lfc) of DE genes obtained with BOLT-seq and NEBNext (Supplementary Fig. 4b): lfc values from BOLT-seq and NEBNext were highly correlated (r = 0.961), as were lfc values from BOLT-seq with in-house M-MuLV RT and NEBNext (r = 0.944). These results suggest that the activity of in-house M-MuLV RT is comparable to that of commercial RT preparations. Thus, in-house M-MuLV RT can be used without compromising data quality while considerably reducing the cost of BOLT-seq library construction.
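A log2-fold change is the log ratio of a gene's normalized expression between two conditions. The sketch below uses a naive ratio with a pseudocount of 1 (an assumption) so that genes with zero counts stay finite; DESeq2 instead estimates lfc from a negative binomial model, so its values will differ, particularly for low-count genes.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2((treated + pc) / (control + pc)) from normalized counts.

    Only genes present in both mappings are returned.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
        if gene in control
    }


if __name__ == "__main__":
    # Toy normalized counts for two conditions (hypothetical values).
    condition_a = {"GENE1": 7.0, "GENE2": 1.0}
    condition_b = {"GENE1": 1.0, "GENE2": 7.0}
    print(log2_fold_change(condition_a, condition_b))
```

Computing lfc this way for the same genes under two protocols (for example, two RT enzymes) yields two vectors whose Pearson correlation summarizes how well the protocols agree.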

Comparing BOLT-seq to other RNA-seq methods

Next, the performance of BOLT-seq was compared with that of TRACE-seq30 and NEBNext. For these experiments, 200 ng of total RNA purified from HEK293T and A549 cells was used for library construction with TRACE-seq or NEBNext, whereas BOLT-seq was performed with 1000 HEK293T or A549 cells per well. For each method, DE genes between HEK293T and A549 cells were identified using DESeq2, with p < 5 × 10⁻⁶ and |log2-fold change| > 1 as cutoff criteria; the results are shown in the Venn diagrams in Fig. 3a. TRACE-seq and NEBNext detected 2288 and 3296 DE genes, respectively. A total of 1007 DE genes were detected by both NEBNext and BOLT-seq, representing 85% of all DE genes detected by BOLT-seq (Fig. 3a, left), and 997 DE genes were detected by both TRACE-seq and BOLT-seq, representing 84.1% of all DE genes detected by BOLT-seq (Fig. 3a, right). Although BOLT-seq detects far fewer DE genes than the canonical RNA-seq methods, more than 80% of the DE genes it detects are also detected by TRACE-seq and NEBNext. In addition, the lfc values of DE genes detected by both BOLT-seq and NEBNext or TRACE-seq were highly correlated (r > 0.94 for both; Fig. 3b, left and right). Therefore, the reproducibility and performance of BOLT-seq are comparable to those of established RNA-seq methods.
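Calling DE genes and measuring their overlap reduces to filtering a results table and intersecting sets. The sketch below assumes the per-gene log2 fold change and p value have already been produced elsewhere (for example, the `log2FoldChange` and `pvalue` columns of a DESeq2 results table); the cutoffs are parameters because the text uses p < 5 × 10⁻⁶ whereas the Fig. 3 caption lists p-adj < 0.05.

```python
def call_de_genes(results, p_cutoff=5e-6, lfc_cutoff=1.0):
    """results: gene -> (log2 fold change, p value). Keep genes with p < p_cutoff and |lfc| > lfc_cutoff."""
    return {
        gene
        for gene, (lfc, p) in results.items()
        if p is not None and p < p_cutoff and abs(lfc) > lfc_cutoff
    }


def shared_fraction(reference, other):
    """Fraction of the DE genes in `reference` that are also DE in `other`."""
    return len(reference & other) / len(reference) if reference else 0.0


if __name__ == "__main__":
    toy = {"G1": (2.0, 1e-7), "G2": (0.5, 1e-8), "G3": (-1.5, 1e-9)}
    print(sorted(call_de_genes(toy)))
```

Applied to the numbers reported above, both overlaps imply that BOLT-seq called roughly 1185 DE genes (1007/0.85 ≈ 1185 and 997/0.841 ≈ 1185).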

Fig. 3: Comparison between BOLT-seq and the canonical RNA-seq method.

Comparison between BOLT-seq and the canonical TRACE-seq and NEBNext methods. All replicates were subsampled to 1 M reads (NEBNext n = 3, TRACE-seq n = 3, BOLT-seq n = 3). a Venn diagrams show overlapping subsets of DE genes detected with NEBNext and BOLT-seq (left) or TRACE-seq and BOLT-seq (right). NEBNext and TRACE-seq were performed using 200 ng of total RNA purified from HEK293T or A549 cells. BOLT-seq was performed using the lysates of 1000 HEK293T or A549 cells. b Correlation between the log2-fold change (lfc) of DE genes detected by NEBNext and BOLT-seq (left) or TRACE-seq and BOLT-seq (right). Cutoff criteria: |lfc| > 1, p-adj < 0.05.

Application of BOLT-seq

Because reagents are added sequentially to a single tube without purifying the reaction products between steps, BOLT-seq reduces the cost and time needed for large-scale NGS library preparation. Here, a proof-of-concept experiment was performed to show that BOLT-seq can be used in a large-scale screen for drug-induced perturbations of gene expression in NCI-H358 cells, a KRAS G12C mutant non-small cell lung carcinoma cell line. The cells were exposed to 35 drugs [Supplementary Table 3] in triplicate on two experimental days, with 9 replicates of the DMSO control, generating 213 data points per sample (Fig. 4a). NGS libraries were prepared with BOLT-seq, gene expression profiles were analyzed with the bioinformatic pipelines of the DRUG-seq method13,31,32, and unsupervised clustering was then used to group samples with similar drug-induced perturbations33,34 (Fig. 4b). Clusters appeared to reflect the type of drug (Supplementary Fig. 5a) but were not influenced by the day of drug treatment (Supplementary Fig. 5b). For example, Cluster 1 included single drugs and drug combinations that target the mitogen-activated protein kinase (MAPK) pathway, which includes KRAS and MEK. Drugs that inhibit KRAS or MEK suppress growth and induce apoptosis in NCI-H358 cells33,34,35. AMG510 targets KRAS36, whereas trametinib targets MEK34; both proteins are components of the MAPK pathway. Because AMG510 and trametinib act on the same pathway, they were expected to cluster together. To confirm this, we used the DESeq normalization method21 and analyzed normalized expression values and adjusted p values for AMG510- and trametinib-perturbed samples. The 20 most significant genes for these drugs were then identified as those with the lowest adjusted p values (Fig. 4c). BOLT-seq data were also used to identify genes in the ERK pathway37,38, such as FOSL1 and CCND1, in cells treated with AMG510 and trametinib.
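DESeq normalization divides each sample's counts by a size factor estimated with the median-of-ratios method, and the top genes are then ranked by adjusted p value. The sketch below implements median-of-ratios size factors and the top-20 ranking in plain Python; it assumes adjusted p values were computed elsewhere (for example, by DESeq2, which reports missing values for some genes) and does not reproduce the DRUG-seq pipeline.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors as used by DESeq.

    counts: sample -> {gene: raw read count}; all samples share the same genes.
    """
    samples = list(counts)
    genes = list(counts[samples[0]])
    # Genes with a zero in any sample have no finite log geometric mean; skip them.
    usable = [g for g in genes if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene is detected in every sample")
    log_geo_mean = {g: statistics.fmean(math.log(counts[s][g]) for s in samples) for g in usable}
    return {
        s: math.exp(statistics.median(math.log(counts[s][g]) - log_geo_mean[g] for g in usable))
        for s in samples
    }


def normalize(counts):
    """Divide each sample's raw counts by its size factor."""
    factors = size_factors(counts)
    return {s: {g: c / factors[s] for g, c in genes.items()} for s, genes in counts.items()}


def top_genes_by_padj(padj, n=20):
    """Return the n genes with the lowest adjusted p values, skipping missing (None/NaN) values."""
    valid = [(p, g) for g, p in padj.items() if p is not None and not math.isnan(p)]
    return [g for p, g in sorted(valid)[:n]]
```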
Gene set enrichment analysis (GSEA)27 was performed to analyze the perturbation of gene expression in NCI-H358 cells treated with AMG510 or trametinib. The gene sets used (see Methods) were derived from gene expression results for NCI-H358 cells treated with ARS1620 and trametinib. GSEA confirmed that AMG510 and trametinib downregulate KRAS-related gene signatures, including those for E2F transcription factors, the MYC regulatory network, ERK activation, and KRAS dependency22,23,24,25,26 (Fig. 4d).
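At the core of GSEA27 is a weighted running-sum enrichment score: walking down genes ranked by a differential-expression metric, the sum rises at genes in the set and falls at genes outside it, and the score is the largest deviation from zero. The sketch below computes only that statistic; it omits the permutation test and normalized enrichment score of the full method, and the ranking metric is an assumption left to the caller.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (ES).

    ranked: list of (gene, metric) pairs sorted from most up- to most down-regulated.
    gene_set: set of gene names. Returns the running-sum value with the largest
    magnitude; a negative ES means the set is concentrated at the down-regulated end.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(metric) ** weight for (_, metric), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the ranked list and not cover all of it")
    running, es = 0.0, 0.0
    for (_, metric), hit in zip(ranked, hits):
        # Hits add their weighted share of the set; misses subtract a uniform penalty.
        running += abs(metric) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es
```

A gene set that is downregulated by a drug, such as a KRAS dependency signature under KRAS or MEK inhibition, would be expected to give a negative score.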

Fig. 4: Application of BOLT-seq.

a Schematic diagram of drug screening with BOLT-seq. b Uniform manifold approximation and projection (UMAP) clustering of data for 35 drugs × 2 days × three replicates (DMSO n = 9). c Top 20 significant DE genes detected in cells exposed to AMG510 or Trametinib. d Gene Set Enrichment Analysis for AMG510 and Trametinib.

In summary, BOLT-seq is a novel RNA-seq method that facilitates large-scale transcriptome profiling and drug screening experiments by dramatically reducing the labor, time, and cost of library construction relative to canonical methods. The present study also demonstrates that BOLT-seq performs as well as canonical RNA-seq methods.