Circular single-stranded DNA as a programmable vector for gene regulation in cell-free protein expression systems – Nature Communications

Gene expression of CssDNA in the CFE system

To evaluate the performance of the CssDNA vector in the CFE system, we designed two customized versions of CssDNA, CssDNA(+) and CssDNA(−), containing a T7 promoter region and an enhanced green fluorescent protein (EGFP) coding region. CssDNA(+) presented the expression cassette encoding the EGFP sense strand from 5’ to 3’ (Fig. 1a, left). In contrast, CssDNA(−) contained the complementary sequence of the expression cassette, encoding the EGFP antisense strand in the 3’ to 5’ direction (Fig. 1a, right). We created these customized CssDNA vectors using a pScaf phagemid containing an M13 origin of replication (M13 ori) and a mutant M13 ori (Supplementary Fig. 2)47. To characterize the CssDNA, we compared it to its corresponding plasmid using agarose gel electrophoresis and atomic force microscopy (AFM). As shown in Fig. 1b, the CssDNA lane (1605 nt) showed a single band that migrated faster than the plasmid, indicating the high purity of the CssDNA produced. The AFM images of CssDNA displayed a curled structure resembling a ball of wool, whereas the plasmid appeared more stretched, reflecting the higher flexibility of single-stranded DNA compared with double-helical DNA (Fig. 1c)48.

Fig. 1: Characterization of CssDNA and regulation of its protein expression in the CFE system.

a Design schematics of CssDNA vectors, where colors depict positions of vector features such as T7 promoter (red), EGFP (green) and M13 ori (orange). In sense circular single-stranded DNA (CssDNA(+), left), the coding sequence of the expression cassette is presented from 5’ to 3’. In antisense circular single-stranded DNA (CssDNA(−), right), the template sequence of the expression cassette is in the reverse direction. b 1% agarose gel analysis of CssDNA(+) and its corresponding plasmid (S-plasmid), CssDNA(−) and its corresponding plasmid (AS-plasmid). c AFM images of CssDNA (left) and plasmid (right), scalebar 200 μm. d Schematic of gene expression of CssDNA(+) or CssDNA(−) vector in the CFE system. The CFE system contains all the essential components for CssDNA gene expression, with some representative components (not all) shown in the dashed box. e, f Changes in fluorescence signal over time for CssDNA(+) (e) and CssDNA(−) (f) vectors (5 ng/μL) during expression relative to the blank group. g Schematic representation of the regulation of CssDNA(+) or CssDNA(−) vector gene expression by dNTPs and aphidicolin in the CFE system. h, i Changes in protein expression levels over time for CssDNA(+) (h) and CssDNA(−) (i) vectors (5 ng/μL) in the presence of dNTPs or aphidicolin. All fluorescence signals were normalized according to the fluorescence intensity of the highest expression level of the corresponding expression vector. The maximum rate constant of the fluorescence kinetic curves in e, f, h and i was obtained by taking the first derivative of the corresponding curve, as shown in Supplementary Fig. 4. Data collected in e, f, h and i were monitored by a microplate reader and are presented as mean ± standard deviation (s.d.) for n = 3 biologically independent experiments, source data provided.

In this study, we utilized a commercially available cell-free gene expression system based on yeast extract49. We tracked CssDNA protein expression by measuring EGFP fluorescence intensity in real time with a microplate reader. Our results show that the target protein can be effectively expressed in the CFE system from both CssDNA(+) and CssDNA(−) gene vectors (Fig. 1d, Supplementary Fig. 3). The production of EGFP from both vectors gradually increased with reaction time before reaching a plateau (Fig. 1e, f). CssDNA(+) exhibited an EGFP fluorescence curve comparable to that of CssDNA(−), but CssDNA(−) took less time to reach the plateau, suggesting that its expression rate was faster than that of CssDNA(+). We extracted the maximum expression rate constants from the fluorescence curves to quantitatively compare the two CssDNA vectors (Supplementary Fig. 4). When the protein expression level of the CssDNA vector reached a plateau (i.e., after expression had stopped), we also purified the produced protein and quantified its yield (Supplementary Fig. 5). In addition, we compared the expression level of the CssDNA template with that of a traditional expression template (plasmid) and optimized the cell-free expression system to improve the expression level of CssDNA, as shown in Supplementary Fig. 6.
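The rate extraction described above (normalizing each fluorescence curve to its highest value and taking the maximum of its first derivative) can be illustrated with a minimal Python sketch. It assumes a central finite-difference approximation of the derivative; the function names and the data points are hypothetical and are not taken from the study or its source data.

```python
from typing import List, Sequence, Tuple


def normalize_to_max(signal: Sequence[float]) -> List[float]:
    """Scale a fluorescence trace so that its highest value equals 1."""
    peak = max(signal)
    if peak <= 0:
        raise ValueError("signal must contain a positive value")
    return [s / peak for s in signal]


def max_expression_rate(
    times: Sequence[float], signal: Sequence[float]
) -> Tuple[float, float]:
    """Return (maximum slope, time at which it occurs).

    The slope is a central finite difference, which is an assumption of
    this sketch, not necessarily the procedure used by the authors.
    """
    if len(times) != len(signal) or len(times) < 3:
        raise ValueError("need at least three paired time/signal points")
    best_rate, best_time = float("-inf"), times[0]
    for i in range(1, len(times) - 1):
        rate = (signal[i + 1] - signal[i - 1]) / (times[i + 1] - times[i - 1])
        if rate > best_rate:
            best_rate, best_time = rate, times[i]
    return best_rate, best_time


# Hypothetical, made-up trace (time in hours, fluorescence in arbitrary units).
times = [0, 1, 2, 3, 4, 5, 6]
raw = [0, 10, 40, 120, 180, 198, 200]

normalized = normalize_to_max(raw)
rate, time_at_max = max_expression_rate(times, normalized)
print(f"maximum rate: {rate:.3f} per hour at t = {time_at_max} h")
```

Comparing the two vectors would then amount to calling `max_expression_rate` on each normalized curve and comparing the returned slopes.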

Effects of different expression components for CssDNA in CFE systems

To better understand the expression of CssDNA in the CFE system, we examined the influence of various expression components on CssDNA expression. Initially, we supplemented the original system with deoxynucleoside triphosphates (dNTPs) as substrates for DNA synthesis (Fig. 1g, upper panel). In living organisms, dNTPs are utilized by DNA polymerases as substrates for genome replication, and these polymerases are responsible for the replication and repair of cellular DNA50,51. After dNTPs were added, we observed a considerable increase in EGFP levels from both CssDNA(+) and CssDNA(−) vectors, which suggests that dNTPs effectively promote CssDNA gene expression in the CFE system (Fig. 1h, i, red, Supplementary Fig. 7). The speed of protein expression remained faster for CssDNA(−) than for CssDNA(+). We next investigated the impact of aphidicolin, a tetracyclic diterpenoid, on gene expression in this system (Fig. 1g, bottom panel). Aphidicolin is a DNA polymerase inhibitor that blocks cellular DNA synthesis by disrupting DNA polymerase activity52. The fluorescence signal decreased by over 50% after adding aphidicolin, indicating that aphidicolin effectively represses protein expression from both CssDNA vectors (Fig. 1h, i, gray, Supplementary Fig. 7, 8). We also examined the effect of aphidicolin on the plasmid vector and found that, in contrast to its effect on both CssDNA vectors, aphidicolin had no effect on plasmid expression (Supplementary Fig. 8). Based on these findings, we suggest that the expression of CssDNA is linked to DNA synthesis by DNA polymerases. Different expression components can therefore alter the level of CssDNA protein expression, which establishes the foundation for the regulation of CssDNA gene expression.

Effects of T7 promoter region for CssDNA in CFE systems

The T7 promoter region is a specific DNA sequence that is recognized by T7 RNAP and initiates transcription. The integrity of this promoter domain is a prerequisite for RNA transcription53. To explore the role of the T7 promoter sequence in the CssDNA vector during gene expression in the CFE system, we added T7 complementary strands to the CssDNA(+) and CssDNA(−) vectors, respectively (Fig. 2a). Figure 2b, c shows that the addition of T7 complementary strands corresponding to CssDNA(+) had no effect on CssDNA(+) expression, whereas the addition of T7 complementary strands to CssDNA(−) significantly improved CssDNA(−) protein expression (Supplementary Fig. 9, 10, 15). The gene expression level of CssDNA(−) responded strongly to the presence of T7 complementary strands, with the final EGFP fluorescence intensity increasing to three times the original. The promoting effect of the T7 complementary strands was also demonstrated by the maximum rate constants of the corresponding fluorescence curves (Supplementary Fig. 11). We also tested the effect of the T7 complementary strand on the plasmid vector (Supplementary Fig. 12). As expected, the addition of the T7 complementary strand did not have any effect on plasmid expression. In addition, we observed the level of protein expression when both T7 complementary strands and aphidicolin were present (Supplementary Fig. 10). For the CssDNA(+) vector, gene expression was significantly inhibited under these conditions, similar to the effect of aphidicolin acting alone. Conversely, the inhibitory effect of aphidicolin on CssDNA(−) almost disappeared when T7 complementary strands were present. In other words, the promoting effect of T7 complementary strands on CssDNA(−) was not affected by aphidicolin.

Fig. 2: The role of the T7 promoter region in CssDNA gene expression.

a Schematic diagram illustrating how DNA strands complementary to corresponding CssDNA T7 promoter (T7 complementary strands) influence CssDNA gene expression levels. b, c Fluorescence signal changes over time for CssDNA(+) + T7 and CssDNA(-) + T7 (CssDNA bound to its T7 complementary strands), respectively, as compared to CssDNA(+) and CssDNA(−) alone. d Illustration of the binding of CssDNA(−) to T7 complementary strands of varying lengths, with the red section representing the 27 bp T7 promoter on CssDNA(-). e The changes in CssDNA(−) protein expression over time after the addition of T7 complementary strands of different lengths. The maximum rate constant of the fluorescence kinetic curves in b, c and e was obtained by taking the first derivative of the corresponding curve, as shown in Supplementary Fig. 11. f 1.5% agarose gel analysis of mRNA obtained by in vitro transcription of CssDNA(−) that combined with different T7 complementary strands, as well as CssDNA(+). g Fluorescence intensity of CssDNA(-) bound to T7 complementary strands during the protein expression plateau in the absence and presence of aphidicolin compared to CssDNA(−). h Schematic representation of the positions of the complementary strands on CssDNA(−), with the P5 region containing 19 bp T7 promoter sequence and the P4 region containing 8 bp sequence of the 27 bp T7 promoter front end. i Fluorescence intensity of CssDNA treated with or without other complementary strands during the expression plateau is shown. The CssDNA vector used here was 5 ng/μL. All fluorescence signals were normalized based on the fluorescence intensity of the corresponding CssDNA expression plateau. Data collected in b, c, e, g and i were monitored by a microplate reader and are presented as mean ± standard deviation (s.d.) for n = 3 biologically independent experiments, individual data points in g and i are overlaid, source data provided.

To investigate the function of T7 complementary strands of different lengths on CssDNA(−) gene expression and to determine the length of accessible T7 promoter required, six variants were designed to complement 9 nt, 13 nt, 17 nt, 19 nt, 23 nt and 27 nt of the CssDNA(−) promoter domain, respectively (Fig. 2d). The T7 promoter sequence used in this study was derived from the commercial optimized plasmid template pD2P, which contains 27 base pairs of the T7 promoter. The fluorescence curves of EGFP showed that T7 complementary strands of all lengths had a marked promoting effect on CssDNA(−) expression, with consistent fluorescence enhancement rates (Fig. 2e, Supplementary Fig. 13). An in vitro RNA transcription kit was then used to further confirm how T7 complementary strands enhance CssDNA(−) gene expression. The gel showed that CssDNA(−) bound to T7 complementary strands of different lengths can be transcribed into mRNA, whereas the transcription capacity of CssDNA(−) and CssDNA(+) alone was negligible (Fig. 2f). To further quantify the amount of transcribed RNA, we determined the gray value intensity of the gel electrophoresis bands and also measured the RNA concentration via UV absorbance (Supplementary Fig. 14). The results show that the transcription capacity of CssDNA(−) + T7 (CssDNA(−) bound to a T7 complementary strand) decreased with decreasing T7 complementary strand length. The similar yield and rate of protein expression are due to the presence of excess template in the reaction system; that is, the amount of RNA transcribed from CssDNA(−) + T7 is much greater than the amount of RNA required for translation. This implies that the promoting effect of the T7 complementary strands can be attributed to transcription initiation and that a partially double-stranded CssDNA(−), formed by binding to its T7 complementary strands, can directly trigger transcription, even if it contains an accessible T7 promoter of only 9 bp54,55,56,57,58. The inhibition of CssDNA(−) expression by aphidicolin in the presence of different T7 complementary strands was negligible compared to CssDNA(−) alone, suggesting that the RNA transcription process was not affected by aphidicolin, consistent with the above results (Fig. 2g, Supplementary Fig. 16).
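Designing complementary strands of different lengths comes down to taking the reverse complement of a sub-region of the single-stranded target. The Python sketch below illustrates this under stated assumptions: the 27-nt `PROMOTER_REGION` string is a hypothetical placeholder, not the pD2P-derived sequence used in the study, and truncating from the 5’ end of the region is an arbitrary choice, since the text does not state which end of the promoter each variant covers.

```python
from typing import Dict, Iterable

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementary_strands(target: str, lengths: Iterable[int]) -> Dict[int, str]:
    """Map each length n to a strand complementary to the first n nt of target.

    Which end of the promoter each variant covers is an assumption here.
    """
    strands = {}
    for n in lengths:
        if not 0 < n <= len(target):
            raise ValueError(f"length {n} is outside the {len(target)}-nt target")
        strands[n] = reverse_complement(target[:n])
    return strands


# Hypothetical 27-nt placeholder for the promoter region on the CssDNA.
PROMOTER_REGION = "GCGAAATTAATACGACTCACTATAGGG"

strands = complementary_strands(PROMOTER_REGION, [9, 13, 17, 19, 23, 27])
for length, strand in strands.items():
    print(f"{length:2d} nt: 5'-{strand}-3'")
```

Swapping in the real promoter sequence and the intended truncation scheme would only require changing `PROMOTER_REGION` and the slice inside `complementary_strands`.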

To investigate the impact on gene expression of DNA strands complementary to domains outside the CssDNA(−) T7 promoter, we designed another thirteen 19-nt DNA strands (denoted by P1, P2, P3…P14, respectively) around the T7 promoter region to hybridize with CssDNA(−) (Fig. 2h). Among them, P1-P4 were located in front of the T7 promoter region, and P6-P14 were located behind the T7 promoter. The results showed that when regions P4 or P5, which contain part of the T7 promoter sequence, were present in double-stranded form, gene expression was enhanced, while the other complementary strands had no impact on protein expression (Fig. 2i, Supplementary Fig. 17). Therefore, adding DNA strands that complement the T7 promoter in the CFE system is an efficient way to regulate CssDNA(−) gene expression, because it enables CssDNA(−) to directly initiate transcription and thereby enhances gene expression.

Gene expression pathways of the CssDNA vector in the CFE system

Based on the results in Fig. 1 and Fig. 2, we speculated that the expression of both CssDNA(+) and CssDNA(−) vectors is related to DNA synthesis, but that the CssDNA(−) expression pathway differs from that of CssDNA(+), as it may also involve direct transcription. To test this hypothesis, we conducted experiments using a series of CssDNA vector concentrations ranging from 0.5 ng/μL to 15 ng/μL and monitored the corresponding EGFP expression reactions. We used aphidicolin to inhibit DNA replication and added T7 complementary strands to simulate the partially hybridized intermediates of DNA replication. Simulating these intermediate steps of CssDNA expression made the expression pathway more explicit.

In the case of the CssDNA(+) vector, EGFP production initially increased and then decreased as the vector concentration increased while all other reaction components were fixed. The maximum EGFP yield occurred at a vector concentration of 2 ng/μL (Fig. 3a, orange, Supplementary Fig. 18). Protein expression decreased at CssDNA(+) vector concentrations above 2 ng/μL, potentially due to a resource-sharing effect that results from a lack of substrates for DNA synthesis to generate complete double-stranded DNAs. The addition of aphidicolin to the system disrupted DNA polymerase activity, interfering with the conversion of single-stranded DNA into double-stranded DNA and thus significantly reducing protein production regardless of the CssDNA(+) vector concentration (Fig. 3a, light green, Supplementary Fig. 19). This showed that expression from incomplete circular double-stranded DNA (i.e., when the template strand is incomplete) produced by CssDNA(+) replication was negligible. Unsurprisingly, the addition of T7 complementary strands to the CssDNA(+) vector at any concentration did not significantly enhance CssDNA(+) gene expression (Fig. 3b, dark green, Supplementary Fig. 20). Consistent with prior results, EGFP yields in the presence of both T7 complementary strands and aphidicolin were as low as those in the presence of aphidicolin alone (Fig. 3b, gray, Supplementary Fig. 20). The trend of CssDNA(+) protein expression across concentrations is displayed in Fig. 3c, indicating that complete double-stranded DNA synthesis via DNA replication is necessary for CssDNA(+) vector gene expression, followed by mRNA transcription and protein translation. In other words, protein expression from CssDNA(+) requires the synthesis of the complete complementary strand to act as a transcription template. This appears to be the sole pathway of CssDNA(+) gene expression (Fig. 3g, upper panel).

Fig. 3: Gene expression processes of two types of CssDNA vectors in the CFE system.

a, d EGFP fluorescence intensity of different concentrations of CssDNA(+) or CssDNA(−) vector produced at the protein expression plateau, compared to the corresponding CssDNA in the presence of aphidicolin. b, e Fluorescence intensity of EGFP produced by different concentrations of CssDNA(+) or CssDNA(−) when only T7 complementary strands are present or when T7 complementary strands coexist with aphidicolin. All fluorescence signals in a and b, d and e were normalized based on the average fluorescence of the corresponding CssDNA expression plateau at a concentration of 2 ng/μL. Data collected in a, b, d and e were monitored by a microplate reader and are presented as mean ± standard deviation (s.d.) for n = 3 biologically independent experiments, individual data points are overlaid, source data provided. c, f The trend of protein expression level changes with the concentration of CssDNA(+) or CssDNA(−) vectors. g Schematic representation of the different protein expression pathways of CssDNA(+) and CssDNA(−) vectors.

Interestingly, we observed a bimodal distribution of protein expression efficiency as a function of DNA concentration for the CssDNA(−) vector. Based on Fig. 3d, within the low concentration range from 0.5 ng/μL to 5 ng/μL, the optimal CssDNA(−) vector concentration for EGFP production was 2 ng/μL. EGFP production increased with concentration below 2 ng/μL and decreased above 2 ng/μL, similar to the behavior of the CssDNA(+) vector. By contrast, at concentrations above 5 ng/μL, the gene expression level of the CssDNA(−) vector increased with increasing vector concentration (Supplementary Fig. 21). Another distinction between CssDNA(−) and CssDNA(+) was the effect of aphidicolin on the expression level at different concentrations of CssDNA(−) vector. Although aphidicolin significantly inhibited CssDNA(−) gene expression, EGFP levels tended to increase with increasing vector concentration after the addition of aphidicolin, rather than remaining at similarly low levels as for CssDNA(+) (Fig. 3d, light green, Supplementary Fig. 22). In sum, at higher CssDNA(−) concentrations, expression was increased and the inhibitory effect of aphidicolin was weak. We have demonstrated that the inhibitory effect of aphidicolin acts on the replication process, but not on transcription (Figs. 1 and 2). Therefore, we conclude that not all expression of CssDNA(−) requires a complete replication process and that a direct transcription process is present. We thus surmise that CssDNA(−) expression in the presence of aphidicolin is primarily due to direct transcription from incomplete DNA replication intermediates (i.e., partially hybridized intermediates of DNA replication, in which the T7 promoter is present in double-stranded form), which is a gene expression pathway that differs from that of CssDNA(+). This is further supported by the similar concentration-dependent expression trends of CssDNA(−) and of CssDNA(−) bound to T7 complementary strands in the presence of aphidicolin (Fig. 3d, e, light green and gray). Similarly, the fluorescence intensity of EGFP monitored after the addition of T7 complementary strands was stronger than that of the corresponding CssDNA(−) alone, indicating that RNA transcription was activated (Fig. 3e, dark green, Supplementary Fig. 23). We observed that at low concentrations (especially below 2 ng/μL), the promotion of CssDNA(−) expression by the T7 complementary strand was weak, whereas at high concentrations, it was stronger (Fig. 3e, orange and dark green, Supplementary Fig. 23). Aphidicolin also had only a slight effect on gene expression of CssDNA(−) hybridized to T7 complementary strands (CssDNA(−) + T7) when CssDNA(−) was at high concentrations, which is consistent with our previous results (Fig. 2g). In contrast, the inhibitory effect was evident when CssDNA(−) + T7 was present at low concentrations (particularly below 2 ng/μL), which is similar to the effect on CssDNA(−) (Fig. 3e, dark green and gray, Supplementary Fig. 23). The differential promotion of CssDNA(−) by the T7 complementary strand and the differential inhibition of CssDNA(−) + T7 by aphidicolin at different concentrations of CssDNA(−) vector indicate that there are two expression pathways for CssDNA(−). Depending on the vector concentration, these two pathways contribute differently to the overall expression level for a given set of reaction components (Fig. 3f). The dominant pathway at low vector concentration appears to be complete DNA replication, whereas at high vector concentration, the dominant pathway is transcription after incomplete replication (Fig. 3g, bottom panel). These results confirmed our hypothesis about the expression fate of the two CssDNA vectors in the CFE system and deepened our understanding of CssDNA as a programmable vector for the CFE system.

Construction of logic gates in CFE systems using CssDNA

Having characterized the performance and gene expression pathways of two CssDNA vectors in the CFE system, we can now use them as components for the implementation of logic gates for gene regulation and biological computing. Due to the fast and strong response of CssDNA(−) to regulatory factors, particularly T7 complementary strands, we focused on CssDNA(-) as a logic element for logic gates (Fig. 4a). To achieve this, a set of ssDNAs were designed as inputs, and the fluorescence of EGFP was used as an output signal. Informed by our experiments on the expression pathways of CssDNA(−), we hypothesized that the addition of aphidicolin would inhibit replication-mediated gene expression of CssDNA(−), thereby improving the signal-to-noise ratio of logic gates (Fig. 4b).

Fig. 4: Construction of logic gates using CssDNA as a logic element.

a A two-input logic gate was constructed using CssDNA(−) as the logic element and protein as the output. b Schematic illustration of the addition of aphidicolin to reduce the background signal from CssDNA(-) self-expression and to improve the signal-to-noise ratio of the logic gates. c-e Schematic and fluorescence signals of two-input logic gates under different input combinations, including OR, INHIBIT (INH) and NOR. All fluorescence signals in c, d and e were normalized according to the fluorescence intensity of the corresponding initial gate structure expression plateau under no input conditions. Data collected in c, d and e were monitored by a microplate reader and are presented as mean ± standard deviation (s.d.) for n = 3 biologically independent experiments, individual data points are overlaid, source data provided.

In Fig. 2, we screened a variety of DNA strands complementary to the CssDNA(−) vector and found several options for the design of inputs for an OR gate. From these variants, we selected complementary strands of 17 nt and 23 nt as input 1 and input 2, respectively. In the absence of both inputs, the CssDNA(−) vector produced EGFP at a low yield, reflecting an initial state of very low fluorescence. When one or both inputs were present, the fluorescence was at a high level due to the increased gene expression (Fig. 4c, Supplementary Fig. 24). As anticipated, the addition of aphidicolin to the reaction system significantly improved the signal-to-noise ratio, doubling the ratio between the 1 and 0 levels (Fig. 4c, green).

To construct an INHIBIT (INH) gate, two alternative inputs were designed. Input 1, which contains the T7 complementary sequence that hybridizes to CssDNA(−) and a 3’ flanking toehold sequence, can enhance gene expression. Input 2, which contains the fully complementary sequence to input 1, can displace input 1 from CssDNA(−) via a toehold-mediated strand displacement reaction. Thus, the concurrent presence of both inputs is expected to diminish fluorescence. As displayed in Fig. 4d, high fluorescence was only evident when input 1 was present alone; otherwise, fluorescence was low, which demonstrated that the INH logic gate was successfully implemented. In the presence of aphidicolin, the fluorescence intensity of output 1 was up to 10 times higher than that of output 0 (Fig. 4d, green, Supplementary Fig. 25). To design a NOR gate, we altered the initial gate structure by annealing the CssDNA(−) with a longer T7 complementary strand that had toehold sequences at both ends (Supplementary Fig. 26). Either input could release the CssDNA(−) from the gate structure via the toehold at the 5’ or 3’ end, respectively. Accordingly, fluorescence was high only in the absence of any input, but low in the presence of one or both inputs (Fig. 4e, Supplementary Fig. 27). The signal-to-noise ratio was also improved upon the addition of aphidicolin (Fig. 4e, green). In addition, we constructed NAND and AND gates to demonstrate the general applicability of our approach (Supplementary Fig. 28). The implementation of logic gates further confirmed that CssDNA can serve as a programmable vector for gene regulation in a cell-free system.
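The expected input/output behavior of these gates can be summarized as Boolean truth tables. The Python sketch below is an idealized illustration only: it treats fluorescence as a binary output (1 = high, 0 = low), and the helper names and example fluorescence values are hypothetical, not measurements from the study.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Idealized Boolean definitions; input 1 = a, input 2 = b.
GATES: Dict[str, Callable[[bool, bool], bool]] = {
    "OR": lambda a, b: a or b,           # high if either strand is present
    "INH": lambda a, b: a and not b,     # high only with input 1 alone
    "NOR": lambda a, b: not (a or b),    # high only without any input
    "AND": lambda a, b: a and b,
    "NAND": lambda a, b: not (a and b),
}


def truth_table(gate: str) -> Dict[Tuple[int, int], int]:
    """Return {(input 1, input 2): output} for a two-input gate."""
    logic = GATES[gate]
    return {
        (a, b): int(logic(bool(a), bool(b)))
        for a, b in product((0, 1), repeat=2)
    }


def on_off_ratio(high_signal: float, low_signal: float) -> float:
    """Ratio between the '1' and '0' fluorescence levels of a gate."""
    if low_signal <= 0:
        raise ValueError("low_signal must be positive")
    return high_signal / low_signal


for name in ("OR", "INH", "NOR"):
    print(name, truth_table(name))

# Made-up normalized fluorescence values, for illustration only.
print(on_off_ratio(high_signal=5.0, low_signal=0.5))
```

In this picture, adding aphidicolin lowers `low_signal` (the background from replication-mediated expression) more than `high_signal`, which is why the ratio returned by `on_off_ratio` improves.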

In summary, we investigated gene expression using circular single-stranded DNA (CssDNA) as a programmable type of gene vector in CFE systems. We demonstrated that the expression level of CssDNA can be promoted or suppressed by additional components, such as dNTPs and aphidicolin, as well as by complementary strands of varying lengths and sequences. This highlights the regulatory potential of CssDNA in these systems. By varying the concentrations of CssDNA and simulating intermediate processes, we identified the differing expression fates of CssDNA(+) and CssDNA(−). CssDNA(+) follows a single expression pathway, in which fully complementary double-stranded DNA is synthesized through complete DNA replication, followed by transcription. In contrast, CssDNA(−) has two concurrent expression pathways: transcription after complete DNA replication and transcription after incomplete DNA replication. Which pathway dominates for CssDNA(−) depends primarily on the vector concentration when the reaction components are constant.

As an application example, two-input logic gates were designed and implemented using CssDNA(−) as the logic element. Interference with the DNA replication pathway resulted in an improved signal-to-noise ratio of the logic gates. Apart from logic gate construction, the CssDNA-based regulatory system holds great potential for biosensing and molecular diagnostics. As a gene expression vector, CssDNA enables programmable regulation of gene expression. This includes the formation of secondary structures through the use of staple strands, which can influence the different expression pathways of CssDNA59. This capability expands the toolbox of gene circuits and synthetic biology with the rich repertoire of methods previously developed in DNA nanotechnology.