
Synthetic intrinsically disordered protein fusion tags that enhance protein solubility – Nature Communications

Native IDPs inspire a diverse set of SynIDPs

Native IDPs are highly dynamic, as they have significant stretches of amino acids that lack a defined tertiary structure and have a high degree of solvent exposure22,23. These properties of IDPs can enable folding of their fused protein partner and thereby promote soluble protein expression24,25,26. However, native IDPs have several potential limitations as tags for the recombinant expression of proteins. First, native IDPs come in different sizes, and the extent of their intrinsically disordered regions (IDRs) varies widely across their primary sequence27. Second, native IDPs have specific cellular functions, so their recombinant overexpression could interfere with cellular function or metabolism28. Hence, we decided to create completely disordered SynIDPs in the 10–20 kDa size range. This choice was motivated by the hypothesis that a complete lack of secondary structure would impart the highest possible degree of solvent exposure. A fully disordered tag would also be unlikely to confer a biological function that could interfere with the host. We chose the 10–20 kDa size range because, in our experience, peptides smaller than 10 kDa are poorly expressed, making them difficult to characterize, while proteins larger than 20 kDa can cause metabolic stress in cells overexpressing fusion proteins. In addition, a previous study exploring disordered proteins as solubility tags identified a 15 kDa disordered tag as one of the most effective25. We hypothesized that these SynIDPs may promote the solubility of proteins fused to them. They could therefore serve as useful tags for the soluble expression of proteins that are known to form inclusion bodies in E. coli.

To design SynIDPs with these attributes, we built them from repeats of short sequences of low complexity (SLCs) that are prevalent in native IDPs29,30. Our design was informed by our previous work identifying sequence heuristics for designing intrinsically disordered repetitive polypeptides of SLCs31. These proteins consist of repeats of a P-G-Xn SLC sequence, where n varies from 0 to 4. The periodic Pro and Gly residues were chosen because they are structure-breaking residues that are ubiquitous in naturally occurring IDPs32. The X residues represent a diverse set of amino acids. Repetitive polypeptides of these SLC sequences are the simplest possible (minimal) SynIDPs in terms of sequence complexity.

We hypothesized that in the enormous set of >20⁴ possible sequences of these SynIDPs, there must exist a subset of highly soluble SynIDPs. To test this hypothesis, we next describe the gene synthesis of a library of SynIDPs, their expression, and the identification of a subset of SynIDPs that show a high level of soluble expression in E. coli.

P-G-Xn gene libraries generate SynIDPs with high diversity

We selected a highly diverse set of 1020 72-nucleotide (nt) sequences that encode (PGX1X2X3X4)4. Here, X1 through X4 represent any amino acid except proline and cysteine, and glycine occurs at least once in each motif (Supplementary Data 1, Supplementary Data 2). Cysteine was excluded because of its propensity to form disulfide bridges, which are implicated in protein aggregation in bacterial expression systems33,34. We excluded glycine from X4 to limit the presence of GP dipeptides, which have been shown to promote irreversible aggregation35. Glycine is required at least once per repeat because it is a small hydrophilic residue that imparts conformational flexibility to the SynIDP36. In this repeat motif, the proline content is ~15% and the glycine content is ~30%. This composition is consistent with that of the highly disordered native elastomeric domains that inspired these SLCs37. With 1020 sequences, all possible selections of amino acids for Xn except proline and cysteine are represented, ignoring permutations (Supplementary Data 1).
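The design rules above can be expressed as a small validity check. The following is a minimal Python sketch, not the authors' code; `check_motif` and its field names are hypothetical. It reads one repeat in the P-G-X1-X2-X3-X4 frame. It shows that one Pro and two Gly per six residues give about 17% Pro and 33% Gly, close to the approximate ~15% and ~30% quoted above.

```python
# Hypothetical helper (not from the paper): checks one repeat written in the
# P-G-X1-X2-X3-X4 frame against the design rules stated in the text.
EXCLUDED_X = {"P", "C"}  # Pro and Cys are not allowed at X1-X4


def check_motif(motif: str) -> dict:
    """Validate a 6-residue PGX1X2X3X4 motif and report its Pro/Gly content."""
    motif = motif.upper()
    if len(motif) != 6 or not motif.startswith("PG"):
        raise ValueError("expected a 6-residue motif of the form PGX1X2X3X4")
    xs = motif[2:]
    return {
        "no_pro_or_cys_in_x": not (set(xs) & EXCLUDED_X),
        "gly_in_x": "G" in xs,        # at least one extra Gly per repeat
        "x4_not_gly": xs[-1] != "G",   # X4 = G would form a GP dipeptide with the next repeat
        "pro_fraction": motif.count("P") / len(motif),
        "gly_fraction": motif.count("G") / len(motif),
    }


# The repeat of [GTHGTP]n, read in the PG frame, is PGTHGT.
print(check_motif("PGTHGT"))
```

The function checks composition rules only. It does not reproduce the authors' enumeration of the 1020 library members.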

We designed each 72-nt sequence to contain exactly one non-palindromic, methylation-sensitive SexAI recognition site overlapping the Pro-Gly coding sequence (Supplementary Fig. 1). Codons were chosen using a codon scrambling algorithm that we previously developed38. The algorithm removes inverted nucleotide repeats and hairpins to maximize the success of subsequent reactions with DNA polymerase or ligase.
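One way to confirm that a design carries exactly one SexAI site, created only when the oligonucleotide is circularized, is to scan the sequence as a circle. This is a minimal sketch assuming the SexAI recognition sequence A^CCWGGT (W = A or T). `sexai_sites` is a hypothetical helper, and the toy oligonucleotide is invented for illustration, not taken from the library.

```python
import re

# SexAI recognizes A^CCWGGT (W = A or T). The degenerate pattern is its own
# reverse complement, so scanning one strand also finds sites on the other.
SEXAI_SITE = re.compile(r"(?=ACC[AT]GGT)")  # lookahead allows overlapping hits
SITE_LEN = 7


def sexai_sites(seq: str, circular: bool = False) -> list[int]:
    """Return 0-based start positions of SexAI sites in seq."""
    seq = seq.upper()
    # For a circle, append the first SITE_LEN - 1 bases so that a site spanning
    # the ligation junction (3' end joined to 5' end) is found exactly once.
    scan = seq + seq[: SITE_LEN - 1] if circular else seq
    return [m.start() for m in SEXAI_SITE.finditer(scan) if m.start() < len(seq)]


# Toy 72-nt oligo with the site split across its ends (ACCA ... GGT).
oligo = "GGT" + "A" * 65 + "ACCA"
print(len(oligo), sexai_sites(oligo), sexai_sites(oligo, circular=True))  # 72 [] [68]
```

Because the site only forms across the junction, a linear scan finds nothing and a circular scan finds one site. This matches the intent of splitting the site between the 5′ and 3′ ends.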

Rolling circle amplification (RCA) generates a pooled SynIDP gene library

We developed a method for the pooled, “one-pot” synthesis of repetitive gene libraries from the oligonucleotide pool using multiply primed RCA (Fig. 1A)39. In this method, each gene derives from roughly one oligonucleotide molecule. RCA is preferable to the polymerase chain reaction (PCR) for generating repetitive DNA sequences for two reasons. Repetitive templates cause primer misalignment during PCR annealing, and they form secondary structures that hinder in vitro gene synthesis40. The entire pool of 1020 linear 72-nt ssDNA molecules was circularized with CircLigase II (Epicentre (Lucigen), USA). Non-circularized DNA was removed by incubating the ligation products with Exonuclease I and Exonuclease III. An exonuclease-resistant primer with 3’-terminal phosphorothioate (PTO) modifications (Thermo Fisher Scientific) was annealed to the circular templates. Isothermal polymerization was then carried out at 30 °C using φ29 DNA polymerase, a highly processive, strand-displacing polymerase. 5-Methyl-dCTP (5mCTP) was added to the dNTP mix to introduce the modified nucleotide approximately every four repeats (where one repeat is 72 nucleotides). In multiply primed RCA, a cascade of random priming events generates high-molecular-weight double-stranded DNA from each template simultaneously. This occurs because the random primers can anneal to the ssDNA product, allowing the polymerase to synthesize the complementary strand.

Fig. 1: The steps for identification of highly soluble SynIDPs.

A Generation of the SynIDP gene library. A pool of 1020 72-nt ssDNA oligonucleotides was circularized by CircLigase and amplified using RCA. Substitution of dCTP with 10% 5-methyl-dCTP randomly incorporates 5mC into the RCA product, yielding DNA products of various sizes upon digestion by SexAI (see Supplementary Fig. 1). The products are separated by electrophoresis, and fragments of the desired size (360–576 bp) are gel purified and ligated into a plasmid. B Restriction digestion with SexAI of RCA products made with 0% 5mCTP (left) vs. 10% 5mCTP (right) shows that digestion yields multiples of 72 bp when 5mC is incorporated. DNA of the desired size (360–576 bp, marked with arrow) is then gel purified. C Illumina MiSeq analysis of the cloned plasmid library verifies the presence of 865 unique motifs (see Supplementary Fig. 2 for an enlarged version). Source data are provided as a Source Data file. D CoFi protein expression (see Supplementary Fig. 3) allows identification of E. coli colonies that express soluble protein by an anti-His6 western blot (left panel). Colonies were identified manually on a greyscale image (middle panel). A black-and-white pixel-thresholded image overlaying the green channel (right panel) was printed and aligned underneath the plate of colonies as a visual aid for colony picking. Images represent a subset of the total set of agar plates that were analyzed (see Supplementary Fig). E Dot blot of insoluble and soluble lysate fractions to determine target protein solubility. The dot blot membrane was probed with an anti-His6-tag antibody. The distinct dots marked by arrows represent an example target SynIDP, [GTHGTP]24, that was determined to be soluble. Solubility was determined by the intensity of the soluble fraction relative to that of the insoluble fraction in both dot blots and PAGE gels (see Supplementary Figs. 6, 7).

The resulting products were digested by SexAI, a methylation-sensitive enzyme whose activity is blocked by dcm-methylated restriction sites. The SexAI cut site was designed to overlap with the Pro-Gly coding sequence and was split between the 5’ and 3’ ends of the oligonucleotide sequence. As a result, all digested fragments encode a (GX1X2X3X4P)n sequence. Additionally, ssDNA is not digested because the enzyme is specific for dsDNA (Supplementary Fig. 1). Digested fragments were separated by gel electrophoresis and exhibited a ladder pattern due to random 5mC incorporation (Fig. 1B, right lane). We gel purified bands between 400 and 600 bp. These correspond to 5–8 repeats of the 72-nt template, which encode polypeptides with 22–28 repeats of the PGX1X2X3X4 motif. The purified DNA was ligated using ElectroLigase (NEB) into a modified pET-24a expression vector. The vector encodes an MSKGP sequence at the N-terminus of the SynIDP with a seamless SexAI recognition site. This is followed by a DNA sequence that encodes a C-terminal ENLYFQG-(H)6 peptide. Here, H6 denotes the (His)6 tag, and the ENLYFQG peptide sequence (referred to hereafter as tev) is the cleavage site of TEV protease. The H6 tag enables purification of the SynIDPs by immobilized metal affinity chromatography (IMAC). The tev site allows cleavage of the fused protein from the SynIDP tag.
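The ladder pattern follows from partial digestion of a tandem concatemer. This is a minimal Monte Carlo sketch, not the authors' analysis. It assumes each internal SexAI site is blocked independently with a hypothetical probability `p_blocked`. That is a simplification, since in practice blocking depends on whether 5mC lands within the recognition site. The helper names are also hypothetical.

```python
import random

REPEAT_BP = 72          # one 72-nt template repeat
MOTIFS_PER_REPEAT = 4   # each template encodes (PGX1X2X3X4)4


def partial_digest(n_repeats: int, p_blocked: float, rng: random.Random) -> list[int]:
    """Fragment sizes (bp) after SexAI digestion of an n_repeats-long concatemer.

    Each of the n_repeats - 1 internal sites is blocked with probability
    p_blocked (modeling 5mC protection). Unblocked sites are cut.
    """
    sizes, current = [], 1
    for _ in range(n_repeats - 1):
        if rng.random() < p_blocked:
            current += 1                      # blocked site: fragment keeps growing
        else:
            sizes.append(current * REPEAT_BP)  # cut: close the current fragment
            current = 1
    sizes.append(current * REPEAT_BP)
    return sizes


def in_window(sizes, lo_bp=360, hi_bp=576):
    """Fragments inside the gel-purified window, with the number of motifs each encodes."""
    return [(s, s // REPEAT_BP * MOTIFS_PER_REPEAT) for s in sizes if lo_bp <= s <= hi_bp]


rng = random.Random(0)
fragments = partial_digest(200, 0.75, rng)
print(sorted(set(fragments))[:6])
print(in_window([360, 432, 576, 648]))  # [(360, 20), (432, 24), (576, 32)]
```

Every fragment is a whole multiple of 72 bp, which produces the ladder. At four motifs per repeat, the 360–576 bp window of Fig. 1 spans 5 to 8 repeats, that is, 20 to 32 motifs.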

The plasmids with SynIDP inserts were electroporated into DH10B electrocompetent cells and plated on kanamycin agar plates. Following overnight growth, colonies were collected by scraping with 1 mL of fresh media. The plasmid DNA was purified in a single tube using a plasmid isolation kit and prepared for next-generation sequencing (NGS; see experimental).

Next generation sequencing (NGS) verifies library diversity

Sequence patterns and diversity were verified by NGS of the genes, using a 2 × 251 bp paired-end flow cell on the Illumina MiSeq system. The sequences were aligned using the Burrows-Wheeler Aligner (BWA)41 and mapped against the 1020 theoretical reference sequences of the designed sequence space. Of the 2 million reads returned by NGS, there were 3888 unique alignments. Each alignment was defined by a reference oligonucleotide paired with a CIGAR (Compact Idiosyncratic Gapped Alignment Report) string (Fig. 1C, Supplementary Fig. 2). Alignments were then filtered to remove sequences that were too short or aligned poorly, and sequences that contained nonsense mutations, large frameshifts, or fewer than 3 perfect repeats. This filtering left 865 unique sequences, which included perfect alignments and sequences with substitutions or small frameshifted regions. These remaining sequences aligned with only 419 of the original 1020 oligonucleotides, indicating the introduction of sequence bias. Sequence bias could arise from a variety of factors, including but not limited to polymerase nucleotide insertion preference, preferential plasmid replication in vivo, and toxicity. An additional 180 unique sequences aligned with reverse complements of the oligonucleotide references. These resulted from cloning of fragments in the opposite orientation but still encode proline/glycine-rich amino acid sequences.
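The read-level criteria (minimum length, no internal stop codon, at least three perfect repeats) can be illustrated on a single in-frame sequence. This is a minimal sketch using the standard genetic code. It is not the BWA/CIGAR pipeline used here: it reads only frame 1 and does not assess alignment quality or frameshifts. `passes_filter` and its thresholds are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}


def translate(dna: str) -> str:
    """Translate frame 1; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def passes_filter(dna: str, motif: str, min_len_nt: int = 72, min_repeats: int = 3) -> bool:
    """Apply simplified versions of the length, nonsense and repeat criteria."""
    if len(dna) < min_len_nt:
        return False
    protein = translate(dna)
    if "*" in protein.rstrip("*"):       # internal stop codon = nonsense mutation
        return False
    return protein.count(motif) >= min_repeats  # non-overlapping exact copies


unit = "GGTACCCATGGCACTCCG"  # toy DNA encoding GTHGTP
print(translate(unit), passes_filter(unit * 4, "GTHGTP"))  # GTHGTP True
```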

Colony filtration (CoFi) screens for soluble SynIDP expression

To rapidly identify soluble protein expression in E. coli, we implemented the Colony Filtration (CoFi) blot method (see experimental, Supplementary Fig. 3)42. Briefly, a mix of plasmid DNA was electroporated into Acella™ electrocompetent BL21(DE3) E. coli and plated on LB-kanamycin plates. Plates were replica plated43 onto additional LB-kanamycin plates. The colonies on the original plate were then transferred onto a nitrocellulose membrane placed over a filter sandwich of Whatman paper soaked in lysis buffer. The resulting blots of soluble protein were visualized by Western blot with an anti-His6 antibody (Fig. 1D, Supplementary Fig. 4).

The CoFi method positively screens for soluble proteins at the colony level. Dark spots on a His6-tag Western blot indicate colonies expressing soluble full-length proteins (left panel, Fig. 1D). These dark spots were identified manually for colony selection on a greyscale image that was printed on paper (middle panel, Fig. 1D). To facilitate the visual identification of previously selected colonies on the replica plates, the greyscale image was processed (right panel, Fig. 1D). The resulting color image was scaled, printed, and fiducially aligned underneath the semi-transparent agar replica plates, facilitating subsequent colony picking. In this way, candidate SynIDPs for subsequent analysis were selected by their soluble expression level in CoFi.
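Spots in this work were picked by eye. An automated alternative for the thresholding step would label connected groups of dark pixels and report their centroids for alignment with the replica plate. This is a minimal pure-Python sketch. The image format (rows of 0–255 grey values, 0 = black), the `threshold` value, and the function name are hypothetical.

```python
from collections import deque


def dark_spots(image, threshold):
    """Centroids (row, col) of 4-connected pixel groups darker than threshold."""
    rows, cols = len(image), len(image[0])
    mask = [[image[r][c] < threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this spot.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            spots.append((sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n))
    return spots


blot = [[255] * 5 for _ in range(5)]
blot[0][0] = blot[0][1] = 10  # one two-pixel spot
blot[3][3] = 20               # one single-pixel spot
print(dark_spots(blot, threshold=100))  # [(0.0, 0.5), (3.0, 3.0)]
```

In practice, a minimum spot size and a background correction would likely be needed before trusting such an automated pick.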

Colony PCR identifies undesired plasmids

Next, we used PCR amplification directly from the E. coli colonies that showed soluble expression in CoFi. This screen identified and discarded clones carrying short genes of undesired length, which the CoFi method does not remove. These short genes were likely co-isolated with the longer genes during size selection, because of aberrations in gel migration and/or cross-hybridization between different DNA molecules. On particularly dense plates, some colonies contained mixtures of genes because of cross-contamination from neighboring colonies. Such mixed genotype populations are difficult to resolve by Sanger sequencing. They must therefore be identified by the colony-PCR screen and eliminated from further consideration44,45 (Supplementary Fig. 5).
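The colony-PCR screen reduces to a simple rule on band sizes. This is a minimal sketch under stated assumptions. `flank_bp` is a hypothetical length of vector sequence between the PCR primers and the insert, and `tol_bp` is a hypothetical gel-sizing tolerance. The insert window (360–576 bp) is the size-selection window from Fig. 1.

```python
def classify_colony(bands_bp, flank_bp, min_insert_bp=360, max_insert_bp=576, tol_bp=30):
    """Classify one colony-PCR lane from the sizes (bp) of its observed bands."""
    if not bands_bp:
        return "no product"
    if len(bands_bp) > 1:
        return "mixed"          # several insert lengths: mixed genotypes
    insert = bands_bp[0] - flank_bp
    if insert < min_insert_bp - tol_bp:
        return "too short"      # short gene co-isolated during size selection
    if insert > max_insert_bp + tol_bp:
        return "too long"
    return "keep"


for lane in ([700], [300], [560, 700], []):
    print(lane, classify_colony(lane, flank_bp=200))
```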

Following colony screening, 572 colonies were sent to a Sanger sequencing vendor (Azenta) for direct DNA sequencing from E. coli cultures. At this stage, duplicates of different lengths were removed. The remaining colonies were labeled (Fig. 1D, middle panel) so that their sequences could be tracked during protein expression. A total of 52 parent motifs (Supplementary Table 1) were conserved. Some sequences included minor missense mutations that slightly change the overall amino acid composition of the parent motif. For each of these 52 parent motifs, we selected a unique sequence with more than 20 repeats of the monomer for expression.

SynIDP library expression identifies soluble clones

E. coli BL21(DE3) colonies containing plasmids encoding 52 unique SynIDP sequences were grown in liquid TB autoinduction media to express each recombinant protein. To facilitate high throughput protein production, we used the Duetz Microflask system to cultivate cells with high growth rates and protein yields. Following growth of the E. coli BL21(DE3) cells harboring a SynIDP encoding plasmid for 24 h, clones were then re-sequenced to further confirm the absence of deleterious frameshifts or mixed populations, and the duplicates were then removed leaving single representative clones. Cells from the resulting clones were harvested for purification. Soluble and insoluble fractions of the cell lysate were screened for protein expression. Insoluble fractions were dissolved in 8 M urea, and both fractions were run on an SDS-PAGE gel (Supplementary Fig. 6). Both fractions were also blotted onto a nitrocellulose membrane to quantify the relative amounts of soluble and insoluble fractions by an anti-His6 western blot.

Visualization of the blot assay showed distinct dots (Fig. 1E, Supplementary Fig. 7) for the soluble and insoluble fractions. Solubility was determined by the intensity of the soluble fraction relative to that of the insoluble fraction in both dot blots and PAGE gels. Candidate SynIDPs for subsequent analysis were selected by the following criteria: (1) soluble expression level in CoFi; (2) soluble expression level in dot blots and PAGE gels; and (3) a homogeneous plasmid DNA population.
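The three selection criteria can be combined into a single ranking step. This is a minimal sketch. The exact way soluble and insoluble intensities were combined is not specified in this section. Here a soluble fraction S/(S + I) is assumed, with a hypothetical cutoff of 0.5. The `Clone` fields are hypothetical names, and `top_n=5` mirrors the five SynIDPs carried forward.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    cofi_positive: bool         # criterion 1: soluble signal in CoFi
    soluble_intensity: float    # dot blot/PAGE signal, soluble fraction
    insoluble_intensity: float  # dot blot/PAGE signal, insoluble fraction
    homogeneous_plasmid: bool   # criterion 3: single genotype

    @property
    def soluble_fraction(self) -> float:
        """Criterion 2, assumed here to be S / (S + I)."""
        total = self.soluble_intensity + self.insoluble_intensity
        return self.soluble_intensity / total if total > 0 else 0.0


def select(clones, min_soluble_fraction=0.5, top_n=5):
    """Keep clones passing all three criteria, ranked by soluble fraction."""
    passing = [c for c in clones
               if c.cofi_positive and c.homogeneous_plasmid
               and c.soluble_fraction >= min_soluble_fraction]
    return sorted(passing, key=lambda c: c.soluble_fraction, reverse=True)[:top_n]


clones = [
    Clone("A", True, 80, 20, True),
    Clone("B", True, 30, 70, True),   # mostly insoluble
    Clone("C", True, 90, 10, False),  # mixed plasmid population
    Clone("D", False, 95, 5, True),   # CoFi negative
    Clone("E", True, 60, 40, True),
]
print([c.name for c in select(clones)])  # ['A', 'E']
```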

Based on the criteria above, we selected a subset of five SynIDPs that exhibited the highest soluble expression and evaluated their ability to confer solubility on fusion proteins: [GQSGLP]24, [GTHGTP]24, [GIGQAP]20, [GANMPQ]24, and [GAGAIP]24. [GANMPQ]24 and [GAGAIP]24 carry missense mutations. Their sequences are [GANMPQGASIPPGANIPPGASIPP]6 and [GAGAIPGAEAIPGAGAIPGAGAIP]6, respectively.

Characterization of SynIDPs demonstrates lack of secondary structure but different peptide chain properties

To examine the utility of these five SynIDPs as solubility tags, we surveyed the literature for proteins with useful biological functions that express insolubly in E. coli for reasons other than incorrect disulfide bond formation. The three proteins we chose (mTdT, Z2-LO10, and TEV protease) satisfy these criteria16,19,21 (Supplementary Fig. 8, Supplementary Table 2). Their functional activity is also easy to measure. Two of them (mTdT and TEV protease) are enzymes. Z2-LO10’s function can be measured through metabolic arrest leading to the death of epidermal growth factor receptor (EGFR)-expressing cells46,47. These proteins are also of interest in biotechnology and medicine. Z2-LO10 is a modular biologic drug for cancer treatment that consists of a dimeric affibody targeting domain and a potent bacterial toxin payload18,48,49,50. TEV protease is commonly used to cleave recombinant proteins from their tags51. TdT is a reagent for the TUNEL apoptosis assay and for the de novo synthesis of DNA17,52,53.

SDS-PAGE of the insoluble and soluble fractions of these proteins (Supplementary Fig. 8A) expressed in BL21(DE3) E. coli at 37 °C showed that TEV and Z2-LO10 are indeed insoluble. However, mTdT expression at 37 °C is low, even as insoluble protein. As previous studies have reported high expression levels of mTdT at lower temperatures54, we attempted its expression at 16 °C. At this lower temperature, mTdT is expressed at a high level, although most of it is still insoluble (Supplementary Fig. 8B).

There is significant interest in using TdT for biotechnology applications17,53,55,56,57,58,59,60. There is therefore a critical need for new methods for the large-scale, economical expression of highly processive TdT as a reagent for this emerging enzymatic DNA synthesis technology. For these reasons, we chose mTdT as the first protein to test the ability of the SynIDPs to promote soluble expression. The genes encoding the SynIDP-mTdT fusion proteins were constructed by Gibson assembly, fusing mTdT to the C-terminus of each of the five SynIDP-tev-H6 constructs61,62. The fusion genes were inserted into a pET-24 expression vector and transformed into BL21(DE3) E. coli. The transformed cells were grown overnight in Overnight Express TB. Western blot analysis of the soluble fraction of the cell lysate (Supplementary Fig. 9) with an anti-His antibody showed soluble expression of three SynIDP-tev-H6-mTdT fusions:
- [GQSGLP]24-tev-H6-mTdT, hereafter referred to as SynIDP-1-mTdT;
- [GTHGTP]24-tev-H6-mTdT, named SynIDP-2-mTdT;
- [GAGAIP]24-tev-H6-mTdT, named SynIDP-3-mTdT.

In comparison, expression of H6-mTdT without a solubility tag yielded an undetectable level of protein in the soluble fraction of the cell lysate (Supplementary Fig. 9).

Having confirmed that these three SynIDPs (Supplementary Table 3) promote the soluble expression of mTdT, we next studied their physicochemical properties and structure. To do so, we expressed the SynIDP-tev-H6 constructs in E. coli BL21(DE3) and purified the proteins by IMAC (Supplementary Fig. 10A), followed by size exclusion chromatography (SEC). SynIDP-2-tev-H6 has a histidine residue in every repeat unit. It therefore requires a higher imidazole concentration to elute than the other two SynIDPs, which yields purer protein. SDS-PAGE of the purified SynIDPs (Fig. 2A) shows that they migrate at a larger apparent molecular weight than expected for globular proteins, as has been observed for other SynIDPs63. To verify the mass of the SynIDPs, mass spectrometry (MS) analysis was performed. For all three SynIDPs, the mass measured by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) MS (Supplementary Fig. 11) exactly matched the predicted mass (Supplementary Table 3). However, the MALDI-TOF MS spectrum of SynIDP-2-tev-H6 has an additional peak consistent with a protein truncated within the tev (ENLYFQG) site.
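Predicted masses such as those compared against MALDI-TOF MS data can be computed from the sequence. This is a minimal sketch using average residue masses as commonly tabulated for average-mass calculations (for example, by ExPASy tools); verify them against your preferred source. It assumes an unmodified linear chain, so processing such as removal of the initiator Met is ignored. `truncation_masses` is a hypothetical helper for asking which cut within the tev site best explains an extra peak.

```python
# Average residue masses in Da (free amino acid minus H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one H2O per chain accounts for the free termini
PROTON = 1.00728  # added for singly protonated [M+H]+ ions


def average_mass(seq: str) -> float:
    """Average mass (Da) of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mh_plus(seq: str) -> float:
    """Expected m/z of the singly protonated ion."""
    return average_mass(seq) + PROTON


def truncation_masses(seq: str, site: str = "ENLYFQG") -> list[tuple[int, float]]:
    """(fragment_length, mass) for N-terminal fragments ending at each residue of site."""
    start = seq.find(site)
    if start < 0:
        raise ValueError("site not found")
    return [(n, average_mass(seq[:n])) for n in range(start + 1, start + len(site) + 1)]


toy = "GTHGTP" * 2 + "ENLYFQG" + "HHHHHH"  # toy construct, not a real SynIDP
print(round(mh_plus(toy), 2), truncation_masses(toy)[-2])
```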

Fig. 2: Characterization of the physical and molecular properties of SynIDPs demonstrates that they are unstructured proteins.

A SDS-PAGE of purified SynIDPs. All SynIDPs run slightly higher than their expected molecular weight, which is characteristic of many IDPs. B CD spectra of the SynIDPs show a random coil structure, as deduced from the characteristic negative peak at 197 nm and positive peak at 215 nm. C Kratky plots [I(q)·q² as a function of q] of the SynIDPs. They reveal the signature of a flexible disordered chain for SynIDP-1 and SynIDP-2 and an extended chain conformation for SynIDP-3. D Scattering profiles [I(q) vs. q] of the SynIDPs in a log-log plot. Black lines represent fits of the PEV model to the SAXS data of SynIDP-1 and SynIDP-2. Curves were offset along the y-axis for better visibility. Source data are provided as a Source Data file.

Circular dichroism (CD) spectroscopy at 37 °C (Fig. 2B) showed that the SynIDPs lack defined secondary structure. They have a negative peak at ~197 nm and a positive peak at ~215 nm, which are characteristic of a random coil31,37. Native IDPs frequently exhibit lower or upper critical solution temperature (LCST/UCST) phase behavior, which causes phase separation upon heating or cooling, respectively26,30,63. We therefore carried out thermally ramped turbidity experiments in PBS and 1 M NaCl over a temperature range of 15–80 °C (Supplementary Fig. 12). No increase in absorbance was detected over this range, indicating the absence of LCST phase separation under these conditions. Hence, no coacervation is expected at elevated temperature or salt concentration.
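A turbidity ramp can be scored for a phase transition by comparing each reading with a low-temperature baseline. This is a minimal sketch. The baseline window and the absorbance jump `delta` are hypothetical choices, and the function name is illustrative. For a UCST check, the same test would be applied to a cooling ramp.

```python
def onset_temperature(temps, absorbance, baseline_points=5, delta=0.05):
    """First temperature at which absorbance exceeds the baseline by more than delta.

    The baseline is the mean of the first `baseline_points` readings.
    Returns None when no transition is seen in the scanned range.
    """
    baseline = sum(absorbance[:baseline_points]) / baseline_points
    for t, a in zip(temps, absorbance):
        if a - baseline > delta:
            return t
    return None


temps = list(range(15, 81))                               # 15-80 degC ramp, as in the text
flat = [0.02] * len(temps)                                # no transition
stepped = [0.02 if t < 50 else 0.5 for t in temps]        # synthetic LCST-like jump
print(onset_temperature(temps, flat), onset_temperature(temps, stepped))  # None 50
```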

Next, small angle X-ray scattering (SAXS) was performed to characterize the nanoscale structures of the SynIDPs. The scattering profiles of SynIDP-1 and SynIDP-2 were displayed as a Kratky plot [I(q)·q² as a function of q] (Fig. 2C). Both profiles reveal a characteristic flexible disordered chain, as seen by the lack of a well-defined maximum and a plateau at large q values64. The Kratky plot of SynIDP-3 suggests an extended chain conformation, as seen by the upturn and uniform slope of the curve. This behavior can be explained by the missense mutation (G to E) every four repeats. The mutation adds negative charges to the chain, and intramolecular electrostatic repulsion then restricts the available conformations to an extended chain. The scattering profiles of SynIDP-1 and SynIDP-2 were fit to a polymer excluded volume (PEV) model (Fig. 2D)65,66,67,68, which describes the scattering from polymer chains and their mass fractal behavior. The PEV model was used to estimate the Rg and fractal dimension of the SynIDPs. It yielded Rg values of 4.2 ± 0.5 nm and 3.8 ± 0.3 nm for SynIDP-1 and SynIDP-2, respectively. These Rg values are in good agreement with previously published Rg values for IDPs of similar size64. The Porod exponent for these two SynIDPs is 2, which indicates an idealized random walk chain (which can cross itself). The Porod exponent for SynIDP-3 was estimated from the slope of the scattering profile in the mid-q range (0.03 < q < 0.2 Å⁻¹) (Fig. 2D). It was found to be 1, consistent with a rigid extended chain. Because the PEV model assumes chain flexibility, it does not apply in the limit of a rigid extended chain (Porod exponent of 1). Hence, it could not be fit to SynIDP-3.
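Both diagnostics used above reduce to simple transforms of I(q): the Kratky transform I(q)·q², and the magnitude of the log-log slope in a mid-q window. This is a minimal sketch of the slope estimate applied to SynIDP-3. It is not the PEV model fit. It assumes q in Å⁻¹ and uses ordinary least squares on log I vs. log q, with the 0.03–0.2 Å⁻¹ window from the text as the default.

```python
import math


def kratky(q, intensity):
    """Kratky transform I(q)*q^2 for each data point."""
    return [i * qq ** 2 for qq, i in zip(q, intensity)]


def porod_exponent(q, intensity, q_min=0.03, q_max=0.2):
    """Return d from a least-squares fit of log I = c - d*log q over q_min < q < q_max.

    d is about 2 for a Gaussian random-walk chain and about 1 for a rigid rod.
    """
    pts = [(math.log(qq), math.log(i))
           for qq, i in zip(q, intensity) if q_min < qq < q_max and i > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points in the fitting window")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return -sxy / sxx


q = [0.01 * k for k in range(1, 40)]
coil = [qq ** -2 for qq in q]  # synthetic random-walk chain: I ~ q^-2
rod = [qq ** -1 for qq in q]   # synthetic rigid rod: I ~ q^-1
print(round(porod_exponent(q, coil), 3), round(porod_exponent(q, rod), 3))  # 2.0 1.0
```

A pure q⁻² profile gives a flat Kratky plot, which is the plateau behavior described for SynIDP-1 and SynIDP-2.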

Hydropathy of SynIDP tags confirms their solubility

Previous attempts to rationally design SynIDPs as solubility tags used the Wilkinson and Harrison solubility calculator69,70 to predict the chance of soluble expression of a given protein in E. coli. For SynIDP-2, this calculator predicts a 100% chance of solubility when overexpressed in E. coli. However, the calculator fails for SynIDP-1 and SynIDP-3, predicting a 0% chance that these SynIDPs would be soluble.

Analysis of the expression data for the SynIDPs fused to mTdT shows that average residue hydropathy can predict soluble vs. insoluble expression (Table 1). Hydropathy was calculated using the Urry hydropathy scale71. This scale ranks amino acids from hydrophobic to hydrophilic based on the phase transition temperatures of a class of related SynIDPs, elastin-like polypeptides (ELPs). ELPs have the sequence (VPGXG)n, where X is any residue except Pro and n is the number of repeats. The Urry scale is derived from measurements on ELPs, which are themselves SynIDPs, rather than on folded globular proteins. We therefore hypothesized that it would better predict the hydropathy of the SynIDPs in this work. We used this scale to compute the hydropathy of each expressed SynIDP selected by colony filtration, by averaging the per-residue scores over its sequence. Ranking the hydropathies suggests an integer threshold of 42 that clearly separates insoluble (below 42) from soluble (above 42) SynIDPs. Upon fusion to mTdT, the threshold shifts upward to 47 (Table 1). This suggests that because mTdT is more hydrophobic than the SynIDPs, it must be fused to more hydrophilic SynIDPs to ensure soluble expression of the fusion protein.
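The averaging-and-threshold procedure can be sketched as follows. This is a minimal example, not the authors' code. The per-residue numbers in `TOY_SCALE` are placeholders and are NOT the published Urry hydropathy values. To apply the thresholds of 42 (SynIDPs alone) and 47 (mTdT fusions), substitute the values from the Urry scale reference. As the text implies, larger averages are assumed to mean more hydrophilic.

```python
import re

# Placeholder scores for illustration only, NOT the published Urry values.
TOY_SCALE = {"G": 50.0, "T": 40.0, "H": 60.0, "P": 30.0}


def expand(notation: str) -> str:
    """Expand repeat notation such as '[GTHGTP]24' into the full sequence."""
    m = re.fullmatch(r"\[([A-Z]+)\](\d+)", notation.strip())
    if not m:
        raise ValueError(f"unrecognized notation: {notation!r}")
    return m.group(1) * int(m.group(2))


def mean_hydropathy(seq: str, scale: dict[str, float]) -> float:
    """Average per-residue score. Raises KeyError if a residue is missing from scale."""
    return sum(scale[aa] for aa in seq) / len(seq)


def predicted_soluble(seq: str, scale: dict[str, float], threshold: float = 42.0) -> bool:
    """Soluble if the average lies above the threshold (42 alone, 47 fused to mTdT)."""
    return mean_hydropathy(seq, scale) > threshold


seq = expand("[GTHGTP]24")
print(len(seq), mean_hydropathy(seq, TOY_SCALE), predicted_soluble(seq, TOY_SCALE))
```

The `expand` helper also accepts mutated repeat units such as `[GAGAIPGAEAIPGAGAIPGAGAIP]6`, so the averages reflect the actual expressed sequence rather than the parent motif.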

Table 1 Summary of the solubility of SynIDPs and SynIDP-mTdT fusion proteins

Fusion with SynIDPs enhances mTdT solubility and rescues enzymatic activity

We repeated the expression of mTdT and the three soluble SynIDP-1/2/3-tev-H6-mTdT constructs in E. coli BL21(DE3) by induction of protein expression with IPTG (0.5 mM) at an OD600 of 0.7–1. To measure the amounts of mTdT in the insoluble and soluble fractions of the lysate, a Western blot using an anti-TdT antibody was performed (Fig. 3A). The Western blot image shows that mTdT is largely present in the insoluble fraction, with a small amount in the soluble fraction. In contrast, when mTdT is fused to SynIDP-1, SynIDP-2, or SynIDP-3, a significantly greater amount of the SynIDP-mTdT fusion is present in the soluble fraction. Next, the proteins were purified by IMAC on Ni-NTA gravity columns (Supplementary Fig. 13A), followed by further purification by SEC (Supplementary Fig. 13B). We note that the efficiency of IMAC purification could be improved by moving the His6 tag to the C-terminus of the fusion, which should improve its accessibility to the Ni-NTA ligand on the IMAC resin.

Fig. 3: Fusion to SynIDPs rescues soluble and functional expression of mTdT.

A Western blot of insoluble and soluble fractions of mTdT and SynIDP-1/2/3-mTdT using an anti-TdT antibody. B Purified SynIDP-1/2/3-mTdT after IMAC and SEC, visualized on SDS-PAGE. C mTdT activity assay showing elongation of the Cy5-poly-T50 initiator on TBE-Urea PAGE gels. A Low Range ssRNA ladder was used to quantify nucleotide addition. From left to right: ladder, initiator (negative control), Promega TdT (positive control), SynIDP-1-mTdT 1X and 2X, SynIDP-2-mTdT 1X and 2X, SynIDP-3-mTdT 1X and 2X. D Insoluble (Ins) and soluble (S) fractions of mTdT, SynIDP-1-mTdT, SUMO-mTdT and MBP-mTdT, visualized on SDS-PAGE. A large increase in soluble expression of mTdT is observed upon fusion to SynIDP-1 compared with fusion to SUMO or MBP. Arrows indicate the desired protein product. E TdT activity assay showing elongation of the Cy5-poly-T50 initiator on TBE-Urea PAGE gels. A Low Range ssRNA ladder was used to quantify nucleotide addition. From left to right: cleaved mTdT, SynIDP-1-mTdT, SUMO-mTdT, MBP-mTdT, ladder. The image was processed from Supplementary Fig. 14D (see experimental). F ImageJ analysis of fluorescence intensity as a measure of the dispersity and degree of polymerization of the mTdT variants’ reaction products. The peak at a distance of ~100 pixels stems from the dye front and is not a real product. SynIDP-1-mTdT outperforms the other variants, displaying lower dispersity and a higher degree of polymerization. The calibration of nt size to gel distance (shown in the line above) was obtained by similar image analysis of the ladder. Source data are provided as a Source Data file.

Murine TdT is a eukaryotic enzyme that can promiscuously append nucleotides to the 3’-end of a single-stranded DNA (ss-DNA) substrate16. We wanted to determine whether the SynIDP-mTdT fusions retain the catalytic activity of mTdT. Solubility does not necessarily indicate proper folding. A protein can misfold by becoming trapped in a local free-energy minimum that does not correspond to the tertiary structure required for activity72. Additionally, fusion to a solubility tag could sterically interfere with activity even if the enzyme were properly folded.

To explore the enzymatic activity of the recombinant mTdT and SynIDP-mTdT fusions, TdT-catalyzed enzymatic polymerization (TcEP) of ss-DNA was performed. Briefly, without cleaving off the SynIDP tag, proteins were buffer exchanged into the optimal reaction buffer for nucleotide addition by mTdT. This buffer contained 50 mM potassium phosphate, 100 mM NaCl, 1 mM 2-mercaptoethanol, 0.1% Tween 20, and 50% glycerol, pH 6.4. Proteins were brought to a final concentration of 0.5 or 1 µM (labeled 1X or 2X). A Cy5-labeled (dT)50 oligonucleotide was used as the initiator, with deoxythymidine triphosphate (dTTP) at a 1:500 initiator:dTTP molar ratio73. The reaction was allowed to proceed for 2 h at 37 °C. The enzyme was then heat inactivated at 95 °C, and the products were mixed with 2× TBE-Urea loading buffer to abolish any protein-DNA interactions. Reaction products were visualized by 10% TBE-Urea PAGE (Fig. 3C, Supplementary Fig. 13C). As a negative control, we used the initiator without any enzyme. As a positive control, we used commercially available TdT (Promega or NEB, working concentration of 1.2 U/μL). The recombinant mTdT that we purified from the soluble fraction demonstrated very little enzymatic activity (Supplementary Fig. 13C). In contrast, all SynIDP-mTdT fusions generated higher-molecular-weight DNA products, indicative of enzyme activity. Notably, with 1 µM SynIDP-mTdT and a 1:500 initiator:dTTP ratio, we achieved stoichiometric incorporation of ~500 nucleotides. The products were similar in length and polydispersity to those generated by commercially available TdT (Promega). Importantly, we could do so without cleaving the SynIDP tags from mTdT.
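As a rough consistency check on the ~500-nucleotide figure, assume as a simplification that a fraction f of the dTTP is incorporated and spread evenly over all initiator strands. The symbol f is introduced here only for illustration. The mean product length is then approximately:

```latex
\bar{L}_{\text{product}} \approx L_{\text{initiator}} + f \, \frac{[\text{dTTP}]}{[\text{initiator}]}
  = 50 + f \times 500 \ \text{nt}
```

Full conversion (f = 1) therefore corresponds to about 500 added nucleotides per strand, or a product of roughly 550 nt including the (dT)50 initiator.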

To compare the performance of SynIDP with existing solubility tags, we expressed His6-MBP-His6-mTdT and His6-SUMO-His6-mTdT under the same conditions. Visualization of soluble and insoluble fractions of SynIDP-1-mTdT, SUMO-mTdT and MBP-mTdT (Fig. 3D) on SDS-PAGE demonstrates significant enrichment of the target protein in the soluble fraction of SynIDP-1-mTdT compared to soluble fractions of SUMO-mTdT and MBP-mTdT. SUMO-mTdT and MBP-mTdT also exhibited a larger fraction of insoluble mTdT expression than SynIDP-1-mTdT, highlighting the improvement in soluble expression of mTdT when fused to SynIDP-1 compared to SUMO and MBP. In addition to the constructs described above, a cleaved mTdT was generated via incubation of the soluble fractions of SynIDP-1-mTdT with SynIDP-1-TEV (further described in the following sections) at 4 °C overnight. The constructs were purified using IMAC (Supplementary Fig. 14A, B) and SEC.

Enzymatic activity of the mTdT variants was measured by TcEP of ss-DNA (Fig. 3E, Supplementary Fig. 14D). To that end, all mTdT variants were buffer exchanged into the optimal reaction buffer (50 mM potassium phosphate, 100 mM NaCl, 1 mM 2-mercaptoethanol, 0.1% Tween 20, 50% glycerol, pH 6.4). They were brought to a final concentration of 0.5 µM (Supplementary Fig. 14C) or 1 µM (labeled 1X or 2X). As noted above, recombinant soluble mTdT demonstrated very little activity (Supplementary Fig. 14D). In contrast, all of the mTdT variants expressed as fusions to solubility tags showed a marked improvement in nucleotide addition compared with recombinant mTdT. The same was true of the mTdT cleaved from SynIDP-1-mTdT. The TcEP products were further analyzed by image processing of the original gel (Supplementary Fig. 14D, Fig. 3E). This analysis gave the fluorescence intensity distribution for each mTdT variant (Fig. 3F) as a function of distance from the gel bottom. For the TcEP reaction, we sought two properties. The first was a high degree of polymerization, indicated by a greater distance from the bottom of the gel. The second was low dispersity, indicated by a narrow fluorescence signal. Comparing the fluorescence intensity curves shows that SynIDP-1-mTdT outperforms the other mTdT variants, with the highest degree of polymerization and the lowest dispersity (Fig. 3E, F). This suggests superior folding of mTdT among the solubility tags tested.
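Degree of polymerization and dispersity can be extracted from such a lane profile once gel distance is calibrated against the ladder. This is a minimal sketch under stated assumptions, not the ImageJ workflow used here. It assumes log10(length) varies linearly with migration distance. It also assumes each strand carries one Cy5 dye, so background-corrected intensity is proportional to strand count. Function names are hypothetical. Points from the dye front (the ~100-pixel peak) should be excluded before calling `length_stats`.

```python
import math


def fit_ladder(ladder):
    """Least-squares fit of log10(size_nt) = a + b * distance to (distance, size_nt) bands."""
    xs = [d for d, _ in ladder]
    ys = [math.log10(s) for _, s in ladder]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def length_stats(distances, intensities, a, b):
    """Number-average length Ln, weight-average length Lw, and dispersity Lw/Ln."""
    lengths = [10 ** (a + b * d) for d in distances]
    s0 = sum(intensities)                                        # proportional to strand count
    s1 = sum(i * l for i, l in zip(intensities, lengths))
    s2 = sum(i * l * l for i, l in zip(intensities, lengths))
    ln, lw = s1 / s0, s2 / s1
    return ln, lw, lw / ln


a, b = fit_ladder([(0, 100), (100, 1000)])               # toy two-band calibration
ln, lw, disp = length_stats([0, 100], [1.0, 1.0], a, b)  # two equal populations
print(round(ln, 1), round(lw, 1), round(disp, 3))        # 550.0 918.2 1.669
```

For a homopolymer such as poly(dT), chain length is a direct proxy for mass, so Lw/Ln plays the role of the usual Mw/Mn dispersity.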

Fusion with SynIDPs enhances LO10 solubility and rescues exotoxin activity

We next tested the ability of the SynIDP-1/2/3 tags to promote soluble expression of a second inclusion body-forming protein. The LO10 domain derived from P. aeruginosa exotoxin A is a potent ribosome inhibitor that causes cell death upon internalization, and its expression alone results in inclusion body formation19,74. We fused the LO10 exotoxin A domain to the C terminus of a dimeric EGFR-targeting affibody domain (Z2), because a free C terminus of the exotoxin is crucial for its cytotoxic activity, with the intent that the Z2-LO10 construct could target and kill tumor cells that overexpress EGFR, such as certain breast cancers and gliomas75,76.

Fusion of SynIDP-1/2/3 to Z2-LO10 markedly improved soluble expression of the protein at 37 °C compared with Z2-LO10 without a SynIDP tag (Supplementary Fig. 15A). Soluble proteins were purified by IMAC (Supplementary Fig. 15B) followed by SEC, then concentrated and buffer exchanged into PBS (Fig. 4A) before testing their in vitro activity against an EGFR-expressing glioma line, CT-2A-EGFRviii. All three SynIDP-Z2-LO10 fusions were cytotoxic in a concentration-dependent manner (Fig. 4B), with IC50 values ranging from 75 to 125 pM, whereas the SynIDP tags alone caused no cell death (Supplementary Fig. 15C). The cytotoxicity is therefore imparted by the exotoxin domain, and these data show that fusion to SynIDPs enables soluble, functionally active expression of Z2-LO10.
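The IC50 values above summarize concentration-response curves such as those in Fig. 4B. The text does not state how they were derived; IC50 values are commonly obtained by fitting a four-parameter logistic (4PL) model. As a simpler, dependency-free stand-in, the sketch below estimates the concentration at 50% viability (relative to untreated cells) by linear interpolation in log10(concentration) between the two bracketing dilutions. The data are synthetic, generated from a 4PL curve with a known IC50, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def four_pl(conc: float, top: float, bottom: float,
            ic50: float, hill: float) -> float:
    """Four-parameter logistic viability curve; decreases with conc for hill > 0."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def ic50_log_interp(concs: Sequence[float], viability: Sequence[float],
                    half: float = 50.0) -> float:
    """Interpolate the concentration at which viability crosses `half`
    (percent of untreated control), linearly in log10(concentration).

    `concs` must be ascending and strictly positive.
    """
    for i in range(len(concs) - 1):
        v1, v2 = viability[i], viability[i + 1]
        # The crossing lies between points i and i+1 if they straddle `half`.
        if v1 != v2 and (v1 - half) * (v2 - half) <= 0:
            frac = (v1 - half) / (v1 - v2)
            log_c1, log_c2 = math.log10(concs[i]), math.log10(concs[i + 1])
            return 10 ** (log_c1 + frac * (log_c2 - log_c1))
    raise ValueError("50% viability is not bracketed by the dilution series")


# Synthetic 10-fold dilution series (pM) generated with a known IC50 of 10**2.5 pM.
doses: List[float] = [1.0, 10.0, 100.0, 1000.0, 10000.0]
true_ic50 = 10 ** 2.5
viab = [four_pl(c, top=100.0, bottom=0.0, ic50=true_ic50, hill=1.0) for c in doses]
print(f"estimated IC50 = {ic50_log_interp(doses, viab):.1f} pM")  # estimated IC50 = 316.2 pM
```

With a Hill slope of 1 and dilutions placed symmetrically around the true IC50 on a log scale, the interpolation recovers it exactly; in general it is only an approximation. With real replicate data, fitting the 4PL by nonlinear least squares (for example with `scipy.optimize.curve_fit`) and reporting the fitted `ic50` is the more usual choice.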

Fig. 4: Fusion to SynIDPs rescues soluble and functional expression of Z2-LO10 and TEV proteins.

A Purified SynIDP-1/2/3-Z2-LO10 after IMAC and SEC. B Log-fold dilutions of SynIDP-Z2-LO10 and SynIDP controls (see Supplementary Fig. 15C) were incubated with CT-2A-EGFRviii, an EGFR-positive murine glioma cell line, for 48 h, and viability was measured by an MTS assay. n = 3 replicates; error bars represent SD. C Purified SynIDP-1/2/3-TEV after His-purification and SEC. D MALDI-TOF MS of the products of the SynIDP-2-TEV reaction with the ELP-tev-FGF21 substrate at 37 °C for 30 min confirms proteolytic activity. The absence of intact substrate is evident from the lack of a peak at m/z 45,459, while the cleaved FGF21 (m/z 20,038) and ELP-tev (m/z 25,439) are visible in the spectrum. The peak at m/z ~37,000 matches the expected Mw of SynIDP-2-TEV. Mass spectra of the control and of the products of the SynIDP-1/3-TEV reactions with ELP-tev-FGF21 are shown in the SI (Supplementary Fig. 18). Source data are provided as a Source Data file.
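The masses quoted in panel D are consistent with a single hydrolytic cleavage: hydrolysing one peptide bond adds one water molecule (about 18.02 Da average mass), so the two fragments should sum to the intact substrate plus water. The check below treats the quoted values as neutral average masses. That is an assumption; if they were singly protonated [M+H]+ m/z values, the expected excess of the fragment sum over the intact value would be about 19 Da rather than 18 Da. The function name is hypothetical.

```python
from typing import Sequence

WATER_AVG_DA = 18.02  # average mass of one H2O, added by hydrolysis of one peptide bond


def cleavage_mass_residual(intact_da: float, fragments_da: Sequence[float]) -> float:
    """Return sum(fragments) - (intact + water) for a single hydrolytic cleavage.

    A value near zero means the fragments account for the intact protein.
    Inputs are treated as neutral average masses in Da (an assumption).
    """
    return sum(fragments_da) - (intact_da + WATER_AVG_DA)


# Values quoted in the Fig. 4D legend: ELP-tev-FGF21, then cleaved FGF21 and ELP-tev.
residual = cleavage_mass_residual(45459.0, [20038.0, 25439.0])
print(f"residual = {residual:+.2f} Da")  # residual = -0.02 Da
```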

Fusion with SynIDPs enhances TEV protease solubility and rescues proteolytic activity

Finally, we tested the three SynIDPs with TEV protease, a protein that also forms inclusion bodies when overexpressed in E. coli11,21,62. TEV protease is used as a reagent to cleave target proteins from their fusion partners, such as solubility or purification tags77. We previously synthesized and validated a TEV substrate comprising an ELP, a tev cleavage site, and fibroblast growth factor-21 (ELP-tev-FGF21), which we used here to quantify the activity of the SynIDP-TEV fusions78.

Fusion of SynIDP-1, SynIDP-2 and SynIDP-3 to TEV protease, with a terminal His6 tag and no tev cleavage site between the SynIDP and TEV protease, markedly improved the solubility of TEV expressed at 37 °C compared with the protein alone (Supplementary Fig. 16A, Supplementary Fig. 8A). Soluble SynIDP-TEV fusions were purified from bacterial lysate by IMAC and SEC (Supplementary Fig. 16B), then concentrated and buffer exchanged into 50 mM Tris, 0.5 M EDTA, pH 8.0 buffer (Fig. 4C). Each fusion was incubated with the substrate, ELP-tev-FGF21 (Supplementary Fig. 17), at a 1:100 enzyme-to-substrate molar ratio at 37 °C for 30 min, and the reaction was then heated to 95 °C for 2 min to inactivate the enzyme. MALDI-TOF MS of the cleavage products (Fig. 4D, Supplementary Fig. 18) showed a peak at m/z ~20,000, corresponding to cleaved FGF21, that was absent from the control (ELP-tev-FGF21 under the same conditions without SynIDP-TEV), together with the disappearance of the peak at m/z ~45,000 corresponding to full-length ELP-tev-FGF21.
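Because the fusion enzyme and the substrate differ in molecular weight, a 1:100 molar ratio is not a 1:100 mass ratio. The sketch below converts a chosen amount of substrate into the corresponding amount of enzyme. It assumes an average molecular weight of 45,459 Da for ELP-tev-FGF21 and approximately 37,000 Da for SynIDP-2-TEV (the values quoted in the Fig. 4D legend); SynIDP-1-TEV and SynIDP-3-TEV would need their own molecular weights. The 10 µg substrate amount and the function name are hypothetical.

```python
def enzyme_mass_ug(substrate_ug: float, substrate_mw_da: float,
                   enzyme_mw_da: float, enzyme_to_substrate: float = 1 / 100) -> float:
    """Mass of enzyme (µg) giving the requested enzyme:substrate molar ratio.

    Since 1 Da = 1 g/mol, µg divided by Da gives µmol, so
    enzyme_ug = substrate_ug / substrate_mw * ratio * enzyme_mw.
    """
    substrate_umol = substrate_ug / substrate_mw_da
    enzyme_umol = enzyme_to_substrate * substrate_umol
    return enzyme_umol * enzyme_mw_da


# Hypothetical reaction with 10 µg of ELP-tev-FGF21 and SynIDP-2-TEV (~37 kDa).
tev_ug = enzyme_mass_ug(10.0, substrate_mw_da=45459.0, enzyme_mw_da=37000.0)
print(f"SynIDP-2-TEV needed: {tev_ug * 1000:.0f} ng")  # SynIDP-2-TEV needed: 81 ng
```

Working in moles first and converting back to mass keeps the ratio correct even when tag length changes the enzyme's molecular weight, as it does across the three SynIDP-TEV fusions.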