Protein purification with light via a genetically encoded azobenzene side chain

cis/trans-Dependent complex formation between p-(phenylazo)-L-phenylalanine (Pap) and cyclodextrins (CDs)

In the search for a suitable ncAA that can be cotranslationally incorporated into recombinant proteins and shows a light-switchable change in configuration and molecular shape (Fig. 2), we chose p-(phenylazo)-L-phenylalanine (Pap). Pap (also known as AzoPhe or AzoF) was previously incorporated, in the course of protein-functional studies, into a semisynthetic bovine ribonuclease S12, the Escherichia coli catabolite activator protein (CAP) as well as recombinant myoglobin from sperm whale13 and into the subunit HisH of a glutaminase from Thermotoga maritima14, for example. The recombinant proteins were produced using E. coli as expression host and an orthogonal pair of an engineered Methanocaldococcus jannaschii suppressor tRNACUA and the cognate tyrosyl-tRNA synthetase, which had been evolved to accept this ncAA substrate13. However, the spectroscopic properties of Pap have been only partially characterized up to now13,15,16,17, even though numerous other azobenzene derivatives were synthesized and their photo-isomerization studied18. Therefore, we prepared Pap (Supplementary Fig. S2) following a published procedure13 and investigated its absorption spectra as well as its isomerization between the trans- and cis-states in greater detail.

Fig. 2: Structure and photochemistry of Pap as well as complex formation between its trans- or cis-state and α-CD or β-CD.

A Photo-induced cis/trans-isomerization of Pap. B, C While the structure of the azobenzene side chain in trans-Pap is mostly planar and elongated (B), the one of cis-Pap is twisted and more bulky due to the steric repulsion between the two aromatic rings (C). D UV-Vis spectra of the trans- and cis-states of Pap. A 50 µM solution of Pap in 100 mM Tris/HCl pH 8.0 either equilibrated under daylight (orange) or after irradiation with 355 nm UV light for 30 min (violet), respectively. Wavelengths of absorption maxima are indicated. E Time-dependent increase in absorption at 326 nm during irradiation of a 50 µM trans-Pap solution at 355 nm with a UV LED from the top. F Thermal re-isomerization of cis-Pap (50 µM) at 25 ± 1 °C in the dark as spectrophotometrically monitored at long measurement intervals (12 h). The data in (E) and (F) were subjected to exponential curve fit (applying asymptotic values for (F) from (E)). G, H Structural model of the complex between trans-Pap and α-CD (side view and front view, respectively; energy-minimized with the MM2 method using ChemDraw3D). I Titration of 50 µM trans-Pap in 2 mL 100 mM Tris/HCl pH 8.0 with a 50 mM solution of α-CD in the same buffer. J Titration of cis-Pap (100 µM) with α-CD (solid circles) in comparison with buffer alone (hollow circles). K Titration of trans-Pap (50 µM) with β-CD. L Titration of cis-Pap (100 µM) with β-CD. For all titrations – except for (J), where a straight line fit was applied – the small change (negative or positive, respectively) in absorbance at the diagnostic wavelengths of 426 nm for cis-Pap and 326 nm for trans-Pap during complex formation with the CD was monitored and the curves were fitted using Eq. 1.

Pap as a free amino acid (at pH 8.0, equilibrated under daylight) in the low-energy state, i.e. in its trans-configuration, exhibits a pronounced absorption maximum (Fig. 2D; for a more detailed spectroscopic analysis, see Supplementary Figs. S6 and S7) in the near-UV region at 326 nm (ε326 = 21500 ± 411 M−1 cm−1) and a much weaker second absorption maximum in the violet/blue light region at 423 nm (ε423 = 1380 ± 86 M−1 cm−1). A near-quantitative switch to the higher-energy state, i.e. the cis-configuration, was easily achieved by illuminating the solution in a quartz cuvette from the top (Supplementary Fig. S1) with mild UV light at 355 nm using a 1.2–2.4 mW LED for ≤30 min (Fig. 2E). The resulting solution of cis-Pap exhibited a prominent but weaker absorption maximum shifted to a shorter wavelength, at 293 nm (ε293 = 6700 ± 337 M−1 cm−1), as well as a second minor absorption maximum in the visible region at almost the same wavelength as trans-Pap, 426 nm, with slightly larger amplitude (ε426 = 2120 ± 90 M−1 cm−1). Hence, similar to the published spectroscopic properties of plain azobenzene (trans: ε320 = 22000 M−1 cm−1, ε440 = 400 M−1 cm−1; cis: ε270 = 5000 M−1 cm−1, ε450 = 1500 M−1 cm−1)19, Pap shows distinct absorption bands that not only allow the individual spectroscopic detection of its trans- and cis-states but, importantly, also offer two well-separated wavelengths to specifically trigger switching between these configurations: e.g., in theory, illumination at 326 nm for trans→cis and at 426 nm for cis→trans.
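As a quick plausibility check on these coefficients, the Beer-Lambert law (A = ε·c·l) gives the absorbance expected for a 50 µM solution of either isomer. The sketch below is illustrative only: the 1 cm path length is an assumption (the cuvette dimensions are not stated in this section), and the function and dictionary names are hypothetical.

```python
# Beer-Lambert estimate: A = epsilon * c * l
# Extinction coefficients (M^-1 cm^-1) are the values quoted in the text.
EPSILON = {
    ("trans", 326): 21500.0,
    ("trans", 423): 1380.0,
    ("cis", 293): 6700.0,
    ("cis", 426): 2120.0,
}


def absorbance(isomer, wavelength_nm, conc_molar, path_cm=1.0):
    """Expected absorbance; path_cm=1.0 is an assumed cuvette path length."""
    return EPSILON[(isomer, wavelength_nm)] * conc_molar * path_cm


if __name__ == "__main__":
    for isomer, wavelength in EPSILON:
        a = absorbance(isomer, wavelength, 50e-6)
        print(f"{isomer}-Pap at {wavelength} nm: A = {a:.3f}")
```

Under this assumption, 50 µM trans-Pap gives an absorbance of roughly 1.08 at 326 nm but only about 0.07 at 423 nm, which shows why the UV band is the more sensitive diagnostic signal for the trans-state.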

To more precisely assess the relative cis/trans composition of Pap in the photostationary state (PSS) upon illumination at different wavelengths, we performed 1H-NMR spectroscopy, which allowed us to individually quantify the well-separated peaks in the aromatic region for both isomers (see Suppl. Results). Surprisingly, and somewhat in contrast to the expectations from the scientific literature in this field, the cis-state of Pap in aqueous solution can be populated almost quantitatively (around 95 %) upon irradiation with 355 nm LED light, with significantly better yield than at 365 nm, the wavelength commonly used for switching Pap to the cis-state in other laboratories14,17. This is in good agreement with the high ratio εtrans/εcis = 8.3 at a wavelength of 355 nm (Supplementary Fig. S3A), assuming an approximately constant quantum yield, Φc, for the trans→cis isomerisation across the π→π* transition band, as described for azobenzene20. Conversely, only a proportion of 70–80 % trans-isomer was obtained in the PSS upon illumination with daylight or with monochromatic blue LED light at around 430 nm, which is less than might have been expected for this energetically favored configuration. Thus, we used visible light of 430 nm, or simply daylight, for switching cis→trans, which for Pap otherwise occurs only slowly via thermal relaxation (see below). Of note, both chosen wavelengths are remote from the characteristic absorption bands of proteins (280 nm for the aromatic side chains and ≤225 nm for the peptide backbone) as well as nucleic acids (260 nm).
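The link between the extinction-coefficient ratio and the PSS composition can be illustrated with a simple two-state model: at the photostationary state the forward and reverse photoreaction rates balance, i.e. ε(trans)·Φ(trans→cis)·[trans] = ε(cis)·Φ(cis→trans)·[cis], with thermal relaxation neglected. The sketch below is not the authors' analysis. The ratio of the two quantum yields is not given in this section, so it is kept as a free parameter, and its default of 1.0 is purely a placeholder.

```python
def pss_cis_fraction(eps_ratio_trans_over_cis, phi_ratio_tc_over_ct=1.0):
    """Cis fraction at the photostationary state in a two-state model.

    eps_ratio_trans_over_cis: eps(trans)/eps(cis) at the irradiation wavelength.
    phi_ratio_tc_over_ct: Phi(trans->cis)/Phi(cis->trans); 1.0 is a placeholder
    assumption, not a value taken from the text.
    """
    x = eps_ratio_trans_over_cis * phi_ratio_tc_over_ct
    return x / (1.0 + x)


if __name__ == "__main__":
    # Ratio of 8.3 at 355 nm as quoted in the text.
    print(f"cis fraction: {100 * pss_cis_fraction(8.3):.1f} %")
```

With the quoted ratio of 8.3 and equal quantum yields, this simplified model predicts a cis fraction of about 89 %. A larger Φ(trans→cis)/Φ(cis→trans) ratio shifts the prediction towards the value of around 95 % measured by NMR.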

Next, we measured the kinetic stability of the higher-energy cis-state in aqueous solution after UV exposure. To this end, we prepared a 50 μM cis-Pap solution in 100 mM Tris/HCl pH 8.0 by illumination at 355 nm, directly in the quartz cuvette as above, and then placed this cuvette into a diode array spectrophotometer in the dark. The absorbance of the solution at 326 nm was measured at intervals of 2–12 h for up to 100 h; these long intervals, with ≤1 s of illumination per measurement, minimized any influence of the probe light at the diagnostic wavelength. The rise in absorption at this wavelength indicated the increasing population of trans-Pap due to thermal relaxation and could be fitted with a mono-exponential function, yielding t1/2 ≈ 196 h (Fig. 2F; Suppl. Results). Remarkably, this half-life of approximately 8 days is drastically longer than the one reported for the parent compound azobenzene in organic solvent21. Consequently, experiments with the almost pure cis-isomer of Pap, once generated, can be carried out in aqueous solution at room temperature within a practically reasonable time frame – as long as the sample is kept in the dark – and do not require permanent exposure to UV light.
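To put the half-life into practical terms, the first-order rate law can be used to estimate how much cis-Pap remains after a given time in the dark. This is a minimal sketch assuming strictly mono-exponential relaxation that runs to completion towards the trans-state; the function names are hypothetical.

```python
import math

HALF_LIFE_H = 196.0  # thermal half-life of cis-Pap quoted in the text (hours)


def rate_constant(half_life_h=HALF_LIFE_H):
    """First-order rate constant k = ln(2) / t_half, in 1/h."""
    return math.log(2) / half_life_h


def cis_remaining(t_h, half_life_h=HALF_LIFE_H):
    """Fraction of the initial cis population left after t_h hours in the dark."""
    return math.exp(-rate_constant(half_life_h) * t_h)


if __name__ == "__main__":
    for t in (1, 8, 24, 100):
        print(f"{t:>4} h: {100 * cis_remaining(t):.1f} % of initial cis-Pap")
```

Under these assumptions about 97 % of the cis-isomer is still present after an 8 h working day, which is consistent with the statement that permanent UV exposure is not required.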

While host-guest complex formation between certain azobenzene derivatives and CDs – or structurally related host cavities – was studied before, mainly in the area of nanomaterial research22,23,24,25,26, no such data have been reported for Pap. Taking into consideration the limited solubility of the free Pap amino acid in Tris/HCl pH 8.0, we chose the method of spectroscopic titration of a diluted Pap solution with a concentrated CD stock solution in the same buffer and monitoring the small change in the amplitude of the corresponding absorption band (Supplementary Fig. S3C) via quick measurement in the diode array spectrophotometer, as above, for each titration step after equilibration in the dark (Fig. 2I–L). The data were fitted according to the Law of Mass Action for bimolecular complex formation (Eq. 1).
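Eq. 1 is not reproduced in this section. For orientation, the sketch below uses the standard closed-form solution of the Law of Mass Action for a 1:1 complex, which may differ in detail from the authors' equation. It assumes that the absorbance change is proportional to the bound fraction of Pap and neglects dilution during the titration; all function and parameter names are hypothetical.

```python
import math


def complex_conc(p_total, l_total, kd):
    """Concentration of the 1:1 complex PL from total concentrations (all in M).

    Solves KD = (P_tot - PL) * (L_tot - PL) / PL for the physically valid root.
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


def delta_absorbance(l_total, kd, delta_a_max, p_total=50e-6):
    """Absorbance change, assumed proportional to the bound fraction of Pap."""
    return delta_a_max * complex_conc(p_total, l_total, kd) / p_total


if __name__ == "__main__":
    # Illustrative titration points; delta_a_max = -0.05 is an arbitrary amplitude.
    for cd in (0.0, 50e-6, 200e-6, 1e-3, 5e-3):
        print(f"[CD] = {cd * 1e6:7.0f} µM -> dA = {delta_absorbance(cd, 91e-6, -0.05):+.4f}")
```

A model function of this shape can be passed to any least-squares routine with KD and the maximal amplitude as free parameters.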

Titration of trans-Pap with α-CD (cyclomaltohexaose) led to a remarkably low dissociation constant, KD = 91 µM (Fig. 2I). Hence, the affinity between trans-Pap and α-CD is stronger by an order of magnitude than could be expected from published measurements with trans-azobenzene as part of a polyacrylamide-based hydrogel using 1H-NMR23,26. Conversely, the same measurement with cis-Pap led to no detectable spectroscopic effect, thus preventing the determination of a KD value for its putative complex with α-CD (Fig. 2J). Again, this was not fully expected since affinities reported for cis-azobenzene in a polyacrylamide hydrogel had indicated a KD value of 29 mM23. Further titrations of trans-Pap and cis-Pap performed here (Fig. 2K, L) using the larger β-CD (cyclomaltoheptaose, 2-hydroxypropyl-derivatized; see Suppl. Methods) resulted in KD = 325 µM for the slim trans-state and a 15-fold higher KD value of 4.98 mM for the more bulky cis-configuration (cf. Fig. 2B, C). Taken together, in contrast to β-CD, whose affinities revealed a more moderate difference, α-CD appeared to form a tight complex exclusively with trans-Pap while exhibiting vanishingly low, if any, affinity towards cis-Pap.
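The measured dissociation constants can be converted into standard binding free energies via ΔG° = RT·ln(KD / 1 M), which makes the differences between the complexes easier to compare. The sketch below assumes a temperature of 25 °C, which is not stated for the titrations in this section, and uses hypothetical names throughout.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # assumed temperature (25 °C), K

# Dissociation constants quoted in the text (M).
KD = {
    "trans-Pap / alpha-CD": 91e-6,
    "trans-Pap / beta-CD": 325e-6,
    "cis-Pap / beta-CD": 4.98e-3,
}


def delta_g_kj(kd_molar, temp_k=T):
    """Standard binding free energy in kJ/mol (1 M standard state)."""
    return R * temp_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    for name, kd in KD.items():
        print(f"{name}: dG = {delta_g_kj(kd):.1f} kJ/mol")
    ddg = delta_g_kj(KD["cis-Pap / beta-CD"]) - delta_g_kj(KD["trans-Pap / beta-CD"])
    print(f"beta-CD, cis minus trans: ddG = {ddg:.1f} kJ/mol")
```

On this scale the 15-fold difference between the two β-CD complexes corresponds to a free energy gap of roughly 7 kJ/mol.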

In fact, these distinct binding activities of α-CD towards the light-switchable cis– and trans-states of Pap were in a range that seemed suitable for applications in affinity chromatography. For comparison, the Strep-tag8 revealed a dissociation constant of 13.0 ± 1.3 μM for complex formation with streptavidin27, which was sufficient to enable the one-step purification of corresponding fusion proteins from complex cell extracts. Therefore, to test the performance of Pap and its potential light-dependent binding activity under chromatographic conditions in aqueous solution, we synthesized a chromatography matrix with covalently immobilized α-CD groups at high density by chemical coupling of cyclomaltohexaose to epoxy-activated Sepharose 6B28. In this setting, the long hydrophilic linker provided by the activated polysaccharide matrix and its conjugation via an ether bond to one of the many hydroxy-groups on the hydrophilic surface of α-CD were expected to ensure on average good sterical accessibility of its hydrophobic inner pocket (Supplementary Fig. S8). At the same time, the swollen gel matrix (soaked with aqueous buffer) showed a high transmission, leading to ≥50 % light intensity within a typical packed chromatography column at bench scale (Supplementary Fig. S8). Furthermore, it is well established that upon passage through an opaque medium the incoming light is scattered isotropically and that its mean path length is invariant with respect to the microstructure of this medium29.

The resulting α-CD affinity matrix was used to prepare a chromatography column with 1 mL bed volume (7 mm diameter). When a 5 mM solution of Pap (100 μL) was applied under daylight (i.e. in its predominant trans-state) the ncAA immediately adsorbed to the matrix in the upper part of the column, as directly visible due to its intense yellow color (see Supplementary Fig. S9). The compound was largely retained in this zone, showing just modest movement with the flow of the mobile phase when washing the column with ten bed volumes of 100 mM Tris/HCl pH 8.0, 500 mM NaCl. This pronounced retention effect was in line with the stable (yet dynamic) complex formation between α-CD and trans-Pap (see Fig. 1) observed before in solution. However, when the column was illuminated at 355 nm for 10 min with a laterally positioned set of LEDs (see Supplementary Fig. S1), the yellow band became mobilized immediately and fully eluted within 1.5 bed volumes of the same buffer. Direct measurement of the absorption spectrum of the eluted fraction revealed the presence of Pap in its cis-state by showing only absorption bands at 293 and 426 nm and no peak at 326 nm (see Supplementary Fig. S9B). This experiment provided initial proof that the light-switchable cis/trans-isomerization of the ncAA Pap can be used to control its retention versus elution under chromatographic conditions on an α-CD affinity column. These encouraging results prompted us to incorporate Pap into recombinant proteins and to investigate whether these show a similar light-dependent chromatographic behavior.

Development of a genetic system for the efficient incorporation of Pap into recombinant proteins

To incorporate Pap into a recombinant POI, as part of our so-called Azo-tag, we employed the strategy of amber stop codon (TAG) suppression in E. coli using an orthogonal pair comprising a heterologous tRNACUA and a cognate aminoacyl-tRNA synthetase (aaRS)30 encoded by a one-plasmid system for all genetic components31. However, instead of the system from M. jannaschii adapted to the Pap substrate as used by others before13,14, we chose the pyrrolysyl-tRNA synthetase (PylRS) from Methanosarcina mazei with altered substrate specificity (as described further below) and its cognate tRNAPyl32. This naturally evolved orthogonal pair is known for its more robust amber suppression in E. coli and less cross-reactivity with endogenous canonical amino acids33 while PylRS had been engineered already to accept amino acid substrates with certain substituted azobenzene side chains34.

Our expression vector pSB19 (Fig. 3) carries in total three heterologous genes, each under the control of a different promoter. tRNAPyl is constitutively expressed from the lpp promoter, whereas the structural gene for PylRS has been placed under control of the arabinose-inducible araBAD promoter/operator (p/o). Finally, the coding region for the POI is cloned under control of the lacUV5p/o which is inducible by lactose or isopropyl-β-D-thiogalactopyranoside (IPTG)5. Initially, we utilized superfolder GFP (sfGFP)35 in order to select E. coli cells carrying a mutated PylRS for the incorporation of Pap at its structurally permissible sequence position 39 via fluorescence-activated cell sorting (FACS)31 (see below and Suppl. Results). After having evolved PylRS with the desired substrate specificity, the coding region for sfGFPa39 was replaced by the ones of various different POIs equipped with the Azo-tag (see Supplementary Table S1) as described further below.

Fig. 3: Genetic system for the efficient cotranslational incorporation of Pap into recombinant proteins produced in E. coli.

A Expression vector pSB19 encoding all three components needed for the biosynthesis of a POI carrying Pap, which is encoded by an amber stop codon: (i) the coding region for the POI (here: sfGFPa39) under control of the lacUV5p/o, (ii) the gene for tRNAPyl carrying an anticodon complementary to the amber stop codon, expressed from the lpp promoter, (iii) the coding region for an engineered version of PylRS, PapRS#34, under control of the araBADp/o. Apart from the relevant regulatory regions (repressor genes araC and lacI as well as origin of replication), the vector additionally harbors a transcriptional fusion of the genes for β-lactamase and chloramphenicol-acetyltransferase, which confer ampicillin (Amp) and chloramphenicol (Cam) resistance, respectively. The latter was equipped with an amber stop codon at a permissible position (residue 112), thus allowing selection for E. coli growth in the presence of Cam during the directed evolution of PylRS for accepting a ncAA substrate. B Replacement of the prfA cistron within the operon flanked by the hemA and prmC (alias hemK) genes on the chromosome of the E. coli B strain NEBExpress by the coding region for the kanamycin resistance protein, which was accompanied by a gene duplication (see Suppl. Results), thus leading to diminished transcriptional activity for RF-1 (the relevant promoter, P1, is indicated). C Subsequent deletion of the malE cistron, as part of the mal operon, in the genome of NEBExpress(lowRF1) and its replacement by the coding region for the streptomycin resistance protein, resulting in NEBExpress(lowRF1/ΔMBP). D, E Directed evolution of PylRS to efficiently charge tRNAPyl with Pap. D Exemplary FACS measurements (sfGFP39Pap signals, 10⁵ events each) comparing the initial PylRS mutant encoded on pSB15-PapRS#0 (red) with the pSB19-PapRS#34 plasmid (black), both harboring the sfGFPa39 reporter protein gene, in the NEBExpress(lowRF1) strain background. E MFI of sfGFPa39 expression determined by FACS (10⁵ events each; N = 2 individual experiments) in the presence of different Pap concentrations in the culture medium, using the same experimental setup.

To boost the efficiency of amber stop codon suppression by tRNAPyl, which is counter-acted by the ribosomal release factor 1 (RF-1) of E. coli, we attempted to construct an expression strain in which its gene, prfA, is deleted (Fig. 3B). Based on an earlier report on the inactivation of RF-1 in E. coli suggesting that RF-1 is indispensable for E. coli K-12 strains, due to the lower activity of its mutated RF-2 allele36, we chose the E. coli B strain NEBExpress® (New England Biolabs, Ipswich, MA), which encodes an intact RF-2, as chassis strain. To this end, the prfA cistron within the hemA-prfA-prmC(hemK) operon was precisely replaced by the coding region for the kanamycin resistance protein (kanr) using the λ-Red recombination system37. However, careful genome sequence analysis of the single clone obtained indicated that a gene duplication had occurred in a way that the remaining copy of intact prfA suffered from lower transcriptional activity (see Suppl. Results). Notably, the resulting strain, NEBExpress(lowRF1), showed essentially unaffected growth but strongly enhanced amber suppression activity – approximately 50 % versus the typically 10–20 % suppression in a wild-type background – in combination with the pSB19 vector, as also demonstrated by a five-fold higher yield of a fluorescent reporter protein containing the Pap residue after purification via the C-terminal Strep-tag II (see Suppl. Results). Thus, this E. coli derivative was used during the development of a PylRS mutant enabling the cotranslational incorporation of Pap and for the subsequent affinity purification experiments.

The evolution of a Pap-specific PylRS mutant started from the M. mazei enzyme which had been modified to incorporate meta-substituted derivatives of Pap for applications in click chemistry (MmPSCaaRS, harboring four amino acid exchanges from the wild-type protein: A302T, L309S, N346V, and C348G)34. We prepared a synthetic gene for this enzyme while additionally introducing a conservative amino acid mutation, K192R, which also occurs in the highly homologous PylRS from M. barkeri38, thus generating a second BsaI restriction site (together with the one introduced via silent mutations at the amino acid position G423/L424). This gene/protein format (dubbed MmPSCaaRS(BsaI), initially cloned on the vector pSB15 carrying different promoter elements, see Methods) facilitated cassette mutagenesis and subcloning of the central coding region encompassing the amino acid substrate pocket and active site of the modified PylRS.

Next, the mutation Y384F, which enhances ncAA incorporation by PylRS over a wide range of substrates comprising extended Lys side chains39, was introduced (mutant PapRS#0), resulting in higher activity both towards the unsubstituted Pap and its para-amino derivative, NH2-Pap (4.4-fold and 2.5-fold increase, respectively, according to the median of the fluorescence intensity, MFI, determined by FACS analysis of cells expressing the sfGFPa39 reporter protein). NH2-Pap was tentatively employed as a more soluble, albeit chemically less stable (cf. Supplementary Fig. S17), derivative of Pap. Notably, Pap itself showed immediate precipitation when diluted from its alkaline stock solution into the culture medium of E. coli. Interestingly, this phenomenon was effectively prevented by preparing a stock solution of 50 mM Pap in 200 mM NaOH supplemented with 200 mM β-CD, thus taking advantage of the reversible complex formation of β-CD with both the cis- and trans-states of Pap, as determined above.

To further improve the incorporation of Pap (and also of NH2-Pap; see Suppl. Results) by PapRS#0, its directed evolution was pursued (Supplementary Fig. S10). To this end, the central PylRS-encoding region encompassing residues D196–E425 (flanked by the pair of BsaI restriction sites) was amplified by error-prone polymerase chain reaction (PCR), thus generating a mutated plasmid library. Transformants of NEBExpress(lowRF1) were grown in the presence of 1 mM of the ncAA and subjected to FACS using the sfGFPa39 fluorescence as readout. After several iterations of positive and negative selection31, emerging mutants revealing specifically enhanced fluorescence were analyzed by plasmid sequencing and compared by FACS analyses of individually cultured clones. Promising amino acid exchanges were subsequently combined by subcloning, finally resulting in the mutant PapRS#34 which carried three additional amino acid exchanges: F295L, N304S and V346A.
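The lineage of amino acid exchanges described above (wild-type PylRS, MmPSCaaRS, the BsaI variant, PapRS#0 and finally PapRS#34) can be tracked with a small bookkeeping script. The sketch below works on a sparse map of the affected positions only, with wild-type residues inferred from the mutation names given in the text; it does not contain the actual enzyme sequence, and all names are hypothetical.

```python
import re

# Wild-type residues at the affected positions, inferred from the mutation names.
WILD_TYPE = {192: "K", 295: "F", 302: "A", 304: "N", 309: "L", 346: "N", 348: "C", 384: "Y"}

# Engineering steps in the order described in the text.
STEPS = [
    ("MmPSCaaRS", ["A302T", "L309S", "N346V", "C348G"]),
    ("MmPSCaaRS(BsaI)", ["K192R"]),
    ("PapRS#0", ["Y384F"]),
    ("PapRS#34", ["F295L", "N304S", "V346A"]),
]


def apply_mutation(residues, mutation):
    """Apply one exchange such as 'Y384F', checking the expected original residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if residues.get(pos) != old:
        raise ValueError(f"{mutation}: position {pos} holds {residues.get(pos)!r}, not {old!r}")
    residues[pos] = new


def build_variants():
    """Return {variant name: residue map} after each cumulative step."""
    residues = dict(WILD_TYPE)
    variants = {}
    for name, mutations in STEPS:
        for mutation in mutations:
            apply_mutation(residues, mutation)
        variants[name] = dict(residues)
    return variants


if __name__ == "__main__":
    for name, residues in build_variants().items():
        print(name, {pos: residues[pos] for pos in sorted(residues)})
```

The consistency check shows that V346A in PapRS#34 acts on the valine introduced earlier by N346V, so position 346 is altered twice relative to the wild-type enzyme.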

The combined advantages of the evolved PapRS#34, the expression vector pSB19 and the improved genetic background of NEBExpress(lowRF1) became evident when examining the amount of Pap in the culture medium that was needed to generate the maximal signal of sfGFPa39 in the FACS measurements. Signal saturation was approached already at a very low Pap concentration of 30–100 µM (Fig. 3D, E), compared to a 1 mM concentration as applied at the onset of this project as well as in other published studies14,34. This also made the use of β-CD for solubilizing Pap in the bacterial culture less important in our hands. Consequently, this optimized expression system was employed for the preparative production of various POIs (Supplementary Table S1) to test their performance in a light-controlled affinity chromatography.

Affinity purification of Azo-tagged proteins on the α-CD affinity column controlled by light

To investigate the application of the light-switchable Pap side chain for purification purposes if incorporated at an exposed position into a biosynthetic POI, initially two colored model proteins were chosen: Azurin, a natural blue copper protein from Pseudomonas aeruginosa40, and mScarlet(3), an engineered monomeric red fluorescent protein from corals41,42. To avoid possible sterical interference of the interaction between α-CD and Pap in the context of the macromolecular protein, the amber stop codon was positioned at the very C-terminus of the reporter protein. Separated by two slim Gly residues, the Strep-tag II affinity peptide8 was arranged upstream for convenience, whereas Pap was directly followed downstream by a terminal ochre stop codon (TAA, not suppressed by the PylRS system; Fig. 4A). Apart from serving as an additional spacer between Pap and the target protein, the Strep-tag II also allowed the initial purification of the POI by Strep-Tactin affinity chromatography, prior to assessing its chromatographic behavior mediated by the Azo-tag in combination with the α-CD affinity column.
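The arrangement of the C-terminal tag at the DNA level (Strep-tag II, two Gly codons, the amber codon for Pap, then the ochre stop codon) can be sketched as follows. The Strep-tag II peptide sequence WSHPQFEK is the standard one; the individual codons chosen here are arbitrary placeholders and not the authors' sequence, and no linker between the POI and the Strep-tag II is modelled.

```python
STREP_TAG_II = "WSHPQFEK"  # standard Strep-tag II peptide sequence
AMBER = "TAG"              # suppressed by tRNAPyl, decoded as Pap
OCHRE = "TAA"              # terminal stop codon, not suppressed

# One valid codon per residue; the specific codon choice is an assumption.
CODONS = {
    "W": "TGG", "S": "AGC", "H": "CAT", "P": "CCG", "Q": "CAG",
    "F": "TTT", "E": "GAA", "K": "AAA", "G": "GGT",
}


def azo_tag_dna():
    """DNA for Strep-tag II, GG linker, amber codon (Pap) and ochre stop."""
    peptide = STREP_TAG_II + "GG"
    return "".join(CODONS[aa] for aa in peptide) + AMBER + OCHRE


def append_azo_tag(orf_without_stop):
    """Fuse the tag in frame to an ORF that lacks its own stop codon."""
    if len(orf_without_stop) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return orf_without_stop + azo_tag_dna()


if __name__ == "__main__":
    print(azo_tag_dna())
```

Because the amber codon is the last sense position, read-through yields the full-length POI ending in Pap, whereas termination at TAG yields a product that lacks only this final residue.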

Fig. 4: α-CD affinity chromatography of Azo-tagged proteins.

A Scheme of the recombinant (mature) protein constructs for Azurin and mScarlet, both carrying a C-terminal Azo-tag with the Pap residue. B Retention of (pre-purified) Azurin-GG-Pap (top) and mScarlet-GG-Pap (bottom) to an α-CD affinity column (1 mL bed volume) after washing with 1 mL buffer. The minor fraction of protein that carried the Azo-tag in the cis-state (see Supplementary Fig. S14) or that had not fully incorporated the C-terminal Pap residue via amber suppression (see Supplementary Fig. S12) is already partially washed out of the column with the buffer flow. C The elution of mScarlet-Strep-GG-Pap under different illumination conditions monitored via its specific fluorescence in the chromatography fractions. The column was either loaded with the protein and (i) washed directly under 355 nm UV light (blue dashed line) or (ii) washed under daylight (black solid line), followed by elution upon exposure to 355 nm UV light (red solid line). D Separation of a mixture of Azurin-Strep-GG-Pap and mScarlet-Strep, here without the Azo-tag, on the α-CD affinity column. While mScarlet (magenta) does not bind to the column and is quickly washed out, Azurin-Strep-GG-Pap (blue) is retained in the upper zone of the column and specifically eluted afterwards via exposure to UV light (see the Supplementary Movie 1).

While mScarlet was produced in the cytoplasm of E. coli, Azurin was secreted into the bacterial periplasm, with the help of the OmpA signal peptide, and subsequently reconstituted with Cu2+ ions in the periplasmic extract7. The coding regions for Azurin and mScarlet, both equipped with the combined Strep/Azo-tag at their C-termini, were each cloned on the pSB19-PapRS#34 vector – thus replacing sfGFPa39 which had served as fluorescent reporter for the PylRS evolution described above – and produced in E. coli NEBExpress(lowRF1), then purified to homogeneity via the Strep-tag II. When Azurin or mScarlet were subsequently loaded onto a 1 mL α-CD affinity column, each protein accumulated at the top of the resin. Even after washing with chromatography buffer A (25 mM Tris/Cl pH 8.0, 150 mM NaCl), a large fraction of the POI was retained there (Fig. 4B). Thus, Pap in its low-energy trans-state was able to interact with the immobilized α-CD groups and form a stable non-covalent complex even if incorporated into a recombinant protein. On the other hand, upon exposure to UV light at 355 nm both POIs eluted immediately with the buffer flow (see below). Therefore, as with the free ncAA initially investigated, the interaction between immobilized α-CD groups and a POI exposing the Pap side chain can be reversed by its light-induced isomerization into the cis-configuration (see Fig. 1A).

The effect of UV light exposure on the binding of the POI was further demonstrated by quantifying the mScarlet fluorescence of individual fractions collected in the course of the α-CD affinity chromatography (Fig. 4C). As long as it was exposed to daylight – which favors the trans-configuration of the azo-group as explained above, albeit with a proportion of 20–25 % in the cis-state – mScarlet-Strep-GG-Pap was largely retained on the column, with only minor fluorescence detectable in the flow-through upon continued washing. Only after specific illumination at 355 nm (via lateral exposure to LED light in the shaded laboratory, cf. Supplementary Fig. S1D) did the bound protein elute quantitatively. For comparison, exposure to the UV light already during the loading and washing steps led to the immediate elution of mScarlet-Strep-GG-Pap in the flow-through. More detailed investigation of the role of illumination, at 430 nm or 355 nm, respectively, during sample loading/washing steps and the elution phase revealed that the proportion of bound Azo-tagged POI can actually be boosted by constant exposure to blue light (Supplementary Fig. S14) owing to the dynamic equilibrium between cis- and trans-isomers in the PSS (see Fig. 1B). However, this High Yield mode also leads to less tight retention on the α-CD affinity column such that in practice (see below) the High Purity mode, with sample loading and washing in the dark (where the photostationary cis-/trans-isomer composition is frozen), may be beneficial.

To further demonstrate a chromatographic purification effect mediated by the Azo-tag, a mixture of the separately prepared mScarlet-Strep protein, this time lacking the C-terminal Pap residue, with the Azurin-Strep-GG-Pap from above was applied to the α-CD affinity column (Fig. 4D and Supplementary Movie 1). While mScarlet-Strep entered the column and eluted with the buffer flow without delay or visibly interacting with the resin – thus also confirming the negligible role of the Strep-tag II – the Azo-tagged Azurin accumulated in the top zone of the matrix as before. Upon continued washing with chromatography buffer (under dim light), the two colored proteins gradually separated and the column was completely cleared from mScarlet (after a total washing volume of 1.5 mL), whereas Azurin was fully retained in the upper part of the column. Then, 355 nm UV light was applied and Azurin eluted instantaneously with the chromatography buffer flow. This experiment demonstrated that the Azo-tag as part of a biosynthetic POI allows its efficient separation from an untagged recombinant protein via affinity chromatography on an α-CD matrix under physiological buffer conditions, simply controlled by light. The elution itself was typically complete within 1–1.5 column volumes, which is comparable to the conventional application of a competitive ligand, e.g. when using a biotin derivative in case of Strep-tag affinity purification8 and, thus, keeps dilution of the finally purified protein sample at minimum.

Biosynthesis of exemplary proteins and their one-step affinity purification from cell extracts

Aiming at practical applications in the life sciences, we investigated whether the Azo-tag in combination with the α-CD affinity chromatography is suitable for the one-step purification of a recombinant POI from a complex host cell protein mixture. As the Azurin-Strep-GG-Pap from above constituted by far the most abundant protein in the periplasmic protein fraction of the corresponding transformed E. coli strain, its isolation from this cellular extract was easy to accomplish (Supplementary Fig. S12). Hence, we chose human cystatin C43 as a more challenging and biomedically relevant recombinant POI, which also carries disulfide bonds. Again, this protein was fused with the C-terminal Strep-tag II followed by the GG-Pap sequence, in addition to the N-terminal OmpA signal peptide for periplasmic secretion.

The periplasmic extract of E. coli cells expressing this biosynthetic protein was prepared by osmotic shock and directly applied to the α-CD affinity column. After that, the column was washed with 3 column volumes of buffer under daylight until the host cell proteins were removed. When then applying 355 nm UV light, the cystatin C specifically eluted from the column as detected by SDS-PAGE analysis of the corresponding fractions (Fig. 5). Notably, this analysis revealed that essentially all host cell proteins were quickly washed out of the column using a physiological buffer (25 mM Tris/Cl pH 8.0, 150 mM NaCl). On the other hand, the cystatin C was obtained in high purity – with just one minor contamination. This demonstrated the successful light-controlled purification of a POI via the Azo-tag from the periplasmic bacterial extract, also indicating high specificity of the immobilized α-CD groups towards the Pap side chain.

Fig. 5: One-step purification of different Azo-tagged POIs from complex protein mixtures via light-controlled α-CD affinity chromatography.

Protein solutions were applied to a column with 1 mL bed volume under daylight and, after washing with buffer, the bound protein was eluted via illumination at 355 nm (in the dark lab). A Isolation of cystatin C carrying the C-terminal GG-Pap sequence from the periplasmic cell extract of NEBExpress(lowRF1). B Isolation of Azurin carrying the C-terminal GG-Pap sequence from the total cell extract of NEBExpress(lowRF1), with 5 mM maltose added to the lysis buffer. C Isolation of mScarlet carrying the C-terminal GG-Pap sequence from the total cell extract under similar conditions. D Isolation of mScarlet3 carrying the N-terminal Ala-Pap-Gly sequence from the total cell extract, again under the same conditions. Note that the two additional lower bands in the elution fractions are obviously due to a specific backbone cleavage within mScarlet3 (cf. Supplementary Fig. S17A). This prominent post-maturation hydrolysis, which has also been described by others68, was observed in all mScarlet(3) preparations, even when using other purification methods and in the absence of the Azo-tag, but not for sfGFP (cf. Supplementary Fig. S18E).

The only contaminating protein species that was detectable as a very weak band migrating at around 40 kDa in the SDS-PAGE was subsequently identified as the maltose binding protein (MBP) of E. coli using tandem ESI-MS of peptide fragments prepared by trypsin digestion (see Supplementary Fig. S18). This was also in agreement with its measured intact mass of 40706 Da. The same contaminating band, whose (retarded) elution was independent of light, was detected in experiments with other POIs. While this phenomenon was unexpected, since MBP is not a commonly known host cell contaminant in the widely applied affinity purification of MBP fusion proteins from E. coli extracts on an amylose resin44, it is supported by previous investigations on the ligand spectrum of MBP which includes cyclic maltodextrins45. However, adsorption of the endogenous MBP to the α-CD affinity column was effectively prevented here when supplementing the bacterial extract with 5 mM maltose, thus saturating its ligand pocket (KD = 3.5 µM). As an alternative, we constructed a derivative of the expression strain using the λ-Red system as above, NEBExpress(lowRF1/ΔMBP), which had the malE gene deleted (Fig. 3C). While this strain showed slightly slower growth, it reached similar cell densities in the stationary phase and appeared as another suitable expression host for POIs carrying an Azo-tag, as demonstrated here for sfGFP (Supplementary Fig. S18E, F).
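The saturating effect of the maltose supplement can be estimated with a simple 1:1 binding model. The following minimal sketch assumes that maltose is in large excess over MBP, so that the free ligand concentration is approximately the total 5 mM; the function name is hypothetical and the model is an illustration, not part of the reported analysis.

```python
def fraction_bound(ligand_conc_um: float, kd_um: float) -> float:
    """Fractional occupancy of a 1:1 binding site, ligand in excess (assumption)."""
    if ligand_conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return ligand_conc_um / (ligand_conc_um + kd_um)


# Values taken from the text: 5 mM maltose, KD = 3.5 µM (both expressed in µM)
maltose_um = 5000.0
kd_um = 3.5

occupancy = fraction_bound(maltose_um, kd_um)
free_fraction = 1.0 - occupancy

print(f"MBP with occupied ligand pocket: {occupancy:.4f}")
print(f"MBP with free ligand pocket:     {free_fraction:.4f}")
```

Under these assumptions, more than 99.9 % of the endogenous MBP would carry maltose in its ligand pocket, which is consistent with the observed suppression of its adsorption to the α-CD matrix.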

In the next step, we set out to purify Azurin-Strep-GG-Pap – still secreted into the bacterial periplasm utilizing the N-terminal OmpA signal sequence as before – from the total cell lysate of E. coli (Fig. 5B). In the SDS-PAGE of this cell-free extract, Azurin no longer gave rise to a prominent protein band; nevertheless, upon elution under illumination at 355 nm it was recovered from the α-CD affinity column as a homogeneous protein without detectable impurities. Notably, analogous purification experiments performed with a version of this POI lacking the Strep-tag II, i.e. carrying the Azo-tag alone, led to similar results, thus excluding any contribution from the second affinity tag (Supplementary Fig. S13). Likewise, mScarlet equipped with the C-terminal Strep-GG-Pap tag, which was expressed in the bacterial cytoplasm, was isolated from the whole cell extract using the same procedure by light-controlled α-CD affinity chromatography in a highly pure state (Fig. 5C).

Sequence optimization of the Azo-tag and α-CD affinity column performance

While the experiments up to now clearly demonstrated the efficient purification of POIs carrying a C-terminal Azo-tag on an α-CD affinity column, even from a whole cell extract, there was always a proportion of the recombinant protein that did not adsorb to the affinity column (cf. Supplementary Fig. S12 and Fig. 5A, C) and appeared in the flow-through. As in these cases washing was performed under daylight, the premature elution of some proportion of the Azo-tagged POI in the cis-state, as mentioned further above, should have been largely prevented (see the effect of continuous illumination during the washing step as illustrated in Supplementary Fig. S14). Moreover, determination of the dynamic column capacity (Supplementary Fig. S15) did not provide an indication of column overloading, with a binding capacity of >10 mg Azo-tagged POI per 1 mL bed volume. Likewise, the α-CD affinity matrix appeared highly robust and tolerant against widely applied biochemical reagents (such as 10 mM DTT or TCEP, 6 M GdnHCl, 8 M urea or 1 M NaOH), allowing 10–50 repeated chromatography runs without noticeable loss in performance (Supplementary Fig. S15).

However, UV/Vis absorption spectra as well as ESI-MS analyses indicated that the observed effect was due to a fraction of the biosynthetic protein that lacked the C-terminal Pap amino acid (Supplementary Fig. S12), whereas only the protein that had Pap incorporated remained bound to the α-CD matrix and was eluted upon UV exposure. Obviously, this phenomenon was caused by the incomplete suppression of the amber stop codon during the cotranslational incorporation of the ncAA using the engineered PylRS/tRNAUAG system, due to the merely partial deactivation of RF-1 in the engineered E. coli host strain used, NEBExpress(lowRF1), as described above. This led to the translation of a protein with essentially identical biochemical properties but just one residue shorter, thus lacking the azobenzene side chain needed for the complex formation with the immobilized α-CD groups. While this unbound side product may matter less in practice, we sought to avoid its formation for demonstration purposes.

To ensure that only the full-length gene product carrying the Pap residue is expressed, the amber stop codon was moved from the C- to the N-terminus of the POI. Notably, in this setting care had to be taken that the azobenzene side chain was not obstructed by a bulky neighboring amino acid such as, in particular, the start Met residue in the case of direct cytoplasmic expression (e.g. for mScarlet). To promote adequate accessibility of the N-terminal Pap group, it was preceded by an Ala residue, right after the start Met residue, based on the knowledge that the endogenous methionine aminopeptidase cleaves the start Met if followed by a residue with small side chain46. On the other hand, with regard to the secretory expression strategy, it was questionable if the signal sequence would get efficiently processed by the bacterial signal peptidase I47 if directly followed by a bulky hydrophobic side chain such as the one of Pap. Therefore, an Ala residue was also inserted on the N-terminal side of Pap for those genetic constructs effecting periplasmic secretion (e.g. for Azurin). In both cases, Pap was followed by two Gly residues, with the POI sequence arranged downstream and still carrying the Strep-tag II at its C-terminus (for convenience). Using these constructs, the N-terminally Azo-tagged versions of Azurin and mScarlet3 were both successfully purified from the whole cell extract of E. coli without detectable recombinant protein in the flow-through or wash fractions (Fig. 5D and Supplementary Fig. S18A).

Nevertheless, it appeared that the two N-terminally tagged POIs did not adsorb as tightly to the α-CD affinity column as their versions with the C-terminal Azo-tag. Hence, we investigated if the amino acids directly adjacent to Pap have an impact on its affinity to α-CD. In order to screen a large set of sequence combinations, a SPOT assay48 was performed. To this end, an array of 18×18 Xaa-Pap-Yaa tripeptides, C-terminally anchored on a hydrophilic carrier membrane, was synthesized, with Xaa and Yaa representing all amino acids except for the bulky Trp and the chemically labile Cys. Binding activity was probed by incubation with bacterial alkaline phosphatase27 which had been chemically conjugated with α-CD groups, followed by washing with buffer and detection of enzyme activity using a precipitating chromogenic substrate (Supplementary Fig. S19). In this assay, the SPOT signal obtained for the original tripeptide sequence, Ala-Pap-Gly, was only slightly above the median. However, among others, there was in particular one tripeptide that gave rise to a significantly elevated signal: H2N-Gly-Pap-Gly-. In fact, the emergence of an N-terminal Gly residue, which lacks any side chain, appeared plausible in the light of further reduced steric requirements in the chemical neighborhood of the Pap side chain as contemplated above.
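The composition of the tripeptide library can be illustrated by a short enumeration. The sketch below assumes the 20 canonical amino acids as the starting set and uses three-letter codes; the variable names are hypothetical, and the code makes no statement about how the spots were physically arranged on the membrane.

```python
from itertools import product

# The 20 canonical amino acids in three-letter code
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]

# Residues omitted from the flanking positions according to the text
EXCLUDED = {"Trp", "Cys"}

# Residues allowed at the Xaa and Yaa positions
flanking = [aa for aa in AMINO_ACIDS if aa not in EXCLUDED]

# All Xaa-Pap-Yaa combinations, written from N- to C-terminus
tripeptides = [f"{xaa}-Pap-{yaa}" for xaa, yaa in product(flanking, repeat=2)]

print(len(flanking), "residues per flanking position")
print(len(tripeptides), "tripeptides in total")
print("Ala-Pap-Gly" in tripeptides, "Gly-Pap-Gly" in tripeptides)
```

With 18 residues at each flanking position, the library comprises 324 sequences, including both the original Ala-Pap-Gly and the improved Gly-Pap-Gly motif.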

To test the performance of the different N-terminal peptide sequences in the α-CD affinity purification, the corresponding versions of mScarlet3 were generated as recombinant proteins and equivalent amounts of the bacterial total cell lysate were loaded onto the column. As expected, moderate but consistent differences in binding of the colored Azo-tagged POI to the α-CD affinity resin were observed during the washing steps (see Supplementary Fig. S20). After washing with three bed volumes of chromatography buffer, the Gly-Pap-Gly variant still remained concentrated within the upper half of the column, whereas the initial Ala-Pap-Gly variant was already mobilized to some extent, visibly moving towards the bottom of the column. Apart from that, the purity of both protein preparations obtained after UV-induced elution was comparable.

Finally, we investigated if the exposure to UV light at 355 nm had an effect on the integrity of the Azo-tagged POI purified via α-CD affinity chromatography. To this end, we used (i) an ordinary protein lacking chromophores, human cystatin C (which only absorbs light in the farther UV region, around 280 nm, due to the Tyr and Trp side chains), and (ii) a colored/fluorescent protein that also shows absorption in the relevant spectral region, sfGFP (both POIs already employed above). Remarkably, even after extended illumination up to 60 min using the same setup of LEDs as applied for the affinity chromatography, no signs of deterioration apparent from SDS-PAGE, ESI-MS or the fluorescence spectrum were detectable (Supplementary Fig. S16). This is in line with the mild form of near UV radiation at 355 nm and the well-known resistance of proteins against exposure even to 280 nm UV light, as generally occurs in common UV detectors used for chromatographic purification.

Application of the Azo-tag in a high-throughput setup and to antibodies or their fragments

The experiments so far demonstrate that the light-controlled α-CD affinity chromatography is suitable for Azo-tagged protein purification at the bench top scale. While upscaling to bulk separation or to industrial application may require more elaborate column design and process development, Excitography directly appears attractive for screening applications at the microscale, compatible with laboratory automation. To this end, we investigated the isolation of proteins using 96-well microtiter plates. POIs belonging to two classes with biomedical relevance were chosen as examples: (i) enzymes, such as the monomeric β-lactamase AmpC49 (Fig. 6A, B) as well as the homo-dimeric glutathione-S-transferase50, GST (Supplementary Fig. S23A, B), and (ii) antibody fragments, i.e. the single-chain variable fragment (scFv) of the monoclonal antibody (MAb) T84.66 directed against the tumor-associated carcinoembryonic antigen (CEA)51 (Fig. 6C) as well as the nanobodies (NBs) EgA1 directed against human epidermal growth factor receptor (EGFR)52 (Supplementary Fig. S22) and 2Rs15d directed against human epidermal growth factor receptor 2 (HER2)53 (Supplementary Fig. S23C). These representative POIs can be produced in small cultures and handled in functional assays using standard laboratory robotics, such as in the frame of biopharmaceutical protein engineering or industrial enzyme screening campaigns.

Fig. 6: One-step purification of Azo-tagged POIs in parallel using the 96-well format.
figure 6

A Isolation of the β-lactamase AmpC carrying the C-terminal GG-Pap sequence from the total cell extract of NEBExpress(lowRF1) using 10 wells of a 96-well receiver plate filled with 200 µl α-CD affinity resin each. AmpC was purified as essentially homogeneous protein without significant well-to-well difference. AmpC without an Azo-tag or untransformed NEBExpress(lowRF1) cells served as negative controls. B Colorimetric assay to measure the AmpC enzyme activity in aliquots of the eluate from (A) after mixing with CENTA substrate in a 96-well microtest plate. C Affinity purification of the anti-CEA scFv antibody fragment T84.66 carrying the C-terminal GG-Pap sequence from the periplasmic extract of NEBExpress(lowRF1), followed by ELISA to assess antigen-binding activity in 10 separately eluted aliquots. scFv bound to a CEA-coated microtiter plate was detected via an antibody-HRP conjugate directed against the Strep-tag II using ABTS substrate. Untransformed NEBExpress(lowRF1) cells served as negative controls (3 wells). Data points in box plots shown in (B) and (C) each correspond to a single measurement per well; median (center line), upper/lower quartiles (box limits) and min/max values (whiskers) are indicated. D Light-controlled affinity purification of trastuzumab from cell culture medium using an adapter protein, as schematically illustrated to the right. Trastuzumab and the Azo-tagged protein G C2 fragment (Azo-ProtG) were mixed in DMEM, 5 % (v/v) FCS (input). The antibody was specifically isolated by UV light-controlled elution while all serum proteins, in particular albumin (66 kDa), were efficiently removed.

Here we used lysates of E. coli cultures expressing each of these POIs equipped either at the N-terminus or at the C-terminus with the Azo-tag (see Supplementary Table S1) and encoded on the pSB19-PapRS#34 vector for recombinant gene expression, followed by purification on the α-CD affinity matrix, this time hosted in the wells of a 96-well column plate (200 µl bed size). After the application of a 500 µl bacterial extract and washing under gravity flow, until the host cell proteins had been removed, the columns were illuminated from above at 355 nm by an array of UV LEDs (one per well; see Supplementary Fig. S1H), which led to the immediate elution of the bound POI simply in the applied buffer (chromatography buffer A or GST assay buffer). The collected protein eluate showed high purity and uniform protein concentration, with no significant well-to-well difference (Fig. 6A and Supplementary Fig. S23A). The resulting purified protein solutions were directly suitable for subsequent assays, without the need for buffer exchange or removal of elution agents, as here demonstrated with enzyme activity assays and antigen binding assays, respectively (Fig. 6B, C and Supplementary Fig. S23B, C).
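For orientation, the scale of the plate format can be related to the column experiments by simple arithmetic. The sketch below assumes that the dynamic binding capacity of >10 mg per 1 mL bed volume, determined above for the column format, also applies to the resin in the 96-well plate, which was not separately tested; function and variable names are hypothetical.

```python
def min_binding_capacity_mg(bed_volume_ml: float,
                            capacity_mg_per_ml: float = 10.0) -> float:
    """Lower bound of bound POI, scaling the reported >10 mg/mL linearly (assumption)."""
    if bed_volume_ml <= 0 or capacity_mg_per_ml <= 0:
        raise ValueError("volumes and capacities must be positive")
    return bed_volume_ml * capacity_mg_per_ml


# Bed and sample volumes taken from the text, converted to mL
well_bed_ml = 0.2       # 200 µl resin per well
column_bed_ml = 1.0     # 1 mL column used for the bench-top runs
extract_ml = 0.5        # 500 µl bacterial extract applied per well

per_well_mg = min_binding_capacity_mg(well_bed_ml)
per_column_mg = min_binding_capacity_mg(column_bed_ml)
load_in_bed_volumes = extract_ml / well_bed_ml

print(f"Lower-bound capacity per well:   {per_well_mg:.1f} mg")
print(f"Lower-bound capacity per column: {per_column_mg:.1f} mg")
print(f"Applied extract per well:        {load_in_bed_volumes:.1f} bed volumes")
```

Under this assumption each well could bind at least 2 mg of Azo-tagged POI, and the applied extract corresponds to 2.5 bed volumes.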

In this context we also investigated if the presence of the Azo-tag may have an influence on the structural or functional properties of the POI. To this end, we prepared GST both with and without a C-terminal Azo-tag (GST-Strep-GG-Pap versus GST-Strep-GG, respectively) and performed various assays (Supplementary Fig. S21). While the presence of the azobenzene side chain (in the trans-state) was clearly visible as a distinct band at 326 nm in the UV-Vis spectrum, there was no change in the protein secondary structure, as assessed from far-UV circular dichroism (CD) spectroscopy, and the enzyme activity of both protein versions was identical within experimental error, as measured by Michaelis-Menten kinetics. Most notably, the elution behavior in an analytical hydrophobic interaction chromatography was indistinguishable, which indicates that the observed poor solubility of the isolated Pap amino acid does not cause a measurable increase in overall hydrophobicity if incorporated into a folded protein. Likewise, a comparison of the anti-EGFR NB with or without the C-terminal Pap residue revealed an unchanged high antigen-binding activity (Supplementary Fig. S22). Moreover, the azobenzene side chain proved to be stable against common (intracellular) biochemical reducing agents such as glutathione (GSH), NADPH and TCEP (Supplementary Fig. S13).

Finally, to further extend the utility of the light-controlled α-CD affinity chromatography also to full-size antibodies produced in mammalian cell culture, we designed a small adapter molecule based on the C2 fragment of protein G which forms complexes with the Fc portion of a wide range of immunoglobulins (Igs)54. This adapter molecule (Azo-ProtG), equipped with an N-terminal Azo-tag (see Supplementary Table S1) and a C-terminal Strep-tag II, was expressed with high yield in NEBExpress(lowRF1), purified and added to cell culture medium containing the anti-HER2 IgG1/κ humanized antibody trastuzumab (Herceptin®)55 (Fig. 6D). Trastuzumab was isolated from this solution via Excitography in one step in the presence of phosphate-buffered saline (PBS) at physiological pH. Notably, albumin, which is a major protein constituent in many cell culture media, was quickly washed out of the α-CD affinity column, and thus efficiently removed, prior to the selective UV light-induced elution of the intact antibody.