Deciphering triterpenoid saponin biosynthesis by leveraging transcriptome response to methyl jasmonate elicitation in Saponaria vaccaria – Nature Communications

Methyl jasmonate upregulated expression of βAS in leaves and flowers of S. vaccaria

Tentative identification of the target mass of vaccaroside E, segetoside I, and segetoside I Ac by LC-MS in different organs of S. vaccaria suggested that saponins were present in roots, stems, leaves, and flowers (Supplementary Fig. 1, Supplementary Table 1). Major enzymes involved in the saponin biosynthetic pathway belong to large enzyme families, such as cytochrome P450s (CYPs)23 and UDP-glycosyltransferases (UGTs)24,25, making it difficult to distinguish specific enzymes from other members. Therefore, an effective screening method is required for recognizing saponin biosynthesis genes in S. vaccaria.

External application of MeJA has been shown to increase saponin production in some plants3,8,26, often by increasing the expression of saponin biosynthetic genes4,7. Therefore, genes involved in the saponin biosynthetic pathway and the related biological processes could be upregulated together by MeJA, allowing us to narrow down the range of candidate enzymes.

βAS converts 2,3-oxidosqualene into β-amyrin, the first committed step of oleanane-type triterpenoid saponin biosynthesis27. MeJA upregulated S. vaccaria βAS (SvβAS) expression, exhibiting the highest induction after 24 h at 100 µM in both leaves and flowers (Fig. 2). The upregulation of SvβAS indicates that exogenous application of MeJA elicited the expression of genes involved in saponin biosynthesis in S. vaccaria, potentially leading to elevated saponin production. We confirmed that homologs of other genes that are known to be induced by jasmonates in other plants were similarly upregulated (Fig. 2c, d), including Allene Oxide Cyclase (AOC)2, 23 kDa Jasmonate-Induced Protein (23kDa JIP)28, TIFY 10b29 (a JAZ protein), and Jasmonate-Resistant 4 (JAR4)30.

Fig. 2: Methyl Jasmonate-elicited transcriptional responses in leaves and flowers of S. vaccaria.

a Fold change of β-amyrin synthase in leaves treated with MeJA at 50 and 100 µM compared to 0 µM at 4 h, 24 h, and 72 h, tested by qPCR. Error bars indicate mean ± SD (n = 3 or n = 4 biologically independent samples). b Fold change of β-amyrin synthase in flowers treated with MeJA at 50 and 100 µM compared to 0 µM at 24 h and 72 h, tested by qPCR. Error bars indicate mean ± SD (n = 4 biologically independent samples). c, d Fold change of Allene Oxide Cyclase (AOC), 23 kDa Jasmonate-Induced Protein (23kDa JIP), TIFY 10b, and Jasmonate-Resistant 4 (JAR4) in leaves treated with 100 µM MeJA at 4 h and 24 h compared to 0 h, tested by qPCR. Error bars indicate mean ± SD (n = 3 or n = 4 biologically independent samples). Asterisks indicate statistically significant fold change using a one-way ANOVA test with a Tukey HSD test (*p < 0.05; **p < 0.01). Source data, test statistics, and exact p-values are provided as a Source Data file.

Full-length transcriptome sequencing and annotation

We first sought to investigate the complete set of genes co-upregulated with βAS by MeJA treatment in S. vaccaria, but the complete genome or transcriptome sequences were not available. Therefore, we developed and implemented a pipeline of combinatorial transcriptional sequencing and transcript expression analysis (Supplementary Fig. 2).

To obtain accurate full-length transcriptome sequences from S. vaccaria, cDNA libraries from flowers and leaves were constructed for SMRT sequencing by the PacBio Sequel II sequencer31,32. A total of 6,104,715 polymerase reads were processed to produce 3,717,290 circular consensus sequencing (CCS) subreads with a mean length of 2388 bp. Next, subreads were refined and clustered, resulting in 118,956 high-quality Iso-Seq transcript isoforms from leaves and 113,581 from flowers. After removing redundant transcripts by CD-HIT, nonredundant transcripts from leaves and flowers were combined and collapsed into a total of 89,371 unique transcript isoforms using guidance from the reconstructed coding genome sequences generated by Cogent33,34. Cogent partitions Iso-Seq transcripts into gene families based on k-mer similarity, reconstructs the coding region for each gene family, and ultimately creates a de novo coding genome. Subsequently, all Iso-Seq transcripts were collapsed into unique isoforms guided by the reconstructed genome. Each Cogent gene family contains unique isoforms, denoted by 'PB' followed by an Arabic number for the gene, a period, and an Arabic number for the isoform (e.g. 'PB.4332.2'). Isoforms with the same gene number are likely derived from the same gene, but since we do not have a full genome sequence, occasional instances of recently duplicated genes cannot be completely ruled out34.
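The isoform naming convention can be illustrated with a short Python sketch. It assumes only the 'PB.<gene>.<isoform>' pattern described above; the function names and the example list are hypothetical and are not part of the authors' pipeline.

```python
from collections import defaultdict


def parse_isoform_id(isoform_id):
    """Split an ID such as 'PB.4332.2' into (gene number, isoform number)."""
    prefix, gene, isoform = isoform_id.split(".")
    if prefix != "PB":
        raise ValueError(f"unexpected prefix in {isoform_id!r}")
    return int(gene), int(isoform)


def group_by_gene(isoform_ids):
    """Map each gene number to the sorted isoform numbers assigned to it."""
    groups = defaultdict(list)
    for isoform_id in isoform_ids:
        gene, isoform = parse_isoform_id(isoform_id)
        groups[gene].append(isoform)
    return {gene: sorted(isoforms) for gene, isoforms in groups.items()}


# IDs quoted in the text; the grouping is only an illustration of the
# naming scheme (same gene number = likely the same gene).
example_ids = ["PB.4332.2", "PB.29244.1", "PB.29244.4", "PB.8389.1"]
grouped = group_by_gene(example_ids)
```

Grouping by gene number makes it easy to check whether a better isoform of the same gene exists, for example one that encodes a more complete protein.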

The unique transcript isoforms of S. vaccaria were annotated by comparing sequences against Swissprot, Pfam, KEGG, and GO databases using BLASTX. As a result, 73,676, 46,788, 64,327, and 71,964 unique transcripts were annotated in the databases mentioned above, respectively. We also detected alternative splicing events in the S. vaccaria transcriptome by aligning the nonredundant transcripts to the reconstructed coding genome (Supplementary Fig. 3)35.

Illumina sequencing and transcript profiling

For gene expression profiling, RNA samples from leaves and flowers of S. vaccaria with and without MeJA treatment (each in quadruplicate) were subjected to 3′-Tag-RNA-Seq sequencing by Illumina HiSeq. The mapping tool Salmon was then used to map the reads to the 89,371 unique transcript isoforms obtained from PacBio sequencing for transcript quantification. The Salmon tool tends to map most reads to just one of the isoforms of a gene, typically the isoform with the most complete 3′-end. Importantly, in some cases another isoform may encode a more complete protein sequence and be more suitable for functional characterization. Both principal component analysis (PCA) and hierarchical cluster analysis (HCA) showed that sample replicates of different treatments and tissues correlated well (Supplementary Fig. 4).

Quantitative reverse transcription-PCR validation of RNA-seq transcript quantification

The reliability of RNA-seq transcript quantification was validated by quantitative reverse transcription-PCR (qRT-PCR). The expression profiles of four transcripts obtained from RNA-seq analysis were compared to qRT-PCR results. These transcripts are from the mevalonate and squalene pathways producing triterpenoid precursors: 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGR) (PB.35046.2), diphosphomevalonate decarboxylase (MVD) (PB.5779.2), squalene synthase (PB.40810.6), and βAS (PB.4332.2). The Pearson correlation coefficient of the log2 fold changes in expression level was 0.8425, indicating that the RNA-seq expression quantification is positively correlated with the qRT-PCR results (Supplementary Fig. 5).
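The comparison described above can be sketched in Python as follows. This is a minimal illustration of a Pearson correlation on log2 fold changes; the expression values are invented placeholders, not the measurements behind Supplementary Fig. 5, and the function names are hypothetical.

```python
import math


def log2_fold_changes(treated, control):
    """Return log2(treated / control) for paired expression values."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values (treated, control) for a few transcripts.
rnaseq_fc = log2_fold_changes([40.0, 12.0, 9.0, 30.0], [10.0, 6.0, 3.0, 5.0])
qpcr_fc = log2_fold_changes([36.0, 10.0, 12.0, 24.0], [10.0, 5.0, 3.0, 5.0])
r = pearson_r(rnaseq_fc, qpcr_fc)
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, which is why fold changes are usually correlated after log transformation.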

MeJA upregulated saponin biosynthesis pathway genes

To gain insight into the pathways activated by MeJA, we conducted a gene ontology (GO) analysis of differentially expressed genes under MeJA treatment36. GO terms associated with both triterpenoid biosynthesis and saponin biosynthesis were significantly enriched among the genes upregulated by MeJA in S. vaccaria (Supplementary Fig. 6).

Given that the expression of SvβAS and genes involved in squalene synthesis was confirmed to be upregulated by MeJA, it is likely that other saponin biosynthesis genes would respond similarly to this treatment. Therefore, to explore genes co-induced with SvβAS, we clustered all the differentially expressed genes based on their expression patterns. Then we identified a specific subcluster that included SvβAS and all other genes that also had increased expression in both leaves and flowers after MeJA treatment (Supplementary Data 1).
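The selection logic can be approximated with a simple filter. The sketch below is not the clustering procedure used in this study; it only assumes a table of log2 fold changes per tissue and keeps transcripts that, like the reference gene, increase in both leaves and flowers. The transcript names, values, and threshold are hypothetical.

```python
def co_upregulated(log2fc, reference, tissues=("leaf", "flower"), threshold=1.0):
    """Return transcripts whose log2 fold change meets the threshold in
    every listed tissue, provided the reference gene does so as well."""
    ref = log2fc[reference]
    if not all(ref[t] >= threshold for t in tissues):
        return []
    hits = []
    for transcript, values in log2fc.items():
        if transcript == reference:
            continue
        if all(values[t] >= threshold for t in tissues):
            hits.append(transcript)
    return sorted(hits)


# Hypothetical log2 fold changes after MeJA treatment.
example = {
    "SvbAS": {"leaf": 3.1, "flower": 2.4},
    "candidate_A": {"leaf": 2.8, "flower": 1.9},   # up in both tissues
    "candidate_B": {"leaf": 2.5, "flower": 0.2},   # up in leaves only
    "candidate_C": {"leaf": -0.4, "flower": 0.1},  # not induced
}
selected = co_upregulated(example, "SvbAS")
```

A hard threshold is cruder than clustering on expression patterns, but it conveys the same idea of narrowing large enzyme families to the members that respond together with SvβAS.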

Identification and characterization of SvC28, SvC16, and SvC23 oxidases

We began discovering genes for saponin biosynthesis in S. vaccaria with cytochrome P450 monooxygenases (CYPs) involved in the production of triterpenoid aglycones. The aglycone structures of major S. vaccaria saponins are oleanane triterpenoids with a C28 carboxylic acid group that could be oxidized at C23 and C16 (Supplementary Table 1): quillaic acid, gypsogenic acid, gypsogenin, etc.

Candidate CYPs for triterpenoid aglycone biosynthesis were identified by gene upregulation and phylogenetic analysis. Several CYPs appeared to be co-induced with βAS upon MeJA treatment in leaves and flowers (Fig. 3a; Supplementary Data 1). Seven CYPs were shown to be co-induced with SvβAS and they clustered with known triterpenoid biosynthetic CYPs, suggesting their functions in β-amyrin oxidation.

Fig. 3: Discovery of genes in the biosynthesis of triterpenoid saponin aglycone in S. vaccaria.

a Neighbor-joining tree (1,000 bootstrap replicates) of CYPs acting on triterpenoids from other plants and CYP candidates identified from the S. vaccaria transcriptome. Gene names in blue represent S. vaccaria genes. Yellow highlighted names represent CYPs co-upregulated with β-amyrin synthase. Numbers indicate bootstrap values. Sequences of S. vaccaria proteins in the tree are in Supplementary Data 2 and accession numbers of CYPs from other species are in Supplementary Table 2. b Extracted ion chromatograms (EIC) of oleanolic acid obtained from Nicotiana benthamiana transiently expressing SvβAS + SvC28. c EIC of echinocystic acid obtained from N. benthamiana transiently expressing SvβAS + SvC28 + SvC16. d EIC of gypsogenic acid obtained from N. benthamiana transiently expressing SvβAS + SvC28 + SvC23-1 and SvβAS + SvC28 + SvC23-2. Compounds were identified with authentic standards. Corresponding mass spectra are shown in Supplementary Fig. 7.

The previously described C28 oxidases of β-amyrin belong to the CYP716 family37. In the neighbor-joining tree, transcript PB.8389.1 clusters with C28 oxidases from various plants in a subgroup of the CYP716 branch (Fig. 3a). It is also co-upregulated with SvβAS by MeJA. To characterize the enzymatic function of the protein encoded by PB.8389.1, Nicotiana benthamiana leaves were infiltrated with a mixture of Agrobacterium tumefaciens strains to express both SvβAS and PB.8389.1. Subsequent analysis by LC-MS revealed the formation of oleanolic acid, the C28-oxidized β-amyrin, in leaves where both SvβAS and PB.8389.1 were transiently expressed (Fig. 3b, Supplementary Fig. 7). Conversely, oleanolic acid was absent in control leaves infiltrated only with the SvβAS strain. Thus, the protein encoded by PB.8389.1 was confirmed to be a β-amyrin C28 oxidase in S. vaccaria and was designated SvC28 oxidase (CYP716A173).

Two plant CYP enzymes from different subfamilies in the CYP85 clan have been reported to perform C16α oxidation of β-amyrin: Bupleurum falcatum CYP716Y1 and Maesa lanceolata CYP87D1638. In the neighbor-joining tree (Fig. 3a), three S. vaccaria transcripts are closely related to MlCYP87D16. However, none of them were induced by MeJA. On the other hand, a subclade was formed by plant CYP716 enzymes, including BfCYP716Y1, with a group of potential S. vaccaria CYP transcripts. Among these candidates, PB.2497.1 and PB.29244.1 exhibited notable co-induction with SvβAS. We first selected PB.2497.1 as the candidate C16 oxidase for functional characterization. However, transient expression of SvβAS, PB.8389.1 (SvC28 oxidase), and PB.2497.1 together in N. benthamiana resulted in an unknown product with the same m/z as echinocystic acid, the C16-hydroxylated oleanolic acid, but a different retention time. We then selected PB.29244.4 for functional characterization, as it encodes a full-length protein, whereas PB.29244.1 has a SNP that causes an early stop codon. Importantly, the product of the protein encoded by PB.29244.4 in the presence of SvβAS and SvC28 oxidase was echinocystic acid (Fig. 3c, Supplementary Fig. 7).

The function of the protein encoded by PB.29244.4 was also validated in a β-amyrin-producing yeast strain. Yeast expressing PB.29244.4 with SvC28 oxidase and Arabidopsis thaliana cytochrome P450 reductase (AtATR1) on plasmid produced echinocystic acid (Supplementary Fig. 8). Thus, PB.29244.4 was identified as the C16α-hydroxylase of β-amyrin in S. vaccaria and designated as SvC16 oxidase (CYP716A379).

SvC23 oxidase candidate genes that were co-upregulated with SvβAS (PB.6518.3, PB.29444.13, PB.29444.23, and PB.33196.6) resided in two clusters in the tree with C23 oxidases from either the CYP714 or the CYP72 subfamily (Fig. 3a). PB.29444.23 and PB.33196.6 were expressed at a relatively higher level, and their expression fold changes by MeJA were the highest among the SvC23 candidates. To investigate the enzymatic functions of these co-upregulated C23 oxidase candidates, we transiently expressed each of them in leaves of N. benthamiana with SvβAS and SvC28 oxidase to provide oleanolic acid as the substrate for C23 oxidases. The C23 oxidation of oleanolic acid can result in three progressively more-oxidized products: hederagenin, gypsogenin, and gypsogenic acid. Neither PB.6518.3 nor PB.29444.13 was able to oxidize oleanolic acid at the C23 position, since no C23-oxidized oleanolic acid was detected. However, when PB.33196.6 (SvC23-1 (CYP72A1130)) and PB.29444.23 (SvC23-2 (CYP72A1131)) were expressed together with SvβAS and SvC28 oxidase, gypsogenic acid was detected without the presence of hederagenin or gypsogenin (Fig. 3d, Supplementary Fig. 7). The β-amyrin-producing yeast expressing the S. vaccaria C23 oxidase-1 or C23 oxidase-2 in addition to SvC28 oxidase and AtATR1 on plasmid produced exclusively gypsogenic acid, consistent with the results in N. benthamiana (Supplementary Fig. 8). Therefore, we identified two C23 oxidases that oxidized oleanolic acid to gypsogenic acid, which is the aglycone of many S. vaccaria monodesmosides (Supplementary Table 1). Thus, these two SvC23 oxidases are likely involved in the biosynthesis of S. vaccaria monodesmosides. However, many bisdesmosides possess gypsogenin or the C16-hydroxylated gypsogenin (quillaic acid) as their aglycone, and none of these SvβAS-co-upregulated SvC23 oxidases or candidates were able to convert oleanolic acid to gypsogenin as a detectable product.

Combined expression of SvC28, SvC16, and SvC23 oxidases in N. benthamiana and yeast

To further investigate the involvement of SvC28, SvC16, and SvC23 oxidases in the biosynthesis of S. vaccaria saponin aglycones, we combined their expression in N. benthamiana and yeast.

When SvC28, SvC16, and SvC23-1/2 oxidases were expressed with SvβAS in N. benthamiana, we could not detect echinocystic acid or gypsogenic acid, suggesting that the third oxidase converted them into a new product (Supplementary Fig. 9a, b). We expected that SvC16 oxidase would further oxidize gypsogenic acid, and found evidence for this in several negative ion peaks with m/z 501.4 (Supplementary Fig. 9c). However, we did not detect quillaic acid (QA), the C16α-hydroxylated gypsogenin, in N. benthamiana expressing these three oxidases (Supplementary Fig. 9b). In a β-amyrin-producing yeast strain expressing SvC28, SvC16, and SvC23-1/2 oxidases with AtATR1 on a plasmid, the later-eluting m/z 501.4 peak was also observed (Supplementary Fig. 10), likely corresponding to the C16α-hydroxylated gypsogenic acid, which is also the aglycone of monodesmosidic segetoside K. This compound is a preferred substrate over gypsogenic acid for a glucosyltransferase to form triterpenoid C28-carboxylic acid glucosides15. On the other hand, none of the S. vaccaria bisdesmosidic saponins have aglycones in the form of C16α-hydroxylated gypsogenic acid (Supplementary Table 1), suggesting its exclusive involvement in monodesmoside biosynthesis.

Unexpectedly, when these three oxidases and AtATR1 on plasmid were expressed in yeast, the main product was QA (Supplementary Fig. 10), even though combining the expression of SvC28 and SvC23-1/2 oxidases with AtATR1 in yeast resulted in gypsogenic acid, not gypsogenin. Furthermore, combined expression of these same genes integrated into the yeast genome resulted in a similar product composition (Supplementary Fig. 11). The unforeseen production of QA as the main product demonstrates the enzymatic activity plasticity of both SvC23 oxidases and suggests SvC23-1/2 oxidase may potentially take part in the biosynthesis of bisdesmosides.

Although the product distribution differed in N. benthamiana and yeast, we confirmed that aglycones of both mono- and bis-desmosides are formed when SvC28, SvC16, and SvC23-1/2 oxidases are expressed together.

Identification and characterization of SvCslG

Most bisdesmosidic saponins in S. vaccaria, and in many other species of the order Caryophyllales, have a glucuronic acid residue (GlcA) at the C3 position (Supplementary Table 1). As cellulose synthase-like (Csl) enzymes have been demonstrated to be C3-GlcA transferases of saponins39,40,41 and a Csl gene was co-upregulated with SvβAS (Supplementary Fig. 12), we selected it for functional characterization. We designated the gene SvCslG following the nomenclature of Jozwiak and coworkers39, but as noted by Chung and coworkers40 the gene is in a clade separate from Arabidopsis CslGs.

We first examined whether SvCslG could attach GlcA to the C3 hydroxyl group of QA by incubating microsomal proteins from yeast expressing SvCslG with QA and UDP-GlcA. A peak corresponding to QA-C3-GlcA was readily detected in a complete reaction, suggesting that SvCslG transfers the GlcA from UDP-GlcA to C3 of QA (Supplementary Fig. 12).

To further characterize the function of SvCslG, kinetic studies were conducted with various concentrations of QA and UDP-GlcA after establishing the optimal reaction conditions. The substrate dependency of SvCslG on QA and UDP-GlcA follows Michaelis–Menten kinetics with KM values of 3.13 µM and 65 µM, respectively (Fig. 4).
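For readers less familiar with the model, the Michaelis–Menten rate law, v = Vmax·[S] / (KM + [S]), can be written as a small Python function. The KM values below are the ones reported above; Vmax is not stated in the text, so the sketch uses a relative Vmax of 1, and the function name is hypothetical.

```python
def michaelis_menten(substrate_um, vmax, km_um):
    """Reaction rate predicted by the Michaelis-Menten equation."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and KM must be > 0")
    return vmax * substrate_um / (km_um + substrate_um)


# KM values reported in the text (in µM); Vmax is set to 1 as an assumption,
# so the returned rates are fractions of the maximal rate.
KM_QA_UM = 3.13
KM_UDP_GLCA_UM = 65.0

# At a substrate concentration equal to KM the rate is half of Vmax.
half_rate_qa = michaelis_menten(KM_QA_UM, vmax=1.0, km_um=KM_QA_UM)

# At ten times KM the enzyme is close to saturation (10/11 of Vmax).
near_saturation = michaelis_menten(10 * KM_UDP_GLCA_UM, vmax=1.0,
                                   km_um=KM_UDP_GLCA_UM)
```

Under this model, the lower KM for QA means that SvCslG reaches half-maximal rate at a much lower QA concentration than is required for UDP-GlcA.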

Fig. 4: A Cellulose synthase-like (Csl) G from S. vaccaria transfers glucuronic acid to C-3 of triterpenoid aglycone quillaic acid.

Kinetic analysis of SvCslG with quillaic acid (a) and UDP-GlcA (b). Error bars indicate mean ± SD (n = 3 biologically independent samples). c Confocal images of SvCslG-YFP transiently coexpressed in N. benthamiana leaves with Golgi-CFP marker and ER-RFP marker. Scale bars, 20 µm. The experiment was repeated in four biologically independent replicates with similar results. Source data are provided as a Source Data file.

The function of SvCslG was then tested in planta by co-infiltrating leaves of N. benthamiana with A. tumefaciens carrying SvCslG and QA solution. Leaves were collected four days after infiltration and processed to extract saponin components. The peak of QA-C3-GlcA was detected (Fig. 5a, Supplementary Fig. 13a). QA-C3-GlcA was not present in the control samples where SvCslG was replaced with GFP, suggesting that SvCslG utilized the endogenous UDP-GlcA in N. benthamiana leaves as a sugar donor and transferred the GlcA to QA, consistent with the in vitro enzymatic assay.

Fig. 5: Identification of a galactosyltransferase and a xylosyltransferase glycosylating the 3-O-glucuronide of QA-3-GlcA.

EIC of QA-3-GlcA (a), QA-3-GlcA-Gal (b), and QA-3-GlcA-Gal-Xyl (c) from plants transiently expressing SvCslG, SvCslG + SvGalT, and SvCslG + SvGalT + SvC3XylT as indicated and infiltrated with QA solution. Corresponding mass spectra are shown in Supplementary Fig. 13.

To investigate the subcellular localization of SvCslG, we tagged yellow fluorescent protein (YFP) at the C-terminus of SvCslG and expressed it with ER and Golgi markers in N. benthamiana. The overlapped signals of fluorescent protein-tagged SvCslG and organelle markers suggested SvCslG localized at ER and probably also in Golgi (Fig. 4c). Csl enzymes catalyzing C3-glucuronosylation have previously been shown as ER-localized39,40, unlike most Csl enzymes that are involved in cell wall biosynthesis and localized in Golgi.

SvCslG facilitates the C3 glucuronidation of C23 aldehyde aglycone

We tested if SvCslG could glucuronidate any triterpenoid aglycone produced by SvβAS, SvC28, SvC16, and SvC23-1/2 oxidases. Although QA was not detected when all three types of β-amyrin oxidases were expressed together in N. benthamiana, the additional expression of SvCslG gave rise to the formation of QA-C3-GlcA (Fig. 6, Supplementary Fig. 14). In addition, the glucuronide of C16α-hydroxylated gypsogenic acid (GA-C16-OH-C3-GlcA) was also detected by LC-MS (Supplementary Fig. 14), suggesting SvCslG could also glucuronidate GA-C16-OH.

Fig. 6: SvCslG increases the production of QA-C3-GlcA.

Different OD levels of Agrobacterium tumefaciens harboring SvCslG were individually co-infiltrated with A. tumefaciens harboring SvβAS, SvC28, SvC16 and SvC23-1 (a) or SvC23-2 (b). Error bars indicate mean ± SD (n = 3 biologically independent samples). A one-way ANOVA test revealed a significant effect of SvCslG strain OD level on the production of QA-C3-GlcA [(a), F(1, 4) = 14.27, p = 0.0195; (b), F(2, 6) = 8.98, p = 0.0157]. Asterisks indicate statistical significance with a Tukey HSD test ((a) p = 0.019; (b) p = 0.014). Production of GA-C16-OH-C3-GlcA was not significantly affected by the OD level of the SvCslG strain [(a), F(1, 4) = 6.39, p = 0.0648; (b), F(2, 6) = 1.988, p = 0.218]. Source data and additional p-values for Tukey tests are provided as a Source Data file. Compound verification by LC-MS and mass spectra are shown in Supplementary Fig. 14.

The production of QA-C3-GlcA indicates that SvCslG could glucuronidate QA before SvC23 oxidase further oxidizes it into GA-C16-OH. To determine the effect of SvCslG expression on the C23 oxidation of triterpenoids, we co-infiltrated different OD600 (optical density) levels of SvCslG-carrying A. tumefaciens with constant ODs of strains carrying SvβAS and all three β-amyrin oxidases in N. benthamiana. As the OD of the SvCslG strain increased, the amount of QA-C3-GlcA increased significantly. At the same time, the production of GA-C16-OH-C3-GlcA did not change significantly (Fig. 6, Supplementary Fig. 14), suggesting that SvCslG facilitated the production of QA as its sugar acceptor and the glycosylation prevented the further C23 oxidation of QA by SvC23 oxidase.

Although gypsogenin was not a detectable product when combining SvβAS, SvC28, SvC16, and SvC23-1/2, we also detected the formation of gypsogenin-GlcA (GN-GlcA) when SvCslG was expressed. Furthermore, the increasing OD of the SvCslG-carrying A. tumefaciens was accompanied by a significantly higher concentration of GN-GlcA in N. benthamiana leaves after infiltration (Supplementary Fig. 15), indicating that SvCslG also improves the production of gypsogenin and glucuronidates it before another oxidation occurs. These results suggested that SvCslG could partially block the further oxidation of C23 aldehydes by SvC23 oxidases and facilitate their glucuronidation, thus changing the product profile of SvC23 oxidases and redirecting them from producing exclusively monodesmoside aglycones toward producing bisdesmoside aglycones.

Identification of SvGalT and SvC3XylT in C3 glycosylation of QA

A galactose residue is linked to the C3-GlcA residue of many bisdesmosidic saponins in S. vaccaria. We constructed a neighbor-joining tree of UDP-glycosyltransferase (UGT) candidates co-upregulated with SvβAS in S. vaccaria and their homologs with triterpenoid UGTs from other plants (Supplementary Fig. 16). In the neighbor-joining tree, PB.41560.2 was closely related to the triterpenoid-C3-GlcA-Gal transferase (GmUGT73P2) in soybean42 (Supplementary Fig. 16). For functional verification, it was expressed in QA-fed yeast together with SvCslG and the Arabidopsis UDP-glucose dehydrogenase (AtUGD) for UDP-GlcA production. However, we could not detect a peak of m/z 823.4 corresponding to QA-C3-GlcA-Gal in cell extracts, suggesting PB.41560.2 could not transfer a galactose residue to QA-C3-GlcA. We selected four other SvβAS-coinduced UGTs from the same clade as PB.41560.2 (Supplementary Fig. 16) for the galactosyltransferase (GalT) activity test, but none was active.

As the amino acid sequence of Soyasapogenol-B-GlcA-GalT (495 aa) is longer than those of all five candidates, especially PB.41560.2 (460 aa), we hypothesized that PB.41560.2 encodes an incomplete protein. We searched the transcriptome of S. vaccaria with PB.41560.2 as a query sequence to look for a homologous sequence that encodes a full-length protein. We identified PB.1747.1 as a transcript that differs from PB.41560.2 by only one nucleotide at the 3′ end of the coding region, suggesting they are two transcript isoforms of the same gene. Although it was not co-upregulated with SvβAS, it encoded a longer protein than PB.41560.2 due to the single nucleotide insertion. We deduced that PB.1747.1 would be the best candidate for the C3-GlcA-Gal transferase in S. vaccaria.

PB.1747.1 was transiently expressed with SvCslG in N. benthamiana, and QA solution was infiltrated with A. tumefaciens. A peak with m/z 823.4 appeared that was absent in the negative control and matched a QA-C3-GlcA-Gal standard (Fig. 5b, Supplementary Fig. 13b). The function of PB.1747.1 was also validated in yeast using the same method used to test PB.41560.2. Therefore, the activity of PB.1747.1 was confirmed as QA-C3-GlcA galactosyltransferase SvGalT (UGT73DL2). The lack of observed activity of PB.41560.2 is likely due to a premature stop codon caused by the single nucleotide deletion.

QA and gypsogenin 3-O-trisaccharide saponins have also been identified in S. vaccaria with a xylosyl residue linked to the 3-O-glucuronyl group (Supplementary Table 1)19. We identified a xylosyltransferase candidate (SvC3XylT (UGT73CC10)) through phylogenetic (Supplementary Fig. 16) and co-upregulation analyses. SvC3XylT resides in the UGT73 family and is related to GmUGT73P2. The activity of SvC3XylT was tested by expressing its gene with SvCslG and SvGalT in N. benthamiana infiltrated with QA solution. The peak of m/z 955.4 appeared and was verified as QA-C3-GlcA-Gal-Xyl by the same mass and retention time as the standard (Fig. 5c, Supplementary Fig. 13c).
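The m/z values used to track the glycosides can be checked with a short calculation. The sketch below assumes that the peaks are deprotonated [M-H]- ions, that QA has the formula C30H46O5, and that each sugar adds its anhydro-residue mass; these are assumptions of the example rather than statements from the text, and the function names are hypothetical.

```python
# Monoisotopic atomic masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646

QUILLAIC_ACID = {"C": 30, "H": 46, "O": 5}

# Sugar residues as added to a glycoside (sugar minus one water).
RESIDUES = {
    "GlcA": {"C": 6, "H": 8, "O": 6},   # glucuronic acid
    "Hex": {"C": 6, "H": 10, "O": 5},   # e.g. galactose, glucose
    "Pen": {"C": 5, "H": 8, "O": 4},    # e.g. xylose
    "dHex": {"C": 6, "H": 10, "O": 4},  # e.g. fucose, rhamnose
}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as a dict."""
    return sum(MONO[element] * count for element, count in formula.items())


def mz_negative(aglycone, sugars):
    """m/z of the singly deprotonated glycoside [M-H]-."""
    mass = mono_mass(aglycone) + sum(mono_mass(RESIDUES[s]) for s in sugars)
    return mass - PROTON


qa_glca_gal = mz_negative(QUILLAIC_ACID, ["GlcA", "Hex"])
qa_glca_gal_xyl = mz_negative(QUILLAIC_ACID, ["GlcA", "Hex", "Pen"])
qa_glca_fuc = mz_negative(QUILLAIC_ACID, ["GlcA", "dHex"])
```

Under these assumptions, the three computed values fall within 0.1 of the m/z 823.4, 955.4, and 807.4 peaks assigned in the text to QA-C3-GlcA-Gal, QA-C3-GlcA-Gal-Xyl, and QA-C3-GlcA-C28-Fuc.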

SvGalT and SvC3XylT further boost the C3 glycosylation of C23 aldehyde substrates

As shown in the above experiments, although SvCslG improves the biosynthesis of QA-C3-GlcA, GA-C16-OH-C3-GlcA was still produced at approximately 76% of QA-C3-GlcA (Fig. 6). Therefore, we investigated the effect of expressing SvGalT and SvC3XylT on the proportion of saponins with C23 aldehyde aglycone by comparing the production of C3-glycosylated QA and GA-C16-OH.

By including the expression of SvGalT, GA-C16-OH-C3-GlcA-Gal formed at about 22.6% of QA-C3-GlcA-Gal. Furthermore, adding both SvGalT and SvC3XylT resulted in GA-C16-OH-C3-GlcA-Gal-Xyl produced at only 4.9% of QA-C3-GlcA-Gal-Xyl (Supplementary Fig. 17). Therefore, adding additional sugar moieties at C3 substantially increased the proportion of saponins with QA aglycone. This suggested that SvGalT and SvC3XylT favor substrates with the C23 aldehyde group and boost the formation of C3 glycosylated C23 aldehydes.
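The reported ratios can be converted into the share of QA-type product at each glycosylation stage. The sketch below assumes that each percentage is the amount of the GA-C16-OH glycoside relative to the corresponding QA glycoside and that only these two aglycone types are compared; the variable and function names are hypothetical.

```python
def qa_share(ga_to_qa_ratio):
    """Fraction of QA-type product when GA-C16-OH product is given
    relative to QA product (ratio = GA-C16-OH / QA)."""
    return 1.0 / (1.0 + ga_to_qa_ratio)


# Ratios reported in the text, expressed as fractions.
REPORTED_RATIOS = {
    "C3-GlcA": 0.76,
    "C3-GlcA-Gal": 0.226,
    "C3-GlcA-Gal-Xyl": 0.049,
}

# Share of QA-type product in percent, rounded to one decimal place.
shares = {stage: round(100 * qa_share(ratio), 1)
          for stage, ratio in REPORTED_RATIOS.items()}
# shares == {"C3-GlcA": 56.8, "C3-GlcA-Gal": 81.6, "C3-GlcA-Gal-Xyl": 95.3}
```

Expressed this way, the QA-type share rises from roughly 57% with GlcA alone to about 95% once both galactose and xylose are added.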

UDP-d-fucose biosynthesis and C28 fucosyltransferase

Fucose moieties can be found in cell wall polysaccharides and glycoproteins in the l-fucose form derived from GDP-l-fucose. However, the C28 fucose moiety in S. vaccaria bisdesmosidic saponins is the β-d form. Adding UDP-α-d-fucose to leaf extracts led to the incorporation of the β-d-fucose molecule into digitoxigenin43. Therefore, we hypothesized UDP-α-d-fucose would provide the fucose moiety in many S. vaccaria bisdesmosidic saponins.

The biosynthesis of UDP-α-d-fucose in plants has not been elucidated previously. However, a dTDP-glucose 4,6-dehydratase and a dTDP-4-keto-6-deoxy-glucose reductase have been reported to convert dTDP-α-glucose into dTDP-α-d-fucose in Geobacillus tepidamans44. Thus, we hypothesized that the biosynthesis of UDP-α-d-fucose would be similar, involving a UDP-glucose 4,6-dehydratase (46DH) and a UDP-4-keto-6-deoxy-glucose (UDP-4K6DG) reductase.

We identified a homolog of the N-terminal domain of A. thaliana UDP-rhamnose synthase (AtRHM1) with 46DH activity45 that was co-upregulated with SvβAS in S. vaccaria (Supplementary Fig. 18). Furthermore, a homolog of the full-length AtRHM1 in the S. vaccaria transcriptome was also co-upregulated with SvβAS. Since the single domain 46DH has not been previously reported in other plants and was induced by MeJA, we hypothesized that Sv46DH converts UDP-Glucose into UDP-4K6DG for the biosynthesis of UDP-d-fucose. MeJA also induced expression of the putative SvRHM, likely involved in UDP-l-rhamnose biosynthesis. On another note, a reductase from the SDR114 family46 (Supplementary Fig. 19) was selected as the candidate for UDP-4K6DG reductase (SvNMD) as it was co-induced with SvβAS and closely related to QsFucSyn, which reduces 4-keto-6-deoxy-glucose to d-fucose on a saponin backbone41.

To validate the functions of Sv46DH and SvNMD, we transiently expressed the genes in N. benthamiana leaves together or individually with GFP. When Sv46DH and SvNMD were combined for expression, both UDP-l-rhamnose and UDP-d-fucose were produced (Fig. 7a). Overexpression of Sv46DH with GFP resulted in the production of UDP-l-rhamnose as the predominant product, with only a small amount of UDP-d-fucose detected. This suggests that the native RHM in N. benthamiana likely converted the UDP-4K6DG product of Sv46DH into UDP-l-rhamnose. The small amount of UDP-d-fucose production was likely due to an endogenous reductase in N. benthamiana catalyzing the 4-keto reduction of UDP-4K6DG. In contrast, overexpression of SvNMD with GFP only led to the formation of UDP-d-fucose, indicating that SvNMD efficiently channeled UDP-4K6DG produced by N. benthamiana RHM into UDP-d-fucose. Expressing GFP alone in N. benthamiana leaves did not result in detectable levels of UDP-l-rhamnose or UDP-d-fucose. UDP-l-rhamnose is expected to be produced in untransformed N. benthamiana but did not accumulate at detectable levels. Our results confirmed the involvement of Sv46DH and SvNMD in the biosynthesis of UDP-d-fucose.

Fig. 7: Discovery of the UDP-d-fucose biosynthetic pathway and a triterpenoid C28 fucosyltransferase in S. vaccaria.

a EIC of UDP-d-fucose and UDP-l-rhamnose with m/z 549.1 from plants transiently expressing Sv46DH + SvNMD, Sv46DH + GFP, SvNMD + GFP, and GFP. b EIC of GlcA-3-QA-28-Fuc with m/z 807.4 from QA-infiltrated plants transiently expressing SvCslG + Sv46DH + SvNMD + SvFucT, compared to SvCslG + Sv46DH + SvNMD + QsFucT (positive control), SvCslG + Sv46DH + SvNMD + GFP (negative control), and SvCslG + SvFucT. Corresponding mass spectrum is shown in Supplementary Fig. 20.

We identified the SvC28FucT candidate (UGT74CD2) based on phylogenetic analysis (Supplementary Fig. 16) and gene expression profile in S. vaccaria. There were two GT1-type glycosyltransferase sequences residing in the same subclade with the C28FucT in spinach (SOAP6)39 and Q. saponaria: PB.28124.1 and PB.12216.1 (Supplementary Fig. 16). They are probably a pair of alternative splicing isoforms caused by an intron retention event that converted PB.12216.1 into PB.28124.1. Although they were not identified from the list of genes that were co-upregulated with SvβAS in S. vaccaria, their nucleotide sequences were very similar to PB.28124.2, which was a SvβAS co-expressed transcript. Therefore, we chose PB.12216.1 as the SvC28FucT candidate for functional characterization.

To determine if the SvC28FucT candidate could add a fucose moiety to QA-C3-GlcA, we transiently expressed it together with SvCslG, Sv46DH, and SvNMD in N. benthamiana leaves infiltrated with QA solution (Fig. 7b, Supplementary Fig. 20). We detected an m/z 807.4 peak, corresponding to QA-C3-GlcA-C28-Fuc, which was absent when SvC28FucT was replaced with GFP, indicating that SvC28FucT was responsible for its formation. The m/z 807.4 peak was not detected when both Sv46DH and SvNMD were excluded. When a C28FucT from Q. saponaria41 was expressed together with SvCslG, Sv46DH, and SvNMD in N. benthamiana leaves infiltrated with QA solution, a peak with the same m/z value and retention time was detected. This further confirms that the formation of the m/z 807.4 peak requires Sv46DH, SvNMD, and a C28FucT. Based on these results, it is likely that SvC28FucT can transfer the fucosyl residue from UDP-d-fucose to QA-C3-GlcA. It is also possible that, before being reduced to UDP-d-fucose, 4K6DG was first attached to the C28 carboxylic group of QA-C3-GlcA by SvC28FucT and then the 4-keto group of 4K6DG linked to the backbone was reduced by SvNMD, giving rise to QA-C3-GlcA-C28-Fuc41. Future experiments will be required to resolve whether SvNMD also reduces the 4-keto group of 4K6DG attached to the triterpenoid backbone.

Functional characterization of other SvβAS co-upregulated glycosyltransferases

The SvβAS co-induced GTs PB.29740.3, PB.33723.2, and PB.17537.3 could not add galactose to the glucuronic acid residue of QA-C3-GlcA. We then investigated if they were involved in modifying QA-C3-GlcA-C28-Fuc by combining the expression of each candidate glycosyltransferase with QA-C3-GlcA-C28-Fuc-producing enzymes in N. benthamiana. None of these candidates were able to glycosylate QA-C3-GlcA-C28-Fuc. Then we elongated the C28 sugar chain by expressing a C28 rhamnosyltransferase identified from Q. saponaria41 with other QA-C3-GlcA-C28-Fuc producing enzymes, giving rise to a compound with the predicted mass of QA-C3-GlcA-C28-Fuc-Rha (Supplementary Fig. 21). PB.29740.3 could add either a hexose (Hex) or a deoxyhexose (DOH) residue to this substrate, while the other two candidates did not exhibit any activity (Supplementary Fig. 21). Expressing a C28 xylosyltransferase from Q. saponaria41 further elongated the C28 sugar chain to make a product consistent with the predicted mass of QA-C3-GlcA-C28-Fuc-Rha-Xyl (Supplementary Fig. 22). Both PB.29740.3 and PB.17537.3 could glycosylate QA-C3-GlcA-C28-Fuc-Rha-Xyl with a Hex or a DOH; the different retention time of corresponding products suggests they attached Hex or DOH to varying positions of the substrate (Supplementary Fig. 22). Based on known structures of S. vaccaria saponins we propose that the hexose and DOH represent glucose and d-fucose as known to be present in vaccaroside I (Supplementary Table 1). A pentose residue was attached to QA-C3-GlcA-C28-Fuc-Rha-Xyl by expressing PB.33723.2 together with the substrate-making enzymes in N. benthamiana. The pentose may be l-arabinofuranose that is attached to d-fucose in many S. vaccaria saponins (Supplementary Table 1, Supplementary Fig. 1). Further experiments are necessary to confirm the structures produced by these transferases.