
Structural and biochemical analysis of family 92 carbohydrate-binding modules uncovers multivalent binding to β-glucans – Nature Communications

Family 92 carbohydrate-binding modules are commonly appended to glycoside hydrolases

The recent establishment of CBM92 as a family is supported by sequence comparison with other families: in our phylogenetic analysis, CBM92 forms a distinct clade with high bootstrap support (Supplementary Fig. 1). CBM92 domains are found in multi-modular proteins that in almost every case include at least one identifiable glycoside hydrolase (GH) or polysaccharide lyase (PL) domain (Fig. 2). This suggests that CBM92 domains may assist carbohydrate-degrading activity by promoting enzyme contact with substrate. Indeed, the founding member of the family is a carrageenan-binding module appended to the κ-carrageenase Cgk16A6, produced by the marine bacterium W. aestuarii, a species that appears to be proficient at metabolising marine polysaccharides10,28. Carrageenan has structural features, such as variable sulphation and anhydro-sugar moieties, that are not found in terrestrial glycans29. Yet our preliminary phylogenetic investigations uncovered CBM92 sequences from a broad range of non-aquatic microbes, suggesting that marine polysaccharides are not the only binding targets.

Fig. 2: Phylogenetic depiction of the multi-modular proteins that contain CBM92 domains.

Full protein sequences were aligned at the CBM92 domain, clustering proteins by domain architecture, and the phylogeny was analysed by maximum likelihood (IQ-TREE web server, 1000 bootstrap replicates). Bootstrap value is shown as branch thickness. Eubacteria, Eukaryota, and Archaea are shaded light blue, green, and yellow, respectively. Coloured squares on the outer ring indicate the phylum: Bacteroidota, Terrabacteria, Pseudomonadota, and the PVC group are shown in blue, light blue, pink, and purple, respectively. Pictograms depict the domains found in multi-modular proteins (see the shape and colour key on the figure). Protein names consist of abbreviated species names followed by the number of amino acids; abbreviations and corresponding accession numbers are listed in Supplementary Table 1. Protein names are coloured brown or blue to indicate that the host species is found in soil or water, respectively; black indicates an unknown habitat. Light brown and light blue denote soil or water environments closely associated with plants.

Using a CBM92 sequence from a soil bacterium as the search input, we identified 164 domains from 163 modular proteins as belonging to family CBM92, non-redundant at the genus level. Our analysis (Fig. 2) shows that the family is mainly distributed among the Eubacteria, with rare examples in Eukaryota and Archaea. Most CBM92-encoding species are found in soil, freshwater, and ocean ecosystems, including ocean sediment (Fig. 2). Among the Eubacteria, CBM92 is especially enriched in the phylum Bacteroidota, but can also be found among Pseudomonadota (formerly Proteobacteria), Terrabacteria, and the PVC group (Planctomycetota, Verrucomicrobiota, and Chlamydiota). Approximately half of the CBM92-containing multi-modular CAZymes in our analyses are predicted to be secreted via the Bacteroidota-specific Type IX Secretion System (T9SS), as they possess the C-terminal domain that marks a protein for secretion via this pathway30,31; this system has previously been highlighted as important for the secretion of polysaccharide-degrading enzymes22,32. A rare case of a eukaryotic CBM92-containing protein is a GH1 enzyme found in four eudicot plant species, which carries the binding domain at its N-terminal end. The only animal genome that appears to encode a CBM92 domain is that of the wood-feeding termite Coptotermes formosanus: both our analysis and a previous transcriptomic study33 suggest that this species produces a protein containing a CBM92 and a CBM13 domain linked to a putative hemicellulose-degrading enzyme.

Of note, the conserved ligand specificity we find for CBM92 proteins (discussed below) contrasts with the apparent diversity of substrates targeted by the enzymes attached to these modules. These are predicted to include GH18 chitinases, GH16 β-1,3-glucanases and carrageenases, GH25 lysozymes, GH99 α-mannanases, GH30 β-1,6-glucanases, and members of the multi-functional family GH5, whose specificities are potentially highly diverse34. In most cases, the CBM92 domain is closely attached to its enzyme partner via a short linker of fewer than 20 amino acids.

Sequences for CBM92 domains were extracted from the full-length multi-modular protein sequences, and an independent evolutionary analysis was performed. The CBM92 domains are 125-150 amino acids long and share an overall sequence identity of ≥ 37%. In the evolutionary tree of CBM92 (Supplementary Fig. 2), at least three distinct clades are seen, corresponding to the Eukaryota, Archaea, and Eubacteria, and within the Eubacteria a distinct sub-clade of sequences derives from the Terrabacteria. Since many Bacteroidota encode one or more CBM92-containing proteins, CBM92 domains likely entered Bacteroidota genomes at an early stage of evolution and then diverged. Conversely, few CBM92 domains occur in Pseudomonadota, and these do not form a distinct clade. This is incongruent with the general evolutionary tree for these taxa35 and may indicate that these species acquired CBM92 domains more recently via horizontal gene transfer.
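The ≥ 37% figure depends on how identity is counted, which is not specified in this section. As a minimal sketch, assuming pre-aligned sequences and the common convention of counting identical residues over columns where neither sequence has a gap (our assumption; other normalisations, such as dividing by the shorter ungapped length, give different values), pairwise and minimum pairwise identity could be computed as follows. The function names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumed convention: identical residues / columns where neither
    sequence has a gap. Other definitions exist and give other values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def min_pairwise_identity(aligned: list[str]) -> float:
    """Lowest percent identity over all pairs in an alignment."""
    return min(percent_identity(a, b) for a, b in combinations(aligned, 2))


# Toy aligned fragments (invented, not real CBM92 sequences).
print(percent_identity("WEGF-KR", "WELFAKR"))  # 5 of 6 gap-free columns identical
print(min_pairwise_identity(["WEGF", "WEGF", "WELF"]))
```

A statement such as "share an identity of ≥ 37%" corresponds to the minimum over all pairs under whichever convention is chosen.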

CBM92 proteins have three repeats defined as distinct subdomains, each with a conserved motif

Twelve CBM92 domains were selected for further analysis. Targets were chosen from species found in diverse habitats, while sampling sequence diversity from around the phylogenetic tree shown in Fig. 2. Furthermore, in their native multi-modular proteins, the selected domains are appended to GH enzymes from a number of different families (Fig. 2). Seven were chosen from the reasonably well-studied soil bacterium C. pinensis, which has one of the largest genomes and the highest number of CAZyme-encoding genes among Bacteroidota sequenced to date1,22,36. The C. pinensis domains analysed are appended to GH enzymes from families 5, 16, 18, and 99, which covers a broad range of potential enzyme substrates37. A further two domains were selected from the seawater-isolated Aquimarina aggregata38, both of which are appended to putative enzymes, with an additional CBM6 module in the full-length protein that contains AaCBM92A. One CBM92 domain was selected from each of Draconibacterium mangrovi (isolated from river sediment in China39) and Pyxidicoccus caerfyrddinensis (isolated from soil in Caerfyrddin/Carmarthen in Wales40): DmCBM92A is appended to GH5 and GH25 domains, while PcCBM92A is attached to a GH16 domain. Finally, a CBM92 was selected from Euryarchaeota archaeon to explore the potential for functional binding in an archaeal representative.

From a sequence alignment of these 12 selected CBM92 domains, three repeat regions are observed, which we name subdomains α, β, and γ (Fig. 3). The region highlighted in pink in Fig. 3 is conserved across all 164 CBM92 domains in our phylogeny. Secondary structure prediction suggests an enrichment in β-strands, consistent with a β-trefoil fold of the kind also found in family CBM13 members41. A highly conserved ‘WExF’ sequence motif is present at the C-terminal end of each subdomain (Fig. 3). Interactions between carbohydrates and aromatic amino acids such as Trp are frequently important for CBMs27,42. We therefore speculated that the CBM92 proteins identified here each have three binding sites, centred on the Trp residues of the three ‘WExF’ motifs. A survey of other CBM92 proteins in our phylogeny shows that the occurrence of three WExF motifs is widespread, although some proteins lack the Trp at one or more sites (discussed below). Interestingly, the WExF motif is entirely absent from the previously characterised carrageenan-binding protein6. Two Phe residues were suggested to be important for ligand binding in that protein, proposed to form a hydrophobic platform supported by a well-conserved Arg6. An alignment of the known and putative carrageenan-binders identified by Mei et al. with the proteins analysed here shows that one of these Phe residues aligns with the second WExF motif, which we find in almost all CBM92 proteins (Supplementary Fig. 3a, b). Our alignment further indicates that the carrageenan-binding proteins likely have only one binding site per protein and represent a small sub-group within the family. These striking differences suggest that there are distinct modes of binding within the family, which warrants further investigation of CBM92 binding specificities.
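Locating the WExF motifs (Trp, Glu, any residue, Phe) in a domain sequence is a simple pattern search. The sketch below uses a regular expression; the fragment is invented for illustration, and `start` (the residue number of the fragment's first position, so hits can be reported in full-length numbering as in Fig. 3) is a hypothetical parameter. A strict pattern like this will not flag degenerate sites that lack the Trp, which the text notes occur in some proteins.

```python
import re

# W, E, any one of the 20 standard residues, F.
WEXF = re.compile(r"WE[ACDEFGHIKLMNPQRSTVWY]F")


def find_wexf(sequence: str, start: int = 1) -> list[tuple[int, str]]:
    """Return (residue number of the Trp, motif) for each WExF match.

    `start` is the residue number of sequence[0], e.g. to report hits in
    full-length protein numbering rather than domain numbering.
    """
    return [(m.start() + start, m.group()) for m in WEXF.finditer(sequence.upper())]


# Invented fragment, not a real CBM92 sequence.
fragment = "MKTWEAFGGSRWELFPPQWEGF"
for pos, motif in find_wexf(fragment):
    print(pos, motif)
```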

Fig. 3: Sequence logo, secondary structure, and subdomains displayed on the alignment of twelve CBM92 domains.

The pink shading on the alignment marks sequence regions that are highly conserved across the larger dataset of 164 CBM92 sequences. Amino acid numbering is based on the full-length sequence of the product of gene Cpin_2580, from which CpCBM92A is derived. Three positions, W481, W523, and W565 (pink stars, each within a highly conserved repeating WExF motif), were substituted with Ala to generate variants of CpCBM92A for carbohydrate binding analysis. An Arg residue (blue stars) close to each WExF motif is proposed to contribute to binding. Full species names and accession numbers can be found in Supplementary Table 1.

CBM92 domains bind to polysaccharides containing the Glc-β-1,6-Glc disaccharide unit

Gene segments encoding the 12 selected CBM92 domains were cloned and expressed in E. coli as single-domain constructs, then purified. SDS-PAGE analysis confirmed successful production and purification of all recombinant domains (Supplementary Fig. 4). Carbohydrate binding was first investigated via pull-down assays and affinity gel electrophoresis using polysaccharides from diverse plant and microbial sources (see Materials and Methods for a full list of ligands tested). The heat map in Fig. 4 summarises the results of these binding assays, and the corresponding data can be found in Supplementary Fig. 5. The domains tested consistently bound polysaccharides containing the Glc-β-1,6-Glc linkage, namely pustulan (linear β-1,6-glucan), laminarin, scleroglucan, and yeast β-glucan (the latter three consisting of β-1,3-glucan chains substituted with β-1,6-linked glucosyl residues). Some domains also showed binding to lichenan, which comprises β-1,3- and β-1,4-linked glucosyl residues. Of note, DmCBM92A, which naturally lacks two of the binding-site Trp residues that we propose are necessary for binding, did not noticeably bind any of the tested polysaccharides except laminarin in this qualitative assay, although later experiments detected some binding to yeast β-glucan (discussed below).

Fig. 4: Qualitative binding determination of diverse CBM92 domains (left labels) to various polysaccharide ligands (top labels).

For laminarin and carrageenan, binding was assayed by affinity gel electrophoresis; for all other ligands, a pull-down assay was used. H2O samples contained no polysaccharide and served as controls. Each CBM domain was produced recombinantly without any other protein modules. Accession codes for the CBM domains shown in this figure can be found in Supplementary Table 1.

Structural analysis reveals a β-trefoil fold with three carbohydrate binding sites

To probe the mode of binding of CBM92 domains, we determined the crystal structures of the C. pinensis proteins CpCBM92A and CpCBM92B. As predicted by sequence analysis, both proteins form a β-trefoil structure composed of 12 β-strands arranged into three subdomains (α, β, and γ), similar to the β-trefoil domains found in fascin and CBM13 proteins9,41 (Fig. 5a, b). Soaking CpCBM92B crystals with glucose, gentiobiose (G2: Glc-β-1,6-Glc), or sophorose (S2: Glc-β-1,2-Glc) revealed a binding cleft within each subdomain containing a Trp-Glu binding motif, again implying three glycan binding sites per protein (Fig. 5c). With either G2 or S2, the non-reducing end sugar bound in the binding cleft. Electron density for the reducing end sugar was observable but difficult to model accurately, although it notably projected away from the protein (Supplementary Fig. 6). This suggests a capacity for end-on binding to glucose and to glucan oligo- and polysaccharides of potentially any linkage type. In each ligand complex, the glucosyl unit stacks against the conserved Trp, and the sugar O3 and O4 are positioned by hydrogen bonds to Oε1 and Oε2 of the conserved Glu. In the β-subdomain binding site of CpCBM92B, the protein further interacts with the glucosyl unit through the guanidinium group of Arg955, which contacts the sugar O2, and through the carbonyl of a succinimide formed in place of Asp959, which contacts the sugar O6 (Fig. 5c and Supplementary Fig. 7). Succinimide can form by intramolecular cyclisation (with loss of water from Asp or ammonia from Asn), in which the main-chain N atom of the following residue attacks the side-chain γ-carbon of Asp or Asn43,44, and it is rarely seen in protein structures: only 45 protein entries containing this chemical group are currently reported in the PDB45. 
In our structures, succinimide was found only in the β-subdomain of CpCBM92B, and it may be an artefact of protein production or crystallisation. Collectively, the ligand complexes indicate that the bound glucosyl unit can be extended from both O1 and O6, presumably enabling binding along a β-1,6-glucan chain such as pustulan, as well as binding to β-1,6-linked glucosyl substitutions in, for example, scleroglucan or laminarin. The binding-cleft Arg seen in the β-subdomain of CpCBM92B is present in subdomains β and γ of both CpCBM92A and CpCBM92B, but is replaced by Ser in the subdomain α binding clefts of both proteins (Fig. 5d). This substitution substantially increases accessibility around the glucosyl O2 in the α site, which may permit binding to oligo- or polysaccharide extensions from this position. In their description of Cgk16A, the founding member of family CBM92, Mei et al. proposed that a conserved Arg may interact with the sulphate groups of that protein’s carrageenan ligand6, but our data indicate that it also contributes to binding non-sulphated glycan ligands (Supplementary Figs. 3 and 6).

Fig. 5: Structural analysis of two CBM92 domains reveals three subdomains and three potential ligand binding sites.

Overall structures of (a) CpCBM92A and (b) CpCBM92B, with subdomains distinctly coloured and ligand-binding Trp and Glu residues shown as sticks. c The β-subdomain of CpCBM92B in complex with glucose. Hydrogen-bond distances are shown, together with the 2Fo-Fc electron density map carved 1.6 Å around the glucosyl ligand and contoured at 1.0σ. d Overlay of the CpCBM92A and CpCBM92B subdomains showing sequence conservation across all putative binding sites. Single-letter residue codes are coloured by subdomain as in panels a and b and are labelled for subdomains α/β/γ, in that order, with the CpCBM92A codes shown above those for CpCBM92B.

Structural comparison with homologues

CpCBM92A and CpCBM92B share structural similarity with β-trefoil proteins from CBM13, a multivalent family that includes single-domain galactose- or mannose-binding plant lectins as well as CBM domains found within larger CAZymes. Structural homologues of our CBM92 domains include the ricin B-like agglutinin domain from Marasmius oreades46, an arabinose-binding CBM domain in a GH27 β-L-arabinopyranosidase from Streptomyces avermitilis47, the CBM domain in CEL-III from Cucumaria echinata48, the xylose/xylan-binding CBM domain in the xylanase Xyn10A49 from Streptomyces olivaceoviridis E-86, and actinohivin50 from Longispora albida K97-0003T. Structural alignment with these proteins yields Cα root mean square deviation (RMSD) values of 1.5 to 2.5 Å despite low (8-20%) sequence identity. As in CBM92, the ligand binding regions in CBM13 are found in surface-exposed clefts, with three equivalent clefts per trefoil fold. All of these proteins use an aromatic residue and an acidic residue to mediate ligand binding. However, the families differ in where those residues originate, which ultimately leads to substantially different ligand binding modes (Supplementary Fig. 8). For example, the ricin B-like agglutinin domain from M. oreades, the CBM domain in β-L-arabinopyranosidase from S. avermitilis, the CBM domain in CEL-III from C. echinata, and the CBM domain in Xyn10A from S. olivaceoviridis E-86 all contain acidic residues originating from β2 and aromatic residues originating from β3 of each subdomain, effectively shifting the principal binding site by more than 5 Å compared with CpCBM92A and CpCBM92B. Other CBM13 members, such as actinohivin from L. albida K97-0003T, also use an acidic residue from β2, but their aromatic residues reside on a loop, or short helical section, preceding β4 of the subdomain. 
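Cα RMSD summarises how far corresponding Cα atoms lie from each other once two structures have been superposed. Structural alignment tools first establish residue correspondences and an optimal superposition (for example with the Kabsch algorithm) before reporting this value; the minimal sketch below covers only that final step and assumes the coordinates are already matched residue by residue and superposed. The coordinates in the example are invented.

```python
import math

Coord = tuple[float, float, float]


def ca_rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """RMSD (in the input units, e.g. Å) between matched, pre-superposed Cα sets."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented coordinates: every atom displaced by 1.5 Å along z.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(x, y, z + 1.5) for x, y, z in a]
print(ca_rmsd(a, b))
```

Because only matched residues contribute, two folds can give a low Cα RMSD over their aligned cores even when their sequences share little identity, as reported for the CBM13 homologues.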
In CBM92, the aromatic residue likewise originates from the loop preceding β4, but distinctively the acidic residue originates from this same loop, placing the principal binding site perpendicular to that observed in CBM13 members such as actinohivin. Collectively, although all of these proteins share a similar overall fold and use similar residue types to mediate binding, the locations of those residues lead to distinct ligand binding modes.

Exploring the functionality and ligand specificity of three putative binding sites in CBM92

The crystal structures with glucose-based ligands provide evidence for chain-end binding to the non-reducing end of a ligand, with space for potential extension at O2 and O6, which would additionally permit mid-chain binding to glycans with those linkages. Because O3 and O4 of the bound glucosyl unit are hydrogen-bonded to the conserved Glu, the structures indicate that mid-chain binding to, for example, β-1,3-glucan or β-1,4-glucan would not be possible. This matches the linkage-based selectivity suggested by the qualitative polysaccharide binding assays described above. We used isothermal titration calorimetry (ITC) to measure the binding affinities of CpCBM92A for glucose and glucose-based disaccharides. Binding parameters could be determined for glucose, G2, and S2, whereas binding to cellobiose (C2) and laminaribiose (L2) could not be reliably measured because of low signal and non-saturating isotherms. These experiments showed stronger binding to G2 and S2 than to glucose, perhaps reflecting the two potential orientations in which these disaccharides can occupy a binding site. Table 1 shows the binding parameters determined for CpCBM92A, and the corresponding data can be found in Supplementary Fig. 9.
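For a single-site ITC fit, the directly fitted quantities (typically KA or KD, ΔH, and stoichiometry n) determine the remaining thermodynamic terms through ΔG = RT ln KD (1 M standard state) and ΔG = ΔH − TΔS. The minimal sketch below applies these standard relations. Table 1 values are not reproduced here, so the inputs are hypothetical, and a measurement temperature of 25 °C is assumed.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def itc_thermodynamics(kd_molar: float, dh_kj: float, temp_k: float = 298.15) -> dict[str, float]:
    """Derive Ka, ΔG and -TΔS (kJ/mol) from a fitted K_D (M) and ΔH (kJ/mol)."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    dg = R_KJ * temp_k * math.log(kd_molar)  # ΔG = RT ln K_D
    minus_tds = dg - dh_kj                   # -TΔS = ΔG - ΔH
    return {"Ka_per_M": 1.0 / kd_molar, "dG": dg, "dH": dh_kj, "minus_TdS": minus_tds}


# Hypothetical example: K_D = 1 mM, ΔH = -20 kJ/mol at 25 °C.
params = itc_thermodynamics(1e-3, -20.0)
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical case binding is enthalpy-driven (ΔH more negative than ΔG), with a small unfavourable entropic term, a pattern often seen for CBM-sugar interactions; the actual balance for CpCBM92A should be read from Table 1.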

Table 1 Binding parameters of the interactions between CpCBM92A and three ligands as determined by ITC analysis

To probe the respective functions of the three putative glycan binding sites, a series of CpCBM92A constructs was generated in which the Trp of each WExF motif was systematically replaced. Variants with single (W481A α site, W523A β site, and W565A γ site), double (W481A/W565A, W481A/W523A, and W523A/W565A), and triple (W481A/W523A/W565A) binding site substitutions were produced by site-directed mutagenesis (the modified residues are marked with pink stars in Fig. 3). The W481A/W523A variant could not be produced despite optimisation attempts, and the W481A/W565A variant proved highly unstable during production; consequently, these two variants could not be purified or characterised. CpCBM92A and all successfully produced variants showed similar melting point profiles (Supplementary Fig. 4), suggesting that the substitutions did not disrupt the protein fold. Pull-down assays revealed that the single-substitution variants retained the binding specificities of the wild type, while the double and triple variants showed impaired or abolished binding (Supplementary Fig. 5a), indicating that the protein has no further, unrecognised binding sites.
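With three candidate sites, knocking out every combination gives 2³ − 1 = 7 variants: three single, three double, and one triple substitution. The short sketch below enumerates them from the Trp positions given in Fig. 3; the dictionary and function names are hypothetical and serve only to make the design explicit.

```python
from itertools import combinations

# Binding-site Trp positions (full-length Cpin_2580 numbering, from Fig. 3).
SITES = {"alpha": "W481", "beta": "W523", "gamma": "W565"}


def alanine_variants(sites: dict[str, str]) -> list[str]:
    """All single, double and triple Trp-to-Ala variants, e.g. 'W481A/W565A'."""
    keys = list(sites)
    names = []
    for k in range(1, len(keys) + 1):  # number of sites knocked out
        for combo in combinations(keys, k):
            names.append("/".join(f"{sites[site]}A" for site in combo))
    return names


for name in alanine_variants(SITES):
    print(name)
```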

Because binding was weak, satisfactory ITC experiments could not be performed for the variant forms of CpCBM92A. Instead, depletion isotherms were measured using yeast β-glucan, which comprises a β-1,3-glucan backbone with regular extended side chains of β-1,6-linked glucosyl units. Binding curves could not be saturated because the protein precipitated at high concentrations, so accurate KD values could not be deduced from these data. However, lines of best fit from a Langmuir isotherm model are shown to allow a qualitative comparison of binding strengths (Fig. 6). The wild type and all variant forms of CpCBM92A were first assessed to determine the relative contribution of each site to binding (Fig. 6a). Loss of the Trp residue from either the β or γ binding site (W523A and W565A variants, respectively) caused a major reduction in apparent binding, with loss of the β site having the most profound effect. This indicates that, in CpCBM92A, the β site likely has the highest affinity for the ligand. The α site knockout shows only a small loss of binding compared with the wild type, but the residual binding seen for the β/γ site variant W523A/W565A suggests that the wild type α site does make a small contribution to binding in the full protein. The α binding site of CpCBM92A differs from the other two in that it lacks an otherwise well-conserved adjacent Arg (Fig. 3), which likely supports binding by interacting with the glucose ligand and by forming a topographic ‘wall’ for the binding site (Supplementary Fig. 5b).
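The Langmuir model behind such lines of best fit is a one-site isotherm, bound = Bmax·[free]/(KD + [free]). The fitting software is not stated here, so the following is a minimal, dependency-free sketch of one way to fit it (our choice, not necessarily the authors'): for any trial KD the model is linear in Bmax, which has a closed-form least-squares solution, so only KD needs a one-dimensional search. The function names and grid limits are assumptions, and the example data are synthetic.

```python
def langmuir(free: float, bmax: float, kd: float) -> float:
    """One-site Langmuir isotherm: bound = Bmax * [free] / (K_D + [free])."""
    return bmax * free / (kd + free)


def _best_bmax(free: list[float], bound: list[float], kd: float) -> tuple[float, float]:
    """For a fixed K_D the model is linear in Bmax: closed-form least squares."""
    f = [x / (kd + x) for x in free]
    bmax = sum(y * fi for y, fi in zip(bound, f)) / sum(fi * fi for fi in f)
    sse = sum((y - bmax * fi) ** 2 for y, fi in zip(bound, f))
    return bmax, sse


def fit_langmuir(free: list[float], bound: list[float],
                 kd_min: float = 1e-3, kd_max: float = 1e3,
                 steps: int = 4000) -> tuple[float, float]:
    """Return (Bmax, K_D) minimising squared error over a log-spaced K_D grid.

    K_D is in the same concentration units as `free`; the grid limits are
    arbitrary assumptions and should bracket the expected K_D.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        bmax, sse = _best_bmax(free, bound, kd)
        if best is None or sse < best[2]:
            best = (bmax, kd, sse)
    return best[0], best[1]


# Synthetic, noise-free data generated with Bmax = 2.0 and K_D = 5.0 (µM).
free = [0.5, 1, 2, 5, 10, 20, 40]
bound = [langmuir(x, 2.0, 5.0) for x in free]
print(fit_langmuir(free, bound))
```

When measurements stay well below saturation, as reported for these experiments, the fitted KD and Bmax become strongly correlated and KD is poorly constrained, which is why such fits support only a qualitative comparison.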

Fig. 6: Depletion isotherms of CBM92 domains binding to the insoluble polysaccharide yeast β-glucan.

a Binding site variants of CpCBM92A were generated, wherein a key Trp residue was converted to Ala in one or more binding sites, as indicated. Binding data for the wild type and variant forms are presented. b Depletion isotherms are compared for several wild type CBM92 domains that differ in the presence or absence of a Trp in the α/β/γ binding site, as indicated by the X/X/X nomenclature. Full species names and accession numbers can be found in Supplementary Table 1.

Overall, the depletion isotherm data for variant forms of CpCBM92A indicate that a greater number of functional (i.e. Trp-containing) binding sites leads to stronger overall binding to yeast β-glucan. These data cannot distinguish whether this results from merely additive or truly avid binding. Because wild type CBM92 proteins naturally vary in the number of Trp-containing binding sites (Fig. 3), we also measured depletion isotherms for a series of native proteins with differing binding site sequences (Fig. 6b). The weakest binding was seen for DmCBM92F, which only has Trp in the γ site and gave an isotherm highly similar to that of the β/γ variant W523A/W565A of CpCBM92A. CpCBM92F and AaCBM92B, which both lack one functional site, showed weaker binding than wild type CpCBM92A and CpCBM92B, which both have three binding-site Trp residues. In short, these data agree with observations from the CpCBM92A variants and show that more Trp-containing binding sites lead to stronger interactions with ligand.

Finally, we used the label-free technique bio-layer interferometry (BLI), which has proven useful for measuring multivalent carbohydrate–protein interactions51,52. BLI works best with relatively high molecular weight ligands, although these must be soluble. Previous BLI experiments on carbohydrate–protein interactions mainly used streptavidin sensors53 and biotinylated Fab-conjugated glycans53,54. In this study, we instead used Ni-NTA sensors, which capture recombinant proteins via their His6 tag. Changes in the interference signal during the ligand association and dissociation steps were monitored in real time.
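In a set-up like this one, the CBM is captured on the sensor and the glycan is the analyte in solution. A common way to interpret such sensorgrams, assumed here because the fitting model is not stated in this section, is the 1:1 Langmuir kinetic model: during association the response rises as Req(1 − exp(−kobs·t)) with kobs = kon·C + koff, during dissociation it decays as R0·exp(−koff·t), and KD = koff/kon. The rate constants below are hypothetical. Because the model needs a molar analyte concentration C, it cannot be applied directly to polydisperse ligands such as laminarin or scleroglucan.

```python
import math


def kd_from_rates(kon: float, koff: float) -> float:
    """Equilibrium dissociation constant (M) from kon (M^-1 s^-1) and koff (s^-1)."""
    return koff / kon


def association(t: float, conc: float, kon: float, koff: float, rmax: float) -> float:
    """1:1 model response during association at analyte concentration `conc` (M)."""
    kobs = kon * conc + koff  # observed rate constant
    req = rmax * conc / (conc + kd_from_rates(kon, koff))  # steady-state response
    return req * (1.0 - math.exp(-kobs * t))


def dissociation(t: float, r0: float, koff: float) -> float:
    """1:1 model response decay after switching to buffer, starting from r0."""
    return r0 * math.exp(-koff * t)


# Hypothetical rates: kon = 1e4 M^-1 s^-1, koff = 1e-2 s^-1, so K_D = 1 µM.
kon, koff, rmax = 1e4, 1e-2, 1.0
conc = 10e-6  # 10 µM analyte
for t in (0, 30, 120, 600):
    print(t, round(association(t, conc, kon, koff, rmax), 3))
print("K_D (M):", kd_from_rates(kon, koff))
```

The steady-state term Req = Rmax·C/(C + KD) is also why, for a fixed set of ligand concentrations, a lower-affinity variant approaches its maximum response only at higher concentrations.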

Binding of CpCBM92A and its variants to sophoropentaose (S5), laminarin, and scleroglucan was studied by BLI (Supplementary Fig. 10). Using S5 at a concentration of 10 µM enabled KD values to be determined (Table 2). The α and γ site variants (W481A and W565A, respectively) show binding profiles highly similar to that of wild type CpCBM92A, indicating that these sites contribute very little to overall affinity for S5. Conversely, the W523A β site variant shows no detectable binding to S5, again confirming that this is the strongest binding site on the protein and suggesting that it may be particularly critical for certain ligands. Because laminarin and scleroglucan are heterogeneous and polydisperse, their molar concentrations cannot be accurately defined, so KD values could not be determined for these interactions by BLI (Supplementary Fig. 10). Nonetheless, the general trend in these data echoes that of the depletion isotherm experiments, with stronger binding again correlating with a greater number of intact Trp binding sites (Supplementary Fig. 10). The BLI response is measured as a shift (in nm) in the interference pattern and is proportional to the amount of material bound to the biosensor surface. Comparing maximum responses with laminarin as the ligand shows that the wild type, α site variant, and γ site variant of CpCBM92A saturate at roughly the same ligand concentrations, indicating highly similar binding affinities. By contrast, the β site variant requires higher ligand concentrations to approach saturation, consistent with reduced binding affinity. With scleroglucan, which could be tested at higher concentrations than sophoropentaose, there is a clear loss of binding in the W565A γ site variant, whereas loss of the α site (W481A) has a minimal effect on binding. 
In the doubly substituted variant in which only subdomain α is unchanged from the wild type, the binding profile is close to that of the triple variant, with no binding to laminarin or scleroglucan. Overall, the BLI data confirm that the β site contributes most to CBM affinity for ligand, with lesser contributions from the γ and α sites. Native PAGE analysis of binding to laminarin also indicated that the β binding site is the strongest: the W523A β site variant showed the greatest reduction in mobility retardation, while the mobility of the W481A and W565A variants more closely resembled that of the wild type protein (Supplementary Fig. 5b). Although the BLI and depletion isotherm studies show some loss of overall binding capacity when the α or γ site Trp is lost, the affinity of these sites for ligand is likely to be comparatively low.

Table 2 Kinetic parameters of the interaction between CpCBM92A variants and S5

Implications of CBM92 binding to β-1,6-glucan

By characterising 12 examples, we have shown that CBM92 domains from distinct microbial species can bind glucose, gluco-oligosaccharides with β-1,2 or β-1,6 linkages, and long-chain glucans containing β-1,6-linked glucose moieties (pustulan, scleroglucan, yeast β-glucan, and laminarin). Previously characterised CBM92 domains bind β-1,3-glucan11 and carrageenan6, respectively; in both cases, the domain binds the same polysaccharide that its appended enzyme targets, suggesting a likely role in enzyme potentiation2. Indeed, our phylogenetic analyses show that a number of CBM92 domains are attached to predicted β-1,6-glucanases from enzyme family GH30 (subfamily 3)55, and these may be expected to show the same kind of rate potentiation. The natural substrate of these enzymes may be polymeric pustulan, as found in lichenous fungi20, or shorter chains of β-1,6-glucan, such as those in the cell walls of certain oomycetes18. However, the β-1,6-glucan-binding CBM92 domains characterised in this work are appended to CAZymes with a range of different predicted activities, suggesting that not every member of the family binds directly to the substrate of its enzyme partner. As β-1,6-glucosidic linkages are found in the cell walls and secretions of marine plants and soil fungi, tethering, for example, a chitinase56 or β-1,3-glucanase to a complex cell wall matrix may nonetheless have a rate-enhancing proximity effect in natural systems5.

In addition, the potentially multivalent nature of CBM92 glycan binding might be significant, as it could lead to the formation of protein-polysaccharide networks that stabilise enzymes, in a manner conceptually similar to enzyme immobilisation in industrial processes. In a study characterising a CBM6 protein with two binding sites that interact differently with the β-1,3-glucan backbone of laminarin, Jam et al. proposed a model for CBM-mediated cross-linking of oligolaminarin chains up to 12 glucosyl units in length57. The three binding sites of CBM92, which our data suggest all contribute to overall binding, may permit similar cross-linking of ligands in soil and water environments. The biological implications of this remain unclear, but from a biotechnological perspective, CBM92 domains may be useful as fusion tags for immobilising recombinant proteins on polysaccharide surfaces. Pustulan in particular is a strong candidate immobilisation surface, as it is inert, insoluble, and easily recovered from water by centrifugation or filtration. Additional experiments are needed to determine whether such cross-linking occurs and whether it stabilises appended enzymes. Fig. 7 depicts hypothetical models of how CpCBM92A might interact with the various ligands analysed in this study, including two potential binding orientations for gentiobiose. A sufficiently flexible longer oligosaccharide ligand, such as laminarin of moderate chain length, might occupy multiple binding sites on one protein, an interaction previously proposed for the bivalent CBM6 protein studied by Jam et al.57. A similar interaction may be feasible with sophoropentaose, which might be long enough to reach two binding sites on a single protein. 
Furthermore, with a very long-chain ligand such as scleroglucan, a cross-linked protein-polysaccharide network may form if the binding sites of one protein interact with different ligand chains.

Fig. 7: Theoretical model of CpCBM92A binding to diverse β-glucans.

a The wild type protein has three Trp-containing binding sites, each depicted with a bound glucosyl residue. The more intensely the ligand is coloured, the higher its affinity for the depicted binding site. b, c The Glc-β-1,6-Glc disaccharide gentiobiose can bind in two potential orientations, with either the reducing-end or the non-reducing-end sugar in the binding site. d, e CpCBM92A binds to laminarin, a β-1,3-glucan with single-sugar β-1,6-Glc decorations. Some laminarin chains are likely flexible enough for more than one substitution per chain to interact with the protein. f A favoured ligand of CpCBM92A is scleroglucan, a very long-chain, high molecular weight polysaccharide with a structure similar to that of laminarin. Scleroglucan chains are unlikely to be as flexible as laminarin oligosaccharides, but a protein-polysaccharide network, inter-locked by CpCBM92A, is speculated to form with long chains of this ligand. Examples of Glc-β-1,3-Glc and Glc-β-1,6-Glc linkages are indicated with arrows in panels d-f.