Structure-guided discovery of highly efficient cytidine deaminases with sequence-context independence – Nature Biomedical Engineering

Cytidine deaminase discovery and screening based on 3D structural analysis

To improve the efficiency of screening for cytidine deaminases that can increase CBE editing activity or that display special features, we introduced AI-mediated protein structure analysis. For this pipeline, we conducted homology-based database searches, generated structural predictions of the hits, clustered the hits by the similarity of their 3D structures, then cloned a subset of candidate cytidine deaminases from each cluster to compare their C-to-T editing activity in CBEs by high-throughput sequencing (Fig. 1a). For the homology-based search, we selected two cytidine deaminase catalytic domains (the APOBEC-like N-terminal domain (Pfam identifier PF08210) and the APOBEC-like C-terminal domain (Pfam identifier PF05240)) from the Pfam database of InterPro. Using amino acid primary sequence alignment, these two domains were used as queries to search the UniProt database for protein sequences carrying these two domains. The top 1,483 homologous protein sequences (both e-values less than 0.01, after removing database sequences that do not begin with methionine or end with a stop codon) were selected for further analysis (Supplementary Table 1).
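The hit-filtering criteria described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record, its field names and the reading of "both e-values" as one e-value per Pfam domain are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one homology-search hit (field names are illustrative)."""
    accession: str
    protein_seq: str
    evalue_n_domain: float  # e-value against PF08210 (assumed to be reported per domain)
    evalue_c_domain: float  # e-value against PF05240 (assumed to be reported per domain)
    ends_with_stop: bool    # whether the database entry ends with a stop codon


def keep_hit(hit, max_evalue=0.01):
    """Apply the stated criteria: both e-values < 0.01, starts with Met, ends with a stop."""
    return (
        hit.evalue_n_domain < max_evalue
        and hit.evalue_c_domain < max_evalue
        and hit.protein_seq.startswith("M")
        and hit.ends_with_stop
    )


def filter_hits(hits, max_evalue=0.01):
    """Return only the hits that satisfy every criterion, preserving input order."""
    return [h for h in hits if keep_hit(h, max_evalue)]
```

Applying `filter_hits` to the full list of search hits would yield the set of candidates carried forward to structure prediction.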

Fig. 1: Clustering of cytidine deaminases based on 3D protein structure.

a, The workflow for identifying cytidine deaminases for potential base editing applications using 3D protein structure prediction. Cytidine deaminase sequences are obtained based on homology with the catalytic domains, then clustered according to the predicted 3D structure. Editing properties of deaminases from each cluster are then characterized in cells through high-throughput sequencing of the sgRNA-target library. b, Clustering of cytidine deaminases based on structural differences. The red-to-blue heat map colours indicate the degree of structural similarity. The green-to-white gradient indicates the cluster number. Odd clusters (such as clusters 1, 3, 5 and so on) are marked in blue and even clusters (such as clusters 2, 4, 6 and so on) are marked in red.


Given the importance of 3D structure for protein function3,4,5,6, we believed that protein clustering based on overall structure might better reflect functional specificity. Using AlphaFold2 3D structure predictions of these proteins, we evaluated the structural similarity among these deaminases by calculating the template modelling (TM) score. The TM score was normalized according to amino acid length. We used the partitional clustering method to cluster these deaminases, which is sensitive to the selection of the initial cluster centre. We first sorted the deaminases by length from long to short and then, starting from the longest deaminase, iteratively formed clusters of deaminases with a TM score greater than 0.7. Once clustered, a deaminase did not participate in other clusters. Through our structural clustering process, 1,483 candidate peptide sequences were categorized into 184 clusters according to their structural similarities (Fig. 1b and Supplementary Table 2). We then used the current system time to generate random seeds via Perl’s rand to randomly select 272 cytidine deaminases, representing 10% of the candidates from each of the 184 clusters (rounding up, and selecting one deaminase from any cluster with fewer than ten members) (Supplementary Fig. 1 and Supplementary Tables 3 and 4). In addition to partitional clustering, we also categorized the 1,483 candidate deaminases by hierarchical clustering. However, only a few clusters contained the vast majority of the deaminases, whereas the bulk of the remaining clusters contained only one deaminase (Supplementary Fig. 2). We also generated a hierarchical cluster tree containing the 272 labelled candidate deaminases identified through partitional clustering in Fig. 1b. As with the above hierarchical clustering analysis, the results indicated that most of these candidate proteins aggregated into relatively few clusters (Supplementary Fig. 2).
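The clustering and sampling steps described above can be sketched as follows. This is a minimal reading of the text rather than the authors' code: the longest unclustered deaminase is taken as the next cluster centre, every unclustered deaminase with a TM score above 0.7 to that centre joins it, and 10% of each cluster (rounded up) is then sampled. The `tm` callable, the function names and the use of Python's `random` module with an explicit seed (instead of Perl's rand seeded from system time) are assumptions made for illustration.

```python
import random


def greedy_cluster(lengths, tm, threshold=0.7):
    """Cluster names greedily, taking the longest unclustered sequence as each centre.

    lengths: dict mapping sequence name -> amino acid length.
    tm: callable (centre, other) -> length-normalized TM score (hypothetical input).
    """
    order = sorted(lengths, key=lambda name: (-lengths[name], name))
    assigned = set()
    clusters = []
    for centre in order:
        if centre in assigned:
            continue  # already a member of an earlier cluster
        members = [centre]
        assigned.add(centre)
        for other in order:
            if other not in assigned and tm(centre, other) > threshold:
                members.append(other)
                assigned.add(other)  # clustered sequences join no other cluster
        clusters.append(members)
    return clusters


def sample_cluster(members, rng):
    """Pick 10% of a cluster, rounded up, so small clusters contribute one member."""
    k = (len(members) + 9) // 10  # integer ceiling of len/10
    return rng.sample(members, k)


def sample_all(clusters, seed):
    """Sample every cluster with one seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [pick for members in clusters for pick in sample_cluster(members, rng)]
```

Because the centre is always the longest remaining sequence, the result depends on the sort order, which is consistent with the sensitivity to initial cluster centres noted in the text.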

To facilitate their expression and function in mammalian cells, each cytidine deaminase sequence was optimized for human codons before synthesis and cloned into the CBE expression vector (Supplementary Fig. 3c). Finally, the 272 candidate editors were transfected into a human HEK293T cell line with stable expression of the single guide (sg)RNA-target library to compare their activity using high-throughput sequencing.

Discovery of cytidine deaminases with high editing efficiency, minimal off-target effects, diverse editing windows and diverse motif preferences

To quantify the editing activity of CBEs containing each candidate deaminase, we employed the sgRNA-target library detection strategy33,34 (Fig. 2a). For this purpose, we synthesized a rationally designed oligo library containing 102 sgRNAs and corresponding target sequences. Specifically, these sgRNAs included at least one cytosine located within 2–8 nt from the end of the protospacer adjacent motif ((PAM) with the PAM located at positions 21–23 unless otherwise stated), with 84% of sgRNAs’ GC contents ranging from 40% to 65% (Supplementary Fig. 3a,b). The oligo library was then cloned into a lentiviral expression vector (Supplementary Fig. 3d) and stably integrated by lentivirus into the genome of HEK293T cells, and cells with stable expression of the sgRNA-target library were selected by flow cytometry. After transfecting each CBE into oligo-expressing cells, cytidine editing was detected by high-throughput sequencing, with an average coverage of ≥1,000× per sgRNA and 2,000× average coverage for PCR products of the target sites.

Fig. 2: Characterization of editing properties of cytidine deaminases.

a, The experimental strategy for high-throughput screening of the editing properties of cytidine deaminases in HEK293T cells expressing a sgRNA-target library. b, The average editing efficiency of 272 representative cytidine deaminases in a 102 sgRNA-target library. The 102 sgRNA-target sites are aligned along the abscissa and the 272 candidate deaminases are shown according to clusters on the ordinate. The red-to-blue heat map gradient indicates editing efficiency. Cytidine deaminases belonging to clusters 132, 145 and 147 are marked with red, green and blue, respectively. c, Evaluation of on-target and off-target activity of candidate deaminases using orthogonal R-loop assays in HEK293T cells. Each dot represents the average editing efficiency of 102 sgRNA-target sites (y axis) and the average off-target editing at four R-loop sites (x axis). The eight APOBEC-like deaminases with the highest editing efficiency and controls (hA3A, rAPOBEC1, YE1 and eA3A) are labelled in large font. d, Analysis of the editing window for cytosine deaminases. The red numbers represent the highest editing efficiency for the cytosine deaminases and the red boxes are the editing windows. e, The average editing efficiency of 10 cytidine deaminases reselected from cluster 147 in the 102 sgRNA-target library. The 102 sgRNA-target sites are aligned along the abscissa and the 10 candidate deaminases are shown on the ordinate. The red-to-blue heat map gradient indicates editing efficiency. In b–e, data represent the mean of four independent experiments, except for the evaluation of off-target activity using orthogonal R-loop assays in c (n = 3).


Here, we used rAPOBEC1, YE1, hA3A and engineered hA3A (eA3A) as controls to evaluate the editing activity of the 272 deaminases. Sequencing data of target site PCR products showed that 71 editors displayed an average C-to-T editing efficiency of >10% across the 102 sgRNA-target sites (Fig. 2b and Supplementary Fig. 4). Notably, 47 candidate CBEs exhibited higher efficiency than YE1 (21.2%), 22 candidate CBEs displayed higher efficiency than rAPOBEC1 (34.8%) and 6 candidate CBEs showed higher efficiency than hA3A (53.3%), which is one of the highest efficiency deaminases reported so far (Fig. 2c and Supplementary Fig. 4). Sequencing analysis of editing window width for the 272 candidate deaminases revealed a diversity of editing windows (Fig. 2d and Supplementary Fig. 5); in this study, we defined the editing window using a lowered threshold of ≥50% maximum editing frequency. In particular, we detected editing windows that slide forward (such as C2–C7 for CD0208 and CD0640, C2–C6 for CD0256 and C1–C7 for CD0293) or backward (such as C3–C12 for CD0596, C2–C10 for CD0452 and C4–C12 for CD0602) relative to the 4–7 nt rAPOBEC1 editing window. Some deaminases had a broad-range editing window (such as C1–C9 for CD0458, CD0730 and CD0181 and C1–C10 for CD0336), whereas others had a much narrower editing window (such as C4–C6 for CD0354, CD0237, CD0371 and CD0230) (Fig. 2d and Supplementary Fig. 5). Analysis of target sequence context of these deaminases revealed a diversity of motif preferences. 
Like many existing deaminases, some deaminases (such as CD0362, CD0458, CD0596, CD0730 and CD1049) displayed the highest editing efficiency at TC sites; some deaminases (such as CD0230, CD0371, CD0464, CD0663 and CD0739) exhibited high specificity for editing TC motifs, similar to eA3A; alternatively, some deaminases (such as CD0181, CD0191, CD0336, CD0418 and CD0452) showed the highest editing activity at GC motifs, which could compensate for the relatively low efficiency of conventional deaminases at GC sites. In particular, we noted that CD0208 and CD0640 could efficiently edit at almost all motif types (Extended Data Fig. 1). The high editing efficiency, along with the diversity of editing windows and motif preferences among the candidate deaminases suggested that efficiency, target window and preferential sequence context of editing activity could be improved over that of current CBEs. In addition, further analysis of potential base conversions other than C-to-T revealed that C-to-G substitutions occurred at a higher frequency than other types, suggesting that these cytidine deaminases could be potentially engineered for application in C-to-G base editors (Extended Data Fig. 2).
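The editing-window definition used in this section (positions reaching ≥50% of the maximum editing frequency) can be illustrated with a short sketch. The function name, the input format and the choice to report the window as the span from the first to the last qualifying position are assumptions for the example; the efficiency values in the usage line are invented, not measured data.

```python
def editing_window(efficiency_by_position, fraction=0.5):
    """Return (first, last) protospacer positions at or above fraction * peak efficiency.

    efficiency_by_position: dict mapping 1-based protospacer position -> mean
    C-to-T editing efficiency (%) at that position.
    """
    peak = max(efficiency_by_position.values())
    if peak <= 0:
        return None  # no editing detected, so no window can be defined
    cutoff = fraction * peak
    qualifying = sorted(
        pos for pos, eff in efficiency_by_position.items() if eff >= cutoff
    )
    return qualifying[0], qualifying[-1]


# Invented profile peaking at C5: the cutoff is 30%, so the window spans C3 to C7.
profile = {1: 5, 2: 20, 3: 35, 4: 50, 5: 60, 6: 45, 7: 31, 8: 10}
print(editing_window(profile))
```

Lowering `fraction` widens the reported window, which is why the threshold should be stated whenever windows from different studies are compared.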

Since cytidine deaminases can induce widespread, sgRNA-independent, off-target effects through binding with single-stranded DNA (ssDNA) or RNA, it is necessary to assess such off-target effects induced by candidate cytidine deaminases under consideration for use in high-fidelity CBEs. Here, an orthogonal SaCas9 R-loop assay35,36,37 was used to detect sgRNA-independent off-target effects in HEK293T cells at four predicted off-target sites for SaCas9. Analysis of sequencing data showed that the off-target efficiency of each candidate deaminase was highly consistent across all four off-target sites (Extended Data Fig. 3a,b), supporting the reliability of our detection system. We also noted that the off-target efficiency of most candidate deaminases was highly positively correlated with on-target efficiency (Fig. 2c and Extended Data Fig. 4a), implying that deaminases with high on-target efficiency also tend to have high off-target effects. However, we also discovered a subset of deaminases with a high on-target to off-target ratio (such as CD0085, CD0827 and CD0236, which were 3.5-, 2.8- and 2.6-fold that of rAPOBEC1, respectively) (Fig. 2c and Extended Data Fig. 4b), indicating that they exhibit high targeting specificity. In addition, CD0208 and CD0640 had higher editing efficiency (1.1- and 1.1-fold that of hA3A, respectively) but fewer off-target effects (0.7- and 0.7-fold that of hA3A, respectively) than hA3A (Fig. 2c and Extended Data Fig. 4b), while some deaminases had higher editing efficiency and lower off-target activity than the widely used highly specific deaminase YE1 (such as CD0085, and CD0827, which were 1.4- and 1.1-fold that of YE1, respectively) (Fig. 2c and Extended Data Fig. 4b). These results suggested that the efficiency and specificity of many of these candidate deaminases could be further engineered for use in specialized base editors.

In addition, we revisited our above 3D structure-based cluster analysis to check whether similarities in the activity of the ten most efficient deaminases (CD0181, CD0208, CD0288, CD0336, CD0418, CD0458, CD0640, CD0730, CD0902 and CD0911) might reflect structural features. This analysis showed that the ten deaminases indeed belonged to only three clusters, namely cluster 132 (CD0208 and CD0640), cluster 145 (CD0181, CD0336 and CD0418) and cluster 147 (CD0288, CD0458, CD0730, CD0902 and CD0911) (Fig. 2b), suggesting that editing efficiency of these deaminases might be closely related to their 3D protein structure. To further test our hypothesis that deaminases with similar functions (such as high editing activity) share 3D structures that will cluster together, we selected ten uncharacterized cytidine deaminases (CD0590, CD0058, CD0149, CD0956, CD0054, CD0931, CD0746, CD0027, CD0289 and CD0701) from cluster 147 and examined their editing activity. This experiment revealed that four of ten (40%) candidate deaminases (65.3% for CD0956, 64.6% for CD0931, 60.3% for CD0054 and 52.5% for CD0701) showed editing efficiency comparable to or higher than that of hA3A (53.3%), which was a higher proportion than that of the ten high-efficiency deaminases screened from 272 candidate proteins (3.7%) across all clusters (Fig. 2e and Supplementary Fig. 6a,b). These results illustrated the use of 3D structure classification as a potentially useful screening strategy to identify deaminases with diverse functions.
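The two proportions compared above follow directly from the counts given in the text, as the short check below shows. The enrichment ratio at the end is a derived figure computed here for illustration and is not a value reported in the text.

```python
def hit_rate(hits, screened):
    """Percentage of screened deaminases that reached hA3A-like efficiency."""
    return 100.0 * hits / screened


targeted = hit_rate(4, 10)     # re-sampling within cluster 147
broad = hit_rate(10, 272)      # initial screen across all clusters
enrichment = targeted / broad  # derived here, not reported in the text

print(f"targeted: {targeted:.1f}%, broad: {broad:.1f}%, ratio: {enrichment:.1f}")
```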

A high-efficiency cytidine deaminase with non-preferential cytosine targeting

For precise characterization of editing features and unbiased screening of the candidate deaminases, editing activity was next examined in a larger library of 11,868 sgRNA-target sequences constructed following the same approach as that of the 102 sgRNA-target library (Fig. 2a). We examined the editing activity of eight deaminases (66.5% for CD0458, 64.6% for CD0730, 61.7% for CD0208, 60.9% for CD0902, 60.4% for CD0640, 55.6% for CD0418, 52.9% for CD0181 and 51.9% for CD0911) that displayed efficiency close to or higher than hA3A (53.3%) in the 102 sgRNA-target library (Supplementary Fig. 4). High-throughput sequencing analysis indicated that these eight base editors also showed remarkably high C-to-T editing efficiency in the large library (52.2% for CD0458, 47.1% for CD0730, 48.7% for CD0208, 54.5% for CD0902, 45.8% for CD0640, 45.0% for CD0418, 55.2% for CD0181 and 48.4% for CD0911) compared with hA3A (51.5%), rAPOBEC1 (39.0%), YE1 (31.5%) and eA3A (19.9%) (Fig. 3a). In terms of editing windows, all the eight deaminases exhibited wide editing windows (C1–C11 for CD0458 and CD0730, C1–C8 for CD0208, C1–C12 for CD0902, C2–C8 for CD0640, C1–C9 for CD0418, C1–C12 for CD0181 and C1–C12 for CD0911) like hA3A (C1–C14) and the highest editing activity in C5 (Supplementary Fig. 7). These results were consistent with those obtained using the 102 sgRNA library.

Fig. 3: A high-efficiency cytidine deaminase with non-preferential cytidine targeting and engineering to reduce off-target effects.

a, High-throughput sequencing analysis of editing efficiency for the eight top deaminases from Fig. 2c and four well-characterized deaminases (hA3A, rAPOBEC1, YE1 and eA3A) in an 11,868 sgRNA-target library. The centre line indicates the median and the bottom and top lines of the box represent the first quartile and third quartile of the editing efficiency at 11,868 sgRNA-target sites, respectively. The tails extend to the minimum and maximum values. b, Sequence-context preference of the top eight cytidine deaminases from Fig. 2c. The ordinate represents the average percentage of sequencing reads with C-to-T conversion at every position within the protospacer across all library members in the 11,868 sgRNA-target library. rAPOBEC1, YE1, eA3A, hA3A, evoAPOBEC1 and evoFERNY served as references. c, Context preference of CD0208 at 34 endogenous target sites in HEK293T cells. The ordinate represents the average percentage of sequencing reads with C-to-T conversion at 34 endogenous target sites within protospacer positions 3–7. d, The editing efficiency of CD0208 in a 3–7 nt editing window for eight representative endogenous target sites in HEK293T cells. The data represent the mean of three independent experiments. e, Predicted DNA-interacting residues targeted for conversion to alanine in the 3D structure of CD0208. f, Detection of on-target and off-target editing activity for CD0208 variants. Each dot represents the average editing efficiency at 11,868 sgRNA-target sites (y axis) and average off-target effects at four R-loop sites (x axis). The CD0208P52A variant and controls (hA3A, rAPOBEC1, YE1 and eA3A) are marked in large font. g, The ratio of on-target to off-target editing for CD0208 variants calculated from f. Well-characterized base editors, including rAPOBEC1, YE1 and eA3A served as controls. The CD0208P52A variant (red arrowhead) was chosen for further evaluation. h, Sequence context preference of CD0208P52A detected in an 11,868 sgRNA-target library. 
rAPOBEC1, YE1, eA3A, hA3A and CD0208 served as references, data for these groups are from c. i, The editing efficiencies of CD0208P52A, CD0208 and four well-characterized deaminases (hA3A, rAPOBEC1, YE1 and eA3A) in the 11,868 sgRNA-target library. The editing window is shown from left to right in the abscissa. j, The distribution of edit types for CD0208P52A, CD0208 and four well-characterized deaminases (hA3A, rAPOBEC1, YE1 and eA3A) in the 11,868 sgRNA-target library. The number in each cell indicates the proportion of a certain editing type in total. The y axis indicates the base before mutation, while the x axis shows the base type after conversion. The error bars in b, h and i indicate the mean ± s.e.m. of average editing efficiency at 11,868 sgRNA-target sites. The error bars in c indicate the mean ± s.e.m. of three independent experiments.


Since many commonly used deaminases are limited in their application by preferential editing of some sequence motifs, we next investigated motif preference among the eight deaminases through high-throughput sequencing data. This analysis revealed four categories of motif preference for these eight deaminases. CD0458 and CD0730 showed obvious preferential targeting of TC motifs, which was similar to eA3A, YE1, hA3A and rAPOBEC1. Alternatively, CD0181 and CD0418 showed high editing activity at both GC and TC sites, with the highest activity at GC sites, complementing the low efficiency of GC editing by eA3A, YE1, hA3A and rAPOBEC1. By contrast, CD0902 and CD0911 had the highest efficiency at AC motifs, with CD0902 preferring RC (AC/GC > TC/CC) and CD0911 preferring AC/TC to GC/CC. Most notably, compared with previously reported deaminases with no obvious sequence preference, such as evoAPOBEC1, hA3A and evoFERNY, both CD0208 and CD0640 exhibited non-preferential editing, meaning that C editing was non-selective for all AC/TC/GC/CC sites (Fig. 3b). In particular, CD0208 exhibited high editing activity comparable to that of hA3A, but with off-target activity only 0.72-fold that of hA3A (Fig. 3a and Extended Data Fig. 4b), suggesting obvious potential for development as a high-versatility editing tool.
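The motif-preference comparison above amounts to grouping each edited cytosine by the base immediately 5′ of it (AC, TC, GC or CC) and averaging the C-to-T frequency per group. A minimal sketch is given below; the input format and function name are assumptions, cytosines at protospacer position 1 are skipped because their 5′ neighbour lies outside the protospacer, and the sequences and percentages in the usage lines are invented.

```python
from collections import defaultdict


def motif_preference(sites):
    """Average C-to-T efficiency per NC dinucleotide context.

    sites: iterable of (protospacer, edits), where edits maps a 1-based
    protospacer position to the C-to-T percentage observed at that position.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for protospacer, edits in sites:
        for pos, pct in edits.items():
            if pos < 2 or protospacer[pos - 1] != "C":
                continue  # need a cytosine with a 5' neighbour inside the protospacer
            motif = protospacer[pos - 2] + "C"
            totals[motif] += pct
            counts[motif] += 1
    return {motif: totals[motif] / counts[motif] for motif in totals}


# Invented example: two short target sequences with per-position editing percentages.
example = [
    ("ATCGCAC", {3: 60.0, 5: 20.0, 7: 40.0}),
    ("TCAC", {2: 40.0, 4: 20.0}),
]
print(motif_preference(example))
```

A deaminase without motif preference, as described for CD0208 and CD0640, would give similar averages for all four contexts.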

To further characterize the editing properties of CD0208 (267 aa), we examined its efficiency and motif preference at 34 endogenous target sites in HEK293T cells. For this experiment, HEK293T cells were co-transfected with separate vectors expressing a CBE containing CD0208 and the sgRNAs. High-throughput sequencing analysis indicated that the CD0208-based editor displayed close to undetectable preferential motif targeting within a 3–7 nt editing window (Fig. 3c,d and Supplementary Fig. 8). Comparison with other CBEs showed that CD0208 and rAPOBEC1 had considerably higher overall editing efficiency than eA3A and YE1, which preferentially edit TC sites. Furthermore, although rAPOBEC1 had comparable efficiency to CD0208 at AC/CC/TC motifs, its activity at GC sites was clearly lower than that of CD0208 (Fig. 3c,d and Supplementary Fig. 8). These results indicated that a CD0208-based CBE could efficiently edit any AC/TC/CC/GC site with almost no discernible motif preference.

Rationally engineered mutagenesis of non-preferential deaminase exhibits reduced off-target effects

In previous studies, we found that sgRNA-independent off-target effects of deaminases can be reduced by mutating single or multiple amino acids that interact with ssDNA38. We therefore used the same approach here to reduce CD0208 deaminase-induced, sgRNA-independent off-target effects. To select amino acid residues potentially involved in CD0208 interaction with ssDNA, we used the online DNA- and RNA-binding predictor, DRNApred (http://biomine.cs.vcu.edu/servers/DRNApred/)39. This analysis identified 27 amino acid residues as the most likely to participate in ssDNA binding (binding score >0.4; Fig. 3e and Supplementary Fig. 9). Given that alanine scanning is an effective strategy for investigating functional amino acid residues40,41, we individually replaced these 27 amino acid residues with alanine to construct CD0208 variant editors and measured their editing efficiency in HEK293T cells co-expressing the 11,868 sgRNA-target library. In addition, the R-loop assay was performed at four sites to determine whether these mutations affected off-target frequency. The results showed that 11 variants had fewer sgRNA-independent off-target effects than CD0208, including CD0208P52A (40.3% that of CD0208), while 15 mutants had higher editing efficiency than CD0208, including CD0208R15A (1.2-fold that of CD0208) (Fig. 3f and Supplementary Figs. 10a,b and 11a). These results implied that rationally engineered substitution of amino acid residues could both reduce off-target effects and increase the editing activity of CBEs in mammalian cells.
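The alanine-scanning design described above (one single-residue alanine substitution per predicted DNA-binding position with a score above 0.4) can be sketched as follows. The function name and input format are assumptions, the per-residue scores would come from a predictor such as DRNApred, and the sequence and scores in the usage lines are a toy example rather than the CD0208 sequence.

```python
def alanine_scan(protein_seq, binding_scores, cutoff=0.4):
    """Build single alanine-substitution variants for predicted DNA-binding residues.

    protein_seq: wild-type amino acid sequence.
    binding_scores: dict mapping 1-based residue position -> predicted binding score.
    Returns a dict mapping a mutation label such as 'P52A' to the variant sequence.
    """
    variants = {}
    for pos, score in sorted(binding_scores.items()):
        wild_type = protein_seq[pos - 1]
        if score <= cutoff or wild_type == "A":
            continue  # below the score cutoff, or already alanine
        label = f"{wild_type}{pos}A"
        variants[label] = protein_seq[: pos - 1] + "A" + protein_seq[pos:]
    return variants


# Toy sequence and scores for illustration only.
print(alanine_scan("MWPRKA", {2: 0.9, 3: 0.2, 4: 0.5, 6: 0.8}))
```

Each variant differs from the wild type at exactly one position, so any change in on-target or off-target activity can be attributed to that residue.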

To assess the data, we filtered out four cytidine deaminase variants with lower editing activity, namely CD0208W169A, CD0208S168A, CD0208Y199A and CD0208R197A. The on-target to off-target ratio of the CD0208 CBE variants showed that six variants exhibited higher editing specificity than the prototype (4.4 for CD0208P52A, 2.8 for CD0208N45A, 2.3 for CD0208H188A, 2.3 for CD0208R53A, 2.2 for CD0208R51A and 2.0 for CD0208T170A, compared with 1.9 for CD0208). CD0208P52A showed the highest editing specificity (2.3-fold that of CD0208 and 1.7-fold that of YE1) (Fig. 3g). These results supported the likelihood that CD0208 residue P52 contributed to ssDNA binding and that the CD0208P52A mutation could substantially reduce off-target effects while retaining high cytidine deamination activity. At the same time, we also characterized the motif preference and editing window of these mutant editors. We found that almost all mutants, including CD0208P52A, displayed a comparable lack of motif preference to that of CD0208 (Fig. 3h and Supplementary Fig. 11c). However, differences were identified in the editing window between CD0208 and some of the variants (Fig. 3i and Supplementary Fig. 11b). For instance, CD0208W2A and CD0208R15A exhibited wider editing windows than CD0208, but CD0208P52A narrowed the editing window while maintaining the highest editing efficiency at C5 (Fig. 3i and Supplementary Fig. 11b), potentially related to changes in ssDNA binding in the variant. Analysis of editing types showed that CD0208P52A maintained a high C-to-T editing purity comparable with other cytidine deaminases (Fig. 3j). In conclusion, these results indicated that rationally engineered mutagenesis could reduce off-target effects and increase the editing specificity of CD0208.
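The specificity comparison above can be reproduced from the on-target to off-target ratios quoted in the text. The sketch below divides each variant's ratio by that of the CD0208 prototype; the dictionary holds the reported values, while the function and variable names are illustrative.

```python
# On-target to off-target ratios as quoted in the text (Fig. 3g).
reported_ratio = {
    "CD0208P52A": 4.4,
    "CD0208N45A": 2.8,
    "CD0208H188A": 2.3,
    "CD0208R53A": 2.3,
    "CD0208R51A": 2.2,
    "CD0208T170A": 2.0,
    "CD0208": 1.9,
}


def fold_over(reference, ratios):
    """Express every ratio as a fold change relative to the reference entry."""
    ref = ratios[reference]
    return {name: value / ref for name, value in ratios.items() if name != reference}


folds = fold_over("CD0208", reported_ratio)
more_specific = [name for name, fold in folds.items() if fold > 1.0]

print(len(more_specific), round(folds["CD0208P52A"], 1))
```

The same calculation with YE1 as the reference would need the YE1 ratio, which is not quoted in this section.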

CD0208P52A CBE enables the efficient introduction of nonsense mutations in single- and multi-copy genes in mammalian cell lines

To investigate whether the CD0208P52A CBE could introduce nonsense mutations in single-copy genes without DSBs, we determined its efficiency in introducing stop codons at endogenous sites in mouse N2A cells. For this purpose, we designed 11 sgRNAs targeting Tyr that could induce stop codons or disrupt splice sites (Fig. 4a), then co-transfected these sgRNAs along with CD0208P52A CBE into mouse N2A cells, using a panel of 17 classical and recently developed cytosine deaminase-derived CBEs as controls, including rAPOBEC1, YE1, hA3A, eA3A, CD0208, TadA-CDb (ref. 42), TadA-CDc (ref. 42), eTd-CBE (ref. 43), eTd-CBEa (ref. 43), eTd-CBEm (ref. 43), CBE-T1.14 (ref. 44), CBE-T1.46 (ref. 44), CBE-T1.52 (ref. 44), N-d12fCBE-8e (28G46C) (ref. 45), N-dRRACBE-8e (GGATY) (ref. 45), miniSdd6 (ref. 6) and miniSdd7 (ref. 6). High-throughput sequencing analysis indicated that the C-to-T editing efficiency of CD0208P52A CBE (41.2%) was comparable to that of TadA-CDb (36.2%), TadA-CDc (35.4%) and miniSdd7 (37.2%); slightly higher than that of hA3A (27.8%), CD0208 (33.0%), CBE-T1.14 (27.1%), CBE-T1.46 (29.5%), CBE-T1.52 (29.9%) and N-d12fCBE-8e (28G46C) (30.9%); and substantially higher than that of rAPOBEC1 (19.3%), YE1 (11.1%), eA3A (17.8%), eTd-CBE (16.3%), eTd-CBEa (2.4%), eTd-CBEm (10.4%), N-dRRACBE-8e (GGATY) (11.0%) and miniSdd6 (25.1%) CBEs (Fig. 4b and Supplementary Fig. 12a). Quantification of stop codon or splice-site mutation introduction showed that CD0208P52A CBE (25.0%) had similar efficiency to that of hA3A (22.2%), CD0208 (25.8%), TadA-CDb (28.9%), TadA-CDc (28.9%), CBE-T1.14 (20.8%), CBE-T1.46 (22.1%), CBE-T1.52 (22.5%), N-d12fCBE-8e (28G46C) (19.8%) and miniSdd7 (28.3%) CBEs, but significantly higher efficiency than rAPOBEC1 (16.4%), YE1 (6.9%), eA3A (9.2%), eTd-CBE (7.1%), eTd-CBEa (0.8%), eTd-CBEm (2.6%), N-dRRACBE-8e (GGATY) (5.1%) and miniSdd6 (8.6%) (Fig. 4c and Supplementary Figs. 12c and 13). 
These results suggested that CD0208P52A CBE, as with several other recently developed deaminases, exhibited close to or higher editing efficiency than the well-established high-activity deaminase, hA3A, and could efficiently induce targeted nonsense mutations in the genome of N2A mouse cells.

Fig. 4: Introduction of nonsense mutations in single- and multi-copy genes by CD0208P52A CBE in mammalian cell lines.

a, The design of 11 sgRNAs targeting the Tyr gene. b, CD0208P52A CBE C-to-T base editing efficiency at 11 target sites in the Tyr gene in mouse N2A cells compared with 17 classical and recent cytosine deaminase-based CBEs, including rAPOBEC1, YE1, hA3A, eA3A, CD0208, TadA-CDb, TadA-CDc, eTd-CBE, eTd-CBEa, eTd-CBEm, CBE-T1.14, CBE-T1.46, CBE-T1.52, N-d12fCBE-8e (28G46C), N-dRRACBE-8e (GGATY), miniSdd6 and miniSdd7 CBEs. c, The efficiency of nonsense mutation introduction at the 11 Tyr gene target sites from a by the 18 CBEs from b in N2A cells. d, High-throughput sequencing analysis of editing efficiency by the 18 CBEs from b in the 102 sgRNA-target library. e, Average cytosine substitution efficiency at every position within the editing window for each CBE at target sites in d. The data for rAPOBEC1, YE1, eA3A, hA3A and CD0208 groups are from Supplementary Fig. 4. f, The off-target effects of the 18 CBEs from b detected using orthogonal R-loop assays at four dSaCas9-sgRNA recognition sites (Sa sites 3–6). g, The preferential sequence contexts of the 18 CBEs. The ordinate represents the average percentage of sequencing reads with C-to-T conversion at every position within the protospacer across the full 11,868 sgRNA-target library. Data for rAPOBEC1, YE1, eA3A, hA3A, CD0208 and CD0208P52A groups are from Fig. 3h. h, Eight sgRNAs targeting Rbmy1a1, Ssty1 and Ssty2 genes. i, CD0208P52A CBE editing efficiency at eight target sites across three multi-copy genes (Rbmy1a1, Ssty1 and Ssty2) on the Y chromosome in mESCs compared with the hA3A, rAPOBEC1, YE1 and eA3A CBEs. j, The efficiency of nonsense mutation introduction by the five CBEs from i at eight target sites across multiple copies of the Rbmy1a1, Ssty1 and Ssty2 genes in mESCs. k, Nine sgRNAs targeting the PERV pol gene. l, Editing efficiency of the five CBEs from i plus CD0208 CBE at nine target sites in the PERV pol gene in PK-15 cells. 
m, The efficiency of nonsense mutation introduction by the six CBEs from l at nine target sites in the pol gene of PERV in PK-15 cells. The error bars in f and g show the mean ± s.e.m. of three or more independent experiments. The centre line in b–d, i, j, l and m indicates the median, and the bottom and top lines of the box represent the first and third quartiles, respectively, of editing efficiency obtained from three or more independent experiments. The tails extend to the minimum and maximum values. P values were calculated by a two-sided unpaired t-test.


We then examined several other editing properties of CD0208P52A CBE for comparison with the panel of deaminase-derived CBEs, including editing efficiency, off-target effects, editing window and motif preference. Analysis of editing activity with the 102 sgRNA-target library showed that CBE activity in HEK293T cells was consistent with that in N2A cells: CD0208P52A (42.9%) showed editing activity comparable to that of hA3A (53.3%), TadA-CDb (53.3%), TadA-CDc (53.2%), CBE-T1.14 (43.5%), CBE-T1.46 (46.2%), CBE-T1.52 (44.4%) and miniSdd7 (53.0%), and considerably higher than that of rAPOBEC1 (34.8%), YE1 (21.2%), eA3A (17.4%), eTd-CBE (14.5%), eTd-CBEa (1.9%), eTd-CBEm (5.7%), N-d12fCBE-8e (28G46C) (34.8%), N-dRRACBE-8e (GGATY) (19.9%) and miniSdd6 (16.3%) CBEs (Fig. 4d). Moreover, examination of editing window statistics indicated that CD0208P52A had the highest editing activity at C3–C6, which was the same as TadA-CDb, TadA-CDc, CBE-T1.14, CBE-T1.46 and CBE-T1.52. The editing windows of N-d12fCBE-8e (28G46C) and N-dRRACBE-8e (GGATY) were concentrated at C4–C6, and miniSdd7 exhibited a wider editing window (C2–C8), like hA3A (C1–C9) CBE (Fig. 4e). These results were consistent with those in N2A cells (Supplementary Fig. 11b), suggesting that CD0208P52A has a narrow editing window similar to that of several recent deaminases. Evaluation of off-target effects of these CBEs at four R-loop sites showed that CD0208P52A had significantly fewer off-target effects than hA3A, rAPOBEC1, TadA-CDb, TadA-CDc and miniSdd7, and a level close to that of CBE-T1.14, CBE-T1.46, CBE-T1.52, N-d12fCBE-8e (28G46C), N-dRRACBE-8e (GGATY), miniSdd6, eTd-CBE, eTd-CBEa and eTd-CBEm (Fig. 4f and Supplementary Fig. 14). Analysis of motif preference showed that CD0208 and CD0208P52A exhibited context-independent activity (Fig. 4g). 
By contrast, the other deaminases displayed obvious motif preference, with TadA-CDb, TadA-CDc, CBE-T1.14, CBE-T1.46, CBE-T1.52, N-dRRACBE-8e (GGATY) and miniSdd7 preferentially introducing AC/TC to GC/CC edits; eTd-CBE and eTd-CBEm preferentially editing TC/CC motifs; N-d12fCBE-8e (28G46C) preferring the TC motif; and miniSdd6 preferentially inducing AC/TC/CC to GC edits (Fig. 4g). In summary, compared with a wide variety of other recently published and classical deaminases, CD0208P52A showed generally high editing efficiency, low off-target effects, sequence context-independent targeting and a narrow editing window.

Since complete knockout of multi-copy genes in mammalian cells poses a long-standing challenge for many commonly used editing tools46,47,48, we next assessed whether CD0208P52A CBE could also introduce nonsense mutations in multi-copy genes. For this analysis, we determined the efficiency of stop codon introduction for a set of multi-copy genes in mouse embryonic stem cells (mESCs) and porcine kidney 15 cells (PK-15). In particular, multiple copies of the Rbmy1a1, Ssty1 and Ssty2 genes (Rbmy1a1 >50 copies, Ssty1 >35 copies and Ssty2 >30 copies) are all present on the Y chromosome in the mouse genome and have been targeted with Cas9 to induce Y chromosome deletion in cells and mouse embryos48. To introduce stop codons or perturb start codons, we designed two, three and three sgRNAs that respectively target Rbmy1a1, Ssty1 and Ssty2 (Fig. 4h). These sgRNAs were then co-transfected along with the CD0208P52A CBE into mESCs, while CBEs containing rAPOBEC1, YE1, hA3A or eA3A served as controls. High-throughput sequencing analysis indicated that C-to-T editing efficiency was significantly higher in the CD0208P52A CBE group than in cells edited with rAPOBEC1 CBE, YE1 CBE or eA3A CBE (Fig. 4i and Supplementary Fig. 15a,b). The average C-to-T editing efficiency at the eight sgRNA sites reached 73.0%, which was 1.4-, 1.2-, 1.7- and 2.9-fold that of hA3A CBE (53.6%), rAPOBEC1 CBE (60.9%), YE1 CBE (42.8%) and eA3A CBE (24.8%), respectively (Fig. 4i and Supplementary Fig. 15a,b). We also noted that the nonsense mutation introduction efficiency of CD0208P52A CBE (48.1%) was significantly higher than that in cells treated with rAPOBEC1 CBE (28.2%), YE1 CBE (15.2%) or eA3A CBE (18.3%), and similar to that of hA3A CBE (46.6%) (Fig. 4j and Supplementary Fig. 15c,d).

The presence of multi-copy porcine endogenous retrovirus (PERV) elements in the pig genome presents a high risk of infection through organ transplantation from pigs to humans. Previous studies have shown that eliminating PERVs from pig cells by CRISPR–Cas9 typically results in activation of the P53 pathway and subsequent apoptosis due to DSBs47. To test whether CD0208P52A CBE could introduce nonsense mutations in PERV genes without inducing DSB-associated apoptosis in pig cells, we designed nine sgRNAs that produce premature stop codons in the pol gene (Fig. 4k), which is essential for PERV replication and infection, and co-transfected the sgRNAs and CBE into PK-15 cells, with rAPOBEC1, YE1, hA3A, eA3A and CD0208 CBEs serving as controls. Quantification of editing efficiency by sequencing analysis showed that CD0208P52A CBE (58.4%) had similar efficiency to that of CD0208 CBE (54.3%) and significantly higher efficiency than rAPOBEC1 CBE (32.1%), YE1 CBE (7.1%), hA3A CBE (46.2%) and eA3A CBE (14.2%) (Fig. 4l and Supplementary Fig. 16a,b). These results also showed that CD0208P52A CBE (32.9%) could induce nonsense mutations by C-to-T conversion at significantly higher efficiency than rAPOBEC1 CBE (18.9%), YE1 CBE (2.6%) and eA3A CBE (4.4%), and at an efficiency comparable to that of hA3A CBE (33.5%) and CD0208 CBE (38.2%) (Fig. 4m and Supplementary Fig. 16c,d). In addition, both CD0208P52A CBE and CD0208 CBE exhibited high editing activity in almost all NC contexts in the editing window, while rAPOBEC1 CBE had lower editing activity in GC contexts (Supplementary Fig. 16d). These results indicated that CD0208P52A CBE could efficiently introduce nonsense mutations in multi-copy genes in mammalian cells.
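To illustrate how a C-to-T edit can create a premature stop codon, the sketch below scans an in-frame coding sequence for codons that become TAA, TAG or TGA after a single C-to-T change on the coding strand (CAA, CAG, CGA) or after a single G-to-A change, which corresponds to C-to-T editing of the opposite strand (TGG). This is a simplified illustration under stated assumptions, not the sgRNA design procedure used in the study: it ignores PAM availability, the editing window and multi-base edits, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_candidates(cds: str):
    """List codons that a single base edit could turn into a stop codon.

    Returns tuples of (codon_index, original_codon, edited_codon, strand),
    where strand is "coding" for a C-to-T change read on the coding strand
    and "template" for a G-to-A change (C-to-T on the opposite strand).
    The input is assumed to be in frame, starting at codon 0.
    """
    cds = cds.upper()
    hits = []
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable_length, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for pos, base in enumerate(codon):
            if base == "C":
                edited = codon[:pos] + "T" + codon[pos + 1:]
                strand = "coding"
            elif base == "G":
                edited = codon[:pos] + "A" + codon[pos + 1:]
                strand = "template"
            else:
                continue
            if edited in STOP_CODONS:
                hits.append((start // 3, codon, edited, strand))
    return hits


# Toy sequence (hypothetical, not from the pol gene): ATG CAG TGG CGA AAA
example_hits = stop_codon_candidates("ATGCAGTGGCGAAAA")
for hit in example_hits:
    print(hit)
```

In practice, a candidate codon is only targetable if a suitable PAM places the cytosine inside the editing window of the chosen CBE.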

CD0208P52A is compatible with multiple Cas proteins and improves product purity

Although the most widely used CBEs are fusions of nCas9 and a cytidine deaminase, the NGG PAM requirement and deleterious byproducts have limited their application. Here, to expand the targeting range of CBEs based on CD0208P52A, we constructed two CBEs that can recognize NNGRRT PAMs or NGN PAMs by linking CD0208P52A with nSaCas9 (D10A) or nSpCas9-NG, respectively (Fig. 5a). These CD0208P52A–nSaCas9 or CD0208P52A–nSpCas9-NG CBEs were individually co-transfected with multiple sgRNA expression plasmids into HEK293T cells. In addition, CBEs comprising rAPOBEC1, YE1, hA3A or CD0208 fused to nSaCas9 or nSpCas9-NG served as controls. Analysis of editing activity by high-throughput sequencing indicated that, except for YE1, CBEs using nSaCas9 exhibited high editing activity (73.3% for CD0208P52A, 78.0% for hA3A, 83.0% for CD0208 and 64.5% for rAPOBEC1) (Fig. 5b,c and Supplementary Fig. 17a). By contrast, CBEs consisting of CD0208P52A or hA3A with nSpCas9-NG showed comparable editing efficiencies (67.5% for CD0208P52A and 69.0% for hA3A), which were slightly lower than that of CD0208 (75.3%), but significantly higher than those of rAPOBEC1 (36.7%) and YE1 (25.1%) (Fig. 5d,e and Supplementary Fig. 18a). The editing windows of CD0208P52A–nSpCas9-NG and CD0208P52A–nSaCas9 were also narrower than those of CD0208- or hA3A-fused CBEs (Supplementary Figs. 17b and 18b). These results indicated that CD0208P52A was indeed compatible with various Cas proteins and retained high editing activity with a narrow editing window.
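The PAM strings mentioned above (NGG, NGN and NNGRRT) use IUPAC degenerate nucleotide codes, where N is any base and R is A or G. As a minimal sketch of how such patterns restrict targetable sites, the snippet below checks which of these PAMs match the sequence immediately 3' of a protospacer. The dictionary and function names are illustrative assumptions and only the degenerate codes needed here are included.

```python
import re

# Subset of IUPAC nucleotide codes needed for the PAMs discussed in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# PAM requirements as stated in the text (NGG for the standard nCas9 CBE).
PAMS = {"SpCas9": "NGG", "SpCas9-NG": "NGN", "SaCas9": "NNGRRT"}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Translate an IUPAC PAM string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def matches_pam(sequence: str, pam: str) -> bool:
    """Return True if the sequence matches the PAM over its full length."""
    return pam_regex(pam).fullmatch(sequence.upper()) is not None


def compatible_cas(downstream: str) -> list:
    """List Cas variants whose PAM matches the bases 3' of a protospacer."""
    return [
        name
        for name, pam in PAMS.items()
        if matches_pam(downstream[:len(pam)], pam)
    ]


print(compatible_cas("TGGAAT"))  # matches all three PAM patterns
print(compatible_cas("AGTCCC"))  # matches only the relaxed NGN pattern
```

The second call shows why an NGN-recognizing variant broadens the targeting range: a site lacking NGG or NNGRRT can still be addressed.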

Fig. 5: CD0208P52A compatibility with multiple Cas proteins and improvement of product purity with dCpf1 nuclease.

a, A schematic of pCMV-CBE-mCherry architecture and pU6-sgRNA-EGFP plasmids. CMV pro, cytomegalovirus promoter; NLS, nuclear localization signal; UGI, uracil-DNA glycosylase inhibitor; pA, poly(A); puro, puromycin; U6 pro, U6 promoter; EGFP, enhanced green fluorescent protein. b, C-to-T base editing efficiency of rAPOBEC1–, YE1–, hA3A–, CD0208– and CD0208P52A–nSaCas9 CBEs at seven target sites in HEK293T cells. c, A summary of editing efficiencies from b. d, C-to-T base editing efficiency of rAPOBEC1–, YE1–, hA3A–, CD0208– and CD0208P52A–nSpCas9-NG CBEs at eight target sites in HEK293T cells. e, A summary of editing efficiencies from d. f, C-to-T editing efficiency of the rAPOBEC1–, YE1–, hA3A–, CD0208– and CD0208P52A–dCpf1 CBEs at 13 endogenous sites in HEK293T cells compared with rAPOBEC1–, YE1–, hA3A– and CD0208P52A–nSpCas9 CBEs. g, A summary of editing efficiencies from f. h, C-to-T editing efficiency in the editing window of each CBE from g at 13 endogenous sites in HEK293T cells. i–k, Analysis of base substitution patterns (i) and indels (j and k) of the tested CBEs at 13 endogenous sites in HEK293T cells. The x axis in i–k shows the CBEs containing various Cas proteins and cytidine deaminases. The error bars in b–f, j and k indicate the mean ± s.e.m. of three independent experiments. The centre line in g indicates the median, and bottom and top lines of the box represent the first and third quartiles, respectively, of the editing efficiency obtained from three or more independent experiments. The tails extend to the minimum and maximum values. P values were calculated by a two-sided unpaired t-test.


Previous studies have shown that the rAPOBEC1–dCpf1 CBE recognizes T-rich PAM sequences and induces fewer indels and non-C-to-T conversions than other editors49. We therefore adopted the dCpf1 architecture to construct a potentially context-independent, high-efficiency and high-accuracy CD0208P52A–dCpf1 CBE (Fig. 5a). Editing efficiency and specificity were evaluated at 13 target sites of dCpf1 CBE and eight target sites of nSpCas9 CBE, where the editing windows of dCpf1 CBE (position 8–13) and nSpCas9 CBE (position 4–8) overlap. rAPOBEC1, YE1, hA3A or CD0208 fused with dCpf1 or nCas9 were used as controls. High-throughput sequencing analysis revealed that the C-to-T editing efficiencies of CD0208P52A–dCpf1 (27.3%) and CD0208–dCpf1 (27.1%) were both significantly higher than those of the well-characterized CBEs rAPOBEC1–dCpf1, YE1–dCpf1 and hA3A–dCpf1 (8.6% for rAPOBEC1–dCpf1, 4.1% for YE1–dCpf1 and 18.2% for hA3A–dCpf1) (Fig. 5f,g). Consistent with the above findings, CD0208P52A–dCpf1 (C7–C11) had a narrower editing window than CD0208–dCpf1 (C6–C12) (Fig. 5h and Supplementary Fig. 19a), but still exhibited high editing efficiency. As in previous studies with the rAPOBEC1–dCpf1 CBE42, the CD0208P52A–dCpf1 (C7–C11) editing window shifted backwards compared with the CD0208P52A–nCas9 (C3–C7) window (Fig. 5h and Supplementary Fig. 19a,b). Although the C-to-T editing efficiency of CD0208P52A–dCpf1 was reduced compared with that of nCas9 fusion CBEs (0.5-, 0.6-, 0.5- and 0.4-fold lower than rAPOBEC1–nCas9, YE1–nCas9, hA3A–nCas9 and CD0208P52A–nCas9, respectively) (Fig. 5g), undesired C-to-A/G substitutions were also considerably lower with the CD0208P52A–dCpf1 CBE (3.0%) than with the nCas9 CBEs (8.0% for rAPOBEC1–nCas9, 8.6% for YE1–nCas9, 10.8% for hA3A–nCas9 and 7.6% for CD0208P52A–nCas9) (Fig. 5i). Compared with the relatively high indel frequencies associated with nCas9 CBE activity (8.9% for CD0208P52A–nCas9, 19.2% for hA3A–nCas9, 14.4% for rAPOBEC1–nCas9 and 6.4% for YE1–nCas9), dCpf1-based CBEs had a significantly lower proportion of indels (0.1% for CD0208P52A–dCpf1, 0.2% for hA3A–dCpf1, 0.1% for rAPOBEC1–dCpf1 and 0.1% for YE1–dCpf1) (Fig. 5j,k). CD0208P52A–dCpf1, rAPOBEC1–dCpf1 and YE1–dCpf1 had indel levels comparable to those of the untreated groups (Fig. 5j,k). These cumulative results indicated that the CD0208P52A–dCpf1 CBE could mediate efficient, context-independent editing at multiple target sites, thus broadening the scope of potential CBE applications while reducing undesired byproducts.
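One common way to express product purity is the share of C-to-T outcomes among all cytosine substitutions at a target position. The sketch below assumes that definition and uses hypothetical read counts; neither the definition nor the numbers are taken from the study, which reports C-to-A/G and indel percentages without specifying the calculation here.

```python
def outcome_fractions(counts: dict) -> dict:
    """Convert read counts per outcome into fractions of all reads."""
    total = sum(counts.values())
    if total == 0:
        return {outcome: 0.0 for outcome in counts}
    return {outcome: n / total for outcome, n in counts.items()}


def product_purity(counts: dict) -> float:
    """Fraction of C-to-T among all C substitutions (assumed definition)."""
    substituted = counts["C>T"] + counts["C>A"] + counts["C>G"]
    if substituted == 0:
        return 0.0
    return counts["C>T"] / substituted


# Hypothetical read counts at a single target cytosine (illustration only).
reads = {"C": 700, "C>T": 270, "C>A": 20, "C>G": 10}

fractions = outcome_fractions(reads)
purity = product_purity(reads)
print(f"C-to-T: {fractions['C>T']:.1%}, purity: {purity:.1%}")
```

Under this definition, an editor can have a lower overall C-to-T efficiency and still a higher product purity, which mirrors the trade-off described for the dCpf1-based CBEs.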

Application of CD0208P52A-based CBEs in pathogenic gene editing

As our above results suggested that CBEs incorporating CD0208P52A showed clear potential for gene silencing therapies owing to their high precision and editing efficiency, we next assessed whether CD0208P52A–nCas9 could induce stop codons or splice mutations in several disease-linked target genes in N2A cells. As Pcsk9 is a target relevant to hypercholesterolaemia treatment and Hpd silencing can rescue the lethal phenotype of hereditary tyrosinemia type 1 in mice, we separately targeted eight sgRNA sites in Hpd and seven sgRNA sites in Pcsk9 with different deaminase CBEs. At these 15 sites, the average C-to-T editing efficiency of CD0208P52A–nCas9 reached 62.2%, which was comparable to that of hA3A (58.1%) and significantly higher than those of rAPOBEC1 (45.7%), YE1 (31.4%) and eA3A (25.9%) fused with nCas9 (Fig. 6a,b). The editing window of CD0208P52A–nCas9 at these sites was narrower than that of hA3A (Fig. 6c). The efficiency of CD0208P52A–nCas9 at generating stop codons or splice mutations was 48.2%, which was similar to that of hA3A–nCas9 (49.2%) and significantly higher than those of rAPOBEC1 (36.5%), YE1 (23.5%) and eA3A (18.7%) CBEs (Fig. 6d and Supplementary Fig. 20a–d). These results indicated that CD0208P52A–nCas9 could efficiently edit disease-related genes.

Fig. 6: CD0208P52A CBE editing of pathogenic genes.

a, The C-to-T base editing efficiency of rAPOBEC1–, YE1–, hA3A–, eA3A– and CD0208P52A–nSpCas9 CBEs at eight target sites in the Hpd gene and seven target sites in the Pcsk9 gene in mouse N2A cells. b, A summary of data from a. c, The average cytosine substitution efficiency of target sites at every position within the editing windows of each CBE in N2A cells. d, The efficiency of nonsense mutation introduction by the CBEs at eight target sites in the Hpd gene and seven target sites in the Pcsk9 gene in mouse N2A cells. e, C-to-T conversion efficiency of the CD0208P52A CBE at C3 versus all C bases at the target site in the PCSK9 gene in HepG2 cells. f, Representative images of flow cytometry analysis of DiI-LDL uptake assays in HepG2 cells. g, Statistical analysis of relative DiI-LDL uptake. The error bars in a, e and g indicate the mean ± s.e.m. of three or more independent experiments. The centre line in b and d indicates the median, and bottom and top lines of the box represent the first and third quartiles, respectively, of the editing efficiency obtained from three or more independent experiments. P values were calculated by a two-sided unpaired t-test.


Next, we evaluated whether CD0208P52A–nCas9 silencing of PCSK9 indeed improved low-density lipoprotein (LDL) uptake in the HepG2 human hepatic cell line. We designed hPCSK9-sgRNA, an sgRNA targeting exon 2 of PCSK9, which introduced a C3 conversion that generated a TAG stop codon to prematurely terminate PCSK9 protein translation. We then co-transfected hPCSK9-sgRNA with CD0208P52A CBE into HepG2 cells and determined C-to-T editing efficiency by high-throughput sequencing. In addition, cellular uptake of a DiI-labelled LDL (DiI-LDL) fluorescent probe was evaluated by flow cytometry. The results showed that the C-to-T editing efficiency of CD0208P52A CBE with hPCSK9-sgRNA was 76.7%, and this CBE system could introduce a stop codon at up to 47.9% efficiency (Fig. 6e). DiI-LDL uptake levels of cells expressing hPCSK9-sgRNA were 1.2 times higher than those of the nontarget (NT)-sgRNA control group (Fig. 6f,g). These results suggested that CD0208P52A CBE could be used to efficiently correct hypercholesterolaemia-related mutations in PCSK9 in human hepatocytes, resulting in significantly improved LDL uptake.