Engineering TadA ortholog-derived cytosine base editor without motif preference and adenosine activity limitation – Nature Communications

Study approval

Exclusively male mice were used for all experiments, including grip strength tests, creatine kinase (CK) analysis, and AAV injections. All animal experiments were performed with the approval of the Institutional Animal Care and Use Committee (IACUC) of HuidaGene Therapeutics Inc., Shanghai, China and Lingang Laboratory, Shanghai, China.

Computational analysis of TadA orthologs

First, we downloaded 15,167 TadA protein sequences from the NCBI database. We then used BLASTP (v2.2.21) to remove redundant proteins sharing more than 90% identity40. Next, we performed multiple sequence alignment using MAFFT (v7.429)41. MEGA11 was used to construct the phylogenetic tree42. The “Calculate_AHC.pl” script was used to identify highly conserved residues and AHC residues. MCL (v14-137)43 was used to cluster proteins sharing more than 70% identity, and nine TadA orthologs were randomly selected from different clusters for experimental screening.
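The final selection step can be illustrated with a minimal sketch. It assumes the clusters are available as lines of tab-separated sequence labels, one cluster per line; the function name, the seed and the placeholder IDs are hypothetical, and the text does not specify how the random draw was actually carried out.

```python
import random

def pick_representatives(cluster_lines, n_clusters=9, seed=0):
    """Pick one random member from each of n_clusters randomly chosen clusters."""
    # Each non-empty line is one cluster; members are tab-separated labels.
    clusters = [line.strip().split("\t") for line in cluster_lines if line.strip()]
    if len(clusters) < n_clusters:
        raise ValueError("fewer clusters than requested representatives")
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the draw reproducible
    chosen = rng.sample(clusters, n_clusters)  # distinct clusters, no repeats
    return [rng.choice(members) for members in chosen]

# Hypothetical input: 12 clusters of 3 placeholder IDs each.
example = ["\t".join(f"tadA_{c}_{i}" for i in range(3)) for c in range(12)]
picks = pick_representatives(example)
```

Sampling clusters first and members second guarantees that no two selected orthologs come from the same cluster.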

Plasmid constructions

Human codon-optimized orthologous TadAs were synthesized commercially (GenScript Co., Ltd) and cloned using NEBuilder (New England Biolabs) to generate the pT7_NLS-TadA-Cas9-NLS_pA_pCBH_mCherry_pA plasmid. All sequences are listed in Supplementary Data 1. The dual-AAV delivery system was designed to express two separate fragments of a base editor, which are subsequently spliced into the full-length protein via the Rhodothermus marinus (Rma) intein, as described in previous studies44,45. The N- and C-terminal intein segments were synthesized by Genewiz (Suzhou, China). These segments were then inserted between amino acid residues 573 and 574 of the aTdCBE backbones using Gibson cloning of PCR-amplified inserts.

Mammalian cell culture, transfection and flow cytometry analysis

HEK293T cells (ATCC, CRL-3216) were cultured in Dulbecco’s Modified Eagle’s Medium (Gibco, 11965-092) supplemented with 10% fetal bovine serum (Gibco, 10099-141C) and 1% Pen-Strep-Glutamine (100×) (Gibco, 10378-016) at 37 °C with 5% CO2 in a cell incubator. For TadA variant screening, HEK293T cells cultured in 24-well plates were co-transfected with 1.0 μg of tagBFP-*EGFP reporter plasmid and TadA-mCherry plasmid at a molar ratio of 1:1 using polyethylenimine (PEI). After 48 h, mCherry, BFP and EGFP fluorescence was analyzed on a Beckman CytoFlex flow cytometer. To evaluate genome editing at endogenous sites, cells were harvested 48 h after transfection and sorted on a BD FACS Aria III flow cytometer. FACS data were analyzed with FlowJo X (v10.0.7).
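A 1:1 molar ratio means the mass of each plasmid scales with its length. The sketch below assumes the 1.0 μg refers to the reporter plasmid alone and uses hypothetical plasmid sizes (6,000 and 9,000 bp); neither is stated in the text.

```python
def partner_mass_ug(mass_a_ug, len_a_bp, len_b_bp, molar_ratio_b_to_a=1.0):
    """Mass of plasmid B needed for a given B:A molar ratio.

    For double-stranded DNA, moles are proportional to mass / length,
    so equal moles means mass scales with plasmid length.
    """
    return mass_a_ug * (len_b_bp / len_a_bp) * molar_ratio_b_to_a

# Hypothetical sizes: 6,000 bp reporter, 9,000 bp TadA-mCherry plasmid.
editor_ug = partner_mass_ug(1.0, len_a_bp=6000, len_b_bp=9000)
```

With these placeholder sizes, 1.0 μg of reporter pairs with 1.5 μg of the larger plasmid.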

Detection of gene editing frequency

Approximately ten thousand sorted cells were lysed in 20 μL of lysis buffer with proteinase K (Vazyme Biotech) according to the manufacturer’s instructions. Target regions were amplified with Phanta Max Super-Fidelity DNA Polymerase (Vazyme Biotech). For targeted amplicon sequencing, PCR reactions were performed using primers with different barcodes (Supplementary Data 2). The DNA products were purified with a gel extraction kit (Omega) and sequenced with 150 bp paired-end reads on the Illumina NovaSeq 6000 platform (Genewiz Co. Ltd.). The deep sequencing data were first de-multiplexed by Cutadapt (v.2.8) based on sample barcodes. The de-multiplexed reads were then processed by CRISPResso2 to quantify editing efficiency, including indels and A-to-G or C-to-T conversions at each target site46.
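The per-position conversion frequency reported by such an analysis can be illustrated with a simplified sketch. This is not CRISPResso2: it assumes the reads are already aligned to the reference amplicon without indels, and the function name and toy sequences are hypothetical.

```python
def c_to_t_frequencies(reference, reads):
    """Return {1-based position: fraction of reads carrying T} for each reference C."""
    freqs = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        # Use only reads that span the full reference and have a called base here.
        covered = [r[i] for r in reads if len(r) == len(reference) and r[i] in "ACGT"]
        if covered:
            freqs[i + 1] = covered.count("T") / len(covered)
    return freqs

reference = "GACCTA"                              # toy amplicon
reads = ["GACCTA", "GATCTA", "GATTTA", "GACCTA"]  # toy gapless reads
freqs = c_to_t_frequencies(reference, reads)
```

The same counting applies to A-to-G conversions by swapping the reference and product bases.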

High-throughput library experiments

HEK293T cells previously constructed with a lentiviral plasmid library of 11,868 sgRNA pairs were used to detect the motif preference of the base editors34. For each 10 cm dish, 35 μg of plasmids encoding CBEs and mCherry was transfected using PEI. After 48 h, transfected cells were harvested by FACS, followed by genomic DNA extraction. The PCR products were sequenced on a 150 bp paired-end Illumina NovaSeq 6000 platform (Genewiz Co. Ltd.). High-throughput sequencing datasets were processed using CRISPResso2 to calculate the editing efficiency of each target. In each sample, target sites with a coverage depth of less than 100 were excluded. Cytosines at positions 4–7 of the target sequences were used to statistically analyze motif preferences.
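The filtering and grouping described above can be sketched as follows. The sketch assumes that motif preference is summarized by the base immediately 5' of each cytosine (an NC dinucleotide), which the text does not specify; the record layout, function name and toy values are hypothetical.

```python
from collections import defaultdict

def motif_efficiencies(targets, min_depth=100, window=(4, 7)):
    """Mean C-to-T efficiency per NC context for cytosines inside the window."""
    by_context = defaultdict(list)
    for t in targets:
        if t["depth"] < min_depth:  # drop targets with coverage below 100
            continue
        seq = t["protospacer"]
        for pos in range(window[0], window[1] + 1):  # 1-based positions 4..7
            if seq[pos - 1] == "C" and pos in t["c_to_t"]:
                context = seq[pos - 2] + "C"  # preceding base plus the cytosine
                by_context[context].append(t["c_to_t"][pos])
    return {k: sum(v) / len(v) for k, v in by_context.items()}

# Toy records: protospacer, read depth, and C-to-T fraction per position.
targets = [
    {"protospacer": "GATCACGTACGTACGTACGT", "depth": 500, "c_to_t": {4: 0.6, 6: 0.2}},
    {"protospacer": "GGACTCATACGTACGTACGT", "depth": 50, "c_to_t": {4: 0.4, 6: 0.2}},
    {"protospacer": "TTTCCGGAACGTACGTACGT", "depth": 100, "c_to_t": {4: 0.4, 5: 0.1}},
]
result = motif_efficiencies(targets)
```

The second toy target is dropped by the depth filter, so only the first and third contribute to the per-context means.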

Off-target analysis with in-silico prediction

To evaluate the specificity of TadA base editors, Cas-OFFinder was used to predict potential off-target sites as described previously47. Search queries covered both the Cas9 spacer sequence and the PAM of the on-target site. The search PAM was set to “NGG” and the number of mismatches was set to fewer than 5. All other parameters were left at their defaults. The potential off-target sites were amplified and deep sequenced for analysis (Supplementary Data 3).
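The search criterion (an NGG PAM and fewer than five mismatches to the spacer) can be expressed as a small check. This sketch only illustrates the criterion and is not Cas-OFFinder itself; the function name and the example spacer are hypothetical.

```python
def is_candidate_off_target(spacer, site, max_mismatches=4):
    """True if site (protospacer + 3-nt PAM) has an NGG PAM and < 5 spacer mismatches."""
    spacer, site = spacer.upper(), site.upper()
    if len(site) != len(spacer) + 3:
        return False
    protospacer, pam = site[:len(spacer)], site[len(spacer):]
    if pam[1:] != "GG":  # "N" at the first PAM position matches any base
        return False
    mismatches = sum(a != b for a, b in zip(spacer, protospacer))
    return mismatches <= max_mismatches

spacer = "GACGTACGTTAGCCATGCAA"                 # hypothetical 20-nt spacer
two_mismatches = "GACGTACGTTAGCCATGCTT" + "AGG"  # differs at the last two bases
wrong_pam = spacer + "AGA"
five_mismatches = "CTGCAACGTTAGCCATGCAA" + "TGG"
```

Here `two_mismatches` passes, while `wrong_pam` and `five_mismatches` are rejected.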

Orthogonal R-loop assay

The orthogonal R-loop assay was performed to detect nuclease-independent off-target editing as described previously33. 1.5 μg of plasmids encoding aTdCBE/TadCBEd and an on-target sgRNA for aTdCBE/TadCBEd, along with plasmids expressing dSaCas9 and a SaCas9 sgRNA targeting a previously reported genomic locus, were co-transfected using PEI. After 48 h, transfected cells were harvested by FACS, followed by genomic DNA extraction with 20 μL of freshly prepared lysis buffer (Vazyme) containing proteinase K. The loci targeted by dSaCas9 were amplified and deep sequenced.

Generation of humanized DMD∆E54 mdx mice

Mice were housed in a barrier facility with a 12 h light/dark cycle and maintained in compliance with the guidelines outlined in the Instructive Notions with Respect to Caring for Laboratory Animals issued by the Ministry of Science and Technology of China. To generate the humanized DMD∆E54 mice, we applied the CRISPR/Cas9 system to embryos obtained from mating STOCK Tg(DMD)72Thoen/J male and female mice (#018900). Specifically, we designed two sgRNAs targeting the introns flanking human DMD exon 54 on Chr.5. The sequences of these sgRNAs are gRNA1: gTTTCTGCAAGTGCAGAGAGG and gRNA2: GGTGTGTGGAGTGAGATACT. The T7 promoter sequence (TAATACGACTCACTATAg) was added to each sgRNA template for efficient transcription. The PCR product was then purified directly using the Omega gel extraction kit (Omega, D2500-02), and the templates were used for in vitro transcription with the MEGAshortscript T7 Kit (Invitrogen, AM1354). The sgRNAs were purified using a MEGAclear Kit (Invitrogen, AM1908) and eluted with nuclease-free water. The concentration of each sgRNA was measured using a NanoDrop instrument. For cytoplasmic injection, spCas9 mRNA (100 ng/μL), sgRNA-L (50 ng/μL) and sgRNA-R (50 ng/μL) were mixed and then injected into fertilized eggs using a FemtoJet microinjector (Eppendorf) with constant flow settings. The injected zygotes were cultured in KSOM medium for 12 h and surgically transferred to the oviduct of recipient mice 24 h after estrus was observed. Genomic DNA was isolated from the tail tissue of founder (F0) mice using the Omega kit (Omega, D3396-02) according to the manufacturer’s instructions and analyzed by PCR followed by gel electrophoresis. All sequences are listed in Supplementary Data 4.

AAV9 production and delivery to DMDΔE54 mdx mice

AAVs used in this study were produced by HuidaGene Therapeutics Co., Ltd. Cells were grown to 70–90% confluency, and the media was replaced with fresh pre-warmed growth media prior to transfection. For each 15 cm dish, a mixture of 20 μg of pHelper, 10 μg of pRepCap and 10 μg of GOI plasmid was added dropwise to the cell media. Following a three-day incubation, the AAVs were purified using iodixanol density gradient centrifugation. The DMDΔE54 mdx mice were derived by mating the humanized DMD∆E54 mice with mdx mice carrying a stop mutation in mouse exon 23 on Chr.X. For intramuscular injection, 3-week-old DMD∆E54 mdx mice were anesthetized, and their tibialis anterior (TA) muscle was injected with 50 μL of AAV9 preparation (2.5 × 10^11 vg per virus) or with an equivalent volume of saline solution. Tissues were divided into distinct segments for targeted assessment. Specifically, the distal region was allocated for evaluating DNA editing and exon skipping efficiency, the middle portion was dedicated to Western blot analysis of dystrophin expression, and the proximal segment was reserved for immunofluorescent analysis of dystrophin levels at six weeks after treatment.
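The injected dose implies a minimum vector concentration in the preparation, which the following sketch works out. It assumes that both vectors of the dual-AAV system are delivered together in the same 50 μL, each at 2.5 × 10^11 vg; the text does not state this explicitly, and the function name is hypothetical.

```python
def injection_summary(vg_per_virus, n_viruses, volume_ul):
    """Total vector genomes injected and the per-virus titer needed in the mix (vg/mL)."""
    total_vg = vg_per_virus * n_viruses
    per_virus_vg_per_ml = vg_per_virus / volume_ul * 1000  # convert per-uL to per-mL
    return total_vg, per_virus_vg_per_ml

# Assumption: two vectors (dual-AAV) co-injected in one 50 uL volume.
total_vg, titer_vg_per_ml = injection_summary(2.5e11, n_viruses=2, volume_ul=50)
```

Under these assumptions the muscle receives 5 × 10^11 vg in total, and each vector must be present at 5 × 10^12 vg/mL in the injected mix.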

Western blot analysis

The samples were homogenized in RIPA buffer supplemented with protease inhibitor cocktail. The lysate supernatants were quantified using a Pierce BCA protein assay kit (Thermo Fisher Scientific, 23225) and adjusted to an identical concentration using H2O. Equal amounts of each sample were mixed with NuPAGE LDS sample buffer (Invitrogen, NP0007) and 10% β-mercaptoethanol, then heated at 70 °C for 10 min. Ten µg of total protein per lane was loaded onto 3% to 8% tris-acetate gels (Invitrogen, EA03752BOX) and electrophoresed for 1 h at 200 V. Protein was transferred onto a PVDF membrane under wet conditions at 350 mA for 3.5 h. Subsequently, the membrane was blocked in 5% non-fat milk in TBST buffer and then incubated with a primary antibody against dystrophin (Sigma, D8168) or vinculin (CST, 13901S). After washing three times with TBST, the membrane was incubated with an HRP-conjugated secondary antibody specific to the IgG of the primary antibody’s host species. Finally, the target proteins were visualized using chemiluminescent substrates (Invitrogen, WP20005).

Histology and Immunofluorescence

Tissue samples were collected and immersed in preconditioned 4% paraformaldehyde. The fixed tissues were dehydrated through a series of alcohol concentrations, treated with xylene and embedded in melted paraffin wax. Subsequently, the paraffin-embedded tissues were deparaffinized using xylene, followed by a series of alcohol washes ranging from high to low concentrations, and finally placed in distilled water. For hematoxylin and eosin (H&E) staining, the slides were stained with hematoxylin for 3–8 min, followed by color separation using acid water and ammonia water. After dehydration using 70% and 90% alcohol for 10 min each, the tissues were stained in eosin staining solution for 1–3 min and dehydrated in ascending alcohol solutions (50%, 70%, 80%, 95%, 100%). Coverslips were then mounted onto the glass slides with neutral resin.

For Sirius red staining, the slides were stained with picrosirius red for one hour and washed in two changes of acidified water. Most of the water was then physically removed from the slides by vigorous shaking. Finally, the slides were dehydrated in three changes of 100% ethanol, cleared in xylene, and mounted in neutral resin.

For immunofluorescence, the tissues were embedded in optimal cutting temperature (OCT) compound and snap-frozen in liquid nitrogen. Serial cryosections (10 µm) were fixed for two hours at 37 °C, followed by permeabilization with PBS + 0.4% Triton-X for 30 min. After washing with PBS, the samples were blocked with 10% goat serum for 1 h at room temperature. Next, the slides were incubated overnight at 4 °C with primary antibodies against dystrophin (Abcam, ab15277) and spectrin (Millipore, MAB1622). The next day, samples were extensively washed with PBS and incubated with compatible secondary antibodies (Alexa Fluor 488 AffiniPure donkey anti-rabbit IgG (Jackson ImmunoResearch labs, 711-545-152) or Alexa Fluor 647 AffiniPure donkey anti-mouse IgG (Jackson ImmunoResearch labs, 715-605-151)) and DAPI for 3 h at room temperature. Following a 15-minute wash with PBS, the slides were sealed with Fluoromount-G mounting medium. All images were acquired using a Nikon C2. The number of Dys+ muscle fibers is expressed as a percentage of total spectrin-positive muscle fibers.

RNA-seq for off-target analysis

To quantify transcriptome-wide off-target edits by the deaminases, HEK293T cells were cultured in 10 cm dishes to 80% confluence and transfected with 35 μg of plasmids encoding base editors and gRNA. After 48 h, about 600,000 transfected cells were sorted by FACS, and RNA was extracted using Trizol (Ambion) for RNA-seq library preparation. An RNA-seq library was generated with a TruSeq Stranded Total RNA library preparation kit according to the standard protocol. The transcriptome libraries were sequenced on a 150 bp paired-end Illumina NovaSeq 6000 platform (Genewiz Co. Ltd.).

The computational analysis followed previously published methods36. Trimmomatic (v.0.39-2) was used to filter the raw RNA-seq data48. The clean reads were aligned to the hg38 reference genome with Hisat2 (v.2.2.1)49. RNA editing sites were called using REDItools (v1.2.1) with the parameters “-e -d -p -u -m 60 -T 5-5 -W -n 0.0”50. The fraction of edited adenosines among total adenosines and the fraction of edited cytosines among total cytosines were calculated separately.
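The two ratios can be sketched as below. The sketch assumes the fractions are computed at the read level, that is, reads supporting the edited base divided by all reads covering reference adenosines (or cytosines). The text does not say whether reads or sites are counted, and the input layout shown is hypothetical rather than the REDItools output format.

```python
def editing_fractions(sites):
    """sites: iterable of (reference_base, {'A': n, 'C': n, 'G': n, 'T': n}) read counts."""
    product = {"A": "G", "C": "T"}  # A-to-I is read as G; C-to-U is read as T
    edited = {"A": 0, "C": 0}
    total = {"A": 0, "C": 0}
    for ref, counts in sites:
        if ref not in product:
            continue  # only adenosine and cytosine sites are of interest
        total[ref] += sum(counts.values())
        edited[ref] += counts[product[ref]]
    return {ref: (edited[ref] / total[ref] if total[ref] else 0.0) for ref in product}

# Toy per-site base counts.
sites = [
    ("A", {"A": 90, "C": 0, "G": 10, "T": 0}),
    ("A", {"A": 100, "C": 0, "G": 0, "T": 0}),
    ("C", {"A": 0, "C": 45, "G": 0, "T": 5}),
    ("G", {"A": 0, "C": 0, "G": 30, "T": 0}),
]
fractions = editing_fractions(sites)
```

Keeping the adenosine and cytosine tallies separate mirrors the separate reporting of A-to-I and C-to-U editing.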

Statistics & reproducibility

All cell experimental results are presented as mean ± s.d., while all animal experimental results are presented as mean ± s.e.m. The one-sided Mann-Whitney U test or the unpaired two-tailed Student’s t-test was used for comparisons. The number of independent biological replicates is shown in the figure legends. No data were excluded from the analyses. Cells were randomly assigned to the test and control groups. DMD mice used for gene editing therapy were randomly allocated to the control or AAV9-treated group.
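The summary statistics named here can be sketched with the Python standard library. The sample values are hypothetical, and only the Mann-Whitney U statistic is computed; obtaining a P value would require a statistics package, and the text does not state which software was used.

```python
import math
import statistics

def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)

def mean_sem(values):
    """Mean and standard error of the mean (s.d. divided by sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def mann_whitney_u(x, y):
    """U statistic for x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

treated = [2.0, 4.0, 6.0]  # hypothetical replicate values
control = [1.0, 2.0, 3.0]  # hypothetical control values
treated_mean, treated_sd = mean_sd(treated)
_, treated_sem = mean_sem(treated)
u_stat = mann_whitney_u(treated, control)
```

The s.e.m. is always smaller than the s.d. for more than one replicate, which is worth keeping in mind when comparing error bars between the cell and animal experiments.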

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.