Functional screening in human HSPCs identifies optimized protein-based enhancers of Homology Directed Repair – Nature Communications

Molecular cloning (i53 lentiviral vectors)

The construct for the lentiviral-based expression and screening of i53 variants was cloned from a third-generation lentiviral plasmid (Lenti SFFV) purchased from Twist Biosciences. An empty vector was constructed to include BamHI and NsiI restriction enzyme cut sites upstream of a T2A-mCherry-WPRE cassette (to enable fluorescence-based monitoring of cells expressing the i53 variants). The sequences for i53 variants were either ordered as gene fragments from Twist Biosciences or IDT or amplified from previously constructed plasmids using primers designed to introduce the desired amino acid variation(s). Pooled NNK and combinatorial libraries were constructed using NNK primers (IDT) or oligo pools (Twist Biosciences), respectively. Combinatorial libraries were designed using one codon per amino acid (A = GCA, C = TGC, D = GAC, E = GAG, F = TTC, G = GGT, H = CAC, I = ATA, K = AAA, L = CTC, M = ATG, N = AAT, P = CCT, Q = CAG, R = CGG, S = AGC, T = ACC, V = GTC, W = TGG, Y = TAT). In certain library compositions, cysteine (C) and methionine (M) were excluded due to their inherent reactivity and susceptibility to oxidation. For any given two-position combinatorial library, the number of members was therefore N = 324–400 (18 × 18 with C and M excluded, 20 × 20 with all amino acids). Variants and libraries were cloned into the digested empty vector at the BamHI/NsiI cut sites using standard Gibson assembly protocols. Sequences of an empty (MT) and a representative assembled (i53v_L67R) i53 lentiviral vector are provided in Supplementary Data 5.
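
The library sizes quoted above follow directly from the codon table. The sketch below enumerates a two-position library from the listed codons; the function name and the `exclude` argument are illustrative and are not taken from the authors' design scripts.

```python
from itertools import product

# One codon per amino acid, copied from the list in the text above.
CODONS = {
    "A": "GCA", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGT", "H": "CAC", "I": "ATA", "K": "AAA", "L": "CTC",
    "M": "ATG", "N": "AAT", "P": "CCT", "Q": "CAG", "R": "CGG",
    "S": "AGC", "T": "ACC", "V": "GTC", "W": "TGG", "Y": "TAT",
}


def two_position_library(exclude=""):
    """Map each amino acid pair to its codon pair for a two-position library.

    `exclude` is a string of one-letter codes to leave out (hypothetical helper).
    """
    allowed = [aa for aa in CODONS if aa not in exclude]
    return {a + b: (CODONS[a], CODONS[b]) for a, b in product(allowed, repeat=2)}


full_library = two_position_library()
no_cys_met = two_position_library(exclude="CM")
print(len(full_library), len(no_cys_met))
```

With all 20 amino acids the library has 400 members; excluding C and M leaves 324.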

Cell culture

Lenti-X HEK293T cells (Takara Bio) were cultured in DMEM (1X) + GlutaMAX-I (Gibco) supplemented with 10% FBS (Sigma). K562 cells (ATCC) were cultured in RPMI (Gibco) supplemented with 10% FBS and 1x penicillin–streptomycin (Gibco). All cells were grown in a humidified 37 °C incubator with 5% CO2 and were passaged every 3–5 d.

Lentiviral production

Lenti-X HEK293T cells (Takara Bio) were seeded at a density of 4.5 × 10⁶ cells per 10 cm dish 18–24 h prior to transfection. The prepared cells were co-transfected using the TransIT®-Lenti transfection reagent (Mirus Bio) with MISSION® Genomics Lentivirus Packaging Mix (Mirus Bio) and lentiviral plasmids containing i53 variants/libraries of interest. The viral supernatant was collected 48 h after transfection, passed through a 0.45 μm filter (Cytiva), flash frozen, and stored at -80 °C until use. Viral titers were measured by FACS in K562 cells and were typically ~0.5–1.5 × 10⁷ TU/mL.

AAV production

The HBB-targeting AAV6 vectors HBB-SNP and HBB-UbC-GFP have been previously described20,21,26,27. All other AAV6 vectors were cloned into the pAAV-MCS plasmid (Agilent Technologies), which contains inverted terminal repeats (ITRs) derived from AAV2. Left and right homology arms (LHAs/RHAs) were derived from human genomic DNA to match the indicated length at the respective knock-in sites. The left and right homology arm lengths for the HBB, HBA, CCR5, and IL2RG donors were as follows: HBB LHA: 556 bp, HBB RHA: 449 bp, HBA LHA: 976 bp, HBA RHA: 879 bp, CCR5 LHA: 502 bp, CCR5 RHA: 500 bp, IL2RG LHA: 400 bp, IL2RG RHA: 414 bp. Each vector contained a UbC promoter, CopGFP (or mCherry for HBB-UbC-mCherry), and a BGH polyA. UbC-GFP-BGH and UbC-mCherry-BGH were synthesized as gene fragments (Twist Bioscience) and cloned into pAAV with the corresponding LHA and RHA using standard Gibson Assembly protocols. The assembled LHA-UbC-GFP(/mCherry)-BGH-RHA sequences for the HBB-UbC-mCherry, HBA-UbC-GFP, CCR5-UbC-GFP, and IL2RG-UbC-GFP AAV donors are provided in Supplementary Data 5. The NPM1-GFP AAV6 vector was designed using the sequence of a donor plasmid described by the Allen Institute for Cell Science45, which attaches an mEGFP tag to the C-terminus of NPM. LHA-linker-mEGFP-BGH-RHA was synthesized as a gene fragment (Azenta/Genewiz) and cloned into pAAV using standard Gibson Assembly protocols. The HBB-SNP AAV6 was produced by Viralgen. The HBA-UbC-GFP AAV6 was produced by Packgene. HBB-UbC-GFP AAV6, CCR5-UbC-GFP AAV6, IL2RG-UbC-GFP AAV6 and NPM1-GFP AAV6 were produced by Vigene. Titers used for CD34+ HSPC editing experiments were determined using droplet digital PCR (ddPCR).

CD34+ HSPCs culture

Human CD34+ HSPCs were cultured as previously described20,21. CD34+ HSPCs were purchased from AllCells and had been isolated from G-CSF-mobilized peripheral blood from healthy donors. CD34+ HSPCs were cultured at 2.5 × 10⁵–5 × 10⁵ cells/mL in StemSpan™-AOF (Stemcell) supplemented with stem cell factor (SCF) (100 ng/mL), thrombopoietin (TPO) (100 ng/mL), FLT3-ligand (100 ng/mL), IL-6 (100 ng/mL) (all Peprotech) and UM171 (35 nM) (Selleckchem). Cells were cultured at 37 °C, 5% CO2, and 5% O2.

Lentiviral transduction of CD34+ HSPCs

CD34+ HSPCs were transduced using lentivirus at MOIs of 0.25–1 at day 1 post thaw. Cells were concentrated by centrifugation (180 x g, 7 min), counted, and added at a concentration of 4 × 10⁶ cells/mL to media containing lentivirus, cyclosporin A (5 µM, Sigma Aldrich), and Synperonic F108 (0.5 mg/mL, Sigma Aldrich). After 4 h of incubation, cells were spun down, washed once with media, and seeded into lentivirus-free media at a density of 3.5 × 10⁵ cells/mL.
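
For orientation, the relationship between MOI, cell number, and viral titer can be written as a short calculation. This is a minimal sketch rather than the authors' worksheet: the function names are hypothetical, and the example titer of 1 × 10⁷ TU/mL is simply a value within the typical range reported under Lentiviral production.

```python
def virus_volume_ml(n_cells, moi, titer_tu_per_ml):
    """Volume of viral supernatant (mL) that delivers the requested MOI."""
    if titer_tu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return n_cells * moi / titer_tu_per_ml


def transduction_volume_ml(n_cells, cells_per_ml=4e6):
    """Total transduction volume (mL) at the stated 4e6 cells/mL."""
    return n_cells / cells_per_ml


# Example inputs are illustrative: 2e6 cells, MOI 0.5, titer 1e7 TU/mL.
virus_ml = virus_volume_ml(n_cells=2e6, moi=0.5, titer_tu_per_ml=1e7)
total_ml = transduction_volume_ml(n_cells=2e6)
print(f"{virus_ml * 1000:.0f} µL virus in {total_ml:.2f} mL total")
```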

Genome editing of CD34+ HSPCs (AAV6 donor)

Chemically modified single guide RNAs (sgRNAs) used to edit CD34+ HSPCs were purchased from Synthego. The sgRNA sequences were modified by adding 2ʹ-O-methyl-3ʹ-phosphorothioate at the three terminal nucleotides of the 5ʹ and 3ʹ ends. The target sequences of the sgRNAs used were as follows: HBB: 5ʹ-CTTGCCCCACAGGGCAGTAA-3ʹ, HBA: 5ʹ-GGCAAGAAGCATGGCCACCG-3ʹ, CCR5: 5ʹ-GCAGCATAGTGAGCCCAGAA-3ʹ, IL2RG: 5ʹ-TGGTAATGATGGCTTCAACA-3ʹ, and NPM1: 5ʹ-TCCAGGCTATTCAAGATCTC-3ʹ. Cas9 protein (SpyFi Cas9) was purchased from Aldevron. The RNPs were complexed at a Cas9:sgRNA molar ratio of 1:2.5 at 25 °C for 10–15 min prior to electroporation. At 48–72 h post thaw, CD34+ HSPCs were collected, counted, and pelleted at 180 x g for 7 min. The cell pellets were resuspended in MaxCyte buffer (standard cell concentrations per cuvette, as recommended by the vendor, are shown in Table M1 in Supplementary Data 6) with complexed RNPs (final concentrations in the electroporation cuvette: 0.45 mg/mL Cas9, 0.24 µg/µL sgRNA) and electroporated using a MaxCyte ExPERT ATx Nucleofector. After electroporation, cells were plated at 3.5–5.0 × 10⁵ cells/mL in media supplemented with cytokines, and the desired AAV6 donor was added at 5.0 × 10²–2.5 × 10⁴ vector genomes/cell. At 24 h after nucleofection, cells were spun down, washed once with media, and seeded into AAV-free media at a density of 3.5 × 10⁵ cells/mL. Cells were harvested 1–2 days post nucleofection for NGS analysis (see figure captions for specific times) or 3–5 days post nucleofection for GFP expression analysis.
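
As a consistency check, the mass concentrations in the cuvette can be converted to molar concentrations and compared with the stated 1:2.5 Cas9:sgRNA ratio. The molecular masses below are rough assumptions introduced for illustration (about 160 kDa for Cas9 protein and about 34 kDa for a roughly 100-nt modified sgRNA); they are not given in the text.

```python
# Assumed approximate molecular masses (g/mol); not stated in the Methods.
CAS9_G_PER_MOL = 160_000
SGRNA_G_PER_MOL = 34_000


def micromolar(mg_per_ml, g_per_mol):
    """Convert a mass concentration (mg/mL, equal to g/L) to µM."""
    return mg_per_ml / g_per_mol * 1e6


cas9_uM = micromolar(0.45, CAS9_G_PER_MOL)    # 0.45 mg/mL Cas9
sgrna_uM = micromolar(0.24, SGRNA_G_PER_MOL)  # 0.24 µg/µL is 0.24 mg/mL
ratio = sgrna_uM / cas9_uM
print(f"Cas9 {cas9_uM:.2f} µM, sgRNA {sgrna_uM:.2f} µM, ratio 1:{ratio:.2f}")
```

Under these assumed masses the two stated concentrations work out to a molar ratio close to 1:2.5.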

When editing using purified i53 variant proteins, the proteins were added to CD34+ HSPCs as part of the nucleofection mix at concentrations of 0.0125–1.6 mg/mL (volume of added protein ≤ 1/10 of MaxCyte cuvette volume) prior to nucleofection. For editing with a DNAPK small molecule inhibitor (AZD7648, CC-115, and M3814/nedisertib from Selleck Chemicals, or BAY8400 from MedChem Express), nucleofected cells were added to media containing both AAV6 and the DNAPKi at various concentrations. Twenty-four hours after nucleofection, cells were spun down, washed with media, and seeded into AAV6- and DNAPKi-free media at a density of 3.5–5.0 × 10⁵ cells/mL.

Screening and sorting of pooled libraries

Lentiviral-based i53 variant libraries were transduced at an MOI of ~0.2–0.5 (aiming for ~30% transduction and a coverage of >500 cells per library member in the mCherry+/GFP+ cell population for each replicate tested). Three days after transduction, cells were edited in triplicate or quadruplicate at HBB (or NPM1) as described above, using HBB-UbC-GFP donor AAV6 (or NPM1-GFP donor AAV6) at an MOI of 2.5 × 10⁴ vector genomes/cell. Three days post editing, cells were pelleted and resuspended in media with DAPI (Miltenyi Biotec). Single, live, mCherry+/GFP+ and mCherry+/GFP- cells were collected using a FACSAria cell sorter (Becton Dickinson); purity of populations was confirmed by post-sort purity checks. Post sort, genomic DNA was harvested from each sorted cell population using a Quick-DNA 96 Plus Kit (Zymo Research). The DNA concentration of each sample was measured using a Qubit 1X dsDNA BR assay kit (ThermoFisher).
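
The coverage target above implies a minimum number of input cells per replicate. The sketch below back-calculates that number; the GFP+ fraction (25%) is a placeholder assumption, since the fraction of edited cells depends on the experiment, and the function name is hypothetical.

```python
import math


def cells_required(library_size, coverage=500,
                   transduced_fraction=0.30, gfp_positive_fraction=0.25):
    """Input cells needed so the mCherry+/GFP+ gate holds `coverage` cells
    per library member. `gfp_positive_fraction` is an assumed editing rate."""
    sorted_cells_needed = library_size * coverage
    return math.ceil(
        sorted_cells_needed / (transduced_fraction * gfp_positive_fraction)
    )


# A 400-member two-position library at >500x coverage.
n_input = cells_required(library_size=400)
print(f"{n_input:,} cells per replicate (before sort losses)")
```

Sort recovery and viability losses are not modeled here and would raise the requirement further.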

Next Generation Sequencing (NGS) of pooled libraries

An amplicon sequencing workflow was designed to sequence and quantify i53 variants within starting and post-selection pools. Primers and PCR conditions were optimized to specifically amplify the entire variant coding sequence from plasmids, lentiviral libraries, and genomic DNA carrying lentiviral vector insertions. After the initial amplification, the i53 amplicons underwent an additional PCR amplification to add sequencing adapters and sample indexes to enable sample multiplexing. The resulting sequencing libraries were then sequenced on an Illumina MiSeq instrument using paired-end reads to cover the full length of the i53 coding sequence.

NGS analysis of pooled libraries

The frequencies of i53 variants were quantified by counting the number of each observed sequence in the NGS data and then removing all unexpected sequences (i.e., using a prespecified “whitelist” of variants known to be contained in the pool). Spike-in tests using individual variants demonstrated that the sequencing and analysis workflows could correctly estimate the frequencies of different i53 versions. This approach was used to confirm sequence diversity in plasmid and lentiviral libraries prior to screening. For quality control of screening data, the key measures we considered were the number of mapped reads (>1e5 reads per sample), the percent of reads carried over from parent, and the diversity of observed sequences (i.e., minimal skewing). Fold-change enrichment of a variant was calculated by dividing its normalized frequency in GFP+ cells by its normalized frequency in GFP- cells sorted from the same parent. All datasets contained an internal control (NNK-generated parent sequence) that was used to perform a final quality control of datasets, excluding sequencing runs where internal control abundance was >10% different from that of the parent carry-over control.
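
The counting and enrichment steps described above can be sketched as follows. This is an illustrative reimplementation, not the authors' pipeline: reads are assumed to have already been reduced to variant labels, and the variant names in the example are placeholders.

```python
from collections import Counter


def variant_frequencies(reads, whitelist):
    """Count whitelisted variants and normalize to the whitelisted total."""
    counts = Counter(r for r in reads if r in whitelist)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads matched the whitelist")
    return {v: counts[v] / total for v in whitelist}


def fold_change(gfp_pos_reads, gfp_neg_reads, whitelist):
    """Frequency in GFP+ divided by frequency in GFP- for each variant."""
    pos = variant_frequencies(gfp_pos_reads, whitelist)
    neg = variant_frequencies(gfp_neg_reads, whitelist)
    return {v: (pos[v] / neg[v] if neg[v] > 0 else float("nan"))
            for v in whitelist}


# Toy data: "unexpected" is not on the whitelist and is discarded.
whitelist = {"parent", "variant_A"}
gfp_pos = ["variant_A"] * 60 + ["parent"] * 40 + ["unexpected"] * 5
gfp_neg = ["variant_A"] * 40 + ["parent"] * 60
enrichment = fold_change(gfp_pos, gfp_neg, whitelist)
print(enrichment)
```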

Data processing and visualizations were generated using R (v4.1.2) and the ggplot2 package. Variants were ranked by fold change over parent and any variant for which either average or every replicate was over 1.0 was flagged as ‘Better than parent’, as highlighted in figures. Hits were ranked by average fold change and top selected candidate variants were moved forward for validation in targeted libraries, as described below.
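
The flagging and ranking rule can be expressed compactly. The authors used R; the Python sketch below is only an illustration of the stated rule, with hypothetical variant names and made-up replicate values.

```python
from statistics import mean


def summarize_variant(replicate_fc, parent_fc):
    """Return (average fold change over parent, 'Better than parent' flag).

    Both inputs are per-replicate fold changes in matching order.
    """
    relative = [v / p for v, p in zip(replicate_fc, parent_fc)]
    average = mean(relative)
    better = average > 1.0 or all(r > 1.0 for r in relative)
    return average, better


# Illustrative values only; not data from the study.
parent = [1.0, 1.0, 1.0]
screen = {"variant_A": [1.8, 2.1, 1.9], "variant_B": [0.9, 1.1, 0.8]}
summary = {name: summarize_variant(fc, parent) for name, fc in screen.items()}
ranked = sorted(summary, key=lambda name: summary[name][0], reverse=True)
print(ranked, summary)
```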

Validation of hits via lentiviral expression

Sequences of individual i53 variants of interest were cloned into the lentiviral-based expression plasmid described above. Hits were validated either as pooled “validation libraries” (variant and control plasmids manually mixed to generate a pool of 5–25 variants) or individually. Lentivirus generated from these plasmids was used to transduce CD34+ HSPCs at MOIs of 0.5–1 at day 1 post thaw. At day 4, the transduced cells were edited with HBB-UbC-GFP AAV6 at concentrations of 1.25–2.5 × 10⁴ vector genomes/cell. Cells transduced with pooled validation libraries were edited in triplicate or quadruplicate; cells transduced with individual variants were edited in duplicate.

For individual testing of variants, rates of integration of the HBB-UbC-GFP donor were measured using a Beckman Coulter CytoFLEX. DAPI (Miltenyi Biotec) was used to discriminate live and dead cells. mCherry expression was used to differentiate transduced cells from untransduced cells and rates of GFP integration were compared between the two populations to quantify the impact of lentiviral-based variant expression on HDR rates. Flow cytometry data were analyzed using FlowJo 10 software. For pooled validation libraries, cells were sorted and analyzed as described above. NGS analysis of the gDNA purified from sorted mCherry+GFP+ and mCherry+GFP- populations was used to determine differential variant enrichment and validate the impact of individual variants on HDR rates relative to a control.

i53 variant protein production and purification

The sequences of different i53 variants were cloned into bacterial expression plasmids, resulting in an N-terminal His-tagged fusion protein with a protease cleavage site between the 6x-His-tag and the i53 variant sequence. The resulting plasmids were transformed into E. coli BL21 (DE3)-RIL for protein expression. Cells were grown at 37 °C in Luria-Bertani broth supplemented with 0.4% glucose to OD600 = 0.8 and induced with 0.4 mM IPTG at 18 °C for 18 h. Cells were harvested by centrifugation and resuspended in 50 mM potassium phosphate pH 8.0, 500 mM NaCl, 20 mM imidazole, and 3 mM β-mercaptoethanol. Cells were lysed using a microfluidizer (Microfluidics). The crude lysate was immediately supplemented with 0.2 mM phenylmethylsulfonyl fluoride (PMSF) and centrifuged at 14,000 x g for 30 min. The soluble fraction was subsequently incubated with 2 mL Ni-NTA (GE Healthcare) per 1000 ODs for 1 h at 4 °C. Following incubation with the Ni-NTA resin, lysate was removed by pelleting the resin at 2500 x g for 3 min, and the resin was washed 3 times with 9 bed volumes of 50 mM potassium phosphate pH 8.0, 500 mM NaCl, 20 mM imidazole, and 3 mM β-mercaptoethanol. Following the batch wash, the Ni-NTA resin was loaded onto a gravity column and His-tagged i53 variant protein was eluted with 6 bed volumes of 50 mM potassium phosphate pH 8.0, 300 mM NaCl, 500 mM imidazole, and 3 mM β-mercaptoethanol. Eluted protein was dialyzed overnight against 10 mM Tris/HCl pH 8.0, 200 mM NaCl, and 1 mM DTT and the 6xHis-tag was cleaved with protease. The protein was purified by anion exchange chromatography on a HiTrapQ column (GE Healthcare) via a linear NaCl gradient and twice by size exclusion chromatography using a Superdex S200 26/60 column (GE Healthcare) run in 10 mM Tris/HCl pH 8.0, 200 mM NaCl, 1 mM DTT. Proteins were concentrated to ~20 mg/mL and flash frozen for storage.

Size exclusion chromatography

Recombinantly purified 53BP1 Tudor domain (53BP1 residues 1484–1603) was mixed with recombinantly purified i53 variants at a concentration of 0.5 mg/mL each. Proteins were incubated for 30 min at room temperature prior to injection onto an HPLC (Agilent, 1260 Infinity II). 5 μL of protein complex was injected onto a MAbPac 4 × 300 mm SEC column with 5 µm particle size and 300 Å pore size. The HPLC was run at 0.2 mL per minute using PBS as the mobile phase, and the absorbance at 280 nm was measured continuously for ~1 full column volume. The 53BP1 Tudor domain alone had a retention time of 14.6 min, and i53 variants had a retention time of ~15.5 min. A stable complex of 53BP1 Tudor domain and an i53 variant was found to have a retention time of 14.3 min.

Bio-layer interferometry (BLI)

Data were collected using an Octet R8 system (Sartorius). Purified 53BP1 Tudor domain was labeled at exposed primary amine groups with NHS-biotin using a ChromaLINK NHS-Biotin protein labeling kit (Vector Laboratories). One equivalent of ChromaLINK biotin was incubated with the 53BP1 Tudor domain for 2 h and buffer exchanged into fresh PBS. Labeling efficiency was calculated to be ~1 biotin per molecule of 53BP1 Tudor domain. Octet SA Biosensor tips (Sartorius) were incubated with biotin-labeled 53BP1 Tudor domain (ligand) for 60–80 s. The labeled tip was then dipped in 1x binding buffer (Sartorius) for 60 s to remove excess ligand and achieve baseline. Labeled tips were introduced to the i53 variant (analyte) for 500–600 s and the response was continuously monitored to detect association. A range of analyte concentrations was tested as a two-fold dilution series from highest to lowest concentration (200, 100, 50, 25, 12.5, 6.25, and 3.125 nM). The tips were then introduced to 1x binding buffer for 5 min and the response was continuously monitored to detect dissociation. A dissociation constant (KD) was calculated using a 1:1 binding model and the on-rate (ka) and off-rate (kd) were calculated as a change in response (nm) over time (s).
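
For readers unfamiliar with the 1:1 model, the sketch below generates the analyte dilution series and evaluates the standard 1:1 (Langmuir) relationships, in which KD = kd/ka. It is a textbook illustration, not the Octet analysis software; the rate constants and Rmax in the example are invented values.

```python
import math


def dilution_series(top_nM=200.0, points=7, factor=2.0):
    """Two-fold series starting at the highest analyte concentration."""
    return [top_nM / factor ** i for i in range(points)]


def kd_from_rates(ka_per_M_s, kd_per_s):
    """Equilibrium dissociation constant (M) for a 1:1 interaction."""
    return kd_per_s / ka_per_M_s


def association_response(t_s, conc_M, ka_per_M_s, kd_per_s, rmax_nm):
    """Association-phase response (nm) at time t for a 1:1 model."""
    k_obs = ka_per_M_s * conc_M + kd_per_s
    r_eq = rmax_nm * conc_M / (conc_M + kd_per_s / ka_per_M_s)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


series_nM = dilution_series()
# Hypothetical rate constants: ka = 1e5 1/(M s), kd = 1e-3 1/s.
KD_M = kd_from_rates(1e5, 1e-3)
signal = association_response(600, 200e-9, 1e5, 1e-3, rmax_nm=1.0)
print(series_nM, f"KD = {KD_M * 1e9:.1f} nM", f"R(600 s) = {signal:.3f} nm")
```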


TR-FRET

Assay volumes of 20 μL (n = 4) were composed of 0.5 μM His-tagged i53, 0.5 μM C-terminal Avi-tagged 53BP1, 5 nM Europium-labeled anti-His antibody (Perkin Elmer), 0.5x Streptavidin-XL665 (Cisbio) and an i53 variant at concentrations ranging from 5000 nM to 4.8 nM. All assay components were prepared in a buffer composed of 50 mM Tris pH 7.5, 150 mM NaCl, 0.02% (v/v) Tween-20, and 0.05% (w/v) BSA. Each assay was incubated for 2 h at room temperature in a 384-well white OptiPlate. TR-FRET was measured on a CLARIOstar Plus plate reader (BMG Labtech) using the TR-FRET mode.


Crystallization

The human i53:53BP1 complex, purified in 10 mM Tris pH 8.0, 200 mM NaCl and 1 mM DTT, was screened for crystallization at room temperature using a protein concentration of 30 mg/mL with the previously published condition15: 0.1 M MES (2-(N-morpholino)ethanesulfonic acid) pH 6.0, 0.2 M trimethylamine N-oxide and 25% (w/v) PEG MME (polyethylene glycol monomethyl ether) 2000. Crystals grew within 7 days at 23 °C using the sitting drop vapor diffusion method. Crystals were cryoprotected by adding glycerol, 20% (v/v) final concentration, to the reservoir solution before flash-freezing in liquid nitrogen. The i53:53BP1 complex crystallized in the P2₁2₁2₁ space group with one i53:53BP1 complex per asymmetric unit.

Structure determination

X-ray diffraction data were collected at the CLSI beamline 081D-1 using a wavelength of 0.95372 Å under cryo-conditions at a temperature of 100 K. Structures of human i53:53BP1 Tudor domain (WT, L67H, L67R, T12Y.T14E.L67R, T12V.T14H.L67H) were solved by molecular replacement using the previously published structure of WT i53:53BP1 Tudor domain (PDB code: 5J26). The final models for human i53:53BP1 Tudor domain (WT, L67H, L67R, T12Y.T14E.L67R, T12V.T14H.L67H) were built with native data and refined to an extended resolution below 1.8 Å for each dataset. All models of the i53:53BP1 complex were built using COOT46 and further refinement was completed using Refmac47. All models were refined to acceptable quality. Ramachandran values were calculated for each dataset and are as follows: i53WT:53BP1 (98.94% favored, 1.06% allowed, zero outliers); i53L67R:53BP1 (98.86% favored, 1.04% allowed, zero outliers); i53L67H:53BP1 (98.45% favored, 1.55% allowed, zero outliers); i53T12V.T14H.L67H:53BP1 (97.4% favored, 2.6% allowed, zero outliers); i53T12Y.T14E.L67R:53BP1 (97.28% favored, 2.72% allowed, zero outliers). The respective PDB codes are 8SVG, 8SVH, 8SVI, 8SVJ, 8T2D.


LC-MS

Samples of purified proteins (20 μg) were analyzed by LC-MS using a Poroshell 300SB-C8 2.1 × 7.5 mm column coupled to an Agilent 6224 ToF (data collection and analysis completed at JadeBio, San Diego, CA).

Measuring targeted integration of HBB-UbC-GFP, HBA-UbC-GFP, CCR5-UbC-GFP, or IL2RG-UbC-GFP (flow cytometry-based analysis)

Rates of targeted integration of the HBB-UbC-GFP, HBA-UbC-GFP, CCR5-UbC-GFP, and IL2RG-UbC-GFP donors were measured using a Beckman Coulter CytoFLEX. DAPI (Miltenyi Biotec) was used to discriminate live and dead cells. Flow cytometry data were analyzed using FlowJo 10 software.

Measuring targeted integration of HBB-SNP (NGS-based analysis)

The frequency of homology directed repair (HDR) and other editing outcomes at HBB were measured using Next Generation Sequencing (NGS). An NGS assay was developed to determine the frequency of various sequence changes at the HBB locus by quantifying the number of alleles that have been either: (1) not edited (% WT), (2) changed by HDR to incorporate sequence differences present in the AAV repair template (% HR), or (3) mutated during the genome correction process resulting in a gene that produces mutant β-globin (% INDELs).

For this assay, genomic DNA was harvested from cells using a Quick-DNA 96 Plus Kit (Zymo Research). The DNA concentration was measured using a Qubit 1X dsDNA BR assay kit (ThermoFisher). Purified genomic DNA was then used to amplify the HBB locus via polymerase chain reaction (PCR). The PCR products were diluted using nuclease-free water to serve as the template DNA for targeted NGS library prep. An Agilent TapeStation was used to confirm that the PCR product for each sample was the expected size (1410 bp). A second PCR with primers carrying partial Illumina adapters was performed to amplify a 142 base pair sequence that includes the region of the HBB locus that is to be corrected during the genome correction process. The PCR products were diluted again to serve as templates in a third PCR using Nextera XT index primers. This third PCR was used to assign unique identifiers to each sample and to add the full-length adapter sequences necessary for Illumina sequencing. The size of the PCR products was assessed on an Agilent BioAnalyzer. PCR products were then pooled, purified using a Qiagen PCR purification kit, and quantified using PicoGreen to prepare them for sequencing.

Based on the PicoGreen concentration, the library of pooled PCR products was diluted to a final concentration of 4 nM. Sequencing was performed on a MiSeq system using an Illumina MiSeq sequencing reagent kit (V2, 300 cycles). A 10% PhiX control library was added to the sample library to improve sequence diversity and to allow for error rate measurements. The library was denatured and loaded at 8–12 pM onto the sequencing reagent cartridge. Sequencing consisted of paired-end 150 base pair reads and dual indexing reads. The sequencing data were demultiplexed based on the sample indexes provided and FASTQ files for each sample were generated. The FASTQ files were processed using the CRISPResso2 pipeline (v2.1.0)48. For all experimental and bioinformatic steps, a positive control with known editing outcomes, a negative control with no editing, and a no-template control were processed in parallel with each set of samples.
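
Converting a PicoGreen mass concentration to molarity before diluting to 4 nM uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The sketch below shows that arithmetic; the 300 bp mean library length and the 1.98 ng/µL reading are illustrative assumptions, not values from the study.

```python
def library_molarity_nM(conc_ng_per_ul, mean_length_bp, g_per_mol_per_bp=660.0):
    """Molarity (nM) of a dsDNA library from its mass concentration."""
    return conc_ng_per_ul / (g_per_mol_per_bp * mean_length_bp) * 1e6


def dilution_volumes_ul(stock_nM, target_nM=4.0, final_volume_ul=20.0):
    """Return (library volume, diluent volume) in µL to reach target_nM."""
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    library_ul = final_volume_ul * target_nM / stock_nM
    return library_ul, final_volume_ul - library_ul


# Assumed example: 1.98 ng/µL pooled library, 300 bp mean fragment length.
stock_nM = library_molarity_nM(1.98, 300)
library_ul, diluent_ul = dilution_volumes_ul(stock_nM)
print(f"{stock_nM:.1f} nM stock: {library_ul:.1f} µL library + {diluent_ul:.1f} µL diluent")
```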

As has been reported previously, recombination events were observed in which double-stranded breaks at the HBB locus were repaired using HBD, a close and nearby homolog of HBB. These recombination events could be recognized by the presence of up to 6 SNPs that are present only in HBD and not in the HBB-SNP repair template or the HBB wildtype sequence. To estimate the frequency of HBB break repair using HBD as a template, the fully recombined HBD amplicon sequence (containing all 6 mismatch SNPs relative to the HBB amplicon) was included as an amplicon in CRISPResso2, in addition to wildtype HBB and the intended repair outcome with HBB-SNP, using the “-a” parameter.

To quantify partial recombination (i.e., containing <6 mismatch SNPs), during CRISPResso2 analysis we generated two amplicon sequences consisting of 5 of the 6 HBD-specific SNPs in the 3ʹ direction and 5 of the 6 SNPs in the 5ʹ direction from the cut site. To quantify HBD recombinations, we summed the number of reads that mapped to either the full or the partial HBD recombination plus reads containing a mismatch at the cut site without any additional indel (all of which tracked to the HBD gene).

The full list of parameters passed to CRISPResso2 was the same for all analyses and is shown in Table M2 in Supplementary Data 6.

Summary data for each sample were reported as % WT (unedited), % HDR (incorporation of HBB-SNP donor template), % HBD, and % MMEJ (edits that were significantly reduced by POLQ knockdown, as described below), while the remaining edits were classified as “NHEJ”. When presented as % edited alleles, edits were calculated as % of any given edit/(100 – % WT).
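
The renormalization to % edited alleles can be illustrated with a few lines of code. The outcome percentages below are invented for the example, and the result is multiplied by 100 so that it is expressed as a percentage, which is our reading of the formula.

```python
def percent_of_edited(outcomes):
    """Re-express each edit as a percentage of edited (non-WT) alleles.

    `outcomes` maps outcome name to % of all alleles and must include "WT".
    """
    edited = 100.0 - outcomes["WT"]
    if edited <= 0:
        raise ValueError("no edited alleles in this sample")
    return {name: 100.0 * pct / edited
            for name, pct in outcomes.items() if name != "WT"}


# Illustrative sample (values sum to 100% of alleles).
sample = {"WT": 20.0, "HDR": 48.0, "HBD": 4.0, "MMEJ": 8.0, "NHEJ": 20.0}
edited_view = percent_of_edited(sample)
print(edited_view)
```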

Measuring editing of OT-1 (NGS-based analysis)

Editing outcomes at the off-target (OT) editing site OT-1 were assessed using an assay very similar to the one described above for measuring the targeted integration of HBB-SNP. The off-target editing site was identified via three different methods (in silico prediction26, Circle-Seq49, and Guide-Seq50) and was confirmed to be an off-target editing site of significance via amplicon sequencing. The workflow for the OT-1 amplicon sequencing assay is very similar to the on-target sequencing assay but involves one fewer PCR step. A small 166 base pair sequence encompassing the OT-1 editing site was amplified directly from genomic DNA in the first PCR and then tagged with barcode and adapter sequences in a second PCR. The resulting PCR products were sized, quantified, and prepared for sequencing in the same manner as for the HBB on-target sequencing assay. The sequencing data were also processed using the CRISPResso2 pipeline48. In contrast to the on-target sequencing assay, no homology-directed repair outcomes are present at OT-1, so only % WT and % INDELs are reported.

DNA damage response (DDR markers p21 and γH2AX) analysis

For p21 analysis, 1.5 × 10⁵ cells were spun down at 300 x g for 5 min, washed once with PBS, and resuspended in 22.5 μL of RIPA buffer with 2X Halt protease and phosphatase inhibitors (ThermoFisher). Lysates were incubated on ice for 30 min with intermittent vortexing. Lysates were then spun down in a microcentrifuge at 500 x g for 5 min; supernatants were then transferred to fresh tubes. Samples were prepared by mixing 5 μL of protein extract with 1.25 μL of freshly prepared 5X fluorescent master mix as instructed by the ProteinSimple Jess protocol. Samples were denatured for 10 min at 95 °C, quickly spun and loaded into a Jess capillary cartridge. Capillaries were probed with anti-p21 (CST) and anti-alpha tubulin (Abcam) and detected by HRP-conjugated secondary antibodies. Data were normalized to the internal alpha-tubulin loading control and then expressed as fold-change (FC) values over control treatments.

For γH2AX analysis, 1.5 × 10⁵ cells were spun down at 300 x g for 5 min. Cells were resuspended in 100 μL of diluted Live/Dead Fixable Violet dye (ThermoFisher, 1:1000 in PBS). Cells were incubated for 20 min in the dark at room temperature. After adding 100 μL of PBS to the cell suspension, the cells were spun down and washed once more with 200 μL PBS. The final PBS wash was flicked from the plate and cells were lightly vortexed to resuspend them in the residual PBS remaining after the wash. 100 μL of freshly prepared 70% ethanol was added to the cell pellets and the plate was tightly sealed with foil, vortexed and then allowed to fix at –20 °C for 1 h to 3 days. Following fixation, 100 μL of cell staining buffer (CSB, BioLegend) was added to cell suspensions and the resuspended cells were spun down at 500 x g for 5 min. Cells were washed once more with 200 μL of CSB and then resuspended in 50 μL of CSB and blocked for 15 min at RT. Diluted anti-γH2AX-PE (1:20 in CSB, BioLegend) was added to cells and incubated further for 30 min at RT. Following staining, cells were washed twice with CSB and were immediately analyzed using a Beckman Coulter CytoFLEX flow cytometer.

Apoptosis assay

For each sample, 1.2 × 10⁵ total cells in growth medium were centrifuged for 5 min at 400 x g, the supernatant was aspirated, and cells were resuspended in 100 μL of 1x Annexin V Binding Buffer (ABB, Fisher Scientific) supplemented with 5 μL of Annexin V-AF488 reagent (Fisher Scientific). After 15 min of incubation at RT protected from light, 3 μL of a 1:10 dilution of 1 mM Sytox AAD stock in DMSO (Fisher Scientific) was added and mixed well. After 5 min of incubation, 150 μL of 1x ABB was added and cells were immediately analyzed on a CytoFLEX LX (Beckman Coulter) using the B525-FITC and R712-APCA700 channels. For negative and positive staining controls, untreated cells or cells cultured for 20–24 h in complete medium supplemented with etoposide (R&D Systems; 5 µM for 16 h), respectively, were stained in parallel with samples. The flow cytometry results were analyzed using FlowJo v10.8 Software (BD Life Sciences). The compensation matrix was built in FlowJo using single-stained control cells. Before quantitation of viable and apoptotic cell populations, cell debris was gated out from the double negative population.

POLQ knockdown and determination of MMEJ edits

The construct for the shRNA knockdown of POLQ was adapted from the previously reported pLKO51 and cloned from a third-generation lentiviral plasmid (Lenti SFFV) purchased from Twist Biosciences. An empty vector was constructed to include a DNA stuffer flanked by AgeI and EcoRI restriction enzyme cut sites downstream of a U6 promoter and upstream of an SFFV-EGFP-WPRE cassette (to enable fluorescence-based monitoring of cells expressing the shRNA). shRNA sequences were cloned as duplexed DNA oligos (IDT) into the digested empty vector at the AgeI/EcoRI cut sites using standard ligation protocols. The target sequences used for POLQ (gene ID: 10721) and non-targeting control (NTC) shRNA were identified using the Broad Genetic Perturbation Platform (GPP) Web Portal and were as follows: POLQ: 5ʹ-GCTGACCAAGATTTGCTATAT-3ʹ and NTC: 5ʹ-CCTAAGGTTAAGTCGCCCTCG-3ʹ. Sequences of an empty (MT) and a representative assembled (Polq) shRNA lentiviral vector are provided in Supplementary Data 5.

CD34+ HSPCs were transduced using lentivirus produced using shRNA transfer vectors at day 1 post thaw using methods described above and MOIs of 2.5–7.5. Three days after transduction, cells were edited in duplicate at HBB as described above, using HBB-SNP or HBB-UbC-mCherry donor AAV6 (MOIs of 75–2.5 × 10⁴ vector genomes/cell). Three days post editing, cells were pelleted and resuspended in media with DAPI (Miltenyi Biotec).

For cells edited with HBB-UbC-mCherry: rates of integration of the donor were measured using a Beckman Coulter CytoFLEX. DAPI (Miltenyi Biotec) was used to discriminate live and dead cells. GFP expression was used to differentiate transduced cells from untransduced cells and rates of mCherry integration were compared between the two populations to quantify the impact of lentiviral-based shRNA expression on HDR rates.

For cells edited with HBB-SNP: single, live, GFP+ (shRNA+) and GFP- (shRNA-, negative control) cells were collected using a FACSAria cell sorter (Becton Dickinson); purity of populations was confirmed by post-sort purity checks. Post sort, genomic DNA was harvested from each sorted cell population and editing outcomes were determined using the pipeline outlined above. To determine which edits were reduced by POLQ knockdown, a one-sided t-test comparing the GFP+ and GFP- conditions was performed for each individual editing outcome. Edits with an FDR-corrected (Benjamini-Hochberg) p-value below 0.1 were labeled as POLQ-dependent, or “MMEJ”.
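
The multiple-testing step can be sketched in plain Python. The example takes per-outcome p-values from the one-sided t-tests as given (the values and outcome names below are invented) and applies the standard Benjamini-Hochberg adjustment before thresholding at 0.1; it is not the authors' analysis code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-outcome p-values from one-sided t-tests (GFP+ vs GFP-).
raw_p = {"del_9bp": 0.01, "del_2bp": 0.04, "del_5bp": 0.03, "ins_1bp": 0.20}
adj = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
polq_dependent = [name for name, q in adj.items() if q < 0.1]
print(adj, polq_dependent)
```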

LT-HSC sorting Method and Materials

Details on the antibodies and reagents used for the LT-HSC sort are shown in Table M3 in Supplementary Data 6. Cryopreserved samples were rapidly thawed in warm GMP SCGM (CellGenix) media and washed with cell staining buffer (Biolegend). Washed cells were incubated with a panel of fluorochrome-conjugated anti-human monoclonal antibodies (mAbs) and viability dye to characterize hematopoietic stem cell compartments. The following directly conjugated reagents were used in this study: from BD Biosciences, CD38-PE-Cy7 (clone HIT2); from Biolegend, CD34-Alexa488 (581), CD45RA-BV510 (HI100), CD49c-PE (ASC-1), CD49f-BV421 (GoH3), CD90-BV711 (5E10), CD201-APC (RCR-401), and CD45-Alexa700 (HI30); and from Thermo Fisher Scientific, DyLight 800 Maleimide. Brilliant Stain Buffer Plus (BD Bioscience) was added to stabilize the fluorophore-conjugated antibody cocktail. Cells were stained for 30 min at 4 °C, washed with cell staining buffer, and acquired within 1 h on a custom SORP five-laser FACSAria Fusion (BD Biosciences).

FACSAria Fusion was calibrated with Cytometer Setup and Tracking beads (BD Biosciences, 655050), the sort parameters were set to 20 psi with a 100 µm nozzle, and the droplet stream was calibrated with Accudrop Beads (BD Biosciences, 345249). Sort layout was set to 4-way purity and four subfractions were collected into 5 mL FACS tubes as follows:

1. Long Term Hematopoietic Stem Cell (LT-HSC) enriched (CD34+CD45RA-CD90+CD201+CD49f+CD49c+)

2. Short Term Hematopoietic Stem Cell (ST-HSC) enriched (CD34+CD45RA-CD90+CD201- (CD49f-CD49cdim))

3. Hematopoietic Stem and Progenitor Cell (HSPC) enriched (CD34+CD45RA-CD90-CD201- (CD49f-CD49c-))

4. Lineage committed progenitors (CD34+CD45RA+ (CD90dimCD201-CD49f-CD49cdim))

Aliquots of sorted sample populations were re-acquired to assess the purity of the sort. Sorted cells were spun down, supernatant aspirated, and snap frozen at -80 °C for DNA extraction and NGS analysis.

The bulk cells were phenotyped for cell sorting with the surface markers CD45, CD34, CD45RA, CD201, CD90, CD49f and CD49c (refs. 9, 38, 39, 40). A physical gate was applied to remove debris and isolate HSC-sized cells; doublets were removed with SSC-singlet and FSC-singlet gates; dead cells were removed; and the CD45+ cells were sub-fractionated into HSC/HSPC (CD45+CD34+CD45RA-) and lineage-committed (sort population 4: CD45+CD34+CD45RA+) compartments. The HSC/HSPC compartment was further divided into HSPC (sort population 3: CD45+CD34+CD45RA-CD90-CD201-(CD49f-CD49c-)), short-term HSC (sort population 2: CD45+CD34+CD45RA-CD90+CD201-(CD49f-CD49cdim)) and long-term HSC (sort population 1: CD45+CD34+CD45RA-CD90+CD201+CD49f+CD49c+).

CFU progenitor assay

At 48 h post gene editing, 250 cells per well were plated in Methocult Optimum media in SmartDish plates (both StemCell Technologies). Plates were incubated in a secondary enclosure at 37 °C, 5% CO2, and 5% O2 for 14 d before scoring colonies using the human mPB program on a STEMvision imager (StemCell Technologies).

Measuring targeted integration of HBB-SNP in Colonies (CFU-seq)

Individual colonies were picked and gDNA was extracted using the Lucigen QuickExtract kit according to the manufacturer’s instructions. NGS library prep on gDNA was performed as described in the section above titled Measuring Targeted Integration of HBB-SNP.

Raw fastq files output from the sequencer were analyzed using our in-house On-target HBB CFU Bioinformatics Pipeline. This pipeline uses CRISPResso2 (v2.1.0) to quantify the various gene editing outcomes in each colony. Parameters for CRISPResso2 were set to be identical to those used for On-Target CD34 NGS analysis. Output counts and fractions of each allele from the CRISPResso2 analysis were used to infer genotypes. Filters were applied to remove low-quality colonies. Colonies with fewer than 2000 aligned reads were removed, as the low read count would likely impact quantification. A 10% fraction threshold was used to call the presence of expected alleles. Colonies with more than two alleles above the 10% fraction threshold were removed, as these were likely not single clones. NoCall colonies included any colonies that did not produce a band on In/Out PCR or were removed by the above bioinformatics filters.
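
The colony filters described above can be sketched as a small function. This is an illustration under stated assumptions, not the in-house pipeline. The input format (a mapping from allele label to aligned read count), the function name, and the treatment of an allele at exactly 10% as "present" are all assumptions.

```python
from typing import Dict, Optional, Tuple


def call_colony(allele_reads: Dict[str, int],
                min_reads: int = 2000,
                min_fraction: float = 0.10) -> Optional[Tuple[str, ...]]:
    """Return the called alleles for one colony, or None for a NoCall.

    allele_reads maps a hypothetical allele label (e.g. "HDR", "WT")
    to the number of aligned reads supporting it.
    """
    total = sum(allele_reads.values())
    # Filter 1: too few aligned reads for reliable quantification.
    if total == 0 or total < min_reads:
        return None
    # Call alleles at or above the fraction threshold (boundary is assumed).
    called = sorted(a for a, n in allele_reads.items()
                    if n / total >= min_fraction)
    # Filter 2: more than two alleles suggests a mixed (non-clonal) colony.
    if len(called) > 2 or not called:
        return None
    return tuple(called)


# Hypothetical colonies.
heterozygous = call_colony({"HDR": 1500, "WT": 1400, "indel": 100})
low_depth = call_colony({"HDR": 900, "WT": 900})
mixed = call_colony({"HDR": 1000, "WT": 1000, "indel": 1000})
```

A colony with one called allele would be read as homozygous for that allele and a colony with two as heterozygous. That interpretation is an assumption about how genotypes were inferred.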


Genomic DNA (gDNA) was isolated using the PureBind Blood Genomic DNA Isolation Kit (Ocean Nanotech) and quantified; library preparation was performed and quality was assessed (LAB-SOP-018, GeneGoCell), and the libraries were then sequenced (NextSeq2000, Illumina). Raw sequence reads were demultiplexed into sample-specific fastq files (bcl2fastq program v2.20.0.422, Illumina). The resulting fastq files were processed as follows: low-quality reads were removed using a quality score threshold of 28 (Q28), and PCR duplicates were removed using the UMIs. The resulting fastq files were analyzed to generate quality control (QC) statistics. Reads were aligned to the human genome (hg38) using BWA v0.7.17-r1188 (GeneGoCell NGS bioinformatics pipeline v2.2.3).

The control and experimental samples were further analyzed using the same process, abbreviated here: For a given site, the dsODN insertion rate was calculated as the number of site-specific reads with dsODN incorporation vs. total number of site-specific reads. The alignment results were analyzed using G-GUIDE analysis program v4.0 to generate the genome-wide dsODN insertion sites and report break points (BPs) for each high-quality read. Control sample background sites were subtracted from the edited samples, and only sample-specific sites are reported.
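
As a worked illustration of the two calculations in this paragraph, the sketch below computes a per-site dsODN insertion rate and removes sites that also appear in the control sample. It is not the G-GUIDE program. The data layout (sites keyed by chromosome and position) and the exact-coordinate matching of background sites are assumptions made for the example.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical site key


def insertion_rate(reads_with_dsodn: int, total_site_reads: int) -> float:
    """Fraction of site-specific reads that carry the dsODN."""
    if total_site_reads <= 0:
        raise ValueError("total_site_reads must be positive")
    return reads_with_dsodn / total_site_reads


def sample_specific_sites(edited: Dict[Site, float],
                          control: Dict[Site, float]) -> Dict[Site, float]:
    """Drop any edited-sample site that is also detected in the control.

    ASSUMPTION: background sites are matched by identical coordinates.
    """
    return {site: rate for site, rate in edited.items()
            if site not in control}


# Hypothetical read counts and sites.
edited_sites = {
    ("chr11", 5_226_000): insertion_rate(250, 1000),
    ("chr9", 101_000): insertion_rate(5, 1000),
}
control_sites = {("chr9", 101_000): insertion_rate(4, 1000)}
reported = sample_specific_sites(edited_sites, control_sites)
```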


Cryopreserved aliquots of 2 M cells were used for each submission. Aliquots of 2 M cells for each sample (24 h post-editing) were prepared by centrifuging cells (180 x g, 7 min) and resuspending in Cryostor CS10 solution (BioLife Solutions) at a density of 10 M cells/mL. Frozen cell aliquots were then sent to KromaTiD (Longmont, CO) for karyotyping (G-banding). Briefly, after harvest and fixation, the fixed cells were washed twice with fixative (prepared fresh day-of-use) and the O.D. was adjusted. Drops of the final cell suspension were placed on clean slides and aged for 60 min at 90 °C. Slides were digested in a pancreatin solution with Isoton II diluent. The enzymatic reaction was then stopped by rinsing with FBS, followed by application of a stain solution (3:1 Wright/Gurr buffer) which was poured on the slides so that it covered the entire surface. After staining for up to 1 min, slides were washed with de-ionized water for 1−5 s and air dried. A mounting medium was applied to the slides and sealed with a coverslip. The slides were scanned on the microscope for cell analysis.


This assay detects sequences of interest, in this case our editing site and a known off-target site, together with their translocation partners. Samples were treated as follows: gDNA was isolated (PureBind Genomic DNA Isolation Kit, Ocean Nanotech) and fragmented via sonication, followed by DNA-end repair, UMI adapter ligation, and PCR amplification to enrich fragments that contain editing targets and translocations; libraries were then prepared for sequencing (LAB-SOP-017, GeneGoCell). The amplified gDNA fragment library was sequenced by sequencing-by-synthesis (SBS) (NextSeq 2000, Illumina; LAB-SOP-022, GeneGoCell), demultiplexed (bcl2fastq v2.20.0.422, Illumina) and processed as described next (v2.0.9, GeneGoCell):

GeneGoCell’s proprietary G-Trans platform was used to amplify and quantify all potential translocations in an unbiased manner. Translocations were quantified between the targets listed below and anywhere else across the genome (hg38). Low-quality reads and PCR duplicates were removed via Q28 and UMIs, respectively. Quality control was run (v.0.10.1, FastQC), reads were aligned to hg38 (v0.7.17-r1188, BWA), and results were analyzed (proprietary translocation analysis v1.6, GeneGoCell) to identify potential genome-wide translocation sites. Donor and recipient genomic loci BPs were calculated per read. Called BPs met the following CRISPR/Cas9 genome editing associated criteria: ≥10 UMI reads, a minimum of 3 BPs in the flanking 200 bp of the peak, position +/- 100 base pairs, and peak BP:total region read counts ratio <0.9. To compute the on-target translocation rate, the number of reads for each reaction was divided by the number of target-specific reads and multiplied by 100.

Details on the targets and primers used for Trans-seq are shown in Tables M4 and M5 in Supplementary Data 6.


To assess gene expression profiles from single cells, 2 million cryopreserved cells were thawed for use with the 10X Genomics Chromium Next GEM Single Cell 3’ Gene Expression Reagent Kit (10X 3’ Kit). The thawed cells were counted with AO/PI viability stain on the Nexcelom cell counter. Approximately 8000 live cells were added to a master mix for reverse transcription (RT Reagent B, Template Switching Oligo, Reducing Agent B, and RT enzyme C), then loaded into a Chromium Next GEM Chip G for running in the Chromium Controller to generate Gel Beads-in-emulsion (GEMs). The GEMs were transferred to tubes for RT incubation in a Bio-Rad C1000 Touch for 45 min at 53 °C, then 5 min at 85 °C, and held at 4 °C. After RT, the GEMs were purified with Dynabeads™ MyOne™ SILANE. The eluted cDNA was amplified using the Amp Mix and cDNA primers with 11 cycles of PCR in a Bio-Rad C1000 Touch. The dsDNA cDNA product was analyzed using the High Sensitivity DNA Chip on an Agilent Bioanalyzer 2100. 10 μL of the dsDNA cDNA product was fragmented with the fragmentation primer, end-repaired and A-tailed to prepare for the ligation of the sequencing adapters. Afterwards, the dsDNA was purified with a double-sided SPRI. Illumina sequencing adapters were ligated to the dsDNA to generate the sequencing library. Another 15 cycles of PCR in a Bio-Rad C1000 Touch were used to amplify and index the sequencing library.

The indexed libraries were purified with a double-sided SPRI and assessed for size with the High Sensitivity DNA Chip on an Agilent Bioanalyzer 2100, and concentrations were measured with the Qubit Broad Range kit. Each sample library was normalized to 900 pM and pooled in equal volumes. The library pool, including 10% PhiX control library, was denatured and then loaded onto a P3 flowcell on the Illumina NextSeq 2000. The run parameters were Read 1: 28, Index 1: 10, Index 2: 10, Read 2: 90, per the 10X 3’ Kit protocol. After the run, the sequence metrics were checked to assess read quality, and the bcl files were then converted to fastq. The fastq files were then input into the Graphite single cell pipeline for analysis.

Data processing and analysis were performed in R version 4.2.0 via RStudio, using Seurat (v4.3.0). Visualizations were created with dittoSeq (v1.8.1) and ggplot2. Seurat’s Read10X function was used to generate a count data matrix using the filtered count matrix genes and cells, gene names, and barcode files provided by 10X. A Seurat object was created with the count data matrix and metadata and filtered to keep genes present in at least 3 cells and cells meeting cohort selection criteria of at least 200 genes. Log normalization was performed using Seurat’s NormalizeData function with a scale factor of 10,000, and highly variable features were identified using Seurat’s FindVariableFeatures function. The data matrix was then scaled using Seurat’s ScaleData function with nCount_RNA regressed out, and dimensionality reduction through Uniform Manifold Approximation and Projection (UMAP) was performed with the appropriate dimensions selected based on the corresponding principal component analysis (PCA) elbow plot.
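
To make the filtering and log-normalization arithmetic concrete, here is a minimal pure-Python sketch. It is not Seurat and was not part of the analysis, which was run in R. It assumes the standard log-normalization formula, log1p(count / total counts in the cell × scale factor), and it assumes cells are filtered before genes. The data layout and function names are hypothetical.

```python
import math
from typing import Dict, List

Cell = Dict[str, int]  # gene name -> raw UMI count for one cell


def filter_cells_and_genes(cells: List[Cell],
                           min_cells: int = 3,
                           min_genes: int = 200) -> List[Cell]:
    """Keep cells with >= min_genes detected genes, then genes seen in >= min_cells cells."""
    kept = [c for c in cells
            if sum(1 for n in c.values() if n > 0) >= min_genes]
    # Count the number of remaining cells in which each gene is detected.
    n_cells: Dict[str, int] = {}
    for c in kept:
        for gene, n in c.items():
            if n > 0:
                n_cells[gene] = n_cells.get(gene, 0) + 1
    return [{g: n for g, n in c.items() if n_cells.get(g, 0) >= min_cells}
            for c in kept]


def log_normalize(cell: Cell, scale_factor: float = 10_000) -> Dict[str, float]:
    """Scale counts to a fixed library size, then apply the natural log of (1 + x)."""
    total = sum(cell.values())
    if total == 0:
        return {}
    return {g: math.log1p(n / total * scale_factor) for g, n in cell.items()}


# Toy example with lowered thresholds so that three tiny cells pass.
toy = [{"A": 1, "B": 2}, {"A": 1}, {"A": 3, "B": 1}]
filtered = filter_cells_and_genes(toy, min_cells=3, min_genes=1)
normalized = [log_normalize(c) for c in filtered]
```

With the thresholds from the text (3 cells, 200 genes) the same functions apply unchanged to a full count matrix.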

The Seurat function RunAzimuth was used for reference-based mapping to annotate the data against the Human bone marrow reference.

Isolation of CD34+ cells

Leukopaks were purchased from AllCells LLC; they were collected from healthy donors per standard protocols using mobilization with G-CSF + Plerixafor. CD34+ cells were isolated from the leukopaks within 24 h by first removing the platelets using the LOVO Cell Processing System (Fresenius Kabi). GMP-grade reagents, buffers and columns for CD34 immunomagnetic selection were purchased from Miltenyi Biotec. The platelet-washed cells were incubated for 30–35 min with the CD34 Reagent, after which excess antibody was washed off on the LOVO. The washed and labelled cells were subjected to immunomagnetic selection using the CliniMACS Plus instrument (Miltenyi Biotec), after which the cells were cryopreserved at a concentration of 5 × 10⁶ to 1 × 10⁷ cells/mL in CryoMACS 50 or 250 Bags (Miltenyi Biotec) for the gene edited drug product generation step.

Large scale editing of HSPCs

Cryopreserved CD34+ HSPCs (at least 5 × 10⁷ and up to 3 × 10⁸ cells) were thawed at 37 °C and cultured in supplemented cytokine-rich SCGM media (CellGenix) containing recombinant cytokines at 100 ng/mL each of Flt-3L, TPO, and SCF (PeproTech) and 35 nM UM171 (ExCellThera) in gas permeable vessels and incubated in 5% CO2 + 5% O2 for 48−72 h. The cells were then washed and resuspended in 3–10 mL of electroporation buffer (Hyclone). A GMP-grade chemically modified single guide RNA (sgRNA) targeting the HBB locus was purchased from Agilent, carrying 2ʹ-O-methyl-3ʹ-phosphorothioate modifications at the three terminal nucleotides of the 5ʹ and 3ʹ ends, with the sequence 5ʹ-CTTGCCCCACAGGGCAGTAA-3ʹ. The gene editing reagents were pre-complexed as an RNP containing 2 mg/mL sgRNA (Agilent Technologies) and 10 mg/mL SpyFi Cas9 (Aldevron) at a 2.5:1 molar ratio for 10 min at room temperature. Approximately 169 µL of RNP was added per 1 mL of cell suspension in electroporation buffer. For conditions testing the HDR booster, thawed i53 variant protein was mixed well by pipetting and added to the RNP at a concentration of 0.8 mg/mL (of total electroporation volume), after which the cells were electroporated with the MaxCyte GTx system using the CL1.1 or CL2 closed cartridges, which are suitable for GMP manufacturing. Following electroporation, the cells were allowed to rest for 10 min in an incubator at 37 °C. Meanwhile, prepared HBB-SNP virus carrying the corrected sequence for HBB was thawed and added at either 6.25 × 10² or 1.25 × 10³ vector genomes/cell into culture media, after which the electroporated cells were split equally (for different MOI conditions) and transferred to gas permeable culture vessels. At 16–24 h post-gene editing, the cells from each condition were collected and centrifuged at 300 x g for 10 min to pellet the cells. The supernatant was aspirated, and the cell pellet(s) were washed with and re-suspended in PlasmaLyte buffer with 2% (v/v) HSA.
A cell count was performed on the NC202 counter with AO/DAPI staining, using the pre-set Cell Count and Viability protocol. Cell counts were used to determine cell yield, viability and concentrations for cryopreservation. Following a final centrifugation step at 450 x g for 10 min, the cells were resuspended in cold cryopreservation media CryoStor CS5 (BioLife Solutions) and aliquoted into vials at a final concentration of 5 × 10⁶ to 1.2 × 10⁷ cells/mL. The vials were then subjected to cryopreservation using a controlled rate freezer and stored in vapor phase LN2 at ≤ −150 °C prior to performing all analytical metrics.
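
The reagent quantities quoted for large scale editing scale linearly with volume and cell number, which the following sketch works through. It is illustrative arithmetic only. The function name is hypothetical, and the total electroporation volume is assumed to be the cell suspension plus the added RNP, ignoring the volume of the i53 protein itself.

```python
from typing import Dict


def electroporation_plan(cell_suspension_ml: float,
                         n_cells: int,
                         vg_per_cell: float,
                         rnp_ul_per_ml: float = 169.0,
                         i53_mg_per_ml: float = 0.8) -> Dict[str, float]:
    """Scale RNP volume, i53 mass and vector genomes to one electroporation."""
    # About 169 uL of RNP per 1 mL of cell suspension (from the text).
    rnp_ul = cell_suspension_ml * rnp_ul_per_ml
    # ASSUMPTION: total volume = cell suspension + RNP volume.
    total_volume_ml = cell_suspension_ml + rnp_ul / 1000.0
    return {
        "rnp_ul": rnp_ul,
        "total_volume_ml": total_volume_ml,
        # i53 dosed at 0.8 mg per mL of total electroporation volume.
        "i53_mg": total_volume_ml * i53_mg_per_ml,
        # Virus dosed per cell, e.g. 6.25e2 or 1.25e3 vector genomes/cell.
        "vector_genomes": n_cells * vg_per_cell,
    }


# Hypothetical run: 5 mL of suspension holding 1e8 cells at 625 vg/cell.
plan = electroporation_plan(5.0, 100_000_000, 625)
```

For this hypothetical run the sketch gives 845 µL of RNP, roughly 4.7 mg of i53 protein and 6.25 × 10¹⁰ vector genomes.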

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.