A benchmarked, high-efficiency prime editing platform for multiplexed dropout screening

Experimental model and subject details

Prime editing cell lines

All prime editor constructs contained an SpCas9(H840A) nickase fused to an MMLV RT (D200N, T306K, W313F, T330P, L603W). In addition, the PEmax editor construct contained a codon-optimized MMLV RT and the following additional mutations in the SpCas9 nickase: R221K and N394K. Construction of the PEmax cell line was described previously40. The PE2 cell line was constructed in the same manner as the PEmax cell line. To construct MLH1 knockout PEmax cells (PEmaxKO), 122 pmol of Alt-R S.p. Cas9 Nuclease V3 (IDT 1081058) and 200 pmol Alt-R CRISPR–Cas9 sgRNA targeting MLH1 (IDT Hs.Cas9.MLH1.1.AG, 5′-mC*mU*mU*rCrArCrUrGrArGrUrArGrUrUrUrGrCrArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrArGrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGrUrGrGrCrArCrCrGrArGrUrCrGrGrUrGrCmU*mU*mU*rU) were complexed for 20 min at room temperature and nucleofected into 5 × 10⁵ PEmax cells using the SE Cell Line 4D-Nucleofector X Kit (Lonza V4XC-1032) and program FF-120, according to the manufacturer’s protocol. Five days post nucleofection, cells were sorted on a BD FACSAria Fusion Flow Cytometer into 96-well plates at one cell per well with 150 μl conditioned culture medium. Single cells were grown and expanded for 2–3 weeks into clonal lines. Clones with a high percentage of EGFP-expressing cells according to AttuneNXT flow cytometry analysis (Attune Cytometric Software) were selected for further characterization.

General cell culture and selection conditions

Lenti-X 293T cells were purchased from Takara (632180) and K562 cells (CCL-243) were purchased from ATCC. K562 stable prime editing cell lines were maintained in RPMI 1640 medium (Gibco, 22400089) supplemented with 10% FBS (Corning, 35-010-CV) and penicillin/streptomycin (Gibco, 15140122; 100 U ml−1). The 293T cells were maintained in DMEM medium (Corning, 10-013-CV) supplemented with 10% FBS and penicillin/streptomycin. All cells were kept in a humidified incubator at 37 °C, 5% CO2. For all pooled screens, K562 cells were kept in a humidified incubator with agitation (multitron) at 37 °C, 5% CO2, 52–76 rpm depending on total volume. AttuneNXT (Attune Cytometric Software) was used to quantify fluorescent protein expression.

General sequences and cloning

For the endogenously tested HEK3 +1 T>A and DNMT1 +6 G>C substitutions, spacer and 3′ extension sequences were taken from a previous publication (HEK3_4a_1TtoA and DNMT1_ED5f_6GtoC, respectively)25, the modified scaffold sequence was 5′-GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (ref. 66) and the RNA structural motif for epegRNAs was tevopreQ1 (5′-CGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAA)41. pegRNAs and epegRNAs used the pU6-sgRNA-EF1Alpha-puro-T2A-BFP (Addgene no. 60955)29 backbone. Cloning details for these guides were described previously40.

To create a backbone plasmid suitable for use in cloning our self-targeting epegRNA libraries, an intermediate backbone plasmid (pJY126) was first generated by removing BsmBI restriction sites on pU6-sgRNA-EF1Alpha-puro-T2A-BFP (Addgene no. 60955)29 through Golden Gate Assembly (NEB E1602S). Then, through restriction cloning, a DNA duplex annealed from DNA oligos (5′-TTGGGAGACGCCTGCAGGCTGCTAAGCTAGGCGCGCCCGTCTCATTTTTTTC, 5′-TCGAGAAAAAAATGAGACGGGCGCGCCTAGCTTAGCAGCCTGCAGGCGTCTCCCAACAAG) was inserted into pJY126 digested with BstXI (NEB R0113S) and XhoI (NEB R0146S). This intermediate backbone (pJY127) was then digested with BamHI (NEB R0136S) and NotI (NEB R0189S), and a DNA duplex annealed from DNA oligos (5′-GATCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGC, 5′-GGCCGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTG) was inserted through restriction cloning to produce the final pAC025 backbone plasmid.

To create a backbone plasmid suitable for use in cloning StopPR (lAC002) with a tevopreQ1 motif41, we first inserted a DNA duplex annealed from DNA oligos (5′-CGCGCCCGTCTCACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAATTTTTTTC, 5′-TCGAGAAAAAAATTCTAGTTGGTTTAACGCGTAACTAGATAGAACCGCGTGAGACGGG) into pJY127 digested with AscI (NEB R0558S) and XhoI. This intermediate backbone (pJY128) was then digested with BamHI and NotI, and a DNA duplex annealed from DNA oligos (5′-GATCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGC, 5′-GGCCGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTG) was inserted through restriction cloning to produce the final pAC026 backbone plasmid.

For endogenously validating epegRNA hits from StopPR (lAC002), we used pAC025 as the backbone to construct the individual plasmids encoding ten selected pairs of stop and synonymous epegRNAs (pWY037-056), along with three no edit (pWY057-059) and two nontargeting epegRNAs (pWY060, pWY061) as negative controls. Gene fragments from Twist Bioscience were assembled into pAC025 digested with XbaI (NEB R0145S) and XhoI to produce pWY037-046 and pWY057-061, and gBlocks from IDT were assembled into pAC025 digested with BstXI and XhoI to produce pWY047-056 using NEB Hifi DNA assembly (E2621S).

Method details

Western blot for prime editor and MLH1

Cells were collected from cell culture (1 × 10⁴ cells per µl) and lysed in 1× lysis buffer (1× NuPage LDS, 50 mM sample reducing agent). After resuspension by vortexing, samples were incubated at 70 °C for 10 min, and the temperature was then raised to 85 °C for 3 min. After incubation, samples were moved to room temperature and Benzonase Mix (final concentration 5 mM MgCl2, 1.25 U µl−1 benzonase) was added. Samples were then incubated at 37 °C for 30 min and subsequently used for protein electrophoresis. Samples (1 × 10⁵ cells) were loaded and run on 3–8% Tris-Acetate Gels (ThermoFisher EA0375BOX) in Running Buffer (1× NuPage tris-acetate running buffer, 2.5× NuPage antioxidant) at 180 V until completion. Proteins were then transferred to an ethanol-activated polyvinylidene difluoride membrane (BioRad 1620177) in transfer buffer (1× NuPage transfer buffer, 10% methanol, 2.5× NuPage antioxidant, 0.025% SDS) at 30 V for 1 h. Protein transfer and total protein content were assessed by Ponceau staining (Sigma Aldrich P7170-1L), and the membrane was carefully cut into three strips (<65, 65–120, >120 kDa) to blot separately for each protein of interest. Ponceau stain was washed out with 1× tris-buffered saline with Tween (TBST), and the membranes were then incubated in blocking buffer (1× TBST and 5% dry milk) for 1 h at room temperature. Membranes were then incubated overnight on a shaker at 4 °C in primary antibodies (β-actin CST3700S; MLH1 Invitrogen MA5-32041; Cas9 Takara 632607) diluted 1:1,000 in 1× TBST with 3% BSA, washed 3× in 1× TBST for 5 min and then incubated in secondary antibody (1× Li-COR intercept buffer, 1:20,000 IRDye secondary antibodies: goat antimouse LI-COR BioScience 926-32210 and goat antirabbit LI-COR BioScience 926-68071) for 1 h at room temperature in the dark. Before imaging on a Li-COR Odyssey Infrared Imaging system, membranes were washed 3× in 1× TBST for 5 min.

Oligonucleotide library designs

Self-targeting +5 G>H library (lDS004)

Here, 640 target sites in human protein-coding genes were randomly selected from ‘library 1’ in ref. 52 and the corresponding highest-efficiency RTT/PBS length combination was determined for each selected site. We then designed three epegRNAs per target site with the selected PBS and identical or nearly identical RTT sequence, each specifying a +5 G>A, G>T or G>C edit. With the addition of 22 positive control epegRNAs for sites tested endogenously in the literature, 51 nontargeting controls (with a scrambled target site sequence) and seven no edit controls (with epegRNAs specifying the reference sequence), the final library of 2,000 epegRNA–target pairs tests seven PBS lengths (7, 9, 11, 13, 14, 15 and 17 nt), nine RTT lengths (10, 11, 12, 13, 14, 15, 17, 20 and 22 nt) and all three G>H mutations at the +5 position (Supplementary Table 1).
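The stated library composition can be checked with a short tally. The following is a minimal sketch in Python; the variable and function names are illustrative and are not taken from the authors' code.

```python
# Composition of the +5 G>H library (lDS004) as stated in the text.
TARGET_SITES = 640
EDITS_PER_SITE = ("G>A", "G>T", "G>C")  # the three +5 G>H substitutions
POSITIVE_CONTROLS = 22
NONTARGETING_CONTROLS = 51
NO_EDIT_CONTROLS = 7


def library_size():
    """Total number of epegRNA-target pairs implied by the design."""
    experimental = TARGET_SITES * len(EDITS_PER_SITE)
    controls = POSITIVE_CONTROLS + NONTARGETING_CONTROLS + NO_EDIT_CONTROLS
    return experimental + controls


print(library_size())
```

The 1,920 experimental pairs plus 80 controls account for the reported total of 2,000 pairs.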

epegRNA sequences and accompanying target sites were synthesized as 250 nt oligonucleotides by Twist Bioscience. Oligonucleotides were structured with adapter sequences on both ends for library amplification, specifically 5′-GTATCCCTTGGAGAACCACCT on the 5′ end and 5′-CAGACGTGTGCTCTTCCGAT on the 3′ end, with internal BstXI (5′-CCACCTTGTTGG) and BamHI (5′-GGATCC) restriction enzyme sites surrounding epegRNA components (19 nt sgRNA and 17–39 nt extension sequences, 37 nt tevopreQ1 (ref. 41) and 7 nt polyT), 17 nt barcodes unique to each epegRNA–target pair and 45 nt target sites, with reversed BsmBI restriction enzyme sites (5′-GTTTAGAGACGGCATGCCGTCTCGGTGC) splitting the sgRNA target sequence from the remainder of designed components to facilitate a two-step cloning process. Target sites were designed to include 4 nt upstream of the protospacer sequence in addition to the PAM and full RTT binding site.

Self-targeting Tiled edits library (lRM001)

Five target sites within the mouse ZRS enhancer were selected based solely on their proximity to transcription factor binding sites of interest55 and the presence of a 5′-NGG-3′ sequence. epegRNAs were designed to encode all possible single nucleotide variants within a specific positional range at these targets, which overall spanned the +1 position (relative to the nick site) to the +6 to +21 position depending on the target site. Twenty epegRNAs were designed for each unique edit (that is, target site, edit position, substitution type combination) with PBS lengths of 6, 9, 12 and 15 nt and ‘homology flap’ lengths (the region 5′ of the encoded edit within the RTT that facilitates incorporation of the 3′ flap after reverse transcription) of 0, 4, 8, 12 and 16 nt. epegRNAs encoding the unedited sequence with a PBS length of 7 nt and RTT length of 13 nt were included for each target site as negative controls. Additionally, previously validated epegRNA designs encoding single nucleotide variant edits in Rnf2 and Hoxd13 were included as positive controls25,56 for a total library size of 3,745 constructs (Supplementary Table 2). For each epegRNA design, the corresponding target site (forward direction, −1 nt 3′ of protospacer to +29 nt 5′ of protospacer, 50 nt total) was included in the oligonucleotide construct along with a unique 12 bp barcode and necessary cloning and PCR sites as in the +5 G>H self-targeting library (lDS004, above). epegRNA–target site constructs, ranging in size from 200–250 nt, were synthesized by Twist Bioscience.

StopPR (lAC002)

A set of 1,247 genes were nominated for inclusion in StopPR due to their determined status as common essential genes by DepMap57. CRISPick58 was used to design 35 sgRNAs targeting each gene using reference genome Human GRCh38 (GRCh38.p13; National Center for Biotechnology Information (NCBI) Refseq GCF_000001405.39) with CRISPRko and SpyoCas9 options, which were then filtered to 16,278 sgRNA target sequences with on-target efficacy scores >0.5. Ensembl Biomart67 was used to obtain exon coordinates, coding sequences and full genomic regions for each target gene. Codons accessible to each protospacer that could be mutated to stop codons with 1, 2 or 3 bp mutations were identified, then any edits that could not be targeted with prime editing were removed, which could occur if the edit occurred at a position upstream of the Cas9(H840A) nick. For each targeted codon, mutations inducing a synonymous amino acid change (such as mutating the codon ACA to ACG, both encoding threonine) were also identified, and codons where a synonymous mutation could not be introduced were filtered, including the removal of all tryptophan codons, as only one codon sequence produces it. For each edit, we designed accompanying PBS (11, 13 nt) and RTT (10, 12, 15, 20 nt) sequences, and filtered any combinations that would result in a too-long oligonucleotide for synthesis.
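The codon-level selection logic described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and hypothetical helper names (`stop_edits`, `synonymous_codons`, `is_targetable`); it does not model protospacer accessibility or the position of the edit relative to the nick, and it is not the authors' implementation.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def stop_edits(codon):
    """Map each stop codon to the number of bp changes (1-3) needed to reach it."""
    return {stop: hamming(codon, stop) for stop in STOP_CODONS if stop != codon}


def synonymous_codons(codon):
    """Other codons encoding the same amino acid as `codon`."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


def is_targetable(codon):
    """Sense codon for which a matched synonymous control can be designed."""
    return CODON_TABLE[codon] != "*" and bool(synonymous_codons(codon))


print(stop_edits("CAG"))
print(synonymous_codons("ACA"))
print(is_targetable("TGG"))
```

In this sketch, tryptophan (TGG) fails `is_targetable` because no synonymous codon exists, which matches the filter described in the text; under the standard code the same rule would also exclude ATG.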

epegRNA sequences were then designed into 120-nt oligonucleotides with flanking 5′ (5′-CACCAGAAGCCACCTTGTTG) and 3′ (5′-CTGTGTTGGTCTCCCGCG) amplification regions containing BstXI and BsaI restriction enzyme sites for synthesis by Twist Bioscience. sgRNA and extension sequences were split by reversed BsmBI restriction enzyme sites (5′-GTTTAGAGACGGCATGCCGTCTCGGTGC) to enable a two-step cloning process. Finally, oligonucleotides that contained incidental restriction enzyme sites or homopolymer T runs (5+) were removed. In addition, 12,000 epegRNAs designed to introduce no edits and 3,000 epegRNAs containing scrambled nontargeting spacer sequences were included, generating a library of ~240,000 epegRNAs (Supplementary Tables 3 and 4). Notably, during later analysis, an updated design filter identified a small number of epegRNAs with erroneous features (580 pairs of spacer- and codon-matched stop and synonymous epegRNAs for which either epegRNA was affected). These were removed before analysis (excluded epegRNAs are indicated with ‘no’ in the ‘included_in_analysis’ column in Supplementary Tables 3 and 4).
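The final sequence filter can be sketched as follows. This Python example is illustrative only: the set of enzymes screened (BsmBI and BsaI here) and the function name `passes_filters` are assumptions, and the check is meant to be applied to the designed spacer and extension sequences rather than to the full oligonucleotide, which contains cloning sites by design.

```python
import re

# Recognition sequences on both strands. Which enzymes were screened for
# incidental sites is an assumption made for this sketch.
INCIDENTAL_SITES = {
    "BsmBI": ("CGTCTC", "GAGACG"),
    "BsaI": ("GGTCTC", "GAGACC"),
}
POLY_T = re.compile(r"T{5,}")  # homopolymer T runs of five or more


def passes_filters(designed_sequence):
    """Return True if the sequence has no T run (5+) and no incidental site."""
    seq = designed_sequence.upper()
    if POLY_T.search(seq):
        return False
    return not any(
        site in seq for pair in INCIDENTAL_SITES.values() for site in pair
    )


print(passes_filters("ACGTACGTTTTACG"))
print(passes_filters("ACGTTTTTACG"))
```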

Cloning of epegRNA libraries

Self-targeting +5 G>H library (lDS004)

A two-step cloning process was used. First, the Twist oligo pool was PCR amplified using Phusion Plus polymerase (ThermoFisher F630S), 0.5 μM forward primer (5′-GTATCCCTTGGAGAACCACCT), 0.5 μM reverse primer (5′-CAGACGTGTGCTCTTCCGAT) and 0.1 pmol resuspended oligo pool with the following conditions: one cycle of 1 min at 98 °C; 15 cycles of 15 s at 98 °C, followed by 15 s at 60 °C, followed by 45 s at 72 °C; one cycle of 10 min at 72 °C; 10 °C hold. PCR products were purified using the Macherey-Nagel NucleoSpin Gel and PCR Clean-up kit (740609.50) as per the manufacturer’s protocol and quantified via Nanodrop. Vector backbone pAC025 was subjected to a BstXI-BamHI double restriction digest, followed by column clean-up. NEB Hifi DNA assembly was used to assemble the amplified library pool and digested vector in a 1:3 vector:insert ratio at 50 °C for 1 h. After SPRI purification, assembled products were transformed into electrocompetent cells (Endura, 60242-1) using a MicroPulser (BioRad). SOC media was added (for a total of 1.2 ml) and the transformation mixture was incubated at 37 °C for 1 h. The cells were then grown for 14 h at 37 °C in a 500 ml culture with Luria-Bertani medium and 100 μg ml−1 carbenicillin, and plasmids were extracted from the resulting cultures. To assess intermediate library coverage and quality, epegRNA cassettes and target regions were amplified for validation sequencing using flanking 5′ primer (5′-AATGATACGGCGACCACCGAGATCTACACGCACAAAAGGAAACTCACCCT) and 3′ indexing primer (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC) with the following program: one cycle of 30 s at 98 °C; ten cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 20 s at 72 °C; one cycle of 2 min at 72 °C and 10 °C hold. Sequencing was performed on an Illumina MiSeq at 500× coverage (‘Sequencing’ section).
Notably, sequencing revealed that epegRNA identities and their accompanying target regions with barcodes became uncoupled in ~15% of reads, which we hypothesize may be due to the substantial homologous portions within and between each oligo. These uncoupled epegRNA–target site pairs were filtered from downstream analysis (‘Analysis of prime editing efficiencies’ section).

To complete the cloning, the intermediate library was digested with Esp3I enzyme (NEB R0734S) at 37 °C for 6 h and gel purified. The epegRNA scaffold sequence (5′-GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC)66 was synthesized with flanking reversed Esp3I sites (5′-CGTCTCGGTTT and 5′-GTGCTGAGACG) as a gene fragment by IDT and amplified by PCR using Phusion polymerase, 0.5 μM forward primer (5′-TCACAACTACACCAGAAGCCAC), 0.5 μM reverse primer (5′-GCTGGCAACACTTTGACGAAGA) and 0.1 pmol resuspended gene fragment with the following program: one cycle of 30 s at 98 °C; 25 cycles of 10 s at 98 °C, followed by 10 s at 58 °C, followed by 15 s at 72 °C; one cycle of 5 min at 72 °C and 10 °C hold. The amplified scaffold was purified by column clean-up and digested with Esp3I at 37 °C for 6 h. After column clean-up, the purified scaffold insert (2 ng) was ligated with the digested initial plasmid library vector (200 ng) using T4 DNA Ligase (NEB M0202S) at 16 °C overnight. After SPRI purification, ligated products were transformed into Endura electrocompetent cells as above. Final library quality was assessed via sequencing as above, with 90% of library elements occurring within a 6.1× range and a Gini coefficient of 0.26 (Extended Data Fig. 2a).
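The two library-quality metrics reported here can be computed from per-element read counts. The sketch below is one plausible implementation in Python; the interpretation of "90% of library elements occurring within an N× range" as the ratio of the 95th to the 5th percentile count is an assumption, as are the function names.

```python
def gini(counts):
    """Gini coefficient of read counts (0 = perfectly even representation)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("counts must be non-empty with a positive sum")
    # Rank-weighted sum over the sorted counts (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def fold_range_90(counts):
    """Ratio of the 95th to the 5th percentile count (assumed definition)."""
    xs = sorted(counts)
    low = xs[int(0.05 * (len(xs) - 1))]
    high = xs[int(0.95 * (len(xs) - 1))]
    return float("inf") if low == 0 else high / low


even = [100] * 1000
print(gini(even), fold_range_90(even))
```

A perfectly even library gives a Gini coefficient of 0 and a fold range of 1; skewed or dropout-heavy libraries push both values upward.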

Self-targeting Tiled edits library (lRM001)

Amplification and cloning of lRM001 was performed as described above for lDS004. The final cloned library was assessed via sequencing and reported a Gini coefficient of 0.30.

StopPR (lAC002)

As with the construction of lDS004, we used a two-step cloning process. First, the Twist oligo pool was PCR amplified using Phusion HSII HF (ThermoFisher, F565S), 0.4 μM forward primer (5′-CACCAGAAGCCACCTTGTTG), 0.4 μM reverse primer (5′-CTGTGTTGGTCTCCCGCG) and 10 ng resuspended oligo pool with the following program: one cycle of 30 s at 98 °C; six cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 10 s at 72 °C; one cycle of 5 min at 72 °C; 10 °C hold. Products from multiple PCR reactions were aggregated and purified using SPRI. Vector backbone pAC026 was subjected to a BstXI-BlpI (NEB R0585S) double digest at 37 °C for 4 h followed by SPRI purification, BsmBI-v2 (NEB R0739S) digest at 55 °C for 6 h and final SPRI purification. Amplified oligo pool was double digested with BstXI and BsaI-v2 (NEB R3733S) restriction enzymes at 37 °C for 4 h and purified through column clean-up. Digested oligo pool and vector backbone were ligated using T4 DNA Ligase at room temperature for 45 min and purified using SPRI. Transformation using electrocompetent Endura cells proceeded as described above, and library quality was assessed via sequencing. epegRNA cassettes were amplified for validation sequencing using primers as above for lDS004. Sequencing was performed on an Illumina NovaSeq at 600× coverage (‘Sequencing’ section).

To complete the cloning, the intermediate library was digested with BsmBI-v2 enzyme at 55 °C for 4 h and SPRI purified. PCR amplification and purification of the epegRNA scaffold proceeded as above. Purified PCR product was digested with BsmBI-v2 at 55 °C overnight, followed by SPRI purification. The purified scaffold insert (2 ng) was ligated with the digested intermediate plasmid library vector (200 ng) using T4 DNA Ligase at room temperature for 45 min. After SPRI purification, ligated products were transformed into Endura electrocompetent cells and final library quality was assessed via sequencing as above. StopPR exhibited moderate skew resulting from missing elements (Gini coefficient of 0.35, with 90% of analyzed library elements present within a 57× range). After filtering lowly represented epegRNAs (‘Analysis of epegRNA phenotypes’ section), we retained 84% of originally designed epegRNAs with well-distributed representation (Gini coefficient of 0.26, 90% of analyzed library elements present within a 5× range).

Production of lentivirus

Lentivirus production was performed for each library using a similar process. Lenti-X 293T cells (14 × 10⁶) were seeded in a 150-mm cell culture dish with DMEM. Plasmids pALD-Rev-A (1 μg, Aldevron), pALD-GagPol-A (1 μg, Aldevron), pALD-VSV-G-A (2 μg, Aldevron) and the transfer vector (15 μg) were mixed with Opti-MEM I Reduced Serum Medium (Gibco, 31985070) and TransIT-LT1 (Mirus MIR 2300) transfection reagent, and cotransfected into cells. At 12–14 h post transfection, 1× ViralBoost reagent (ALSTEM VB100) was added to cells, and at 48 h post transfection, lentivirus-containing supernatant was collected and stored at −80 °C. To determine viral titer, serial dilutions of virus (0–500 μl) were transduced into K562 cells with 8 μg ml−1 polybrene. Titer was calculated 48 h post transduction based on the percentage of cells expressing blue fluorescent protein (BFP).

Endogenous site editing

HEK3 and DNMT1

The lentiviral pegRNAs and epegRNAs (tevopreQ1) targeting HEK3 and DNMT1 endogenous sites were transduced separately, each into a total of 0.6 × 10⁶ cells for PE2, PEmax and PEmaxKO stable cell lines in triplicate, at an MOI of 0.7. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene (Santa Cruz Biotechnology, sc-134220) before incubating in a humidified incubator. Puromycin (Goldbio, P-600-100) was added 72 h post transduction to deplete untransduced cells. Cells were kept at a minimum of 2.5 × 10⁷ cells per replicate, at a density of 0.5–1.0 × 10⁶ cells per ml (splitting as necessary). Editing lasted for 28 days post transduction, with timepoint samples collected at days 7, 14, 21 and 28. Genomic DNA (gDNA) was extracted from collected K562 cells by first treating with lysis buffer (10 μM Tris-HCl, pH 7.5; 0.05% SDS; 25 μg ml−1 Proteinase K), then by incubating at 37 °C for 90 min followed by heat inactivation at 80 °C for 30 min.

Endogenous sites were amplified from gDNA using a two-step PCR. First, flanking 5′ and 3′ primers were used to amplify HEK3 and DNMT1 genomic sites. HEK3 was amplified with flanking 5′ primer (5′-CGCCCATGCAATTAGTCTATTTCTGC) and 3′ primer (5′-CTCTGGGTGCCCTGAGATCTTTT), with the following program: one cycle of 2 min at 98 °C; 32 cycles of 10 s at 98 °C, followed by 20 s at 69 °C, followed by 30 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. DNMT1 was amplified with flanking 5′ primer (5′-CACAACAGCTTCATGTCAGCCAAG) and 3′ primer (5′-CGTTTGAGGAGTGTTCAGTCTC), with the following program: one cycle of 2 min at 98 °C; 32 cycles of 10 s at 98 °C, followed by 20 s at 66 °C, followed by 30 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. Resulting PCR1 products were SPRI purified using 1.0× reactions. Then, 5′ (5′-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGAC) and 3′ (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC) indexing primers were used to amplify purified PCR1 products, with the following program: one cycle of 2 min at 98 °C; eight cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 30 s at 72 °C; one cycle of 2 min at 72 °C and 10 °C hold. Sequencing was performed on an Illumina MiSeq at 50,000× coverage (‘Sequencing’ section).

StopPR validation sites

To individually test negative growth phenotypes for hits from our StopPR (lAC002) screen, the lentiviral epegRNAs targeting the respective loci were each transduced separately into a total of 0.3 × 10⁶ PEmaxKO cells in 24-well plates in triplicate, at an average MOI of 0.9. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene before incubating in a humidified incubator. The cells were cultured for 14 days (splitting as necessary). To validate editing efficiency at these same loci, we also transduced cells at an average MOI of 1.5. Cells were selected with 3 μg ml−1 puromycin at 72 h post transduction. At 7 days post transduction, gDNA was extracted from collected cells as above for HEK3 and DNMT1, with heat inactivation for 45 min.

Endogenous sites were amplified from gDNA using a two-step PCR. First, ten sets of flanking 5′ and 3′ primers were used to amplify the targeted genomic sites using NEBNext Ultra II Q5 Master Mix (NEB, M0544) with the following program: one cycle of 30 s at 98 °C; 31 cycles of 10 s at 98 °C, followed by 20 s at 58 °C, followed by 40 s at 72 °C; one cycle of 2 min at 72 °C and 4 °C hold. Then, 5′ (5′-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGAC) and 3′ (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC) indexing primers were used to amplify PCR1 products by Phusion U Green Multiplex PCR Master Mix (Thermo Scientific, F564L), with the following program: one cycle of 2 min at 98 °C; nine cycles of 10 s at 98 °C, followed by 20 s at 61 °C, followed by 40 s at 72 °C; one cycle of 2 min at 72 °C and 4 °C hold. The resulting PCR2 products were pooled and gel purified. Sequencing was performed on an Illumina MiSeq (‘Sequencing’ section).

Pooled screening

Self-targeting +5 G>H screen (lDS004)

The lentiviral library was transduced into a total of 5 × 10⁷ cells for both PEmax and PEmaxKO stable cell lines in replicate, at an MOI of 0.7 to achieve >10,000× coverage of the number of epegRNA–target pairs. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene before incubating in a humidified incubator with agitation (multitron). Then, 1 μg ml−1 puromycin was added 72 h post transduction to deplete untransduced cells. To maintain coverage, cells were kept at a minimum of 2.5 × 10⁷ cells per replicate (>10,000× coverage), at a density of 0.5–1.0 × 10⁶ cells per ml (splitting as necessary). Screening lasted for 28 days post transduction, with timepoint samples (12,500–25,000× representation) collected at days 7, 14, 21 and 28. gDNA was extracted from collected K562 cells using the NucleoSpin Blood XL kit (Macherey Nagel, 740950.50). Subsequently, gDNA was treated with RNase A and purified by ethanol precipitation. epegRNA–target cassettes were PCR amplified using 5′ flanking primer (5′-AATGATACGGCGACCACCGAGATCTACACGCACAAAAGGAAACTCACCCT) and 3′ indexing primer (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC). Each 100 μl reaction contained 10 μg of gDNA, 1 μM primers and 50 μl of NEBNext Ultra II Q5 Master Mix, and was run with the following program: one cycle of 1 min at 98 °C; 22 cycles of 10 s at 98 °C, followed by 30 s at 67 °C, followed by 45 s at 72 °C; one cycle of 5 min at 72 °C and 10 °C hold. Resulting PCR products from each sample were pooled and SPRI purified using 0.85–0.56× double-sided reactions.
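The relationship between cell number, MOI and library coverage can be sketched as below. This Python example assumes that integrations per cell are Poisson distributed, a common approximation that is not stated in the text; the function names are hypothetical.

```python
import math


def fraction_transduced(moi):
    """Expected fraction of cells with at least one integration (Poisson assumption)."""
    return 1.0 - math.exp(-moi)


def expected_coverage(cells, moi, library_size):
    """Expected number of transduced cells per library element."""
    return cells * fraction_transduced(moi) / library_size


# Values from the lDS004 screen: 5e7 cells, MOI 0.7, 2,000 epegRNA-target pairs.
print(round(fraction_transduced(0.7), 3))
print(round(expected_coverage(5e7, 0.7, 2000)))
```

Under this assumption roughly half of the cells are transduced at an MOI of 0.7, which is consistent with the stated >10,000× coverage. The maintenance minimum of 2.5 × 10^7 cells per replicate corresponds to 12,500 cells per element for a 2,000-element library.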

Self-targeting Tiled edits screen (lRM001)

For the self-targeting, Tiled edits epegRNA screen, the prepared lentiviral library was transduced into a total of 3 × 10⁷ PEmaxKO cells at an MOI of 0.2 to achieve >1,000× coverage of the number of epegRNA–target pairs. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene, split into two separate 150 ml cultures and maintained in a humidified incubator with agitation (multitron) at 37 °C, 5% CO2 and 60 rpm. At 72 h post transduction, 2 μg ml−1 puromycin was introduced to deplete untransduced cells and replenished at each subsequent passage. To maintain >1,000× coverage, cells were kept at a density of 0.1–1.0 × 10⁶ cells per ml in 150–200 ml cultures, splitting as necessary. Cell populations were sampled at days 5, 8 and 10 post transduction, collecting 50 × 10⁶–75 × 10⁶ cells per replicate at each timepoint (>1,000× representation). Nucleic acids were extracted from collected PEmaxKO cells using the NucleoSpin Blood XL kit, treated with RNase A and SPRI purified to obtain purified gDNA. epegRNA–target cassettes were PCR amplified and indexed as described for the initial self-targeting screen (lDS004).

StopPR screen (lAC002)

The lentiviral library was transduced into a total of 4.1 × 10⁸ cells for the PEmaxKO stable cell line in replicate, at an MOI of 0.7 to achieve >500× coverage of the number of epegRNAs. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene before incubating in a humidified incubator with agitation (multitron). Next, 1 μg ml−1 puromycin was added 72 h post transduction to deplete untransduced cells. To maintain coverage, cells were kept at a minimum of 4.5 × 10⁸ cells per replicate (>1,500× coverage), at a density of 0.5–1.0 × 10⁶ cells per ml (splitting as necessary). Screening lasted for 28 days post transduction, with timepoint samples (1,250–2,000× representation) collected at days 7, 14 and 28. gDNA extraction and PCR amplification of epegRNA cassettes proceeded as above, under the following conditions: one cycle of 30 s at 98 °C; 22 cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 20 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. Resulting PCR products from each sample were pooled and SPRI purified using 0.85–0.56× double-sided reactions.

Sequencing

Endogenous sites

Sequencing was performed on an Illumina MiSeq with 10% phiX spike-in with single reads: I1 = 8 nt, i7 index read; I2 = 8 nt, i5 index read; R1 = 300 nt, endogenous sequence. Standard Illumina primers were used for all reads.

Self-targeting +5 G>H screen (lDS004)

Sequencing was performed on an Illumina MiSeq with 5% phiX spike-in with paired-end reads: I1 = 6 nt, i7 index read; I2 = 0 nt, i5 index read; R1 = 144 nt, epegRNA spacer and extension; R2 = 68 nt, target sequence and barcode. Custom primers were used for R1 (5′-GTGTGTTTTGAGACTATAAGTATCCCTTGGAGAACCACCTTGTTG), and standard Illumina primers were used for remaining reads.

Self-targeting Tiled edits screen (lRM001)

Sequencing was performed on an Illumina NovaSeq with 5% phiX spike-in with paired-end reads: I1 = 8 nt, i7 index read; I2 = 0 nt, i5 index read; R1 = 220 nt, epegRNA spacer and extension; R2 = 88 nt, target sequence and barcode. Custom primers were used for R1 as in sequencing of lDS004, and standard Illumina primers were used for remaining reads.

StopPR screen (lAC002)

Sequencing was performed on an Illumina NovaSeq with 25% phiX spike-in with paired-end reads: I1 = 8 nt, i7 index read; I2 = 0 nt, i5 index read; R1 = 28 nt, epegRNA spacer; R2 = 102 nt, epegRNA extension. Custom primers were used for R1 as in sequencing of lDS004, and standard Illumina primers were used for remaining reads.

All sequencing reads were demultiplexed through HTSEQ (Princeton University High Throughput Sequencing Database (https://htseq.princeton.edu/; v.13.13.15)).

Statistical analysis

Analysis of prime editing efficiencies

Endogenous sites

To analyze sequencing data, we first used CRISPRessoBatch68 to align reads to reference endogenous sequences (inputted as amplicon_seq) based on spacer sequences (inputted as guide_seq). Both min_average_read_quality and min_bp_quality_or_N arguments were set to 30, otherwise default parameters were used. The CRISPRessoBatch quantification window was positioned to include 25 nt on both sides of the Cas9(H840A) nick site (50 nt total window size), which ensured analysis of at least 10 nt past the RTT length for all sites. Custom Python scripts were used to further process aligned reads from CRISPRessoBatch (contained in allele frequency tables): First, to account for the presence of inferred single-nucleotide polymorphisms at the endogenous targets in K562 cells, we allowed either A/G at the position 11 nt upstream of the nick site and either A/G at the position 9 nt downstream of the nick site for the HEK3 reference, and for the DNMT1 reference, we allowed either A/G at the position 3 nt upstream of the nick site. Second, we also considered nucleotides assigned to ‘N’ by CRISPRessoBatch, which likely arise due to sequencing errors, as reference nucleotides. We then collapsed reads into alignment bins accordingly. Reads were classified as either precise edit (only variant was the intended edit), no edit (same as reference sequence) or error (contained a variant that was not the intended edit), and reported efficiencies describe the percentage of: number of reads with the classified edit/number of reads that align to the amplicon.
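The post-processing of allele frequency tables can be illustrated with a minimal Python sketch. It assumes substitution edits and alleles of the same length as the reference; the data structures (a dictionary of tolerated alleles keyed by 0-based position) and function names are hypothetical and do not reproduce the authors' custom scripts.

```python
def normalize(allele, reference, tolerated):
    """Treat 'N' calls and tolerated SNP alleles as the reference base."""
    out = []
    for i, (base, ref) in enumerate(zip(allele, reference)):
        if base == "N" or base in tolerated.get(i, ""):
            out.append(ref)
        else:
            out.append(base)
    return "".join(out)


def classify(allele, reference, edited, tolerated=None):
    """Classify an aligned allele as 'no edit', 'precise edit' or 'error'."""
    tolerated = tolerated or {}
    if len(allele) != len(reference):
        return "error"  # indels are never the intended edit in this sketch
    norm = normalize(allele, reference, tolerated)
    if norm == reference:
        return "no edit"
    if norm == normalize(edited, reference, tolerated):
        return "precise edit"
    return "error"


def efficiencies(allele_counts, reference, edited, tolerated=None):
    """Percentage of aligned reads in each class."""
    totals = {"no edit": 0, "precise edit": 0, "error": 0}
    for allele, count in allele_counts.items():
        totals[classify(allele, reference, edited, tolerated)] += count
    aligned = sum(totals.values())
    return {label: 100.0 * n / aligned for label, n in totals.items()}


reference, edited = "AAGTACGT", "AAGAACGT"  # toy T>A substitution
snps = {1: "AG"}                            # either A or G accepted at index 1
print(classify("AGGAACGT", reference, edited, snps))
```

Here each percentage is the number of reads in a class divided by the number of reads aligned to the amplicon, as described in the text.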

Self-targeting +5 G>H screen (lDS004)

This self-targeting screen was analyzed using a three-stage pipeline:

In the first stage, each read was assigned to an epegRNA identity (unique to each epegRNA–target pair) by aligning components of the epegRNA (contained on read 1) and target (contained on read 2) to reference sequences (that is, spacer through the end of epegRNA extension for read 1, target sequence through the barcode for read 2) using bowtie2. Read pairs with low mapping quality (≤5) or with recombination between the two reads were removed, and remaining reads were assigned to groups based on their epegRNA identities to enable parallel processing.

In stage two, the 45 nt target sites for each epegRNA–target pair were extracted, collapsed and analyzed to determine observed editing outcomes. First, we extracted the part of the read that matched the reference target site with at least 60% of bases. As we have a 45 nt target site, outcomes with 18 or more nucleotide differences from the reference would have been discarded (defining an upper limit on observed indel lengths). Next, barcodes were extracted from reads by identifying the portion of reads that matched the expected barcode with no more than eight mismatches, then any reads with errors in the barcodes (three or more mismatched bases) were filtered to ensure that target sites matched epegRNA identities. Then, reads were collapsed to ‘outcomes’ by identifying all reads with the same sequence. Outcomes that occurred at very low frequencies (<0.1% or ten total reads, whichever was higher) were filtered. We reasoned that the latter set of outcomes likely represented PCR or sequencing errors rather than edits introduced by prime editing. To deal with other outcomes likely containing systematic errors from low sequencing quality, we developed and applied the following algorithm: for each outcome, the mean sequencing quality score was calculated at each base; if the average quality was below 15 and the base did not match the reference sequence, it was corrected. This process was used sparingly, correcting a median of 33 reads per epegRNA–target pair across all four time points. After base correction, outcomes were globally aligned to their reference target sites and variants (substitutions, insertions and deletions) were called for each outcome. Each outcome was associated with zero (reference, no edits made) or more variants and classified as no edit (same as reference), precise edit (only variant is the intended edit) or error (contains a variant that is not the intended edit).

In stage three, all outcomes associated with individual epegRNA identities across all time points were aggregated into one file and the resulting individual files were concatenated for analysis. Any pairs with fewer than 50 reads at any of the four collected time points were removed from analysis, with a unified set of epegRNA–target pairs analyzed for both cell lines.
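As a minimal sketch of this coverage filter, assume read counts are held as one list of four time-point counts per epegRNA–target pair (a hypothetical data layout), and that the "unified set" is the intersection of pairs passing the filter in both cell lines (our interpretation).

```python
def filter_pairs(counts_by_pair, min_reads=50):
    """Return the pairs with at least min_reads reads at every time point."""
    return {
        pair
        for pair, counts in counts_by_pair.items()
        if all(n >= min_reads for n in counts)
    }


def unified_pairs(*per_cell_line_counts):
    """Pairs that pass the coverage filter in every cell line (assumed intersection)."""
    kept = [filter_pairs(counts) for counts in per_cell_line_counts]
    return set.intersection(*kept)


if __name__ == "__main__":
    pemax = {"p1": [100, 80, 60, 50], "p2": [100, 49, 60, 70], "p3": [60, 60, 60, 60]}
    pe2 = {"p1": [55, 55, 55, 55], "p3": [10, 60, 60, 60]}
    print(unified_pairs(pemax, pe2))
```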

Self-targeting Tiled edits screen (lRM001)

Our second self-targeting screen was analyzed using a different, independent analysis pipeline:

For each paired read, the 20 nt epegRNA spacer sequence was extracted as bases 2–21 of R1, and the 12 bp construct barcode was extracted from R3 using the polyT termination sequence as an anchor. Anchoring was necessary because errors introduced during prime editing could alter the barcode’s absolute position in the read but not its position relative to the polyT. Reads with a valid barcode (exact match to reference) and a spacer sequence with a Hamming distance ≤3 from the expected spacer sequence, using the barcode as ground truth, were then assigned to a given construct or design. While a valid barcode was detected in approximately 90% of the total reads per library, only 50% of total reads per library had a correct barcode:spacer match, including in the library prepared directly from the cloned plasmid pool (prepackaging and transduction), possibly due to recombination during PCR. Once a paired read was assigned to a given construct, its R1 sequence was searched for the constant epegRNA scaffold and tevopreQ1 regions by local alignment to their reference sequences using the align.localds() function within the pairwise2 module of Biopython (v.1.78), with scores of 1, −1, −5, −1 and 0 for matches, mismatches, gap initiations, gap extensions and mismatches with ambiguous nucleotides (‘N’s), respectively.
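The barcode-guided assignment step can be sketched as follows. This is an illustrative reimplementation, not the pipeline itself: it assumes the barcode has already been extracted from R3, and the names `assign_construct` and `spacer_by_barcode` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_construct(r1, barcode, spacer_by_barcode, max_mismatches=3):
    """Assign a read to a construct, using the barcode as ground truth.

    Returns the barcode if it is an exact match to a reference barcode and the
    spacer (bases 2-21 of R1, 1-based) is within max_mismatches of the spacer
    expected for that barcode; otherwise returns None.
    """
    expected = spacer_by_barcode.get(barcode)
    if expected is None:
        return None  # barcode is not an exact match to any reference barcode
    spacer = r1[1:21]  # 0-based slice covering 1-based bases 2-21
    if len(spacer) != len(expected):
        return None  # read too short to contain a full spacer
    return barcode if hamming(spacer, expected) <= max_mismatches else None


if __name__ == "__main__":
    reference = {"AACCGGTTAACC": "ACGTACGTACGTACGTACGT"}
    read = "G" + "ACGTACGTACGTACGTACGT" + "GTTTTAGAGC"
    print(assign_construct(read, "AACCGGTTAACC", reference))
```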

The epegRNA 3′ extension (composed of the PBS and RTT) within the read was then extracted by taking the entire sequence between these two alignments. A paired read was only considered valid and used to assess editing rate if the extracted 3′ extension was an exact match (both in sequence and length) to the reference 3′ extension, again using the extracted read barcode as ground truth. Overall, 90% of assigned reads passed this last validation step, although this percentage varied between epegRNA designs with some constructs obtaining <20% valid reads. A minimum threshold of 200 valid reads detected per library was used to filter out poorly represented constructs, resulting in a final list of 3,662 epegRNA designs for which prime editing rates could be assessed. From the list of valid paired reads per construct, editing outcomes were determined by comparing the 50 nt region between bases 6–56 of the R2 sequence to unedited and precisely edited reference sequences. Exact matches (Hamming distance 0) dictated the assignment whereas reads matching neither sequence were classified as containing errors. Due to these stringent requirements and a low frequency of errors introduced during PCR and/or sequencing, a background rate of valid reads containing errors was observed for all constructs as established by sequencing epegRNA–target cassettes from the cloned plasmid pool (prepackaging and transduction): typically ranging between 1 and 10% of valid reads with an average error rate of 3.9%.
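The exact-match outcome classification can be sketched as below. The slice boundaries are an assumption: we take a 50 nt window starting at base 6 of R2 (1-based), and the function names are hypothetical.

```python
def classify_outcome(r2, unedited_ref, edited_ref, start=5, length=50):
    """Classify a valid read by exact match of its target window.

    start is 0-based, so start=5 begins at base 6 of R2; the exact window
    boundaries are an assumption of this sketch.
    """
    window = r2[start:start + length]
    if window == unedited_ref:
        return "unedited"
    if window == edited_ref:
        return "precise edit"
    return "error"


def precise_editing_rate(r2_reads, unedited_ref, edited_ref):
    """Fraction of valid reads classified as precisely edited."""
    calls = [classify_outcome(r, unedited_ref, edited_ref) for r in r2_reads]
    return calls.count("precise edit") / len(calls)


if __name__ == "__main__":
    unedited = "A" * 50
    edited = "A" * 25 + "C" + "A" * 24
    reads = ["GGGGG" + unedited + "TT", "GGGGG" + edited + "TT"]
    print(precise_editing_rate(reads, unedited, edited))
```

Because any read that matches neither reference is counted as an error, PCR and sequencing errors contribute to the error class, which is consistent with the background error rate measured from the plasmid pool.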

Analysis of epegRNA phenotypes

StopPR screen (lAC002)

To analyze deep sequencing data from StopPR, we used custom Python scripts to exactly match sequencing reads to epegRNA spacer and extension sequences. Excluded from reported library numbers and statistics throughout the paper were pairs of spacer- and codon-matched stop and synonymous epegRNAs that did not pass an updated design filter, including pairs for which either epegRNA converted a stop codon to a different stop codon or erroneously specified an edit in a noncoding region (found after updating validation code). These constituted a small minority of epegRNA pairs (580 total; associated epegRNAs are indicated as ‘no’ in ‘included_in_analysis’ column in Supplementary Tables 3 and 4). Notably, this set of excluded epegRNAs included 68 epegRNAs (designed as synonymous) targeting the intronic base directly adjacent to 3′ exon boundaries; this small number of epegRNAs with unintended targets was used in the Supplementary Discussion, ‘Examining a small subset of noncoding epegRNA phenotypes’. Additionally, we filtered any pairs of spacer- and codon-matched stop and synonymous epegRNAs for which either epegRNA had fewer than 200 reads at day 7 (23,024). At day 14 and day 28, a pseudocount of ten was added to all read counts to account for epegRNAs that had fully dropped out of the population. Enrichment of each epegRNA both at t = day 14 and t = day 28 was calculated as follows, where t0 = day 7:

$$\log_{2}\,\mathrm{enrichment}=\log_{2}\left(\mathrm{read\ fraction}_{t}/\mathrm{read\ fraction}_{t_0}\right)$$

Enrichment was then normalized by subtracting the median enrichment of negative control epegRNAs (NC, nontargeting controls), resulting in our final growth phenotype measurement:

$$\mathrm{normalized}\,\log_{2}\mathrm{enrichment}=\mathrm{sample}\,\log_{2}\mathrm{e}-\mathrm{median\ NC}\,\log_{2}\mathrm{e}$$

Phenotypes per epegRNA were averaged across replicates for both day 14 and 28, and all epegRNA phenotypes were converted to Z scores by dividing them by the standard deviation of the nontargeting control epegRNA phenotypes. A phenotype induction cutoff was set as two standard deviations below the mean enrichment of nontargeting controls (that is, a score of Z < −2) based on previous literature7. To determine a per-gene (or gene-level) stop epegRNA growth phenotype, the top two epegRNAs with the absolute largest stop epegRNA phenotypes for each gene were averaged.
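The enrichment, normalization and Z-score calculations above can be sketched for a single replicate as follows. Several details are assumptions of this sketch rather than statements about the scripts used: read fractions at the later time point are computed after adding the pseudocount, the sample standard deviation is used, and the gene-level score ranks epegRNAs by absolute phenotype.

```python
import math
from statistics import mean, median, stdev


def log2_enrichments(counts_t, counts_t0, pseudocount=10):
    """log2(read fraction at t / read fraction at t0) for each epegRNA."""
    adjusted = {k: n + pseudocount for k, n in counts_t.items()}
    total_t = sum(adjusted.values())
    total_t0 = sum(counts_t0.values())
    return {
        k: math.log2((adjusted[k] / total_t) / (counts_t0[k] / total_t0))
        for k in adjusted
    }


def z_scores(enrichment, control_ids):
    """Subtract the median NC enrichment, then divide by the NC standard deviation."""
    center = median(enrichment[k] for k in control_ids)
    normalized = {k: v - center for k, v in enrichment.items()}
    sd = stdev(normalized[k] for k in control_ids)
    return {k: v / sd for k, v in normalized.items()}


def gene_level_phenotype(phenotypes):
    """Mean of the two phenotypes with the largest absolute value (assumed ranking)."""
    top_two = sorted(phenotypes, key=abs, reverse=True)[:2]
    return mean(top_two)


if __name__ == "__main__":
    day7 = {"stop1": 1000, "nc1": 1000, "nc2": 1000, "nc3": 1000}
    day14 = {"stop1": 90, "nc1": 990, "nc2": 1990, "nc3": 490}
    z = z_scores(log2_enrichments(day14, day7), ["nc1", "nc2", "nc3"])
    print({k: round(v, 2) for k, v in z.items()})
```

In the full analysis, phenotypes would first be averaged across replicates before conversion to Z scores.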

StopPR validation

Enrichment of each epegRNA at t = day 14 was calculated as follows, where t0 = day 7:

$$\log_{2}\,\mathrm{enrichment}=\log_{2}\left(\mathrm{BFP}^{+}\,\mathrm{cell\ fraction}_{t}/\mathrm{BFP}^{+}\,\mathrm{cell\ fraction}_{t_0}\right)$$

Phenotypes per epegRNA were averaged across triplicates for day 14.

Multiple linear regression model

To investigate the effects of different epegRNA design choices on phenotypic outcomes in a more informative way than simple feature grouping (which can be confounded by the fact that epegRNAs within any such group unavoidably represent multiple features), we built a multiple linear regression model. First, we restricted our analysis to all stop epegRNAs that targeted a codon where phenotype induction was observed by at least one epegRNA. Subsetting the data in this manner isolated edits for which we had reasonable evidence that edit installation could induce a phenotype. We reasoned that, in these cases, features other than the edit itself would determine differences in phenotype induction. This set of 51,279 stop epegRNAs was used to create a multiple linear regression model with the following features to predict day 14 phenotypes: edit distance from cut site (1–20 bp), edit length (1, 2 or 3 bp), edit installed (174 possibilities as no epegRNA specifying a CCT>TAA edit induced a phenotype), starting codon (59 possibilities), stop codon installed (TAG, TGA, TAA), PBS (11, 13 nt) and RTT (10, 12, 15, 20 nt) length, spacer orientation relative to gene (sense or antisense), edit location within gene body (0–100%) and edit located within last exon of transcript (yes or no). Discrete features (starting codon, stop codon installed, substitution type, spacer orientation, last exon) were given numerical encodings through the use of tenfold target encoding that, together with the coefficients from the resulting model, enabled a ranking of the relative importance of each category within the different features. We opted to use a target encoding approach to keep the dimensionality of our model low, as it directly replaces categorical features with their phenotypic mean. RTT length and edit position were given additional quadratic terms in the model to adjust for the observed preference for 15 nt RTT length and edits within the PAM region (Fig. 4a and Extended Data Fig. 5a). After encoding, all features were centered and scaled by subtracting the mean and dividing by the standard deviation of each feature, and then the model was fit (Supplementary Table 5).
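To illustrate target encoding and feature scaling, the following Python sketch replaces each category with the mean phenotype of that category computed on the other folds. The implementation used for the model is not specified here, so the out-of-fold scheme, the index-based fold assignment (a real analysis would typically shuffle), the fallback to the global mean and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def kfold_target_encode(categories, targets, n_folds=10):
    """Out-of-fold target encoding: mean target of the category in the other folds."""
    n = len(categories)
    global_mean = mean(targets)  # fallback for categories unseen in training folds
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = {}, {}
        for i in range(n):
            if i % n_folds != fold:  # training rows for this fold
                c = categories[i]
                sums[c] = sums.get(c, 0.0) + targets[i]
                counts[c] = counts.get(c, 0) + 1
        for i in range(fold, n, n_folds):  # held-out rows for this fold
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if c in counts else global_mean
    return encoded


def standardize(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    stop_codons = ["TAG", "TAA"] * 10
    phenotypes = [-2.0, -1.0] * 10
    print(standardize(kfold_target_encode(stop_codons, phenotypes))[:2])
```

Encoding each row from folds that exclude it limits leakage of a row's own phenotype into its feature value, while keeping one numeric column per categorical feature.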

ePRIDICT evaluation

We used ePRIDICT48 to generate chromatin favorability scores for prime editing for each stop epegRNA that survived filters in StopPR. For a small number of edits (639), ePRIDICT was missing needed chromatin features and thus did not generate scores, leaving a set of 101,857 stop epegRNAs targeting 15,008 codons for analysis. We defined a codon-level ePRIDICT score as the average ePRIDICT score from all targeted genomic positions within the same codon, and subsequently defined codons with score >50 as having a favorable chromatin context, and those with score <35 as having an unfavorable chromatin context, following thresholds for high and low scores defined in the original publication48.
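The codon-level scoring and thresholding can be sketched as follows; the function name and the "intermediate" label for codons between the two thresholds are ours.

```python
from statistics import mean


def codon_chromatin_context(position_scores, favorable=50, unfavorable=35):
    """Average per-position ePRIDICT scores within a codon and classify the codon."""
    score = mean(position_scores)
    if score > favorable:
        return score, "favorable"
    if score < unfavorable:
        return score, "unfavorable"
    return score, "intermediate"  # hypothetical label for scores between thresholds


if __name__ == "__main__":
    print(codon_chromatin_context([60, 55, 50]))
    print(codon_chromatin_context([30, 34, 32]))
```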

TSS analysis

To compare phenotypes for stop and control epegRNAs in our StopPR screen with respect to the targeted Cas9(H840A) nick’s distance from the transcription start site (TSS), we used Ensembl to obtain TSS coordinates for all genes targeted in our library. In the case that a gene had more than one annotated TSS, the closest TSS to the targeted Cas9(H840A) nick was used. We removed the subset of synonymous control epegRNAs that were found to target the D−1 position at a canonical splice site (2,637 epegRNAs), as phenotypes that may result from edits specified by these epegRNAs could be attributed to their edit as opposed to a possible CRISPRi effect.
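Selecting the closest annotated TSS for a gene with several TSSs can be sketched as below. Coordinates are assumed to lie on the same chromosome, and the distance is returned unsigned; both are simplifications of this sketch.

```python
def distance_to_closest_tss(nick_position, tss_positions):
    """Unsigned distance (bp) from the nick to the nearest annotated TSS."""
    nearest = min(tss_positions, key=lambda tss: abs(nick_position - tss))
    return abs(nick_position - nearest)


if __name__ == "__main__":
    print(distance_to_closest_tss(10_500, [10_000, 12_000, 25_000]))
```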

Statistical testing and reproducibility

To compare top two stop epegRNA Z scores between bins of K562 CRISPRi phenotypes, edit position type, edit length, edit location in gene and stop codon installed, we used a one-way analysis of variance (ANOVA) followed by two-sided Tukey’s post hoc test. To compare top two stop epegRNA enrichment values between binarized features including RTT and PBS lengths, spacer orientation relative to gene, and installation in the last exon, we used a two-sided two-sample t-test. When comparing all sense and antisense stop epegRNAs targeting the same substitution, we used a two-sided two-sample t-test. When comparing epegRNAs targeting binned positions from the TSS, we used a two-sided two-sample t-test (all P values reported in Supplementary Table 6). KEGG pathway analysis was performed using ShinyGO web application69,70,71. Effect size analysis was performed using the cohen.d function from the effsize R package with default parameters, with Cohen’s d measurements greater than 0.8 in magnitude generally considered large72. For odds ratio calculations in the ePRIDICT analysis, a total of 101,857 stop epegRNAs were included, with 17,445 inducing phenotype (Z < −2). There were 36,223 stop epegRNAs identified with favorable chromatin contexts, of which 8,373 induced a phenotype, while 1,002 stop epegRNAs targeted codons in unfavorable contexts, with 144 of those inducing phenotype. For all analyses, nonsignificant P ≥ 0.05, *P < 0.05, **P < 0.01, ***P < 0.001.
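As an illustration of how an odds ratio can be computed from the counts above, the following sketch compares stop epegRNAs in favorable versus unfavorable chromatin contexts. The choice of comparison groups is an assumption; other contrasts (for example, favorable versus all remaining epegRNAs) would use the same function with different counts.

```python
def odds_ratio(hits_a, total_a, hits_b, total_b):
    """Odds of phenotype induction in group A divided by the odds in group B."""
    odds_a = hits_a / (total_a - hits_a)
    odds_b = hits_b / (total_b - hits_b)
    return odds_a / odds_b


if __name__ == "__main__":
    # Counts from the text: favorable 8,373 of 36,223; unfavorable 144 of 1,002.
    print(odds_ratio(8373, 36223, 144, 1002))
```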

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.