Parallel genome-scale CRISPR-Cas9 screens uncouple human pluripotent stem cell identity versus fitness – Nature Communications

Culture of hESCs

Two hESC lines were used in this study, H1 (NIHhESC-10-0043) and HUES8 (NIHhESC-09-0021), both with an inducible Cas9 insertion27,28. These lines were maintained in chemically defined, serum-free Essential 8 (E8) medium (Thermo Fisher Scientific, A1517001) on tissue culture-treated polystyrene plates coated with vitronectin (Thermo Fisher Scientific, A14700) at 37 °C with 5% CO2. For regular maintenance, hESCs were dissociated with 0.5 mM EDTA (KD Medical, RGE-3130) and passaged at a 1:10–1:20 split ratio every 3–5 days. The Rho-associated protein kinase (ROCK) inhibitor Y-27632 (10 µM; Selleck Chemicals, S1049) was added to the culture medium when passaging or thawing cells unless otherwise noted. Cells were counted using a Vi-CELL XR Cell Viability Analyzer (Beckman Coulter). Cells were routinely confirmed to be Mycoplasma-free by the MSKCC Antibody and Bioresource Core Facility and karyotypically normal by the MSKCC Molecular Cytogenetics Core. All experiments were approved by the Tri-SCI Embryonic Stem Cell Research Oversight Committee (ESCRO).

Generation of H1 OCT4GFP/+ iCas9 Reporter Line

Generation of OCT4-2A-copGFP donor plasmid

The OCT4-GFP reporter construct was built by modifying a selection-free knock-in strategy previously used in our lab26. The donor vector for this construct was pOCT4-2A-copGFP. To generate this vector, an NheI-2A-copGFP-AscI cassette was PCR amplified using the primers NhPTV-F and Asc_St_copGFP-R from the plasmid pCRIIAgPTV_ppGFP. (To make pCRIIAgPTV_ppGFP, an AgeI-PTV1-2A-ppGFP fragment was amplified by PCR using AgPTVppGFP-F and ppGFP-R with the pMAX-GFP plasmid (Lonza) as template; the AgeI-P2A-ppGFP insert was then purified and TOPO-cloned into pCR™II-TOPO™ (Thermo Fisher Scientific).) Next, the NheI-2A-copGFP-AscI cassette was ligated into the pCR™II-TOPO™ cloning vector using the TOPO TA Cloning Kit (Thermo Fisher Scientific, 450641) following the manufacturer’s instructions for ligation and transformation. Finally, the pCR™II-TOPO™-NheI-2A-copGFP-AscI plasmid and the pOCT4-2A-copGFP plasmid were digested with NheI and AscI and ligated. Sequences of the primers used for PCR and sequencing are listed in Supplementary Data 6.

Transfection into H1 iCas9

Guide RNA (gRNA) and tracrRNA were ordered from IDT (Alt-R® CRISPR-Cas9 crRNA and #1072532). RNA molecules and plasmid were transiently transfected into hESCs using Lipofectamine 3000 (Thermo Fisher Scientific, L3000001) following manufacturer’s instructions. Briefly, gRNA and tracrRNA were added at a 10 nM final concentration, 5 µg donor plasmid was added. gRNA/tracrRNA and Lipofectamine/plasmid were diluted separately in Opti-MEM (Thermo Fisher Scientific, 31985070), mixed, incubated for 15 min at room temperature (RT), and added dropwise to 500,000 freshly seeded iCas9 hESCs in one well of a 24-well plate. Cas9 expression was induced with 2 µg/ml doxycycline one day prior to transfection, the day of transfection, and one day after transfection. GFP positive clones were isolated through FACS and subsequent single cell colony picking. The H1 OCT4GFP/+ iCas9 reporter line is a heterozygous line as confirmed through PCR and DNA sequencing. OCT4-GFP reporter fidelity was confirmed by flow-cytometry analysis. gRNA sequences are listed in Supplementary Data 6.

Flow cytometry

Flow cytometry analyses were performed as previously described24. Antibodies used for flow cytometry are listed in Supplementary Data 7. Briefly, cells were dissociated and stained with DAPI for live GFP data collection or fixed and stained with LIVE-DEAD Fixable Violet Dead Cell Stain (Invitrogen; L34955) and corresponding antibodies for data collection using BD LSRFortessa or BD LSRII. Annexin V staining was performed using the PE Annexin V Apoptosis Detection Kit I as per manufacturer’s instructions (BD Biosciences, 559763). Flow cytometry analysis and figures were generated in FlowJo v10. Gating strategy is shown in Supplementary Fig. 1d.

Neuroectoderm differentiation

Neuroectoderm differentiation was performed as previously described90 with modifications. hESC cultures were disaggregated using TrypLE (Life Technologies, 12563-029) for 4 min, collected in E8 medium, spun at 200 × g for 5 min, and resuspended in E8 medium. 400,000 cells per well of a 6-well plate were seeded on vitronectin (Thermo Fisher Scientific, A14700) with 10 µM Rho-associated protein kinase (ROCK) inhibitor Y-27632 (Selleck Chemicals, S1049) in E8 medium (Thermo Fisher Scientific, A1517001). 24 h after plating, cells were washed with PBS and exposed to Essential 6 (Thermo Fisher Scientific, A1516401) with 10 µM SB431542 (Tocris, 161410) and 500 nM LDN193189 (Cedarlane Labs, 04-0074-02). Medium was changed every 24 h.

Definitive endoderm differentiation

Definitive endoderm differentiation was performed as previously described91 with modifications. hESC cultures were disaggregated using TrypLE (Life Technologies, 12563-029) for 4 min, collected in E8 medium, spun at 200 × g for 5 min, and resuspended in E8 medium. 300,000 cells per well of a 6-well plate were seeded on vitronectin (Thermo Fisher Scientific, A14700) with 10 µM Rho-associated protein kinase (ROCK) inhibitor Y-27632 (Selleck Chemicals, S1049) in E8 medium (Thermo Fisher Scientific, A1517001). 24 h after plating, cells were washed with PBS and exposed to S1/S2 medium supplemented with 20 ng/ml Activin A (Bon-Opus Biosciences; C687-1mg) for 3 days, and CHIR99021 (Stemgent, 04-0004-10) for 2 days (first day, 5 µM; second day, 0.5 µM). S1/S2 medium was composed of MCDB131 medium (Thermo Fisher Scientific, 10372019) supplemented with 1.5 g/L sodium bicarbonate (Research Products International, S22060), 1x Glutamax (Thermo Fisher Scientific, 35050061), 10 mM glucose (Sigma-Aldrich, G8769), and 0.5% BSA (LAMPIRE, 7500804). Medium was changed every 24 h.

Infection and expansion for genome wide CRISPR-Cas9 Screens

The human Brunello gRNA library31, consisting of 76,441 guide RNAs (gRNAs) targeting 19,114 genes (four gRNAs per gene), was produced and tested as previously described24. A minimum of 200-fold library coverage is typically recommended for screens based on basic phenotypes such as cell survival and growth. Given the relatively complex nature of our multiple reporter and essentiality screens, we targeted 600-fold library coverage at all steps to maximize sensitivity. Seven days before the start of our screens, 142 million H1 OCT4GFP/+ iCas9 cells were infected with the lentiviral library at an MOI of 0.4 in 150-mm plates at a density of 1.67 million per plate (>600-fold library coverage after selection with puromycin). 6 μg/ml protamine sulfate was added concurrently with the virus to enhance infection efficiency. Infected cells were treated with 2 μg/ml doxycycline (Thermo Fisher Scientific, BP26535) (beginning 24 h after plating) and 0.5 μg/ml puromycin (Sigma-Aldrich, P8833) (beginning 48 h after plating). Seven days post-infection, cells were treated with TrypLE Select (Thermo Fisher Scientific, 12563029), counted, and replated for four individual screens. This was considered Day 0 of screening.
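The coverage figures quoted in these sections can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that lentiviral integration follows a Poisson distribution (an assumption not stated in the text), so that the fraction of transduced cells at a given MOI is 1 − exp(−MOI), and it defines fold coverage as cells per gRNA. The function names are hypothetical.

```python
import math

LIBRARY_SIZE = 76_441  # gRNAs in the Brunello library, as stated in the text


def infected_fraction(moi: float) -> float:
    """Fraction of cells carrying at least one integration.

    Assumes Poisson-distributed integration events (an assumption).
    """
    return 1.0 - math.exp(-moi)


def fold_coverage(n_cells: float, library_size: int = LIBRARY_SIZE) -> float:
    """Average number of cells per gRNA."""
    return n_cells / library_size


# Infection step: 142 million cells at an MOI of 0.4.
transduced_cells = 142e6 * infected_fraction(0.4)
print(f"coverage after selection: ~{fold_coverage(transduced_cells):.0f}-fold")

# A sorted pellet of ~30 million cells, as in the DE screen.
print(f"DE pellet coverage: {int(fold_coverage(30e6))}-fold")  # 392-fold
```

Under these assumptions the infection step yields roughly 610-fold coverage, consistent with the stated ">600-fold library coverage after selection with puromycin".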

Genome Wide CRISPR-Cas9 Screens for Pluripotency

NE screen

160 million post-infection and selection D0 H1 OCT4GFP/+ iCas9 cells were replated in 150-mm plates at a density of 8 million per plate (>600-fold library coverage). 24 h after plating, cells were switched from maintenance E8 medium to NE differentiation medium (described in subsection Neuroectoderm Differentiation). After 36 h of NE differentiation, cells were dissociated using TrypLE Select and sorted according to GFP expression using FACSAria sorters (BD Biosciences). GFP+ and GFP− cells were collected into two pellets per condition, with ~50 million cells (>600-fold library coverage) collected per condition. Pellets were frozen for subsequent DNA extraction.

DE screen

90 million post-infection and selection D0 H1 OCT4GFP/+ iCas9 cells were replated in 150-mm plates at a density of 6 million per plate (>600-fold library coverage). 24 h after plating, cells were switched from maintenance E8 medium to DE differentiation medium (described in subsection Definitive Endoderm Differentiation). After 60 h of DE differentiation, cells were dissociated using TrypLE Select and sorted according to GFP expression using FACSAria sorters (BD Biosciences). Sorted GFP+ and GFP− cells were collected into two pellets per condition, with ~30 million cells (392-fold library coverage) collected per condition. Pellets were frozen for subsequent DNA extraction.

Genome Wide CRISPR-Cas9 screens for cell fitness

E8 screen

Post-infection and selection D0 H1 OCT4GFP/+ iCas9 cells were collected as Day 0 samples, with ~144 million cells (>600-fold library coverage) in two pellets. Pellets were frozen for subsequent DNA extraction. These Day 0 samples were used as the initial timepoint for both E8 and E6 screens. For later timepoint samples, 66.5 million post-infection and selection D0 H1 OCT4GFP/+ iCas9 cells were replated in 150-mm plates at a density of 3.5 million per plate (>600-fold library coverage). Cells were expanded and split again at the same cell number and density with TrypLE Select on Day 4 and Day 7 of expansion. On Day 10 of expansion, ~150 million cells (>600-fold library coverage) were collected in two pellets. Pellets were frozen for subsequent DNA extraction.

E6 screen

Day 0 samples were collected as described above in “E8 screen”. For later timepoint samples, 66.5 million post-infection and selection D0 H1 OCT4GFP/+ iCas9 cells were replated in 150-mm plates at a density of 3.5 million per plate (>600-fold library coverage). 24 h after plating (Day 1), the medium was changed to Essential 6 (E6) medium (Thermo Fisher Scientific, A1516401) and cells were cultured in E6 medium for 120 h, after which the medium was changed back to E8 (Day 6). Cells were split again at the same cell number and density with TrypLE Select on Day 7 of expansion. On Day 10 of expansion, ~121 million cells (>600-fold library coverage) were collected in two pellets. Pellets were frozen for subsequent DNA extraction.

gRNA sequencing

gRNA enrichment sequencing was performed by the MSKCC Gene Editing & Screening Core Facility as previously described24. Briefly, genomic DNA from cell pellets was extracted using the QIAGEN Blood & Cell Culture DNA Maxi Kit (QIAGEN, 13362) and quantified by Qubit (Thermo Fisher Scientific) following the manufacturer’s guidelines. Two-step PCR was performed to amplify gRNA sequences for HiSeq sequencing. The first PCR used primers that amplify the lentiGuide-puro gRNA cassette, with ~510 μg of gDNA (>1000-fold library coverage) per pellet. This PCR was performed as multiple separate 100 μL reactions, each with 10 μg gDNA, for 18 cycles, and the resulting amplicons were pooled by sample. For the second PCR, 5 μL of product from the first PCR was used in a 100 μL reaction for 24 cycles, with primers that attach Illumina adapters for barcoding. Primers were as described in ref. 24. Gel-purified amplicons were quantified by Qubit and Bioanalyzer (Agilent) and sequenced on the Illumina HiSeq 2500 platform. Raw FASTQ files were demultiplexed and further processed to contain only unique gRNA sequences, and the processed reads were aligned to gRNA library sequences using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Two technical replicates were sequenced per condition. Technical replicate reproducibility was assessed by a Pearson correlation test on the number of reads per individual gRNA between technical replicates of the same condition. gRNA representation within replicates was assessed by counting the unique gRNAs represented, with a cutoff of >20 reads per replicate. For each sample, all available reads from different sequencing runs were combined. Read counts were normalized to the median number of reads per sample as part of the MAGeCK analysis32.
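The quality checks described above can be illustrated with a short sketch. It assumes roughly 6.6 pg of gDNA per diploid human cell (a commonly used approximation that is not stated in the text) to relate gDNA mass to library coverage, and it uses a plain Pearson correlation and a read-count cutoff on hypothetical toy counts. The function names and example data are illustrative only.

```python
import math

LIBRARY_SIZE = 76_441        # gRNAs in the Brunello library (from the text)
PG_GDNA_PER_CELL = 6.6       # assumed mass of one diploid human genome, in pg


def gdna_fold_coverage(micrograms: float) -> float:
    """Approximate genomes per gRNA contained in a given mass of gDNA."""
    genomes = micrograms * 1e6 / PG_GDNA_PER_CELL  # 1 µg = 1e6 pg
    return genomes / LIBRARY_SIZE


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def represented_grnas(counts, min_reads=20):
    """Number of gRNAs with more than `min_reads` reads in one replicate."""
    return sum(1 for c in counts.values() if c > min_reads)


# Hypothetical read counts for two technical replicates.
rep1 = {"g1": 120, "g2": 15, "g3": 300, "g4": 45}
rep2 = {"g1": 110, "g2": 25, "g3": 280, "g4": 50}
guides = sorted(rep1)

print(f"510 µg gDNA: ~{gdna_fold_coverage(510):.0f}-fold coverage")
print("represented gRNAs:", represented_grnas(rep1), represented_grnas(rep2))
print(f"replicate Pearson r = {pearson([rep1[g] for g in guides], [rep2[g] for g in guides]):.3f}")
```

With the assumed genome mass, 510 μg of gDNA corresponds to just over 1000 genomes per gRNA, in line with the stated ">1000-fold library coverage".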

Pluripotency screens data analysis

Genes were ranked by gRNA read count using the MAGeCK (model-based analysis of genome-wide CRISPR-Cas9 knockout) RRA algorithm32 with MAGeCK 0.5.9.4 default RRA parameters. In each screen, pro-pluripotency hits were defined as the 150 genes with the lowest RRA scores for OCT4-GFPlo enrichment, and anti-pluripotency hits were defined as the 150 genes with the lowest RRA scores for OCT4-GFPhi enrichment. Pluripotency scores were calculated per screen per gene as log10(RRA score OCT4-GFPhi enrichment) – log10(RRA score OCT4-GFPlo enrichment). Log2(Fold Change OCT4-GFPhi / OCT4-GFPlo) (LFC) was calculated per gene using MAGeCK 0.5.9.4 default parameters. Screen results are found in Supplementary Data 1. For GSEA of top hit sets, genes were ranked by pluripotency score and GSEA was performed with GSEA software version 4.2.3 (refs. 92,93) using the pre-ranked option. Screening data were plotted using the ggplot2 R package94 and formatted with Adobe Illustrator.
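As a minimal sketch of the score definition above, the following computes a pluripotency score from the two RRA scores of a gene and selects the lowest-scoring genes as hits. The gene names and RRA values are hypothetical, and the helper functions are illustrative rather than part of MAGeCK.

```python
import math


def pluripotency_score(rra_gfp_hi: float, rra_gfp_lo: float) -> float:
    """log10(RRA score, OCT4-GFPhi enrichment) - log10(RRA score, OCT4-GFPlo enrichment).

    A gene whose gRNAs are enriched in the GFPlo fraction has a small
    GFPlo RRA score and therefore a positive pluripotency score.
    """
    return math.log10(rra_gfp_hi) - math.log10(rra_gfp_lo)


def top_hits(rra_scores: dict, n: int = 150) -> list:
    """Genes with the n lowest RRA scores (most significant first)."""
    return sorted(rra_scores, key=rra_scores.get)[:n]


# Hypothetical RRA scores for three genes: (GFPhi enrichment, GFPlo enrichment).
rra = {"GENE_A": (1.0, 1e-6), "GENE_B": (1e-4, 1.0), "GENE_C": (0.5, 0.5)}

for gene, (hi, lo) in rra.items():
    print(gene, round(pluripotency_score(hi, lo), 2))

pro_pluripotency = top_hits({g: lo for g, (hi, lo) in rra.items()}, n=1)
print("pro-pluripotency hit:", pro_pluripotency)
```

In this toy example GENE_A scores +6 (pro-pluripotency), GENE_B scores −4 (anti-pluripotency), and GENE_C scores 0.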

Comparison to previous differentiation screens

For comparison to other screens, ectoderm differentiation screen data from Naxerova et al.9, based on EpCAM+/NCAM- vs. EpCAM-/NCAM+ enrichment, were used. These screens used two CRISPR libraries covering roughly two-thirds and one-third of the coding genome. Published MAGeCK analysis data for both ectoderm screen sets (“CRISPR Ectoderm P13 Mageck” and “CRISPR Ectoderm P2 Mageck”) were combined, and all genes were then ranked by published RRA scores. We defined pro-ectoderm differentiation hits as the 150 genes with the lowest neg. RRA score and anti-ectoderm differentiation hits as the 150 genes with the lowest pos. RRA score. Endoderm differentiation screen data were from Li et al.24, based on SOX17+ and SOX17- enrichment. We used the “Brunello MaGECK” data set to define hits: pro-endoderm differentiation hits were defined as the 150 genes with the lowest pos. RRA score, and anti-endoderm differentiation hits as the 150 genes with the lowest neg. RRA score.

Hit overlaps were visualized with UpSet plots95 using the UpSetR Shiny App (https://gehlenborglab.shinyapps.io/upsetr/), the online implementation of the UpSetR package96, and then formatted with Adobe Illustrator. Hit lists were analyzed with the STRING database v11.5 (ref. 38), with enriched terms for “STRING clusters”, “GO Component”, and “COMPARTMENTS” shown. GO term enrichment analysis was performed using Metascape 3.5 (ref. 39) (https://metascape.org/).

Fitness screen analysis

Genes were ranked by gRNA read count using the MAGeCK (model-based analysis of genome-wide CRISPR-Cas9 knockout) RRA algorithm32 with MAGeCK 0.5.9.4 default RRA parameters. In each screen, pro-fitness hits were defined as the 150 genes with the lowest RRA scores for Day 0 enrichment, and anti-fitness hits were defined as the 150 genes with the lowest RRA scores for Day 10 enrichment. Fitness scores were calculated per screen per gene as log10(RRA score Day 10 enrichment) – log10(RRA score Day 0 enrichment). Log2(Fold Change Day 0/ Day 10) (LFC) was calculated per gene using MAGeCK 0.5.9.4 default parameters. Screen results are found in Supplementary Data 1.

Comparisons and clustering pluripotency and fitness screens

Pearson correlation tests, UpSet plots, GSEA, and STRING analysis were performed as described in Pluripotency Screen Analysis. Enrichment analysis on the Reactome Pathway Database was performed on the top 150 hits from each screen that could be uniquely identified by Entrez ID, using the ReactomePA R package49. These Reactome sets were then ranked by the median |LFC| of the genes that intersected with the top 150 gene group in the screen. For the hierarchical clustering of pro-pluripotency and pro-fitness screening hits, relative LFC levels were represented as column z-scores, and hierarchical clustering was done using Pearson correlation as the distance metric and Ward’s algorithm as the linkage method. The top 8 distinguished branches in the dendrogram were defined as modules. Modules were further characterized with the STRING database v11.5 (ref. 38) using k-means clustering (k = 5) with default parameters.

Comparisons to existing fitness, pluripotency, and essentiality datasets

For comparison to other hESC fitness screens, data from refs. 11,12,13 were used, as re-analyzed in Mair et al.12 by BAGEL analysis97. For all fitness screens, the top 150 pro-fitness hits were calculated by highest Bayes Factor for fold change in the late timepoint vs. the early timepoint. Data for the Mair et al.12 screens were taken from the MEF “T12” set and the Laminin “T12” set. Data for the Yilmaz et al.11 screen were from the “day30” set, and Ihry et al.13 data were from the “T18” data set. For the pluripotency screen in hESC maintenance conditions, the top 150 pro-pluripotency hits were calculated by RSA score98. Data for the Ihry et al.13 pluripotency screen were taken from the “OCT4_low_high_RSA” set. Common essentiality genes shown are the “CRISPRInferredCommonEssentials” data set from DepMap51,52,54 version 22Q4 (ref. 50). Precision and recall were calculated using pluripotency and fitness scores from the NE, DE, E8, and E6 screens, using the essential (625 genes) and non-essential (350 genes) sets as defined by Wang et al.41. The essential and non-essential sets were used as true positive and true negative lists for precision-recall curves (PRC) using the PRROC R package99,100.
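To make the precision-recall evaluation concrete, the sketch below ranks genes by score and tracks precision and recall against reference essential and non-essential sets. It is a simplified stand-in, not the PRROC implementation (which also interpolates the curve and handles ties); the gene names, scores, and function name are hypothetical.

```python
def precision_recall_points(scores, essential, non_essential):
    """Return (recall, precision) pairs as the score threshold is lowered.

    scores: dict mapping gene -> score, where higher means more likely essential.
    Only genes in one of the two reference sets are evaluated.
    Ties are not treated specially in this simplified sketch.
    """
    labelled = [
        (score, gene in essential)
        for gene, score in scores.items()
        if gene in essential or gene in non_essential
    ]
    labelled.sort(key=lambda pair: pair[0], reverse=True)

    n_positive = sum(1 for _, is_essential in labelled if is_essential)
    if n_positive == 0:
        raise ValueError("no reference essential genes among the scored genes")

    true_pos = false_pos = 0
    points = []
    for _, is_essential in labelled:
        if is_essential:
            true_pos += 1
        else:
            false_pos += 1
        points.append((true_pos / n_positive, true_pos / (true_pos + false_pos)))
    return points


# Hypothetical fitness scores and reference sets.
scores = {"A": 5.0, "B": 3.0, "C": 1.0, "D": 0.5, "E": 9.0}
essential = {"A", "C"}
non_essential = {"B", "D"}

curve = precision_recall_points(scores, essential, non_essential)
print(curve)
```

Gene "E" belongs to neither reference set and is ignored, mirroring the use of the essential and non-essential lists as the only true positives and true negatives.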

Screen hit validation

Hit validation was performed using the lentiviral CRISPR approach to generate knockouts in H1 OCT4GFP/+ iCas9 cells. gRNAs from the Brunello library are listed in Supplementary Data 6. gRNAs were cloned into lentiGuide-puro (Addgene, 52963) following published protocols101. The lentiGuide-puro construct expresses a puromycin resistance gene, allowing for the selection of infected cells through puromycin treatment. 1 μg lentiGuide, 0.1 μg pCMV-VSV-G102 (Addgene, 8454), and 0.4 μg psPAX2 (Addgene, 12260) plasmids were transfected with the JetPRIME reagent (VWR, 89137972) into 293T cells to package lentiviruses. Viral supernatant was collected, aliquoted, and stored at −80 °C. An MOI of 0.30–0.36 was used for the infection of the H1 OCT4GFP/+ iCas9 cells with the different lentiCRISPR viruses 7 days before plating for validation. 6 μg/ml protamine sulfate was added concurrently with the virus to enhance infection efficiency. Reflecting screen conditions, infected cells were treated with 2 μg/ml doxycycline (Thermo Fisher Scientific, BP26535) (beginning 24 h after plating) and 0.5 μg/ml puromycin (Sigma-Aldrich, P8833) (beginning 48 h after plating). Seven days post-infection, cells were treated with TrypLE Select (Thermo Fisher Scientific, 12563029), counted, and replated for validation. All replicates were performed starting from viral infection.

NE validation

For NE validation post-infection and selection H1 OCT4GFP/+ iCas9 cells were replated in 6 well plates at 400,000 cells/well. 24 h after plating, cells were switched from maintenance E8 medium to NE differentiation medium (described in the NE differentiation subsection). After 36 h of NE differentiation, cells were dissociated using TrypLE Select and GFP levels were analyzed by flow cytometry. Relative intensity of OCT4-GFP = (Mean Fluorescent Intensity (MFI) per gRNA)/(Experimental Mean (MFI non-targeting controls)). All experimental repeats were performed starting from viral infection.

DE validation

Given our observation that DE differentiation efficiency is density dependent, we were concerned that varying growth rates of knockout hESCs might affect DE differentiation and therefore the downregulation of pluripotency, even for genes that were not direct regulators of the dissolution of pluripotency in DE context. DE validation was performed with normalization to an in-well uninfected tdTomato (tdT)+ control, given the sensitivity of DE differentiation to variability in plating density. To generate the tdT+ control, H1 OCT4GFP/+iCas9 cells were previously infected with virus containing the tdT containing plasmid pWPXL_Luc2tdT103 which was a gift from Wenjun Guo. tdT+ clones were isolated through FACS and subsequent single cell colony picking. For DE validation post-infection and selection H1 OCT4GFP/+ iCas9 cells were co-plated with tdT+ cells at 150,000 cells/well of each for 300,000 cells/well total. 24 h after plating, on D8, cells were switched from maintenance E8 medium to DE differentiation medium (described in the DE differentiation subsection). After 60 h of DE differentiation (D8-D10.5), cells were dissociated using TrypLE Select and GFP levels were analyzed by flow cytometry. Relative intensity of OCT4-GFP = (MFI per gRNA/ MFI in-well tdT+ control)/(Experimental Mean (MFI non-targeting controls/MFI in-well tdT+ controls for non-targeting gRNAs)). All experimental repeats were performed starting from viral infection.
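The tdT-normalized readout can be written as a small function. This sketch assumes that the "Experimental Mean" in the denominator is the mean of the per-well GFP/tdT ratios across non-targeting control wells (one possible reading of the formula); the MFI values and the function name are hypothetical.

```python
from statistics import mean


def relative_oct4_gfp(sample_mfi, sample_tdt_mfi, control_mfis, control_tdt_mfis):
    """Relative OCT4-GFP intensity normalized to in-well tdT+ cells.

    Numerator: GFP MFI of the gRNA-infected cells divided by the GFP MFI of
    the tdT+ cells in the same well.
    Denominator: mean of the same ratio across non-targeting control wells
    (assumed here to be a mean of per-well ratios).
    """
    sample_ratio = sample_mfi / sample_tdt_mfi
    control_ratio = mean(
        gfp / tdt for gfp, tdt in zip(control_mfis, control_tdt_mfis)
    )
    return sample_ratio / control_ratio


# Hypothetical MFI values: one knockout well and two non-targeting control wells.
value = relative_oct4_gfp(
    sample_mfi=500.0,
    sample_tdt_mfi=1000.0,
    control_mfis=[1000.0, 900.0],
    control_tdt_mfis=[1000.0, 900.0],
)
print(value)  # 0.5
```

For the NE and E8 validations, which lack the in-well control, the same calculation reduces to the sample MFI divided by the mean MFI of the non-targeting controls.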

E8 validation

For E8 validation, post-infection and selection H1 OCT4GFP/+ iCas9 cells were replated in 6-well plates at 175,000 cells/well (D0). Cells were expanded and split again at the same cell number and density with TrypLE Select on Day 4 and Day 7 of expansion. On Day 10 of expansion, cells were dissociated using TrypLE Select and GFP levels were analyzed by flow cytometry. Relative intensity of OCT4-GFP = (MFI per gRNA)/(Experimental Mean (MFI non-targeting controls)). All experimental repeats were performed starting from viral infection.

E6 validation

For E6 validation, post-infection and selection H1 OCT4GFP/+ iCas9 cells were replated in 6-well plates at 175,000 cells/well (D0). 24 h after plating (Day 1), the medium was changed to Essential 6 (E6) medium (Thermo Fisher Scientific, A1516401) and cells were cultured in E6 medium for 120 h, after which the medium was changed back to E8 (Day 6). Cells were split again at the same cell number and density with TrypLE Select on Day 7 of expansion. On Day 10 of expansion, wells were dissociated using TrypLE Select and cells/well were counted using the Vi-CELL XR Cell Viability Analyzer (Beckman Coulter). All experimental repeats were performed starting from viral infection.

Generation of clonal knockout hESCs

Clonal knockouts (KOs) were generated in the HUES8 iCas9 hESC line28 as previously described, with some modifications104. Sequences of gRNAs and primers used for PCR and sequencing are listed in Supplementary Data 6. gRNAs and tracrRNA were ordered from IDT (Alt-R® CRISPR-Cas9 crRNA and #1072532). RNA molecules were transiently transfected into hESCs using Lipofectamine RNAiMAX (Thermo, 13778100) following the manufacturer’s instructions. Briefly, gRNA and tracrRNA were added at a 15 nM final concentration. gRNA/tracrRNA and Lipofectamine RNAiMAX were diluted separately in Opti-MEM (Thermo, 31985070), mixed together, incubated for 15 min at room temperature (RT), and added dropwise to 250,000 freshly seeded iCas9 hESCs in a 24-well plate. Cas9 expression was induced with 2 μg/ml doxycycline one day prior to transfection, the day of transfection, and one day after transfection. Three to four days after transfection, hESCs were dissociated to single cells using TrypLE Select (Thermo Fisher Scientific, 12563029), and 500–1000 cells were plated into one 100-mm tissue culture dish with 10 ml E8 medium supplemented with 10 μM ROCK inhibitor Y-27632 (Selleck Chemicals, S1049) for colony formation. After 10 days of expansion, 96 colonies were picked into individual wells of a 96-well plate. gDNA from crude cell lysate was used for PCR genotyping, followed by expansion of KO cell lines. Additional sequencing was performed on gDNA extracted with the QIAGEN Blood & Cell Culture DNA Maxi Kit (QIAGEN, 13362), followed by PCR and insertion into the Zero Blunt TOPO PCR Cloning Plasmid (Thermo Fisher Scientific, 450245), which was transformed and expanded per the manufacturer’s instructions. Plasmid was miniprepped using the Zyppy Plasmid Miniprep Kit (Zymo, D4037) per the manufacturer’s instructions and sequenced. Clonal KOs were also confirmed by western blot.

Western blots

Cell pellets were snap frozen in liquid nitrogen and lysed in cell lysis buffer (9803, Cell Signaling Technology) with proteinase/phosphatase inhibitors (5872, Cell Signaling Technology) and 1 mM PMSF (ICN19538105, MP Biomedicals). Lysates were cleared by centrifugation at 14,000 × g for 10 min at 4 °C. Protein concentration was determined by the Bradford Protein Assay (Bio-Rad, 500-0202). Equal amounts of protein were loaded into a Bis-Tris 10% gel (Novex, NP0301BOX) and transferred to nitrocellulose membranes (Novex, LC2001). Membranes were blocked with 5% milk (LabScientific, M-0841). Primary antibody was incubated overnight at 4 °C. Membranes were washed with TBST three times for 10 min each and incubated with fluorescent-conjugated secondary antibody for 1 h at room temperature. Membranes were washed with TBST three times for 10 min each. Blots were visualized using the Odyssey DLx Imaging System (LICOR). Antibodies used for western blots are listed in Supplementary Data 7.

Growth curves

hESCs were disaggregated using TrypLE Select and then mechanically dissociated into single cells using 1000 µl tips. One hundred thousand cells were plated into one well of a 6-well plate on vitronectin in E8 medium with ROCK inhibitor. Cells were subsequently maintained in E8 and harvested after TrypLE treatment every 24 h for counting cell numbers. For E6 growth curves, cells were rinsed with PBS and changed to Essential 6 (Thermo Fisher Scientific, A1516401) 24 h after plating. For -ROCKi growth curves, cells were plated in E8 without ROCK inhibitor.

Cell competition assay

Individual LARRY barcode constructs were cloned from the LARRY barcode library (Addgene:140024)105 and transfected into 293T cells to generate lentivirus. Next, each OTUD5 KO and WT clone was infected with a unique LARRY barcode at MOI ~ 0.3. One week after lentiviral infection, the barcoded OTUD5 KO and WT cells, which expressed GFP, were isolated by FACS. Sorted cells were then expanded for 2–3 passages in E8 medium. For the cell competition test, equal numbers of barcoded OTUD5 KO and WT cells were pooled and seeded in 6-well plates at 200k per well. Cells were passaged every 3–4 days by TrypLE dissociation, and 200k cells were seeded each time. 1, 2, 3, and 4 weeks after pooling, cells were collected for genomic DNA extraction using the Qiagen DNeasy Blood & Tissue Kit (QIAGEN, 69506). The LARRY barcodes were amplified via PCR using the Q5 High-Fidelity DNA Polymerase Kit (NEB, MO0491L) with 500 ng genomic DNA and LARRY-F/R primers, whose sequences are listed in Supplementary Data 5. PCR cleanup was performed using AMPure XP Beads (NEB, E7530). A second round of PCR was performed on this purified PCR product using adapters and indexes as described in ref. 106. Samples were pooled and submitted to the MSKCC Integrated Genomics Operation, where sample quantity and purity were determined using a Qubit fluorometer. Library efficiencies were confirmed by Bioanalyzer (Agilent), and libraries were sequenced on the Illumina HiSeq 4000 platform in PE150 mode, with 2–3 million reads per sample. We used CRISPResso2 (http://crispresso.pinellolab.org/submission)107 to quantify the representation of each barcode and thereby each cell line. As each cell line was labeled with a single barcode, cell-line representation was calculated as the percentage of each individual barcode in total barcode reads.
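The final calculation, cell-line representation as a percentage of total barcode reads, can be sketched as follows. The barcode labels and read counts are hypothetical, and the function is an illustration of the arithmetic rather than part of CRISPResso2.

```python
def barcode_percentages(read_counts):
    """Percentage of total barcode reads contributed by each barcode.

    read_counts: dict mapping a barcode (one per cell line) to its read count.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no barcode reads in this sample")
    return {barcode: 100.0 * n / total for barcode, n in read_counts.items()}


# Hypothetical barcode read counts at pooling (week 0) and after 4 weeks.
week0 = {"WT_clone1": 500, "OTUD5_KO_clone1": 500}
week4 = {"WT_clone1": 600, "OTUD5_KO_clone1": 400}

print(barcode_percentages(week0))
print(barcode_percentages(week4))
```

Comparing the percentages across timepoints shows whether a given clone gains or loses representation in the pooled culture.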

RNA isolation and RNA-Seq

Total RNA was extracted using Quick-RNA MiniPrep kits (ZYMO Research; R1055) following the manufacturer’s guidelines. Bulk RNA-Seq was performed by the MSKCC Integrated Genomics Operation as previously described29,108. Alignment was performed as described109. DESeq2 (ref. 110) was used to analyze differential gene expression by comparing transcriptomes of WT and TADA2B KO hESCs. DEGs were identified based on cut-offs of log2(FC) > 1 and FDR < 0.05. Results were plotted with the EnhancedVolcano R package. GSEA was performed with GSEA_4.0.3 using the pre-ranked option and log2(FC) for pairwise comparisons.

ChIP-MS and analysis

OTUD5 ChIP-MS was performed on NE and DE Day 1 cells (differentiation described in the NE and DE differentiation subsections) using HUES8 iCas9 WT2 and HUES8 iCas9 OTUD5-/- KO2 hESC lines. OCT4 ChIP-MS was performed using HUES8 iCas9 hESCs grown in E8 conditions. ChIP-MS was performed from 15 million cells/experiment as previously described29,108. Antibodies used for immunoprecipitation are listed in Supplementary Data 7. Proteins were eluted from the ChIP immunoprecipitation using a buffer containing 5% SDS (Thermo Fisher Scientific, AM9820), 5 mM DTT (Thermo Fisher Scientific, FERR0861) and 50 mM ammonium bicarbonate (pH = 8), and left on the bench for about 1 hour for disulfide bond reduction. Samples were then alkylated with 20 mM iodoacetamide (VWR, IC10035105) in the dark for 30 min. Afterward, phosphoric acid (Thermo Fisher Scientific, A2421) was added to the sample at a final concentration of 1.2%. Samples were diluted in six volumes of binding buffer (90% methanol and 10 mM ammonium bicarbonate, pH 8.0). After gentle mixing, the protein solution was loaded onto an S-trap filter (Protifi, C02-micro-80) and spun at 500 g for 30 s. The sample was washed twice with binding buffer. Finally, 1 µg of sequencing grade trypsin (Promega, V5111), diluted in 50 mM ammonium bicarbonate, was added to the S-trap filter and samples were digested at 37 °C for 18 h. Peptides were eluted in three steps: (i) 40 µl of 50 mM ammonium bicarbonate, (ii) 40 µl of 0.1% TFA and (iii) 40 µl of 60% acetonitrile and 0.1% TFA. The peptide solution was pooled, spun at 1000 g for 30 s and dried in a vacuum centrifuge. Prior to mass spectrometry analysis, samples were desalted using a 96-well plate filter (Orochem) packed with 1 mg of Oasis HLB C-18 resin (Waters). Briefly, the samples were resuspended in 100 µl of 0.1% TFA and loaded onto the HLB resin, which was previously equilibrated using 100 µl of the same buffer. After washing with 100 µl of 0.1% TFA, the samples were eluted with 70 µl of a buffer containing 60% acetonitrile and 0.1% TFA and then dried in a vacuum centrifuge.

Samples were then resuspended in 10 µl of 0.1% TFA and loaded onto a Dionex RSLC Ultimate 3000 (Thermo Scientific), coupled online with an Orbitrap Fusion Lumos (Thermo Scientific). Chromatographic separation was performed with a two-column system, consisting of a C-18 trap cartridge (300 µm ID, 5 mm length) and a picofrit analytical column (75 µm ID, 25 cm length) packed in-house with reversed-phase Repro-Sil Pur C18-AQ 3 µm resin. To analyze the proteome, peptides were separated using a 90 min gradient from 4 to 30% buffer B (buffer A: 0.1% formic acid, buffer B: 80% acetonitrile + 0.1% formic acid) at a flow rate of 300 nl/min. The mass spectrometer was set to acquire spectra in data-dependent acquisition (DDA) mode. Briefly, the full MS scan was set to 300–1200 m/z in the orbitrap with a resolution of 120,000 (at 200 m/z) and an AGC target of 5 × 10^5. MS/MS was performed in the ion trap using the top speed mode (2 s), an AGC target of 1 × 10^4 and an HCD collision energy of 35. Proteome raw files were searched using Proteome Discoverer software (v2.4, Thermo Scientific) with the SEQUEST search engine and the SwissProt human database (updated June 2021). Each analysis was performed with three biological replicates. The search for total proteome included variable modification of N-terminal acetylation and fixed modification of carbamidomethyl cysteine. Trypsin was specified as the digestive enzyme with up to 2 missed cleavages allowed. Mass tolerance was set to 10 ppm for precursor ions and 0.2 Da for product ions. Peptide and protein FDR was set to 1%. Following the search, data were processed as previously described111. Briefly, protein intensities were log2 transformed and normalized by the average value of each sample, and missing values were imputed using a normal distribution 2 standard deviations lower than the mean. Statistical regulation was assessed using a heteroscedastic t-test (significant if p < 0.05). Data distribution was assumed to be normal but this was not formally tested.
Interaction scores and GO analysis for select hits were determined with the STRING online database38, with visualization of the network by Cytoscape112.
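The proteomics post-processing steps (log2 transformation, per-sample normalization, imputation of missing values, and a heteroscedastic t-test) can be sketched as below. Several details are assumptions made for illustration and are not given in the text: normalization is done by subtracting the sample mean in log2 space, the imputation distribution has a width of 0.3 standard deviations, and only the Welch t statistic and degrees of freedom are computed (a p-value would require a t-distribution function from a statistics library). All function names and data are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def preprocess_sample(intensities, rng, width=0.3):
    """Log2-transform, centre, and impute one sample's protein intensities.

    intensities: raw intensities, with None marking a missing value.
    Missing values are drawn from a normal distribution centred 2 standard
    deviations below the sample mean; `width` (in SD units) is an assumption.
    """
    logged = [math.log2(v) if v is not None else None for v in intensities]
    centre = mean(v for v in logged if v is not None)
    normalized = [v - centre if v is not None else None for v in logged]

    observed = [v for v in normalized if v is not None]
    mu, sd = mean(observed), stdev(observed)
    return [
        v if v is not None else rng.gauss(mu - 2 * sd, width * sd)
        for v in normalized
    ]


def welch_t(a, b):
    """Welch (heteroscedastic) t statistic and Welch-Satterthwaite df."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


rng = random.Random(0)  # fixed seed so the imputation is reproducible
sample = preprocess_sample([4.0, 16.0, None], rng)
print(sample[:2])                            # [-1.0, 1.0]
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Each sample (replicate) is preprocessed independently, after which the per-protein values from the two conditions are compared with the t-test.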

Statistical analysis

All datapoints refer to biological repeats. No statistical method was used to predetermine sample sizes. The investigators were not blinded to allocation during experiments and outcome assessment. No data were excluded from analyses. The number of biological replicates is reported in the legend of each figure. Flow cytometry analysis and growth curves were derived from at least three independent experiments per cell line unless specified in the legends. For ChIP-MS of OCT4, quantification and statistics were derived from two independent experiments. For ChIP-MS of OTUD5, quantification and statistics were derived from three independent experiments. CRISPR-Cas9 screens were performed once. All statistical analysis methods are indicated in the figure legends and methods. Quantification of flow cytometry and growth curve data is shown as the mean ± s.d. Student’s t test was used for comparison between two groups. Statistical significance is indicated in each figure.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.