
Cell of origin epigenetic priming determines susceptibility to Tet2 mutation – Nature Communications


Tet2fl/fl mice30 (#017573, obtained from The Jackson Laboratory) were crossed to Mx1-Cre mice (#003556, obtained from The Jackson Laboratory). Mx1-Cre negative littermates were utilized as controls. Genotyping was performed with primers listed in Supplementary Data 4 (WT amplicon: 250 bp, fl/fl amplicon: ~450 bp). For competitive bone marrow transplantation experiments, age-matched CD45.1(STEM) mice70 were utilized (mice were bred at Massachusetts General Hospital). WT C57BL6/J recipients (#000664, obtained from The Jackson Laboratory) were subjected to whole body irradiation (9.5 Gy) from a 137Cs source 1 day before BM transplantation. Four doses of 12.5 μg/g pI:pC (Amersham) were administered intraperitoneally to induce Cre activity. Eight- to twelve-week-old, age-matched, randomized male and female mice were used in all experiments. For bone marrow transplantation experiments, female mice were used as recipients. All mice were bred and maintained in pathogen-free conditions and all procedures performed were approved by the Institutional Animal Care and Use Committee of Massachusetts General Hospital (protocol #2016N000085).

Cell lines

Hoxb8-ER conditionally immortalized clones were generated at Massachusetts General Hospital and maintained in RPMI medium (Thermo Fisher) supplemented with 1% Penicillin/Streptomycin, 1% Glutamine, 0.5 μM beta-estradiol (Sigma, E2758) and conditioned media containing different cytokines. For the neutrophil-biased cells, media contained ~100 ng/ml SCF (generated from a Chinese hamster ovary cell line that stably secretes SCF), whereas for macrophage progenitors media contained 10 ng/ml GM-CSF (Peprotech). Hoxb8-ER cultures were kept in a 5% CO2 humidified atmosphere at 37 °C.

The 293T cell line used for retroviral and lentiviral production was obtained from ATCC (CRL-3216).

Primary cells

For BM transplant studies, primary mouse Lin- Kit+ Sca+ cells were seeded at a concentration of 5 × 10^5 cells/ml in serum-free StemSpan medium (StemCell Technologies) supplemented with 1% Penicillin/Streptomycin, 1% Glutamine, and mouse early-acting cytokines (mSCF 100 ng/ml, mFlt3-L 100 ng/ml, mTPO 50 ng/ml, and mIL-6 20 ng/ml; all purchased from Peprotech). For PVA-expansion studies, Lin- Kit+ Sca+ Cd150+ Cd48- Epcr+ cells were seeded in 1X Ham’s F-12 Nutrient Mix liquid media (Gibco), supplemented with 1 M HEPES (Gibco), 1% Penicillin/Streptomycin/Glutamine, 1% ITSX (Gibco), mSCF 10 ng/ml, mTPO 100 ng/ml (Peprotech), 1 mg/ml PVA (Millipore Sigma). Further details are provided in the following sections. HSC cultures were kept in a 5% CO2 humidified atmosphere at 37 °C.

scATAC-seq single cell profiling

Cell lysis, tagmentation and droplet library preparation were performed following the SureCell ATAC-Seq Library Prep Kit User Guide (17004620, Bio-Rad). Harvested cells and tagmentation related buffers were chilled on ice. Lysis was performed simultaneously with tagmentation. Washed and pelleted cells were resuspended in Whole Cell Tagmentation Mix containing 0.1% Tween 20, 0.01% Digitonin, 1 x PBS supplemented with 0.1% BSA, ATAC Tagmentation Buffer and ATAC Tagmentation Enzyme. Cells were mixed and agitated at 500 rpm on a ThermoMixer (Eppendorf) for 30 min at 37 °C. Tagmented cells were kept on ice prior to encapsulation. Tagmented cells were loaded onto a ddSEQ Single-Cell Isolator (12004336, Bio-Rad). Single-cell ATAC-seq libraries were prepared using the SureCell ATAC-Seq Library Prep Kit (17004620, Bio-Rad) and SureCell ddSEQ Index Kit (12009360, Bio-Rad). Bead barcoding and sample indexing were performed following the standard protocol and the number of amplification cycles was adjusted according to cell input. Libraries were loaded on a NextSeq 550 (Illumina) and sequencing was performed using the NextSeq High Output Kit (150 cycles) and the following read protocol: Read 1 118 cycles, i7 index read 8 cycles, and Read 2 40 cycles. A custom sequencing primer is required for Read 1 (16005986, Bio-Rad).

scRNA-seq single cell profiling

scRNA-Seq was performed on a Chromium Single-Cell Controller (10X Genomics) using the Chromium Single Cell Reagent Kit v2, Chromium Next GEM Chip A and Chromium i7 Multiplex Kit according to the manufacturer’s instructions. Briefly, single cells were partitioned in Gel Beads in Emulsion (GEMs) and lysed, followed by RNA barcoding, reverse transcription and PCR amplification (according to the available cDNA quantity). scRNA-Seq libraries were prepared according to the manufacturer’s instructions, checked and quantified on a TapeStation 4200 (Agilent) and a Qubit 4 fluorometer (Invitrogen). Sequencing was performed on a NovaSeq instrument (Illumina) using the NovaSeq S1 Kit (100 cycles).

Multiplexed Bulk ATAC-seq

Indexed Tn5 transposome complexes were assembled as described previously39. Also see this reference for a description of how the barcodes were designed and a table with the oligo sequences. Cells were washed twice with 1 x PBS, counted, and resuspended to a concentration of 0.5 × 10^6 cells/mL. 2 μL of cells in 1 x PBS (1,000 cells) were mixed with 2 μL of barcoded Tn5, 5 μL 2x Illumina Tagment DNA Buffer (TD), 0.1 μL 10% NP40 (final concentration 0.1%) and 0.9 μL H2O in a 96 well plate. Each well contained a different sample and Tn5 barcode. Cells were mixed and agitated on a ThermoMixer (Eppendorf) at 500 rpm for 30 min at 37 °C. All the wells were pooled together on ice to prevent cross-contamination between Tn5 barcodes. Tagmented DNA was purified using a MinElute PCR Purification Kit (Qiagen), then minimally amplified for sequencing as previously described105. Final libraries were purified using the MinElute PCR Purification Kit (Qiagen), and sequenced on a NextSeq 550 (150 cycles), using the following parameters: Read 1 92 cycles, i7 index read 8 cycles, and Read 2 66 cycles, with 50% PhiX Sequencing Control. A custom sequencing primer is required for Read 1 (16005986, Bio-Rad).


SHARE-seq was performed as described previously40. Briefly, single cells were fixed by adding formaldehyde (28906, ThermoFisher) at a final concentration of 1%. Fixed cells were transposed using barcoded Tn5 (Seqwell) in a transposition buffer (1 x TD buffer from Illumina Nextera kit, 0.1% Tween 20 (P9416, Sigma), 0.01% Digitonin (G9441, Promega)) at 37 °C for 30 minutes with shaking at 500 rpm. Transposed cells were reverse transcribed using Maxima H Minus Reverse Transcriptase along with an RT primer containing a Unique Molecular Identifier (UMI), a universal ligation overhang and a biotin molecule. Ligation of barcoded adapters was performed using three rounds of split-pool barcoding followed by reverse crosslinking. ATAC and RNA libraries were prepared as previously described40. Libraries were quantified with the KAPA Library Quantification Kit and pooled for sequencing. Libraries were sequenced on the NovaSeq platform (Illumina) using a 200-cycle S1 kit and the following read protocol: Read 1: 50 cycles, Index 1: 99 cycles, Index 2: 8 cycles, Read 2: 50 cycles.

scATACseq data processing

Genome-wide chromatin accessibility peaks were called using MACS v2 (MACS2)106 on the merged aligned scATAC-seq reads per condition, generating a list of peak summit calls per condition. To generate a non-overlapping set of peaks, we first extended summits of each condition to 800 bp windows (±400 bp). We combined these 800 bp peaks, ranked them by their summit significance value and retained specific non-overlapping peaks on the basis of this ordering. We further added to the peak list all non-overlapping peaks from the ImmGen ATAC-seq atlas, after also extending the ImmGen peaks to 800 bp windows107. This resulted in a filtered list of disjoint peaks (n = 297,361), which were finally resized to 301 bp (i.e. ±150 bp from each peak summit) and used for all downstream analyses.
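
The summit-extension and ranking procedure above can be sketched as a greedy selection. This is a minimal illustration and not the authors' pipeline: the `Peak` record, the function names and the half-open interval convention are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    summit: int    # summit position (bp)
    score: float   # summit significance value used for ranking


def window(peak: Peak, half_width: int) -> Tuple[int, int]:
    """Half-open window [summit - half_width, summit + half_width)."""
    return peak.summit - half_width, peak.summit + half_width


def select_disjoint(peaks: List[Peak], half_width: int = 400) -> List[Peak]:
    """Keep the most significant peak first, then any peak whose
    800 bp window does not overlap an already retained window."""
    kept: List[Peak] = []
    for p in sorted(peaks, key=lambda x: x.score, reverse=True):
        start, end = window(p, half_width)
        clash = any(
            k.chrom == p.chrom
            and start < window(k, half_width)[1]
            and window(k, half_width)[0] < end
            for k in kept
        )
        if not clash:
            kept.append(p)
    return kept


def resize(peak: Peak, half_width: int = 150) -> Tuple[str, int, int]:
    """Final 301 bp interval: summit +/- 150 bp, half-open end."""
    return peak.chrom, peak.summit - half_width, peak.summit + half_width + 1


peaks = [
    Peak("chr1", 1000, 5.0),
    Peak("chr1", 1300, 9.0),   # overlaps the peak at 1000 and outranks it
    Peak("chr1", 2500, 2.0),
    Peak("chr2", 1000, 1.0),
]
kept = select_disjoint(peaks)
final = [resize(p) for p in kept]
```

The quadratic overlap check is fine for a toy list; a genome-scale peak set would normally use an interval index instead.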

scRNA-seq analysis

Base call files were demultiplexed, for each flow cell directory, into FASTQ files using Cellranger v3.1.0 mkfastq with default parameters. FASTQ files were then processed using Cellranger count with default parameters. Gene-mapped counts were then loaded into R as a Seurat108 object and used for downstream analysis. Genes with at least one UMI across cells were retained, and cells with a number of unique feature counts ≥ 200, total UMIs ≥ 5000 and a mitochondrial read percentage of <5% were initially retained. Normalization and scaling of RNA gene expression levels was then performed. PCA dimensionality reduction was run and UMAP was used for the final 2D cell projection (top 30 PCs). A cell kNN graph was determined using the FindNeighbors function in Seurat (k = 30 cell neighbors). Cells were then grouped into clusters using the FindClusters Seurat function (resolution = 0.8; Leiden algorithm), and cluster and cell annotations were manually assigned by visualizing the mean and percent expression of cell identity markers within cell clusters. Broader annotations were determined by merging finer cell groupings.
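
The initial cell filter can be written as a single predicate. The thresholds come from the text; the function and argument names are hypothetical, and the actual analysis was run in R with Seurat rather than with this code.

```python
def passes_qc(n_features: int, n_umis: int, pct_mito: float,
              min_features: int = 200, min_umis: int = 5000,
              max_pct_mito: float = 5.0) -> bool:
    """True if a cell meets all three thresholds stated in the text:
    >= 200 detected features, >= 5000 UMIs, < 5% mitochondrial reads."""
    return (n_features >= min_features
            and n_umis >= min_umis
            and pct_mito < max_pct_mito)


# Toy cells: (features, UMIs, % mitochondrial reads)
cells = [(2500, 12000, 2.1), (150, 6000, 1.0), (3000, 4000, 1.0), (2800, 9000, 5.0)]
kept = [c for c in cells if passes_qc(*c)]
```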

scATAC-seq single cell clustering and annotation

First, dimensionality reduction was performed with cisTopic109 using the runWarpLDAModels function as part of the cisTopic R package, with the prior number of topics set to 50. Next, Harmony110 was run on the cisTopic cell Z-scores to adjust for observed sequencing batch effects (correcting for animal as a batch covariate). The batch-corrected cisTopic cell Z-scores were then used to project cells in 2D by running UMAP as part of the uwot R package, with k = 50 cell neighbors and a cosine distance metric. Cells were clustered using a Louvain algorithm, and clusters were annotated using gene activity scores and gene expression markers (see below).

Cell localization analysis

For visualization of sample distribution on UMAP coordinates (Figs. 1f, 5g), the % cell neighborhood per cell that belongs to a specific sample was represented for Tet2 KO and Sox4 OE Tet2 KO cells (relative to wild type cells). For each cell, the k-nearest neighbors were considered (k = 50) using the batch-corrected (harmony) principal components, and the fraction of neighborhood cells that came from a non-wild-type genotype was determined, and shown on the UMAP. All mice were utilized for this analysis.
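
A minimal sketch of the per-cell neighborhood fraction follows, using a brute-force nearest-neighbor search on a toy embedding. Whether a cell counts as its own neighbor is not stated in the text; this example excludes it, and all names are illustrative.

```python
import math
from typing import List, Sequence


def neighbourhood_fraction(embedding: Sequence[Sequence[float]],
                           is_non_wt: Sequence[bool],
                           k: int = 50) -> List[float]:
    """For each cell, the fraction of its k nearest neighbours
    (Euclidean distance in the embedding) with a non-wild-type genotype."""
    n = len(embedding)
    fractions = []
    for i in range(n):
        # Distances to every other cell (self excluded: an assumption).
        dists = sorted(
            (math.dist(embedding[i], embedding[j]), j)
            for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        fractions.append(sum(is_non_wt[j] for j in neighbours) / len(neighbours))
    return fractions


# Two well-separated toy groups: three wild-type cells, three knockout cells.
embedding = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
is_non_wt = [False, False, False, True, True, True]
fractions = neighbourhood_fraction(embedding, is_non_wt, k=2)
```

In the analysis described above, the embedding would be the Harmony-corrected components and k would be 50; the resulting value per cell is what gets colored on the UMAP.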

Similar analysis was performed for Fig. 4g and Supplementary Fig. 6c, representing the % cell neighborhood per cell that belongs to a specific sample for each SCF-derived GMP clone (relative to all other clones).

TF motif scores

TF motif accessibility Z-scores were computed for scATAC-seq data using chromVAR111. Briefly, scATAC-seq data (n = 57,232 cells; n = 297,361 peaks) was used as input, and GC bias for each peak was determined using the BSgenome.Mmusculus.UCSC.mm10 reference genome. Mouse cisBP TF motifs (n = 890 TFs) were then matched against the reference peak set, and deviation Z-scores were estimated with chromVAR’s computeDeviations function using n = 100 background iterations.

Gene TSS activity scores

Gene activity scores based on chromatin accessibility were derived for scATAC-seq data (n = 57,232 cells) using a sum of accessibility fragment counts around gene transcription start sites (TSSs), weighted inversely to the distance from the TSS, as previously described42. Aligned scATAC-seq fragments per cell are weighted based on the inverse distance to gene TSSs, then summed across the chosen window (9,212 bp) reflecting 1% of the total weight for the chosen exponential half-life (1 kb). Gene activity scores were then normalized by dividing by the mean score per cell, and used for downstream analysis.
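
The distance weighting can be illustrated with a short calculation. The exact weight function is not given in the text; the sketch below assumes an exponential decay exp(-|d| / 1 kb), under which the weight falls to 1% at about 4.6 kb from the TSS, giving a total window of roughly 9.2 kb, close to the 9,212 bp quoted. The function names are hypothetical.

```python
import math
from typing import Iterable

DECAY_BP = 1000.0  # assumed decay constant (the "1 kb" in the text)


def tss_weight(distance_bp: float, decay_bp: float = DECAY_BP) -> float:
    """Assumed weight of a fragment at a given distance from the TSS."""
    return math.exp(-abs(distance_bp) / decay_bp)


# Distance at which the assumed weight drops to 1% of its value at the TSS.
half_window = DECAY_BP * math.log(100)


def gene_activity(fragment_positions: Iterable[int], tss: int) -> float:
    """Sum of distance-weighted fragment counts inside the window."""
    return sum(
        tss_weight(pos - tss)
        for pos in fragment_positions
        if abs(pos - tss) <= half_window
    )


# Three fragments: at the TSS, 1 kb away, and outside the window.
score = gene_activity([50_000, 51_000, 60_000], tss=50_000)
```

Per-cell normalization, as described, would then divide each gene score by the mean score of that cell.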

Differential peak testing

Differential testing of accessibility peaks was performed using DESeq2 (ref. 112), which has previously been evaluated as a robust tool for analyzing ATAC datasets (ref. 113). First, only early progenitor cells (excluding terminally differentiated clusters) were retained for differential accessibility signature derivation in GMP cells. This includes cells belonging to clusters 1, 4 (only considering the GMP sample), 7, 10 and 13 (n = 22,569 cells). Next, for each annotated cell type, cell peak counts were “pseudobulked” or grouped per mouse sample, annotated cell type and genotype (WT or Tet2 KO) by summing raw accessibility counts per peak per grouping. DESeq2’s negative binomial Wald test was then applied with default parameters, adjusting for cell type as a covariate, yielding estimates of fold-change and significance per peak between Tet2 KO and WT samples. Only peaks with adjusted p-value ≤ 0.01 were retained as differentially accessible between the two genotype groups, and were used as signatures for downstream analysis (e.g. TF motif enrichment testing and overlap with Tet2 ChIP-seq peaks). The same analysis was repeated for the Sox4 OE Tet2 KO group. These peak signatures were also used as peak annotation features with chromVAR to score single cells for KO signature accessibility relative to background peaks (Fig. 2b).
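
The pseudobulking step amounts to summing raw per-cell peak counts within each (mouse, cell type, genotype) group. A minimal sketch with hypothetical names and toy counts is shown below; the differential test itself (DESeq2, run in R) is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Group = Tuple[str, str, str]  # (mouse sample, cell type, genotype)


def pseudobulk(cell_counts: Sequence[Sequence[int]],
               cell_groups: Sequence[Group]) -> Dict[Group, List[int]]:
    """Sum raw peak counts over all cells sharing the same group label."""
    totals: Dict[Group, List[int]] = {}
    for counts, group in zip(cell_counts, cell_groups):
        if group not in totals:
            totals[group] = [0] * len(counts)
        totals[group] = [a + b for a, b in zip(totals[group], counts)]
    return totals


# Four toy cells x three peaks.
cell_counts = [[1, 0, 2], [0, 1, 1], [3, 0, 0], [1, 1, 1]]
cell_groups = [
    ("mouse1", "GMP", "WT"),
    ("mouse1", "GMP", "WT"),
    ("mouse2", "GMP", "KO"),
    ("mouse2", "GMP", "KO"),
]
bulk = pseudobulk(cell_counts, cell_groups)
```

Each resulting vector becomes one column of the count matrix passed to the differential test, so replicates are mice rather than individual cells.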

Differential DORC analysis

For differential testing of DORC accessibility scores, we used normalized single cell DORC scores and performed differential testing using a Wilcoxon rank-sum test per cell type, comparing each Tet2 KO condition to its control condition. FDR was determined to adjust for multiple tests. Cells belonging to clusters 1, 4, 7 (derived from GMP sorted sample) were utilized for this analysis.
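
The text does not name the FDR procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg adjustment, a common choice for this kind of per-gene testing; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy rank-sum test p-values for four DORCs.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```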

Co-accessibility modules

Chromatin accessibility peak “modules” were derived as previously described42. Briefly, TF motif accessibility Z-scores for cells that were annotated as wild-type (WT) GMPs (n = 8,313 cells) were used. First, TF motifs (n = 890; see TF motif scores section above) were clustered using a sequence similarity correlation cut-off = 0.8, and then the most variable motif in each cluster was used as a representative TF. Additionally, jackstraw PCA was performed on motif deviation Z-scores to determine TF motifs with significantly variable accessibility using n = 100 iterations and a jackstraw permutation P ≤ 0.05 across the first 10 PCs. This yielded n = 68 significantly variable TF motif groups. Then, the difference in mean accessibility was tested across all reference peaks (n = 297,361) using normalized scATAC-seq peak counts (mean-centered per cell) between the TF high vs low cell groups (cell high/low groups divided based on the median Z-score across cells for each TF motif), and significant peaks for any given TF (FDR ≤ 1e-06, two-tailed t-test) were retained. Finally, fold-changes of mean accessibility between the high and low groups were used to cluster peaks into co-accessible modules, using a Louvain algorithm for determining communities (k = 30 peak nearest-neighbors), resulting in distinctly grouped peak modules, which were manually annotated based on the TF motifs that positively or negatively associated with their activity. These were then converted into a reference peak x module binary annotation matrix, and chromVAR was used to compute module accessibility deviation Z scores for each cell in the entire scATAC-seq dataset.

scATAC-scRNA-seq cell pairing and visualization

Cells were paired between the two modalities using the scOptmatch workflow described previously114. Briefly, CCA was first run using Seurat’s RunCCA function108 to co-embed scATAC-seq and scRNA-seq data (using the cell KNN-smoothed normalized scATAC-seq gene TSS activity scores and scRNA-seq gene expression, respectively). Only the union of the top 5,000 variable gene scores (ATAC) and gene expression (RNA) was used to derive the top 30 CCA components, with rescaling of features performed prior to running CCA. These components were then used to pair cells between the two assays based on the minimum geodesic neighbors between ATAC-RNA cells across the entire data. For scATAC-seq cells, gene expression of paired scRNA-seq cells were then used to visualize gene expression markers on the scATAC-seq UMAP.

Peak-gene cis regulatory correlation analysis

Peak-gene links and domains of regulatory chromatin were determined using the FigR R package114. Briefly, using paired scATAC and scRNA-seq data, we determined for each gene a set of cis-regulatory peaks that are most correlated with the given gene’s expression. To do this, we tested peaks falling within 10 kb of a given gene TSS for correlation in peak accessibility and paired gene expression across single cells, using n = 100 permuted background peaks (matched for peak GC-content and mean accessibility) for significance testing. Only peak-gene links with a positive correlation and permutation P ≤ 0.05 were retained. DORCs were defined as genes having ≥ 3 significantly associated peaks.
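
The final filtering and DORC definition can be sketched as follows. This shows only the thresholding stated in the text (positive correlation, permutation P ≤ 0.05, at least 3 linked peaks per gene); it is not the FigR implementation, and the record layout, function name and peak labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Link = Tuple[str, str, float, float]  # (gene, peak id, correlation, permutation P)


def call_dorcs(links: Iterable[Link],
               p_cutoff: float = 0.05,
               min_peaks: int = 3) -> List[str]:
    """Genes with at least `min_peaks` significant, positively correlated peaks."""
    counts = Counter(
        gene for gene, _peak, corr, p in links
        if corr > 0 and p <= p_cutoff
    )
    return sorted(gene for gene, n in counts.items() if n >= min_peaks)


links = [
    ("Sox4", "peak1", 0.30, 0.01),
    ("Sox4", "peak2", 0.25, 0.02),
    ("Sox4", "peak3", 0.12, 0.05),
    ("Kit", "peak4", 0.40, 0.01),
    ("Kit", "peak5", -0.35, 0.01),   # negative correlation: discarded
    ("Kit", "peak6", 0.20, 0.20),    # not significant: discarded
]
dorcs = call_dorcs(links)
```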

Pathway enrichment analysis/GSEA

Enrichment analysis was performed using Metascape115 with the following ontology sources: GO Biological Processes, GO Molecular Functions and Reactome Gene Sets. All genes in the genome were used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 are reported. The most statistically significant term within a cluster is chosen to represent the cluster. For analysis in Supplementary Fig. 1g, gene scores with correlation > 0.1 to each gene module were considered. For analysis in Fig. 2f, differential genes with p value < 0.001 were considered. For GSEA116, we performed pre-ranked analysis (genes ordered by fold-change) using the default settings. MSigDB Hallmark (H), KEGG and Reactome (C2) databases were utilized, and pathways with FDR < 0.25 were considered significant.

Vectors, plasmids and molecular analyses

Lentiviral constructs expressing GFP, codon-optimized Sox4, or Cre recombinase were cloned into self-inactivating transfer constructs under the control of the human PGK promoter. Lentiviral backbones were obtained through an MTA with the Naldini lab (SR-TIGET, Milan). Lentiviral vectors were generated using HIV-derived, third-generation plasmids. Stocks were prepared and concentrated as previously described117 using HEK-293T (ATCC, CRL-3216) as packaging cells. Titration was performed by qPCR118 using a HEK-293T line with a known number of vector integrations per diploid genome as standard. The high complexity barcoding library LARRY Barcode Version 1 (ref. 83) was a gift from Fernando Camargo (Addgene #140024). Retroviral stocks were generated using the pCL-Eco plasmid, which was a gift from Inder Verma (Addgene plasmid #12371).

For molecular analyses, genomic DNA was isolated with QIAamp DNA Micro Kit (QIAGEN) according to the number of cells available. Successful deletion of Tet2 by Cre recombinase was measured from the gDNA by ddPCR (QX200, Biorad), quantifying the number of Tet2 alleles relative to an unrelated genomic locus (Sema3a).

For gene expression analyses, total RNA was extracted using RNeasy Plus Micro Kit (QIAGEN), according to the manufacturer’s instructions and DNase treatment was performed using RNase-free DNase Set (QIAGEN). cDNA was synthesized with SuperScript VILO IV cDNA Synthesis Kit (Invitrogen) and analyzed on a CFX Connect Real-Time PCR System (Biorad). The relative expression of each target gene was first normalized to housekeeping genes and then represented as 2^-ΔCt for each sample.
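
As a worked example of the relative expression calculation, the sketch below takes the difference between the target Ct and a housekeeping reference Ct and raises 2 to its negative. How multiple housekeeping genes were combined is not stated; the arithmetic mean used here is an assumption, as are the function name and the toy Ct values.

```python
from typing import Sequence


def relative_expression(ct_target: float, ct_housekeeping: Sequence[float]) -> float:
    """2^-dCt, with dCt = Ct(target) - mean Ct(housekeeping genes)."""
    reference = sum(ct_housekeeping) / len(ct_housekeeping)  # assumed: arithmetic mean
    delta_ct = ct_target - reference
    return 2.0 ** (-delta_ct)


# Target at Ct 25 against two housekeeping genes at Ct 20 and 22:
# dCt = 25 - 21 = 4, so relative expression = 2^-4.
value = relative_expression(25.0, [20.0, 22.0])
```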

Sequences of DNA oligonucleotides used in this study are reported in Supplementary Data 4.

Generation of Hoxb8-ER clones

Immortalization of murine BM cells with Hoxb8-ER was done as previously described53 with the following modifications. Filtered BM cells were layered over Ficoll-Paque Plus (GE Healthcare Biosciences), and centrifuged at 400 x g for 25 min at RT without brake to enrich for mononuclear cells. Cells were incubated in a 6-well tissue culture plate for 48 h at 37 °C with 5% CO2 prior to the retroviral transduction in RPMI (RPMI-1640, Corning) with 10% 2 mM L-glutamine, and 100 U penicillin/streptomycin (all from Thermo Fisher), supplemented with 20 ng/ml stem cell factor (mSCF), 10 ng/ml interleukin-3 (mIL-3), and 10 ng/ml interleukin-6 (mIL-6). Non-adherent cells were harvested and 5 × 10^5 cells were plated onto a 12-well tissue culture plate (Corning) coated with 10 mg/ml human fibronectin (Sigma). 1 ml of ecotropic retrovirus encoding MSCVneo-HA-ER-Hoxb8 was applied in the presence of 8 mg/ml polybrene and spinoculation was performed at 1000 x g for 60 min at RT. After transduction, cells were maintained in RPMI supplemented with 0.5 μM beta-estradiol (Sigma, E2758) and conditioned media containing ~100 ng/ml SCF for the generation of neutrophil-biased cells or GM-CSF 10 ng/ml (Peprotech) for the generation of macrophage progenitors. Antibiotic selection was performed by adding G418 at 1 mg/ml final concentration until untransduced control cells were no longer viable (usually ~7 days). After selection, cells were FACS sorted into 96 well plates with culture media for the generation of single cell clones.

For generation of Tet2 KO lines, each clone was transduced with integrase-defective lentiviral vector encoding for Cre recombinase or GFP as control at a Multiplicity of Infection (MOI) of 200.

Functional assays on myeloid cells

Cytospin and Wright-Giemsa staining: Cells were prepared in PBS at a concentration of 1 × 10^6 cells/ml and spun at 1,000 RPM for 60 s onto microscope slides. After air drying for 30 min, slides were sequentially soaked in different dilutions of Wright-Giemsa stain (Siemens, 100% 4 min, 20% 12 min, 3X rinse in ddH2O). Coverslips were affixed with Permount Mounting Media (Thermo Fisher) and samples were analyzed at 100X magnification using an oil immersion objective.

Phagocytosis assay: Cells were pre-stimulated with 100 ng/ml LPS (L2630, Sigma) for 30 min, washed in PBS and incubated in Live Imaging Solution (Thermo Fisher) along with labeled E. coli or S. aureus BioParticles (Thermo Fisher, 500 µg/ml and 1000 µg/ml, respectively) for 1 h at 37 °C before flow cytometry analysis.

Reactive Oxygen Species (ROS) assay: Cells were incubated using Invitrogen™ CellROX™ Flow Cytometry Assay Kit (Thermo Fisher) at 37 °C for 30 min in culture media, with or without 80 nM PMA (Millipore Sigma).

Intracellular staining for cytokines: Cells were pre-stimulated with 100 ng/ml LPS together with protein transport inhibitor Golgi Plug (1:1000, BD) for 1 h at 37 °C. Surface and intracellular staining was performed using Perm/Fix kit (BD) according to manufacturer’s instructions.

Proliferation assay: Cells were stained with 0.5 µM CellTrace Far Red Cell Proliferation Kit (Thermo Fisher) according to manufacturer’s instructions. Flow cytometry analysis was performed 3 days later.

In vivo transplant: 2 × 10^6 Hoxb8-ER GMPs were transplanted into lethally irradiated recipients (9.5 Gy) together with 2 × 10^5 supporter CD45-mismatched BM cells via retro-orbital injection. Peripheral blood was collected at 4, 7, 9 and 11 days after injection.

Functional perturbation score

Raw data outputs obtained from functional assays performed in WT and Tet2 KO GMP clones (expression of Ly6G, expression of CD115, expression of CD11b, expression of IL6, ROS production, phagocytosis after exposure to E. coli and S. aureus particles) were first standardized using the R function scale to render them comparable. A distance matrix among the different clones was then calculated using the R dist function with the Euclidean distance measure. The functional perturbation score is then defined for each GMP clone as the distance between the paired WT and KO versions of that clone.
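
The score can be sketched in a few lines. The original analysis used the R functions scale and dist; the Python below is a stand-in that column-standardizes with the sample standard deviation (as R's scale does by default) and takes the Euclidean distance between each WT clone and its KO counterpart. Clone names and readout values are toy data.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def scale_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center each column and divide by its sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # n - 1 denominator
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def perturbation_scores(data: Dict[str, Sequence[float]],
                        pairs: Sequence[Tuple[str, str]]) -> Dict[str, float]:
    """Euclidean distance between standardized WT and KO readouts per pair."""
    names = list(data)
    scaled = dict(zip(names, scale_columns([data[n] for n in names])))
    return {wt: math.dist(scaled[wt], scaled[ko]) for wt, ko in pairs}


# Two toy clones, two functional readouts each.
data = {
    "A_WT": [0.0, 0.0], "A_KO": [2.0, 0.0],
    "B_WT": [0.0, 2.0], "B_KO": [2.0, 2.0],
}
scores = perturbation_scores(data, [("A_WT", "A_KO"), ("B_WT", "B_KO")])
```

Standardizing before computing distances keeps readouts measured on large numeric scales from dominating the score.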

Bone marrow transplantation of KSL HSPC

Long bones, pelvis and spines were harvested and muscle tissue was removed. Bones were crushed in PBS complemented with 2 mM EDTA (Sigma) and 0.5% BSA (Sigma) and bone marrow cells in suspension were filtered on a 40 μm cell strainer. Lin- cells were obtained using the Direct Lineage Cell Depletion Kit (Miltenyi Biotec) according to the manufacturer’s instructions. Cells were then stained with antibodies against the HSC-related markers Kit and Sca, and Lin- Kit+ Sca+ (KSL) cells were sorted from Tet2fl/fl Mx1-Cre mice.

Transduction of mouse HSC was performed in serum-free medium enriched with cytokines as previously described119 at a MOI of 20. Transduction efficiency was monitored by qPCR. Sixteen hours after transduction, cells were washed in PBS and transplanted at a dose of 1 × 10^4 cells via retro-orbital injection into lethally irradiated C57BL6/J recipients together with 2 × 10^5 Sca-depleted supporter BM cells. Six weeks after transplant, four doses of 12.5 μg/g pI:pC (Amersham) were administered intraperitoneally to induce Cre activity (and thus Tet2 deletion).

Mice were monitored weekly for body weight and signs of suffering, and euthanized when showing ≥ 15% weight loss and/or labored breathing, followed by necropsy analysis. Serial collections of blood from the mouse tail were performed to monitor the hematological parameters and donor cell engraftment. At the end of the experiment (25 weeks), BM and spleen were harvested and analyzed (scATAC-seq, flow cytometry for hematopoietic subpopulations).

PVA-based HSC cultures and analyses

For PVA-expansion experiments, Lin- Kit+ Sca+ Cd150+ Cd48- Epcr+ cells were sorted from Tet2fl/fl mice. Culture was performed as previously described73. After 6 days of expansion, cultures were split, and transduced with Sox4 OE lentiviral vector or BFP control vector (MOI 40). Four days later, cultures were split again and transduced with Cre-expressing lentiviral vector (MOI 40) to introduce Tet2 KO. Levels of transduction were monitored by BFP expression and qPCR. KSL cells were enriched again by FACS before proceeding with other analyses (SHARE-seq, flow cytometry for HSC markers, transplant). For transplant, 6 × 10^4 cells from each condition were transplanted into lethally irradiated C57BL6 recipients together with 2 × 10^5 competitor total BM cells (mismatched for CD45 isoform expression). Serial collections of blood from the mouse tail were performed to monitor donor cell engraftment. At 30 weeks after primary transplant, 2 × 10^6 cells from each primary mouse were transplanted into secondary lethally irradiated recipients.

For experiments including the high complexity barcoding library LARRY (ref. 83), transduction of Lin- Kit+ Sca+ CD150+ CD48- EPCR+ cells was performed 2 h after sorting in a serum-free medium enriched with cytokines119 at a MOI of 20. A transduction level of ~20% was achieved using these conditions, thus maximizing the likelihood of a vector copy number of 1 (1 unique barcode/cell). Sixteen hours after transduction, cells were washed in PBS and switched to PVA-based medium. LARRY lentiviral libraries were prepared from plasmid stocks as described83, and diversity was confirmed to be in the range of 2 × 10^5. Libraries were prepared as described83, and sequenced on a MiSeq (Illumina; 2 × 150). Clonal abundances were estimated using a pipeline adapted from ref. 120. Briefly, barcodes are isolated by the identification of flanking sequences using the ShortRead R package, and further filtered by perfect matching of the constant bases present within the 28-mer barcode. Correction for sequencing errors is performed using the Starcode algorithm121 with default parameters. Low-frequency barcodes with counts <10 are removed.
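
The constant-base filter and the count threshold can be illustrated as below. The template used here is a short made-up pattern, not the actual 28-mer LARRY design, with `N` marking variable positions; the function names are hypothetical, and the Starcode error-correction step that sits between the two filters in the text is omitted.

```python
from collections import Counter
from typing import Dict, Iterable


def matches_template(barcode: str, template: str) -> bool:
    """True if the barcode has the template length and matches every
    constant (non-N) base of the template exactly."""
    return len(barcode) == len(template) and all(
        t == "N" or t == b for b, t in zip(barcode, template)
    )


def filter_barcodes(reads: Iterable[str], template: str,
                    min_count: int = 10) -> Dict[str, int]:
    """Count template-matching barcodes and drop those seen < min_count times."""
    counts = Counter(r for r in reads if matches_template(r, template))
    return {bc: n for bc, n in counts.items() if n >= min_count}


template = "NNTGNN"  # hypothetical: constant bases T and G at positions 3-4
reads = ["AATGCC"] * 12 + ["AATGCA"] * 3 + ["AAGGCC"] * 20
clones = filter_barcodes(reads, template)
```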

Flow cytometry and FACS

Immunophenotypic analyses and cell sorting were performed on FACS Aria II (BD Biosciences) and antibodies utilized are listed in Supplementary Data 5 (with corresponding catalog numbers, dilutions utilized and RRID #). Single stained and Fluorescence Minus One stained cells were used as controls. For peripheral blood analysis, red blood cells were lysed using ACK buffer (Quality Biologicals) for 7 minutes at room temperature before staining. Samples were incubated with the antibody cocktail in PBS 2% FBS for 30 minutes at 4 °C before analysis or sorting. 7-AAD Viability Staining Solution (BioLegend) was included in the sample preparation for flow cytometry to exclude dead cells from the analysis. Relevant gating strategies are reported in Supplementary Fig. 9.

Metabolomics analysis

For metabolomics analysis, at least 3 biological replicates were used for each GMP clone and for primary sorted HSC samples from mouse BM. Samples were pelleted and lysed by adding 100 µL ice-cold 80% methanol in water and polar metabolites were extracted using a methanol-chloroform phase separation (1 mL methanol containing 2.5 µM of an internal standard (fully 13C-, 15N-labeled amino acid mix; Cambridge Isotope Laboratories), 500 µL water, and 1 mL chloroform). The samples were mixed on a shaker at 4 °C for 15 min and centrifuged at 5000 × g for 15 min at 4 °C. The aqueous phase was recovered, dried under nitrogen flow and resuspended in 30% acetonitrile in water, with a volume scaled to the extracted cell number. 14 µL of each sample was transferred to glass microinserts and used for MS1 runs. The rest of each sample was combined to create a pool sample for MS2/MS3 acquisition. Samples were run on a Vanquish LC coupled to an ID-X mass spectrometer (Thermo Electron North America, Madison, WI, USA). A volume of 5 µL was injected on a Zic-pHILIC column (150 × 2.1 mm, 5 micron particles; Merck). The flow rate was 0.15 mL min−1, except for the first 30 s, where it was ramped from 0.05 mL min−1 to 0.15 mL min−1. The mobile phases were 20 mM ammonium carbonate in water with 0.1% ammonium hydroxide for A and acetonitrile 97% in water for B. The gradient consisted of an isocratic step of 0.5 min at 93% B, then a gradient to 40% B in 18.5 min, then to 0% B in 9 min, followed by an isocratic step at 0% B for 5 min and back to 93% B in 3 min. The column was re-equilibrated at 93% B for 9 min. For MS1-only runs, data were acquired in MS1 full-scan with polarity-switching, resolution 120,000, RF lens 30%, normalized AGC target 25%, max IT 50 ms, m/z range 65 to 1000. For MS2/3 runs (on the pool sample), data were acquired using AcquireX deepscan with 5 repetitions, separately in each polarity.
For the targeted analysis, a standard mix at 1 µM of each target was prepared and run after the samples to confirm retention times. Targeted metabolite measurements were normalized to the internal 13C/15N-labeled amino acid standard. For untargeted analysis, Compound Discoverer (CD, version 3.3, Thermo Fisher Scientific) was used to generate a list of compounds (monoisotopic molecular weight and RT couples, de-adducted, combined from positive and negative mode) and to integrate the corresponding area. Identification was based on the MS2/3 data acquired on the pool sample. Fragmentation spectra were searched against an internal library (with matching retention time, to generate level 1 identification) and mzCloud (to generate level 2 and level 3 identifications). All identifications were manually curated. For untargeted metabolomics functional analysis, area and m/z (as [M+H]+ adducts calculated from the monoisotopic masses generated in CD) were processed using MetaboAnalyst 5.0 software118. Positive ion mode, 5.0 ppm tolerance and retention time (minutes) were provided as input parameters. The mouse KEGG database was used for Mummichog Pathway Analysis. Normalized AUC data relative to Fig. 6e and Supplementary Fig. 8e are reported in Supplementary Data 6.

Statistical analysis

Data were expressed as means ± SEM or dot plots with median values indicated as a line. Statistical tests and number of replicates are reported in the figure legends. Assumptions for the correct application of standard parametric procedures were checked (e.g., normality of the data). Whenever these assumptions were not met, nonparametric statistical tests were performed. In particular, the Mann-Whitney test was performed to compare two independent groups. In the presence of more than two independent groups, the Kruskal-Wallis test was performed, followed by post hoc pairwise comparisons. For paired observations, the Wilcoxon matched-pairs signed rank test was performed. For statistics in Fig. 2g, a paired t-test was used because each independent comparison was performed using Tet2 KO and WT cells competitively transplanted in a single mouse. Adjusted p-values using Bonferroni’s correction are reported. Analyses were performed using GraphPad Prism 10 and R statistical software. Differences were considered statistically significant at *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001; “ns” represents non-significance.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
