
An immunophenotype-coupled transcriptomic atlas of human hematopoietic progenitors – Nature Immunology

This work complies with all relevant ethical regulations. Approval for the collection and analysis of adult healthy fresh human bone marrow aspirates and participant consent were obtained by Lonza. Donors consented to open sharing of their age, sex, self-identified ancestry and raw sequencing data. Sample size was determined based on the availability of bone marrow donors by ethnicity and on sample volume, and is comparable to that of prior bone marrow atlas studies (ref. 20). Samples were randomly selected for each experiment based on availability, and no donor was excluded. Sequencing data collection and analysis were not performed blind to the conditions of the experiments. Operators who performed morphology and c.f.u. evaluations in validation experiments were blinded to the experimental group.

Sample preparation for CITE-seq

For the TotalSeq-A titration experiment, 100 ml of fresh bone marrow from each healthy donor was purchased from Lonza and shipped overnight at 4 °C. All four donors were nonsmokers and tested negative for human immunodeficiency virus and hepatitis B and C viruses. For each donor, BMNCs were isolated by Ficoll Paque Plus (GE17-1440-02, Sigma-Aldrich) gradient centrifugation using SepMate-50 tubes (85450, STEMCELL Technologies). A small fraction of BMNCs was flow sorted on a Sony MA900 cell sorter (Sony Biotechnology) for live (7AAD−) cells and depleted of granulocytes by side scatter for better BMNC capture quality. The remaining BMNCs were stained using a Miltenyi CD34 indirect kit (130-046-701, Miltenyi Biotec) and enriched for CD34+ cells (CD34hi) on an autoMACS separator (Miltenyi Biotec) using the Possel-d program. The negative fractions were stained with a CD271 MicroBead kit (human; 130-099-023, Miltenyi Biotec) and enriched on a Miltenyi autoMACS separator (program Possel) for coelution of CD271+ cells (CD34+ and/or CD271+). For the validation experiment and full-spectrum flow cytometry, 25 ml of fresh bone marrow was used. Cells were isolated as described above and split between the CITE-seq and spectral flow cytometry workflows. Frozen total mononuclear cells from the bone marrow of three individuals with AML were sorted for live cells and processed using the same protocol as in the validation experiment.

Flow cytometry cell staining buffer consisting of DPBS (14-190-250, Thermo Fisher) with 2% fetal bovine serum (FB5002-H, Thomas Scientific) was used in the washing steps unless otherwise specified. Donor information can be found in Supplementary Table 1.

TotalSeq-A antibodies

All TotalSeq-A antibody mixes are recommended for staining up to 500,000 cells in a volume of 50 μl. The 275-plex titration antibody cocktail (PN 900006213, BioLegend) was custom made based on the TotalSeq-A Human Universal Cocktail, V1.0 (399907, BioLegend) and a prototype 277-plex human antibody cocktail (PN 900003129, BioLegend) that was titrated by BioLegend on PBMCs. We determined or verified the concentrations of 47 antibodies in the titration cocktail (900006213, BioLegend) via three twofold dilutions using flow cytometry with hybridized oligo(dT)-Alexa Fluor 647 (/5Alex647N/TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT, Integrated DNA Technologies) on human apheresis products. Concentrations of the other antibodies in 900006213 were determined based on our previous experience with 900003129. When an ADT accounted for >10% of the total sequencing reads, its concentration was decreased; for ADTs that yielded low sequencing reads, the concentration was increased. ADTs used in titration and validation experiments can be found in Supplementary Table 2.
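The read-share rule above can be written as a small decision function. This is a hypothetical sketch: only the 10% threshold comes from the text, while the cutoff for "low sequencing reads" and the twofold adjustment step are assumptions, and the names `adjust_concentrations`, `low_reads` and `step` are illustrative.

```python
from typing import Dict


def adjust_concentrations(
    adt_reads: Dict[str, int],
    concentrations: Dict[str, float],
    total_reads: int,
    high_share: float = 0.10,  # >10% of total reads -> decrease (from the text)
    low_reads: int = 1_000,    # hypothetical cutoff for "low sequencing reads"
    step: float = 2.0,         # hypothetical twofold adjustment step
) -> Dict[str, float]:
    """Return adjusted relative concentrations (1.0 = current dose)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    adjusted: Dict[str, float] = {}
    for adt, conc in concentrations.items():
        reads = adt_reads.get(adt, 0)
        if reads / total_reads > high_share:
            adjusted[adt] = conc / step  # ADT dominates the library
        elif reads < low_reads:
            adjusted[adt] = conc * step  # ADT barely detected
        else:
            adjusted[adt] = conc         # leave unchanged
    return adjusted


# Hypothetical read counts for three ADTs in a 10-million-read ADT library.
print(adjust_concentrations(
    {"CD34": 2_000_000, "CD90": 500, "CD38": 300_000},
    {"CD34": 1.0, "CD90": 1.0, "CD38": 1.0},
    total_reads=10_000_000,
))
```

In practice the adjustment size would be set per antibody; a fixed twofold step simply mirrors the twofold dilutions used for verification.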

Single-cell CITE-seq generation

For the TotalSeq-A antibody titration experiment, CD34hi and CD34+CD271+ cells were stained with unique TotalSeq-A anti-human Hashtag antibodies (Extended Data Fig. 1) to denote concentration, together with either a serially diluted TotalSeq-A 275-plex antibody cocktail (4× to 0.25×) or a 277-plex antibody mix. BMNCs were stained similarly but without the 277-plex antibody mix. CD34hi and CD34+CD271+ cells were washed on a Laminar Wash MINI System (Curiox Biosystems) with the following settings: 25 cycles, flow rate of 10 μl s–1 and initial volume of 55 μl. After washing, the CD34hi and CD34+CD271+ cells were pooled per donor, whereas BMNCs were pooled by combining two donors. Of note, cells were stained and washed on a Laminar Wash 16-well plate in a staggered manner so that the staining step was consistently timed at 30 min at 4 °C across all populations and concentrations. Pooled cells were counted by Trypan Blue staining (15250061, Thermo Fisher Scientific) on a hemacytometer (3120, Hausser Scientific), and viability was over 90% before 10x chip loading. CD34hi and CD34+CD271+ cells (97,000–115,000 per well) were loaded in eight wells using a 10x Chromium X 3′ version 3.1 HT kit (1000370, 10x Genomics), whereas BMNCs were loaded in four wells (16,000 per well) using a 10x Chromium standard 3′ version 3.1 kit (1000268, 10x Genomics). Emulsion generation, Gel Bead-In Emulsion (GEM) collection, clean-up and cDNA amplification with ADT/hashtag oligonucleotide (HTO) spike-in primers were performed according to the 10x Genomics and BioLegend TotalSeq-A protocols. Of note, after cDNA amplification, HTO/ADT-containing fractions were further cleaned with 1.2× SPRI beads from the supernatant of the 0.6× SPRI cDNA selection and, as for cDNA in HT kits, were pooled two-in-one for HTO and ADT sample indexing PCR.
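A 4× to 0.25× twofold series gives the five concentrations referred to later in the concentration-selection step, each tagged by its own hashtag. The sketch below generates that series and a concentration lookup keyed by hashtag; the hashtag names (`HTO1`, `HTO2`, ...) are hypothetical placeholders, not the BioLegend product names.

```python
from typing import Dict, List


def twofold_series(top: float = 4.0, bottom: float = 0.25) -> List[float]:
    """Relative concentrations from `top` down to `bottom` in twofold steps."""
    series: List[float] = []
    conc = top
    while conc >= bottom:
        series.append(conc)
        conc /= 2
    return series


def hashtag_map(series: List[float], prefix: str = "HTO") -> Dict[str, float]:
    """Assign one (hypothetically named) hashtag per concentration."""
    return {f"{prefix}{i + 1}": conc for i, conc in enumerate(series)}


series = twofold_series()
print(series)               # [4.0, 2.0, 1.0, 0.5, 0.25]
print(hashtag_map(series))  # {'HTO1': 4.0, 'HTO2': 2.0, 'HTO3': 1.0, 'HTO4': 0.5, 'HTO5': 0.25}
```

After HTO calling, such a lookup converts each singlet's hashtag back into the antibody concentration it received.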

For the titrated CITE-seq donor samples, three populations from each donor were stained with the titrated 132-antibody cocktail, including seven spike-in antibodies, and washed by laminar flow. Cells were counted, and the populations were loaded without hashing into three ports (16,000 per well) with a 10x 3′ V3.1 kit. ADT libraries were amplified and cleaned as in step 2.3d, in which the supernatant is separated. Clean-up and cDNA amplification were performed according to the standard TotalSeq-A protocol.

Library preparation and sequencing

Library preparation was performed according to the manufacturer’s protocols. To account for the high number of cells from the HT kits, the number of cycles was reduced by two to three in most PCR amplification steps. Final transcriptome, ADT and HTO libraries were quantified and analyzed using a Qubit dsDNA HS assay kit (Q32854, Invitrogen), a High-Sensitivity DNA kit (5067-4626, Agilent Technologies) on a 2100 Bioanalyzer (G2939BA, Agilent Technologies) and a KAPA HiFi library quantification kit (KK4824, Roche). Dual-indexed transcriptome libraries from titration experiments were pooled and sequenced on two Illumina S4 flow cells with the PE150 + 10 + 10 setting (Illumina), and libraries from validation experiments were sequenced across multiple S4 flow cells. Single-indexed ADT and HTO libraries from titration experiments were pooled and sequenced on an Illumina S2 flow cell with the PE30 + 8 setting. ADTs from validation experiments were sequenced alone or with transcriptome libraries on S4 flow cells with PE100. BCL files were demultiplexed into FASTQ files for Cell Ranger input. ‘AT’ was appended to the end of the 6-base-pair RPI-x ADT i7 index to match the 8-base-pair D70X_long HTO index. HTO and ADT FASTQ files were supplied as 3P feature barcodes together with transcriptome FASTQ files to the Cell Ranger V6.1.2 count pipeline. The transcriptome was mapped to the hg19 and hg38 reference genomes for downstream analysis and visualization. In total, 484,637 cells were recognized by Cell Ranger, with a median gene count ranging from 1,196 to 3,858 and a median ADT unique molecular identifier count ranging from 587 to 2,726 per cell.
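The index-padding step can be sketched as follows, assuming the i7 index is stored as a plain sequence string in a demultiplexing sample sheet so that the 6-bp ADT index and the 8-bp HTO index can share one 8-cycle index read. The function name and the example sequence are hypothetical.

```python
VALID_BASES = set("ACGT")


def pad_i7_index(index: str, pad: str = "AT", target_length: int = 8) -> str:
    """Append a fixed suffix so a 6-bp i7 index matches an 8-bp index read."""
    index = index.upper()
    if not set(index) <= VALID_BASES:
        raise ValueError(f"non-ACGT characters in index: {index!r}")
    padded = index + pad
    if len(padded) != target_length:
        raise ValueError(
            f"expected {target_length} bp after padding, got {len(padded)}"
        )
    return padded


# Hypothetical 6-bp index sequence, for illustration only.
print(pad_i7_index("acgtac"))  # ACGTACAT
```

The length check guards against padding an index that is already 8 bp long, which would silently produce a 10-bp entry.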

HTO calling and quality control

Cells were multiplexed using HTOs to distinguish both donor and CITE-seq ADT concentration. HTO barcode count matrices were obtained through the multimodal analysis workflow in Cell Ranger and then normalized as counts per ten thousand (CPTT). Cell barcodes with >30% of normalized reads assigned to multiple HTOs were annotated as doublets, and confident singlet predictions were assigned to cells with >40% of normalized reads assigned to a single HTO (HTO processing module of AltAnalyze). Cells were further filtered based on the counts of the seven mouse/rat isotype control antibodies (see the source code in Data availability), and quality control filtering was performed in Seurat V4 (ref. 36) using nFeature_RNA > 500 & nCount_RNA > 1000 & < 25. This quality control step reduced 393,748 cells to 315,792 high-quality single cells in the initial titration dataset and 90,889 to 72,198 cells in the final titrated CITE-seq dataset.
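A minimal sketch of the threshold-based HTO calls described above, written in plain Python rather than the AltAnalyze module. It assumes one reading of the doublet rule (a barcode is a doublet when the second-ranked HTO holds more than 30% of the normalized reads); the AltAnalyze implementation may differ. Per-cell CPTT scaling does not change within-cell proportions, so it is included only to mirror the described workflow.

```python
from typing import Dict


def cptt(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale one cell's HTO counts to counts per ten thousand (CPTT)."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total * 1e4 for k, v in counts.items()}


def call_hto(counts: Dict[str, float],
             doublet_frac: float = 0.30,
             singlet_frac: float = 0.40) -> str:
    """Classify one barcode as a singlet HTO name, 'Doublet' or 'Unassigned'.

    Assumption: doublet if the second-ranked HTO holds > doublet_frac of the
    normalized reads; singlet if the top HTO holds > singlet_frac.
    """
    norm = cptt(counts)
    total = sum(norm.values())
    if total == 0:
        return "Unassigned"
    ranked = sorted(norm.items(), key=lambda kv: kv[1], reverse=True)
    fractions = [v / total for _, v in ranked]
    if len(fractions) > 1 and fractions[1] > doublet_frac:
        return "Doublet"
    if fractions[0] > singlet_frac:
        return ranked[0][0]
    return "Unassigned"


print(call_hto({"HTO1": 90, "HTO2": 10}))  # HTO1
print(call_hto({"HTO1": 50, "HTO2": 45}))  # Doublet
```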

CITE-seq analysis

All Cell Ranger-produced count matrices underwent ambient RNA exclusion using the software SoupX (ref. 37) with a contamination fraction of 15% and quality control filtering by HTO and Seurat V4 (ref. 36). Ambient-corrected transcriptome counts and associated ADT counts were supplied as input to the software TotalVI to obtain normalized and denoised ADT counts. To derive clusters from the initial titration CITE-seq datasets, the software cellHarmony was used to transfer labels from CPTT-normalized expression centroids computed from author-provided labels in three previously published reference bone marrow atlases. Cell annotations from an integrated multicohort human bone marrow atlas from the Azimuth website were projected onto the initial titration dataset using the default mapping function in the Azimuth R Shiny web interface (level 2 annotations). Unsupervised clustering was performed using a two-step process in the software ICGS2, with 5,000 cells used for PageRank downsampling and a minimum marker Pearson threshold of 0.2. For this analysis, SoupX-corrected CPTT expression files were combined and supplied to AltAnalyze version 2.1.4. This workflow automatically selects the optimal cluster resolution based on marker gene cluster filtering, ignoring nonrobust and doublet cell clusters. Cells with a poor mapping score to the final clusters (linear support vector classification coefficient > 0) were excluded from the analysis (for example, doublets). This analysis identified 33 initial clusters with no evident donor-specific effects. These clusters were grouped into seven broad lineage classes: HSPCs, early lymphoid and B cells, T and NK cells, stromal cells, myeloid cells, erythroblasts and basophil/mast cell progenitor/megakaryocyte progenitors. ICGS2 was rerun independently on each of the seven classes to produce a set of combined subclusters.
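SoupX models each droplet as a mixture of cell-derived and ambient ("soup") counts; in SoupX a fixed fraction such as 15% can be set with `setContaminationFraction()` before `adjustCounts()`. The idea of removing a fixed contamination fraction can be illustrated with the deliberately simplified subtraction below. It is not SoupX's algorithm, which redistributes counts more carefully; the function names are hypothetical.

```python
from typing import Dict


def soup_profile(empty_droplets: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Estimate the ambient expression profile from empty-droplet counts."""
    totals: Dict[str, int] = {}
    for counts in empty_droplets.values():
        for gene, n in counts.items():
            totals[gene] = totals.get(gene, 0) + n
    grand = sum(totals.values())
    return {g: n / grand for g, n in totals.items()} if grand else {}


def subtract_ambient(cell: Dict[str, int],
                     soup: Dict[str, float],
                     rho: float = 0.15) -> Dict[str, float]:
    """Remove rho * nUMI * soup[g] expected ambient counts, clipped at zero."""
    n_umi = sum(cell.values())
    return {g: max(n - rho * n_umi * soup.get(g, 0.0), 0.0)
            for g, n in cell.items()}


soup = soup_profile({"bc1": {"A": 5, "B": 5}, "bc2": {"A": 5, "B": 5}})
print(subtract_ambient({"A": 80, "B": 20}, soup))  # roughly A: 72.5, B: 12.5
```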
All candidate supervised and unsupervised cluster annotations were provided as inputs for the software scTriangulate version 0.13.0 to identify the most stable integrated cluster annotations. To refine the final cell annotations and exclude putative doublet cell assignments, final cluster annotations were derived by remapping cell barcode transcriptome profiles to the scTriangulate cluster centroids following MarkerFinder feature selection on 50 representative cells per cluster. Three initial scTriangulate clusters out of the original 88 were excluded from this analysis due to low cellHarmony remapping scores (Pearson correlation < 0.5). These cell annotations were projected again with cellHarmony onto the final titrated CITE-seq dataset to derive initial transcriptome annotations. WNN (ref. 28) was applied using three Leiden clustering resolutions (1, 2 and 3) in the MUON framework (ref. 28) to obtain granular and fine cluster annotations derived from Harmony (ref. 38) batch-corrected RNA and TotalVI-corrected ADT counts on the titrated dataset. scTriangulate was performed on the titration CITE-seq dataset using both SoupX-corrected RNA counts and TotalVI-corrected ADT normalized values and annotations from WNN and the cellHarmony titration dataset annotations. This analysis produced 89 integrated clusters. Refined final cell annotations for the titrated CITE-seq dataset were obtained using the same cellHarmony remapping protocol. UMAPs were produced in AltAnalyze using the default UMAP function, considering the top 60 cluster-specific marker genes as features rather than principal components. SCCAF stability scores were derived using the scTriangulate SCCAF function. Marker heat maps were obtained with the software MarkerFinder using either single cells or combined donor pseudobulks for each scTriangulate cell population.
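Centroid-based remapping of the kind described above can be sketched as follows. The vectors are assumed to be CPTT-normalized expression already restricted to marker genes (MarkerFinder-selected in the text). Applying the 0.5 Pearson cutoff to individual cells, rather than to whole clusters as in the text, is an assumption made for illustration, and this is not cellHarmony's implementation.

```python
import math
from typing import Dict, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def assign_to_centroid(cell: Sequence[float],
                       centroids: Dict[str, Sequence[float]],
                       min_r: float = 0.5) -> Optional[str]:
    """Best-correlated centroid label, or None if the best r < min_r."""
    best_label, best_r = None, -math.inf
    for label, centroid in centroids.items():
        r = pearson(cell, centroid)
        if r > best_r:
            best_label, best_r = label, r
    return best_label if best_r >= min_r else None


# Toy three-gene centroids (hypothetical values).
centroids = {"HSC": [10.0, 1.0, 0.0], "Ery": [0.0, 1.0, 10.0]}
print(assign_to_centroid([9.0, 2.0, 1.0], centroids))  # HSC
print(assign_to_centroid([1.0, 5.0, 1.0], centroids))  # None
```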
For differential ADT analyses, we applied an empirical Bayes moderated t-test (FDR corrected), as this robust procedure is typical for molecular ‘omics comparison analyses.
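For reference, the empirical Bayes moderated t-statistic in the form popularized by limma shrinks each feature's residual variance toward a common prior. The exact implementation used for the ADT comparisons is not specified here, so the following is the general formulation rather than the authors' code. For ADT g, the estimated log-fold change is β̂_g, s_g² is the residual variance on d_g degrees of freedom, v_g is the unscaled variance factor (1/n₁ + 1/n₂ for a two-group comparison), and d₀ and s₀² are prior degrees of freedom and prior variance estimated across all ADTs.

```latex
\tilde{s}_g^{2} = \frac{d_0 s_0^{2} + d_g s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}},
\qquad
\tilde{t}_g \sim t_{d_0 + d_g} \ \text{under } H_0 .
```

Borrowing strength through d₀ and s₀² stabilizes variance estimates for ADTs measured in few cells or donors, which is why this test is favored over an ordinary t-test for many parallel comparisons.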

The AML CITE-seq samples stained with the 132-antibody cocktail were processed using the same protocol as the titrated CITE-seq controls (Cell Ranger, SoupX and TotalVI) and mapped to the titrated scTriangulate clusters using cellHarmony (default options, centroid alignment, correlationCutoff = 0). Differential ADT abundance analyses were performed in cellHarmony using the default testing procedure for each matched cell population, for all cells in the p-LSC versus p-LSC + m-LSC sample comparison (empirical Bayes t-test P < 0.05, FDR corrected). Bone marrow scRNA-seq CPTT-scaled count matrices were obtained from three previously described cohorts (n = 38 donors) from the author-provided count matrices and were also aligned to these titrated scTriangulate clusters with cellHarmony. cellHarmony-annotated cells from these 38 healthy bone marrow samples were projected into the reference UMAP coordinate space using the AltAnalyze ‘approximateUMAP’ function (refs. 3,20,32).
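The FDR correction applied to the per-ADT P values is not named in the text; the Benjamini–Hochberg step-up procedure is the most common choice and is sketched here as an assumption, not as cellHarmony's implementation.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four ADTs in one matched population.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Adjusted values are returned in the input order, so they can be written back next to each ADT's raw P value.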

ADT marker nomination and concentration selection

To prioritize oligonucleotide-conjugated CITE-seq antibodies by their analytical value for resolving transcriptionally defined scTriangulate cell populations, we trained an XGBoost classification model using ADT expression levels to predict scTriangulate clusters for each concentration tested in the titration. The gain feature importance metric was then used to rank each antibody’s contribution to the prediction. In parallel, we checked the specificity of ADTs at each titration concentration by UMAP and heat map visualization compared to isotype controls. Specifically, we considered an ADT underperforming if its signal was sparse within the clusters it was expected to label (by XGBoost or based on the literature) and nonspecific if its staining pattern was indistinguishable from that of most isotype controls at the lowest concentration. Sixty-seven ADTs that ranked in the bottom 50% across all concentrations were examined to rescue well-established markers. Unknown markers that were consistently in the bottom 50% in XGBoost and/or were identified as nonspecific were excluded. For ADTs that exhibited dose-dependent signals, we chose the concentration that retained over 75% of the observed dynamic range after confirming that the increased concentration did not lead to nonspecific staining in irrelevant clusters. For example, a 2× concentration was selected for CD38 because it displayed a robust dose-dependent dynamic range across all donors (Fig. 2b), high expression in plasma cells and distinguishable intermediate expression in CLPs but not HSCs (Fig. 2c). For ADTs that did not demonstrate a dose-dependent dynamic range, we concluded that all five concentrations used were either saturating or considerably below the optimal concentration.
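The two numeric rules in this paragraph can be sketched as below, under stated assumptions: the dynamic range per concentration is taken as already computed (its exact definition is not given here), "retained over 75%" is read as choosing the lowest concentration whose range exceeds 75% of the maximum observed, and XGBoost gain values are assumed to be exported beforehand (for example, via the xgboost `Booster.get_score(importance_type="gain")` call). Function names are hypothetical, and the nonspecific-staining check is not modeled.

```python
from typing import Dict, Set


def pick_concentration(dynamic_range: Dict[float, float],
                       retain: float = 0.75) -> float:
    """Lowest concentration whose dynamic range exceeds retain * maximum."""
    max_range = max(dynamic_range.values())
    if max_range <= 0:
        raise ValueError("no positive dynamic range observed")
    candidates = [c for c, r in dynamic_range.items() if r > retain * max_range]
    return min(candidates)


def bottom_half(gain: Dict[str, float]) -> Set[str]:
    """ADTs whose gain ranks in the lower 50% (ties broken by name)."""
    ranked = sorted(gain, key=lambda adt: (gain[adt], adt))
    return set(ranked[: len(ranked) // 2])


# Hypothetical dynamic ranges for one ADT at the five titrated concentrations.
print(pick_concentration({0.25: 1.0, 0.5: 2.0, 1.0: 3.0, 2.0: 3.8, 4.0: 4.0}))  # 2.0
print(sorted(bottom_half({"CD34": 5.0, "CD38": 4.0, "X": 0.1, "Y": 0.2})))      # ['X', 'Y']
```

Choosing the lowest qualifying dose limits reagent use and sequencing load while keeping most of the usable signal.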
For those in the first scenario (saturating), we lowered the concentration for ADTs enriched in specific populations but exhibiting high levels of background signal as assessed by comparing to isotype controls; otherwise, we chose the 1× concentration. For example, CD29 stained nonspecifically at higher concentrations; therefore, we chose a 0.5× concentration (Extended Data Fig. 4c). In cases where an ADT appeared specific at select concentrations but showed few reads per cell and overall weak staining as indicated by its dynamic range, we consulted BioLegend for the concentration relative to the TotalSeq-A Human Universal Cocktail V1.0 and adjusted the concentration accordingly (for example, CD325, CD27, CD162, CD11c, CD55 and CD44, Extended Data Fig. 4a,b). Specific markers associated with more mature subsets not resolved in the progenitor atlas were excluded from the final titration, including several lymphocyte markers, ADTs targeting T cell receptors and immunoglobulins, as markers for lymphocyte clonality are not required to assess progenitor cell and broad lineage identity. BioLegend formulated and lyophilized 125 of 132 ADTs at specific concentrations (Supplementary Table 6) as a custom panel to enable subsequent validation.

PE antibody titration

Human bone marrow cells were processed and magnetically enriched for CD34+ cells. Forty million CD34− cells were mixed with three million CD34+ cells to better represent progenitor and mature lineage populations. Fifty thousand cells were stained at each dilution (1:25, 1:50, 1:100, 1:200 and 1:400; vol/vol antibody:staining buffer). Cells were settled on 96-well Laminar Wash plates (96-DC-CL-05, Curiox Biosystems) at 4 °C for 30 min and washed on a Laminar Wash HT2000 System (Curiox Biosystems) at a setting of 15 cycles, wash rate of 10 μl s–1 and initial volume of 55 μl. Data were acquired with an Automated Sample Loader on a five-laser Cytek Aurora full-spectrum flow cytometer (Cytek Biosciences) by adding a 96-well grid adaptor (Curiox Biosystems) onto the laminar wash plate. In total, 10,000 to 20,000 live cells were recorded, and FCS files were exported for analysis using FlowJo v10.8.1 software with the StainIndex v1.8.1 plugin (BD Biosciences). The optimal concentration for each PE antibody was selected as the one yielding the maximum stain index. Stain indexes and final concentration information can be found in Supplementary Tables 13 and 14. Of note, the PE antibodies that overlapped with the backbone were titrated but not used in the Infinity Flow assay.
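The stain index is conventionally defined as (median of the positive population − median of the negative population) / (2 × spread of the negative population). The sketch below uses the sample standard deviation for the spread; the FlowJo plugin may use a robust SD instead, so values are illustrative only. Dilution labels and intensities are hypothetical.

```python
import statistics
from typing import Dict, Sequence, Tuple


def stain_index(positive: Sequence[float], negative: Sequence[float]) -> float:
    """(median_pos - median_neg) / (2 * SD_neg)."""
    sd_neg = statistics.stdev(negative)
    if sd_neg == 0:
        raise ValueError("negative population has zero spread")
    return (statistics.median(positive) - statistics.median(negative)) / (2 * sd_neg)


def best_dilution(data: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> str:
    """Dilution label with the maximum stain index."""
    return max(data, key=lambda d: stain_index(*data[d]))


# Hypothetical PE intensities (positive, negative) at two dilutions.
data = {
    "1:25": ([100.0, 110.0, 120.0], [10.0, 12.0, 14.0]),
    "1:50": ([200.0, 200.0, 200.0], [10.0, 20.0, 30.0]),
}
print(stain_index(*data["1:25"]))  # 24.5
print(best_dilution(data))         # 1:25
```

As the example shows, a brighter positive peak does not guarantee a higher stain index when the negative population also spreads out.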

Infinity Flow data generation

In total, 15 million bone marrow cells from each donor were stained for 30 min at 4 °C with a panel modified from the Cytek 20-Color AML Panel. The antibodies in the backbone were used at concentrations recommended by the manufacturer. After washing, 50,000 cells were aliquoted into each well of 96-well Laminar Wash plates (96-DC-CL-05, Curiox Biosystems) and stained with each titrated PE antibody for 30 min at 4 °C. Cells were then washed on a Laminar Wash HT2000 System (Curiox Biosystems) at a flow rate of 5 μl s–1; laminar flow exponentially dilutes the medium, which minimizes physical stress on the cells. Direct well-to-SIT analysis on a five-laser Cytek Aurora through a Laminar Wash Direct Reading Grid (DC-GR02-96-M, Curiox Biosystems) allowed recording of 10,000–20,000 live cells with a 10-s mix time. Multiplate runs were possible by placing Laminar Wash plates on a Cytek Aurora Plate Loader. The same set of FSP bead-based single-color controls was used to standardize the controls across all donors. FCS files were exported for analysis using FlowJo v10.8.1 software (BD Biosciences).

Infinity Flow object generation and analysis

Live and single cells were gated from the unmixed FCS files (Extended Data Fig. 9a) and exported as inputs to pyInfinityFlow version 1.0.5. All parameters were included for analysis except for live/dead. Infinity marker expression was imputed using the default settings with the following modifications: --ratio_for_validation 0.5 and --n_events_combine 0. Infinity objects were imported into FlowJo v10.8.1 software (BD Life Sciences) for analysis. All cell surface marker parameters were analyzed on a biexponential scale. Cells were gated into granulocytes, lymphocytes, blast cells, monocytes and stromal cells (CD45−) based on side scatter area and CD45 expression.

Batch effect correction by cyCombine

Infinity Flow FCS objects were integrated across samples and with the CITE-seq ADT profiles using the cyCombine workflow. ADT values were normalized by centered log ratio (CLR) transformation. Batch effect correction was performed with the following settings: seed = 840, xdim = 8, ydim = 8, norm_method = rank and ties.method = average; the cofactor for spectral flow data was set to 6,000. Plot functions were modified to output one plot per page; the modified code is supplied in our GitHub repository. Earth Mover’s Distance plots were generated using the evaluate_emd function in cyCombine to assess variability among all nine donors before and after correction and for the pairwise comparison of the CITE-seq and Infinity Flow BMNC profiles from each donor.
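The two transformations named above can be sketched as follows. The CLR uses a pseudocount of 1 across the ADTs of one cell (Seurat's CLR variant differs slightly), and the spectral flow transform assumes the cofactor enters as asinh(x / cofactor), the usual convention for cytometry data. Neither function is taken from cyCombine itself.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log ratio across the ADTs of one cell."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def asinh_transform(values: Sequence[float], cofactor: float = 6000.0) -> List[float]:
    """Arcsinh transform with a cofactor, as commonly used for flow data."""
    return [math.asinh(v / cofactor) for v in values]


print(clr([10, 100, 1000]))                  # values sum to ~0 by construction
print(asinh_transform([0.0, 6000.0, -600.0]))
```

Both transforms compress high values while keeping low and negative (unmixed) values on a comparable scale, which is what lets ADT counts and spectral intensities be normalized together.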

Antigen detection consistency assessment

We integrated the cyCombine batch-corrected antibody profiles from CITE-seq and Infinity Flow for the four donors profiled by both technologies. To evaluate surface marker consistency across donors and technologies, we applied both expert curation of the resulting antibody distribution ridge plots and a previously described statistical comparison approach (k-sample Anderson–Darling test; ref. 39). Expert curation defined 79 surface markers consistent between CITE-seq and Infinity Flow and 40 surface markers consistent across all nine donors profiled by Infinity Flow. In total, 35 markers were consistent across technology and donor based on ridge plot inspection (Supplementary Table 16). As an alternative approach, we applied the k-sample Anderson–Darling test implemented in the R package kSamples. Distributions of corrected flow data were compared across the nine donors by first randomly subsampling data from each donor to 2,500 observations. The function ad.test (method = ‘asymptotic’) was then used to compute the Anderson–Darling t value for each marker. Distributions of corrected flow and CITE-seq data within the four donors were compared by first randomly subsampling data from each donor/technology combination to 2,450 observations (approximately the minimum number of observations across the eight combinations). The function ad.test.combined (method = ‘asymptotic’) was then used to compute the Anderson–Darling t value (technology within donor) for each marker (Supplementary Table 17). This analysis assigns each surface marker a score reflecting its relative consistency.
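The equal-size subsampling that precedes the Anderson–Darling tests can be sketched in Python. The test itself was run with kSamples in R; in Python, `scipy.stats.anderson_ksamp` offers a k-sample Anderson–Darling test, but it is not a drop-in replacement for `ad.test.combined`. Group names and the seed are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def subsample_groups(groups: Dict[str, Sequence[float]],
                     n: Optional[int] = None,
                     seed: int = 0) -> Dict[str, List[float]]:
    """Draw n observations without replacement from every group.

    If n is None, the smallest group size is used (cf. the ~2,450
    observations chosen as the minimum across donor/technology pairs).
    """
    if n is None:
        n = min(len(v) for v in groups.values())
    too_small = [k for k, v in groups.items() if len(v) < n]
    if too_small:
        raise ValueError(f"groups smaller than n={n}: {too_small}")
    rng = random.Random(seed)  # fixed seed for reproducible draws
    return {k: rng.sample(list(v), n) for k, v in groups.items()}


# Hypothetical marker intensities for three donor/technology combinations.
groups = {"d1_flow": list(range(5)), "d1_cite": list(range(10, 13)), "d2_flow": list(range(20, 24))}
print({k: len(v) for k, v in subsample_groups(groups).items()})
```

Equalizing group sizes keeps the test statistic from being dominated by whichever donor or technology contributed the most events.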

Cell sorting for TSPAN33+C5L2+ cells and bulk culture c.f.u. assays

Total nucleated cells were isolated from unprocessed bone marrow product by layering a 1:1 PBS:cell suspension over 15 ml of Ficoll Paque Plus in SepMate-50 tubes, according to the manufacturer’s instructions. RBCs were lysed with PharmLyse (555899, BD). Cells were then labeled according to Supplementary Table 15 and sorted by a BD FACSAria II directly into culture medium consisting of MegaCult-C Medium Plus Lipids (04850, STEMCELL Technologies) supplemented with 3.0 U ml–1 recombinant human (rh) erythropoietin (rhEPO), 10 ng ml–1 rh interleukin-3 (rhIL-3), 10 ng ml–1 rhIL-6, 25 ng ml–1 rh stem cell factor (rhSCF), 50 ng ml–1 rh thrombopoietin (rhTPO), 20 ng ml–1 rh granulocyte colony-stimulating factor (rhG-CSF), 20 ng ml–1 rh macrophage colony-stimulating factor (rhM-CSF) and 20 ng ml–1 rhGM-CSF. The sorted cells were then mixed with collagen solution (04902, STEMCELL Technologies) to a final concentration of 1.2 mg ml–1, plated in six-well plates and incubated at 37 °C with 5% CO2 for 1 week. Colonies were stained in situ 6 days after plating using antibodies according to Supplementary Table 15. Colony assays were imaged the next day using an ImageXpress Micro 4 (Molecular Devices) to produce high-resolution scans of each well, which were then processed in ImageJ for subsequent colony scoring.
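Supplementing the medium to the stated final cytokine concentrations is a C₁V₁ = C₂V₂ calculation. In the sketch below the final concentrations are those listed above, while the stock concentrations (100 µg ml–1 for each cytokine and 1,000 U ml–1 for EPO) are hypothetical placeholders to be replaced by the actual vial concentrations.

```python
from typing import Dict

# Final concentrations stated in the text (ng/ml, except rhEPO in U/ml).
FINAL_CONC: Dict[str, float] = {
    "rhEPO": 3.0,
    "rhIL-3": 10.0,
    "rhIL-6": 10.0,
    "rhSCF": 25.0,
    "rhTPO": 50.0,
    "rhG-CSF": 20.0,
    "rhM-CSF": 20.0,
    "rhGM-CSF": 20.0,
}

# Hypothetical stock concentrations in the same units as FINAL_CONC
# (100 ug/ml = 100,000 ng/ml for cytokines, 1,000 U/ml for EPO).
STOCK_CONC: Dict[str, float] = {name: 100_000.0 for name in FINAL_CONC}
STOCK_CONC["rhEPO"] = 1_000.0


def stock_volumes_ul(final_volume_ml: float,
                     final: Dict[str, float] = FINAL_CONC,
                     stock: Dict[str, float] = STOCK_CONC) -> Dict[str, float]:
    """Solve C1 * V1 = C2 * V2 for the stock volume V1, in microliters."""
    return {name: c2 * final_volume_ml * 1_000 / stock[name]
            for name, c2 in final.items()}


for name, ul in stock_volumes_ul(10.0).items():
    print(f"{name}: {ul:.1f} ul per 10 ml of medium")
```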

Cell sorting for MEPs and bulk culture c.f.u. assays

Primary human CD34+ cells, obtained from the Yale Cooperative Center of Excellence in Hematology, were stained with antibodies according to Supplementary Table 15. Human MEP (DAPILinCD34+CD41aCD45RACD135CD110CD38mid; ref. 40), MEP CD133low, MEP CD133mid and MEP CD133hi cells were sorted on a BD FACSAria. The c.f.u. analysis was performed as previously described (ref. 41). Briefly, MEPs, MEP CD133low, MEP CD133mid and MEP CD133hi cells were cultured in MegaCult-C Medium Plus Lipids (04850, STEMCELL Technologies) mixed with collagen solution (04902, STEMCELL Technologies) to a final concentration of 1.2 mg ml–1 with 3.0 U ml–1 rhEPO, 10 ng ml–1 rhIL-3, 10 ng ml–1 rhIL-6, 25 ng ml–1 rhSCF, 50 ng ml–1 rhTPO, 20 ng ml–1 rhG-CSF, 20 ng ml–1 rhM-CSF and 20 ng ml–1 rhGM-CSF at 37 °C with 5% CO2. At day 6 after plating, colonies were stained in situ with c.f.u. staining panel antibodies in Supplementary Table 15 diluted in 300 μl of PBS (1:100 dilution) per well of a six-well plate. At day 7, six-well plates were imaged using a Molecular Devices ImageXpress Micro 4 microscope to produce high-resolution whole-well scans at ×40 magnification. All images were then processed using ImageJ, and all colonies were counted manually from processed images.

MEP indexed sorting and c.f.u. assays

Primary human CD34+ cells were obtained from the Yale Cooperative Center of Excellence in Hematology. Cells were stained with antibodies according to Supplementary Table 15 and index sorted on a BD FACSAria II as singlet, live, CD34+CD45RACD135 events using single-cell purity mode into 384-well plates (3765, Corning) that were prefilled with 80 μl per well of culture medium. Culture medium was composed of IMDM (21056023, Gibco) supplemented with 0.1 mM β-mercaptoethanol, 20% BIT 9500 (09500, STEMCELL Technologies), 40 μg ml–1 human low-density lipoprotein (02698, STEMCELL Technologies), 1× GlutaMAX (35050061, Gibco), human SCF (25 ng ml–1), human IL-3 (10 ng ml–1), human IL-6 (10 ng ml–1), human GM-CSF (20 ng ml–1), human M-CSF (20 ng ml–1), human G-CSF (20 ng ml–1), human TPO (50 ng ml–1; T1003.1, ConnStem) and human EPO (3 U ml–1; Amgen). Cells were incubated for 14 days at 37 °C with 5% CO2. To fluorescently label and identify the lineages produced in colonies, antibodies were diluted in PBS according to Supplementary Table 15, added to each well on day 13 of culture and incubated overnight to bind. Plates were imaged with phase contrast and fluorescence using an ImageXpress Micro 4 microscope and sampled for flow cytometric analysis using a four-laser BD Fortessa (BD) equipped with a high-throughput sampler. Images were processed using FIJI. All colonies were scored manually by analyzing fluorescence channels and comparing their flow cytometry profiles. Flow cytometry data were processed using FlowJo v10.9.0.
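Index sorting ties each well's recorded surface phenotype to the colony that grows in that well. A hypothetical sketch of that join for a 384-well plate (rows A–P, columns 1–24) is shown below; the dictionary layouts, marker keys and colony labels are illustrative assumptions rather than an export format of FlowJo or the sorter software.

```python
import string
from typing import Dict, List, Tuple

ROWS = string.ascii_uppercase[:16]  # rows A-P of a 384-well plate
N_COLS = 24                         # columns 1-24


def parse_well(well: str) -> Tuple[int, int]:
    """Convert 'A1' -> (0, 0) and 'P24' -> (15, 23)."""
    row, col = well[0].upper(), int(well[1:])
    if row not in ROWS or not 1 <= col <= N_COLS:
        raise ValueError(f"not a 384-well position: {well!r}")
    return ROWS.index(row), col - 1


def join_index_sort(phenotypes: Dict[str, Dict[str, float]],
                    colonies: Dict[str, str]) -> List[dict]:
    """Pair each index-sorted cell's phenotype with its colony call."""
    rows = []
    for well, markers in phenotypes.items():
        parse_well(well)  # validate the well ID
        rows.append({"well": well, **markers,
                     "colony": colonies.get(well, "no colony")})
    return sorted(rows, key=lambda r: parse_well(r["well"]))


# Hypothetical index-sort intensities and manual colony scores.
phenotypes = {"B2": {"CD34": 1500.0}, "A1": {"CD34": 900.0}}
print(join_index_sort(phenotypes, {"A1": "erythroid"}))
```

Keeping wells without colonies in the table (labeled "no colony") preserves the denominator needed to compute cloning efficiency per phenotype.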

CD235a and CD326 erythroid sort

Human bone marrow was stained with a lineage cocktail using FITC-conjugated antibodies to CD2, CD7, CD11b and CD14, the erythroid markers CD71-PE (334105, BioLegend), CD326-APC (324207, BioLegend) and CD235a-BV421 (349131, BioLegend), and DAPI to identify viable cells. Cell sorting was performed on a FACSAria II to collect four populations (CD71+CD326+CD235a−, CD71+CD326+CD235a+, CD71+CD326−CD235a+ and CD71−CD326−CD235a+) or on an MA900, for which cells were stained with PerCP/Cy5.5-conjugated antibodies to CD2, CD7, CD11b and CD14 and the erythroid markers CD71-FITC (334103, BioLegend), CD326-APC (BioLegend, clone 9C4) and CD235a-BV421 (BioLegend, clone HI264).

Cytospin staining

Cytospin slides were prepared from each cell population with a Thermo Scientific Cytospin 4 set to spin at 500 rpm for 5 min onto Superfrost Plus microscope slides and stained with Differential Quik III (Polysciences) or set to spin at 900 rpm for 3 min onto VWR HistoBond slides (16004-406, VWR) and stained with Camco Stain Pak (702, Cambridge Diagnostic Products). Cellular morphology and characterization of each population were assessed with an upright light microscope with a ×40 objective.

RShiny app development and analysis

The multimodal Azimuth bone marrow reference RShiny interface was built following the Azimuth v0.4.6 instructions, using the neighbors from the titrated RNA data restricted to the top MarkerFinder marker genes. CITE-seq RNA counts (ref. 34) were scaled and normalized as CPTT, with clusters defined at three annotation levels based on the multimodal scTriangulate clusters. The Azimuth browser was parameterized to impute ADT concentrations per cell as an optional input for user analysis through the hosted Azimuth web interface. The AML CITE-seq h5ad read counts file was analyzed using the local implementation of the Azimuth R object following import. Per-sample cell frequency differences between p-LSC- and m-LSC-only biopsies were assessed by Student’s t-test (P ≤ 0.05). A ShinyCell viewer for the healthy bone marrow compendium was generated using a formatted h5ad counts matrix with corresponding sample- and cell-level metadata for both the optimized titrated and titration datasets.
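The per-sample frequency comparison can be sketched as computing each population's fraction of cells per biopsy and then a two-sample Student's t statistic with pooled variance. Only the statistic is computed here; a P value would come from the t distribution with n₁ + n₂ − 2 degrees of freedom (for example, `scipy.stats.ttest_ind` with its default `equal_var=True`). The labels and frequencies are hypothetical.

```python
import math
import statistics
from typing import Sequence


def cell_frequency(labels: Sequence[str], population: str) -> float:
    """Fraction of a sample's cells annotated as `population`."""
    return sum(1 for x in labels if x == population) / len(labels)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


print(cell_frequency(["HSC", "HSC", "MEP", "GMP"], "HSC"))  # 0.5
# Hypothetical per-sample frequencies of one population in two biopsy groups.
print(student_t([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```

Testing per-sample frequencies, rather than pooling all cells, treats each biopsy as one observation and avoids inflating significance by cell count.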

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.