Close this search box.

Semi-automated approaches for interrogating spatial heterogeneity of tissue samples – Scientific Reports

Sequential immunofluorescence (seqIF) approach for spatial proteomics

To visualize the composition of tumoral tissues, we used the seqIF protocol on COMET (Fig. 1). The microfluidic setup was previously described and characterized23, with an adaptation of the scanning area to 81 mm2. COMET platform protocols are based on the fast fluidic exchange (FFeX) technology24 (Fig. 1B) that yields in ultra-fast and efficient antibody-based staining (approximately 15 min per single staining step of 2 markers), followed by imaging (approx. 30 min per marker) and elution (approx. 10 min per marker) (Fig. 1C). The immunostaining reaction, that occurs within the closed chamber formed in between the imaging window of the microfluidic chip and the histological slide, is precisely controlled via the automated system and the final signal reliably reflects the amount of antigen present in the tissue25. The seqIF protocol resulted in a co-registered multi-layer OME-TIFF image containing the following layers of information: nuclei signal acquired in the DAPI channel, intrinsic tissue autofluorescence images acquired in both TRITC and Cy5 channels, 20 single biomarker images. Additionally, we acquired the autofluorescence images after each elution cycle that allowed us to precisely monitor the evolution of autofluorescent signal and elution efficiency over the different cycles of the protocol9, and to perform accurate background subtraction (see methods chapter: Image pre-processing). COMET images from every cycle were stitched and automatically aligned within the COMET Control software, and the output file was ready for qualitative assessment and quantitative analysis at the end of the seqIF protocol (Fig. 1D).

Figure 1
figure 1

COMET platform automated workflow ensures “slide in, data out” approach. (A,B) In contrast to the passive diffusion of reagents happening during static incubation (A), the Fast-fluidic exchange (FFeX) technology achieved by the imaging microfluidic chip ensures active distribution of reagents and the possibility of a dynamic incubation over the surface of tissue sections (B). The homogeneous distribution of reagents ensures high-quality staining that is achieved in the scale of seconds9. Schematics were created with (C) The COMET platform automates the sequential immunofluorescence (seqIF) protocol, where tissue samples can undergo 20 cycles of staining-imaging-elution. A pre-processed tissue (i.e.: tissue section that underwent deparaffinization and heat-induced epitope retrieval process offline) is sequentially stained with primary and secondary antibodies (2 antigens at a single step) together with a DAPI counterstaining and then imaged using a three-color fluorescence microscope. After imaging, an elution procedure is performed to remove the antibody complexes, and the process is repeated for another cycle. Schematics were created with (D) 20-plex fluorescence image stack which was used for analysis is shown. COMET hyperplex images are stored as co-registered multi-layer image files using a generic OME-TIFF format.

To challenge the quality of data produced by the COMET device, we used a sample dataset that encompasses several specimens with a heterogenous tissue composition. To efficiently address this challenge, we used ngTMA containing both primary lung tumor samples and the corresponding lymph node metastasis specimens (Fig. 2A,B). The 20-plex panel consisted of biomarkers directly targeting immune cells (CD3, CD4, CD8, CD11c, CD16, CD20, CD31, CD45, CD68, FoxP3, HLA-DR), non-immune tumor microenvironment (aSMA, CD31, PanCK, S100, Vimentin), features of the immunosuppressed microenvironment (IDO-1, PD-1, PD-L1) and the proliferation marker ki67 (Fig. 2C). The specificity of detection of a single biomarker was evaluated according to internal standard guidelines9. For all biomarkers, the protocol resulted in high-quality staining that allowed to detect both signal positive and signal negative areas (Fig. 2C–E). Additional images were also acquired after each elution step allowing to assess the elution efficiency that was qualitatively deemed as excellent for all the markers based on quality-control criteria previously reported9. Once the initial quality control of the final image was passed, we moved toward the downstream image data analysis.

Figure 2
figure 2

COMET enables high-quality and high-throughput data acquisition for multiple markers which can be used to segment cells and to extract single-cell data accurately for subsequent phenotype analysis. (A) A map of a ngTMA used for the hyperplex staining. The panel consists of cores from primary lung squamous cell carcinoma (SCC), primary lung adenocarcinoma (AC) and lung cancer lymph node metastasis (LNmtx) tissue samples from 5 different cases (specimens S1–S5). (B) Whole slide image (WSI) of the same TMA section stained with hematoxylin and eosin. Scale bar: 1 mm. (C) Background subtracted single channel zoom-in fluorescence images of all markers in the panel (see Table 1 for panel composition and staining conditions). Images are displayed using the same displayed minimal intensity value (0) and corresponding maximum intensity values (see Table 1). Scale bars: 50 µm. (D) An overlayed multicolor visualization of the fluorescence image of 5 markers from the panel (white: PanCK, red: CD4, green: CD3, blue: CD8, magenta: CD45). Scale bar: 1 mm. (E) Zoom-in into images from panel D, depicting (1–2) stroma-tumor interface, (3) accumulation of immune cells in the tumor stroma (3) and (4) accumulation of macrophages near the tumor stroma. Scale bar: 50 µm. (F) DAPI image with corresponding nuclei annotations, showing single-cell nuclei segmentation with a pre-trained StarDist18 model. The segmented nuclei were further dilatated by 5 pixels to approximate the cellular membrane. All scale bars: 20 µm.

Nuclei-based cell segmentation and data filtering

We based our analysis on a single-cell feature extraction, which in turn required a reliable cell segmentation approach (Fig. 2F). In the first step of the workflow, single-cell detection was performed based on the DAPI staining. We applied the StarDist method18,19 to delineate the single-cell annotations. To improve the segmentation results, we internally generated a dataset of 12,138 manually annotated nuclei from a heterogeneous dataset of 234 image crops extracted from COMET images of several tissue types. The validation dataset consisted of 1192 nuclei and was carefully crosschecked internally by manual curation. Training of the model with this dataset was harmonized as per the guidelines provided by the authors18. The model trained in-house showed a clear trend toward better performance when compared with the standard StarDist model (see methods section: Image segmentation). Once trained, the model was used to generate single-cell masks for the next steps of analysis. Subsequently, the annotation of nuclei was expanded by 5 pixels for each of the cells to obtain a proper cell delineation. The measurements of fluorescent signal intensities and the corresponding detected features were exported from QuPath20 for 68,801 segmented cells stemming from a single raw TMA image.

To ensure high quality of cell detections, we applied a two-step verification process of detected objects. Because of the formalin-based fixation, FFPE tissues are known to be highly autofluorescent26 along with structural elements such as collagen and elastin increasing such confounding phenomena. Additionally, highly vascularized tissues contain a significant number of erythrocytes that can be encountered in all acquisition channels of COMET microscope, including faint signal in DAPI channel. To discriminate between the true cell detections and artefacts caused by autofluorescent signal, we applied a two-step single-cell data cleaning procedure27. Once all measurements were exported, we performed unsupervised clustering for all nucleus features based on the measurements in all channels but DAPI. Using results visualized with UMAP approach (Figure S1A), we could detect 4 clusters that were characterized by high expression of all the markers as well as relatively strong signal in unstained images due to tissue autofluorescence (Figure S1B). Manual curation of these clusters revealed the high levels of erythrocytes’ detection (Figure S1C,D) at each cycle of the seqIF workflow within them and were therefore excluded from subsequent analysis. In the second step of filtering, we excluded objects based on 4 features, (1) model specific feature—StarDist detection probability (cut-off value: 0.65, Figure S2A), (2) signal-based feature—DAPI mean intensity (cut-off value: 2917.1, Figure S2B), and shape-based features— (3) nucleus area (7.1 µm2 < accepted value < 137.9 µm2, Figure S2C) and (4) nucleus circularity (cut-off value: 0.65, Figure S2D). Visual inspection confirmed that excluded objects were mostly artefacts (Figure S2E, F). In total, 14,229 cells were discarded during the 2 steps of data cleaning with 55,063 cells passing the quality control and deemed as acceptable for the subsequent analysis pipeline.

Before the final step of feature extraction, background subtraction was performed for all channels separately, using the corresponding autofluorescence channel recorded before each cycle9. The background subtraction was performed pixel-wise and infrequent negative pixel values were zero-floored.

Dynamic range assessment

To investigate in detail the biomarker expression within the tissues, a single-cell resolution of the image as well as a broad dynamic range of the immunofluorescent signal must be delivered by imaging modality. COMET images have a pixel size of 0.23 μm and a spatial resolution below 1 µm, which is sufficient to clearly discriminate the subcellular biomarker expression patterns and segment cells into their nuclear, cytoplasmic, and membranous compartments (Fig. 2E,F). To investigate if images generated with COMET provide a dynamic range of fluorescent signal sufficient to discriminate different levels of biomarker expression, we examined in more detail the HLA-DR expressing cells in the lung tumor metastasis core of specimen 3 (core C2, Figure S3). HLA-DR protein expression is known to be reflecting the activation status on immune cells as macrophages and dendritic cells28. HLA-DR expression can also be triggered in the tumor cells29, thus its expression levels are expected to be heterogeneous and can vary from negative through low, medium, and high.

In this specimen, we identified a bimodal expression of HLA-DR with high levels expressed by immune cells and low expression found in epithelial tumor cells (Figure S3A). When exploiting the mean cell intensity as a unique parameter to interrogate cell phenotypes, we found distinguishable differences between the two cell types (Figure S3B). When mean cell intensity was compared with the cell size feature on the biaxial scatter plot, both the identification and quantification of immune vs tumor cells could be straightforwardly achieved with a gating strategy (Figure S3C). These data demonstrate that COMET platform has a sufficient resolution and dynamic range to accurately discriminate cell-intrinsic biomarker expression variability.

Supervised phenotyping of tumor microenvironment

Spatial detection of multiple biomarkers enables the identification of diverse cell types present within a tissue. To spatially find predefined cell types based on known expression patterns30, we applied supervised methods based on a priori classification rules as a first approach (Fig. 3A). The 20-plex panel presented here was established with the aim to characterize tumor-infiltrating immune cells within the TME (Fig. 2C). The panel was designed to allow performing a rule-based single-cell phenotyping which uses binary expression features derived with a threshold-based approach (Fig. 3A). We characterized the immune cell infiltration level, along with tumor-intrinsic features and stromal compartments of different cores of a lung cancer TMA.

Figure 3
figure 3

Supervised rule-based cell phenotyping assisted with automatic thresholding can be applied to the COMET dataset to identify known phenotypes present in the data and to quantify their populations. (A) A rule-based tree classifier was used to assign cell types to individual cells based on their binary expression profiles. The expression was binarized (i.e., positive vs negative label for each marker) using an automatic thresholding approach described in Figure S4. (B) Spatial distribution of all cell types found using the tree classification shown in (A). The cells classified as unknown are not visualized. The colors of horizontal rectangles show the different tumor types (primary versus lymph node metastasis) as in (D) and vertical rectangles refer to different specimens (sample S1–S5) as in (E). The rectangles marked with numbers 1–4 correspond to the zoom-ins in panel (G). The scale bar: 500 µm. (C) Expression matrix for each of the cell phenotypes presented in (A). Z-normalized mean expression values per group are visualized. (D,E) Normalized cell type distributions of corresponding cell phenotypes for (D) different tumor types: primary vs metastatic and (E) for different specimens: S1–S5. (F) Cell count histogram for corresponding cell phenotypes. (G) Annotations of classified tumor and immune cells (top) and the corresponding antigens detected in the multi-color fluorescence images (bottom). Scale bars: 50 µm.

In the first step of data analysis, we z-normalized mean cell intensities for all 20 biomarkers (Figure S4A,B). Subsequently, we applied an automatic threshold-based approach to determine positive cells for each marker individually. We established a metrics-based approach relying on the statistical characteristics of the background signal (i.e., negative cells) (see Methods chapter: Supervised phenotyping and Figure S4C for more details). This semi-automatic thresholding approach was deployed to eliminate the user bias and harmonize thresholding values over all markers. Our approach successfully detected the positive cells (Figure S4D,E), which was further confirmed by the visual inspection by experienced senior biologists. To identify distinct cell types, we applied a decision tree-based classification (Fig. 3A). Cell identities were manually assigned based on the known marker combinations, which are established in literature31. Based on this approach, different cellular classes were detected in most of the cores present in the TMA (Fig. 3B–F). Unknown cells, that did not fit any of the identified classes, were summing up to 30% of all cells and were mainly present in primary tumor cores (Fig. 3B,D) that displayed lower expression levels for markers of interest.

The second most abundant class was a cell type with a predefined phenotype of PanCK+ non-proliferating tumor cells. Visual inspection of randomly selected areas of the TMA confirmed that once the cell type was identified, the phenotyping of the cells was accurate, and it properly reflected the biomarker signal (Fig. 3G). The main limitation of a rule-based classifier stems from the lack of inclusion for the markers lying outside of the established rules. To examine the expression patterns of each pre-defined class, we plotted a heatmap showing the biomarker abundance and distribution for the detected cell types (Fig. 3C). We could confirm the expected expression of Vimentin positivity by immune and stromal cells and its absence within tumor cells. Surprisingly, other markers like CD20, were not limited to CD3- immune cells but also detected to a lower extent in other immune subtypes such as several myeloid and T cell classes. Indeed, for small densely clustered cell types such as lymphocytes, signal spillover through cell masks is one of the most important challenges in threshold-based classification32. Indeed, to minimize the signal leak from neighboring cells into an area used for the phenotyping of T cells, we analyzed the mean intensity of CD3, CD4, and CD8 markers within the nucleus mask and a similar approach for other small cells, as B cells, might help in their threshold-based phenotyping.

Additionally, we have also tested the recently published Astir algorithm as an alternative to developing a fully automated threshold-based pipeline for cell identification33. This machine learning algorithm was developed to provide unbiased classification of cells into predefined classes33 and can be easily applied to COMET image-derived data. We aimed to detect the same classes as in our decision tree-based classifier (Figure S5A). Astir algorithm detected unknown cells with the highest frequency, especially in primary tumor cores, while immunosuppressive tumor cells were the second most frequent cell type identified (Figure S5B–D). Similarly, to the threshold-based classifier, the simple phenotypes were assigned as expected with a few rules (Figure S5A), while the assignment of complex phenotypes, (i.e., dendritic cells) turned out to be more challenging.

Unsupervised cell classification and spatial cell distribution

After evaluating how supervised classification methods can be applied to a COMET dataset, we explored unsupervised classification as an alternative automated workflow (Fig. 4). We performed Leiden clustering34 and UMAP dimensionality reduction technique35 for data visualization.

Figure 4
figure 4

Unsupervised cell phenotyping using Leiden clustering and assisted by UMAP performs fast and unbiased mapping of the single-cell phenotypes present in the COMET dataset and enables subsequent quantification of tissue composition. (A) UMAP projection of all markers with reference cell phenotypes assigned based on the Leiden clustering results. Further details on cell phenotype assignment can be found in  Methods chapter: Unsupervised phenotyping and SI Figs. 7–9. (B) Quantification of single-cell expression patterns at sub-cellular scale. Averaging of single-cell image crops was performed to obtain average expression profiles for the randomly selected subset (N = 2000) of tumor cells (PanCK) and regulatory T-cells (CD3, CD4, FoxP3) selected with an identical approach. Scale bars: 5 µm. C) The expression matrix for each of the cell phenotypes based on Leiden clustering results. Z-normalized mean expression values per class are visualized. (D,E) Normalized cell type distributions of corresponding cell phenotypes for (D) different tumor types (primary or metastatic) and (E) for different specimens. (F) Cell count histogram for corresponding cell phenotypes. (G) A spatial distribution of all cell phenotypes with colors corresponding to Leiden classes. (H) A spatial distribution of regulatory T cells, lymphoid B cells and CD11c macrophages, showing the apparent density difference between primary and metastatic tumors. Scale bars in (G,H): 500 µm.

Leiden clustering resulted in the detection of 20 clusters (Figure S8), that were merged into 14 clusters (Fig. 4A, see below) after detailed examination and analysis as described below. For each cluster, we further examined the expression patterns of the corresponding signature markers—for example, for the regulatory T cells cluster we generated a mean intensity projection for the signature markers of cells belonging to this cluster, where the expected localization of FoxP3, CD3 and CD4 expression was confirmed (Fig. 4B). Similarly, we could identify the exclusive cytoplasmic expression of PanCK with no nuclear interference (Fig. 4B). It further demonstrates that the spatial resolution obtained is sufficient to quantify the spatial biomarker expression at a sub-cellular level. Additionally, the known expression patterns can be used for optimization and quality control of unsupervised cell approximation algorithms.

Following the unsupervised classification step, cell identity for each class was assessed based on the following parameters: (1) expression level of the markers in corresponding sub-cellular compartments (Table 1) in each of the clusters (Fig. 4A,C, Figure S7), (2) visual inspection of the cells in the tissue context, and (3) literature reference. Using this method, 6 clusters expressing heterogeneous levels of PanCK were identified, all of them located nearby in the UMAP representation graph (Figure S8A,B). Therefore, we merged these clusters into a metacluster of tumor cells (Figure S8C). As a result, the tumor cell cluster was the most abundant in this dataset, which is expected, considering that 39% of the original image area is PanCK positive (PanCK+) (See details in the Methods chapter: Image segmentation). However, the total stromal components outnumbered the number of the PanCK+ cells corroborating the high infiltration of non-tumoral cells previously reported for lung tumors36,37. Some clusters were consistently detected in all specimens, with a higher frequency of activated T cells, B cells and the CD11c+ macrophages being present in the secondary tumors. Importantly, the spatial evaluation of the clusters revealed the degree of tissue heterogeneity and the different patterns of immune infiltrations between specimens of primary and metastatic cancer tissues (Fig. 4G,H).

Once the phenotypic classes were properly assigned to each cluster identified in Fig. 4, we investigated the degree of cell proximity and interaction to identify interacting cells in the analyzed COMET dataset. Due to a large fraction of unknown cells in the supervised approach, we have decided to perform the spatial examination of the clusters identified via the unsupervised clustering approach. We applied spatial characteristics such as cellular neighborhood enrichment score and the co-occurrence probability (Fig. 5A–C). Due to the small size of the TMA cores, the results are not fully representative of the original tissue milieu, however, we could observe, that macrophages tended to localize to a greater extent in proximity to the tumor cells (Fig. 5C), while T cells tended to remain further from tumor cells. Interestingly, the distribution of T regulatory cells seemed to differ between primary and metastatic tumors with more frequent homotypic T regulatory cells’ neighborhood (Fig. 5B) and intracellular interactions (Fig. 5D) in metastatic cores. Tumor cell homotypic interactions, reported previously38, were also detected in our dataset (Fig. 5B,D). Preliminary observations on spatial characteristics highlighted the potential of a spatial analysis approach to identify tissue-specific patterns of cell distributions.

Figure 5
figure 5

Spatial analysis of the tumor microenvironment reveals spatial connections between interacting cell phenotypes identified with the unsupervised clustering results. (A) A schematic explanation of the cellular neighborhood enrichment score which was used to establish and quantify spatial relations between different cell phenotypes. Enrichment score is a metric that quantifies the degree to which cells from one cluster are frequently close to cells from another cluster. A high score indicates enrichment, while a low score indicates depletion. Blue: cells of interest, green: cell class being analyzed as a neighboring class, gray cells: other cells surrounding cells of interest. (B) Neighborhood enrichment scores for different cell phenotypes for primary (left) and metastatic (right) tumor tissues. (C) Co-occurrence probability between tumor cells and regulatory CD4+ T cells, CD16+ macrophages and CD8+ T cells plotted vs the distance from the tumor cells. (D) Interaction matrices, showing the number of shared edges in between cells from different phenotypes. Matrices for phenotypes from primary (left) and metastatic (right) tumor tissues are displayed.