Datasets and preprocessing
We used publicly available data resources from three commercial SST platforms (10x Genomics Xenium, NanoString CosMx, and Vizgen MERSCOPE), and sequencing data from the Human Cell Atlas.
Subcellular spatial transcriptomics data
For all datasets and for each gene, detected transcripts were converted into a 2D image in which the value of each pixel represents the number of detected transcripts at its location. The images were combined channel-wise, resulting in an image volume X ∈ ℝ^{H×W×n_genes}, where H is the height of the sample, W is the width of the sample, and n_genes is the number of genes in the panel.
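As a minimal sketch of this binning step (the tuple-based transcript format and pixel-scaled coordinates are assumptions of this example, not the framework's actual data loader):

```python
import numpy as np

def bin_transcripts(transcripts, height, width, n_genes):
    """Convert detected transcripts into an H x W x n_genes count volume.

    `transcripts` is an iterable of (row, col, gene_index) tuples whose
    coordinates are assumed to already be scaled to pixel units.
    """
    volume = np.zeros((height, width, n_genes), dtype=np.uint32)
    for r, c, g in transcripts:
        volume[int(r), int(c), g] += 1  # pixel value = transcript count at that location
    return volume

# Example: three detections, two of the same gene at the same pixel
demo = bin_transcripts([(0, 1, 0), (0, 1, 0), (2, 3, 1)], height=4, width=4, n_genes=2)
```

Each gene occupies one channel, so stacking the per-gene 2D histograms yields the image volume X described above.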
(i) XeniumBreastCancer1 and XeniumBreastCancer2
The Breast Cancer datasets included in this study were downloaded from https://www.10xgenomics.com/products/xeniuminsitu/previewdatasethumanbreast (accessed 9 Feb 2023), and included two replicates. Low-quality transcripts for 10x Genomics Xenium data with a Phred-scaled quality score below 20 were removed, as suggested by the vendor^{1}. Negative control transcripts, blanks, and antisense transcripts were also filtered out. This resulted in 313 unique genes, with the overall pixel dimensions of the images being 5475 × 7524 × 313 for Xenium breast cancer replicate 1 (XeniumBreastCancer1) and 5474 × 7524 × 313 for Xenium breast cancer replicate 2 (XeniumBreastCancer2).
(ii) XeniumMouseBrain
The Mouse Brain data included in this study were downloaded from https://www.10xgenomics.com/resources/datasets/freshfrozenmousebrainreplicates1standard (accessed 14 Feb 2023) and processed following the steps in (i). There were 248 unique genes, and the resulting image size was 7038 × 10,277 × 248 pixels.
(iii) CosMxLung
The CosMx NSCLC Lung dataset included in this study was downloaded from https://nanostring.com/products/cosmxspatialmolecularimager/nsclcffpedataset/ (accessed 24 Mar 2023). We used data for Lung51, which comprised 30 fields of view. Transcripts containing “NegPrb” were removed, resulting in 960 unique genes and an overall image dimension of 7878 × 9850 × 960 pixels.
(iv) MERSCOPEMelanoma
The MERSCOPE melanoma data included in this study were downloaded from https://info.vizgen.com/merscopeffpesolution (for patient 2, accessed 26 Mar 2023). Transcripts with “Blank” were filtered out, resulting in 500 unique genes and an image with 6841 × 7849 × 500 pixels.
(v) StereoseqMouseEmbryo
The Stereo-seq data used in this study, including the DAPI image and detected gene expression (bin 1), were downloaded from https://db.cngb.org/stomics/mosta/download/ for sample E12.5_E1S3. Stereo-seq data contain a far greater number of genes than Xenium, CosMx, and MERSCOPE. For efficiency, we selected a panel of 275 highly variable genes (HVGs) as the input to BIDCell. The HVGs are the genes common to the top 1000 HVGs of both the Stereo-seq data and the single-cell reference data.
Nuclei segmentation
DAPI images were directly downloaded from the websites of their respective datasets. In cases where the maximum intensity projection (MIP) DAPI image was not provided, we computed the MIP DAPI by taking the maximum intensity value at each (x,y) location across the DAPI stack. DAPI images were resized to align with the lateral resolutions of the spatial transcriptomic maps using bilinear interpolation. Nuclei segmentation was performed on the MIP DAPI using the pretrained Cellpose model with automatic estimation of nuclei diameter^{5}. We used the “cyto” model, as we found the “nuclei” model to under-segment or omit a considerable number (e.g., 21k for XeniumBreastCancer1) of nuclei given the same MIP DAPI image, which is consistent with another study^{29}. Other nuclei segmentation methods may be used with BIDCell, as our framework is not limited to Cellpose.
Transcriptomics sequencing data
We used five publicly available single-cell RNA-seq data collections as references to guide the cell segmentation in BIDCell and the evaluation with CellSPA. For reference data comprising multiple datasets, we constructed cell-type-specific profiles by aggregating the gene expression by cell type per dataset.
(i) TISCHBRCA
The reference for XeniumBreastCancer used in BIDCell was based on 10 single-cell breast cancer datasets downloaded from The Tumor Immune Single Cell Hub 2 (TISCH2)^{17} at http://tisch.compgenomics.org/gallery/?cancer=BRCA&species=Human, which contains the gene-by-cell expressions and cell annotations of the data. We used the “celltype major lineage” as the cell type labels. We combined “CD4Tconv” and “Treg” as “CD4Tconv/Treg”, and “CD8T” and “CD8Tex” as “CD8T/CD8Tex”, resulting in 17 cell types in total.
(ii) ChromiumBreastCancer
To evaluate the performance on XeniumBreastCancer, we downloaded the Chromium scFFPE-seq data from the same experiment from https://www.10xgenomics.com/products/xeniuminsitu/previewdatasethumanbreast (accessed 22 March 2023), which contains 30,365 cells and 18,082 expressed genes. We then performed Louvain clustering on the k-nearest neighbour graph with k = 20, based on the top 50 principal components (PCs), to obtain 22 clusters. We then annotated each cluster based on the markers and annotation provided in the original publication^{1}.
(iii) Allen Brain Map
The reference for the XeniumMouseBrain data was based on Mouse Whole Cortex and Hippocampus SMART-seq data downloaded from https://portal.brainmap.org/atlasesanddata/rnaseq/mousewholecortexandhippocampussmartseq, which contains both gene-by-cell expressions and cell annotations of the data. We used the cluster annotation from “cell_type_alias_label” as the cell type labels and combined some of the labels with small numbers of cells. For example, we combined all “Sst” subtypes as “Sst” and all “Vip” subtypes as “Vip”, resulting in 59 cell types in total.
(iv) HLCA and TISCHNSCLC
The reference for CosMxLung for both BIDCell and CellSPA was based on Human Lung Cell Atlas (HLCA)^{30}, provided in the “HLCA_v1.h5ad” file from https://beta.fastgenomics.org/p/hlca, including both gene expressions and cell type annotations of the data. We used “ann_finest_level” as cell type labels, which contained 50 cell types in total.
As HLCA only contains single-cell datasets from non-cancer lung tissue, we complemented the reference data with malignant cells provided in TISCH2, where we downloaded 6 single-cell NSCLC datasets with tumour samples from http://tisch.compgenomics.org/gallery/?cancer=NSCLC&species=Human. We only included the cells labelled as malignant cells in the reference.
(v) TISCHSKCM
The reference for MERSCOPEMelanoma for both BIDCell and CellSPA was based on 10 single-cell melanoma datasets downloaded from TISCH2 at http://tisch.compgenomics.org/gallery/?cancer=SKCM&species=Human, which contains the gene-by-cell expressions and cell annotations of the data. We used the “celltype major lineage” as the cell type labels. We combined “CD4Tconv” and “Treg” as “CD4Tconv/Treg”, and “CD8T” and “CD8Tex” as “CD8T/CD8Tex”, resulting in 15 cell types in total.
(vi) Mouse Embryo reference
The reference for StereoseqMouseEmbryo was downloaded from the GEO database under accession code GSE119945^{31}, which contains both counts and cell type annotation data. The E12.5 data were then used as the reference.
Biologically-informed deep learning-based cell segmentation (BIDCell) overview
BIDCell is a self-supervised deep learning framework that computes biologically-informed loss functions to optimise learnable parameters for the prediction of cell segmentation masks for spatial transcriptomic data. BIDCell uses three types of data: (i) spatial transcriptomic maps of genes, (ii) the corresponding DAPI image, and (iii) average gene expression profiles of cell types from a reference dataset, such as the Human Cell Atlas. A major innovation in developing BIDCell is the use of biologically-informed prior knowledge via the SSL paradigm, which enables DL models to learn complex structures in SST data and derive cell segmentations that are visually more realistic and that better capture expression profiles.
The BIDCell framework has the following four key characteristics:

BIDCell predicts diverse cell shapes for datasets containing various cell types to better capture cell expressions (see section Elongated and non-elongated shapes).

BIDCell uses positive and negative markers from sequencing data to enhance the guidance for learning relationships between spatial gene expressions and cell morphology in the form of cell segmentations (see section Positive and negative cell-type markers).

BIDCell is parameterised by a deep learning architecture that learns to segment cells from spatial transcriptomic images (see section Deep learning-based segmentation).

BIDCell uses biologically-informed, self-supervised loss functions to train the deep learning architecture without the need for manual annotations, and to better capture cell expressions (see section BIDCell training and loss functions).
Elongated and non-elongated shapes
BIDCell is capable of generating cell segmentations that exhibit different morphologies for different cell types, rather than assuming a generally circular profile for all cell types. In particular, BIDCell can distinguish between cell types that typically appear more elongated, such as fibroblasts and smooth muscle cells, and those that are typically more rounded or circular, such as B cells. Elongated cell types can be specified directly for each tissue sample as desired, based on existing biological knowledge.
We used the expression within the nuclei (see section Nuclei segmentation) of cells to perform an initial classification of elongated and non-elongated cell types. Transcripts were mapped to nuclei using the nuclei segmentations, and the Spearman correlation was computed between nuclei expression profiles and reference cell types of the Human Cell Atlas. Each nucleus was classified as the cell type with which it was most highly correlated. This initial classification, coupled with the eccentricity of the nuclei, was used to inform the cell-calling loss function (described in section Cell-calling loss) to produce segmentation morphologies with more variation that are more appropriate for different cell types. We considered epithelial cells, fibroblasts, myofibroblasts, and smooth muscle cells to be elongated for samples of breast cancer and melanoma. Endothelial cells, fibroblasts, myofibroblasts, fibromyocytes, and pericytes were deemed elongated for NSCLC. We considered all cell types in the mouse brain sample to be elongated.
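The nucleus-based classification above can be sketched as follows. The reference profiles, gene values, and the set of elongated types are hypothetical, and the Spearman implementation ignores ties for brevity:

```python
import numpy as np

def spearman(u, v):
    """Spearman correlation as the Pearson correlation of ranks (no tie correction)."""
    ru = np.argsort(np.argsort(u)).astype(float)
    rv = np.argsort(np.argsort(v)).astype(float)
    ru -= ru.mean(); rv -= rv.mean()
    return float((ru * rv).sum() / np.sqrt((ru ** 2).sum() * (rv ** 2).sum()))

def classify_nucleus(nucleus_expr, reference_profiles):
    """Assign the cell type whose reference profile correlates best with the nucleus."""
    scores = {ct: spearman(nucleus_expr, prof) for ct, prof in reference_profiles.items()}
    return max(scores, key=scores.get)

# Hypothetical mean-expression profiles over a 4-gene panel
reference = {
    "fibroblast": np.array([9.0, 1.0, 8.0, 0.5]),
    "b_cell": np.array([0.5, 9.0, 1.0, 8.0]),
}
label = classify_nucleus(np.array([5.0, 0.0, 4.0, 1.0]), reference)
is_elongated = label in {"fibroblast"}  # user-specified elongated types per tissue
```

This initial label and the nucleus eccentricity together decide which expansion mask the cell-calling loss uses.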
Positive and negative cell-type markers
BIDCell learns relationships between the spatial distribution of gene expressions and cell morphology in the form of cell segmentations. This relationship can be enhanced by incorporating biological knowledge in the form of cell-type markers, specifically, the genes that are typically more highly expressed (positive markers) and less expressed (negative markers) in different cell types, which allows BIDCell to predict segmentations that lead to more accurate cell expression profiles. Cell-type marker knowledge is drawn from the Human Cell Atlas, which allows BIDCell to be applied without requiring a matched single-cell reference for the same sample of interest. Markers were incorporated into BIDCell through our positive and negative marker losses (described in section Positive and negative marker losses).
Deep learning-based segmentation
BIDCell is parameterised by a set of learnable parameters θ of a deep learning segmentation model. We used the popular UNet 3+^{19} as the backbone of our framework to perform cell segmentation by predicting the probability of cell instances at each pixel. This architecture may be swapped out for other segmentation architectures. UNet 3+ was originally proposed for organ segmentation in computed tomography (CT) images. It was built on the original UNet^{12} and incorporated full-scale skip connections that combine low-level details with high-level features across different scales (resolutions). UNet 3+ comprises an encoding branch and a decoding branch with five levels of feature scales. We did not adopt the deep supervision component proposed for UNet 3+, and instead only computed training losses at the lateral resolution of the original input.
Input
The input to the UNet 3+ model was a cropped multichannel spatial transcriptomic image x ∈ ℝ^{h×w×n_genes}, where n_genes represents the channel axis corresponding to the total number of genes in the dataset, h is the height of the input patch, and w is the width of the input patch. Prior to being fed into the first convolutional layer, the input was reshaped to [n_cells, n_genes, h, w], effectively placing n_cells in the batch size dimension. In this way, all the cells in a patch were processed simultaneously, and the model could flexibly support an arbitrary number of cells without requiring extra padding or preprocessing. n_cells was determined by the corresponding patch of nuclei to ensure consistency with predicted cell instances. Input volumes that were empty of nuclei were disregarded during training and yielded no cells during prediction.
Output and segmentation prediction
The softmax function was applied to the output of UNet 3+ to yield probabilities of foreground and background pixels for each cell instance. This produced multiple probabilities for background pixels (i.e., n_{cells} probabilities per pixel for a patch containing n_{cells}), due to the placement of cell instances in the batch size dimension. These probabilities were aggregated by averaging across all the background predictions per pixel. The argmax function was applied pixelwise to the foreground probabilities for all cells and averaged background probabilities. This produced a segmentation map corresponding to the object (cell instance or background) with the highest probability at each pixel.
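A numpy sketch of this aggregation step (the [n_cells, 2, h, w] logit layout with channel 0 as background is an assumption of this example):

```python
import numpy as np

def predict_segmentation(logits):
    """Combine per-cell foreground/background logits into a label map.

    `logits` has shape [n_cells, 2, h, w]: channel 0 = background, channel 1 =
    foreground, one pair per cell because cells sit in the batch dimension.
    Returns an h x w map with 0 = background and label k > 0 = cell instance k-1.
    """
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)            # softmax over fg/bg per cell
    fg = probs[:, 1]                                    # [n_cells, h, w]
    bg = probs[:, 0].mean(axis=0, keepdims=True)        # average background per pixel
    stacked = np.concatenate([bg, fg], axis=0)          # [1 + n_cells, h, w]
    return stacked.argmax(axis=0)                       # highest-probability object per pixel

rng = np.random.default_rng(0)
seg = predict_segmentation(rng.normal(size=(3, 2, 4, 4)))
```

Averaging the per-cell background probabilities before the pixel-wise argmax is what resolves the multiple background predictions produced by the batched layout.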
Morphological processing
The initial segmentation output by the deep learning model was further refined to ensure pixel connectivity within each cell (i.e., all sections of the cell were connected). The process applied standard morphological image processing techniques to each cell, including dilation, erosion, hole-filling, and removal of isolated islands, while ensuring that the nucleus was captured. First, dilation followed by erosion was applied using a 5 × 5 circular kernel with two iterations each. Hole-filling was then carried out on the cell section with the largest overlap with the nucleus. Any remaining pixels initially predicted for the cell that were still not connected to the main cell section were discarded. After morphological processing, the number of transcripts captured within each cell is slightly higher, while purity metrics and correlation with Chromium are the same or slightly higher (Supplementary Fig. 24).
Mapping transcripts to predicted cells
The detected transcripts were mapped to cells using the final predicted segmentations. The segmentation map was resized back to the original pixel resolution using nearest neighbour interpolation. Transcripts located in the mask of a cell were added to the expression profile of that cell. This produced an n_cells × n_genes gene-cell matrix, which was used for performance evaluation and downstream analysis.
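The mapping step can be sketched as follows (the tuple-based transcript format and the 0 = background labelling convention are assumptions of this example):

```python
import numpy as np

def build_cell_by_gene(seg_map, transcripts, n_cells, n_genes):
    """Count transcripts per (cell, gene) from a label map (0 = background).

    `transcripts` is an iterable of (row, col, gene_index); coordinates are in
    the same pixel grid as `seg_map` (assumed already resized back).
    """
    matrix = np.zeros((n_cells, n_genes), dtype=np.int64)
    for r, c, g in transcripts:
        cell = seg_map[int(r), int(c)]
        if cell > 0:                    # transcripts over background are not assigned
            matrix[cell - 1, g] += 1
    return matrix

seg = np.zeros((4, 4), dtype=int)
seg[0:2, 0:2] = 1                       # cell 1 occupies the top-left corner
mat = build_cell_by_gene(seg, [(0, 0, 0), (1, 1, 0), (3, 3, 1)], n_cells=1, n_genes=2)
```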
BIDCell training and loss functions
The BIDCell framework combines several loss functions that automatically derive supervisory signals from the input data and/or predicted segmentations at each step of the training process. This approach to learning is a core aspect of SSL^{32}. Furthermore, the modular and additive design of the loss functions allows each loss to be swapped out with alternative approaches to compute training signals. The SSL label describes the ability of the framework to automatically learn relationships between gene expressions and cell morphology from its inputs.
Our approach for learning the parameters θ of the segmentation model relies on minimising a total of six loss functions that we propose with our framework. Some of the losses effectively increase the number of pixels predicted for a cell, while others reduce the size of its segmentation. The nuclei encapsulation, cell-calling, over-segmentation, and overlap losses guide the basic morphology of cells. The positive and negative marker losses refine the cell morphologies learned through the other loss functions, by further guiding the model to learn biologically-informed relationships between gene expressions and cell morphology. This is reminiscent of the pretext and downstream (fine-tuning) stages commonly encountered in SSL, where the pretext task aids the model in learning better representations or intermediate weights, while the fine-tuning task refines the weights and further improves performance for a particular prediction task. Taken together, the losses ensure that the segmentation model learns relationships between spatially-localised, high-dimensional gene expression information and the morphology of individual cells.
(A) Nuclei encapsulation loss
The segmentation of a cell must contain all the pixels of the cell’s nucleus. Additionally, the expressed genes in nuclei can guide the model to learn which genes should be predicted within cells. Hence, we included a loss function L_{ne} that incentivises the model to learn to correctly predict nuclei pixels:
$$L_{ne}(\mathbf{x}_{nuc},\hat{\mathbf{y}})=-\mathbf{x}_{nuc}\log(\hat{\mathbf{y}})-(\mathbf{1}-\mathbf{x}_{nuc})\log(\mathbf{1}-\hat{\mathbf{y}}),$$
(1)
where x_{nuc} is the binary nucleus segmentation mask, and ŷ is the predicted segmentation for all cells of the corresponding training patch.
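A minimal numpy sketch of Eq. (1); the mean reduction over pixels and the numerical clipping are assumptions of this example, since the equation leaves the reduction implicit:

```python
import numpy as np

def nuclei_encapsulation_loss(x_nuc, y_hat, eps=1e-7):
    """Binary cross-entropy between nucleus masks and predicted foreground (Eq. 1)."""
    y = np.clip(y_hat, eps, 1.0 - eps)   # avoid log(0); an implementation detail
    return float(np.mean(-x_nuc * np.log(y) - (1.0 - x_nuc) * np.log(1.0 - y)))

x_nuc = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_good = nuclei_encapsulation_loss(x_nuc, np.array([[0.9, 0.1], [0.1, 0.9]]))  # matches mask
loss_bad = nuclei_encapsulation_loss(x_nuc, np.array([[0.1, 0.9], [0.9, 0.1]]))   # misses nuclei
```

Predictions that cover the nucleus pixels incur a much smaller penalty, which is what incentivises the model to encapsulate each nucleus.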
(B) Cell-calling loss
The aim of the cell-calling loss was to increase the number of transcripts assigned to cells. We also designed the cell-calling loss to allow BIDCell to capture cell-type-specific morphologies. Unique expansion masks e_{c} ∈ {0, 1}^{h×w} were computed for each cell based on the shape of its nucleus and whether its nucleus expression profile was indicative of an elongated cell type. The expansion mask of a non-elongated cell was computed by applying a single iteration of the morphological dilation operator with a circular kernel of 20 × 20 pixels to its binary nucleus mask.
The expansion mask of an elongated cell was computed based on the elongation of its nucleus, defined as the eccentricity of an ellipse fitted to its nucleus mask:
$$ecc=\sqrt{1-\frac{b^{2}}{a^{2}}},$$
(2)
where a represents the length of the major axis, and b is the length of the minor axis.
We found that elongated cell types tended to have nuclei with higher eccentricity (Supplementary Fig. 1). Hence, the eccentricity of a nucleus could serve as a proxy for the shape of its cell via an elongated expansion mask. We computed each cell-specific elongated expansion mask using an elliptical dilation kernel applied to the nucleus. The horizontal and vertical lengths of the elliptical kernel were computed as:
$$l_{h}=\alpha\times ecc_{nuc}\times l_{t},$$
(3)
$$l_{v}=\begin{cases}l_{t}-l_{h}, & \text{if}\ l_{t}-l_{h} > l_{vm}\\ l_{vm}, & \text{otherwise}\end{cases}$$
(4)
where α is a scaling factor set to 0.9, ecc_{nuc} is the eccentricity of the nucleus, l_{t} is the sum of l_{h} and l_{v}, which was set to 60 pixels, and l_{vm} is the minimum vertical length, which was set to 3 pixels. These values were selected based on visual inspection (e.g., the cells appear reasonably sized), and were kept consistent across the different elongated cell types and datasets used in this study. The elliptical dilation kernel was rotated to align with the nucleus and applied to the nucleus mask to produce the elongated expansion mask of the cell.
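Eqs. (2)–(4) reduce to a few lines; the default values below follow the text (α = 0.9, l_t = 60, l_vm = 3), while the example nucleus axes are hypothetical:

```python
import math

def eccentricity(a, b):
    """Eccentricity of an ellipse with major axis a and minor axis b (Eq. 2)."""
    return math.sqrt(1.0 - (b * b) / (a * a))

def elongated_kernel_lengths(ecc_nuc, alpha=0.9, l_t=60.0, l_vm=3.0):
    """Horizontal/vertical lengths of the elliptical dilation kernel (Eqs. 3-4)."""
    l_h = alpha * ecc_nuc * l_t
    l_v = l_t - l_h if l_t - l_h > l_vm else l_vm  # clamp to the minimum vertical length
    return l_h, l_v

ecc = eccentricity(a=10.0, b=4.0)   # a strongly elongated (hypothetical) nucleus
l_h, l_v = elongated_kernel_lengths(ecc)
```

A more eccentric nucleus produces a longer horizontal axis, so the dilation kernel (and hence the expansion mask) stretches along the nucleus orientation.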
The expansion masks were used in our cell-calling loss function, which was minimised during training:
$$L_{cc}(\mathbf{e},\hat{\mathbf{y}})=\frac{1}{M}\sum_{c}^{M}\left[-\mathbf{e}_{c}\log(\hat{\mathbf{y}}_{c})-(\mathbf{1}-\mathbf{e}_{c})\log(\mathbf{1}-\hat{\mathbf{y}}_{c})\right],$$
(5)
where e_{c} is the expansion mask and ŷ_{c} is the predicted segmentation of cell c of M cells in an input patch.
(C) Over-segmentation loss
We introduced the over-segmentation loss to counter the cell size-increasing effects of the cell-calling loss, preventing segmentations from becoming too large and splitting into separate segments. This loss function elicited a penalty whenever the sum of cytoplasmic predictions exceeded the sum of nuclei predictions for a cell in a given patch:
$$p_{nuc,c}=\sum_{i}\sum_{j}\sigma(\hat{q}_{ijc}\,x_{nuc,ij}-0.5),$$
(6)
$$p_{cyto,c}=\sum_{i}\sum_{j}\sigma(\hat{q}_{ijc}(1-x_{nuc,ij})-0.5),$$
(7)
$$L_{os}=\begin{cases}\frac{1}{M}\sum_{c}^{M}(p_{cyto,c}-p_{nuc,c}), & \text{if}\ \sum_{c}^{M}(p_{cyto,c}-p_{nuc,c}) > 0\\ 0, & \text{otherwise}\end{cases}$$
(8)
where for cell c at pixel (i, j), q̂_{ijc} is the predicted foreground probability for cell c, x_{nuc,ij} ∈ {0, 1} is the binary nucleus mask, and σ is the sigmoid function. L_{os} was normalised by the number of cells M to aid smooth training.
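Eqs. (6)–(8) can be sketched in numpy; here `q_hat` holds per-cell foreground probabilities with cells in the leading axis, and the toy masks are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def oversegmentation_loss(q_hat, x_nuc):
    """Penalise cytoplasm predictions exceeding nucleus predictions (Eqs. 6-8).

    q_hat: [M, h, w] foreground probabilities per cell; x_nuc: [h, w] binary mask.
    """
    p_nuc = sigmoid(q_hat * x_nuc - 0.5).sum(axis=(1, 2))           # Eq. 6
    p_cyto = sigmoid(q_hat * (1.0 - x_nuc) - 0.5).sum(axis=(1, 2))  # Eq. 7
    diff = float((p_cyto - p_nuc).sum())
    return diff / q_hat.shape[0] if diff > 0 else 0.0               # Eq. 8

x_nuc = np.zeros((4, 4)); x_nuc[1:3, 1:3] = 1.0   # a 2x2 nucleus in a 4x4 patch
loss_big = oversegmentation_loss(np.full((1, 4, 4), 0.9), x_nuc)  # cell fills the patch
loss_tight = oversegmentation_loss(0.9 * x_nuc[None], x_nuc)      # cell hugs its nucleus
```

A cell that spills far beyond its nucleus is penalised, while a nucleus-hugging prediction incurs zero loss, which counterbalances the expansion pressure of the cell-calling loss.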
(D) Overlap loss
Cells are often densely packed together in samples of various human tissues. This poses a challenge to segmentation models in predicting clear boundaries and coherent segmentations for neighbouring cells without overlap. We introduced the overlap loss to penalise the prediction of multiple cells at each pixel:
$$s_{ov,ij}=-(1-x_{nuc,ij})+\sum_{c}^{M}\sigma(\hat{q}_{ijc}(1-x_{nuc,ij})-0.5),$$
(9)
$$L_{ov}=\begin{cases}\frac{\sum_{i}\sum_{j}s_{ov,ij}}{Mhw}, & \text{if}\ s_{ov} > 0\\ 0, & \text{otherwise}\end{cases}$$
(10)
L_{ov} was normalised by the number of cells M and the lateral dimensions h and w of the input to aid smooth training.
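A numpy sketch of Eqs. (9)–(10); reading the condition in Eq. (10) as applying to the summed penalty is an assumption of this example, and the toy inputs are hypothetical:

```python
import numpy as np

def overlap_loss(q_hat, x_nuc):
    """Penalise multiple cells claiming the same non-nucleus pixel (Eqs. 9-10)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    per_cell = sigmoid(q_hat * (1.0 - x_nuc) - 0.5)   # ~1 where cell c claims a non-nucleus pixel
    s_ov = -(1.0 - x_nuc) + per_cell.sum(axis=0)      # Eq. 9: one claim per pixel is allowed
    total = float(s_ov.sum())
    M, h, w = q_hat.shape
    return total / (M * h * w) if total > 0 else 0.0  # Eq. 10 (summed-penalty reading)

x_nuc = np.zeros((4, 4))
loss_two = overlap_loss(np.full((2, 4, 4), 0.9), x_nuc)  # two cells claim every pixel
loss_one = overlap_loss(np.full((1, 4, 4), 0.9), x_nuc)  # a single claimant is not penalised
```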
(E) Positive and negative marker losses
The purpose of our positive and negative marker losses was to encourage the model to capture pixels that contained positive cell-type markers, and to penalise the model when segmentations captured pixels that contained negative cell-type markers for each cell. The marker losses refine the initial morphology learned through the other loss functions, by further guiding the model to learn biologically-informed relationships between gene expressions and cell morphology.
The positive and negative markers for the training loss were those with expressions in the highest and lowest 10th percentiles for each cell type of a tissue sample. In our experiments, we found that a higher number of positive markers tended to increase the size of predicted cells as the model learns to capture more markers, and vice versa. We found that removing positive markers common to at least a third of cell types in each tissue type was appropriate across the different datasets used for training.
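The percentile-based selection and the common-marker filter can be sketched as follows; the reference matrix is hypothetical, and the linear-interpolation percentile is an assumption of this example:

```python
import numpy as np

def select_markers(profiles, q=10, common_frac=1 / 3):
    """Pick positive/negative markers per cell type from mean-expression profiles.

    profiles: [n_cell_types, n_genes]. Genes above the (100-q)-th / below the q-th
    percentile per cell type become positive / negative markers; positive markers
    shared by at least `common_frac` of cell types are removed.
    """
    n_types, n_genes = profiles.shape
    pos = np.zeros((n_types, n_genes), dtype=bool)
    neg = np.zeros((n_types, n_genes), dtype=bool)
    for t in range(n_types):
        lo, hi = np.percentile(profiles[t], [q, 100 - q])
        pos[t] = profiles[t] >= hi
        neg[t] = profiles[t] <= lo
    too_common = pos.mean(axis=0) >= common_frac   # positive in too many cell types
    pos[:, too_common] = False
    return pos, neg

# Hypothetical 4 cell types x 10 genes; gene 0 is highly expressed in two types
profiles = np.array([
    [10, 9, 0, 1, 2, 3, 4, 5, 6, 7],
    [10, 0, 9, 1, 2, 3, 4, 5, 6, 7],
    [8, 1, 0, 10, 2, 3, 4, 5, 6, 7],
    [7, 1, 0, 2, 3, 4, 5, 6, 8, 10],
], dtype=float)
pos_, neg_ = select_markers(profiles)
```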
The one-hot encoded lists of positive and negative markers of the cell type for cell c were converted into sparse maps m_{pos,c} ∈ {0, 1}^{h×w} and m_{neg,c} ∈ {0, 1}^{h×w}. At each pixel, 0 indicated the absence of all markers, while 1 indicated the presence of any positive or negative marker for its respective map. m_{pos,c} and m_{neg,c} were then multiplied element-wise by the expansion mask e_{c} to remove markers far away from the current cell. Each marker map was dilated by a 3 × 3 kernel, based on the assumption that pixels in a 3 × 3 region around each marker were most likely from the same cell. We found this dilation to improve training guidance and segmentation quality, as the maps tended to be quite sparse.
The marker maps were then used to compute the positive and negative marker losses:
$$L_{pos}(\mathbf{m}_{pos},\hat{\mathbf{y}})=\frac{1}{M}\sum_{c}^{M}\left[-\mathbf{m}_{pos,c}\log(\hat{\mathbf{y}}_{c})-(\mathbf{1}-\mathbf{m}_{pos,c})\log(\mathbf{1}-\hat{\mathbf{y}}_{c})\right],$$
(11)
$$L_{neg}(\mathbf{m}_{neg},\hat{\mathbf{q}})=\frac{1}{M}\sum_{c}^{M}\sigma(\hat{\mathbf{q}}_{c}\,\mathbf{m}_{neg,c}-0.5)$$
(12)
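Eqs. (11)–(12) can be sketched together; the pixel-mean reductions are assumptions of this example (the equations leave them implicit), and the 2 × 2 marker maps are hypothetical:

```python
import numpy as np

def marker_losses(m_pos, m_neg, q_hat, eps=1e-7):
    """Positive/negative marker losses (Eqs. 11-12) for M cells.

    m_pos, m_neg: [M, h, w] binary marker maps; q_hat: [M, h, w] foreground probs.
    """
    M = q_hat.shape[0]
    q = np.clip(q_hat, eps, 1.0 - eps)
    # Eq. 11: cross-entropy rewarding coverage of positive-marker pixels
    l_pos = float((-m_pos * np.log(q) - (1 - m_pos) * np.log(1 - q)).mean(axis=(1, 2)).sum() / M)
    # Eq. 12: sigmoid penalty for covering negative-marker pixels
    l_neg = float((1.0 / (1.0 + np.exp(-(q * m_neg - 0.5)))).mean(axis=(1, 2)).sum() / M)
    return l_pos, l_neg

m_pos = np.zeros((1, 2, 2)); m_pos[0, 0, 0] = 1.0
m_neg = np.zeros((1, 2, 2)); m_neg[0, 1, 1] = 1.0
covers_pos = m_pos * 0.9 + 0.05   # segmentation concentrated on the positive marker
covers_neg = m_neg * 0.9 + 0.05   # segmentation concentrated on the negative marker
l_pos_good, l_neg_good = marker_losses(m_pos, m_neg, covers_pos)
l_pos_bad, l_neg_bad = marker_losses(m_pos, m_neg, covers_neg)
```

Covering positive markers lowers L_pos, while capturing negative-marker pixels raises L_neg, which together steer the segmentation toward the correct expression profile.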
Total loss
The model was trained by minimising the sum of all the loss functions over N training patches:
$$\min_{\theta}\sum_{n}^{N}\left[\lambda_{ne}L_{ne}+\lambda_{cc}L_{cc}+\lambda_{os}L_{os}+\lambda_{ov}L_{ov}+\lambda_{pos}L_{pos}+\lambda_{neg}L_{neg}\right],$$
(13)
where each λ represents a hyperparameter that scales its respective L. The value of λ for all loss functions was set to 1.0 (except in the ablation and lambda studies); this ensured our losses were not fine-tuned to any particular dataset.
Practical implementation
Details
To address computational efficiency concerns related to memory usage, we partitioned the spatial transcriptomic maps into patches of 48 × 48 × n_{genes} for input into UNet 3+. BIDCell has been verified for datasets containing up to 960 genes on a 12 GB GPU. It is also important to note that the number of genes primarily affects the weights of the first convolutional layer, thus having a minor impact on memory usage.
Patch-based predictions can produce effects along the patch boundaries, such as sharp or cut-off cells. When dividing the transcriptomic maps into patches, we create two sets of patches of the same lateral dimensions, with an overlap equal to half the lateral size of the patches. The predictions for the patches were combined (see Supplementary Fig. 25), without additional operations to resolve potential disagreement between predictions of the two sets. Only patches from the first set (no overlaps) were used during training, while all patches were used during inference.
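The two patch grids can be sketched as follows; clipping the offset grid so patches stay inside the image is an assumption of this example:

```python
def patch_origins(height, width, patch=48):
    """Top-left corners of two patch grids, the second offset by half a patch.

    Both grids use the same patch size; the offset grid covers the seams of the
    first. Boundaries are clipped so every patch fits inside the image.
    """
    def grid(offset):
        rows = range(offset, max(height - patch, 0) + 1, patch)
        cols = range(offset, max(width - patch, 0) + 1, patch)
        return [(r, c) for r in rows for c in cols]
    return grid(0), grid(patch // 2)

set_a, set_b = patch_origins(96, 96, patch=48)  # set_b patches straddle set_a's seams
```

During training only `set_a` is sampled; at inference both grids are predicted and combined, so cells cut by a seam in one grid fall inside a patch of the other.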
One image patch was input into the model at a time, though the batch size was effectively n_cells due to reshaping (see section Deep learning-based segmentation, Input). Neither normalisation nor standardisation was applied to the input image patches, so that the pixels depicted raw detections of transcripts.
The model was trained end-to-end from scratch for 4000 iterations (i.e., using 4000 training patches). This amounted to a maximum of 22% of the entire image, thereby leaving the rest of the image unseen by the model during inference. Weights of the convolutional layers were initialised using He et al.’s method^{33}. We employed standard on-the-fly image data augmentation by randomly applying a flip (horizontal or vertical) or a rotation (of 90, 180, or 270 degrees) in the (x,y) plane. The order of training samples was randomised prior to training. We employed the Adam optimiser^{34} to minimise the sum of all losses at a fixed learning rate of 0.00001, with a first moment estimate of 0.9, second moment estimate of 0.999, and weight decay of 0.0001.
Time and system considerations
We ran BIDCell on a Linux system with a 12 GB NVIDIA GTX Titan V GPU, an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz with 16 threads, and 64 GB RAM. BIDCell was implemented in Python using PyTorch. For XeniumBreastCancer1, which contained 109k detected nuclei, 41M pixels (x,y), and 313 genes, training was completed in approximately 10 minutes for 4000 steps. Inference time was about 50 minutes for the complete image. Morphological processing required approximately 30 minutes to generate the final segmentation. A comparison of the runtimes of different methods is included in Supplementary Fig. 26.
Ablation study
We performed an ablation study to determine the contributions of each loss function and the effects of different hyperparameter values (Supplementary Figs. 4, 5). We used XeniumBreastCancer1 for these experiments. We evaluated BIDCell without each of the different loss functions by individually setting their corresponding weights λ to zero. Furthermore, we evaluated different parameterisations of the cell-calling loss. We experimented with different diameters for the dilation kernel for non-elongated cells, including 10, 20, and 30 pixels, and with different total lengths of the minor and major axes l_t of the dilation kernel for elongated cells, including 50, 60, and 70 pixels. We also ran BIDCell without shape-specific expansions, thereby assuming a non-elongated shape for all cells.
Performance evaluation
We compared our BIDCell framework to vendor-provided cell segmentations and to methods designed to identify cell bodies via cell segmentation. Table 2 provides a summary of all methods compared, from adaptations of classical approaches, including Voronoi expansion, nuclei dilation, and the watershed algorithm, to recently proposed approaches for SST images, including Baysor, JSTA, and Cellpose. Methods were excluded from the evaluations if they focused on the assignment of transcripts to cells without considering cell boundaries, underperformed on the public datasets, lacked code and instructions to prepare data into the required formats, or failed to detect any cells (Supplementary Table 1).
Settings used for other methods
We used publicly available code for Baysor, JSTA, and Cellpose with default parameters unless stated otherwise. All comparison methods that required nuclei information used identical nuclei as BIDCell, which were detected using Cellpose (v2.1.1) (see Nuclei segmentation).

Baysor – Version 0.5.2 was applied either without a prior, or with a prior nuclei segmentation with default prior segmentation confidence of 0.2. For both instances, we followed recommended settings^{35}, including 15 for the minimum number of transcripts expected per cell, and not setting a scale value, since the sample contained cells of varying sizes. We found the scale parameter to have a considerable effect on segmentation predictions, and often resulted in cells with unrealistically uniform appearances if explicitly set.

JSTA – default parameters were used. We encountered high CPU loading and issues with two regions of XeniumBreastCancer1, which yielded empty predictions for those regions despite multiple attempts and efforts to reduce input size.

Cellpose – Version 2.1.1 was applied to the channelwise concatenated image comprising DAPI as the “nuclei” channel, and sum of spatial transcriptomic maps across all genes as the “cells” channel, using the pretrained “cyto” model with automatic estimation of cell diameter.

Voronoi – Classical Voronoi expansion was seeded on nuclei centroids and applied using the SciPy library (v1.9.3).

Watershed – The watershed algorithm was performed on the sum of transcriptomic maps across all genes. Seeded watershed used nuclei centroids and was applied using OpenCV (v4.6.0).

Cellpose nuclei dilation – we applied dilation to nuclei masks as a comparison segmentation method. Each nucleus was enlarged by about 1 micron in radius by applying morphological dilation using a 3 × 3 circular kernel for one iteration. Overlaps between adjacent cell expansions were permitted.
Evaluation metrics and settings
We introduce the CellSPA framework, which captures evaluation metrics across five complementary categories. A summary of this information is provided in Supplementary Table 2.
[A] Baseline metrics
Overall characteristics
Celllevel QC metrics

Proportion of cells expressing each gene

Number of transcripts per cell

Number of genes expressed per cell

Cell area
$$\mathrm{Density}=\frac{\sum_{i\in I}n_{i}}{A},$$
(14)
where ∑_{i∈I}n_{i} represents the total number of transcripts over a set I, and A represents the cell area.
Cell morphology metrics
We evaluated multiple morphology-based metrics and provide diagrammatic illustrations in Supplementary Fig. 27.
• Elongation =
$$\frac{W_{\mathrm{bb}}}{H_{\mathrm{bb}}},$$
(15)
where W_{bb} represents the width of the bounding box, and H_{bb} represents the height of the bounding box.
Elongation measures the ratio of the width to the height of the bounding box (Supplementary Fig. 27f). Elongation is insensitive to concave irregularities and holes in the shape of the cell. The value of this metric will be 1 for a perfectly square bounding box. As the cell becomes more elongated, the value will either increase far above 1 or decrease far below 1, depending on whether the elongation occurs along the width or the height of the bounding box.
• Circularity =
$$\frac{4\pi \times A}{P_{\mathrm{convex}}^{2}},$$
(16)
where A represents the area, and P_{convex} represents the convex perimeter.
Circularity measures the area to perimeter ratio while excluding local irregularities of the cell. We used the convex perimeter of the object as opposed to its true perimeter to avoid concave irregularities. The value will be 1 for a circle and decreases as a cell becomes less circular.
• Sphericity =
$$\frac{R_{\mathrm{I}}}{R_{\mathrm{C}}},$$
(17)
where R_{I} represents the radius of the inscribing circle, and R_{C} represents the radius of the circumscribing circle.
Sphericity measures the rate at which an object approaches the shape of a sphere while accounting for the largest local irregularity of the cell, by comparing the radius of the largest circle that fits inside the cell (inscribing circle) to the radius of the smallest circle that contains the whole cell (circumscribing circle). The value is 1 for a sphere and decreases as the cell becomes less spherical.
• Compactness =
$$\frac{4\pi \times A}{P_{\mathrm{cell}}^{2}},$$
(18)
where A represents the area, and P_{cell} represents the cell perimeter.
Compactness measures the ratio of the area of an object to the area of a circle with the same perimeter. Because compactness uses the perimeter of the cell, it considers local irregularities in the cell perimeter. A circle will have a value of 1, and the less smooth or more irregular the perimeter of a cell, the smaller the value will be. For most cells, the numerical values of compactness and circularity are expected to be similar; cells with large differences between these two metrics have highly irregular perimeters, which may be of interest for downstream analysis and quality control of segmentation.
• Convexity =
$$\frac{P_{\mathrm{convex}}}{P_{\mathrm{cell}}},$$
(19)
where P_{convex} represents the convex perimeter and P_{cell} represents the cell perimeter.
Convexity measures the ratio of the convex perimeter of a cell to its perimeter. The value will be 1 for a circle and decreases the more irregular the perimeter of a cell becomes, similar to compactness.
• Eccentricity =
$$\frac{L_{\mathrm{minor}}}{L_{\mathrm{major}}},$$
(20)
where L_{minor} represents the length of the minor axis and L_{major} represents the length of the major axis.
Eccentricity (or ellipticity) measures the ratio of the minor axis to the major axis of a cell. The major axis is the longest possible line that can be drawn within the inner boundary of a cell without intersecting its boundary. The minor axis is the longest possible line that can be drawn within the inner boundary of a cell while also being perpendicular to the major axis. This gives a value of 1 for a circle and decreases the flatter the cell becomes.
• Solidity =
$$\frac{A}{A_{\mathrm{convex}}},$$
(21)
where A represents the area, and A_{convex} represents the convex area.
Solidity measures the ratio of the area of a cell to the convex area of a cell. This measures the density of a cell by detecting holes and irregular boundaries in the cell shape. The maximum value will be 1 for a cell with a perfectly convex and smooth boundary and will decrease as the cell shape becomes more concave and/or irregular.
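The metrics above can be sanity-checked against closed-form values for ideal shapes. The sketch below (plain Python; all shape dimensions are hypothetical) verifies that compactness (Eq. (18)) is 1 for a circle and π/4 for a square, and that elongation (Eq. (15)) is 2 for a 2:1 bounding box:

```python
import math

def compactness(area, perimeter):
    # Eq. (18): 4*pi*A / P^2
    return 4 * math.pi * area / perimeter ** 2

def elongation(w_bb, h_bb):
    # Eq. (15): W_bb / H_bb
    return w_bb / h_bb

# Circle of radius r: 4*pi*(pi r^2) / (2 pi r)^2 = 1
r = 3.0
circle = compactness(math.pi * r ** 2, 2 * math.pi * r)

# Square of side s: 4*pi*s^2 / (4s)^2 = pi/4 ~ 0.785
s = 5.0
square = compactness(s * s, 4 * s)

# A 2:1 bounding box gives elongation 2 (or 0.5 if rotated 90 degrees)
rect = elongation(10.0, 5.0)
```

Worked values like these are a useful check when implementing the metrics on real segmentation masks, where perimeter estimation introduces additional discretisation error.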
Gene-level QC characteristics
• Proportion of cells expressing each gene
[B] Segmented cell expression purity. We implemented two broad classes of statistics to capture (i) the concordance of the expression profile with scRNA-seq data and (ii) the expression purity or homogeneity of cell type markers. The scRNA-seq data used are described in Section Datasets and preprocessing and listed in Table 1.
• Concordance with scRNA-seq data – We calculated the similarity of the expression pattern between the segmented cells and publicly available single-cell datasets. Here, similarity was measured by the Pearson correlation of the average log-normalised gene expression for each cell type. We also calculated the concordance of the proportion of non-zero expression for each cell type between the segmented cells and the scRNA-seq data. For data with paired Chromium data from the same experiment, i.e., XeniumBrain, we also compared the cell type proportions and quantified the concordance using the Pearson correlation. We annotated the cell types of segmented cells using scClassify^{36} with scRNA-seq data as the reference.
• Purity of expression – We first curated lists of positive and negative markers from the scRNA-seq reference data. For each cell type, we selected the genes in the highest and lowest 10th percentiles of the expression difference relative to the other cell types. We also removed positive markers that were common to more than 25% of cell types to obtain a purer positive marker list. For each segmented cell, we then considered the genes in the highest 10th percentile of expression as positive markers and those in the lowest 10th percentile as negative markers. We then calculated the precision, recall and F1 score for both positive and negative markers. We further summarised the average positive marker F1 scores and negative marker F1 scores into one Purity F1 score for each method, where we first scaled the average positive and negative marker F1 scores into the range [0, 1] and then calculated the F1 score of the transformed metrics as follows:
$$F{1}_{\mathrm{purity}}=2\cdot \frac{(1-F{1}_{\mathrm{negative}})\cdot F{1}_{\mathrm{positive}}}{(1-F{1}_{\mathrm{negative}})+F{1}_{\mathrm{positive}}}.$$
(22)
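A minimal sketch of Eq. (22), under the assumption that the [0, 1] scaling amounts to flipping the negative-marker F1 (so that lower contamination scores higher); `purity_f1` is an illustrative helper name, not the paper's code:

```python
def purity_f1(f1_positive, f1_negative):
    """Harmonic mean of the positive-marker F1 and the flipped
    negative-marker F1, as in Eq. (22)."""
    flipped = 1.0 - f1_negative
    if flipped + f1_positive == 0:
        return 0.0
    return 2.0 * flipped * f1_positive / (flipped + f1_positive)

# Perfect positive markers and no negative-marker contamination score 1
best = purity_f1(1.0, 0.0)
# Heavy negative-marker contamination drags the summary score down
worse = purity_f1(1.0, 0.9)
```

The harmonic mean penalises a method that does well on only one of the two marker classes, so a high Purity F1 requires both strong positive-marker recovery and low negative-marker contamination.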
[C] Spatial characteristics
In this category, we measured the association between cell type diversity in local spatial regions and all the cell-level baseline characteristics provided in [A]. We first divided each image into multiple small regions. Then, for each local spatial region, we calculated the cell type diversity using Shannon entropy with the R package 'entropy', where a higher entropy indicates a more diverse cell type composition. Next, we assessed the variability of cell-level baseline characteristics within each local region using the coefficient of variation. Subsequently, for each of the cell-level baseline characteristics mentioned in [A], we calculated the Pearson correlation between the cell type diversity (measured using Shannon entropy) and the coefficient of variation of these characteristics across all local regions. Here, we anticipate that regions with more diverse cell type compositions will exhibit higher variability in cell-level characteristics, leading to a stronger correlation between these two metrics.
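The per-region computation can be sketched in NumPy (the paper uses the R package 'entropy'; the regions, counts, and areas below are hypothetical):

```python
import numpy as np

def shannon_entropy(counts):
    """Cell type diversity of one local region from its cell-type counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum()

def coef_variation(values):
    """Variability of a cell-level characteristic within a region."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

# Hypothetical regions: cell-type counts and per-cell areas in each region
regions = [
    {"type_counts": [10, 0, 0], "areas": [100, 102, 98, 101]},  # homogeneous
    {"type_counts": [4, 3, 3], "areas": [60, 140, 90, 180]},    # diverse
]
entropies = [shannon_entropy(r["type_counts"]) for r in regions]
cvs = [coef_variation(r["areas"]) for r in regions]
# Across many regions, np.corrcoef(entropies, cvs)[0, 1] gives the
# Pearson correlation described above.
```

In this toy case the homogeneous region has zero entropy and low area variability, while the diverse region shows both higher entropy and a higher coefficient of variation, matching the anticipated association.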
[D] Neighbouring contamination
This metric is designed for cell segmentation to ensure that the expression signals between neighbouring cells are not contaminated. For a pair of cell types (e.g., cell types A and B), we computed the Euclidean distance from each cell of cell type A to its nearest neighbour belonging to cell type B. We then grouped the cells of cell type A by distance range. Within each group, we calculated the proportion of cells expressing a selected negative marker, i.e., a cell type marker for cell type B. We anticipate that a method with less contamination will result in segmented cells expressing lower levels of the negative marker, even when the distance to a different cell type is minimal.
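A sketch of this computation with `scipy.spatial.cKDTree`, assuming hypothetical coordinates, marker calls, and distance bins; `contamination_profile` is an illustrative helper, not the paper's code:

```python
import numpy as np
from scipy.spatial import cKDTree

def contamination_profile(coords_a, coords_b, expresses_marker_b, bins):
    """For each type-A cell, find the distance to its nearest type-B cell,
    then compute the fraction of A cells expressing a B marker per bin."""
    dists, _ = cKDTree(coords_b).query(coords_a)
    which = np.digitize(dists, bins)  # bin index per type-A cell
    fractions = []
    for i in range(1, len(bins)):
        in_bin = which == i
        fractions.append(expresses_marker_b[in_bin].mean() if in_bin.any() else np.nan)
    return fractions

# Toy data: three type-A cells, one type-B cell, one negative-marker call
coords_a = np.array([[0.0, 0.0], [0.0, 5.0], [0.0, 25.0]])
coords_b = np.array([[0.0, -2.0]])
marker = np.array([True, False, False])  # the closest A cell shows contamination
frac = contamination_profile(coords_a, coords_b, marker, bins=[0.0, 10.0, 30.0])
```

A well-contained segmentation should keep these fractions low even in the nearest distance bin, whereas contamination shows up as elevated negative-marker expression close to the type-B cells.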
[E] Replicability
Our analysis assessed the agreement between the XeniumBreastCancer1 and XeniumBreastCancer2 datasets in terms of all the cell-level baseline characteristics provided in [A]. As these datasets are considered sister regions, we anticipated that the distributions of all baseline characteristics, as well as the cell type composition, would be similar. We used the Pearson correlation to quantify the degree of concordance.
Statistics and reproducibility
All analysis was done in R (version 4.3.0). No statistical method was used to predetermine sample size. No data were excluded from the analyses; all cells that passed quality control were included. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
 Source: https://www.nature.com/articles/s4146702344560w