Bacterial leaf-infiltration assay for microscopy
Seeds of A. thaliana (accession Col-0) were surface sterilized by an overnight incubation at −80 °C followed by washing with ethanol (5–15 min shaking in a solution of 75% EtOH (Sigma-Aldrich) and 0.5% Triton X-100 (Sigma-Aldrich), followed by a 95% EtOH wash and drying in a laminar flow hood). Stratification was done in a 0.1% agar solution at 4 °C for 7 d before planting. Seeds were sown on potting soil (CLT Topferde; www.einheitserde.de), in 60-pot trays (Herkuplast Kubern). During the first 2 d after sowing (the germination period), the trays were covered with a transparent lid to reduce the likelihood of pest infection. Indoor growing conditions were as follows: Cool White Deluxe fluorescent bulbs (25 to 175 μmol m−2 s−1), 23 °C and 65% relative humidity. Plants were grown under long-day conditions (16 h of light) for 15 d before syringe-infiltration with mCherry-tagged Pst DC3000 at OD600 = 0.001. Only half of the leaf was infiltrated (in relation to the main vein). A 3xmCherry construct had been inserted at the attn7 site and was a kind gift from Brian Kvitko.
Pst DC3000 was grown overnight in Luria Broth with the appropriate antibiotics (gentamicin and nitrofurantoin, 5 μg ml−1 each), then diluted 1:10 on the following morning, and was grown for an additional 4 h to initiate the log phase, after which the bacteria were centrifuged at 3500g for 90 s, and resuspended in 10 mM MgSO4.
Three days after infections, leaves were dissected and placed on 0.5× MS medium with agar (Duchefa, M0255), inspected under a Zeiss Axio Zoom.v16 fluorescence stereomicroscope to verify that the mCherry signal was present, and immediately flash-frozen in liquid N2. The leaves were stored at −80 °C before cryosectioning.
Imaging of bacterial-infected leaves
Infected A. thaliana leaves were imaged on a Zeiss Axio Zoom.v16 fluorescence stereomicroscope, equipped with an LED array for transmitted illumination and an X-Cite XYLIS LED (Excelitas Technologies) for epi-illumination. All leaves were imaged using a PlanNeoFluar Z 1×/0.25 dry objective and a Hamamatsu ORCA-Flash4.0 digital CMOS camera (c11440-22C) with 2 × 2 binning. mCherry-tagged Pst DC3000 was detected using the Zeiss filter set 45 (00000-1114-462), which includes a 560/40 nm excitation filter, a 630/75 nm emission filter and a 585 nm dichroic mirror. Bright-field images were acquired as references for the outline of the leaves for the analysis. The camera exposure time was 220 ms at 5% of light intensity. The images of infected leaves have a pixel size of 18.6 µm2 and were acquired at a ×7 magnification. Image acquisition was done using the ZEN 2.1 software package.
For the analysis of microbial hotspots, microbial interactions and host responses to wild microbiomes, seeds of A. thaliana (accession Col-0) were germinated and grown indoors for seven short days (8 h of light). On 27 February 2019, the trays were placed outdoors near the Max Planck Institute for Biology Tübingen in a naturalized environment surrounded by other plants. Plants were irrigated weekly with regular tap water. Twenty-seven days after outdoor planting, individual leaves were sampled and immediately flash-frozen in liquid N2. Leaves from different plants were stored separately at −80 °C before cryosectioning.
To grow axenic plants, Arabidopsis Col(0) seeds were pretreated at 37 °C for 24 h, followed by a cold treatment at −20 °C for 24 h. The seeds were then rinsed in 70% ethanol for 5 min, 20% chlorine for 20 min, and washed in sterile water three times before being transferred to Murashige Skoog plant agar plates. Subsequently, the seeds were vernalized at 4 °C for 48 h before allowing them to germinate and grow into seedlings, still on the MS plates, at 20 °C under long daylight conditions (16 h of light and 8 h of darkness). Ten days after the vernalization, individual leaves were sampled and immediately frozen in liquid N2. Leaves from different plants were stored separately at −80 °C before cryosectioning.
Two batches (‘1’ and ‘2’) of axenically-grown leaves were analyzed in the SmT experiments. Leaves were prepared as described in the subsection Sample preparation and sectioning. Three leaf sections from batch 1 and five leaf sections from batch 2 were cryosectioned and attached onto two multimodal array capture areas, respectively. In addition, from the same leaves, four sections per leaf were collected to a Lysing Matrix D tube (MP Biomedicals) for total RNA extraction, which was performed using the RNAqueous-Micro Total RNA Isolation kit (Invitrogen, Thermo Fisher Scientific) using minor modification. Specifically, the leaf sections were disrupted using a Fastprep-24 instrument (MP Biomedicals) in 50 μl of Lysis Buffer at 6.0 m s−1 for 40 s. Subsequently, the homogenized tissue lysate was centrifuged and transferred to the binding column followed by washes with wash buffers according to the manufacturer’s protocol. Finally, the total RNA was eluted in 20 μl of elution solution, and 10 μl from each of the two samples was added onto two multimodal array capture areas, respectively, during the cDNA synthesis (instead of tissue sections).
Multimodal array structure
SmT uses multimodal slides (10x Genomics) with capture areas of 6.5 × 6.5 mm. Each capture area comprises 4,992 spots, with diameters of 55 μm each. The spots are covered with capture probes in the following proportion: 45% 16S rRNA probes, 45% 18S rRNA/ITS probes and 10% poly-d(T) probes.
Probes were designed using the following two approaches: one based on established primers of the relevant marker genes (P799 (ref. 65) and P902 (ref. 66)) and a de novo approach (P1265 and P479) (Supplementary Fig. 18). On average, the probing sites were 100 nt upstream of the target site. In general, we aimed to maximize the following two variables: the conservation of the probe sites and the variability of the 100 nt downstream target sites. The de novo design process was adopted because previously designed primers were suboptimal with respect to these criteria.
Previously designed primers were used as templates due to their wide usage in the field, which is indicative of useful specificity—it implies that they have a wide taxonomic range and good ability to exclude host reads such as those originating from 16S chloroplast rRNA. Four probes were designed based on the following previous primers; the 16S probes 16S:P799 (5′-TTA VVG CRT GGA CWM CCM GGG TAT CTA ATC CKG TT-3′) and 16S:P902 (5′-CSS YTG TGY GSG GSC CCC CGT CAA TTC MTT TGA GTT TYA RYC-3′) were based on the mainstream primers 799F65 and 902R66, respectively. Additionally, the eukaryotic capture probes 18S:P-ITS1 (5′-CCT ACG GAA ACC TTG TTA CGA CTT TTT ACT TCC TCT AAA TGA CCA AG-3′) and ITS:P-ITS7 (5′-RRG CGC AAK RTG CGT TCA AAG ATT CGA TGA YTC AC-3′) were based on the mainstream primers ITS1F67 and ITS7F68, respectively. To fit the primers to the annealing conditions of the array, we reversed-complemented all forward-oriented primers (that is, all of them but 902R; the target RNA is single stranded, so reversal of the primer orientation was needed to capture it) and elongated them to obtain 35–45 bp long sequences, as recommended for microbial profiling in microarray systems69. To this end, 16S rRNA and ITS custom databases were downloaded (on 29 April 2020) from NCBI GenBank and the sequences downstream of the primer (up to 100 nt, including the primer) were extracted. These sequences were then aligned using the software Clustal Omega (v1.2.4) and the sequence profiles were plotted using weblogo (v3.7.5). The primers were elongated by manual inspection of the resulting weblogo. The length and degeneracy level were limited to obtain fewer than 35,000 unique probe sequences.
In addition to these probes, the following two de novo 16S probes were designed to complement the primer-based probes (as shown in Supplementary Fig. 18): 16S:P1265 (5′-GGT AAG GTT YYK CGC GTT GCD TCG AAT TAA ACC RCAT-3′) and 16S:P479 (5′-TCT CAG THC CAR TGT GGC YBD YCD YCC TCT CARR-3′). To design these probes, representative sequences were selected from the SILVA 16S database (v138.1) using CDHIT (v4.8.1) to the level of 99% sequence identity. Representative sequences were aligned using MAFFT (v7.245), and the sequence profile was plotted using weblogo (v3.7.5). In this process, we targeted highly variable regions with a conserved matching probing site.
Sample preparation and sectioning
The leaves stored at −80 °C were immersed in 50% Optimal Cutting Compound (OCT, Sakura) in PBS (Medicago). Embedded samples were frozen in a cryostat (Cryostar NX70, Thermo Fisher Scientific) and sectioned to obtain 14-μm longitudinal sections. Tissue sections were then laid over the multimodal capture areas of the arrays.
Tissue optimization experiment
Tissue permeabilization conditions were identified using a modified variant of a previously reported protocol29. Briefly, after attaching of the tissue section to the slide surface containing 100% poly-d(T) capture probes, the tissue was fixed in methanol (VWR) at −20 °C for 40 min and stained with 0.05% Toluidine Blue (Sigma-Aldrich) at room temperature for 2 min. Tissue sections were imaged using a Zeiss AxioImager 2T and a Metafer slide scanning system (v. 3.14.2, MetaSystems). They were then permeabilized with pepsin (Sigma-Aldrich) in 0.1% per 0.1 M HCl (Fluka) at 37 °C for 30 min. The plant mRNA molecules that had hybridized to the capture probes were reverse transcribed to cDNA using SuperScript III (Invitrogen, ThermoFisher Scientific) and Cy3-dCTP-nucleotides (PerkinElmer) at 42 °C overnight. Tissue sections were removed from the slide surface by incubation for 1 h at 37 °C in a hydrolytic enzyme mixture consisting of pectate lyase (Megazyme), xyloglucanase (Megazyme), xylanase 10A (Nzytech), β-mannanase 26A (Nzytech) and cellulase (Worthington) in monobasic sodium citrate (Sigma-Aldrich), pH 6.6. They were then incubated with 2% β-mercaptoethanol (Calbiochem) in RLT buffer (Qiagen) and proteinase K (Qiagen) in PKD buffer (Qiagen) for 1 h each. Finally, the fluorescent cDNA footprint was imaged using an Innoscan 910 (Innopsys) slide scanning system and Mapix image analysis software (v9.1.0, Innopsys) with a pixel size of 5.0 and a gain of 50.
Sequencing library preparation
Sequencing libraries were prepared according to the Visium protocol (10x Genomics) with the following modifications: multimodal slides with leaf sections attached to the capture areas were incubated for 2 min at 37 °C followed by a 40-min fixation in methanol (VWR) at −20 °C. Capture areas were washed with PBS (Medicago) and incubated for 2 min at 37 °C. Tissue sections were stained for 2 min with 0.05% Toluidine Blue (Sigma-Aldrich) at room temperature followed by two washes with ultrapure water and warming at 37 °C for 2 min. The slides were mounted with 85% glycerol (Merck) and the bright-field images were acquired with a Zeiss AxioImager 2X microscope and a Metafer slide scanning system (v. 3.14.2, Metasystems) at ×20 magnification. To increase permeabilization efficiency and reduce the effect of secondary metabolites, the slides were incubated in 2% (wt/vol) polyvinylpyrrolidone 40 (PVP-40, Sigma-Aldrich) at room temperature for 10 min. Host plant and eukaryotic microbial cells were permeabilized using the permeabilization enzyme (10x Genomics) at 37 °C for 30 min. Bacterial organisms were permeabilized using 10 mg ml−1 lysozyme from chicken egg white (Sigma-Aldrich) in 0.05 M EDTA pH 8.0 (Invitrogen) and 0.1 M Tris–HCl, pH 7.0 (Invitrogen) for 30 min at 37 °C.
The rest of the SmT workflow followed the procedure described in the Visium Spatial Gene Expression user guide with the following modification: reverse transcription was performed using 2% (wt/vol) PVP-40 instead of nuclease-free water to reduce adverse impacts due to secondary metabolites and cDNA was amplified by performing 12–15 PCR cycles. Libraries were sequenced using the Illumina Nextseq 2000 and Nextseq 1000/2000 P2 or P3 Reagents (200 cycles) kit.
Preprocessing of the reads and bright-field images
Template switch oligo and long poly-A stretches were removed from Read 2 using cutadapt v. 2.9 (ref. 70). The location of the tissue was determined using the Loupe Browser v. 5.1.0 (10x Genomics), in which all the spots containing at least 25% of the tissue were selected and their locations (that is, x and y coordinates) were recorded.
TSO- and poly-A trimmed reads were analyzed using the ST Pipeline71 (v. 1.7.9, https://github.com/jfnavarro/st_pipeline), which enables simultaneous analysis of the spatial location, unique molecular identifier (UMI) and mRNA molecule. First, the pipeline trimmed poly-N stretches that are longer than 15 bp. Read 2 was then mapped against the A. thaliana TAIR10 genome release72 using the STAR v. 2.7.7a73 mapping tool and annotated with htseq-count 1.0 (ref. 74). The spatial barcode in read 1 was demultiplexed using Taggd (v. 0.3.6)75 and the information from read 1 and read 2 was combined. The ST Pipeline then grouped the reads based on the spatial barcode, gene and genomic location. Finally, the unique molecules were identified using a UMI and the counts were compiled into the gene-count matrix.
Taxonomic assignment of microbial reads
Reads were mapped against the A. thaliana reference genome using STAR v. 2.7.7a73 and all reads aligning to the genome were discarded, leaving putative microbial reads. Next, read datasets were demultiplexed based on their probe types (that is, 16S rRNA and ITS/18S rRNA). For each probe dataset, the reads were first clustered into representative sequences by the fastx_uniques module of usearch v. 11.0.667 (ref. 76). Next, the representative sequences (query) were searched for the best homolog (hit) in the NCBI NT database (downloaded on January 2021)77 using MMseqs2 v. 1f30213 (refs. 78,79). For each query, all of the best hits (that is, those with the highest identical bit score and a taxonomic assignment on the genus level) were selected for further consideration. Next, the taxonomic assignment for a query was set as the lowest common ancestor (LCA) among the best hits as calculated by TaxonKit v. 0.7.2 (ref. 80) using the NCBI Taxonomy database (downloaded on January 2021)81. For 18S rRNA/ITS probes, reads were further considered if they were classified as Eukaryota but not as unclassified, chloroplast, mitochondria, uncultured, Streptophyta, Chordata or Arthropoda on the genus or the phylum levels. Similarly, for 16S rRNA probes, reads were considered if they were classified as bacteria but not as unclassified, chloroplast, mitochondria or uncultured on the genus level. Finally, reads were further filtered by their UMI, such that for each spatial location, only one representative read with a given UMI was retained. The number of reads considered for each dataset is provided in Supplementary Table 11.
The annotation of the sequences used for taxonomic assignment was assessed to confirm that they originated from the expected locus. On average, 93.7 and 96.8% of the sequences captured by the 16S and ITS probes, respectively, were annotated as 16S and ITS rDNA loci, and most of the rest were annotated as full genomes, which include 16S and ITS rDNA loci (Supplementary Table 12). We further validated the observed microbial profiles by confirming that the reads containing each of the targeted probes fell within the expected range when aligned against the corresponding sequences in the NCBI ‘nt’ database (Supplementary Fig. 47). This further confirms that the reads originated from the expected targeted region.
Pst DC3000 infection experiment—data processing
Processed, aligned reads were analyzed using STUtility (v. 0.1.0)82. To exclude low-quality spots, the A. thaliana host data and bacterial-unique molecules were summed together and every spot with fewer than 20 unique molecules was discarded. Each spot containing less than 10 unique genes/taxa was discarded. The visualized genes and taxa were log10 normalized and projected on a bright-field image of the tissue section with an opacity of 0.75.
The maximum fluorescence intensities for each spot location were performed by manually aligning the fluorescence image and bright-field image. Then Matlab (2022a) was used to identify the centers of the spots and the k-nearest-neighbor algorithm was implemented to identify pixels that are a maximum of 27.5 μm away from the center. Maximum fluorescence values for each of the spots were extracted.
To generate the scatter plots and Pearson correlation, log10-normalized SmT captured unique Pseudomonas molecules were plotted against the log10-normalized host PR1 gene expression and the log10-normalized maximum Pseudomonas fluorescence values from the fluorescence imaging using ggplot2 (v. 3.3.5.)83. The Eulerr84 package in R was used to generate the Venn-diagrams with a cutoff of 45 and 120 for leaves 1 and 2, respectively, to remove the background fluorescence signal and minimum of 1 unique molecule per spot for SmT captured Pseudomonas and PR1. Hotspots analysis using the fluorescence signal values with the applied cutoffs, as well as using the SmT reads, was performed as described in the subsection Analysis of microbial hotspots.
Glass slides bearing a multimodal capture array (10% poly-d(T) probes, 45% bacterial 16S rRNA probes and 45% eukaryotic 18S rRNA/ITS probes), a 100% poly-d(T) array, a 100% bacterial 16S rRNA array and a 100% eukaryotic 18S rRNA/ITS array were used. Three leaves were sectioned on each of these capture slides meaning every leaf had a consecutive section on each array type. Sequencing libraries were prepared as per the above protocol and sequenced with Nextseq 2000 (Illumina). The reads were annotated as described above and analyzed using R (v. 4.0.5).
STUtility (v. 0.1.0) was used to read the A. thaliana data to an object and sums of gene values were log10-transformed. Pairwise Pearson correlation coefficients were calculated and visualized with the corrplot package (v. 0.92) function corrplot.mixed using significance levels of 0.001, 0.01 and 0.05, with hierarchical clustering permitted. The scatter plots are visualized using ggplot2 (v. 3.3.5.)83.
For bacterial 16S rRNA and eukaryotic 18S rRNA/ITS data, unique molecules were summed together per taxon, generating a table containing the sum of unique molecules, phylogenetic paths and metadata relating to section identification. Any annotations to phylum Streptophyta were removed, after which the data were divided into bacterial and fungal datasets based on their superkingdom. For taxonomic rank plots, the unique molecules for the different taxonomic levels were counted and compared with the 100% poly-d(T) array to calculate the fold change for microbial taxa at each of the taxonomic levels. Pairwise correlations, and unique molecules for each taxonomic rank, were only calculated for classified reads. We performed the analysis three times—first with all taxa and then with only the most highly expressed 500 and 20 taxa. Shannon diversity and Bray–Curtis similarity were calculated using vegan R package (v. 2.5-7)85.
Simulation of probe concentration and effect on diversity
Different proportions of reads—ranging from 5 to 95%—were sampled of samples analyzed on a 100% 16S rRNA or 18S/ITS rRNA array to simulate the effect of different probe concentration on the captured microbial diversity (Shannon diversity index). The procedure was repeated 100 times. The distribution of this simulated Shannon diversity is presented together with the diversity observed in the 45% probe concentration multimodal SmT array.
Saturation of the host information was calculated by subsampling the annotated reads to the saturation point (2,000; 3,718; 8,389; 21,085; 55,598; 149,413; 404,428; 1,097,633 and 2,981,957 reads), and unique molecules and genes were counted and plotted against the saturation points.
Validation of SmT with amplicon sequencing
To compare the performance of SmT to that achieved with amplicon sequencing, seeds of A. thaliana (accession Est-1) were surface sterilized and stratified at 4 °C for 1 week in a refrigerator, and then sown in plastic trays (Herkuplast) filled with wild soil from the Heuberger Tor experimental site of the University of Tübingen (Germany). The seeds were left outside to germinate in the same field in late September. The plants developed and overwintered without supplemental watering. Additional plants in each pot were thinned in January 2020 with tweezers, and individual plants were sampled at the end of March 2020 before flowering. The sampling protocol involved cutting the mature rosettes with sterile scissors, placing them in sterile 50 ml centrifuge tubes, and vigorously shaking them in sterile water. The water was then dumped and new water was added until the leaves released no further dirt. After washing, plants were immediately flash-frozen in liquid N2, and subsequently stored at −80 °C prior to nucleic acid extraction. Both DNA and RNA were extracted from each plant. The entire rosette was lysed in a buffer containing 2% β-mercaptoethanol to extract all nucleic acids while preserving RNA. One proportion of the lysate was used for RNA extraction by the phenol/chloroform protocol, while another portion was used to extract DNA following a previously described potassium acetate and SPRI bead protocol86. The DNA moiety was used for 16S rDNA amplicon sequencing. The following two sets of primers were used: (1) 515F-806R (V3-V4) in combination with plastid-blocking clamps87 and (2) 799F-1192R (V4-V6), which does not amplify chloroplasts and for which the mitochondrial amplicon was removed by gel extraction88. The RNA moiety was used for SmT, using the same pipeline as for all other samples with the exception that crude extracts were used in place of leaf samples (so spatial information was not extracted). A total of 300 μg of RNA was used for the array. In total, four plant samples were used for 16S rRNA profiling, comparing two amplicon sequencing primer sets to the SmT array, with the exception of leaf C for which amp-seq 799F-1192R was not performed. The reads obtained by amplicon sequencing were analyzed in the same way as the array reads, excluding the initial mapping to the A. thaliana TAIR10 database. For both of the methods, the reads were subsampled to the same sequencing depth. See the ‘Read alignment’ and ‘Taxonomic assignment of microbial reads’ subsections for information about the full pipeline.
Spearman correlation was calculated between all taxonomic profiles at the genus level (each amp-seq-primer-pair-profile with the SmT-profile and with the other primer-pair derived profile). In all comparisons, only taxa that were detected in both profiles were accounted for. The analysis was performed and plotted using the ggpairs function which is part of R GGally package (version 2.1.0)89.
Analysis of microbial hotspots
Microbial hotspots (based on 16S rRNA/ITS reads) were identified using the Getis-Ord G statistic90 as implemented in the localG function of the R spdep package (v. 1.1.11)91. The calculation was performed using a 2 × 2 grid applied to the count matrix resulting from the sum of reads belonging to the 50 most abundant genera (separately for 16S rRNA/ITS reads). A similar calculation was done for individual host genes so that the association between microbial and G-values for individual host genes could be done. The p.adjustSP function of the R spdep package was used with the BH-FDR92 method to correct the G stats P values while accounting for the number of neighbors of each region. Hotspot spatial maps were plotted using the R tmap package (v. 3.3-2).
Microbial interaction network analysis
Microbial interactions were inferred based on the Spearman rank correlation coefficient (SRCC) values of the reads count associated with each pair of genera. Specifically, for each pair of microbial genera, in each leaf section, SRCC was calculated accounting for all spots of the array (that is, each spot on the array was considered as a ‘sample’ for each genus). We considered pairs of genera to be interacting if their SRCC-corrected P value (BH-FDR) was below 0.05. Next, to account also for the spatial organization of microorganisms in the array, we computed the SRCC value of each candidate pair based on shuffled abundance matrices. This step, repeated 1,000 times, results in an empirical null distribution of expected SRCC values where the spatial association between paired genera is random. The shuffled count matrix was generated by using the permatfull function implemented in the R vegan package (v. 2.5.6) while keeping the total number of reads associated with each genus across all samples (spots) constant (that is argument fixedmar = ‘columns’). Finally, the significance of each candidate pair of genera was calculated by comparing the SRCC value based on the unshuffled count matrix to the empirical null pair distribution93 following a BH-FDR correction. Microbial interactions were considered to be also spatially significant if their corrected-empirical P value was below 0.05. The network was created based on these microbial pairwise correlation values using the R igraph package (v. 1.2.6) and plotted using the R ggraph package (v. 2.0.5).
Host mRNA clustering
For the A. thaliana host data, the counts were filtered using STUtility (v. 0.1.0)82 by removing the low-quality spots and genes containing at least 10 and 30 counts, respectively. In addition, each spot was required to have at least 10 genes and each gene was required to cover at least 20 spots. Chloroplast, mitochondrial, ribosomal and noncoding genes were filtered from the data set because many of them are not polyadenylated and might contain genes captured with 16S rRNA and 18S rRNA/ITS probes. Finally, after the filtering steps, the spots with fewer than 10 genes were removed because they were considered to be of low quality.
Each section was normalized individually using the Seurat (v. 4.1.0)94 function sctransform95 to eliminate intrasection batch effects. To reintegrate the sections back together, anchor features were selected and the whole data was scaled based on these features. Principal component analysis (PCA) was performed on this data using identified variable features. Based on the results of the PCA, the intersection batch effects (experiment date, plant and leaf) were removed with Harmony (v. 0.1.0)96 using a diversity clustering penalty of 4 and PCA dimensions of 1 to 8.
Normalized gene counts were projected onto 2D leaf section images using UMAP33 with the eight first dimensions from Harmony and a resolution of 0.22. To identify cluster-specific markers, raw counts were normalized using the NormalizeData function with LogNormalize as a normalization method and the FindAllMarkers function with the parameters of test.use = ‘poisson’ and logfc.threshold = 0.15.
Spot cell-type deconvolution
Cell-type proportions in the spatial host data were analyzed using Stereoscope (v. 0.3)97 with the Single Cell Leaf Atlas data98, who kindly provided the raw count data and cell-specific annotations. Stereoscope used raw gene-count matrices from single-cell data and raw spatial data from which spots outside the tissues had been removed. The stereoscope was run with a –gpu setting using batch sizes of 2,048 and epoch sizes of 50,000 for spatial and single-cell dataset and 5,000 most highly expressed genes from the single-cell dataset.
The single-cell data contained 19 clusters, which were reduced to the following five: mesophyll (11 clusters), vascular (4 clusters), epidermis (1 cluster), guard cell (1 cluster) and hydathode (1 cluster). These collapsed as well as the 19 original clusters were projected on tissue using STUtility (v. 0.1.0)82 and heatmaps for each of the clusters were generated with pheatmap (v. 1.0.12)99. To aid the visual interpretation, the cell-type proportions were scaled by quantiles using the 95th percentile of the data in each section and cell type.
We used the Boruta algorithm30 to determine which set of A. thaliana genes is important to explain the microbial load on each spot of the array. Briefly, we modeled the relationship between the expression profile of all A. thaliana genes—G1…Gn and M—the sum of the 50 most abundant bacterial/fungal genera in each spot of the array (M)—M ~ G1…Gn. We treated the task as a regression problem and used the random forest algorithm100 to calculate the importance of each gene in the model. Next, we used Boruta to assign a significance score for each gene based on its importance for the model’s accuracy. For this purpose, we used the R implementation of the Boruta package (v. 7.0.0) with 1,000 trees. This procedure was performed for each leaf section, once using the un-normalized read counts and once using the Getis-Ord G statistic value, treating each spot as an observation. Overall, a gene was considered further if it was found to be significant by Boruta for at least one measure (that is, reads count or G statistics), and if its SRCC P value (after FDR correction) was below 0.01. GO enrichment analyses were performed with the DAVID web server with the DAVID knowledgebase v2022q1 (refs. 101,102).
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.