Search
Close this search box.

Cell atlas of the regenerating human liver after portal vein embolization – Nature Communications

Experimental model

Human liver tissue samples

Human adult liver tissue samples were obtained from macroscopically “healthy” tissue that remained from resected human liver of patients with primary or secondary liver tumors or benign liver diseases with or without pretreatment with PVE. Participants of this study gave their informed consent that their tissue samples and patient data (sex, age, diagnosis) can be used after pseudonymization for research purposes and publication, according to the ethical guidelines of Leipzig University Hospital (006/17-ek, 21 March 2017, revised and renewed 12 February 2019). Acquired tissue from portal vein embolized livers included samples of regenerating (R) and embolized (E) tissue whereas tissue samples from benign liver diseases were defined as quiescent healthy controls (H) (Supplementary Data 1). Donors have received no compensation for participating in this study.

Experiments

Isolation of human liver cells from fresh tissue

Isolation of the primary human hepatocytes (PHH) and non-parenchymal cells (NPC) from the liver tissue was performed as described previously49. Briefly, PHH and NPC were isolated from the same tissue sample simultaneously by a two-step EDTA/collagenase perfusion technique and then purified by Percoll density gradient centrifugation. To obtain different cell types from NPC fraction, this cell suspension underwent two different centrifugation steps: 300 g for 5 min to get liver endothelial cells, mesenchymal cells and Kupffer cells; 650 g for 7 min to get the majority of Kupffer cells. Finally, 3 cell suspensions, PHH, NPC300 and NPC650, were used to prepare a single-cell RNA-seq experiment. In case of samples with medical conditions, the regenerating, and embolized tissues, the isolation procedure was made in the same manner.

Single-cell suspension preparation and single-cell RNA-seq experiment

Single-cell RNA-seq experiments were performed using a 10X Genomics platform. Before loading on a microfluidic chip, suspensions of PHH and both NPC fractions were washed and filtered at least twice in ice-cold 1X HBSS without Ca2+ and Mg2+ (HBSS w/o, Sigma) to remove tissue and cellular debris and to get individual cells in media that is compatible with the downstream experimental steps. Wide-bore pipette tips were used working with PHH to avoid premature cell lysis, while p1,000 and p200 pipettes were used gently resuspending NPC cell pellets.

Preparation steps of the single-cell suspension were made on ice and every time washed samples were spun down using a 4 °C cooled centrifuge: PHH at 50 g for 5 min, NPC300 at 300 g for 5 min, and NPC650 at 650 g for 7 min. Finally, to generate single-cell suspension, PHH suspension was filtered through 40 and 30 and both NPC fractions through 30 μm diameter cell strainers. Cell viability and concentration were assessed using a cell analyzer (MuseTM Cell Analyzer, Luminex Corporation). NPC300 and NPC650 were pooled 1:1 and loaded on a 1 channel and PHH on a 2 channel of the Single Cell A and B chips targeting 6000–8000 cells per each sample. All steps of the single-cell suspension preparation for the regenerating and embolized tissue samples were executed following the healthy tissue protocol.

The next steps were conducted as described in the Chromium Single Cell 3’ Reagent v2 and v3 Kits. In brief, after generation of the droplets with the single cells and barcoded beads, cDNA synthesis was performed. Next, droplets were broken, cDNA was amplified and libraries were constructed with different Chromium i7 Sample indexes in order to record sample assignment during computational analysis. Finally, single-cell libraries were run paired-end (28 bp, 8 bp, 100 bp) on an Illumina HiSeq2500 platform on 2 lanes. Experimental summary metrics can be found in Supplementary Data 8.

Frozen human liver tissue dissociation into single-nucleus suspension and flow cytometry sorting

Human frozen liver tissue samples were dissociated into single-nuclei combining liquid homogenization cell lysis with Dounce homogenizer and detergent-based lysis methods. All steps of the nuclei isolation were performed on ice with precooled solutions and using 4 °C mode centrifugation. The dissociation protocol that was previously used on brain tissue50, was optimized here to maximize the nuclei isolation for the liver tissue. The protocol after the optimization included the following steps: first, thawing tissue sample was cut into smaller pieces, minced and transferred into a glass dounce homogenizer. 30 strokes of pestle A were used to homogenize the tissue in 0.3 M Sucrose (Sigma) solution including 0.002 M EDTA (Thermo Scientific), 1% BSA (Serva), and 1% Tergitol solution (Sigma). After 5 min of incubation, next 30 strokes of pestle B were applied to finalize the disruption process and deliberate nuclei from the cells into suspension. Homogenized solution was centrifuged at 600 g for 5 min and the nuclei pellet was washed twice in PBS solution (0.002 M EDTA, 1% BSA, 0.2 U/ul RNase Inhibitor (Thermo Scientific)). Finally, to remove any aggregate and debris the nucleus suspension was filtered through a 30 μm diameter strainer and resuspended in PBS solution (1% BSA, 0.2 U/ul RNase Inhibitor). Further, to enrich for individual nuclei, the suspension was sorted by applying a 4-way purity mode based on the selected DAPI positive nuclei population (1:1000, BD Pharmingen) using forward and side scatter gating strategy (FACS). These nuclei were sorted in bulk and kept on ice for >30 min. To ensure that the sorted nuclei were intact, they were stained with DAPI (1:500) to inspect under the fluorescence microscope and finally counted using a hemocytometer before a single-nucleus RNA-seq experiment.

Single-nucleus RNA-seq experiment

Sorted single-nucleus suspensions were loaded on a Single Cell B chip to generate single-nucleus gel beads in emulsion on a 10X Chromium controller. Single-nucleus RNA-seq libraries were prepared following the protocol of the Single Cell 3’ Reagent Kit v3 and sequenced paired-end (28 bp, 8 bp, 100 bp) using an Illumina HiSeq2500 platform.

Cross-sectional Imaging

For liver segmentation and volume measurement 3D-datasets of both computed tomography (CT) and magnetic resonance imaging (MRI) were used.

CT was acquired as a 128-slice multidetector helical intravenously contrast-enhanced (10–120 mL iomeprol; Imeron 400, Bracco, Milan, Italy) scanner (Ingenuity, Philips, Best, The Netherlands). Contrast phase scanning was adjusted to the necessity of visualization of all liver architecture relevant for segmentation (i.e., liver veins, portal vein branches (scan delay of 70–90 s after injection). Primarily axial reconstruction of images in 1–2.5 mm slice thickness (increment, 1)

MRI (1.5T, Magnetom Aera, Siemens, Erlangen, Germany) was acquired with the use of 0, 1 mL/kg body weight gadoxetic acid (Primovist/Eovist, Bayer, Leverkusen, Germany), in the early dynamic phases and negative contrast of blood vessels in the hepatobiliary phase (15–25 min. delay after injection) and a fat saturated breath hold T1-weighted interpolated 3D-sequence (VIBE).

All image data sets were digitally archived (DICOM format), pseudoanonymized, and exported to a dedicated post-processing workstation (Lenovo ThinkStation, Lenovo, Beijing, China) inside the institutional network.

Volume segmentation was done semi-automatically by outlining the liver surface excluding large hilar vessels and interceptions with a dedicated post-processing tool (3D Slicer, open-source software). Virtual resection along anatomic landmarks were performed to assess the future liver remnant before and after the PVE regarding the outcome parameters volume, vessel architecture, and tumor progression. 3D-Visualization of the 3D model was performed using the Blender software (Blender Foundation, Amsterdam, Netherlands).

H/E staining and Immunohistochemistry

For investigation of tissue sections human liver tissue samples (n = 3 per condition, Supplementary Fig. 7) were fixed with paraformaldehyde (PFA, Carl Roth, Karlsruhe, Germany), embedded in paraffin, sectioned into 3.5 µm thick slices using a microtome (MicromHM430, Thermo Fisher Scientific, Waltham, MA, USA), and mounted on slides.

For Hematoxylin and Eosin (H&E) staining the tissue sections were rehydrated. Then the tissue sections were incubated in Hämalaun (Merck, Darmstadt, Germany) for nuclei staining for 5 min at RT and washed under rinsing water for 10 min. Afterwards the tissue sections were stained with Eosin G-solution (Merck, Darmstadt, Germany) for 3 min. Finally, the tissue sections were dehydrated and the slides were embedded in the non-aqueous mounting medium Entellan (Merck, Darmstadt, Germany).

The immunostaining of tissue sections and their microscopic evaluation was performed as described previously51 with the following modifications. Briefly, tissue sections were rehydrated and epitopes were retrieved. Then, tissue slices were blocked for endogenous peroxidase activities and for unspecific binding. For detection of GLUL, HAL, SAA12, IGFBP2, CYP3A4, CK19, CD31 specific primary antibodies (all Abcam, United Kingdom, Cambridge) were used (Supplementary Data 9). Antibodies were diluted in TBS (Sigma, Munich, Germany) with 1% BSA (Sigma, Munich, Germany) and 0.03% TritonX-100 (Sigma, Munich, Germany).

The antibodies against the targets were visualized using Peroxidase-conjugated secondary antibodies (Supplementary Data 9). Secondary antibodies were diluted in TBST (TBS + 0.5% Tween20 (Sigma, Munich, Germany)) supplemented with 1% BSA. Detection was performed by using 3,30-Diaminobenzidine (DAB, Sigma, Munich, Germany).

The EnVision+Dual Link System-HRP (Dako, Glostup, Denmark) was used according to manufacturer instructions when the targets showed low expression in the tissue sections. All reactions were stopped and cell nuclei were stained with hematoxylin. Finally, the slices were dehydrated and embedded using Entellan (Merck, Darmstadt, Germany).

For immunofluorescence (IF) staining the above described procedure was repeated with the following modifications. Primary antibodies were diluted in TBS with 1% BSA and 0.1% Tween20. Further, the staining was performed without blocking for peroxidases. For visualization the secondary antibodies ALEXA Fluor 647 donkey anti rabbit and ALEXA Fluor 488 donkey anti mouse (both Abcam, United Kingdom, Cambridge) were used and diluted as described above. Cell nuclei were stained with Hoechst. Finally, the slides were embedded in Mowiol-488 (Carl Roth, Karlsruhe, Germany). Negative controls for DAB and IF stainings were made from all donors and treated in the same way but without usage of a primary antibody.

Zonation imaging analysis

Whole slide images of IF stainings were captured in fluorescence mode using a Slide Scanner (AxioScan Z1, Carl Zeiss, Oberkochen, Germany) with a 20/0.8 M27 Plan-Apochromat objective with channel 1 (Target) light source 630 nm, light source intensity 50% with extinction wavelength 631 nm and emission wavelength 647 nm and with channel 2 (Hoechst) fluorescence light source 385 nm, light source intensity 14,53% with extinction wavelength 353 nm and emission wavelength 465 nm with an Axiocam 506 m as imaging device. Resulting images were stored as raw data in the Carl Zeiss proprietary image pyramid format (CZI) with an object-related nominal pixel size of 0.227 µm × 0.227 µm.

Whole slide images of DAB and H/E stainings were captured in transmitted light mode using a Slide Scanner (Pannoramic Scan 2, 3DHISTECH, Budapest, Hungary) with a 20/1.6 Plan-Apochromat objective and stored as raw data in the MIRAX Virtual Slide Format (MRXS) with an object-related nominal pixel size of 0.243 µm × 0.243 µm.

Tissue sections were visually investigated for lobular architecture (n = 3 per condition, analog to Supplementary Fig. 10). For that we used sequenced sections from immunofluorescence stainings for GLUL and Hoechst (pericentral hepatocytes and cell nuclei) and HAL and Hoechst (periportal hepatocytes and cell nuclei), complemented by observation of tissue structures (e.g., portal triads and central vessels) to identify the lobular vessel architecture and lobular borders. For the further imaging analysis, the Hoechst channel from the GLUL staining was used. Here, feature regions containing identifiable liver lobules were stored from Zeiss ZEN in 10,000 × 10,000 Pixel Tiles as Portable Network Graphic (PNG). Clearly identifiable lobules were donor and condition dependent resulting in n = 15 lobules for healthy, n = 12 lobules for regenerated and n = 10 lobules for embolized tissue, respectively. Fiji (ImageJ) was used to preprocess the selected feature regions of the tissue. In a first step the coordinates of the lobule boundaries and the location of the central vein were set by hand and stored as “Comma Separated Values” (CSV) files for later imaging processing. Between each coordinate of the lobule boundaries and its corresponding central vein a portal-central axis results. The number of axes was at least six but finally dependent on lobule shape complexity and was tried to keep as close to six as achievable.Then a gaussian filtered version (sigma = 20) of the image was subtracted from the original to correct staining artifacts and normalize overexposed regions and the background. After that the image was stored as “Tagged Image File Format” (TIFF). The density investigation started with an approximate estimate of the nuclei centers and measurement of their spatial locations. For that an Otsu threshold was used to filter the detected nuclei and reduce the amount of nonspecific signals resulting in a binarized image. The feature region of the nuclei in the binarized image were processed with morphological filters to remove small white noise and fill holes. For that we use morphological opening and closing. Regions near to the center of a feature region are defined as foreground and regions far away to a feature region are defined as background. The distance transformation provides the pixels known as certain of a region that belongs to a nucleus. With a weighted threshold of the maximum distance transformation value we received sure foreground regions of an identified nucleus indicated as a marker. A connected component analysis labels these regions with any positive integers to separate these from the background. Processing these markers with a marker-based watershed method allows to allocate the unclassified regions of the image belonging to nuclei regions or not. The resulting region property provides sure coordinates of spatial locations of nuclei.

To get the spatial dependent location density we use the most popular spatial analysis technique called kernel density estimation (KDE). KDE is a statistical method and a non-parametric way to estimate the probability density function of a random feature and correlates it to features in its neighborhood. By correlating each feature with a kernel function and summing up all the weighted overlapping regions of the kernel we get the probability density as a level of spatial distribution of nuclei density. The kernel is chosen as a multivariate gauss function with a fixed bandwidth of 0.2 for all images to ensure comparability of lobules from different sections among each other. The bandwith was empirically determined by analyzing 3 of the control samples by using the scott algorithm for bandwidth estimation.

The resulting spatial density distribution of nuclei was plotted as a heatmap. In a next step we transferred the lobule coordinates into the heatmap and extracted the density values along the earlier defined portal-central axis of a lobule (n = 121 axes for healthy, n = 84 axes for regenerated tissue and n = 76 axes for embolized tissue). The portal-central density distribution of a condition was plotted as the mean density values along the portal-central axes including standard deviation.

Iterative indirect immunofluorescence imaging (4i) experiment

4i experiments were performed on 3μm thick formalin fixed paraffin embedded (FFPE) sections that were arranged on a large (110 × 75 mm) H1.5 glass plate (Schott Nexterion 1535661). The glass surface was silanized using 3-aminopropyltriethoxysilane (aptes). The tissue was non-covalently bound to the functionalized surface with 10% gluteraldehyde before deparaffinization with NeoClear and reductive Ethanol baths. A second fixation with 4% PFA was performed, followed by blocking of aldehydes with 50 mM NH4Cl. Heat induced epitope retrieval in Citrate Buffer (10 mM Citric Acid, 0.05% Tween20, pH 6.0) heated over 20 to 90 °C in a histological microwave (Milestone RHS1).

The glass plate with tissue sections was then mounted on a custom printed PLA superstructure (Prusa i3MK3S+ printer) to create in essence a single well SBS-format plate with an imageable area of 100 × 65 mm.

On this assembly iterative indirect immunofluorescence imaging (4i) was performed according to the method by G. Gut35 with volumes adapted to the well size. The plate was handled shielded from direct light until the imaging buffer was added. All incubations were done on an orbital shaker (160 rpm, 20 mm orbital diameter). In total, our assay comprised 36 antibodies and a nuclear stain.

The iterative immunofluorescence was done on a Nikon Ti2 automated microscope sided by a Crest X-light V3 spinning disc and a Lumencor Celesta light engine with a Nikon 20x water immersion NA 0.95 (MRD77200) to cover length scales from the millimeter to the micrometer scale (pixel size 325 nm) (Fig. 6a). The single sections were bounded in tiled regions to facilitate the later stitching of large images.

Image pre-processing (maximum intensity projection, camera baseline subtraction, shading correction, and stitching) was carried out with ImageJ batch processing macros.

4i data processing

Image data processing broadly followed the steps in ref. 52. Briefly, we first produced a rough mask to align each image. Then, we aligned full tissue stitched images by relying on the DAPI channel present in every cycle, using Elastix implementation in the SimpleITK python package53, with each image aligned to the preceding elution cycle. This was followed by image cropping and denoising using the scikit-image package54, followed by the production of a refined full tissue mask. Lastly, we performed background subtraction, using for images in each cycle both the preceding and succeeding elution cycles as a reference, weighted by the distance of the cycle to be corrected to each elution cycle.

4i data – pixel clustering and blood vessel detection

Images were converted into pixel-by-antibody matrices, and each image was additionally downsampled to 0.1% of the total pixels to facilitate handling and analysis. Following normalization, log-scaling, and standardization, the downsampled matrices were then used to obtain clusters of pixels using scanpy55. Leiden56 clustering was used with a resolution of 0.3, with 50 neighbors calculated on the top 10 principal components.

This clustering was then used to subset 80.000 pixels, proportionally weighted for each label, from each image. These pixels were then normalized, log-scaled and standardized. Data from the three samples was integrated using Harmony57, and the top 10 components were used for finding neighbors, followed by Leiden clustering with resolution 0.5. This was then propagated to the remaining pixels by assigning them with the label from the closest labeled pixel using euclidean distance.

This clustering resulted in a uniform labeling of pixels across the three images. Labels 8, 9, 10, and 11 were selected as being potentially associated with blood vessels. For all samples, the pixels representing the union of these labels were refined using scikit-image to remove small objects and smoothen the vessel mask. In addition, for the regenerating sample clusters, a blob detection algorithm was applied to resolve the centroids of vessels in very close proximity and segment them accordingly.

For each individual vessel, the mean signal for each antibody was calculated, as well as the eccentricity of the vessel’s shape. These vessels-by-features tables were used in scanpy to cluster the vessels in each sample to remove regions incorrectly identified as portal or central vessels, such as some tissue margins.

Lastly, to home in on the vessel proper, we used the nuclei segmentation mask (see next section) and scikit-image to find areas without nuclei. These regions were intersected with the vessel areas identified previously to obtain the focused vessel region.

4i data—nuclei-centered analysis

Nuclei masks for each sample were produced using cellpose58 based on the DAPI channel from the first elution cycle. These masks were used to define each nucleus’ position and the pixels assigned to them. Pixels corresponding to the cytoplasm were obtained by capturing a region of up to 5 pixels around each nuclei. In order to focus the analysis on the vessel microenvironment, only nuclei within a 300 pixel radius were considered for analysis.

The nuclear and cytoplasmic regions were used to collect various metrics. In addition to the mean antibody signal, normalized to a 0 to 1 interval, in each region, we also collected information on area and nuclear eccentricity, as well as the number of neighbors each nuclei had within a radius of 200 pixels. For the remaining analysis, the signal from the following antibodies was kept: CD45 (Abcam, ab10558), CLEC4M (Origene, CF810055), TROP2 (Invitrogen, PA5-47074), ACTA2 (Sigma, A2547), IGFBP7 (Invitrogen, PA5-47123), GLUL (Abcam, ab125724), CRP (Bethyl Laboratories, IHC-00613), LaminB1 (Abcam, ab76024), Catalase (Abcam, ab76024), and e-Cadherin (Abcam, ab11512).

Data for each sample was independently analyzed with scanpy55. All data was first standardized, followed by regressing out of the total antibody signal in the nuclei and cytoplasmic area, as well as the x and y coordinates of each nucleus in the tissue, to mitigate potential heterogeneities caused by uneven signal distribution across the large tissue area. A neighbors graph was calculated on all features, and this was used to generate a UMAP and perform Leiden clustering with resolution 0.8.

Nuclei were annotated into four major cell types based on a combination of these clusters and thresholds for various markers. This approach was chosen since the rarer immune cells, which were directly observable in the tissue from their CD45 signal, tended to group with other cell types, likely due to capturing neighboring signal from other co-locating cell types. For each cell type, a threshold of 0.5 standardized value was evaluated for different variables: hepatocytes were defined as having high Catalase, GLUL or CRP, high area, low eccentricity, and low CD45 signal; LSECs were defined as having high CLEC4M, CLEC14A, ISG15 or eccentricity, and low ACTA2 and TROP2; vessel stroma cells were defined as having high ACTA2, TROP2 or eccentricity, and low CD45; and immune cells were defined as having high CD45, low eccentricity and low TROP2. Since this still resulted in some ties, a label of immune cell was given precedence over hepatocytes, which in turn was given precedence over vessel stromal cells and LSEC. Lastly, apart from immune cells, the other labels were applied to each Leiden cluster based on a simple majority rule.

To annotate blood vessels as portal or central, we relied on the antibody signal of CRP (portal marker) or GLUL (central marker) in hepatocytes, and IGFBP7 (portal marker) and CLEC4M (central marker) in LSEC and stromal cells. For each nucleus, it was determined which marker (portal or central) was ranked higher in expression, and the nucleus was thus assigned that label. Then, two percentages were calculated for each vessel and each pair of portal/central markers, to determine the proportions of hepatocytes and LSEC+stromal cells that were portal or central. Lastly, vessels were classified as portal if the sum of both portal percentages was higher than that of the central percentages, and vice-versa.

For the cell type composition comparisons, a t-test was used with FDR correction, considering a significance threshold of 0.05. To avoid including nuclei not belonging to the microenvironment surrounding the vessel field proper (due to some identified vessels not originating from a perpendicular cut), only nuclei from vessels within 2 standard deviations of the mean log2 area were considered.

Single-cell RNA-seq computational analysis

Data processing after sequencing (Cell Ranger pipeline)

We used the Cell Ranger software (https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome) to process the sequenced RNA libraries and generate gene expression count matrices for the analysis. We first transformed Illumina intensities, raw base call (BCL) files into reads using cellranger mkfastq. Next, we ran cell ranger count to align the reads to the human reference genome (GRCh38) using RNA-seq aligner STAR with default parameters (Supplementary Data 10). Uniquely mapped reads were based on barcodes and unique molecular identifiers (UMIs) assigned to cells and genes (ENSEMBL release 84) respectively. Read counts for a given gene and cell that are represented by a Chromium cellular barcode and UMI were used as an input for the subsequent expression analyses.

Single-cell data filtering and normalization

Prior to any processing, scrublet59 was used to assign a doublet score to all cells in each fresh tissue dataset. We used the R package Seurat (version 3.0)60 to process gene expression count matrices. We first applied SCTransform to normalize molecular counts, scale, and identify variable genes25 within each dataset separately. Cells in each sample were then finely clustered (Louvain algorithm, resolution = 10) and the average doublet score was calculated to identify small groups of similar doublets. Quality control filtering was done by applying the thresholds outlined in Supplementary Data 11.

Integration of single-cell data

Samples were analyzed in two groups: healthy only and all conditions (healthy, regenerating, and embolized). Both groups were integrated using CSS24. For the healthy data, integration was done using all common genes and the first 30 principal components. For integration of all samples, the top 3,000 variable genes from the healthy, regenerating and embolized samples, as well as the top 100 marker genes from each cell type identified in the healthy dataset were selected. These were used to do a PCA on the full data, of which the top 50 principal components were used. The genes considered allowed a coverage of the biological variability in all conditions and present populations, despite the unbalanced representation of cell types in each sequenced fraction (Hepatocytes and Non-parenchymal cells).

Dimensionality reduction, clustering, and annotation of single-cell data

Projection with UMAP34 and clustering, both for the healthy and for all combined datasets, was performed using all dimensions obtained from CSS.

For the healthy data, clusters were obtained using Louvain clustering with 0.9 resolution, and markers were detected for these populations using Seurat’s FindAllMarkers function (pseudocount.use = 0.1, logfc.threshold = 0.2, adjusted p < =0.05). Some clusters (9, 12, 19) were further individually subclustered to identify specific endothelial, T cell, and pDC/B cell populations, respectively. Annotation was done based on the general and subclustering identified markers, which resulted in some smaller clusters being merged under the same label. Cells were also grouped into five “major cell types” (Fig. 1b), and their marker genes were also calculated using Seurat’s FindAllMarkers function (pseudocount.use = 0.1, logfc.threshold=0.2, adjusted p < =0.05).

Clustering of combined datasets used the Louvain algorithm with 1.1 resolution. Markers were detected for the identified clusters using Seurat’s FindAllMarkers function (pseudocount.use = 0.1, logfc.threshold = 0.2, adjusted p < =0.05). Clusters 5, and 19, as well as clusters 6, 22, 26, were subclustered to identify more specific types of endothelial cells, macrophages, and T cells, respectively. All clusters and subclusters were annotated using their top marker genes, and resulted in some clusters being merged into the same cell population.

Identification of LSEC subtypes in the combined single-cell data

LSECs subsets were identified by reclustering previously annotated ECs in the combined dataset, followed by data renormalization. These cells were then filtered for non-endothelial and doublet populations based on previously reported LSEC marker genes6,9 and cells expressing genes from other cell types, respectively. Marker genes for each remaining cluster after filtering were determined using Seurat’s FindAllMarkers function (pseudocount.use = x, logfc.threshold = x, adjusted p < =0.05). This methodology allowed for the identification of bona fide LSECs (periportal, midzonal, and pericentral), as well as LSEC populations with unique expression profiles, cycling endothelial cells, lymphatic ECs and other non-LSEC ECs likely originating from the portal or central veins (Fig. 3).

Zonation signature creation in healthy, regenerating, and embolized tissue hepatocytes

Information on expression of previously established human and mouse zonation marker genes3,12 was used to identify portal- and central-zone hepatocytes within healthy and post-PVE samples. For each of the three conditions we generated a combined zonation expression signature based on portal and central expression markers. For each gene in both gene sets we calculated the z-normalized expression value across all cells. We then transformed the resulting expression values into the range of between 0 and 1 by subtracting the min expression and dividing by the maximum expression per gene across cells. For each zone-specific gene set we calculated the sum of the normalized gene expression values in a given cell. Per cell we generated combined expression signatures by adding the negative portal signatures to the central signatures. Based on these scores clusters in the UMAP were defined to be showing portal or central-specific expression signatures.

For this analysis healthy, regenerating, and embolized tissue hepatocytes were subset individually from the UMAP of combined datasets based on previously annotated cell type markers. Within each dataset, cells were renormalized using the SCTransform function in Seurat. Four PCs were then used to project cells in UMAP space.

Comparing healthy, regenerating, and embolized liver samples

For each annotated cell type, Seurat’s FindMarkers function was used to obtain the DE genes between pairs of conditions (Fig. 2f). A maximum of 10,000 cells was used for each condition. Additionally, in order to account for the sequencing depth, prior to calculating the DE genes for hepatocytes, the seqgendiff package61 was used to downsample the UMI counts for all conditions, taking the minimum median of UMI counts of the three conditions as reference (Supplementary Fig. 5b). Genes encoded in the Y chromosome were disregarded—male donors were only present in the regenerating and embolized conditions, as well as DE genes between conditions that have been detected as marker genes for other cell types that likely appeared due to ambient RNA contamination.

GO Term enrichment analysis

Enrichment for GO Terms was performed using the function enrichGO from the clusterProfiler62 package, using the all terms in the org.Hs.eg.db package database, with a q-value cutoff of 0.05. The genes considered for analysis were previously identified as DE with an adjusted p ≤ 0.05 and a logFC > 0.3. All genes tested for differential expression were used as a background set for the analysis. For plotting (Fig. 2k–n), significant GO Terms for each condition were clustered based on their gene similarities using hierarchical clustering, and then grouped into 6 clusters. The term with the lowest p value per cluster was chosen as a representative to be plotted.

Identification and comparison of hepatocyte and LSEC zonation across medical conditions

A steady-state transcriptomic zonation reference was established by identifying a latent axis ordering healthy hepatocytes (Fig. 3) and LSECs (Fig. 4) independently, using DiffusionMaps from the Destiny package63.

For hepatocytes, contaminating non-hepatocytes were first filtered, followed by renormalization and PCA. The first 4 PCs were used as input for DiffusionMaps, and the ranked DC1 dimension, normalized to values between 0 and 1, was defined as the healthy hepatocyte pseudozonation trajectory. Genes varying along this trajectory were determined by parametric ANOVA on a Generalized Additive Model meant to predict gene expression dependent on the pseudozonation trajectory, modeled as a natural spline with 3 degrees of freedom.

LSEC zonation was determined by applying DiffusionMaps to the first 10 PCs of bona fide LSEC populations—Periportal, Midzonal, and Pericentral LSECs (Fig. 4). This first step identified a few outlying cells. These were removed, and the remaining data was renormalized and projected with PCA. DiffusionMaps was run on the first 10 PCs, and the identified DPT was used as a pseudozonation trajectory, after ranking and normalization to the 0–1 interval. Genes varying along this trajectory were determined similarly to those for hepatocytes.

Comparison of hepatocyte and LSEC zonation in the regenerating and embolized liver to the established healthy reference was done by selecting the top 1000 varying genes in the healthy pseudozonation and used them to train a generalized additive model to predict the pseudozonation variable (Supplementary Figs. 8a and 11a). This model used a beta distribution for error modeling with a logistic link function, which guaranteed that the predicted trajectory would be in the interpretable 0–1 range. In each condition, varying genes were determined as described above.

Genes varying in the hepatocytes were determined as differing between pairs of conditions if their Spearman’s rank correlation coefficient was lower than 0.3. The fitted expression of these genes was clustered using Euclidean distance and the ward.D2 method for healthy vs regenerating and healthy vs embolized, to identify groups of genes differing in similar ways (Fig. 3i–l).

LSEC varying genes were compared between conditions using Spearman’s rank correlation on the fitted values. A correlation coefficient greater than 0.3 indicated a similar behavior between conditions, whereas values below that were considered a different or opposite behavior between conditions. To illustrate this, Supplementary Fig. 11e shows the top 30 similar genes of healthy vs regenerating and healthy vs embolized (PCC > = 0.3), as well as the top 35 different genes (PCC < 0.3) of the same comparisons; each group was obtained by clustered using Euclidean distance and ward.D2 method.

Identifying cell-cell communication events in healthy, regenerating, and embolized liver

Ligand-receptor pairs mediating cell-cell communication events were detected within each condition using CellPhoneDB (version 2.0)33, based on the annotated cell types from the complete data integration (Fig. 5b). The detected ligands and receptors were then used to create two types of projections summarizing cell-cell communication in the healthy and post-PVE liver. We projected a graph showing all correlations greater than 0.3 between all ligands and receptors using multidimensional scaling (Fig. 5e), and summarized in the same coordinate space each cell type as the median coordinates of the ligands and receptors that are expressed in it at the highest level. We also projected the mean expression per cell type and condition of all ligands and receptors using UMAP (Fig. 5f), and identified the interactions that are unique to healthy or both PVE conditions.

Detecting enriched types of variable interactions per condition

For each interaction, in each condition, we obtained a vector encoding whether an interaction was detected in a given pair of cell types. We used these vectors to calculate the mutual information between healthy and regenerating and healthy and embolized samples for each interaction. The resulting values were then used for Gene Set Enrichment Analysis64 to determine enriched or depleted types of interactions (Fig. 5g). Interaction types were manually annotated based on literature searches, and can be found in Supplementary Data 7.

Single-nucleus RNA-seq computational analysis

Single-nucleus RNA-seq data filtering, normalization and clustering analysis

We used the R package Seurat (version 3.0)60 to process gene expression count matrices. We first applied SCTransform to normalize molecular counts, scale and identify variable genes25. After manual inspection we applied per sample minimum and maximum thresholds on the number of detected genes in a given nucleus to exclude both nuclei with low RNA content and potential doublets (sn_H1: >150 and <1700 detected genes; sn_H3: >200 and <2000 genes; sn_H4: >70 and <1100; sn_R2: and sn_E2: >150 and <1200 genes). In addition, we excluded nuclei with more than 10% of UMIs aligning to mitochondrial genes. The number of nuclei used in the analysis for each condition is provided in Supplementary Data 12.

Integration of the single-nuclei datasets using batch effect correction

Sample-specific preprocessed datasets were merged based on the 3000 most variable genes Pearson residuals.

Cell type identification analysis in the merged single-nuclei healthy datasets

UMAP was used to represent the similarity of gene expression profiles between nuclei in 2D. The clustering of the healthy nuclei data revealed 5 major clusters. Clusters were assigned to cell types based on the presence of cell-type marker genes that showed a significantly higher expression in a given cluster. DE was performed using the Wilcoxon Rank Sum test between each cluster and remaining clusters.

Identification of zonation within hepatocytes and LSECs of fresh and frozen healthy liver tissues

We used the expression of previously established human and mouse zone-specific marker genes3,9,12 to identify portal and central hepatocytes (Supplementary Fig. 3b, c) as well as periportal and pericentral LSECs (Supplementary Fig. 4a, b) in both fresh and frozen tissue datasets. Portal and central expression signatures were calculated separately across these marker genes in each of the two cell types and datasets. Z-normalization was done per gene and the portal and central scores represented the sum across normalized portal or central marker gene expressions in a given nucleus or cell. The signature was then shown on UMAP embeddings of each cell type and processing protocol. For the frozen tissue dataset, a UMAP embedding containing previously annotated cell types was used. For the fresh tissue dataset hepatocytes and endothelial cells were projected using the top 4 or 15 principal components, respectively. Portal and central signatures were then used to define portal and central groups of cells. DE genes were identified between these sub-clusters in fresh and frozen hepatocytes and LSECs, respectively, by using FindMarkers function in Seurat with logFC threshold being at 0 and other default parameters. Fold changes between both zonation-linked DE analyses were significantly correlated for hepatocytes (Spearman’s rho = 0.19, p = 2.7 × 10−21) (Supplementary Fig. 3d) and LSEC (Spearman’s rho = 0.2, p = 4.6 × 10−27) (Supplementary Fig. 4c), suggesting that the tested sub-clusters between cells and nuclei shared a portal and central pattern of zonation.

For the hepatocytes the following genes HAMP, CRP, SDS, NAMPT, HAL, ID1 were identified as being higher expressed and were shared between both datasets in portal sub-clusters (logFC > 0.4 for fresh and logFC > 1 for frozen) (Supplementary Fig. 3d, e). Conversely, CYP2E1, CYP3A4, IGFBP1, BAAT, SLCO1B3, KLF6 showed higher expression in central sub-clusters of both datasets (logFC < −0.4 for fresh and logFC < −1 for frozen) (Supplementary Fig. 3d, e).

For the LSECs, VIM, EMP1, SPRY1, LMNA showed particularly high expression in peri-portal subclusters (with the logFC > 0.9 for fresh and logFC > 2 for frozen) (Supplementary Fig. 4c, d) and CLEC4G, STAB1, CLEC4M, OIT3, CTSD, LYVE1, CTSL, CD14 were significantly higher expressed in peri-central clusters (Supplementary Fig. 4c, d) in both fresh and frozen tissue datasets. Cell-cell interactions involving known genes related to angiocrine function used in Supplementary Fig. 14 have been obtained from65,66,67,68.

Zonation-specific protein expression validation using Human Protein Atlas

Healthy liver immunostaining images (Supplementary Fig. 3f) were downloaded from the Human Protein Atlas69 for expression information analysis of genes showing portal and central specific expression in our dataset.

Statistics and reproducibility

Number of replicates is stated throughout the main text and in figure legends, as well as panels illustrating the study design (Fig. 1a; Fig. 2a). No statistical method was used to predetermine sample size, which was subject to human sample availability. No data were excluded from the analyses. The various statistical tests used are detailed throughout the main text, methods, and figure legends. A p < = 0.05 was considered statistically significant for the various statistical tests performed. The experiments were not randomized. The Investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.