{"id":478303,"date":"2024-01-05T19:00:00","date_gmt":"2024-01-06T00:00:00","guid":{"rendered":"https:\/\/platohealth.ai\/high-dimensional-phenotyping-to-define-the-genetic-basis-of-cellular-morphology-nature-communications\/"},"modified":"2024-01-06T17:33:51","modified_gmt":"2024-01-06T22:33:51","slug":"high-dimensional-phenotyping-to-define-the-genetic-basis-of-cellular-morphology-nature-communications","status":"publish","type":"post","link":"https:\/\/platohealth.ai\/high-dimensional-phenotyping-to-define-the-genetic-basis-of-cellular-morphology-nature-communications\/","title":{"rendered":"High-dimensional phenotyping to define the genetic basis of cellular morphology – Nature Communications","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"
To study associations between genetic variants and morphological traits, we assembled a cohort of iPSC lines from 297 unique donors for which we generated image-based profiling and whole-genome sequencing data (Fig. 1<\/a>). We obtained pre-derived cell lines from the CIRM iPSC repository5<\/a><\/sup>. Age, sex, medical history, ethnicity, and relatedness to other samples were recorded using questionnaires at time of enrolment and sample collection. Each iPSC line was subjected to a pluripotency test as well as genotyping to identify any abnormal karyotypes. Upon receipt, we expanded and cryo-banked each iPSC line, and performed genotyping (using the Global Screening Array (GSA)) and 30X whole-genome sequencing (WGS) on all lines. Any cell lines displaying abnormal karyotypes or genomic rearrangements >1\u2009Mb were excluded from our study. The final cohort used in this work included 297 distinct donors of which 153 were male and 144 were female, with an average age of 21\u2009\u00b1\u200910 (sd) years. Of the 297 donors, 207 had self-reported ancestry of European and 90 individuals reported non-European (Table 1<\/a>). IPSC lines were generated from B-cells or fibroblasts using a non-integrating episomal vector system previously described5<\/a><\/sup> (Table 1<\/a>). All donors included in this study have been properly consented for iPSC derivation, the experiments performed in this work, and genomic data sharing. We performed a principal component analysis (PCA) to observe the genetic diversity of cells utilized in our collection (Fig. S1A<\/a>). A summary breakdown of our cohort is included in Table 1<\/a> and individual cell line level metadata is included Table S1<\/a>.<\/p>\n iPSC lines from 297 donors were expanded, quality-control checked and then subject to both high-throughput imaging with Cell Painting and 30X whole-genome sequencing (WGS). Overall, we imaged 5.1 \u00d7 106<\/sup> cells across all donors and quantified 3418 morphological traits per cell using CellProfiler software. We inferred genetic variants from the WGS data and investigated whether individual morphological traits associated with both rare and common variation.<\/p>\n<\/div>\n<\/div>\n To quantify cellular traits, we adopted the Cell Painting assay17<\/a>,18<\/a><\/sup>. This multiplexing dye assay uses six stains to capture morphological characteristics for eight cellular compartments: Hoechst 33342 (DNA), wheat germ agglutinin (WGA) (golgi and plasma membrane), concanavalin A (endoplasmic reticulum), MitoTracker (mitochondria), SYTO 14 (nucleoli and cytoplasmic RNA), and phalloidin (actin). Images are processed using the open-source CellProfiler software to extract thousands of features of each cell\u2019s morphology such as shape, intensity, and texture statistics, thus forming a high-dimensional profile for each single cell21<\/a><\/sup>.<\/p>\n We generated Cell Painting data from all 297 donors leveraging a systematic workflow to ensure cells were treated in identical fashion across all rounds of imaging. Cell lines were thawed in batches of 48 and passaged 3 days later into a 96-well deep well plate before being transferred into a 384-well screening plate using an automated liquid handling device (Fig. S1B<\/a>, Methods). Cells were plated at a density of 10k cells\/per well and fixed 6 h post-plating, so as to allow for cell attachment while minimizing differences in cell growth rates, which we observed during cell line expansion (Fig. S1C<\/a>). We determined these conditions through a pilot screen that contained 6 cell lines plated across various densities and fixation timepoints, which showed we could maximize differences between cell lines under these parameters (Fig. S1D<\/a>). Each screening plate was stained with the standard Cell Painting dyes and imaged on a Perkin Elmer Phenix automated microscope within 48 h. We implemented this same workflow across all rounds of imaging.<\/p>\n Images were processed using CellProfiler to measure morphological traits and construct single-cell image-based profiles (Methods21<\/a><\/sup>,). In total, we measured 3418 morphology traits for 5.1 million iPSCs from 297 donors after stringent QC (Methods, Fig. S2A<\/a>, Table S2<\/a>). We classified all morphological traits based on the cellular characteristics they represented, yielding five categories: Area and shape, Granularity, Intensity, Radial distribution, and Texture (Fig. 2a<\/a>). Prior studies have shown that cells often displayed varied morphology in response to environmental cues and context16<\/a><\/sup>. To explore whether the contribution of genetic variation to cell morphology is context dependent, we segregated all cells into two groups based on whether they had any cells in contact (called colony cells, 97.48% of all cells) or not (called isolate cells, 2.52% of all cells) (Fig. S2B<\/a>). We note that for the purposes of our study, \u201ccolony\u201d refers to the number of neighbors a given cell has and is distinct from the colony terminology which is often used in basic stem cell culture practices.<\/p>\n a<\/b> Summary of five categories of morphological traits captured in our data (n<\/i>\u2009=\u20093418). b<\/b> Explained variance across all morphological traits (n<\/i>\u2009=\u20093418). Data is presented in a Tukey-style boxplot with the median (Q2) and the first and the second quartiles (Q2, Q3) and error bars defined by the last data point within \u00b11.5-times the interquartile range. c<\/b> Exploring explained Variation in individual traits, namely distribution of mitochondria around nucleus, cytoplasmic Zernike shape metric 9_3, and cytoplasmic granularity in the RNA channel at scale 3, showed differences in sources of variance, including technical effects such as plate and well of imaging, whether the well was situated on the row or column on the edge of plate (onEdge), biological sources such as donor. Donor ID represents the difference among donors after accounting for their age, sex, disease-status and above-mentioned imaging-related technical factors. Residual is the remaining unaccounted variation in traits.<\/p>\n<\/div>\n<\/div>\n We next performed 30X whole-genome sequencing (WGS) on all iPSC lines. Following quality control (QC, see Methods), we retained 7,020,633 common (minor allele frequency (MAF)\u2009>\u20095%) and 122,256 rare (MAF\u2009<\u20091%) variants for downstream analyses.<\/p>\n Previous studies have shown that technical factors, including plate and well position can alter morphology-based readouts22<\/a><\/sup>. To explore the presence of cmQTLs in our data, we sought to identify technical factors which may confound our morphological phenotypes and remove these sources of variance from our downstream association tests. We performed a variance component analysis using well-level data to quantify the observed variance that can be attributed to each morphological trait by technical factors and cell line characteristics (Methods). We assessed the significance for each variance component, correcting for the number of tests, which was the product of the traits (n<\/i>\u2009=\u20093418) and factors (n<\/i>\u2009=\u20099) which include technical features such as imaging plate, well position, the number of cell neighbors, and whether the well was positioned on the edge of the plate (onEdge) in addition to demographic characteristics for the cell lines including genetic sex, reprogramming sample source, age of donor at the time of sample collection, and the clinical diagnosis for our tissue donors. We observed strong batch effects across imaging plates, which contributed the greatest degree of variance to our morphology traits (61.8\u2009\u00b1\u200917%, Fig. 2b<\/a>, Fig. S3A<\/a>). Several other confounders contributed varying levels of effect on different morphological traits (Fig. 2c<\/a>). After correcting for these covariates, the remaining difference among cell line donors was significantly associated with all traits, explaining 16.7\u2009\u00b1\u200911% of the variance. (Fig. 2b<\/a>). This indicated the potential for a genetic basis to the variability in morphology traits. Residual is the remaining (technical) variance in morphological traits which is unexplained by the factors discussed above. Interestingly, the difference among donors explained a greater degree of variance in the trait category of AreaShape relative to the other trait categories (Wilcoxon rank sum test P<\/i>\u2009=\u20091.1 \u00d7 10\u221255<\/sup>, Fig. S3B<\/a>). We note that some of the shared variance may be explained by non-genetic factors, such as stable epigenetic modifications. We observed that many traits had very high pairwise correlation (Pearson r<\/i>\u2009>\u20090.9) with one or more traits (Fig. S3C<\/a>). To reduce redundancy in our downstream analyses, we selected a common set of 246 traits having r<\/i>\u2009<\u20090.9 with each other by iteratively selecting a single representative trait for the set of correlated traits (r<\/i>\u2009>\u20090.9) (Methods). We refer to this common set of 246 traits as \u201ccomposite traits\u201d, which were used for our rare and common variant association tests (Table S3<\/a>). We next summarized well-level morphology data into donor-level values (i.e., pseudo-bulk) by mean-averaging individual morphology traits across all wells for a given donor, resulting in one measurement per trait per donor (N<\/i>\u2009=\u2009246 traits and 297 donors) for both isolate and colony cells. These donor-level trait values were used for our quantitative association tests.<\/p>\n Sequencing studies have identified hundreds of genes containing rare coding variants with association to disease burden23<\/a>,24<\/a>,25<\/a>,26<\/a><\/sup>. These variants often have large effect sizes but explain a modest degree of total disease heritability27<\/a><\/sup>. To explore the effect of rare genetic variation on cellular morphology, we analyzed the association of composite traits (n<\/i>\u2009=\u2009246) with gene-level burden of protein-altering rare variants (MAF\u2009<\u20090.01). To ensure well-powered investigation, we examined 9105 genes that had rare variants in at least 2% of donors (n<\/i> >= 6). We modeled individual morphology traits as a function of rare protein-altering variant burden in a gene, controlling for plate, well, and donor sex using linear regression (Methods). We performed our analysis separately for both colony and isolated cells. We identified 4 genome-wide significant associations between morphological traits and rare variant burden (P<\/i>\u2009<\u20092.2 \u00d7 10\u22128<\/sup>, Bonferroni correction for 246 traits and 9105 genes) (Fig. 3a<\/a>). These associations included one trait in colony cells and three traits in isolate cells. We did not observe any inflation in association statistics for these traits (Lambda (\u03bb) = 1.01 for the association in colony cells and \u03bb\u2009=\u20091.01, 0.96, 0.98 for the associations in isolate cells) (Fig. S4A<\/a>). While the top feature associations (using a stringent Bonferroni correction) are quite different between isolate and colony cells, there is a modest overall correlation between the associations of morphology traits and genetic variants (Fig. S4B<\/a>).<\/p>\n a<\/b> Manhattan plot showing association between morphological traits (n<\/i>\u2009=\u2009246) and rare variant burden in candidate genes (n<\/i>\u2009=\u20099105). Black dotted line represents the p<\/i>-value threshold after Bonferroni correction for the number of tested traits and genes (P<\/i>\u2009=\u20090.05\/246×9105, i.e. 2.2 \u00d7 10\u22128<\/sup>). Gray dotted line represents the p<\/i>-value threshold for suggestive evidence of association (P<\/i>\u2009=\u200910\u22126). b<\/b>\u2013d<\/b> Box plots displaying the association between the Zernike_9_3 cytoplasm shape metric and rare variant burden in WASF2<\/i> gene (b<\/b>), distribution of mitochondria around the nucleus and rare variant burden in PRLR<\/i> gene (c<\/b>) and cytoplasmic granularity measure in the RNA channel and rare variant burden in TSPAN15<\/i> gene (d<\/b>). We provide the effect size (\u03b2 estimate) and raw p<\/i>-value of the association for each trait. Data is presented in a Tukey-style boxplot with the median (Q2) and the first and the second quartiles (Q2, Q3) and error bars defined by the last data point within \u00b11.5-times the interquartile range (n<\/i>\u2009=\u2009297 unique cell lines).<\/p>\n<\/div>\n<\/div>\n Rare variant burden in WASF2<\/i> was negatively associated with a Zernike shape measure of the cytoplasm (Cytoplasm_AreaShape_Zernike_9_3<\/i>) in colony cells (n<\/i>\u2009=\u20093 missense and 1 in-frame deletion rare variants, \u03b2 or effect size (se)\u2009=\u2009\u22121.24 (0.18), P<\/i>\u2009=\u20093.1 \u00d7 10\u221210<\/sup>; Fig. 3b<\/a>). Zernike features represent polynomial reconstructions of an organelle or object of cells. WASF2<\/i> is named for its association with Wiskott-Aldrich syndrome, a rare genetic disorder which greatly increases the risk of various cancers (28<\/a>,29<\/a>,30<\/a><\/sup>, and31<\/a><\/sup>). WASF2<\/i> protein binds profilin, a G-actin-binding protein, promoting the exchange of ADP\/ATP on actin and the formation of actin filament clusters32<\/a>,33<\/a><\/sup>. The disruption of WASF2<\/i> impairs actin formation and organization that could lead to their polarized distribution and spindle-shaped cells34<\/a><\/sup>. In representative images of cells with rare variants in WASF2<\/i> it is difficult to identify this polarized and spindle-like shape by eye when compared to reference lines (Fig. S4C<\/a>). In addition to our genome-wide association, rare variant burden in WASF2<\/i> had nominal association (P<\/i>\u2009<\u20090.05) with 90 other traits including 27 traits of area and shape category, suggesting WASF2<\/i> may contribute to a range of cell morphological characteristics (Table S4<\/a>).<\/p>\n Three traits were significantly associated with rare variant burden in the PRLR<\/i> gene (n<\/i>\u2009=\u20096 missense rare variants, \u03b2 (se)\u2009=\u2009\u22121.17 (0.2), P<\/i>\u2009=\u20091.2 \u00d7 10\u22128<\/sup>; Fig. 3c<\/a>). The most interesting among these included asymmetries in the distribution of mitochondria in the perinuclear space (Cells_RadialDistribution_RadialCV_Mito_1of4)<\/i>. PRLR<\/i> function has been linked to several forms of cancers, including breast cancer and lymphoma (Kavarthapu et al.35<\/a><\/sup>, L\u00f3pez Fontana et al.36<\/a><\/sup>, Gharbaran et al.37<\/a><\/sup>). PRLR<\/i> encodes membrane-anchored receptors for a prolactin ligand and is a part of the class-I cytokine receptor superfamily and regulator of JAK-STAT5 pathway activity, regulating autocrine\/paracrine loops present in stem cells, which mediate their quiescence and proliferation38<\/a>,39<\/a><\/sup>. Previous findings in adipocytes showed PRLR<\/i> knockout alters mitochondrial packing and distribution throughout the cell40<\/a><\/sup>. Moreover, rare variant burden in PRLR<\/i> had nominal association (P<\/i>\u2009<\u20090.05) with 118 other traits, providing more support to PRLR<\/i> as a genetic determinant of cellular morphology (Table S5<\/a>).<\/p>\n We also inspected the associations with suggestive evidence, i.e., P<\/i>\u2009<\u200910\u22126<\/sup>. There were a total of 12 and 13 associations in colony and isolated cells, respectively, which passed this threshold (Table S6<\/a>). One of the strongest associations in our suggestive results was between the distribution in size of RNA particles in the cytoplasm (Cytoplasm_Granularity_3_RNA<\/i>) and rare variant burden in TSPAN15<\/i> gene (n<\/i>\u2009=\u20092 missense and 1 splice region rare variants in the gene, \u03b2 (se) = 0.9 (0.17), P<\/i>\u2009=\u20093.7 \u00d7 10\u22127<\/sup>; Fig. 3d<\/a>). TSPAN15<\/i> is expressed in all human tissues and encodes for a cell surface protein41<\/a><\/sup>. A member of the tetraspanin family of transmembrane segments, TSPAN15<\/i> has been implicated in tumor-related conditions (Huang et al.42<\/a><\/sup>). TSPAN15<\/i> plays a role in cell activation and self-renewal through negative regulation of Notch-signaling43<\/a><\/sup>. Disruption of TSPAN15<\/i> could lead to increased cell proliferation and transcriptional activation which may be represented in our data by an increase in the measurable RNA content in the cytoplasm.<\/p>\n To ensure our observed associations were not an artifact of our nonparametric regression model, we permuted the data by randomly assigning trait values across donors. These results suggested that our observed significance of association between rare variant burden and cell morphological traits was unlikely to have occurred by chance (Fig. S4D<\/a>). To confirm that our observed associations were not driven by somatic variation introduced during iPSC reprogramming or those which arise in cell culture (despite the short culture time in our study), we repeated our association test while restricting to only those variants that were previously observed in the gnomAD database (106,590 of 122,256 variants)44<\/a><\/sup>. All of our observed associations were recapitulated (significant after Bonferroni correction for multiple testing and with suggestive evidence) with concordant effect size and statistical significance (p<\/i>-value) (Fig. S4E<\/a>). Taken together, our findings suggest we could reliably identify associations between morphological traits and rare protein-coding variants.<\/p>\n CRISPR-based gene editing has been shown to be a viable mechanism for validating gene expression phenotypes resulting from rare variation45<\/a><\/sup>. To corroborate our rare-variant burden associations, we examined whether knockdown of these genes impacted the same morphological traits for which we identified a rare-variant burden association. We transfected iPSCs from a single cell line expressing constitutive dCas9-KRAB CRISPRi machinery with sgRNAs targeting the transcriptional start site (TSS) for WASF2<\/i>, PRLR<\/i>, and TSPAN15<\/i> (Fig. 4a<\/a>). Each gene was targeted by 2 different sgRNA sequences, which were validated for knockdown of their target gene (25-95% efficiency) (Fig. S5<\/a>, Table S7<\/a>). Cells transfected with non-targeting sgRNAs were included as controls. We generated per-well (population-averaged) morphological profiling using the same methods for our discovery cohort. We compared the morphological trait values for our rare-variant associations between non-targeting sgRNAs and those targeting our genes of interest. For each gene tested, we observed the predicted changes, and in the same direction, for each individual trait relative to controls (n<\/i>\u2009=\u200928 wells per targeting sgRNA and 52 wells per non-targeting sgRNA, Welch\u2019s two sample T<\/i> test, P<\/i>\u2009<\u20092.2 \u00d7 10\u221216<\/sup>) (Fig. 4b\u2013d<\/a>). Specifically, knockdown of WASF2<\/i> resulted in a decreased normalized score for the trait Cytoplasm_AreaShape_Zernicke_9_3<\/i> (Fig. 4b<\/a>). We observed that a reduction in the expression of TSPAN15<\/i> coincided with an increase in trait score for Cytoplasm_Granularity_3_RNA<\/i> (Fig. 4d<\/a>). Lastly, knockdown of PRLR<\/i> decreased Cells_RadialDistribution_RadialCV_Mito_1of4<\/i>, which defines the relationship between the radial distribution of mitochondria around the nucleus (Fig. 4c<\/a>). This effect is highlighted in representative images, whereby cells transfected with a PRLR<\/i> targeting sgRNA displayed more uniform distribution of mitochondria around the nucleus when compared to non-targeting sgRNA cells where mitochondria tend to colocalize to one side of the nucleus (Fig. 4e<\/a>).<\/p>\n a<\/b> Workflow for knockdown of rare-variant genes using CRISPR interference in iPSCs expressing constitutive dCas9-KRAB. b<\/b>\u2013d<\/b> Box plots displaying quantification of traits between control non-targeting sgRNAs and sgRNAs targeting WASF2<\/i>, TSPAN15<\/i>, and PRLR<\/i> on a per-well level (n<\/i>\u2009=\u200952 wells per non-targeting sgRNAs, n<\/i>\u2009=\u200956 wells per targeting sgRNAs, P<\/i>\u2009<\u20092.2 \u00d7 10\u221216<\/sup>, Welch\u2019s two-sided T<\/i> test)<\/i>. Effect on the trait score is consistent with what we observed in our rare-variant burden association. Data is presented in a Tukey-style boxplot with the median (Q2) and the first and the second quartiles (Q2, Q3) and error bars defined by the last data point within \u00b11.5-times the interquartile range. e<\/b> Representative image of an observable gene-trait association for PRLR<\/i>. Cells_RadialDistribution_RadialCV_Mito_1of4 relates to the asymmetric distribution of mitochondria in the ring right around the nucleus. In the non-targeting controls (left) we observed clustering of mitochondria on a particular side of the nucleus, whereas in the PRLR<\/i> knockdown sgRNA (right) we observed a more distributed presence of mitochondria around the nucleus.<\/p>\n<\/div>\n<\/div>\n Genome-wide association studies (GWAS) have identified thousands of common variants that are associated with common diseases and traits. These variants have small effect sizes at the individual level but combine to explain a large degree of common disease heritability (46<\/a>,47<\/a><\/sup>, O\u2019Connor et al.27<\/a><\/sup>). To identify common variants that are implicated in cell morphology, we performed 246 genome-wide association analyses, one for each composite trait. Each association was tested in colony and isolated cells separately (Fig. 5a<\/a>, Fig. S6C<\/a>). With our set of 297 donors, only one variant, rs315506, overlapping the chr17q11.2 locus, passed the genome-wide significance threshold (Bonferroni correction for 246 morphology traits, 5 \u00d7 10\u22128<\/sup>\/246\u2009=\u20092 \u00d7 10\u221210<\/sup>). rs315506 is an intergenic variant and was associated with spatial distribution of endoplasmic reticulum (ER) in the cytoplasm (Cytoplasm_RadialDistribution_RadialCV_ER_3of4<\/i>) in colonies (MAF\u2009=\u20090.08, \u03b2 (se)\u2009=\u2009\u22120.52 (0.08), P<\/i>\u2009=\u20091.4 \u00d7 10\u221210<\/sup>, Fig. S6A<\/a>). This variant also showed suggestive evidence of association (P<\/i>\u2009<\u200910\u22125<\/sup>\/246\u2009=\u20094.1 \u00d7 10\u22128<\/sup>) with spatial distribution of ER near the periphery of cells (Cells_RadialDistribution_MeanFrac_ER_4of4<\/i>). rs315506 lies in the center of a 400\u2009kb window containing the genes NF1<\/i>, CORPS<\/i>, UTP6<\/i> and SUZ12<\/i>. Chromosomal alterations on chr17q11.2 cause NF1 microdeletion syndrome, which has been shown to impair protein localization to the ER48<\/a>,49<\/a><\/sup>. To corroborate this observation, we analyzed the publicly available JUMP-Cell Painting data from U2OS cells that have perturbed NF1<\/i> and SUZ12<\/i> using CRISPR interference<\/a><\/div>\n
<\/a><\/div>\n
Cell line characteristics and technical factors drive variability in morphological traits<\/h3>\n
Rare variant burden for cell morphological traits<\/h3>\n
<\/a><\/div>\n
Functional validation of rare variant associations<\/h3>\n
<\/a><\/div>\n
Common variant associations for cell morphological traits<\/h3>\n