{"id":612748,"date":"2024-06-11T20:00:00","date_gmt":"2024-06-12T00:00:00","guid":{"rendered":"https:\/\/platohealth.ai\/visualization-of-incrementally-learned-projection-trajectories-for-longitudinal-data-scientific-reports\/"},"modified":"2024-06-12T23:23:15","modified_gmt":"2024-06-13T03:23:15","slug":"visualization-of-incrementally-learned-projection-trajectories-for-longitudinal-data-scientific-reports","status":"publish","type":"post","link":"https:\/\/platohealth.ai\/visualization-of-incrementally-learned-projection-trajectories-for-longitudinal-data-scientific-reports\/","title":{"rendered":"Visualization of incrementally learned projection trajectories for longitudinal data – Scientific Reports","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"
IL-VIS (Fig. 1<\/a>) incrementally learns a high-dimensional trajectory (T^H)<\/span> representing the progression of electrophysiological properties of an organoid over n<\/i> sampling timepoints and visualizes it in 2D.<\/p>\n The (m = 9)<\/span> signals corresponding to the m<\/i> electrodes in the MEA at each sampling timepoint i<\/i>, where (1 ≤ i ≤ n)<\/span>, are divided into segments of 4 s each. Each segment, denoted as (t^{i,j})<\/span> for the j<\/i>th segment at the i<\/i>th sampling timepoint, undergoes preprocessing using the fast Fourier transform (FFT)8<\/a><\/sup> (see \u201cMethods<\/a>\u201d). The collection of pre-processed (t^{i,j})<\/span> segments (∀ j)<\/span> forms the i<\/i>th data increment, represented as (I_i)<\/span> (illustrated in Fig. 1<\/a>A). (I_i)<\/span> is then reduced to 2D using the incrementally trained dimensionality reduction model based on SONG (Fig. 1<\/a>B).<\/p>\n The model is trained for n<\/i> sessions, once for each sampling timepoint. At session i<\/i>, (I_i)<\/span> is combined with the existing data ((⋃_{k=1}^{i-1} I_k)<\/span>), and the combined dataset (D_i = ⋃_{k=1}^{i} I_k)<\/span> is used to train the existing model parameters (θ_{i-1})<\/span> to produce updated parameters (θ_i)<\/span>. (θ_i)<\/span> is then used to generate a 2D visualization (V_i)<\/span> that shows all the increments from (I_1)<\/span> to (I_i)<\/span> (Fig. 1<\/a>C). The median of each (I_i)<\/span> in (V_i)<\/span> is traced from 1 to i<\/i> to obtain (T^L)<\/span>, which visualizes the progression of the organoid\u2019s electrophysiological properties over time. If n<\/i> is sufficiently large, we would expect the consecutive and continuous data increments from an organoid to form a gradual progression and appear seamlessly connected, generating a continuous visualization. 
In the absence of such a continuous stream of data, a good visualization method should identify the trajectory progression even using discrete increments.<\/p>\n When visualizing the electrophysiological progression of a single organoid, the direction of the trajectory cannot be compared to another independently visualized organoid. Therefore, we extend IL-VIS to jointly model multiple organoids, allowing us to examine the directions of trajectories compared to each other and identify (dis)similarities in the electrophysiological progression of the corresponding organoids (see Fig. 6<\/a>A\u2013D, explained later).<\/p>\n In subsequent sections, we report two sets of results using (1) simulated data and (2) experimental MEA data obtained from cortical organoids.<\/p>\n Several studies have observed a gradual and non-linear increase in the electrical activity of brain organoids as they mature, characterized by parameters such as spike frequency, bursts, and synchrony9<\/a>,10<\/a><\/sup>. This increase in activity is indicative of the development of more complex and mature synaptic connections among neurons and stronger electrical transmission within the organoids11<\/a><\/sup>. We verify IL-VIS\u2019s ability to visualize this gradual but non-linear progression of electrophysiological properties as captured in high-dimensional data over time, in a 2D representation. However, it is possible that the actual progression of the electrophysiological properties may not show a gradual progression. This could be due to various factors such as developmental pathology, the influence of a disease, or administered therapeutic stimuli. After confirming that IL-VIS can accurately capture the expected gradual increase trend in healthy organoids, we can use IL-VIS to identify deviations from these trends resulting from the factors described above. Next, we validate IL-VIS\u2019s capability to capture the (dis)similarities between multiple progression trajectories. 
This feature allows for visualization of the effect of different treatments on the electrophysiological progression of multiple organoids relative to each other.<\/p>\n We generate simulated data for which the high-dimensional trajectory (T^H)<\/span> corresponding to the visualized trajectory (T^L)<\/span> is known. Specifically, we formulate a set of non-linear high-dimensional trajectories {(T^H_r)<\/span> where (r ∈ {1,…,4})<\/span>} that gradually progress with time. The details related to the formation of (I_i)<\/span>, (1 ≤ i ≤ n)<\/span> in each (T^H_r)<\/span> can be found in \u201cMethods<\/a>\u201d.<\/p>\n Figure 2<\/a> shows the visualizations obtained using PCA and UMAP in place of SONG in IL-VIS: PCA fails to capture the non-linearity present in the data, while UMAP fails to generate evolving visualizations across the sessions. Additional experiments with Independent Component Analysis (ICA), Multi-Dimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE) are provided in Supplementary Fig. 2<\/a>. These experiments demonstrate that each of these methods fails to preserve the non-linearity, fails to generate evolving visualizations (that is, to preserve the orientation of the trajectories across sessions), or both. In contrast, by using SONG, as shown in Figs. 3<\/a> and 4<\/a>, visualizations successfully capture the gradually increasing trend and the non-linearity of the high-dimensional trajectories (see Supplementary Fig. 1<\/a> for pairwise Euclidean and geodesic distance heatmaps between the increments in (T^H_r)<\/span> and (T^L_r)<\/span>). Importantly, IL-VIS also generates evolving visualizations, preserving the relative structure between the increments and the orientation across incremental visualizations, confirming that IL-VIS could be used for gaining insights from ongoing experiments. This ability stems from two key factors. 
Firstly, SONG operates as a parametric method, and secondly, IL-VIS preserves and reuses these parameters throughout iterations. To elaborate further, we conducted an additional experiment in which we reset the parameters at the beginning of each session, intentionally excluding the carryover of parameter values from the previous session. The resulting visualizations are shown in Supplementary Fig. 3<\/a>. Notably, across sessions, the orientation of the trajectories is not preserved when the parameters are not carried forward. While it is conceivable to manually reorient the trajectories of each visualization, manual reorientation may be impractical for intricate trajectories. This underscores the significance of parameter continuity in preserving the meaningful evolution of visual representations over time. Additionally, we verify the manifold preservation quality of the SONG embeddings in Supplementary Fig. 4<\/a>, and the method\u2019s capability to handle an even higher-dimensional space (up to 10,000 dimensions) in Supplementary Fig. 5<\/a>.<\/p>\n 2D visualizations obtained by the pipeline when modeling (T^H_1)<\/span> using (A<\/b>) PCA, (B<\/b>) UMAP and (C<\/b>) SONG as the dimensionality reduction technique. (T^H_1)<\/span> was defined on a pseudo-temporal parameter t<\/i> (Eq. 2<\/a> in \u201cMethods<\/a>\u201d). (φ_i(t))<\/span> is noise sampled from a Gaussian distribution (see \u201cMethods<\/a>\u201d). (λ)<\/span> is the percentage of noise added, which is 20% for this experiment. Five increments of data points are formed ((n = 5)<\/span>) for each trajectory with no sampling gaps ((δ = 0)<\/span>) in between; thus, each increment contains 100 data points (see \u201cMethods<\/a>\u201d). N<\/i> corresponds to the number of data points used in each visualization. Each row contains the five visualizations generated at the five sessions, where a new data increment is introduced at each session. 
At session i<\/i>, all the increments thus far (i.e., (D_i = ⋃_{k=1}^{i} I_k)<\/span>) are visualized. In visualizations with multiple increments, different increments are displayed in varying intensities of the same color. The stronger the intensity, the newer the increment. By tracing the medians of the subsequent increments chronologically, the emergence of (T^L_1)<\/span>, which is a 2D representation of (T^H_1)<\/span>, is observed. (A<\/b>) PCA generated progressive visualizations but failed to preserve the non-linearity of (T^H_1)<\/span>. (B<\/b>) UMAP preserved the non-linearity of (T^H_1)<\/span> but failed to generate progressive visualizations over the sessions. (C<\/b>) SONG preserved the non-linearity while generating progressive visualizations over the sessions.<\/p>\n<\/div>\n<\/div>\n To replicate random noise arising from biological and non-biological variations in real data, we add random Gaussian noise (20%) to each data increment in (T^H_1)<\/span> (see \u201cMethods<\/a>\u201d) and visualize the results in Fig. 3<\/a>B\u2013E. Despite the presence of this added noise, (T^L_1)<\/span> successfully preserves the trend and non-linearity of (T^H_1)<\/span> (Fig. 3<\/a>B\u2013E). However, as the noise level increases from 0 to 20%, there is a corresponding decrease in the smoothness of (T^L_1)<\/span> (Supplementary Fig. 6<\/a>).<\/p>\n Although the processes underlying the biological experiments are continuous, the recorded features are discrete and the sampling time points may have considerable gaps between them. To replicate this, we introduce minor, medium, and major gaps between the increments in (T^H_1)<\/span> (\u201cMethods<\/a>\u201d) and visualize them in Fig. 3<\/a>C\u2013E. Despite such gaps, (T^L_1)<\/span> captures a gradually increasing and non-linear path along which (T^H_1)<\/span> progresses, suggesting that IL-VIS can handle datasets with considerable gaps and noise. 
When major gaps are introduced (where (δ = 99)<\/span>), five different clusters are visualized, each corresponding to an increment, instead of a continuous progression (Fig. 3<\/a>E). This agrees with our expectations because, with such gaps, IL-VIS has very limited information about continuity.<\/p>\n The progression of electrophysiological properties of multiple organoids with different exposures may correspond to different trajectories in the high-dimensional space. Capturing the (dis)similarities between these trajectories and visualizing them as they evolve in a shared space would offer insights into the (dis)similarities in their electrophysiological progressions. For instance, a comparison between the electrophysiological maturation of organoids exposed and not exposed to a particular treatment would assist in understanding the treatment\u2019s effect. We validate this capability of IL-VIS as follows.<\/p>\n First, we jointly model and plot two independent randomly-generated non-linear trajectories ((T^H_1)<\/span> and (T^H_2)<\/span>) designed to originate at the same position while diverging at higher t<\/i> values in Fig. 4<\/a>A. As new increments are added, the corresponding low-dimensional trajectories (T^L_1)<\/span> and (T^L_2)<\/span> exhibit an increasing difference, highlighting their growing dissimilarity in later increments. Next, we extend the experiment by incorporating two secondary trajectories (T^H_3)<\/span> and (T^H_4)<\/span>, along with the principal trajectories ((T^H_1)<\/span> and (T^H_2)<\/span>). (T^H_3)<\/span> and (T^H_4)<\/span> are created as combinations of (T^H_1)<\/span> and (T^H_2)<\/span>, based on a similarity factor (α)<\/span> (\u201cMethods<\/a>\u201d). Specifically, when (α = 1)<\/span>, (T^H_3)<\/span> coincides with (T^H_1)<\/span>, and (T^H_4)<\/span> coincides with (T^H_2)<\/span>. Conversely, when (α = 0)<\/span>, (T^H_3)<\/span> coincides with (T^H_2)<\/span>, and (T^H_4)<\/span> coincides with (T^H_1)<\/span>. 
For (0 < α < 1)<\/span>, (T^H_3)<\/span> and (T^H_4)<\/span> exhibit intermediate characteristics between (T^H_1)<\/span> and (T^H_2)<\/span> along the trajectory continuum. The resulting 2D visualizations in Fig. 4<\/a>B, C showcase these distinctive traits in the low-dimensional trajectories. Notably, when (α > 0.5)<\/span>, (T^L_3)<\/span> is visualized closer to (T^L_1)<\/span> than to (T^L_2)<\/span>, and conversely, (T^L_4)<\/span> is visualized closer to (T^L_2)<\/span> than to (T^L_1)<\/span> (Supplementary Table 1<\/a>). Subsequently, in Fig. 4<\/a>C, as (α)<\/span> is incremented from 0.6 to 0.9, the visualizations of (T^L_3)<\/span> and (T^L_4)<\/span> gradually approach (T^L_1)<\/span> and (T^L_2)<\/span>, respectively. Furthermore, within trajectory pairs that share similarities, such as (T^L_1)<\/span> and (T^L_3)<\/span> when (α > 0.5)<\/span>, the trajectories display increased separation in later increments compared to earlier increments, effectively capturing subtle but important dissimilarities. The experiments described above, combined with these results, validate IL-VIS\u2019s ability to generate evolving visualizations that preserve the high-dimensional trajectories\u2019 similarity relationships.<\/p>\n 2D visualizations obtained by IL-VIS show its capability in generating evolving visualizations capturing the gradually increasing and non-linear trends of a simulated 100-dimensional non-linear trajectory (T^H_1)<\/span> (A<\/b>) without noise\/gaps, (B<\/b>) with noise, (C<\/b>) with noise and minor sampling gaps between increments, (D<\/b>) with noise and medium sampling gaps between increments, and (E) with noise and major sampling gaps between increments. (T^H_1)<\/span> was defined on a pseudo-temporal parameter t<\/i> (Eq. 2<\/a>). (φ_i(t))<\/span> is noise sampled from a Gaussian distribution (\u201cMethods<\/a>\u201d). 
(λ)<\/span> is the percentage of noise added: 0% for (A<\/b>) and 20% for (B\u2013E<\/b>). For each experiment, five data increments were formed ((n = 5)<\/span>). The number of data points in each increment was determined based on (δ)<\/span> and was 100, 100, 75, 50, and 100 for experiments (A\u2013E<\/b>) respectively (see \u201cMethods<\/a>\u201d). N<\/i> corresponds to the number of data points used in each visualization. Each row contains the five visualizations generated at the five sessions, where a new data increment is introduced to IL-VIS at each session. At session i<\/i>, all the increments thus far (i.e., (D_i = ⋃_{k=1}^{i} I_k)<\/span>) are visualized. In visualizations with multiple increments, different increments are displayed in varying intensities of the same color. The stronger the intensity, the newer the increment. By tracing the medians of the subsequent increments chronologically, the emergence of (T^L_1)<\/span>, which is a 2D representation of (T^H_1)<\/span>, is observed. Each consecutive visualization maintains the structure of the previous visualization, allowing us to relate the visualizations to one another. Furthermore, the progressiveness of the gradually increasing trajectory is captured by placing each new increment further away from the previous increments (Supplementary Fig. 1<\/a>). IL-VIS shows robustness to noise (B<\/b>) and minor to major gaps (C\u2013E<\/b>).<\/p>\n<\/div>\n<\/div>\n 2D visualizations obtained by IL-VIS show its capability in capturing the (dis)similarities in the progressions of (A<\/b>) two trajectories: (T^H_1)<\/span> and (T^H_2)<\/span> and (B<\/b>) four trajectories: (T^H_1)<\/span> and (T^H_2)<\/span> together with (T^H_3)<\/span> and (T^H_4)<\/span>. (T^H_1)<\/span>, (T^H_2)<\/span>, (T^H_3)<\/span>, and (T^H_4)<\/span> are 100-dimensional non-linear trajectories. 
(T^H_1)<\/span> and (T^H_2)<\/span> are independent (principal trajectories) whereas (T^H_3)<\/span> and (T^H_4)<\/span> are dependent (secondary trajectories) on (T^H_1)<\/span> and (T^H_2)<\/span>. Specifically, (T^H_3)<\/span> and (T^H_4)<\/span> are created as compositions of (T^H_1)<\/span> and (T^H_2)<\/span>, where a similarity factor ((α)<\/span>) determines how close a secondary trajectory is to a principal trajectory (see Eq. 3<\/a>). (φ_i(t))<\/span> is noise sampled from a Gaussian distribution and (λ)<\/span> is the percentage of noise added, which is 0% for this experiment (\u201cMethods<\/a>\u201d). Five increments of data points are formed ((n = 5)<\/span>) for each trajectory with no sampling gaps ((δ = 0)<\/span>) in between; thus, each increment contains 100 data points (\u201cMethods<\/a>\u201d). (I^r_i)<\/span> is the i<\/i>th data increment of (T^H_r)<\/span>. N<\/i> corresponds to the total number of data points used in each visualization. Each experiment contains five visualizations generated at the five sessions. At each session i<\/i>, the i<\/i>th data increment of all trajectories that are modeled together is introduced to the pipeline. Each trajectory is shown in a different color. In visualizations with multiple increments, different increments are displayed in varying intensities of the same color. The stronger the intensity, the newer the increment. By tracing the medians of the subsequent increments chronologically, the emergence of (T^L_r)<\/span>, which is a 2D representation of (T^H_r)<\/span>, is observed. 
(A<\/b>) (T^H_1)<\/span> and (T^H_2)<\/span> originate at the same position and diverge from each other as the increments are added, which is consistent with the definition of the trajectories. (B<\/b>) When (α = 0.8)<\/span>, (T^L_3)<\/span> is visualized closer to (T^L_1)<\/span> than to (T^L_2)<\/span>, and (T^L_4)<\/span> is visualized closer to (T^L_2)<\/span> than to (T^L_1)<\/span>, capturing the relative similarity between the principal and secondary trajectories. (C<\/b>) Visualizations generated at the last session ((i = 5))<\/span> in four independent experiments with four different (α)<\/span> values. The Simulated data<\/h3>\n
IL-VIS generates evolving visualizations capturing the underlying trends<\/h4>\n
IL-VIS is robust to noise and gaps in the data<\/h4>\n
IL-VIS captures (dis)similarities between the trajectory progressions<\/h4>\n