Generative Adversarial Network (GAN) Model development to generate synthetic image data
The GAN model employed in this study consists of a generator and a discriminator, as shown in Fig. 1. The generator applied upsampling techniques to a random noise vector input and generated synthetic images that closely resembled the original, real images. In contrast, the discriminator functioned as a binary classifier, downscaling input cell images to discern between real and fake samples. The core of the GAN model lay in its adversarial training approach, wherein the generator and discriminator underwent alternating iterative updates and competed with each other. The discriminator was trained to minimize a binary classification loss function, while the generator was trained to maximize the probability of the discriminator misclassifying the generated samples. The objective function is described in the following equation:
$$\underset{G_{\theta}}{\min}\;\underset{D_{\varphi}}{\max}\;L(\theta,\varphi)=\sum\left(\log D_{\varphi}\left(x\right)+\log\left(1-D_{\varphi}\left(G_{\theta}\left(z\right)\right)\right)\right)$$
In order to facilitate balanced competition between the generator and discriminator, and to promote impartial learning during adversarial training, the networks were designed with a symmetric structure. The generator consisted of four layers of transposed 2D CNNs, while the discriminator consisted of four layers of 2D CNNs. Both networks incorporated batch normalization and rectified linear unit (ReLU)/Leaky ReLU activations between each layer. The generator concluded with a Tanh activation function, while the discriminator utilized a Sigmoid function. Details of the generator and discriminator structure can be found in Table 1. The adversarial training optimized both neural networks, enhancing the model’s robustness for generalization and defense against subtle perturbations in the data.
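The symmetric pair described above might be sketched in PyTorch as follows. The specific channel counts, kernel sizes, and strides here are illustrative assumptions chosen so that a (64,1) noise vector maps to a (3,128,128) image and back to a single probability; the authors' actual layer parameters are given in Table 1.

```python
import torch
import torch.nn as nn

# Sketch of the symmetric generator/discriminator pair. Channel counts
# and kernel sizes are assumptions, not the values from Table 1.
class Generator(nn.Module):
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, kernel_size=16, stride=1),  # -> (256,16,16)
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),                     # -> (128,32,32)
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                      # -> (64,64,64)
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),                        # -> (3,128,128)
            nn.Tanh(),
        )

    def forward(self, z):
        # Reshape the (batch, 64, 1) noise vector to (batch, 64, 1, 1)
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1),                                 # -> (64,64,64)
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1),                               # -> (128,32,32)
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1),                              # -> (256,16,16)
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, kernel_size=16),                         # -> (1,1,1)
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one real/fake probability per image
```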
The GAN model was trained with the objective of producing high-quality artificial hiPSC-CMs data, which included both synthetic images and videos. These generated cell images were combined with authentic data to form the training dataset for the cell classifier model. The inclusion of synthetic cell images served the purpose of improving the scale and diversity of the dataset, which in turn enhanced the accuracy of computational analysis for classifying hiPSC-CM images into various stages of maturation.
Cell classification framework
The cell classifier architecture was constructed with a layered structure consisting of five layers: three CNN layers and two fully connected (FC) layers (Fig. 10). To investigate the impact of integrating synthetic data into the training dataset, three distinct datasets were curated for the training of the classifier. These datasets included a relatively small authentic dataset, a larger authentic dataset, and a dataset that combined both authentic and synthetic images. The cell classifier underwent testing with both seen and unseen data to evaluate the GAN model’s ability to generate synthetic images containing detailed features of the cardiac cells that were not sufficiently represented in the experimental dataset due to limited sampling. Since the GAN model was trained only with the seen domain data, this evaluation was intended to demonstrate the GAN model’s ability to generate artificial images that contained features beyond those present in the original dataset.
Schematic to show the relationship of the training data and the testing data for the cell classifier.
To validate the effectiveness of the proposed model, the classification outcomes were compared against four conventional machine learning algorithms: Support Vector Machine (SVM), Random Forest, K Nearest Neighbors (KNN), and Naive Bayes. To assess the generalization ability each gained from the incorporation of synthetic data, each conventional machine learning model was trained using both real and synthetic datasets. Subsequently, each model’s ability to generalize to novel features was evaluated using both seen and unseen domain testing data.
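The four baselines above can be set up directly with scikit-learn. The sketch below uses small random stand-in feature vectors in place of the real and synthetic image features, and default hyperparameters; both are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Stand-in features: random vectors in place of flattened cell images,
# three maturation-stage classes. Purely illustrative data.
rng = np.random.default_rng(0)
X_train = rng.random((120, 16 * 16))
y_train = rng.integers(0, 3, 120)
X_test = rng.random((30, 16 * 16))

baselines = {
    "SVM": SVC(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NaiveBayes": GaussianNB(),
}
predictions = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)          # train on combined real + synthetic features
    predictions[name] = model.predict(X_test)
```

In the study, each baseline would be fitted on the mixed real/synthetic training set and then scored separately on seen and unseen domain test images.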
Fabrication of micropatterned hydrogel scaffolds to facilitate maturation of hiPSC-CMs
To generate maturation-enhanced hiPSC-CMs, the cells were cultured on a micropatterned, collagen IV coated photosensitive hydrogel with controlled mechanical properties. A 10% (w/v) gelatin methacrylate (GelMA) solution was combined with 0.5% Irgacure 2959 photoinitiator to generate photo-crosslinked hydrogels. Sterility was ensured via rapid filtration through a sterile 0.2 μm porous filter. These hydrogels were cast in a custom Teflon mold, sealed with glass to polymerize under 365 nm, 8 mW/cm2 UV light, and subjected to varying crosslinking times to generate a stiffness gradient of 10 kPa, 30 kPa, and 60 kPa. Polydimethylsiloxane (PDMS) stamps with micropatterns including 20 μm × 140 μm, 40 μm × 280 μm, 75 μm × 525 μm, and 45 μm × 225 μm rectangular patterns were fabricated using traditional photolithography and soft lithography (Fig. 11A). Plasma-activated PDMS stamps were coated with collagen IV protein and stamped onto the 10% GelMA hydrogel scaffolds (Fig. 11B). For the positive control groups, hiPSC-CMs were cultured on collagen IV coated MatTek glass well plates.
(A) Various micropatterns generated via lithography. (B) Immunostaining of collagen IV coated patterns (scale bars: 100 μm). (C) HiPSC-CMs cultured on collagen IV micropatterned GelMA hydrogel scaffolds demonstrating mature morphology (scale bar: 100 μm). (D) Motion vector analysis of contractility.
Optical measurements of cardiomyocyte structure and function
Commercially available human iPSC-derived cardiomyocytes (iCell2 cardiomyocytes, 01434) were obtained from Cellular Dynamics International Inc. (CDI, Madison, WI, USA). Cryopreserved iCell2 cardiomyocytes were rapidly thawed, then diluted in iCell2 plating medium and seeded onto standard 6-well and 96-well plates (Thermo Fisher Scientific) coated with 0.1% gelatin (Sigma Aldrich) for the control groups, and onto 10% GelMA hydrogel scaffolds coated with collagen type IV proteins for the maturation enhancement group (Fig. 11C). Four hours after seeding, the plating medium was changed to a maintenance medium, which was then changed every 48 h thereafter. Cell cultures were maintained in an incubator at 37 °C, 5% CO2, and 86% humidity. The hiPSC-CMs were cultured for two weeks and characterized every other day using a Nikon TE2000 inverted microscope to record the cellular morphology and beating dynamics at 10 frames per second. To assess contractile motion of hiPSC-CMs, movement was quantified using a custom MATLAB script, which measured pixel displacements of contracting cells over contraction and relaxation. For each video frame, the mean magnitude of displacement was measured to yield an average contractile movement. Normalized contractile motion was calculated for each video as the mean of all peak contraction values observed in a 20 s period (Fig. 11D).
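The motion quantification above was done with a custom MATLAB script; a simplified Python/NumPy analogue is sketched below. Note that frame-to-frame absolute intensity change is used here as a stand-in proxy for the paper's pixel-displacement measurement, and the peak-detection rule is an assumption.

```python
import numpy as np

def contractile_motion(frames):
    """Mean per-frame motion magnitude from a (T, H, W) grayscale stack.

    Simplified stand-in for the custom MATLAB displacement analysis:
    frame-to-frame absolute intensity change approximates movement.
    """
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) per-pixel change
    return diffs.mean(axis=(1, 2))            # one motion value per frame pair

def normalized_peak_motion(motion, fps=10, window_s=20):
    """Average of peak contraction values within a 20 s window."""
    w = np.asarray(motion, dtype=float)[: fps * window_s]
    # Local maxima: strictly greater than both neighbors (an assumption).
    peaks = w[1:-1][(w[1:-1] > w[:-2]) & (w[1:-1] > w[2:])]
    return float(peaks.mean()) if peaks.size else 0.0
```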
Generation of seen domain and unseen domain data
Videos of day 2, day 6, and day 14 hiPSC-derived cardiomyocytes were collected to represent the different stages of the cardiomyocyte maturation process. Images were extracted and randomly cropped from these videos to obtain 300 × 300 pixel RGB cell images. The collected real images were separated into two groups: cells cultured in one maturation-promoting scaffold were included in the seen domain, and cells cultured in another scaffold were included in the unseen domain. Both groups of cells were cultured under the same conditions, and the same separation process was also applied to the control group. The seen domain dataset was utilized for the training of the GAN and cell classifier, as well as for testing the accuracy of the cell classifier. The unseen domain data was employed for testing the generalization ability of the cell classifier.
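The random-cropping step can be sketched as follows; the number of crops per frame and the uniform sampling of crop positions are assumptions, since the paper does not specify them.

```python
import numpy as np

def random_crops(frame, size=300, n=4, rng=None):
    """Randomly crop n size x size patches from an (H, W, 3) video frame.

    Sketch of the dataset-preparation step; crop count and sampling
    strategy are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    h, w, _ = frame.shape
    crops = []
    for _ in range(n):
        y = rng.integers(0, h - size + 1)   # top-left corner, uniform
        x = rng.integers(0, w - size + 1)
        crops.append(frame[y : y + size, x : x + size])
    return crops
```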
Implementation and training of GAN model
All of our GAN and cell classification models were implemented in PyTorch on a standard workstation (Intel Core i9-9980XE CPU, 3.00 GHz, 18 cores; NVIDIA GeForce RTX 2080 Ti, 8 GB). The Adam optimizer was employed to minimize the loss of the GAN model and a standard error back-propagation algorithm was used, with β1 = 0.5 and β2 = 0.999. A batch size of 64 was used, and the learning rate was set to 0.0002. The GAN model underwent training for 2000 epochs. The weights were controlled with weight norm regularization to avoid overfitting.
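The alternating update with the stated optimizer settings (Adam with β1 = 0.5, β2 = 0.999, learning rate 0.0002, batch size 64) can be sketched as below. Tiny stand-in networks are used so the sketch is self-contained; the actual architectures are those of Table 1, and two loop iterations stand in for the full 2000 epochs.

```python
import torch
import torch.nn as nn

# Stand-in networks: real architectures are the conv nets of Table 1.
G = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16), nn.Tanh())
D = nn.Sequential(nn.Linear(16, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

real = torch.rand(64, 16)                  # stand-in batch of real samples
for _ in range(2):                         # 2000 epochs in the paper
    # Discriminator step: minimize binary classification loss on real vs fake
    z = torch.randn(64, 64)
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: maximize the chance the discriminator misclassifies fakes
    z = torch.randn(64, 64)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```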
To generate synthetic images, a dataset of 691 seen domain images was utilized. This dataset consisted of 229 images from day 2, 227 images from day 6, and 235 images from day 14. These images underwent transformations such as random cropping, random flipping, and resizing, resulting in images with a resolution of 128 × 128 pixels and RGB channels. The generator component of the GAN model took a noise vector of size (64,1) as input and generated an image of size (3,128,128) as output. The discriminator, on the other hand, took an image of size (3,128,128) as input and output a probability indicating whether the input image was genuine or artificial. The GAN model was trained for a total of 2000 epochs to generate 320 images for each class of cardiomyocytes.
To generate synthetic videos that replicate the beating dynamics of cardiomyocytes, a dataset comprising 124 groups of single cell time-series images was collected. Each group consisted of five consecutive frames, each being an RGB image of size 256 × 256 captured at a frame rate of 5 frames per second (FPS). These collected images underwent several transformations, including random cropping, random flipping, grayscale conversion, and resizing, resulting in each group containing five consecutive single-channel cell images of size 64 × 64. The generator component of the GAN model took a noise vector of size (64,1) as input and generated an output vector of size (5,64,64). On the other hand, the discriminator took a vector of size (5,64,64) as input and output a probability indicating whether the input vector represented a genuine beating cell or an artificial beating cell. The GAN model was trained for 2000 epochs using this setup. The generated vector of size (5,64,64) was further transformed into a short synthetic video that replicated the beating of a single cardiomyocyte.
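Turning the generated (5,64,64) tensor into displayable frames is a small post-processing step; a sketch is given below. The assumption that the video generator, like the image generator, ends in a Tanh and therefore outputs values in [-1, 1] is ours, not stated in the text.

```python
import numpy as np

def tensor_to_frames(video_tensor):
    """Convert a generated (5, 64, 64) tensor in [-1, 1] into five uint8
    grayscale frames ready to be written out as a short 5-FPS clip.

    The [-1, 1] input range is an assumption based on the Tanh output
    used by the image generator.
    """
    arr = np.asarray(video_tensor, dtype=float)
    arr = (arr + 1.0) / 2.0                  # [-1, 1] -> [0, 1]
    arr = np.clip(arr, 0.0, 1.0)
    return (arr * 255).astype(np.uint8)      # (5, 64, 64) uint8 frames
```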
Implementation and training of the cell classifier
The cell classifier architecture was structured with three convolutional neural network (CNN) layers followed by two fully connected (FC) layers. Each CNN layer had a kernel size of 3 and produced output channels of 32, 16, and 4, respectively. Following each CNN layer was a 2D maximum pooling layer with a size of (4,4). The two FC layers had sizes of 64 and 3, respectively, and the classifier concluded with a SoftMax activation layer. The input to the classifier was cardiac cell images, either real or artificial, with dimensions of (3,96,96) that were randomly transformed from the image training dataset, in which each image had size (3,128,128). The output of the classifier was a vector of length 3, indicating the probability of each class for the input image.
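The layer description above can be rendered directly in PyTorch. The padding of 1 on each convolution is our assumption, chosen so that the stated pooling sizes line up for a (3,96,96) input; the paper does not state the padding.

```python
import torch
import torch.nn as nn

# Classifier as described: three conv layers (kernel 3, channels 32/16/4),
# each followed by 4x4 max pooling, then FC layers of 64 and 3 with softmax.
# padding=1 is an assumption so spatial sizes divide cleanly.
class CellClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 4)),                      # 96 -> 24
            nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 4)),                      # 24 -> 6
            nn.Conv2d(16, 4, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 4)),                      # 6 -> 1
        )
        self.classify = nn.Sequential(
            nn.Flatten(),                               # 4 * 1 * 1 features
            nn.Linear(4, 64), nn.ReLU(),
            nn.Linear(64, 3),
            nn.Softmax(dim=1),                          # class probabilities
        )

    def forward(self, x):
        return self.classify(self.features(x))
```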
To examine the impact of synthetic images generated by the GAN model, three distinct training datasets were prepared. The first dataset was the original training dataset used for training the GAN model, consisting of 691 real cell images from days 2, 6, and 14. The second dataset combined the images from the first dataset with an additional 960 real images (320 per cell class), resulting in a total of 1651 real cell images. The third dataset combined the 691 real images with 960 synthetic images (320 per cell class), resulting in a total of 1651 mixed real and fake cell images. The relationship among these three training datasets is depicted in Fig. 10. During training of the cell classifier, the Adam optimizer with β1 = 0.9 and β2 = 0.999 was utilized with a batch size of 64. The learning rate was set to 0.0005. The cell classifier underwent training for 1000 epochs.
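The key design point of this comparison is that the second and third datasets have identical size (1651 images), so any accuracy difference isolates the effect of synthetic versus additional real data. The bookkeeping can be sketched with stand-in lists in place of the image pools:

```python
# Stand-in lists in place of the actual image pools.
real_small = list(range(691))        # 691 real images (days 2/6/14)
real_extra = list(range(960))        # 960 additional real images (320/class)
synthetic = list(range(960))         # 960 GAN-generated images (320/class)

dataset_1 = real_small               # small, real only
dataset_2 = real_small + real_extra  # large, real only
dataset_3 = real_small + synthetic   # mixed real + synthetic

# Matched sizes by design: isolates synthetic-data effect from dataset size.
assert len(dataset_2) == len(dataset_3) == 1651
```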
- Source: https://www.nature.com/articles/s41598-024-77943-0