Quantifying the recovery process of skeletal muscle on hematoxylin and eosin stained images via learning from label proportion

Animal procedure

C57BL/6J mice were purchased from Charles River Laboratories (Yokohama, Kanagawa, Japan). Mice were maintained in a controlled environment (temperature, \(24 \pm 2\,^{\circ}\)C; humidity, \(50\% \pm 10\%\)) under a 12/12-h light/dark cycle and were provided sterilized standard chow (DC-8; Nihon Clea, Tokyo, Japan) and water ad libitum. To induce muscle regeneration or degeneration, \(100~\mu\)L of cardiotoxin (CTX; \(10~\mu\)M in saline; Latoxan, Valence, France) or 50% v/v glycerol was injected into the tibialis anterior (TA) muscles with a 29-gauge needle under anesthesia with a medetomidine, midazolam, and butorphanol cocktail42,43,44. After euthanasia by cervical dislocation performed by skilled researchers in accordance with the Animal Experimentation Committee at Osaka University, TA muscles were dissected, mounted on cork using kneaded tragacanth gum (Wako Pure Chemical Industries, Osaka, Japan), and flash-frozen for 1 minute in isopentane (Wako Pure Chemical Industries) cooled with liquid nitrogen. Following a 1-hour incubation on dry ice to evaporate the isopentane, the muscles were stored in sealed containers at \(-80\,^{\circ}\)C. Transverse cryosections were cut \(10~\mu\)m thick and stained with hematoxylin and eosin.

For immunostaining of CTX-injected samples to validate the expert manual annotations, transverse cryosections (\(6~\mu\)m thick) of TA muscles were fixed for 10 min with 4% paraformaldehyde (PFA) for MyoD staining (Fig. 4(a)) or with cooled acetone for embryonic myosin heavy chain (eMyHC) staining (Fig. 4(b),(c)). After blocking with 5% skimmed milk, adjacent serial sections were stained with primary antibodies at \(4\,^{\circ}\)C overnight. For eMyHC staining, an M.O.M. Kit (Vector Laboratories, Burlingame, CA, USA) was used to block endogenous mouse IgG. The primary antibodies used in this study were rat anti-mouse laminin α2 (Enzo, Clone 4H8-2, Cat# ALX-804-190-C100), rabbit anti-mouse MyoD (Abcam, Cat# ab133627), mouse anti-eMyHC (DSHB, Clone F1.652), and rabbit anti-collagen type I (Bio-Rad, #2150-1410). After washing, the sections were incubated with secondary antibodies conjugated with Alexa Fluor 488, 546, or 647 (Molecular Probes, Eugene, OR, USA). The washed samples were mounted with VECTASHIELD Mounting Medium with DAPI (Vector Laboratories, #H-1200). To ensure thickness accuracy, the first one or two slices were discarded whenever the section thickness was changed.

The stained tissue images were captured using a 20× Plan Apochromat objective lens (Nikon Co.) with a BZ-X Analyzer (Keyence Co.). The captured images ranged from 2877 to 4606 pixels in width and from 2720 to 4355 pixels in height (roughly 12.5 megapixels) and were saved in tagged image file format (TIFF) together with the number of days elapsed after CTX injection. Details of our captured images are given in Table 2. For the training data and Day 0 of the unlabeled test data, both legs of each mouse were dissected and the tibialis anterior of each was imaged; otherwise, one image corresponds to a single mouse. Images with errors in freezing or staining, such as Day 0 of the training data and Day 0 of the unlabeled test data, were excluded from the dataset. The training data therefore consists of 14 mice, the annotated test data of 5 mice, and the unlabeled test data (CTX or glycerol injection) of 26 mice.
All procedures used for experimental animals were approved by the Experimental Animal Care and Use Committee of Osaka University (approval number: R02-3), and all methods were carried out in accordance with the relevant guidelines and regulations, including the ARRIVE guidelines.

Table 2 Number of H&E-stained images used in the experiments in the Results section. Image size: width 2877-4606 [pixel], height 2720-4355 [pixel]. The training data were used to finetune Cellpose and to train the classifier model. For the annotated test data, class annotations were performed manually by specialists, categorizing the recovery phase of each tissue region as stable (red), early phase (blue), mid-phase (yellow), or late phase (orange) to evaluate the trained model (Fig. 2).

Image preprocessing and dataset creation

We divided all whole slide images (WSIs) into training and test data as shown in Table 2. We created the annotated test dataset by selecting one image each from Days 0, 3, 5, 7, and 14 (Day 0 refers to preinjection). Expert manual class annotation was performed using Labelbox, categorizing the fiber conditions into four phases: stable (red), early phase (blue), mid-phase (yellow), and late phase (orange)45. Red represents intact or fully recovered myofibers. Blue indicates an area of early regeneration characterized by no or low MyoD expression and a basal lamina but no nuclei, as shown in Fig. 4(a). Yellow marks the middle phase of regeneration, characterized by notable MyoD expression, as shown in Fig. 4(b). Orange indicates the late phase, including small (eMyHC-high) and large (eMyHC-low) myotubes, both of which have central myonuclei, as shown in Fig. 4(c). The validity of this manual class annotation by visual inspection of specialists, and the accuracy of MyoRegenTrack, are confirmed by comparison with protein markers45 obtained from immunostaining of serial sections in Fig. 4. For more detail, see Supplementary Figs. S6-S8.

Fig. 4

Images validating the expert manual annotations. In the rough annotation and MyoRegenTrack panels, each color corresponds to blue: early phase, yellow: mid-phase, orange: late phase, red: stable; white indicates areas where no cells were detected. In the immunostaining panels, (a) red on Day 3 indicates MyoD expression; (b),(c) red on Days 5 and 7 indicates embryonic myosin heavy chain (eMyHC); (a)-(c) green shows the cell membrane marked by laminin, and blue represents the nuclei stained with DAPI.

Areas where decisions could not be made were left unannotated and marked in white; regions outside the muscle tissue were also masked in white. Owing to GPU memory limitations, we split the WSIs into grids of \(256 \times 256\) [pixel] images during the training and inference processes. All \(256 \times 256\) [pixel] training images were augmented by rotating them by 90, 180, and 270 degrees and flipping them with the Flip function of OpenCV 4.8.1 (Python 3.7.13), increasing the data eightfold. We also conducted photometric data augmentation, applying RandomBrightness (p=0.5), RandomContrast (p=0.5), and RandomGamma (p=0.5) operations to simulate random optical conditions using albumentations v1.3.1, doubling the number of images. The full augmentation process thus increased the number of images 16-fold. To train the classifier, we randomly cropped 100 images of \(64 \times 64\) [pixel] per grid image (\(256 \times 256\) [pixel]) to ensure proper alignment with the roughly annotated proportions shown in Fig. 5.
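A minimal sketch of this 16-fold augmentation follows, assuming grid images are loaded as NumPy arrays; the function name is hypothetical, and recent albumentations releases fold RandomBrightness and RandomContrast into RandomBrightnessContrast, which is used here in their place.

```python
import cv2
import albumentations as A

# Photometric augmentation simulating random optical conditions, per the
# transforms named above (each applied with probability 0.5).
photometric = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.RandomGamma(p=0.5),
])

def augment_grid_image(img):
    """Return the 16 augmented variants of one 256x256 grid image."""
    variants = []
    for flip in (False, True):
        base = cv2.flip(img, 1) if flip else img  # horizontal flip
        for rot in (None, cv2.ROTATE_90_CLOCKWISE, cv2.ROTATE_180,
                    cv2.ROTATE_90_COUNTERCLOCKWISE):
            geo = base if rot is None else cv2.rotate(base, rot)
            variants.append(geo)                               # geometric copies (8x)
            variants.append(photometric(image=geo)["image"])   # photometric copies (16x)
    return variants
```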

Finetuning of cellpose for cell segmentation

As prior cell segmentation methods do not generalize satisfactorily to skeletal muscle in the recovery process, we finetuned a Cellpose model46 to segment all cells that appear during recovery. We split the WSIs in the training dataset into grids of \(256 \times 256\) [pixel] images, manually annotated cell edges in every grid image using the GUI provided by Cellpose40, and removed images containing no cells. We used the annotated training data to finetune the cyto2 model of Cellpose version 2.0.3. Channels were set to [0, 0] during finetuning, indicating that cells were identified in grayscale and no nucleus channel was used. Other parameters were kept at their default settings.
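A minimal sketch of this finetuning step, based on the Cellpose 2.x Python API; the placeholder images, masks, and output directory below are hypothetical, not taken from the paper.

```python
import numpy as np
from cellpose import models

# Hypothetical stand-ins for the annotated 256x256 grid images and their
# integer label masks drawn with the Cellpose GUI (images with no cells
# are assumed to have been removed already).
imgs = [np.random.rand(256, 256).astype(np.float32) for _ in range(8)]
masks = []
for _ in range(8):
    m = np.zeros((256, 256), dtype=np.int32)
    m[64:128, 64:128] = 1  # one labeled cell region
    masks.append(m)

# Start from the pretrained cyto2 model, as described above.
model = models.CellposeModel(gpu=True, model_type="cyto2")

model.train(
    train_data=imgs,
    train_labels=masks,
    channels=[0, 0],                 # grayscale cells, no nucleus channel
    save_path="finetuned_cellpose",  # hypothetical output directory
)

# Inference on grid images with the finetuned weights.
pred_masks, flows, styles = model.eval(imgs, channels=[0, 0])
```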

Computing the proportion of recovery phase on different days

During recovery of skeletal muscle tissue, damaged-myofiber-derived factors (DMDFs) produced by damaged myofibers activate satellite cells, which proliferate and differentiate to regenerate the myofibers3. This process can be divided into four recovery phases: stable (denoted red), early phase (denoted blue), mid-phase (denoted yellow), and late phase (denoted orange), as shown in Fig. 5(a). We define the recovery phases as classes \(C = \{1,\dots,k,\dots,K\}\) (i.e., "stable", "early", "mid", and "late") and manually performed rough class annotation of the training data in Table 2 using the Windows application Paint (version 11.2404.45.0). As an index of roughness, the annotations were drawn with a circular pen of at least 100 pixels in the Paint application preinstalled in Windows. These annotations were intended only to determine class proportions and therefore cannot serve as an accurate per-fiber classification for evaluating our classifier. By counting the pixels of each class in the rough annotations and aggregating the totals per daily label \(j \in \{\text{day0}, \text{day3}, \text{day5}, \text{day7}, \text{day14}\}\), we obtained the class proportion \(\textbf{p}_j = [p_1,\dots,p_k,\dots,p_K] \in [0,1]^K\), \(\Vert \textbf{p}_j \Vert_1 = 1\), over the classes C for each day in the training dataset, as shown in Fig. 5(b).
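A minimal sketch of this proportion computation, assuming each rough annotation is an RGB array whose pen colors exactly match the class colors; the color values and input grouping below are hypothetical.

```python
import numpy as np

CLASS_COLORS = {                      # K = 4 classes
    "stable": (255, 0, 0),            # red
    "early":  (0, 0, 255),            # blue
    "mid":    (255, 255, 0),          # yellow
    "late":   (255, 165, 0),          # orange
}

def day_proportion(annotation_images):
    """Aggregate pixel counts over all rough annotations of one daily label
    and return p_j with ||p_j||_1 = 1."""
    counts = np.zeros(len(CLASS_COLORS), dtype=np.int64)
    for ann in annotation_images:     # ann: HxWx3 uint8 array
        for k, color in enumerate(CLASS_COLORS.values()):
            counts[k] += np.all(ann == color, axis=-1).sum()
    return counts / counts.sum()
```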

Fig. 5

The method for calculating the class proportions corresponding to the daily label. Note that Day 0 is before the CTX injection and all fibers are considered intact, i.e., the image is 100% red. (a) The recovery progress of each tissue region after CTX injection. (b) Procedure for calculating each day's proportions from the training data based on rough annotation and pixel counting software.

Inference pipeline

We describe the inference pipeline illustrated in Fig. 6(a); an overview diagram for the user is provided in Supplementary Fig. S9. WSIs are split into grids of \(256 \times 256\) [pixel] before cell segmentation with Cellpose. The grid image size is adjustable and not critical to classification accuracy, as it is only used to facilitate cell segmentation with Cellpose. After segmentation, we obtain the contour and centroid of each cell object and crop a \(64 \times 64\) [pixel] image centered on each cell; in this paper, we refer to these \(64 \times 64\) [pixel] images as "cell images." We extract image features from the cell images with a generic feature extractor, DINO24, to obtain feature instances \(\textbf{x} \in \mathbb{R}^D\) (\(D \in \mathbb{N}\): the dimension of the image feature). We use the DINO ViT-B/8 model, a vision transformer backbone with a patch size of 8, with the student checkpoint and average pooling of patch tokens enabled; the other parameters are kept at their default settings. The obtained image features are passed to a trained classifier to predict the recovery phase of each cell.
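A minimal sketch of this feature-extraction step, loading DINO from the official torch.hub repository; the preprocessing follows common DINO usage and is an assumption here, and for simplicity the hub model's default forward (CLS token, D = 768) is used rather than the average-pooled patch tokens mentioned above.

```python
import torch
import torchvision.transforms as T

# Official DINO ViT-B/8 backbone (patch size 8).
dino = torch.hub.load("facebookresearch/dino:main", "dino_vitb8")
dino.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def extract_feature(cell_image):
    """Map one 64x64 RGB cell crop to a feature vector x in R^D."""
    batch = preprocess(cell_image).unsqueeze(0)  # 1x3x64x64; 64 is divisible by the patch size 8
    return dino(batch).squeeze(0)                # D-dimensional feature
```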

We next describe the architecture of our classifier, which predicts the classes \(C = \{1,\dots,k,\dots,K\}\) (e.g., red, blue, yellow, and orange) from image features \(\textbf{x} \in \mathbb{R}^D\). As shown in Fig. 6(b), the classifier learns the mapping \(\mathcal{F}: \mathbb{R}^D \rightarrow \mathbb{R}^K_+\) \((\sum_{k=1}^{K} \mathcal{F}(\textbf{x})_k = 1)\). It consists of a three-layer perceptron with a rectified linear unit (ReLU27) activation function between consecutive fully connected layers. The K-dimensional output of the classifier is normalized with a SoftMax function so that the outputs sum to 1, representing the confidence of each class.
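A minimal PyTorch sketch of this classifier, following the FC1/FC2/FC3 layout described in Fig. 6(b); the hidden widths are assumptions (the paper only names the compressed dimension \(D'\)).

```python
import torch.nn as nn

class RecoveryPhaseClassifier(nn.Module):
    """Three-layer perceptron F: R^D -> R^K_+ whose outputs sum to 1."""

    def __init__(self, d_in, d_hidden, d_prime, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),        # FC1
            nn.ReLU(),
            nn.Linear(d_hidden, d_prime),     # FC2: compress to D'
            nn.ReLU(),
            nn.Linear(d_prime, num_classes),  # FC3: output dimension K
            nn.Softmax(dim=-1),               # confidences sum to 1
        )

    def forward(self, x):
        return self.net(x)
```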

Fig. 6

We describe the inference pipeline for the recovery phase classification of all cells from a WSI input in (a) and illustrate the architecture of the recovery phase classifier in (b). The learning process in the class proportion strategy for training our classifier is illustrated in (c). (a) Because of the high resolution of WSIs, it is difficult to perform segmentation directly with Cellpose, so we split the images into \(256 \times 256\) [pixel] grids. We calculated the centroid coordinates of each cell from the segmentation result of the finetuned Cellpose and cropped a \(64 \times 64\) [pixel] image for each cell. These cell images were passed to a feature extractor to obtain the image features \(\textbf{x} \in \mathbb{R}^D\) (\(D \in \mathbb{N}\)). The obtained image features were then fed into a classifier implemented as a three-layer perceptron, which derived the confidence scores for each recovery phase. The class with the highest confidence score is the predicted recovery phase of the cell. (b) The architecture of the classifier. An image feature vector of dimension D was passed to the first fully connected layer (FC1) and processed with a ReLU activation function. It was then passed to the next fully connected layer, FC2, and compressed to dimension \(D'\). After another ReLU, the features were input to the FC3 layer, which outputs a vector of dimension K. We applied a SoftMax function to the K-dimensional output to obtain the confidence score for each recovery phase; the class with the highest score is the predicted recovery phase. (c) Our training pipeline using the LLP strategy. We randomly cropped 100 images of \(64 \times 64\) [pixel] per grid image and divided all cropped images into bags of N images each, where each bag contains only images from the same day. We then obtained the confidence scores of all images, computed the sample proportion of each bag as the average of the confidence scores, and minimized the Kullback-Leibler divergence of the ground truth proportion shown in Fig. 5 from the sample distribution.

Pseudo-Label30,47 is a weakly supervised learning method in which the daily label j is used to define the class probability distribution \(\textbf{p}_j \in [0,1]^K\), \(\Vert \textbf{p}_j \Vert_1 = 1\). Given an instance \(\textbf{x} \in \mathbb{R}^D\) with daily label j, a random class in C is drawn according to \(\textbf{p}_j\) and encoded as a one-hot vector \(\textbf{y} \in \{0,1\}^K\). We update the model by calculating the cross-entropy loss \(\mathcal{L}_{pseu}\) over each batch, as in the equation below.

$$\begin{aligned} \mathcal{L}_{pseu} = -\sum \textbf{y} \cdot \log \mathcal{F}(\textbf{x}) \qquad \qquad (\because \textbf{y} \in \{0,1\}^K,\ P(y_k=1) = p_k) \end{aligned}$$

(1)
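For concreteness, a minimal PyTorch sketch of this pseudo-label baseline, assuming a classifier `model` that outputs normalized confidences (e.g., the perceptron above); all variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, x, p_j):
    """x: batch of features (B x D); p_j: class distribution (K,) of the
    batch's daily label. One-hot labels are drawn randomly from p_j."""
    y = torch.multinomial(p_j, num_samples=x.shape[0], replacement=True)  # (B,) sampled classes
    conf = model(x)                                # (B, K), rows sum to 1
    return F.nll_loss(torch.log(conf + 1e-12), y)  # cross-entropy on sampled labels
```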

We train our classifier model using learning from label proportion (LLP)31. LLP is a weakly supervised learning method that predicts the class of each instance given the class proportions of bags of instances. In our research, we construct bags by sampling images from the same day and use the proportion of recovery phases on the corresponding day (Fig. 5) as the ground truth proportion. We optimize our classifier with a proportion loss \(\mathcal{L}_{prop}\), the KL divergence \(D_{KL}: \mathbb{R}^K \times \mathbb{R}^K \rightarrow \mathbb{R}_+\) between the ground truth proportion \(\textbf{p}_j \in [0,1]^K\) and the predicted proportion \(\hat{\textbf{p}}_{j} \in [0,1]^K\), as follows.

$$\begin{aligned} \mathcal{L}_{prop} = D_{\text{KL}}(\textbf{p}_j \Vert \hat{\textbf{p}}_{j}) = \sum_{k=1}^{K} p_k \log \left( \frac{p_k}{\hat{p}_k} \right) \end{aligned}$$

(2)

As illustrated in Fig. 6(b,c), we compute image features \(\textbf{x} \in \mathbb{R}^D\) (D: the dimension of the image feature) associated with the daily labels j from each WSI following the procedure of Fig. 6(a). We group the instances \(\textbf{x}\) into M bags \(\mathbb{B}_1,\dots,\mathbb{B}_m,\dots,\mathbb{B}_M\), each containing N instances (the bag size), \(\mathbb{B}_m = \{\textbf{x}_1, \textbf{x}_2,\dots,\textbf{x}_N\}\), with the class proportion \(\textbf{p}_j \in [0,1]^K\) \((\Vert \textbf{p}_j \Vert_1 = 1)\) corresponding to the daily label j. The loss \(\mathcal{L}_{prop}\) is calculated for each bag; the predicted proportion \(\hat{\textbf{p}}_{j}\) is therefore also computed per bag. For each instance \(\textbf{x} \in \mathbb{R}^D\) in the bag \(\mathbb{B}_m\), we obtain the confidence \(\mathcal{F}(\textbf{x}) \in [0,1]^K\) from the classifier \(\mathcal{F}: \mathbb{R}^D \rightarrow \mathbb{R}^K_+\). The predicted distribution \(\hat{\textbf{p}}_{j} = [\hat{p}_1,\dots,\hat{p}_k,\dots,\hat{p}_K] \in [0,1]^K\), \(\Vert \hat{\textbf{p}}_{j} \Vert_1 = 1\), of the classes \(C = \{1,\dots,k,\dots,K\}\) (e.g., red, blue, yellow, and orange) for a bag \(\mathbb{B}\) is calculated as the average of these confidence values \(\mathcal{F}(\textbf{x})\) as follows.

$$\begin{aligned} \hat{p}_k = \frac{1}{|\mathbb{B}|} \sum_{\textbf{x} \in \mathbb{B}} \mathcal{F}(\textbf{x})_k \qquad \qquad \qquad (\because \sum_{k=1}^{K} \hat{p}_k = 1) \end{aligned}$$

(3)
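A minimal PyTorch sketch of this per-bag proportion loss, combining Eqs. (2) and (3); it assumes a classifier whose outputs sum to 1, and names such as `bag` are hypothetical.

```python
import torch

def proportion_loss(model, bag, p_j, eps=1e-12):
    """bag: N x D features from one day; p_j: ground-truth proportion (K,).
    Returns the KL divergence D_KL(p_j || p_hat) of Eq. (2)."""
    conf = model(bag)          # N x K confidences, rows sum to 1
    p_hat = conf.mean(dim=0)   # Eq. (3): bag-level predicted proportion
    return torch.sum(p_j * torch.log(p_j / (p_hat + eps) + eps))
```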

Recovery score and cell area rate

As shown in equation (4), the recovery score is calculated by weighting the predicted proportion \(\hat{\textbf{p}}_{j}\) with the weight vector \(\varvec{\omega}\).

$$\begin{aligned} \textrm{RecoveryScore} = \hat{\textbf{p}}_{j} \cdot \varvec{\omega} = \sum_{k=1}^{K} \hat{p}_k \cdot \omega_{k} \quad (\because \Vert \hat{\textbf{p}}_{j} \Vert_1 = 1,\ \varvec{\omega} \in [0,1]^K) \end{aligned}$$

(4)

The tissue's recovery score over time is modeled with the sigmoid function \(\sigma(x) = \frac{1}{1 + e^{-a(x-d)}}\), where the gain \(a > 0\) is the slope of the sigmoid curve, reflecting the speed of recovery, and the inflection point \(d \in \mathbb{N}\) [day] corresponds to the date at which the tissue switches to recovery. For this dataset, we set \(a = 0.65\) and \(d = 6\) [day]. Using the true proportions \(\textbf{p}_j \in [0,1]^K\), \(\Vert \textbf{p}_j \Vert_1 = 1\), obtained individually for each WSI in Fig. 5, we derived the weights \(\varvec{\omega} \in [0,1]^K\) through the least squares method using NumPy v1.20.3 with the function linalg.lstsq(\(\textbf{p}_j, \sigma(j)\)).
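A minimal NumPy sketch of this weight fitting and of the score in equation (4); the example proportion matrix and day labels are hypothetical, and note that np.linalg.lstsq solves the unconstrained least squares problem, whereas the weights are stated above to lie in \([0,1]^K\).

```python
import numpy as np

a, d = 0.65, 6  # gain and inflection day used for this dataset

def sigmoid(day):
    return 1.0 / (1.0 + np.exp(-a * (day - d)))

# Hypothetical example inputs: per-WSI true proportions p_j (rows) and the
# matching daily labels. Each row sums to 1 over the K = 4 classes.
P = np.array([[1.0, 0.0, 0.0, 0.0],     # a Day 0 WSI
              [0.1, 0.6, 0.2, 0.1],     # a Day 3 WSI
              [0.8, 0.0, 0.05, 0.15]])  # a Day 14 WSI
days = np.array([0, 3, 14])

# Solve P @ omega = sigmoid(days) in the least-squares sense.
omega, *_ = np.linalg.lstsq(P, sigmoid(days), rcond=None)

def recovery_score(p_hat):
    """Equation (4): weighted sum of the predicted proportion."""
    return float(p_hat @ omega)
```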

Statistical analysis

In Fig. 1(d,e), the sample size for the mean IoU is the number of cells on each date, and the sample size for the F1 score is the number of \(256 \times 256\) [pixel] images (about 100-124) segmented from each WSI. Since the Kolmogorov-Smirnov test could not confirm normality, non-parametric tests were conducted to derive p-values and effect sizes. The p-values from the Mann-Whitney U test, the effect sizes (Cliff's delta), and their confidence intervals were calculated using the Python library scipy.stats (version 1.7.3). The graphs were plotted using the Python library matplotlib.pyplot (version 3.5.1).
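A minimal sketch of these tests using scipy.stats as above; Cliff's delta is not a scipy function, so the pairwise formula below is a standard definition supplied here as an assumption.

```python
import numpy as np
from scipy import stats

def compare_groups(scores_a, scores_b):
    """Return the Mann-Whitney p-value and Cliff's delta for two samples."""
    # Two-sided Mann-Whitney U test (non-parametric, no normality assumed).
    _, p_value = stats.mannwhitneyu(scores_a, scores_b, alternative="two-sided")

    # Cliff's delta: P(a > b) - P(a < b) over all pairs of observations.
    a = np.asarray(scores_a)[:, None]
    b = np.asarray(scores_b)[None, :]
    delta = np.sign(a - b).mean()
    return p_value, delta
```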