
Rapid deep learning-assisted predictive diagnostics for point-of-care testing – Nature Communications

Workflow of TIMESAVER for fast assay

Figure 1 presents three representative commercialized diagnostic tools: commercial LFA, PCR, and ELISA, along with their performance in terms of time, labor, cost, and accuracy. Generally, commercial PCR and ELISA tests take several hours, are labor-intensive, and incur higher costs. In contrast, rapid kits typically provide cost-effective, on-site diagnostics. We introduce TIMESAVER-assisted LFA, an approach that combines a time-series deep learning architecture, AI-based verification, and enhanced result analysis to optimize LFA immunoassays. Our objective is to establish the fastest diagnostic time among existing commercially available kits while maintaining accuracy and affordability. Conventional rapid kit protocols typically require 10–20 min for analysis, posing challenges in time-sensitive applications such as emergency medicine, infectious disease management, neonatal care, and heart attack and stroke, where further assay time reduction is crucial.

Fig. 1: Figure of merit of AI-powered TIMESAVER algorithm.

This schematic illustrates three representative commercialized diagnostic tools: commercial LFA, PCR, and ELISA, along with their performance metrics, including time, labor, cost, and accuracy. The TIMESAVER algorithm, utilizing a comprehensive time-series deep learning architecture, provides enhanced result analysis through AI-based verification, all within a rapid 1–2 min assay time, outperforming human experts with a 15-min assay. LFA, lateral flow assay; PCR, polymerase chain reaction; ELISA, enzyme-linked immunosorbent assay; TIMESAVER, Time-Efficient Immunoassay with Smart AI-based Verification; AI, artificial intelligence.

As shown in Fig. 1, our approach utilizes a time-series deep learning architecture and AI-based verification, resulting in a significant reduction in assay time to within 1–2 min using TIMESAVER. A more detailed discussion of the time-series deep learning architecture, known as the TIMESAVER algorithm, is provided in Fig. 2. This algorithm is specifically designed for learning from time-series data and has effectively reduced diagnosis times. Notably, the results demonstrate diagnosis times as short as 1–2 min for LFA when utilizing a smartphone or reader (See Supplementary Movie 1).

Fig. 2: Algorithm optimization.

a TIMESAVER algorithm comprises object detection using YOLO, time-series image analysis through CNN-LSTM, and the FC layer. b Object finding: comparison of two ROI selection methods in LFA. Selecting only the test line resulted in higher accuracy (95.2%) than using the window (92.9%). c Data augmentation using RGB and HSV: combining both yielded an accuracy of 97.6%. d CNN model optimization: ResNet-50 demonstrated the highest accuracy at 97.6%, outperforming the other models. e LSTM optimization: LSTM achieved an accuracy of 97.6%, while GRU obtained 91.7%. f Trade-off between root mean squared error (RMSE) and normalized GPU memory consumption, illustrating the AI-based optimized assay time. g Time-series images: sequential images show the progression of the assay over time. TIMESAVER, Time-Efficient Immunoassay with Smart AI-based Verification; CNN, convolutional neural network; LSTM, long short-term memory; FC, fully connected; ROI, region of interest; GRU, gated recurrent unit; RMSE, root mean squared error; GPU, graphics processing unit; AI, artificial intelligence.

Model optimization for TIMESAVER algorithm

In Fig. 2, we present the TIMESAVER deep learning architecture used for predicting results, which consists of three interconnected components: YOLO, CNN-LSTM, and the FC layer. Figure 2a illustrates the overall scheme: YOLO first crops the test-line region from the full image, the cropped frames are then processed through the CNN and LSTM networks to generate a vector representation, and finally the combined CNN and LSTM outputs are passed through the FC layer to produce the predicted result.
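The three-stage data flow described above can be traced at the level of tensor shapes. The sketch below is illustrative only: the crop size, the 2048-d ResNet-50 feature width, the 256-unit LSTM hidden size, and the five-class head are our assumptions, not the authors' exact configuration.

```python
# Shape-level sketch of the three-stage pipeline: YOLO crop -> per-frame
# CNN features -> LSTM sequence summary -> FC class scores.
# All dimensions are assumed values for illustration.

FRAMES = 12          # a 2-min assay sampled at 10-s intervals
FEAT_DIM = 2048      # ResNet-50 global-pooled feature size
HIDDEN = 256         # assumed LSTM hidden size
N_CLASSES = 5        # high / middle / mid-low / low / negative

def yolo_crop(full_image_shape):
    """Stage 1: object detection crops the test-line ROI from the full frame."""
    # full frame (H, W, 3) -> fixed-size ROI crop (assumed 64x224x3)
    return (64, 224, 3)

def cnn_features(roi_shape, n_frames=FRAMES):
    """Stage 2a: the CNN embeds each cropped frame into a feature vector."""
    return (n_frames, FEAT_DIM)

def lstm_encode(seq_shape, hidden=HIDDEN):
    """Stage 2b: the LSTM summarizes the frame sequence into one vector."""
    n_frames, _ = seq_shape
    return (hidden,)

def fc_head(vec_shape, n_classes=N_CLASSES):
    """Stage 3: the FC layer maps the sequence summary to class scores."""
    return (n_classes,)

roi = yolo_crop((720, 1280, 3))
seq = cnn_features(roi)
summary = lstm_encode(seq)
scores = fc_head(summary)
print(roi, seq, summary, scores)
```

In a real implementation each function would be a network module; the point here is only the wiring of crop, per-frame embedding, temporal aggregation, and classification.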

Region of interest (ROI) selection is a crucial step in rapid kit diagnosis (Fig. 2b). A well-chosen ROI improves detection of the specific concentration of the target biomarker or pathogen, increasing sensitivity and specificity and minimizing false negatives and false positives39. As detailed in our previous research, we investigated two methods for ROI selection in LFAs: focusing on the window versus the test line exclusively. The window-centered approach achieved a prediction accuracy of 92.9%, while focusing exclusively on the test line raised the prediction accuracy to 95.2%. Data augmentation is a vital technique, particularly for limited or imbalanced datasets (Fig. 2c). It applies various transformations to existing data, generating synthetic images that enrich the dataset and enhance the model’s robustness. In our study, we acquired RGB channel images and transformed them into HSV channel images. RGB alone achieved an accuracy of 95.2%, HSV alone achieved 64.3%, and combining RGB and HSV yielded the highest accuracy of 97.6%.
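The RGB+HSV augmentation amounts to keeping each RGB frame and adding an HSV-converted copy, doubling the training views. A minimal standard-library sketch follows; a production pipeline would use OpenCV or NumPy on full-resolution images, and the toy pixel values here are hypothetical.

```python
# Sketch of the RGB->HSV augmentation: each RGB image is retained and an
# HSV-converted copy is appended to the training set.
import colorsys

def rgb_image_to_hsv(image):
    """Convert a nested-list RGB image (channel values in 0..1) to HSV."""
    return [[colorsys.rgb_to_hsv(*px) for px in row] for row in image]

def augment(dataset):
    """Return the original RGB images plus their HSV counterparts."""
    return dataset + [rgb_image_to_hsv(img) for img in dataset]

# One hypothetical 1x2 "image": a reddish pixel and a grey pixel.
rgb = [[(0.8, 0.1, 0.1), (0.5, 0.5, 0.5)]]
both = augment([rgb])
print(len(both))  # 2: one RGB view + one HSV view per input image
```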

In Fig. 2d, we optimized the CNN model. For feature extraction from images, we used a CNN, the standard architecture for image recognition and processing in computer vision applications. Among the four frameworks evaluated (ResNet-18, ResNet-34, ResNet-50, DenseNet-12139), ResNet-50 exhibited the highest accuracy at 97.6%, surpassing the performance of the shallower models. In Fig. 2e, we optimized the recurrent model. For forecasting with time-series data, we evaluated advanced recurrent neural network (RNN) algorithms, including LSTM and the gated recurrent unit (GRU). LSTM excels at handling sequential data and addresses the vanishing gradient problem through a sophisticated memory cell. LSTM achieved an accuracy of 97.6%, while GRU obtained 91.7%.

In Fig. 2f, we present the trade-off curve between root mean squared error (RMSE) and normalized graphics processing unit (GPU) memory consumption across various assay time frames, effectively illustrating the AI-based optimized assay time. Note that assay time here refers to the span of sequential images used in training and testing. As we incorporated additional time-series data, the RMSE values decreased exponentially, indicating enhanced accuracy; however, this improvement was accompanied by a linear increase in GPU memory consumption. Controlling GPU memory consumption is a key parameter for achieving optimal deep learning operation, as higher consumption leads to longer training/test times and requires more expensive hardware. Consequently, we postulate that a 2-min time series may represent the optimal condition when employing the TIMESAVER model.

In Fig. 2g, we show the acquired images over time. After approximately 30 s, the samples loaded in the sample reservoir reached the test line, and the test line appeared after 1–2 min, depending on the concentration/titer of the target. All images were taken at 10-s intervals, yielding 6 images per minute. For example, for a 2-min assay, we trained on 12 sequential images and then tested on sequential images spanning the same 2-min window. Interestingly, on the 1–2 min time scale, the background signal appeared unclear to the naked eye; however, the TIMESAVER model could still detect the colorimetric signal with high accuracy.
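The frame-count arithmetic above (10-s capture interval, 6 frames per minute, 12 frames for a 2-min assay) reduces to one line:

```python
# Frame-count arithmetic implied by the capture protocol: images are taken
# every 10 s, so each assay minute contributes 6 sequential frames.

CAPTURE_INTERVAL_S = 10

def frames_for_assay(assay_minutes, interval_s=CAPTURE_INTERVAL_S):
    """Number of sequential images available for a given assay duration."""
    return (assay_minutes * 60) // interval_s

print(frames_for_assay(1))  # 6 images for a 1-min assay
print(frames_for_assay(2))  # 12 images for a 2-min assay
```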

Assay of infectious diseases via TIMESAVER

Figure 3 presents the assessment of infectious diseases, specifically COVID-19 antigen and Influenza A/B, using a 2-min assay facilitated by the TIMESAVER model. To assess the diagnostic accuracy of COVID-19 in Fig. 3a, we employed standard data (target protein spiked into rapid kit running buffer) and trained the TIMESAVER model using our training set, which included both the training data (n = 594) and a validation subset (10% of the training set). We developed a regression model for TIMESAVER and categorized the regression values into five classes: high, middle, mid-low, low, and negative control. It is important to note that we categorized the images from the data into these classes following the manufacturer’s supplied guidelines, as follows: high (levels 8–7), middle (levels 6–5), mid-low (levels 4–3), low (levels 2–1), and negative control (level 0). The manufacturer’s color chart is presented in Fig. S1. Utilizing standard data enables categorization into five classes, as opposed to the binary classification employed with clinical samples; consequently, we can conduct a more comprehensive examination of the underlying causes of false positive and false negative signals. Since each dataset comprises 12 time frame images at 10-s intervals, the total number of images used for training was 7,128. We conducted tests with 84 data (54 positive and 30 negative). Our results indicate that the AI-based decision-making process, performed within 2 min, achieved a sensitivity of 96.3%, specificity of 100%, and accuracy of 97.6%, showcasing the strength of the TIMESAVER model in making initial decisions.
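The reported metrics follow directly from confusion counts. With 54 positives and 30 negatives, a sensitivity of 96.3% is consistent with 52 true positives and 2 false negatives, and 100% specificity with 30 true negatives; these counts are our reconstruction, not stated by the text.

```python
def metrics(tp, fn, tn, fp):
    """Standard binary diagnostic metrics from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# 54 positives, 30 negatives; counts inferred from the reported percentages.
sens, spec, acc = metrics(tp=52, fn=2, tn=30, fp=0)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # 96.3% 100.0% 97.6%
```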

Fig. 3: Evaluation of a commercial LFA for infectious disease (COVID-19, Influenza A/B) with a 2-min assay using the TIMESAVER Model.

a–c COVID-19 assay: a TIMESAVER achieved a sensitivity of 96.3%, specificity of 100%, and accuracy of 97.6% from 84 data, demonstrating the model’s proficiency in making initial decisions. b The ROC curve illustrates an AUC of 0.99. c The confusion matrix highlights that the false negatives were primarily associated with low-concentration samples. d–f Universality of the COVID-19 assay, assessed on various commercialized LFA kits: d The average sensitivity, specificity, and accuracy across these five different models (n = 600), each with distinct form factors, were 94.5%, 93.5%, and 94.2%, respectively. e The AUC value reached 0.98, as shown in the ROC curve. f The confusion matrix demonstrates that the ability to discriminate lower concentrations and negative controls is pivotal in LFA assays for achieving higher accuracy. g–i Influenza assay: g The sensitivity, specificity, and accuracy of the influenza model were 93.8%, 100%, and 95.8%, respectively. h The AUC value attained 0.97, as indicated by the ROC curve. i The confusion matrix indicates that the false negatives were predominantly linked to samples with low concentrations. TIMESAVER, Time-Efficient Immunoassay with Smart AI-based Verification; ROC, receiver operating characteristic; AUC, area under the curve; LFA, lateral flow assay.

In Fig. 3b, c, we present receiver operating characteristic (ROC) curves and a confusion matrix for the 2-min assay of COVID-19 using the TIMESAVER algorithm. ROC curves provide a comprehensive view of the model’s performance, with a higher area under the curve (AUC) indicating better classification ability. Our analysis revealed that the TIMESAVER model achieved an AUC of 0.99, affirming its excellent performance as an assay classifier. The confusion matrix (Fig. 3c) highlights the critical nature of accurately diagnosing low-concentration data, a challenge even for experts when relying solely on visual inspection. We observed that the false negatives (n = 2) were caused by the low-concentration samples. This provides valuable insights for improving sensitivity and specificity. One viable strategy involves augmenting the training data. By incorporating more data with low concentrations, we can fine-tune sensitivity and specificity, as demonstrated in our previous paper39. In the following section, we will demonstrate the enhanced accuracy of our clinical assay. This will be achieved by integrating clinical data from 84 patients, including 13 with Ct values > 29, corresponding to low concentration/titer, and 32 healthy controls, as detailed in Fig. 5.
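The AUC summarized by an ROC curve has an equivalent rank-statistic reading: it is the probability that a randomly chosen positive sample receives a higher model score than a randomly chosen negative one (the Mann–Whitney formulation). A tiny pure-Python sketch with hypothetical scores, not the study's data:

```python
# AUC via the Mann-Whitney pairwise-comparison formulation: the fraction of
# positive/negative pairs ranked correctly, counting ties as half.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 3 positives, 3 negatives, one mis-ranked pair.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(auc(scores, labels))  # 8 of 9 pairs correct -> 8/9
```

A perfect classifier ranks every positive above every negative and scores an AUC of 1.0, which is why the 0.99 reported above indicates near-perfect separation.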

Universality is a key characteristic of the TIMESAVER algorithm. We validated its universality by assessing its performance on various commercialized LFA models (Fig. 3d–f). In this study, we tested an additional five LFA models (n = 600, Fig. 3d, Fig. S2, Supplementary Table 3). We retrained the TIMESAVER model with an additional set of time-series data (n = 300) combined with the pre-existing dataset (n = 594), resulting in a total training dataset of 894. Given that each dataset consists of 12 time frame images, the total number of images used for training amounted to 10,728. To test the algorithm, we applied the TIMESAVER model initially trained with LFA model 1 (COVID-19 Ag LFA kits, Calth Inc.). Interestingly, the average sensitivity and specificity across these five different models (n = 600), each with distinct form factors, were 94.5% and 93.5%, respectively. The variation in performance can be attributed to differences in membrane types, designs, materials, flow rates, and other factors among LFA kits from various manufacturers; such variations are expected given the hardware-related disparities between these LFAs. The AUC value reached 0.98, as shown in the ROC curve (Fig. 3e). Furthermore, the confusion matrix (Fig. 3f) makes evident that the ability to discriminate lower concentrations and negative controls plays a pivotal role in LFA assays for achieving higher accuracy. We anticipate that further training with various LFA models will lead to increased accuracy, as demonstrated in our previous works39.

We broadened our validation efforts to include influenza testing. The influenza kit in our study had A, B, and control lines, but due to limited sample availability, we tested only for influenza A. As illustrated in Fig. 3g, h, we report the sensitivity, specificity, and accuracy in detecting Influenza A, based on a dataset of 192 test samples. The influenza test kits exhibited a sensitivity of 93.8%, specificity of 100%, and an accuracy of 95.8%. The AUC value derived from the ROC curve was 0.97. The false negatives (n = 8) were predominantly due to samples with low concentrations, which adversely affected sensitivity. Nonetheless, the LFA enhanced by the TIMESAVER model demonstrated that a quick assay time is achievable while maintaining the sensitivity and specificity essential for effective point-of-care diagnosis.

Assay of non-infectious biomarkers for emergency room (ER) via TIMESAVER

Next, we further validated the performance of the TIMESAVER assay for non-infectious biomarkers, including Troponin I and hCG for the ER. Initially focusing on Troponin I, as shown in Fig. 4a–c, we noted its clinical relevance above 0.4 ng/mL, following previous research17. We therefore set a cut-off at 0.5 ng/mL and established a five-class multi-classification using recombinant protein, based on the LFA manufacturer’s guidelines. This involved training with 618 data, validation with 62 data, and testing with 96 data. The results yielded a sensitivity of 96.9%, specificity of 98.4%, and accuracy of 97.9% (Fig. 4a). In Fig. 4b, the AUC value from the ROC curve was 0.99, and TIMESAVER demonstrated high accuracy within a 2-min diagnostic timeframe. TIMESAVER showed some false signals at lower concentrations (Fig. 4c), which appear to reflect a limitation of the LFA rather than of the algorithm. These results confirm the effectiveness of our algorithm in achieving multi-classification within just 2 min of testing, underscoring its utility in rapid diagnostic scenarios.

Fig. 4: Evaluation of non-infectious biomarkers (Troponin I and hCG) using the TIMESAVER Model.

a–c Troponin I assay: a The sensitivity, specificity, and accuracy for the detection of Troponin I were 96.9%, 98.4%, and 97.9%, respectively. b The AUC from the ROC curve was 0.99. c The confusion matrix between ground truth (y-axis) and predicted label (x-axis). d–f hCG test: d hCG detection achieved a sensitivity of 97.5%, specificity of 95%, and accuracy of 96.7% from 60 test data. e The ROC curve produced an AUC of 0.95. f The confusion matrix. g–i Evaluating the feasibility of a 1-min assay with hCG self-tests from 94 test data: g With a 1-min TIMESAVER assay, a sensitivity of 90.6%, specificity of 93.3%, and overall accuracy of 91.5% were achieved. h The accuracy with TIMESAVER at 1 min surpassed the accuracy of five experts measuring at 5 min. i The confusion matrix illustrates that false positives and negatives were predominantly associated with lower concentrations. TIMESAVER, Time-Efficient Immunoassay with Smart AI-based Verification; hCG, human chorionic gonadotropin; ROC, receiver operating characteristic; AUC, area under the curve; LFA, lateral flow assay.

In emergency room settings, rapid diagnosis of hCG is essential, particularly for assessing pregnancy in patients. Figure 4d demonstrates the sensitivity, specificity, and accuracy for hCG detection within 2 min, using test data (n = 60). The results revealed that the sensitivity, specificity, and accuracy for hCG were 97.5%, 95.0%, and 96.7%, respectively. The AUC value derived from the ROC curve was 0.95 (Fig. 4e), and the confusion matrix (Fig. 4f) indicates the effective performance of the classifier, even when applied in a 2-min assay utilizing the TIMESAVER model.

We aimed to assess the feasibility of achieving the assay within 1 min using commercially available diagnostic tests (Fig. 4g–i). Generally, hCG self-tests exhibit rapid flow velocity, and signal readings are typically recommended after a 5-min wait according to the manufacturer’s guidelines. In our primary training data (n = 594), initially trained for COVID-19, we incorporated an additional hCG dataset (n = 24), resulting in a total training set of 618 data (Fig. 4g). The hCG dataset consisted of 30 images captured at 2-s intervals. We then used 12 images taken between 36 and 60 s. The test dataset consisted of 94 standard data. Even with a 1-min assay facilitated by TIMESAVER, we achieved a sensitivity of 90.6%, specificity of 93.3%, and an overall accuracy of 91.5%. The sensitivity, specificity, and overall accuracy of five human experts at 5 min were 90.9%, 87.3%, and 89.8%, respectively. In Fig. 4h, we observed that the accuracy with TIMESAVER at 1 min surpassed the accuracy of five experts at 5 min. As anticipated, false positives and false negatives of TIMESAVER at 1 min were primarily associated with lower concentrations (15 mIU), particularly those near the cutoff threshold (Fig. 4i).
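The hCG frame selection above can be reconstructed as follows: 30 frames at 2-s intervals span the first 60 s, and the model uses the 12 frames nearest the end of that window. The exact frame offsets (here, roughly 38–60 s) are our assumption; the text states only that 12 images between 36 and 60 s were used.

```python
# Hypothetical reconstruction of the 1-min hCG frame selection:
# 30 frames captured every 2 s cover 60 s; take the final 12 frames.

CAPTURE_INTERVAL_S = 2

frames = [(i + 1) * CAPTURE_INTERVAL_S for i in range(30)]  # timestamps 2..60 s
selected = frames[-12:]                                     # last 12 frames

print(len(frames), selected[0], selected[-1])  # 30 38 60
```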

Blind tests using clinical samples

Figure 5 illustrates the clinical evaluation of COVID-19 through blind tests. We assessed the blind tests from three different groups: untrained individuals, human experts, and TIMESAVER, utilizing 252 test data (156 positives and 96 negatives). Clinical samples were collected from COVID-19 patients at Seoul St. Mary’s Hospital, including SARS-CoV-2 patients (n = 52) and healthy controls (n = 32). The 252 test data come from three different rapid kit tests performed on this cohort (n = 84). Patient information encompassed sample collection details, variants, sex, age, and Ct values (Supplementary Tables 4, 5). All samples underwent RT-qPCR analysis, followed by the LFA assay. The data from the LFA assay were classified into five groups: high, middle, mid-low, and low titer, plus negative control, using the color chart levels (high with levels 8–7, middle with levels 6–5, mid-low with levels 4–3, and low titer with levels 2–1 for positive, and negative with level 0). Among the positive data (n = 156), we distributed the data across four groups (high: 30, middle: 48, mid-low: 39, low: 39). We also included negative data from healthy controls (n = 96).
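The five-class grouping above maps the manufacturer's color-chart levels (0–8) onto discrete titer classes, and the rule is simple enough to state directly in code:

```python
# Direct sketch of the color-chart grouping rule used for the LFA data:
# level 0 is negative; levels 1-8 fall into four titer classes.

def level_to_class(level):
    """Map a manufacturer color-chart level (0-8) to its titer class."""
    if level == 0:
        return "negative"
    if level <= 2:
        return "low"
    if level <= 4:
        return "mid-low"
    if level <= 6:
        return "middle"
    if level <= 8:
        return "high"
    raise ValueError("color-chart levels run 0-8")

print([level_to_class(l) for l in range(9)])
```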

Fig. 5: Clinical validation via blind tests.

a In the blind test, ten untrained individuals and ten human experts assessed 252 data, including various concentration levels and negative data from COVID-19 clinical samples. b, c Blind test results from a 15-min assay and a 2-min assay demonstrate the significantly faster assay with TIMESAVER (n = 3): b TIMESAVER with a 2-min assay achieved higher accuracy (80.6%) compared to human experts (78.1%) and untrained individuals (70.7%) in the 15-min assay. c In the 2-min assay, TIMESAVER maintained high accuracy (up to 80.6%), while untrained individuals and human experts experienced lower accuracy rates (59.4% and 64.6%, respectively). d Additional clinical data improved the AUC (0.80) compared to the standard dataset alone (0.76). e With a 2-min assay, TIMESAVER outperformed human experts with a 15-min assay, achieving an accuracy rate of 80.6% (compared to 78.1%) (n = 3). f The heat map shows that in the mid-low titer range, TIMESAVER demonstrated an accuracy rate of 84.6%, surpassing both untrained individuals (29.2%) and human experts (37.2%). Error bars represent standard deviation from the mean. TIMESAVER, Time-Efficient Immunoassay with Smart AI-based Verification; ROC, receiver operating characteristic; AUC, area under the curve; LFA, lateral flow assay.

For the blind test, ten untrained individuals and ten human experts each tested 252 data, including 30 high, 48 middle, 39 middle-low, 39 low, and 96 negative data. This resulted in a total of 5040 blind tests for both untrained individuals and human experts. As shown in Fig. 5a, the colorimetric assay results were captured using a custom-made charge-coupled device (CCD) camera, or potentially a smartphone camera, displaying clear positive images in high and middle concentrations. However, below the mid-low concentration, no distinct positive signal could be captured. Interestingly, the assay conducted within 2 min exhibited a larger background signal, which hindered the clear observation of the colorimetric signal by the naked eye.

We present the results of blind tests using images from a 15-min assay (Fig. 5b) followed by a 2-min assay (Fig. 5c), involving untrained individuals, human experts, and the TIMESAVER algorithm, which demonstrated a notable reduction in assay time. The 15-min assay shown in Fig. 5b was conducted following the manufacturer’s guidelines for conventional assays. On these 15-min assay images, untrained individuals reached an accuracy rate of 70.7%, while human experts attained 78.1%. The lower accuracy compared to the manufacturer’s claim of >90% sensitivity and >99% specificity can be attributed to our inclusion of a substantial number of lower-titer data. Nevertheless, the TIMESAVER model surpassed both human experts and untrained individuals in performance, achieving a higher accuracy of 80.6% even in a shortened 2-min assay.

When the assay time was reduced to 2 min (Fig. 5c), identifying clear positive signals for mid-low concentrations became problematic for the naked eye, and the reddish background often led to more false positives. As a result, the accuracy rates for untrained individuals and human experts fell to 59.4% and 64.6%, respectively. In contrast, the TIMESAVER algorithm maintained a high accuracy of 80.6% in the 2-min assay. While the accuracy of human interpretation significantly decreased at lower concentrations (lower viral load), indicating a tendency for human error in rapid assessments, the AI-driven TIMESAVER algorithm showed greater precision, effectively handling background noise and unclear colorimetric signals. This allowed for fast assays with improved accuracy, showcasing the potential of AI in enhancing rapid diagnostic techniques.

Figure 5d displays the influence of clinical training data on ROC curves. Initially, we present ROC curves trained with a standard dataset (n = 594, shown in blue and labeled as ‘standard only’). We then demonstrate improved ROC curves achieved after additional training with clinical data (n = 694, shown in red and labeled as ‘standard and clinical’). The ROC curve is a widely used tool for assessing the clinical effectiveness of diagnostic models. The AUC with the inclusion of clinical data (0.80) exceeded that with the standard dataset alone (0.76). Although the TIMESAVER algorithm with a 2-min assay might not entirely match the accuracy standards of clinical laboratories, its ability to continuously improve diagnostic accuracy through learning from acquired images is notable. By further incorporating deep learning with clinical samples, we can enhance the clinical accuracy of our diagnostic approach.

We demonstrate the capability of TIMESAVER to achieve accuracy levels comparable to those of human experts in the shortest possible time frame (Fig. 5e). We started the assay timer when the sample was introduced into the sample reservoir, capturing sequential images over time. We established five distinct datasets, each representing a different assay duration (0.5, 1, 2, 3, and 4 min). For example, for a 1-min assay, we obtained 6 images at 10-s intervals. Our TIMESAVER model required only 1 min to attain accuracy equivalent to that achieved by untrained individuals. With a 2-min assay, we achieved an accuracy rate of 80.6%, surpassing the accuracy of human experts at the 15-min mark (78.1%). As depicted in Fig. S3, the samples reached the test line within 1 min, enabling the AI to ascertain the assay results precisely during the initial color development phase. Compared with conventional human-conducted assays, TIMESAVER completes the assay in just 2 min while consistently outperforming human experts in accuracy.

The heat map indicates that human visual assessment, conducted by both untrained individuals and experts, shows a decrease in accuracy, particularly within the mid-low titer ranges (Fig. 5f). In the mid-low titer category, untrained individuals managed an average accuracy of only 29.2%, while human experts fared slightly better at 37.2%. In contrast, our algorithm achieved an accuracy rate of 84.6%. For the low titer category, the accuracy was even lower, with untrained individuals at 2.8% and human experts at 5.4%, but our deep learning algorithm significantly outperformed at 38.5% accuracy. In cases of high and middle titer concentrations, the TIMESAVER algorithm consistently provided reliable and accurate data, effectively eliminating the variability seen in human visual assessments.