The performance of the model is evaluated on these 12 patients according to the previously mentioned evaluation metrics.
Model evaluation
Table 6 presents the results of each patient's model. The errors of all 12 models are visualized in Fig. 7 for ease of comparison.
RMSE and MAE of the proposed model for all patients.
The model is competitive with, and in several cases surpasses, the glucose forecasting results reported in the Literature Review section. Per-patient RMSE values ranged from 15.96 to 21.57, with margins of error between 0.03 and 0.45. Among all patients, ID 570 achieved the lowest error (15.96), whereas ID 584 recorded the highest (21.57), reflecting variability in individual glucose dynamics. On average, the model achieved an RMSE of \(19.24 \pm 2.07\) and an MAE of \(13.64 \pm 1.40\). When confidence intervals across patients were considered, RMSE was 21.88 (95% CI: 19.61–24.15) and MAE was 15.29 (95% CI: 13.98–16.59), indicating stable performance estimates. Beyond error-based metrics, the model attained an average \(R^{2}\) of \(0.894 \pm 0.044\) (95% CI: 0.863–0.925), showing that nearly 90% of the variance in glucose trajectories was explained by the predictions. Similarly, the average MCC was \(0.806 \pm 0.049\) (95% CI: 0.772–0.840), indicating balanced classification performance across hypoglycemia, normoglycemia, and hyperglycemia. These findings confirm not only high predictive accuracy but also robustness and consistency across patients, reinforcing the clinical reliability of the proposed framework.
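The per-patient metrics and their across-patient summaries can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, and the confidence interval uses a normal approximation because the text does not state how its intervals were computed.

```python
import math
from statistics import NormalDist, mean, stdev

import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted glucose (mg/dL)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error (mg/dL)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def summarize(per_patient, level=0.95):
    """Mean, sample SD, and a CI of one metric across patients.

    Assumption: a normal (z) multiplier is used; with 12 patients a Student-t
    multiplier (about 2.20 for 11 degrees of freedom) would widen it slightly.
    """
    m, s = mean(per_patient), stdev(per_patient)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half = z * s / math.sqrt(len(per_patient))
    return m, s, (m - half, m + half)
```

In use, `rmse`, `mae`, and `r2` would be applied to each patient's test set, and `summarize` to the resulting list of 12 per-patient values.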
Although fitting an individual model to each patient can be computationally expensive, it enables personalized forecasting. This approach takes each patient's unique pattern of glucose readings into account, resulting in predictions tailored to individual patients. Moreover, the RNN architecture was found to be well suited to the problem, because glucose level forecasting does not require long historical data: a one-hour window is considered acceptable and is commonly used in the literature [29], and over such a short window the difficulty plain RNNs have with long-range dependencies is of little concern.
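A minimal sketch of how a one-hour history can be turned into training pairs for a per-patient model. It assumes 5-minute CGM sampling (12 readings per hour) and treats the 30-minute horizon mentioned later in this section as 6 steps ahead; both the sampling interval and the helper name `make_windows` are assumptions for illustration.

```python
import numpy as np

# Assumptions (not stated in the text): CGM readings every 5 minutes,
# so a one-hour history is 12 readings and a 30-minute horizon is 6 steps.
SAMPLE_MIN = 5
HISTORY = 60 // SAMPLE_MIN   # 12 input readings
HORIZON = 30 // SAMPLE_MIN   # target lies 6 steps after the last input


def make_windows(series, history=HISTORY, horizon=HORIZON):
    """Turn one patient's glucose series into (X, y) supervised pairs.

    X[i] holds `history` consecutive readings; y[i] is the reading
    `horizon` steps after the last reading in X[i].
    """
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - history - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window")
    X = np.stack([series[i:i + history] for i in range(n)])
    first_target = history + horizon - 1
    y = series[first_target:first_target + n]
    return X[..., None], y  # add a feature axis: (samples, time, 1)
```

Because one model is fitted per patient, `make_windows` would be called separately on each patient's series rather than on a pooled dataset.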
Furthermore, to assess the clinical relevance of the proposed forecasting model, Clarke Error Grid (CEG) analysis was performed, as visualized in Fig. 8. It shows that 86.24% of the predictions fell within Zone A, indicating accurate and clinically acceptable readings, while 12.23% fell in Zone B, representing benign errors with no significant impact on clinical decisions. Only 1.53% of the predictions fell into Zone C, which may lead to unnecessary treatment but does not pose a critical risk. Notably, no predictions were found in Zones D or E, which are associated with dangerous or potentially life-threatening errors. These findings support the clinical utility and safety of the proposed model.
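The zone assignment behind a Clarke Error Grid can be sketched as below. The boundaries follow a widely used open-source encoding of Clarke et al.'s grid for values in mg/dL; the text does not say which implementation produced Fig. 8, so treat this as an illustrative assumption rather than the authors' tool.

```python
from collections import Counter


def clarke_zone(ref, pred):
    """Clarke Error Grid zone of one (reference, predicted) pair in mg/dL."""
    # Zone A: within 20% of the reference, or both values at or below 70 mg/dL.
    if (ref <= 70 and pred <= 70) or (0.8 * ref <= pred <= 1.2 * ref):
        return "A"
    # Zone E: hypoglycemia read as hyperglycemia or vice versa.
    if (ref >= 180 and pred <= 70) or (ref <= 70 and pred >= 180):
        return "E"
    # Zone C: errors that could prompt overcorrection of acceptable levels.
    if (70 <= ref <= 290 and pred >= ref + 110) or (
        130 <= ref <= 180 and pred <= (7 / 5) * ref - 182
    ):
        return "C"
    # Zone D: failure to detect hypo- or hyperglycemia.
    if (
        (ref >= 240 and 70 <= pred <= 180)
        or (ref <= 175 / 3 and 70 <= pred <= 180)
        or (175 / 3 <= ref <= 70 and pred >= (6 / 5) * ref)
    ):
        return "D"
    # Everything else: benign deviation.
    return "B"


def zone_percentages(refs, preds):
    """Share of predictions (in %) falling into each zone A to E."""
    counts = Counter(clarke_zone(r, p) for r, p in zip(refs, preds))
    total = sum(counts.values())
    return {zone: 100.0 * counts[zone] / total for zone in "ABCDE"}
```

The order of the checks matters: a pair is assigned to the first zone whose condition it meets, and Zone B is the remainder.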

Clarke Error Grid analysis of the predicted values.
In addition to the CEG analysis, class-specific performance metrics were computed for the hypoglycemic, normoglycemic, and hyperglycemic ranges. Table 7 presents the precision, recall, and F1-score for each class. The model demonstrates strong performance in detecting hypoglycemic events, achieving a precision of 0.776, a recall of 0.859, and an F1-score of 0.912, underscoring its clinical utility in managing low-glucose episodes. For normoglycemia, the recall is high (0.953), indicating that most normoglycemic values are correctly identified, whereas the lower precision (0.493) suggests some overlap with adjacent classes. Hyperglycemia detection showed balanced performance, with a precision of 0.703, a recall of 0.903, and an F1-score of 0.829. These results confirm that the model not only provides clinically safe predictions, as supported by the CEG, but also exhibits strong discriminative ability across glycemic states, particularly in the critical hypoglycemic range.
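The class-level metrics can be derived from a confusion matrix. This sketch assumes the conventional thresholds of 70 and 180 mg/dL to separate the three glycemic classes (the thresholds are not restated in this section). It also computes the multiclass MCC quoted earlier using Gorodkin's generalization, the same quantity `sklearn.metrics.matthews_corrcoef` returns for more than two classes. All helper names are hypothetical.

```python
import numpy as np

CLASSES = ("hypo", "normo", "hyper")


def to_class(glucose, low=70.0, high=180.0):
    """Map mg/dL values to 0 = hypo (< low), 1 = normo, 2 = hyper (> high)."""
    g = np.asarray(glucose, dtype=float)
    return np.where(g < low, 0, np.where(g > high, 2, 1))


def confusion(y_true, y_pred, k=3):
    """k x k confusion matrix: rows are true classes, columns predicted classes."""
    cm = np.zeros((k, k), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_metrics(cm):
    """Precision, recall, and F1 for each class; F1 is their harmonic mean."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    return precision, recall, f1


def multiclass_mcc(cm):
    """Matthews correlation coefficient generalized to K classes."""
    c = np.trace(cm)          # correctly classified samples
    s = cm.sum()              # total samples
    p = cm.sum(axis=0)        # predicted count per class
    t = cm.sum(axis=1)        # true count per class
    num = c * s - np.dot(p, t)
    den = np.sqrt(float(s**2 - np.dot(p, p)) * float(s**2 - np.dot(t, t)))
    return float(num / den) if den > 0 else 0.0
```

Both reference and forecast values are first passed through `to_class`, so a forecast counts as correct for a class whenever it lands in the same glycemic range as the reference reading.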
Model explainability
To make the proposed forecasting framework clinically interpretable, explainable AI techniques were employed. Specifically, SHAP (SHapley Additive exPlanations) values were computed to assess the contribution of each engineered feature to the model's predictions. Table 8 summarizes the mean absolute SHAP values across the dataset. The average glucose over the past hour (avg) emerged as the most influential feature, followed by the trend and time-of-day-related features. This ranking is consistent with the feature importance derived from the decision tree analysis (Table 4), thereby linking engineered-feature relevance to the internal behavior of the RNN.
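To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values by enumerating feature coalitions, which is tractable for a handful of engineered features, and then ranks features by mean absolute value as in Table 8. It is not the authors' pipeline: the single-baseline value function, the feature names `trend` and `tod`, and the helper names are assumptions. The explainers in the `shap` package estimate the same kind of quantity for larger models, typically averaging over a background set rather than a single baseline.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of a scalar model f at input x.

    Features outside a coalition are set to `baseline` (for example the
    training-set mean): a single-reference value function.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                z = baseline.copy()
                z[list(S)] = x[list(S)]     # switch the coalition's features on
                v_without = f(z)
                z[i] = x[i]                 # add feature i to the coalition
                phi[i] += w * (f(z) - v_without)
    return phi


def mean_abs_shap(f, X, baseline, names):
    """Global ranking: average |phi| per feature over samples, largest first."""
    phis = np.array([exact_shapley(f, x, baseline) for x in X])
    scores = np.abs(phis).mean(axis=0)
    order = np.argsort(-scores)
    return [(names[j], float(scores[j])) for j in order]
```

By construction, the values for one sample sum to `f(x) - f(baseline)`, which is the additivity property the "Additive" in SHAP refers to.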
In parallel, the temporal focus of the RNN was investigated using an attention mechanism. The attention weight heatmap in Fig. 9 illustrates the relative importance assigned to historical glucose readings across all samples. The results indicate that more recent history steps (closer to the prediction point) received greater attention than older steps, which aligns with the clinical expectation that near-term values exert a stronger influence on short-term glucose forecasts.
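One simple way to attach attention to a vanilla RNN is to score every hidden state, normalize the scores with a softmax over time, and forecast from the weighted sum. The per-sample weights then form the rows of a heatmap such as Fig. 9. The text does not specify its attention formulation, so the class below, its hidden size, and its names are illustrative assumptions written in PyTorch.

```python
import torch
import torch.nn as nn


class AttentiveRNN(nn.Module):
    """Vanilla RNN whose hidden states are pooled by learned attention weights."""

    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.rnn = nn.RNN(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)  # one relevance score per time step
        self.head = nn.Linear(hidden, 1)   # glucose forecast from the context

    def forward(self, x):
        h, _ = self.rnn(x)                                          # (batch, time, hidden)
        weights = torch.softmax(self.score(h).squeeze(-1), dim=1)   # (batch, time)
        context = (weights.unsqueeze(-1) * h).sum(dim=1)            # (batch, hidden)
        return self.head(context).squeeze(-1), weights


def attention_heatmap(model, x):
    """Attention weights as an array: rows = samples, columns = history steps."""
    model.eval()
    with torch.no_grad():
        _, w = model(x)
    return w.cpu().numpy()
```

Averaging the heatmap over rows (`heat.mean(axis=0)`) gives one weight per history step, which is where a recency trend like the one described above would show up.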

Attention weight heatmap across samples and historical time steps. More recent history steps received higher weights.
By combining feature-level interpretability through SHAP with temporal interpretability through attention visualization, the proposed framework provides complementary insights into model behavior. This dual-perspective explainability improves transparency, builds clinical trust, and shows that the model effectively captures both engineered-feature relevance and short-term temporal dependencies.
Comparison with similar studies
Although the proposed model can be considered simple and lightweight, it outperforms other architectures in accuracy, has lower complexity, or both, as summarized in Table 2.
The model of [21] achieves a slightly lower mean RMSE than the proposed work. Nevertheless, the proposed model performs comparably for specific patients, such as 563, 570, and 591, and surpasses [21] for patient 575, achieving an RMSE of 21.52 compared with their 22.7. Moreover, although the method presented in [9] was evaluated on only six patients, compared with all 12 patients in this study, the RMSE values are remarkably similar. Notably, the proposed model outperforms theirs for patients 567 and 584, with RMSE values of 20.57 and 21.57, respectively, versus 22.76 and 22.22 for the same patients. The results of [8] are at the same performance level as the proposed model, despite their use of an ensemble comprising LSTM, bidirectional LSTM, and a linear model, a considerably more complex architecture than ours. Similarly, compared with [14], which introduced an intricate combined model of LSTM, WaveNet, and GRU, the proposed model demonstrates superior performance.
Beyond its competitive forecasting accuracy, the proposed model is clinically significant because it anticipates critical glucose fluctuations with sufficient lead time for intervention. A 30-minute prediction horizon enables patients and caregivers to proactively adjust insulin dosages, meal timing, or activity levels, potentially preventing life-threatening hypoglycemic or hyperglycemic events.
Issues and limitations
This study was conducted using data from 12 patients extracted from a high-resolution CGM dataset. While the dataset provided rich temporal and contextual information, the relatively small sample size limits generalizability. The selected patients may not fully represent the diversity of glucose dynamics found in broader diabetic populations, including differences in lifestyle, comorbidities, and diabetes management strategies. Consequently, the present findings, although valid within this controlled cohort, may not fully capture the variability and complexity observed in real-world settings. There is also a potential risk of sampling bias, whereby the model may perform better on the specific data distribution of this group. Future work should focus on validating the framework on larger and more heterogeneous populations, ideally through collaborations with clinical centers or real-world deployments.
Real-world application
The proposed glucose forecasting framework is designed with a clear focus on real-world deployment, particularly as part of a cloud-based infrastructure integrated with CGM devices.
The system architecture is illustrated in Fig. 10: glucose readings are collected via wearable sensors and streamed to a cloud service. The cloud infrastructure incorporates MongoDB for storage, Apache Kafka for streaming, Spark for data processing, and PyTorch for model training and inference. The predicted outputs are then delivered through an application interface accessible on both mobile and web platforms. This end-to-end design enables real-time processing and user feedback, supporting early alerts for hypoglycemia and hyperglycemia. By integrating the forecasting model with scalable cloud services and patient-facing applications, the framework indicates the feasibility of real-world deployment and its accessibility for both clinical and personal diabetes management.
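The alerting step at the end of this pipeline can be sketched without the Kafka, Spark, or MongoDB layers. Each incoming reading is appended to a rolling one-hour buffer; once the buffer is full, the patient's model is queried and the forecast is checked against alert thresholds. The class name, the 12-reading buffer (which assumes 5-minute sampling), and the 70/180 mg/dL thresholds are assumptions. In a deployment, `on_reading` would be called by the stream consumer.

```python
from collections import deque
from typing import Callable, Optional, Sequence

HISTORY = 12            # assumed: one hour of 5-minute CGM readings
HYPO, HYPER = 70, 180   # assumed alert thresholds in mg/dL


class ForecastAlerter:
    """Rolling one-hour buffer for one patient that raises forecast-based alerts."""

    def __init__(self, predict: Callable[[Sequence[float]], float]):
        self.predict = predict                # e.g. a wrapped per-patient model
        self.buffer = deque(maxlen=HISTORY)   # oldest reading drops out automatically

    def on_reading(self, glucose: float) -> Optional[str]:
        self.buffer.append(glucose)
        if len(self.buffer) < HISTORY:
            return None                       # not enough history yet
        forecast = self.predict(list(self.buffer))
        if forecast < HYPO:
            return f"hypoglycemia alert: {forecast:.0f} mg/dL forecast"
        if forecast > HYPER:
            return f"hyperglycemia alert: {forecast:.0f} mg/dL forecast"
        return None


if __name__ == "__main__":
    def naive(history):
        # Stand-in for the trained RNN: extrapolate the last slope 6 steps ahead.
        return history[-1] + 6 * (history[-1] - history[-2])

    alerter = ForecastAlerter(naive)
    for g in [120 - 5 * i for i in range(12)]:  # steadily falling readings
        msg = alerter.on_reading(g)
        if msg:
            print(msg)  # prints "hypoglycemia alert: 35 mg/dL forecast"
```

Keeping one alerter per patient mirrors the per-patient models used in this study.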

System integration diagram illustrating glucose sensing, cloud storage and processing, and the application interface.
Timing analysis
To further evaluate the computational efficiency of the proposed framework, it was compared with alternative time-series layers, including LSTM, GRU, and a simple 1D CNN baseline. This analysis was carried out to empirically demonstrate that the RNN is the most lightweight architecture for this application. All experiments were run on a system equipped with an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, 16 GB RAM, NVMe SSD storage, and an NVIDIA GeForce GTX 1050 GPU. Table 9 summarizes the training time, inference latency, and peak memory usage of each model when trained on a single patient's dataset.
The results confirm that the RNN requires considerably less training time and GPU memory than LSTM and GRU while maintaining comparable inference latency. Although the CNN showed GPU memory usage similar to that of the RNN, it required a longer training time. These findings support the conclusion that the RNN is a computationally efficient, lightweight model well suited to real-time glucose forecasting.
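The structural reason for the RNN's smaller footprint, and one way to measure latency and peak GPU memory, can be sketched in PyTorch. With equal input and hidden sizes, a GRU has exactly three times and an LSTM exactly four times the parameters of a vanilla RNN, because they learn three and four gate blocks respectively. The benchmarking helper below is a hypothetical sketch, not the code behind Table 9; training time could be measured the same way around the training loop.

```python
import time

import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Total number of trainable and non-trainable parameters."""
    return sum(p.numel() for p in module.parameters())


def benchmark_inference(model, x, n_warmup=10, n_runs=100):
    """Median forward-pass latency in ms, and peak GPU memory in bytes (None on CPU)."""
    device = x.device
    on_gpu = device.type == "cuda"
    model.eval()
    if on_gpu:
        torch.cuda.reset_peak_memory_stats(device)
    times = []
    with torch.no_grad():
        for i in range(n_warmup + n_runs):
            start = time.perf_counter()
            model(x)
            if on_gpu:
                torch.cuda.synchronize(device)  # wait for queued GPU kernels
            if i >= n_warmup:                   # discard warm-up iterations
                times.append((time.perf_counter() - start) * 1e3)
    times.sort()
    peak = torch.cuda.max_memory_allocated(device) if on_gpu else None
    return times[len(times) // 2], peak


if __name__ == "__main__":
    for layer in (nn.RNN(1, 64), nn.GRU(1, 64), nn.LSTM(1, 64)):
        print(type(layer).__name__, count_params(layer))  # 4288, 12864, 17152
```

Synchronizing before stopping the timer matters on a GPU, because CUDA kernels run asynchronously and the Python call can return before the computation finishes.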



