This retrospective cohort study was approved by the Mass General Brigham Institutional Review Board. The requirement for written informed consent was waived, since the study involved minimal risk, used retrospective data collected during routine patient care, and did not affect how patients were treated. A large portion of the cohort had passed away, and all data access took place within secure, encrypted Mass General Brigham environments with restricted access.
Dataset
Facial images were collected from cancer patients receiving radiation therapy at Brigham and Women’s Hospital in Boston between 2012 and 2023. The original dataset included 2,763 patients who underwent multiple courses of radiation therapy during their cancer treatment and had corresponding facial photographs taken as part of the standard clinical workflow for identity verification at the start of each course. Through a stepwise filtering process (Supplementary Fig. 1), patients with more than one face detected per image, those with incomplete electronic health records (EHR), and those aged 20 years or younger were excluded, resulting in a final cohort of 2,276 patients for analysis. For each patient, the two facial images with the widest possible time gap were selected: one from the earliest radiation therapy course and one from the most recent course. This approach maximized the time window for calculating FAR.
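The photograph selection step can be sketched as follows. This is a minimal Python illustration, not the study's pipeline (the analyses were run in R); the record layout `(course_date, image_id)` and the function names are hypothetical.

```python
from datetime import date

def select_photo_pair(photos):
    """Return the (earliest, latest) photo records for one patient.

    `photos` is a list of (course_date, image_id) tuples; this layout is a
    hypothetical placeholder, not the study's schema.
    """
    if len(photos) < 2:
        raise ValueError("need photographs from at least two courses")
    ordered = sorted(photos, key=lambda p: p[0])
    return ordered[0], ordered[-1]

def gap_in_days(pair):
    """Number of days between the two selected photographs."""
    first, last = pair
    return (last[0] - first[0]).days

# Toy example: one patient with three radiation therapy courses
photos = [
    (date(2015, 6, 1), "img_b"),
    (date(2013, 1, 15), "img_a"),
    (date(2018, 3, 20), "img_c"),
]
pair = select_photo_pair(photos)
print(pair[0][1], pair[1][1], gap_in_days(pair))
```

Sorting by course date and taking the first and last records gives the widest available gap regardless of how many intermediate courses a patient had.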
Patients were further grouped for analyses based on the time interval between photographs: short-term (10–12 months), mid-term (366–730 days), and long-term (731–1,460 days) interval cohorts.
Sex was self-reported. Race was self-reported using institutional categories, with the option to decline.
FaceAge model description and performance metrics
We used the validated Foundation Artificial Intelligence Model for Health Recognition (FAHR-Face)25. FAHR-FaceAge is built on a Vision Transformer architecture pretrained through masked autoencoder self-supervised learning on the WebFace42M49 dataset, which contains over 40 million facial images. The model was then fine-tuned specifically for biological age estimation using a two-stage, age-balanced training strategy across 10 publicly available facial image datasets of presumed healthy individuals.
FAHR-FaceAge showed strong accuracy and generalizability, achieving a mean absolute error (MAE) of 5.1 years and a mean error (ME) of 0.2 years on the external public APPA-REAL50 dataset. Performance remained high across the full adult age range (20–100 years), the range most relevant to the present analysis of adults with cancer25.
FAR threshold selection
We took a structured, dual approach to define FAR thresholds, reflecting the markedly different levels of measurement variability observed across short-term (10–12 months), mid-term (366–730 days), and long-term (731–1,460 days) intervals.
First, for the short-term cohort, the dataset was randomly split into training (1/3) and test (2/3) subsets, and cutoff points ranging from −20 to 20 (in increments of 5) were evaluated. For each potential cutoff, log-rank tests were performed to compare survival outcomes between patients above and below the threshold. A FAR greater than 20 produced the strongest prognostic discrimination in the training data, as indicated by the lowest log-rank test p-value; this threshold was then validated in the test set, confirming its ability to distinguish patients with different survival outcomes.
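The cutoff scan on the training subset can be sketched as follows. The study's analyses were run in R; this pure-Python version is only an illustration, with a hand-written two-group log-rank test. The function names `logrank_p` and `scan_cutoffs` and the toy data are hypothetical, and the random training/test split and the validation step are omitted.

```python
import math

def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the chi-square (1 df) p-value.

    times: follow-up times; events: 1 = death, 0 = censored;
    groups: True for the high-FAR group, False otherwise.
    """
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                       # at risk
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e)  # deaths
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(obs_minus_exp ** 2 / var / 2))

def scan_cutoffs(far, times, events, cutoffs=range(-20, 25, 5)):
    """Return (cutoff, p) for the cutoff with the smallest log-rank p-value."""
    results = []
    for c in cutoffs:
        high = [f > c for f in far]
        if all(high) or not any(high):
            continue  # skip cutoffs that leave one group empty
        results.append((logrank_p(times, events, high), c))
    p, c = min(results)
    return c, p

# Toy training data (hypothetical values)
far = [-25, -12, -3, 4, 8, 14, 18, 22, 27, 31, 35, 40]
times = [60, 55, 58, 50, 62, 48, 53, 10, 8, 12, 6, 9]
events = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]
best_cutoff, best_p = scan_cutoffs(far, times, events)
print(best_cutoff, best_p)
```

Selecting a cutoff by minimum p-value is optimistic on the data used for selection, which is why the chosen threshold is then checked on the held-out test subset.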
Second, given the much lower variability in the long-term cohort (standard deviation 14.1 times smaller), FAR > 1 was chosen as the long-term threshold. This choice aligns with scaling the short-term threshold by the observed noise reduction factor; it also helps avoid inflated false positives in longer follow-up intervals where FAR measurements are more stable. A cutoff of FAR > 10 was selected for mid-term intervals to maintain a middle-ground threshold consistent with these variability patterns.
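As a rough check on the scaling argument, dividing the short-term cutoff by the reported reduction in standard deviation gives a value of the same order as the long-term threshold. Reading FAR > 1 as this quotient rounded down is our interpretation, not a rule stated explicitly in the text:

$$\frac{20}{14.1} \approx 1.42$$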
These thresholds were then applied to the corresponding cohorts. Kaplan–Meier survival curves were generated for the resulting groups, and a log-rank test was used to assess the statistical significance of survival differences.
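For reference, the Kaplan–Meier estimate behind these curves can be sketched in a few lines. This is a minimal Python illustration with hypothetical toy data; the study itself used R, and the function name `kaplan_meier` is ours.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times: follow-up times; events: 1 = death, 0 = censored.
    """
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk   # product-limit update
        curve.append((t, surv))
    return curve

# Toy group: five patients, two of them censored
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Applying the function separately to the high-FAR and low-FAR groups of a cohort gives the two curves that the log-rank test compares.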
Statistical analysis
Sex was considered during study design and included in both univariate and multivariable Cox regression models.
Cox proportional hazards regression analyses were performed to evaluate the association between FAR and survival outcomes.
For univariate analysis, Cox regression was conducted for each variable of interest, including high versus low FAR group, age at first photograph (in decades), time between photographs (in months), sex, race, cancer risk group, and diagnosis group. FAR was categorized for each time interval cohort using the data-driven approach described above: FAR > 20 versus ≤20 for short-term, FAR > 10 versus ≤10 for mid-term, and FAR > 1 versus ≤1 for long-term intervals. Hazard ratios (HR) with 95% confidence intervals (CI) and p-values were calculated for each variable.
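To make the univariate step concrete, the sketch below fits a Cox model with a single covariate and reports the hazard ratio with a Wald 95% confidence interval. It is a simplified Python illustration under stated assumptions (Newton-Raphson without step control, Breslow handling of ties, hypothetical toy data), not the R routine used in the study.

```python
import math

def cox_univariate(times, events, x, iters=25):
    """Fit a one-covariate Cox model by Newton-Raphson (Breslow ties).

    Returns (hazard_ratio, ci_low, ci_high) with a Wald 95% interval.
    """
    beta, info = 0.0, 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for ti, ei, xi in zip(times, events, x):
            if not ei:
                continue  # censored patients contribute only to risk sets
            weights = [(xj, math.exp(beta * xj))
                       for tj, xj in zip(times, x) if tj >= ti]
            s0 = sum(w for _, w in weights)
            mean_x = sum(xj * w for xj, w in weights) / s0
            mean_x2 = sum(xj * xj * w for xj, w in weights) / s0
            score += xi - mean_x             # partial-likelihood gradient
            info += mean_x2 - mean_x ** 2    # observed information
        beta += score / info
    se = 1 / math.sqrt(info)
    return (math.exp(beta),
            math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se))

# Toy data: high-FAR patients (x = 1) tend to die slightly earlier
times = [2, 4, 6, 8, 10, 3, 5, 7, 9, 11]
events = [1] * 10
high_far = [1] * 5 + [0] * 5
hr, ci_low, ci_high = cox_univariate(times, events, high_far)
print(round(hr, 2), round(ci_low, 2), round(ci_high, 2))
```

The sketch does not guard against perfect separation between groups or a covariate with no variation, both of which would need handling in real use.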
To evaluate the independent prognostic value of FAR, multivariable Cox proportional hazards regression analyses were conducted, including all covariates that reached statistical significance in univariate analysis. Cancer risk group, derived from diagnosis, was excluded from multivariable models due to redundancy. Models were built with increasing levels of adjustment: FAR unadjusted, further adjusted for the time span between photographs, then additionally adjusted for sex, then race, and finally a fully adjusted model including cancer diagnosis at radiation course two. For each model, HR, 95% CI, and p-values for the FAR group variable were calculated.
All analyses were carried out using R version 4.3.1. Statistical significance was defined as P < 0.05.
Methods for generating hazard ratio contour plots
To visualize the combined effects of FAD and FAR on survival outcomes, adjusted hazard ratio contour plots were generated using Cox proportional hazards regression models in R (version 4.3.1). Analyses were conducted separately for the three cohorts with distinct time intervals between radiation therapy courses: short-term (10–12 months), mid-term (366–730 days), and long-term (731–1,460 days).
Covariates included in the analysis were age at the first photograph (in decades), time difference between photographs (in months), sex, race, and cancer diagnosis at radiation course two.
For each time interval, the dataset was filtered to include only patients whose time between photographs fell within the specified range. A Cox proportional hazards model was then fitted for each of the three interval cohorts:
$$\text{Survival Time} \sim \text{FAD}_{\text{RT1}} + \text{FAR} + \text{Age}_{\text{RT1}} + \text{Time Difference} + \text{Sex} + \text{Race} + \text{Diagnosis Group}_{\text{RT2}}$$
This model estimated the log hazard ratios associated with FAD and FAR while adjusting for the covariates.
To construct the contour plots, a grid of FAD and FAR values covering the observed ranges in the data was generated for each time interval. For every combination of FAD and FAR within the grid, the log hazard ratio was predicted using the fitted Cox model, holding other covariates fixed at their median or reference levels (median age, median time difference, reference categories for sex, race, and diagnosis group). The predicted log hazard ratios were then exponentiated to obtain hazard ratios.
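The grid prediction can be illustrated as follows. This Python sketch assumes that hazard ratios are expressed relative to FAD = 0 and FAR = 0; the reference point, the coefficient values, and the grid ranges are hypothetical, since the fitted coefficients are not given here.

```python
import math

def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def hazard_ratio_grid(beta_fad, beta_far, fad_values, far_values):
    """Hazard ratio at each (FAD, FAR) pair relative to FAD = 0, FAR = 0.

    Rows follow far_values and columns follow fad_values. Covariates held
    fixed at the same level everywhere cancel out of the ratio.
    """
    return [[math.exp(beta_fad * fad + beta_far * far) for fad in fad_values]
            for far in far_values]

# Hypothetical coefficients and ranges, for illustration only
fad_values = linspace(-10, 10, 5)   # -10, -5, 0, 5, 10
far_values = linspace(-20, 20, 5)   # -20, -10, 0, 10, 20
grid = hazard_ratio_grid(0.02, 0.01, fad_values, far_values)
print(grid[2][2], round(grid[4][4], 3))
```

Because the other covariates are fixed across the whole grid, they shift every predicted log hazard by the same constant and leave the shape of the contours unchanged.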
To ensure consistency across all contour plots, hazard ratio values from the prediction grids of each time interval were pooled, and the overall minimum and maximum hazard ratios were calculated. Hazard ratio breaks were defined at intervals of 0.2, adjusted to encompass the full range of observed hazard ratios. This allowed a uniform color scale to be applied across all plots, enabling direct comparisons between different time intervals.
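A shared set of breaks can be derived from the pooled grids as sketched below. This is a Python illustration with hypothetical grid values; the function name `hazard_ratio_breaks` is ours, and the exact rounding rule used in the study is not specified.

```python
import math

def hazard_ratio_breaks(grids, step=0.2):
    """Breaks at multiples of `step` spanning all pooled hazard ratios.

    grids: one 2-D list of hazard ratios per time interval cohort.
    """
    values = [hr for grid in grids for row in grid for hr in row]
    lowest = math.floor(min(values) / step)    # round the minimum down
    highest = math.ceil(max(values) / step)    # round the maximum up
    return [round(k * step, 10) for k in range(lowest, highest + 1)]

# Toy prediction grids for the three interval cohorts (hypothetical values)
grids = [
    [[0.55, 1.0]],
    [[1.2, 1.8]],
    [[0.9, 2.31]],
]
breaks = hazard_ratio_breaks(grids)
print(breaks)
```

Working with integer multiples of the step and rounding at the end avoids the drift that comes from repeatedly adding 0.2 in floating point.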
Methods for Harrell’s C-index analysis
To compare the prognostic performance of FAR versus single time-point FAD measurements, Harrell’s C-index was calculated for each metric across different time intervals. Analyses were performed using R (version 4.3.1) with the survival package. For each time interval cohort, three predictors were evaluated: FAD_RT1, FAD_RT2, and FAR. Models were assessed both unadjusted and with full adjustment for clinical covariates (time span between photographs, sex, race, and cancer diagnosis at the second radiation therapy course).
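Harrell's C-index for a single risk score can be sketched as below. This Python version is a simplified illustration (pairs with tied follow-up times are ignored, and tied scores count as half concordant); it is not the implementation in the R survival package, and the toy data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's C-index for a risk score (higher score = higher risk).

    A pair is usable when the patient with the shorter follow-up time had
    an event. It is concordant when that patient also has the higher score.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable

# Toy cohort: four patients, the third is censored
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [3.0, 2.0, 2.5, 1.0]
c_index = harrell_c(times, events, scores)
print(c_index)
```

In the toy cohort five pairs are usable and four are concordant, so the index is 0.8; a value of 0.5 corresponds to a score with no discriminative ability.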
Median follow-up time
Median follow-up time was calculated using the reverse Kaplan–Meier method51.
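The reverse Kaplan–Meier method swaps the roles of death and censoring, so that the resulting median describes how long patients were observed. A minimal Python sketch with hypothetical toy data follows; the function name is ours, and ties between deaths and censoring at the same time are not given special treatment.

```python
def reverse_km_median(times, events):
    """Median follow-up: Kaplan-Meier median with censoring as the event.

    times: follow-up times; events: 1 = death, 0 = censored.
    Returns None if the reversed curve never drops to 0.5.
    """
    flipped = [0 if e else 1 for e in events]   # censoring becomes the event
    surv = 1.0
    for t in sorted({t for t, f in zip(times, flipped) if f}):
        at_risk = sum(1 for x in times if x >= t)
        censored = sum(1 for x, f in zip(times, flipped) if x == t and f)
        surv *= 1 - censored / at_risk
        if surv <= 0.5:
            return t
    return None

# Toy cohort: two deaths and four censored patients (times in months)
median_follow_up = reverse_km_median([3, 5, 7, 9, 11, 13], [1, 0, 1, 0, 0, 0])
print(median_follow_up)
```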
Reporting summary
Additional information on study design is available in the Nature Portfolio Reporting Summary linked to this article.



