Deep Learning–Driven Multi-Omic Mapping Reveals Hidden Links Between Eye Imaging, Heart Health, And Brain Traits

Ethics statement

The UKB study received approval from the North-West Research Ethics Committee (reference number 06/MRE08/65), following the Declaration of Helsinki guidelines. Every participant provided written informed consent and retained the right to withdraw from the study at any time⁴⁷. This project utilized data from UKB under authorized project numbers 53144, 49978, and 11350. All retinal images included in this paper are shared with UKB’s explicit permission.

Cohort characteristics

UKB is a large-scale prospective population-based study involving 502,355 participants. Initial assessments were conducted at 22 different study centers between January 2006 and October 2010. Participants underwent comprehensive evaluations covering demographics, lifestyle, and clinical measurements; they also provided DNA samples through blood tests and underwent various physical assessments. A notable subset of these volunteers completed an ophthalmic evaluation (23%, n = 117,279). Among them, 67,664 had spectral domain OCT scans between 2006 and 2010 (referred to as ‘instance 0’), while an additional 17,090 were imaged for the first time during a repeat assessment period from 2012 to 2013 (referred to as ‘instance 1’)⁴⁸. The UKB data is categorized into data fields and categories, with specific category and field ID numbers provided in brackets to ensure reproducibility. To compare the characteristics of participants with and without imaging, statistical tests such as the Kruskal–Wallis test and χ² test were applied.

Imaging quality control

OCT scans deemed to be of low quality were excluded if their image quality score fell below 40 (reference ¹⁵). The scoring system used was the default one provided by the Topcon OCT device. Color fundus photographs (CFPs) were excluded if they received a ‘reject’ quality grade according to a deep learning algorithm called ‘Automorph’; only ‘usable’ and ‘good’ images were kept⁴⁹. Automorph was developed and trained using the EyePACS-Q dataset, where expert graders evaluated images based on illumination, presence of artifacts, and the ability to diagnose ocular diseases; complete details can be found in the cited publication.

Image preprocessing

For OCT scans, proprietary FDS files were converted into a readable format using the oct-converter Python package⁵⁰. The 64th OCT slice was selected for further analysis. CFPs were processed through cropping and segmentation using Automorph, and the resulting binary images of the blood vessels were used to fine-tune the autoencoder’s loss function described later. Imaging data was divided into training, testing, and validation sets in a ratio close to 70:15:15 for deep learning model development. Participants’ images were grouped so that each person’s paired scans fell within the same dataset split.

Autoencoder model architecture and training

Image transformations

During the training of the autoencoder, several preprocessing transformations were applied to both OCTs and CFPs. To minimize computational load, all images were resized to 224 × 224 pixels using bicubic interpolation. To enhance the model’s ability to generalize, additional transformations were included during training, such as random horizontal flips, random rotations within a range of 0–15 degrees, and random adjustments to brightness, contrast, and saturation (each ranging from 0.9 to 1.1).

Model architecture

Our adversarial autoencoder (AAE) was designed to generate compressed representations of retinal images (both CFPs and OCT scans). The structure of the AAE aligns its aggregated posterior distribution, q(z), with a chosen prior distribution, p(z). We adopted a Gaussian prior, N(0,1), resulting in latent vectors (z) that closely approximate a multivariate Gaussian distribution⁵¹. This characteristic supports many downstream analyses which assume normally distributed variables. Although the training of our AAE did not differentiate between left and right eyes, distinct AAEs were trained separately for OCTs and CFPs. The encoder comprised multiple layers, each containing a 2D convolution, batch normalization, Leaky ReLU activation, a multi-scale residual block (with kernel sizes of 3 and 5), and an efficient multi-scale attention module^52,53. At the bottleneck, a linear layer projected the image into a 256-dimensional vector. The decoder mirrored this setup, starting with a linear layer to upsample the vector, followed by transposed convolutions, batch normalization, Leaky ReLU, multi-scale residual blocks, and attention modules^52,53. The adversarial discriminator consisted of three linear layers with Leaky ReLU activations.

Loss functions

For training both the OCT-AAE and CFP-AAE, we employed mean absolute error (commonly known as L1 loss) along with adversarial loss. Due to the generally low contrast and narrow background color range of CFPs, capturing fine vascular details can be challenging. Early versions of our model struggled to reconstruct blood vessel structures adequately because they made up only a small portion of the overall pixel count and thus had minimal impact on the total L1 loss (see Supplementary Fig. 90 for an example). Using Automorph, we processed the CFP images to create binary masks highlighting the vascular regions⁴⁹. On average, blood vessels accounted for just 4.07% of all pixels in CFPs. As a solution, we adjusted the L1 loss to give 12.29 times greater importance to pixels containing blood vessels, enabling these structures to contribute approximately half of the L1 Loss. Additionally, we incorporated a perceptual loss function tailored for the CFP-AAE. Research indicates that the 4th ReLU activation of VGG16 (without batch normalization) delivers superior results in autoencoding tasks⁵⁴. Therefore, we chose this configuration for our perceptual loss computation. Both original and reconstructed images were normalized using ImageNet parameters before being fed into VGG16 (without batch normalization)⁵⁵. The perceptual loss was then computed as the mean squared error between the feature maps of the original and reconstructed images. To further standardize the latent vectors, we implemented an adversarial loss component. This was achieved by training a discriminator network to differentiate between the latent vectors produced by the AAE and those from a standard normal distribution. To stabilize training, we applied one-sided label smoothing⁵⁶. The relative contribution of each component to the total loss was optimized through a Bayesian hyperparameter sweep. Even with weighted loss, terminal arterioles and venules were not fully preserved during reconstruction. However, expanding the latent space dimensionality (e.g., testing 128-dimensions yielded SSIM = 0.876 vs. 256-dimensions = 0.891) showed improvement in SSIM scores. Despite this trend, we chose not to further increase latent space size to maintain the statistical and computational efficiency needed for subsequent analyses.

Image quality metrics

To evaluate image quality, we used Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and L1 loss⁵⁷.

Hyperparameter search

A Bayesian hyperparameter search was carried out using the Weights and Biases platform (version 0.19.8)⁵⁸, with the SSIM metric on the test dataset serving as the objective function for the optimization. For both the CFP and OCT AAEs, the sweep was used to tune the discriminator learning rate, the autoencoder learning rate, the number of convolutional layers, the optimizer weight decay, and the adversarial loss weighting. In the CFP sweep, the perceptual loss weighting was also included as an additional search parameter. Each hyperparameter configuration was evaluated over 35 training epochs. The best-performing hyperparameters for the CFP-AAE consisted of 5 convolutional layers, a weight decay of 0.01 (using the AdamW optimizer), an autoencoder learning rate of 0.0005, a discriminator learning rate of 0.0001, a perceptual loss weight of 0.2, and a discriminator loss weight of 0.001⁵⁹. For the OCT-AAE, the optimal configuration comprised 4 convolutional layers, a weight decay of 0.001 (AdamW optimizer), an autoencoder learning rate of 0.0001, a discriminator learning rate of 0.00005, and a discriminator loss weight of 0.0001.

Additional AAE experimental details

All experiments were implemented in PyTorch⁶⁰. A learning rate warm-up schedule was applied over the first 15 epochs. The batch size was set to 32, and a random seed of 42 was used for reproducibility. The CFP model was trained for 400 epochs, while the OCT model was trained for 250 epochs. For sensitivity analysis, a second OCT and CFP model was trained with a latent dimension of 128, using the same hyperparameters as the main analysis. Training was performed on 4 Tesla T4 graphics processing units (GPUs). Two ophthalmologists (T.H.J. and P.I.S.) reviewed randomly selected reconstructed images to confirm that critical anatomical structures were faithfully preserved.

OCT- and CFP-AAE phenotypic association tests

Latent vectors derived from the left eye (both OCT and CFP modalities) of UKB study participants were used to reduce the dimensionality of downstream analyses. Consistent with prior studies, the left eye was chosen because its images are consistently of higher quality than those of the right eye^11,15. The total number of participants varied depending on the specific association test and is reported in the text where feasible, or in the supplementary materials where participant counts differed by phenotype. Both instance 0 and instance 1 ophthalmic images were included in these analyses (unless otherwise specified), and for participants imaged at both time points, only the first set of images was used.

Pearson correlation analyses

The ‘positive control’ Pearson correlation analyses encompassed all traits within the UKB eye measures category (category 100013), excluding right eye measures.

The experimental analyses included a combination of anatomical and functional cardiovascular and neurological traits. Specifically, all traits from the following UKB categories were included: brain MRI (category 1014), cognitive function (category 1005), heart MRI (category 1015), carotid ultrasound (category 101), and selected traits from the physical measures category (category 1006). This yielded a total of 50 cardiovascular traits and 892 neurological traits. The large number of neurological traits reflects the extensive set of derived neuroimaging features available in the UKB. To improve interpretability across thematic groupings in plots, HDBSCAN clustering was applied⁶¹. Traits that overlapped across a large number of study participants were grouped into clusters, while the remaining traits were assigned to a ‘not clustered’ category—though they were still included in subsequent analyses. Data were scaled using the interquartile range via ‘RobustScaler’ in SciKit Learn⁶². The clusterable features comprised 35 ophthalmic traits (n = 62,729), 889 neurological features (n = 12,219), and 37 cardiovascular traits (n = 11,295). A leaf clustering approach was used. Given the smaller number of cardiovascular and ophthalmic traits, a minimum cluster size of 3 was set for these, while a minimum cluster size of 10 was used for neurological traits due to their greater number.

For each analysis, outlier values for cardiovascular, neurological, or ophthalmic measures were excluded—specifically, data points exceeding 2.5 standard deviations from the mean. Pearson correlation coefficients and associated P values were then computed using the SciPy Python package⁶³.

Cox proportional hazards and Welch’s t-test analyses

Disease outcomes were obtained from the ‘First Occurrences’ UKB catalog (category 1712). The UKB first occurrences were generated by mapping Read code information from ‘Primary Care data’ (category 3000), ICD-9 and ICD-10 codes from ‘Hospital inpatient data’ (category 2000), ICD-10 codes from ‘Death Register records’ (fields 40001 and 40002), and ‘self-reported medical condition codes’ (field 20002) reported at baseline or subsequent UKB assessment center visits. The UKB estimates that hospital inpatient and death data are largely complete through 31 May 2022, beyond which data may be incomplete⁶⁴. Accordingly, 31 May 2022 was used as the censoring date for all time-to-event analyses. For the Cox proportional hazards models, participants who had experienced a given disease event prior to retinal imaging were excluded. For the t-test, the goal was to assess whether latent features differed significantly based on the presence or absence of disease at the time of imaging, and therefore participants with disease events at the time of scan were included. Disease outcomes comprised (1) a positive control set of diseases identifiable on retinal imaging as determined by two ophthalmologists (T.H.J. and P.I.S.), including disorders of the retina, choroid, optic nerve, and lens (19 diseases); (2) all available neurodegenerative diseases (10 diseases); and (3) all available cardiovascular disorders (23 diseases). The full list of diseases examined is provided in Supplementary Table 31.

Cox proportional hazards modeling was performed using the lifelines Python package⁶⁵. The primary outcome of interest was the association between left eye Ret-AAE embeddings and time-to-event data for ophthalmic, cardiovascular, and neurodegenerative diseases. Covariates included genetic sex and age at the time of scan. t-tests were conducted using the SciPy package in Python⁶³. Welch’s t-test was selected over the standard Student’s t-test, as it is more robust when the two populations have unequal sample sizes or variances⁶⁶. Results with fewer than 30 samples per group were excluded from the final analysis in both approaches. Baseline associations between Ret-AAE embeddings and Huntington’s disease, spinal muscular atrophy and related syndromes, Alzheimer’s disease, and vascular dementia could not be tested due to insufficient cases (<30) at baseline imaging. Huntington's disease was additionally excluded from Cox proportional hazards analysis for the same reason.

CCA

Canonical correlation analysis (CCA) was employed to investigate the relationship between CFP–OCT-derived embeddings and 251 metabolic biomarkers measured by Nightingale Health (category 220)⁶⁷. Because of participant overlap between instance 0 and instance 1 metabolic sampling, only instance 0 data were used.

In this analysis, participants lacking imaging data or metabolite samples were excluded. To minimize the impact of extreme outliers, Winsorization was applied to the top and bottom 1% of values⁶⁸. The analysis employed the SciKit-learn CCA model, with both embeddings and metabolomic data scaled using the CCA’s built-in scaler. Statistical significance of each canonical variate was evaluated via permutation testing: the metabolomic matrix was randomly shuffled 1,000 times while keeping the embedding matrices fixed, generating a null distribution of correlations^69,70. The P value was derived by comparing the original dataset’s average CCA correlation against this null distribution. The number of modes tested was selected by examining a Scree plot of the first 10 modes and truncating at the point where the regression coefficient dropped substantially.

SPLS

Alongside CCA, we applied a canonical SPLS (Sparse Partial Least Squares) analysis using the mixOmics R package (version 6.32.0)⁷¹. Unlike standard partial least squares, SPLS selects only the most relevant features based on sparsity constraints—making it better suited for highly correlated data like metabolomics. This approach served as a robust validation of our CCA findings. Data preprocessing mirrored that used for CCA. Lasso penalization parameters were tuned via fivefold cross-validation repeated five times, optimizing for maximum correlation between predicted and observed components. For CFP SPLS, we used 2 components: 30 embedding variables and 25 metabolites in component 1, and 10 embedding variables and 10 metabolites in component 2. OCT SPLS used 1 component with 10 embedding variables and 10 metabolite variables. All SPLS analyses were run in canonical mode, and associated visualizations were generated using mixOmics.

Genetic analyses

Choice of GWAS method

We selected REGENIE version 4.1 for genome-wide association studies (GWAS)⁷². Step 1 utilized UK Biobank microarray data (field 22418), while Step 2 used imputed genetic data (field 22828).

Phenotypes and covariate data

The phenotypes analyzed were latent features extracted from left-eye OCT and color fundus photography (CFP) images for each participant. This resulted in 256 GWAS per participant per imaging modality (512 total GWAS per participant). Covariates included sex (field 31), age at scan date (field 22006), and spherical equivalent refractive error. The latter was calculated using the most reliable autorefraction measurement (field 5276): spherical equivalent = spherical error (field 5085) + 0.5 × cylindrical error (field 5086)¹⁵.

Quality control for genetic analyses

Discovery cohort analyses included participants from instance 0 (initial assessment, 2006–2010); replication cohort analyses used those from instance 1 (first repeat visit, 2012–2013). Overlapping individuals were excluded to ensure independence. Only participants of genetically confirmed European ancestry—determined via principal components (field 22006)—were included.

Participant-level quality control excluded individuals with mismatched genetic and reported sex (fields 22001 and 31), sex chromosome aneuploidy (field 22019), close genetic relatives in UKB (field 2202), extreme heterozygosity or missingness outliers (field 22027), or >10% missing genotype data⁷³.

Variant-level filtering was performed using plink2⁷⁴. For microarray data (REGENIE Step 1), criteria included: minor allele frequency ≥ 0.01, minor allele count ≥ 20, ≤10% missingness, and Hardy–Weinberg equilibrium P ≥ 1 × 10⁻¹⁵. For imputed data (Step 2), additional filters included: INFO score ≥ 0.8 post-GWAS, removal of duplicate variants, and Hardy–Weinberg P ≥ 1 × 10⁻⁶.

GCTA COJO

We refined significant associations using GCTA-COJO¹². Variants on different chromosomes or separated by more than 1,500 kb were assumed independent; otherwise, default GCTA-COJO settings were applied.

Gene set analysis

Gene set enrichment was conducted with MAGMA²⁴, using canonical pathway collections from MSigDB⁷⁵: BioCarta (292 sets), KEGG MEDICUS (658), PID (196), Reactome (1,787), WikiPathways (885), and legacy KEGG (186). Variants were mapped to genes using dbSNP v135 and NCBI 37.3 annotations. Linkage disequilibrium (LD) reference data came from the European subset of the 1000 Genomes Project⁷⁶.

Global genetic correlation analysis

We assessed genome-wide genetic correlations between OCT/CFP latent features and cardiovascular or neurological diseases using LD score regression (LDSC)⁷⁷. Summary statistics for these diseases were obtained from public GWAS studies in GRCh37 that included necessary metadata^78–83. Only latent features with mean χ² > 1.02 (indicating polygenicity) were included. Following established protocols⁷⁷, variants were filtered for presence in HapMap3, MAF > 0.01, INFO > 0.9, and exclusion of strand-ambiguous or duplicate SNPs. LD scores were based on the 1000 Genomes European panel⁷⁶. The LDSC intercept was left unconstrained. Heritability estimates for polygenic latent features were also derived via LDSC, though without formal significance testing.

LDSC results are not provided here, as this method was not part of the analysis pipeline.

Grad-CAM

To make our learned embeddings easier to interpret, we used Grad-CAM⁸⁴. In line with earlier studies, we found that visualizations from the earliest convolutional layers lacked meaningful semantic content, while those from the deepest layers were too coarse to pinpoint specific image features⁸⁵. Therefore, we centered our main analyses on the third convolutional layer for both CFP and OCT imaging modalities, as it offered the best trade-off between spatial detail and semantic richness. When Grad-CAM did not produce a clear or interpretable map, we switched to Layer-CAM or examined nearby convolutional layers instead; such cases are explicitly noted in the corresponding figure legends⁸⁶. Saliency maps are shown only for embeddings discussed in the text. Whenever feasible, we describe embeddings in terms of their anatomical location to aid understanding. However, due to the abstract nature of embeddings, this is not always possible.

Latent feature traversals

Beyond Grad-CAM, we conducted systematic latent feature traversals using the trained AAE⁸⁷. For each latent dimension (embedding) z_i, we first selected a representative reference image whose value in that dimension was closest to the mean (approximately zero). We then varied only z_i by ±4 standard deviations while keeping all other dimensions fixed. These modified latent vectors were passed through the AAE decoder to generate reconstructed images, allowing us to visualize how isolated changes along each latent axis affect the output.

Because the resulting changes can be subtle, we supplemented visual inspection with quantitative difference maps. For CFPs (three-channel RGB images), we calculated the average change in pixel intensity per color channel compared to the reference reconstruction. For OCT images (single-channel), we computed the absolute difference in intensity at each pixel. These maps highlight the image regions most responsive to changes in each latent dimension and help clarify what visual features the model has learned to encode.

A key strength of autoencoding is its ability to detect image patterns that may not be obvious to human observers. As a result, many latent traversals produce diffuse, hard-to-interpret changes in pixel intensity with limited biological or clinical meaning. Only those figures containing human-interpretable features are included in the supplementary material (Supplementary Figs. 14–68).

Annotations for both latent traversals and Grad-CAM saliency maps were independently reviewed and confirmed by two ophthalmologists (T.H.J. and P.I.S.).

Controlling for type 1 error risk

We applied family-wise error rate control using Bonferroni correction to manage the risk of false positives. To define what constitutes a “family” of tests, we examined whether autoencoder embeddings were strongly correlated with one another. As shown in Supplementary Figs. 91 and 92, the embeddings exhibited minimal meaningful correlation. Therefore, each embedding was treated as its own independent family, and multiple testing correction was applied based on the number of tests performed within each embedding. Importantly, this is not a study-wide correction—we did not adjust across all embeddings, as doing so would be overly conservative and lead to excessive type 2 errors, especially given the large scale and hypothesis-free nature of this work. Instead, we evaluate our findings in light of convergent evidence across multiple analyses, report raw P values to support reader interpretation, assess results based on biological plausibility, and, where possible, validate key findings in an independent UK Biobank cohort. While this approach provides a more balanced and nuanced interpretation, the risk of type 1 error remains a consideration, and the main findings would benefit from external experimental validation when suitable multi-omic datasets become available. Bonferroni-corrected significance thresholds for all other analyses are listed in the relevant supplementary table legends.

Statistics and reproducibility

All methods are detailed in the appropriate sections of the paper. Sample sizes varied depending on the analysis and are reported alongside each result where feasible, or otherwise provided in the Supplementary Data. No formal power calculations were used to predetermine sample sizes; instead, all available data that met analysis-specific quality control criteria were included. All analyses were carried out using reproducible computational pipelines, as outlined in the Methods section.

Reporting summary

Additional details about the research design can be found in the Nature Portfolio Reporting Summary associated with this article.

Top Posts

Architecting Digital Sovereignty: Cloud-Native Patterns Beyond Data Residency

Navigating the AI Landscape: Your Guide to Selecting the Perfect Smart Home Security Platform

Kawasaki Robotics Unveils RL030N Physical AI Platform at Automate

Deep Learning–Driven Multi-Omic Mapping Reveals Hidden Links Between Eye Imaging, Heart Health, and Brain Traits

Hermes Agent Adds Asynchronous Subagents, So Delegated Work No Longer Blocks the Parent Chat

The Protocol That Transformed Our Agent Architecture

3 Clever Pandas Tricks to Supercharge Your Data Cleaning & Preparation

A generalizable Hi-C foundation model for chromatin architecture, single-cell and multiomics analysis across species

Flash-KMeans: The IO-Aware, Exact K-Meams That Outpaces FAISS by 200× on GPUs

Mathematical String Probability: A Human-Powered Solution to the 3Blue1Brown Challenge

Architecting Digital Sovereignty: Cloud-Native Patterns Beyond Data Residency

Navigating the AI Landscape: Your Guide to Selecting the Perfect Smart Home Security Platform

Kawasaki Robotics Unveils RL030N Physical AI Platform at Automate

Becoming an LLM Engineer in 2026: Your Complete Roadmap

Deep Learning–Driven Multi-Omic Mapping Reveals Hidden Links Between Eye Imaging, Heart Health, and Brain Traits

Cracking the Code on Crypto Costs, Crashing the Gates on Tokenized Access

UK Mandates ID or Facial Scans for Social Media Account Creation

AWS WAF Unlocks AI Bot Monetization: Charge AI Traffic for Content Access

Trending

Architecting Digital Sovereignty: Cloud-Native Patterns Beyond Data Residency

Navigating the AI Landscape: Your Guide to Selecting the Perfect Smart Home Security Platform

Latest Posts

Not More Data, but Better World Models – Unite.AI

OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears

Subscribe to Updates

Top Posts

Deep Learning–Driven Multi-Omic Mapping Reveals Hidden Links Between Eye Imaging, Heart Health, and Brain Traits

Ethics statement

Cohort characteristics

Imaging quality control

Image preprocessing

Autoencoder model architecture and training

Image transformations

Model architecture

Loss functions

Image quality metrics

Hyperparameter search

Additional AAE experimental details

OCT- and CFP-AAE phenotypic association tests

Pearson correlation analyses

Cox proportional hazards and Welch’s t-test analyses

CCA

SPLS

Genetic analyses

Choice of GWAS method

Phenotypes and covariate data

Quality control for genetic analyses

GCTA COJO

Gene set analysis

Global genetic correlation analysis

Grad-CAM

Latent feature traversals

Controlling for type 1 error risk

Statistics and reproducibility

Reporting summary

Related Posts