Data and study design
We analysed the limitations of current ML approaches for predicting molecular biomarkers (for example, mutations, genomic instability indicators and protein expression) from H&E-stained WSIs. A high-level concept diagram of these approaches is provided in Fig. 1. We hypothesize that interdependencies among biomarker statuses and clinicopathological variables within the training data, and the disregard of such associations during model development, bias ML models towards relying on aggregated influences of multiple factors in WSIs rather than patterns linked to individual biomarkers. To illustrate this, we retrospectively analysed n = 8,221 patients with breast cancer (BRCA), colorectal cancer (CRC), endometrial cancer (UCEC) and lung cancer across four cohorts for which WSIs and/or molecular information (for example, receptor status, gene mutations and so on) were available (Methods). These include: TCGA (n = 2,683), Molecular Taxonomy of Breast Cancer International Consortium (METABRIC; n = 2,433)24,25, Memorial Sloan Kettering Cancer Center (MSK; n = 2,486)26,27 and DFCI (n = 619)28. Using these datasets, we carried out the four main steps listed below:
(1) An analysis of the interdependency among biomarkers and somatic mutation status of genes in samples;
(2) Training deep learning models to predict biomarker status from WSIs;
(3) Stratification analysis and permutation testing to assess whether the model trained to predict a certain biomarker is biased by the status of other biomarkers or clinicopathological variables;
(4) An assessment of the added value of using ML models in predicting various biomarkers over and above the pathologist-assigned grade.
Fig. 1: a, The ML-based prediction of molecular biomarkers from WSIs involves using training data of WSIs with known biomarker statuses. The ML model accepts the representation of a WSI (\(X\)) as input and predicts the status of a certain biomarker (\(Y\)) as the target. b, An ideal predictor should be able to predict the status of a molecular biomarker from histological effects of that biomarker contained in the WSI, and its output (\(Z\)) should be independent of unrelated confounding factors (lumped into a variable \(C\)) as shown in the simplified causal diagram. Conversely, if the predictor’s output depends not only on the histological effects of \(Y\) but also on other confounding factors (for example, histological grade, TMB or status of other biomarkers), then the prediction is confounded because the model is relying on these additional covariates rather than solely on the effects of \(Y\). Credit: icons in a, Flaticon.com.
Drawing from established methods in gene functional analysis20,21,29,30,31, we quantified the interdependency among molecular factor labels across patients by evaluating their pattern of co-occurrence and mutual exclusivity. We used log odds ratios (LOR) to quantify these relationships, where positive LOR values indicate co-occurrence and negative values indicate mutual exclusivity. Statistical significance was assessed with a two-sided Fisher’s exact test, and the resulting P values were corrected for multiple hypothesis testing.
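The co-occurrence analysis described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the 0.5 pseudo-count in the LOR is an assumption added to avoid division by zero, and Benjamini–Hochberg is used as the correction because it is the one named in the figure captions.

```python
import math

def log_odds_ratio(a, b, c, d, pseudo=0.5):
    """LOR for the 2x2 table [[a, b], [c, d]].

    a = both factors positive, d = both negative, b and c = only one positive.
    The pseudo-count is an assumption (not from the source) that keeps the
    ratio finite when a cell is empty.
    """
    return math.log(((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo)))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / math.comb(n, col1)

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy counts (illustrative only, not data from the study).
lor_cooccur = log_odds_ratio(30, 10, 10, 50)    # positive: co-occurrence
lor_exclusive = log_odds_ratio(2, 40, 40, 18)   # negative: mutual exclusivity
p_small_table = fisher_exact_two_sided(3, 1, 1, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In practice one table is built per pair of factors, and the correction is applied once across all pairs within a cohort.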
To assess whether biomarker interdependencies introduce bias into WSI-based models, we analysed three deep learning algorithms with different principles of operation: attention-based (CLAM32), graph neural network-based (\(\mathrm{SlideGraph}^{\infty}\)33) and a WSI-level multimodal foundation model (TITAN22). These algorithms represent existing ML approaches that do not explicitly consider interdependencies between prediction variables. As CLAM and \(\mathrm{SlideGraph}^{\infty}\) rely on a patch-level encoder, we trained them with two different encoders: CTransPath34 (trained on histology images) and ShuffleNet35 (trained on ImageNet)36 to minimize encoder-specific bias. For each biomarker, we trained these models with both encoders on the TCGA cohort using fourfold cross-validation and report AUROC as a performance metric. We further evaluated the trained models on two independent validation cohorts, CPTAC37 and the Australian Breast Cancer Tissue Bank (ABCTB)38. Finally, we used WSI-level representation from a multimodal foundation model (TITAN)22, trained on 330,000 image–text pairs, under the hypothesis that these embeddings better capture biomarker-related morphology, and trained both single-output and multi-output biomarker predictors on them.
To investigate whether WSI-based biomarker prediction models are confounded by the interdependency among molecular factors or clinicopathological variables (for example, histological grade or TMB), we performed a stratification analysis and permutation testing. For each model, we define two types of variable: the prediction variable, which is the biomarker the model is trained to predict, and stratification variables, which are biomarkers or clinicopathological features that show significant mutual exclusivity or co-occurrence with the prediction variable and may act as confounders (identified in step 1). The motivation for considering interdependent variables as confounders is that they may be associated with a shared phenotypic pattern in WSIs, which the model can exploit as a proxy for the prediction variable, potentially leading to biased predictions when such signals are absent or decoupled at test time. To detect such confounders, we evaluate model performance at two levels: (1) across the entire cohort and (2) within subgroups defined by stratification variables. Examining model performance within these subgroups allows us to isolate the effect of the prediction variable from confounders. If the model truly captures patterns in WSIs that are specific to the prediction variable, its subgroup-level performance should closely match the cohort-level performance. By contrast, substantial differences between subgroup and overall performance indicate the influence of confounding effects or Simpson’s paradox39,40. To quantify these effects, we perform permutation testing and report their statistical significance.
For example, to evaluate whether the performance of a WSI-based predictor for oestrogen receptor (ER) status (prediction variable) is influenced by TP53 mutation status (stratification variable), we first divide the cohort into two subgroups on the basis of the stratification variable: patients with a TP53-mutant status and patients with a TP53 wild-type status. We then compute the AUROC of the ER predictor within each of these subgroups. Finally, we compare these subgroup-level AUROCs to the model’s overall AUROC across the entire cohort. A substantial difference between subgroup and cohort-level AUROCs indicates a potential bias, suggesting the model captures the combined effects of ER and TP53 rather than ER-specific features alone. To establish statistical significance, we run a permutation test with 10,000 trials (see Methods for more details). This definition of the ‘prediction variable’ (ER status in this example) and the ‘stratification variable’ (TP53 status in this example) will be used consistently in subsequent results and figures to ensure clarity. Repeating this across other stratification variables (for example, grade and TMB) provides a systematic way of detecting the influence of confounding factors on different WSI-based models.
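The stratification analysis and permutation test can be sketched as follows. This is a hedged illustration only: the exact permutation scheme is given in the Methods, which are not reproduced here, so the null model below (random subgroups of the same size drawn from the whole cohort, compared by their absolute AUROC gap to the cohort-level AUROC) is an assumption, as are all function names and the toy data.

```python
import random

def auroc(labels, scores):
    """AUROC from the Mann-Whitney rank-sum statistic, with midranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # shared midrank for tied scores
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUROC is undefined with a single class
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def stratified_auroc(labels, scores, strata):
    """Cohort-level AUROC ('All') plus one AUROC per stratification subgroup."""
    result = {"All": auroc(labels, scores)}
    for group in sorted(set(strata)):
        idx = [i for i, s in enumerate(strata) if s == group]
        result[group] = auroc([labels[i] for i in idx], [scores[i] for i in idx])
    return result

def permutation_pvalue(labels, scores, strata, group, trials=10000, seed=0):
    """P value for the gap between a subgroup AUROC and the cohort AUROC.

    Assumed null: the subgroup is a random sample of the cohort of the same size.
    """
    rng = random.Random(seed)
    overall = auroc(labels, scores)
    idx = [i for i, s in enumerate(strata) if s == group]
    observed = abs(overall - auroc([labels[i] for i in idx], [scores[i] for i in idx]))
    hits = valid = 0
    for _ in range(trials):
        sample = rng.sample(range(len(labels)), len(idx))
        value = auroc([labels[i] for i in sample], [scores[i] for i in sample])
        if value != value:  # skip single-class draws (NaN)
            continue
        valid += 1
        if abs(overall - value) >= observed:
            hits += 1
    return (hits + 1) / (valid + 1)

# Toy data: scores follow the stratification variable, not the label.
strata = ["MUT"] * 4 + ["WT"] * 4
labels = [0, 0, 0, 1, 1, 1, 1, 0]
scores = [0.2, 0.3, 0.1, 0.2, 0.8, 0.7, 0.9, 0.8]
result = stratified_auroc(labels, scores, strata)
p_value = permutation_pvalue(labels, scores, strata, "MUT", trials=200)
```

In this toy cohort the overall AUROC is 0.75, yet it is 0.5 within each subgroup: the apparent performance comes entirely from the stratification variable. With only eight cases the P value carries little meaning; the example is meant to show the mechanics.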
To assess the added value of ML models in predicting various biomarkers over and above pathologist-assigned grades, we used a support vector machine with one-hot encoded histological grades to predict various clinical biomarkers, following the same protocols used for weakly supervised models.
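A grade-only baseline of this kind can be illustrated without any ML library. The sketch below is a simplified stand-in and not the support vector machine used in the study: because a one-hot encoded grade can take only a few distinct values, any classifier on it assigns one score per grade, so here each case is simply scored by the positive rate of its grade in the training set. The function names and toy data are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_grade_baseline(train_grades, train_labels):
    """Return a scorer mapping a grade to its training-set positive rate.

    Stand-in for an SVM on one-hot grades (an assumption for illustration).
    Unseen grades fall back to the overall positive rate.
    """
    totals, positives = {}, {}
    for grade, label in zip(train_grades, train_labels):
        totals[grade] = totals.get(grade, 0) + 1
        positives[grade] = positives.get(grade, 0) + label
    prior = sum(train_labels) / len(train_labels)
    rates = {grade: positives[grade] / totals[grade] for grade in totals}
    return lambda grade: rates.get(grade, prior)

# Toy data: biomarker positivity becomes rarer as grade increases.
train_grades = [1, 1, 1, 2, 2, 2, 3, 3, 3]
train_labels = [1, 1, 1, 1, 1, 0, 1, 0, 0]
scorer = fit_grade_baseline(train_grades, train_labels)

test_grades = [1, 2, 3, 3]
test_labels = [1, 1, 0, 1]
baseline_auc = auroc(test_labels, [scorer(g) for g in test_grades])
```

A WSI-based model that does not clearly exceed such a baseline adds little beyond what the grade already conveys.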
Biomarker statuses show significant interdependencies and variations
Our analysis revealed significant interdependencies (\(P \ll 0.05\)) among biomarkers across cancer types (Fig. 2 and Supplementary Fig. 1). In BRCA, elevated ER and progesterone receptor (PR) expression co-occur with mutations in CDH1, MAP3K1 and PIK3CA, but not with TP53, which is mutually exclusive with CDH1, GATA3, MAP3K1 and PIK3CA41. In CRC, MSI-high (MSI-H) cases frequently carry BRAF, ATM, ARID1A and RNF43 mutations and are less likely to harbour KRAS mutations; BRAF-mutant tumours also show higher TMB and co-occurrence with ATM, RNF43 and ARID1A. Similar patterns of interdependencies are also observed in UCEC and lung adenocarcinoma (LUAD) (Supplementary Fig. 1). For instance, in UCEC, PTEN mutations co-occur with APC, ATM, JAK1, KRAS and ARID1A, whereas in LUAD, STK11 mutations co-occur with KEAP1 but rarely with EGFR.

Fig. 2: The heat maps display a set of biomarkers and genes along the axes, with cell colours within the heat map showing the strength of association (dark red for co-occurrence and dark blue for mutual exclusivity). Cells marked with asterisks indicate statistically significant associations (Benjamini–Hochberg FDR-corrected P values from two-sided Fisher’s exact test, \(P \ll 0.05\)). The top bar above each heat map shows, for gene mutations, the percentage of cases mutated for a specific gene; for biomarkers, it indicates the percentage of patients with elevated ER, PR and HER2 for breast tumours, and with high MSI, hypermutation, CIMP activity and CIN for colorectal tumours. CINGS, chromosomally instable versus genome stable; HM, hypermutated.
Our analysis further showed that, within the same tissue type, biomarker associations can vary across datasets, reflecting sampling differences. In the TCGA-BRCA cohort, MAP3K1 mutations showed mutual exclusivity with AKT1 and ARID1A, whereas in the METABRIC cohort, they showed a tendency towards co-occurrence (Fig. 2). ER status and high TMB showed mild co-occurrence in the TCGA-BRCA cohort but mutual exclusivity in the METABRIC cohort. In the TCGA-CRC cohort, BRAF-mutant tumours were significantly less likely to harbour TP53 mutations, whereas this association is less pronounced in the DFCI cohort and lacks statistical significance. Similar cross-dataset variations were observed in UCEC and LUAD (Supplementary Fig. 1). For instance, in TCGA-LUAD, BRAF and STK11 showed a weak tendency towards mutual exclusivity, whereas in the MSK cohort, they showed a weak tendency towards co-occurrence.
These results show that biomarker statuses are significantly interdependent and that their association patterns can vary across datasets. Consequently, ML models trained on WSIs may learn composite phenotypes driven by multiple interdependent biomarkers, introducing cohort-specific biases and limiting their generalizability to unseen cases.
Prediction of biomarkers and gene alterations from WSIs
To demonstrate that the ML models analysed in the study were properly trained, we report biomarker prediction performance across algorithms, feature embeddings and modelling approaches (Fig. 3 and Supplementary Figs. 1 and 2). Different model configurations achieved AUROCs >0.80 for several biomarkers in both cross-validation and independent validation cohorts.

Fig. 3: The plots show the AUROC for two weakly supervised models (CLAM and \(\mathrm{SlideGraph}^{\infty}\)), each trained with two different patch-level encoders: ShuffleNet, a convolutional neural network-based encoder pretrained on natural images, and CTransPath, a transformer-based model pretrained on WSIs by self-supervised learning. For each biomarker or gene mutation, the comparative predictive performance for these four model–encoder combinations is shown. Dark and light red bars correspond to CLAM with CTransPath and ShuffleNet, respectively, whereas dark and light blue bars correspond to \(\mathrm{SlideGraph}^{\infty}\) with CTransPath and ShuffleNet, respectively. Bar heights represent mean AUROC values, and error bars indicate the 95% confidence interval (two-sided, using Student’s t-distribution) calculated across 100 class-stratified bootstrap sampling runs. Bar labels are colour-coded, with yellow denoting biomarkers and green denoting mutations.
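The error bars described in this caption can be approximated with a short standard-library sketch. It is an illustration under assumptions rather than the authors' procedure: the interval is taken to be that of the mean AUROC across runs (mean ± t · s/√runs), the critical value 1.984 is the approximate two-sided 95% Student's t quantile for 99 degrees of freedom and is therefore valid only for 100 runs, and the function name and toy data are hypothetical.

```python
import math
import random
import statistics

def bootstrap_auroc_ci(labels, scores, runs=100, t_crit=1.984, seed=0):
    """Mean AUROC and a t-based 95% interval over class-stratified bootstrap runs.

    Positives and negatives are resampled separately (with replacement) so that
    every run keeps the original class counts. t_crit must match runs - 1
    degrees of freedom; the default is only appropriate for runs=100.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    aucs = []
    for _ in range(runs):
        p = rng.choices(pos, k=len(pos))
        q = rng.choices(neg, k=len(neg))
        wins = sum((a > b) + 0.5 * (a == b) for a in p for b in q)
        aucs.append(wins / (len(p) * len(q)))
    mean = statistics.fmean(aucs)
    half_width = t_crit * statistics.stdev(aucs) / math.sqrt(runs)
    return mean, mean - half_width, mean + half_width

# Toy predictions (illustrative only).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.65, 0.1]
mean_auc, ci_low, ci_high = bootstrap_auroc_ci(labels, scores)
```

Because the interval describes the mean over runs, it narrows as the number of runs grows, which should be kept in mind when comparing it with subgroup-level AUROCs.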
In BRCA, CLAM with CTransPath features predicts receptor status with average AUROCs of 0.87 and 0.90 for ER and 0.79 and 0.78 for PR, in cross-validation (TCGA-BRCA) and independent validation (ABCTB) cohorts, respectively. Similar AUROCs were observed for \(\mathrm{SlideGraph}^{\infty}\) (CTransPath). These models also inferred gene mutations with high accuracy; for example, CLAM (CTransPath) predicted CDH1 and TP53 mutations with AUROCs of 0.88 and 0.82 in TCGA-BRCA and 0.91 and 0.82 in CPTAC-BRCA, respectively.
Beyond breast tumours, these models also achieved high AUROC values for predicting biomarkers and gene mutations in CRC, lung cancer and UCEC (Fig. 3 and Supplementary Fig. 2). For instance, \(\mathrm{SlideGraph}^{\infty}\) (CTransPath) predicted MSI status in CRC with an AUROC of 0.89 in TCGA-CRC (cross-validation) and 0.84 in CPTAC-CRC (independent validation). Strong predictive performance was also observed for other biomarkers, including BRAF, CpG island methylator phenotype pathway (CIMP), CINGS and hypermutation status (Fig. 3).
Apart from weakly supervised approaches, single-output and multi-output models trained on TITAN WSI-level feature representation showed roughly similar performance (Supplementary Fig. 3). For example, the multi-output model predicts the ER and PR status of TCGA-BRCA cases with AUROCs of 0.89 and 0.81, respectively, closely matching the AUROC values of models trained under the single-output setting (ER 0.89 and PR 0.79).
These results confirm the proper training of these models. Next, on the basis of AUROC, we selected the best model for each biomarker and assessed the impact of biomarker interdependencies by permutation testing and stratification analysis.
Interdependence in biomarker status leads to entangled histology phenotypes captured from WSIs
Our confounding factor analysis shows that WSI-based predictors are strongly influenced by biomarker interdependencies. Across multiple biomarkers, the higher cohort-level AUROCs achieved by these models drop significantly in subgroups defined by the statuses of various stratification variables (Fig. 4 and Supplementary Figs. 4–7). For example, \(\mathrm{SlideGraph}^{\infty}\) predicts colorectal tumours’ MSI status (the ‘prediction variable’) with an AUROC of 0.88 (0.873–0.886). However, when the same patient set is divided into hypermutated and non-hypermutated subgroups (the ‘stratification variable’), the AUROC for MSI status prediction drops to 0.72 within each subgroup. A similar effect is observed in stratification by other biomarkers showing co-occurrence with MSI (for example, CIMP activity, hypermutation and APC statuses) and those showing mutual exclusivity (for example, BRAF and CINGS) (Fig. 4).

Fig. 4: AUROC values are shown on the y axis, with the top x axis indicating the prediction variables and the bottom x axis showing the stratification variables. The predictive performance of each predictor on all the cases in the cohort (denoted by ‘All’ in the plot) over 100 bootstrap runs is shown using a violin plot, whereas its performance in different stratification groups is depicted with a doughnut chart, with the centre representing the AUROC values. The horizontal white line within each violin marks the mean of the distribution. Doughnuts marked with an asterisk at the top indicate statistically significant variation in results in the stratification analysis (Benjamini–Hochberg FDR-corrected P values from two-sided permutation testing, \(P \ll 0.05\)). The percentage values at the bottom of the doughnut indicate the proportion of positive (MUT/high) or negative (WT/low) cases relative to the status of the stratification variables. Red and blue colours in each doughnut indicate the proportion of positive and negative cases in each stratified group with respect to the prediction variables. MUT, mutated; WT, wild-type.
These observations extend beyond colorectal tumours and are evident in biomarker predictors of breast and endometrial tumours, regardless of the specific model architecture, feature embeddings or training method used. For instance, in breast tumours, the performance of the ER predictor significantly declines in cases with GATA3, CDH1 and PIK3CA mutations (Fig. 4). Likewise, the ER predictor’s AUROC drops significantly in both PR-positive and PR-negative cases, as well as in TP53-mutant and wild-type cases. Similar trends are apparent for PR, TP53, CDH1 and PIK3CA predictors (Fig. 4). This trend of inconsistent subgroup performance is also observed for other single- and multi-output models, such as those using TITAN WSI-level feature representation (Supplementary Figs. 5–7). For example, the AUROC of the ER predictor drops from 0.89 to 0.57 in single-output settings, and from 0.88 to 0.58 under multi-output settings.
These results suggest that biomarker prediction from ML models is contingent on the status of other interdependent biomarkers, and that these models are probably relying on composite phenotypes arising from potentially interacting biomarkers rather than learning biomarker-specific morphology.
WSI-based biomarker prediction is confounded by histology grade
WSI-based models predict breast tumour receptor status (ER, PR) with high cohort-level AUROCs of 0.87 and 0.79 in the TCGA-BRCA cohort, and 0.90 and 0.78 in the ABCTB cohort, respectively. However, the stratification analysis by tumour grade reveals marked subgroup-level performance drops (Fig. 5). The ER predictor AUROC drops to 0.76 for medium-grade cases in both cohorts, and the PR predictor AUROCs in low- and medium-grade cases drop to 0.59 and 0.69 in the TCGA-BRCA cohort and to 0.65 and 0.73 in the ABCTB cohort. Mutation predictors show similar grade-specific performance declines; for example, the AUROC of the TP53 predictor drops from 0.81 (cohort-level) to 0.73, 0.73 and 0.72 for low-, medium- and high-grade cases. These patterns extend beyond breast tumours and are evident in the mutation predictors of endometrial tumours, regardless of model architecture, feature embeddings or training method (Fig. 5 and Supplementary Fig. 8). For example, TP53 predictors trained on TITAN WSI-level embeddings also show performance drops in high-grade cases, with AUROCs decreasing from 0.83 to 0.77 in single-output settings and from 0.86 to 0.77 in multi-output settings.

Fig. 5: a, In the plots, AUROC values are shown on the y axis, with the top x axis indicating the prediction variables and the bottom x axis showing the patient stratification with respect to histological grade. The predictive performance of each predictor on all the cases in the cohort (denoted by ‘All’ in the plot) over 100 bootstrap runs is shown using a violin plot, whereas its performance in a group of patients with a certain histological grade is depicted with a doughnut chart, with the centre representing the AUROC values. The horizontal white line within each violin marks the mean of the distribution. Doughnuts marked with an asterisk at the top indicate statistically significant variations in results (Benjamini–Hochberg FDR-corrected P values from two-sided permutation testing, \(P \ll 0.05\)). Red and blue colours in each doughnut indicate the proportion of positive and negative cases in each stratified group in relation to prediction variables. b, Heat maps highlighting the shift in the association structure between histological grade and biomarker status across two distinct datasets. The colour intensity reflects the strength of association, with dark red indicating strong co-occurrence and dark blue indicating strong mutual exclusivity.
Our analysis further shows that the apparent AUROCs of WSI-based models are sensitive to shifts in biomarker-grade associations between training and test cohorts. For example, in high-grade UCEC cases, the TP53 predictor attains an AUROC of 0.70 in the TCGA cohort but only 0.36 in the CPTAC cohort, a pattern consistent with a shift in the TP53-grade relationship from strong co-occurrence in the training cohort to moderate mutual exclusivity in the test cohort. Similarly, in low-grade cases, the ER predictor achieves an AUROC of 0.96 in the ABCTB cohort compared with a cross-validation AUROC of 0.90 in TCGA-BRCA, probably reflecting a stronger ER-grade association in ABCTB than in TCGA. Consistent with these findings, single- and multi-output models trained on TITAN WSI-level feature representations showed similar sensitivity (Supplementary Fig. 8). For example, in TCGA-UCEC, the TP53 AUROC drops from 0.83 to 0.77 in high-grade cases for the single-output model and from 0.86 to 0.77 for the multi-output model. In CPTAC-UCEC, where the grade–mutation association differs, the drop in AUROC is more pronounced, from 0.61 to 0.53 for the single-output model and from 0.74 to 0.60 for the multi-output model.
The confounding influence of grade is further supported by experiments in which, for selected biomarkers, we trained separate models for grade 1, 2 and 3 patients; these grade-specific models attained lower AUROCs than the pooled model (Supplementary Table 1). For example, in TCGA-BRCA, the TP53 grade-specific predictors achieved AUROCs of ~0.73 compared with 0.84 for the pooled model, and ER and PR showed similar reductions. To evaluate whether these disparities could be attributed to demographic differences, we examined the demographic balance between biomarker-positive and biomarker-negative cases and found moderate racial differences (Supplementary Table 2). We therefore repeated the grade-stratified experiment only on patients in a single racial subgroup (white). The same trends persisted (Supplementary Table 3); for example, the ER predictor trained only on grade 1 cases achieved an AUROC of 0.66, significantly lower than the pooled AUROC of 0.85, suggesting that demographic factors are unlikely to drive these performance differences (Supplementary Table 3).
These results, reminiscent of Simpson’s paradox, indicate that WSI-based biomarker prediction models rely heavily on grade-associated morphology rather than biomarker-specific phenotypic signatures, making them less generalizable to external cohorts where grade–biomarker associations differ from those in the training data.
The added predictive power of biomarker predictors beyond pathologist grade assignments
Our analysis shows that the status of several biomarkers across cancer types can be inferred from pathologist-assigned grade alone with higher accuracy than expected, in several cases approaching the performance of deep learning models. In BRCA, grade-based ER and PR classifiers achieved AUROCs of 0.76 and 0.70 in the TCGA-BRCA cohort and 0.79 and 0.71 in the ABCTB cohort, respectively (Fig. 6). Grade also predicts TP53 mutations with an AUROC of 0.75, nearly matching the 0.81 achieved by weakly supervised ML models. Similar AUROC patterns were seen for TP53 and PTEN predictors in the TCGA-UCEC and CPTAC-UCEC cohorts. These results suggest that, for some biomarkers, ML algorithms offer limited additional predictive value over pathologist-assigned grade (Fig. 3). The strong grade–biomarker association also risks ML models linking grade-associated phenotypic differences to biomarker status; therefore, WSI-based models are expected to exceed this grade-derived baseline and establish robust phenotype–genotype associations that are independent of tumour grade.

Fig. 6: The plots illustrate the AUROC achieved by a support vector machine classifier trained to predict a biomarker/gene mutation from one-hot encoded histological grades. Bar heights represent mean AUROC values, and error bars indicate the 95% confidence interval (two-sided, using Student’s t-distribution) calculated across 100 class-stratified bootstrap sampling runs. Bar labels are colour-coded, with yellow denoting biomarkers and green denoting mutations.
WSI-based biomarker prediction is confounded by the density of mutations in different genes
WSI-based models infer BRAF and TP53 mutations in colorectal tumours (TCGA-CRC) from WSIs with high confidence, achieving AUROCs of 0.774 (0.764–0.785) and 0.717 (0.711–0.722), respectively (Fig. 7a). However, stratification analysis reveals a significant issue: for cases with low mutation density in genes other than BRAF (denoted as \(\mathrm{TMB}_{\widetilde{BRAF}}\)), the BRAF predictor accuracy drops to an AUROC of 0.65 (Fig. 7a). Similarly, the TP53 predictor AUROC drops to 0.50 for high TMB cases. In the CPTAC-CRC cohort, similar trends were observed, with BRAF and TP53 predictors’ performance dropping in high and low TMB cases, respectively. In addition, APC and KRAS mutation predictors are also influenced by TMB. This observation also extends to UCEC, where the PTEN predictor achieved AUROCs of 0.803 in TCGA-UCEC and 0.731 in CPTAC-UCEC but drops to 0.63 and 0.32 for low TMB cases in the respective cohorts (Fig. 7a).

Fig. 7: a, AUROC values are plotted on the y axis, with the top x axis indicating the prediction variables and the bottom x axis showing patients’ stratification with respect to TMB. The predictive performance of each predictor on all the cases in the cohort (denoted by ‘All’ in the plot) over 100 bootstrap runs is shown using a violin plot, whereas its performance in patients with high and low TMB is depicted with a doughnut chart, with the centre representing the AUROC values. The horizontal white line within each violin marks the mean of the distribution. Doughnuts marked with an asterisk at the top indicate statistically significant variation in results (Benjamini–Hochberg FDR-corrected P values from two-sided permutation testing, \(P \ll 0.05\)). Red and blue colours in each doughnut indicate the proportion of positive and negative cases in each stratified group based on prediction variables. b, Heat maps highlighting the shift in the association structure between TMB and gene mutations across two distinct datasets. The colour intensity reflects the strength of association, with dark red indicating strong co-occurrence and dark blue indicating strong mutual exclusivity.
We further show that varying associations between TMB and biomarker status across datasets significantly influence the prediction accuracy of WSI-based predictors. In CRC, the association between KRAS mutation and TMB is slightly stronger in the CPTAC-CRC cohort than in the TCGA-CRC cohort (Fig. 7b). This stronger association might explain the KRAS predictor’s significantly improved prediction accuracy (AUROC: 0.83) in high TMB cases in the CPTAC-CRC cohort, compared with an AUROC of 0.63 for high TMB cases in the TCGA-CRC cohort. This analysis suggests that the model’s predictions are influenced not only by KRAS mutation status, which is the target prediction variable, but also by the overall TMB, which affects the prediction accuracy.



