Training and evaluation datasets
The dataset used in this study originates from the same filtered subset of the CELLxGENE census (v.2023-05-15)3 that was curated for the scTab study11. This subset was constructed by applying strict inclusion criteria to the full census: only primary human cells profiled with 10x Genomics technologies were retained and the feature space was restricted to 19,331 human protein-coding genes. Cell types were required to appear in at least 5,000 cells drawn from a minimum of 30 donors. All gene expression profiles were size-factor normalized to 10,000 counts per cell and log-transformed with a pseudocount of 1 (that is, f(x) = log(x + 1)). The resulting dataset included 22,189,056 cells annotated with 164 distinct cell types, spanning 5,052 donors and 56 tissues. For the ID task, we adopted the same donor-partitioned data split as used by Fischer et al.11: 15,240,192 cells for training, 3,500,032 for validation and 3,448,832 for testing.
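As an illustration of the normalization step, the following is a minimal sketch and not the preprocessing code used for the dataset. It assumes a dense cells × genes count matrix; the function name `normalize_log1p`, the toy values and the guard for empty cells are hypothetical.

```python
import numpy as np

def normalize_log1p(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log(x + 1)."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: cells with zero total counts are left as all zeros.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)

# Toy matrix with 2 cells and 3 genes (hypothetical values).
toy_counts = np.array([[1, 3, 0], [10, 0, 10]])
normalized = normalize_log1p(toy_counts)
```

Inverting the log transform with `np.expm1` and summing over genes should recover 10,000 for every non-empty cell, which is a convenient sanity check.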
The OOD test dataset consisted of all newly added human cells in a subsequent release of the CELLxGENE census (v.2023-12-15). These cells were also profiled using 10x Genomics platforms and annotated with one of the 164 labels observed during training. This resulted in approximately 2.6 million cells drawn from 21 studies, covering 80 of the 164 training cell types.
Cell ontology
We used the cell ontology obtained from the Ontology Lookup Service at EMBL-EBI as the hierarchical scaffold for all analyses10. The ontology was represented as a DAG, where nodes correspond to cell types and directed edges correspond to is_a subtype relationships. We restricted the ontology to the 164 distinct cell types observed in the training set (see the preceding section on datasets). In CELLxGENE, which is the atlas used in our study, cell types are annotated by the original data contributors and then harmonized by mapping each label to the closest cell ontology term as specified by the portal's data schema. While the cell ontology provides a valuable scaffold for representing hierarchical relationships among cell types, it is important to note that its structure is continuously being revised and that certain definitions and mappings between cell types remain under active refinement.
Because each cell type corresponds to a node in the DAG, we can further classify cell types on the basis of the type of node they represent. A node was defined as a leaf if it had no children in the pruned ontology and as an internal node if it had at least one child. We also distinguished between connected nodes, which had at least one parent or child present in the curated training set, and isolated nodes, which had none of their ancestors or descendants represented in the training data. These definitions were used to assess how the hierarchical loss propagates information across the ontology (Supplementary Fig. 6).
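The node categories can be sketched as follows. This is an illustrative reading of the definitions, not the authors' code: it assumes that pruning keeps ancestor–descendant relations that pass through removed intermediate terms, so that "child" and "parent" in the pruned ontology are evaluated through reachability in the full DAG. The names `descendants`, `classify_nodes` and the toy ontology are hypothetical.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following parent -> child edges."""
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(children.get(cur, ()))
    return seen

def classify_nodes(children, training_types):
    """Label each training cell type as leaf/internal and connected/isolated."""
    # Descendants of each type, restricted to types present in training.
    desc = {n: descendants(children, n) & training_types for n in training_types}
    result = {}
    for n in training_types:
        has_ancestor = any(n in desc[m] for m in training_types if m != n)
        kind = "internal" if desc[n] else "leaf"
        link = "connected" if (desc[n] or has_ancestor) else "isolated"
        result[n] = (kind, link)
    return result

# Toy ontology (parent -> children); "lymphocyte" and "B cell" are not in training.
toy_children = {
    "lymphocyte": ["T cell", "B cell"],
    "T cell": ["CD4+ T cell", "CD8+ T cell"],
}
toy_training = {"T cell", "CD4+ T cell", "CD8+ T cell", "neuron"}
node_labels = classify_nodes(toy_children, toy_training)
```

In this toy case, `T cell` is an internal connected node, its two subtypes are connected leaves, and `neuron` is an isolated leaf because none of its relatives occur in the training set.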
Evaluation protocol
Classification performance was evaluated using the macro F1 score, which computes the unweighted average of the F1 scores across all cell types. This metric ensures that each cell type contributes equally to the overall score, regardless of class imbalance or prevalence in the dataset. For C cell types, the macro F1 score is computed as
$$\mathrm{macro}\,F_{1}\,\mathrm{score}=\frac{1}{C}\sum_{i=1}^{C}\frac{2\times \mathrm{precision}_{i}\times \mathrm{recall}_{i}}{\mathrm{precision}_{i}+\mathrm{recall}_{i}}$$
(1)
where \(\mathrm{precision}_{i}\) and \(\mathrm{recall}_{i}\) are defined for the ith class as
$$\mathrm{precision}_{i}=\frac{\mathrm{TP}_{i}}{\mathrm{TP}_{i}+\mathrm{FP}_{i}},\qquad \mathrm{recall}_{i}=\frac{\mathrm{TP}_{i}}{\mathrm{TP}_{i}+\mathrm{FN}_{i}}.$$
(2)
Here, the terms \(\mathrm{TP}_{i}\), \(\mathrm{FP}_{i}\) and \(\mathrm{FN}_{i}\) denote the number of true positives, false positives and false negatives for the ith cell type, respectively. We adopted the evaluation framework introduced by Fischer et al.11 in the scTab study, particularly because of the way these authors handled differences in the granularity of annotations that can occur across different studies: namely, a predicted label is considered correct if it exactly matches the ground-truth label or if it corresponds to a descendant of the ground-truth label in the cell ontology (that is, the prediction is a more specific subtype). This accounts for the fact that some datasets provide coarse-grained annotations (for example, 'T cell') while others include more detailed subtypes (for example, 'CD4-positive, α–β T cell'). In such cases, predicting a valid subtype is treated as correct, as it remains consistent with the original label. Any other prediction, including a coarser label (such as a parent node) or an unrelated class, is considered incorrect.
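The descendant-aware scoring rule can be sketched as below. This is a simplified illustration rather than the scTab implementation: it assumes that a prediction falling on a descendant of the true label is remapped to the true label before counting, and that the average runs over the classes present in the ground truth. The function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred, descendants_of):
    """Macro F1 where predicting a descendant of the true label counts as correct."""
    # Assumption: descendant predictions are remapped to the true label.
    adjusted = [t if (p == t or p in descendants_of.get(t, ())) else p
                for t, p in zip(y_true, y_pred)]
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, adjusted) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, adjusted) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, adjusted) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

toy_descendants = {"T cell": {"CD4+ T cell", "CD8+ T cell"}}
toy_true = ["T cell", "T cell", "CD4+ T cell", "B cell"]
toy_pred = ["CD4+ T cell", "T cell", "T cell", "B cell"]
toy_score = macro_f1(toy_true, toy_pred, toy_descendants)
```

In the toy data, the first prediction (a subtype of the true label) counts as correct, whereas the third (a coarser parent label) counts as an error, which lowers precision for `T cell` and recall for `CD4+ T cell`.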
Model details
We evaluated three model architectures of increasing complexity: a linear classifier, an MLP and the TabNet transformer model. Each model takes as input the full set of 19,331 human protein-coding genes. To ensure a fair comparison across models and with previous work, we adopted the architecture configurations and hyperparameters used in the scTab benchmarking study from Fischer et al.11 (Supplementary Tables 1–3). The models using CE versus HCE share identical architecture and hyperparameter settings; the loss term is the only difference between them. Specifically, for the models with CE, we used the best hyperparameters available according to the original scTab study. For the models using the HCE loss, we did not perform additional hyperparameter tuning and instead kept the (possibly suboptimal) hyperparameters used for the models with CE. Note that, while recent efforts have explored large-scale foundation models to learn transferable embeddings for single-cell data, such approaches have not yet demonstrated clear advantages over simpler, task-specific approaches for cell-type annotation11,22. We therefore focused on methods where we could easily isolate and study the direct effects of implementing the HCE strategy.
HCE loss function
The HCE loss function extends the standard CE loss by explicitly encoding the structural relationships within the cell ontology. With the standard CE, the loss is computed directly from raw model predictions, treating all cell types as independent classes. Let \(p = (p_{1}, \ldots, p_{C})\) denote the raw predicted probabilities for C different cell types. The standard CE loss is given by
$$\mathcal{L}_{\mathrm{CE}}=-\sum_{i=1}^{C}\mathbb{1}\{\mathrm{label}=i\}\,\log\,p_{i}$$
(3)
where \(\mathbb{1}\{\mathrm{label}=i\}\) is an indicator function that is equal to 1 if the true class label is the ith cell type and 0 otherwise. The HCE adjusts these predictions to reflect hierarchical dependencies encoded in the ontology's DAG. The adjusted score \(s_{i}\) for the ith cell type is computed as the sum of the predicted probability for its label and the predicted probabilities of all its descendant subtypes
$$s_{i}=p_{i}+\sum_{j\in \mathcal{D}(i)}p_{j}$$
(4)
where \(\mathcal{D}(i)\) denotes the set of all descendants of cell type i in the DAG. This adjustment ensures that the probability of a parent node reflects its entire subgraph. The hierarchical loss is then
$$\mathcal{L}_{\mathrm{HCE}}=-\sum_{i=1}^{C}\mathbb{1}\{\mathrm{label}=i\}\,\log\,s_{i}\,.$$
(5)
This formulation directly parallels the evaluation framework, where predictions are considered correct if they match the ground-truth label or any of its descendants. By aligning the training objective with the evaluation criterion, HCE encourages cell-type classification models to distribute probability mass in a way that respects biological hierarchy and annotation granularity.
Consider an ontology subgraph that is rooted at the node T cell, which includes subtype labels such as CD4+ T cell, CD8+ T cell and γ–δ T cell. The HCE allows classification models to predict fine-grained subtypes when available, while also deferring to parent categories when annotations are coarse or ambiguous. For example, if some studies annotate cells as T cell while others use more specific labels such as CD4+ T cell or CD8+ T cell, the adjusted score is computed as
$$s_{\mathrm{T\,cell}}=p_{\mathrm{T\,cell}}+p_{\mathrm{CD}4^{+}}+p_{\mathrm{CD}8^{+}}+p_{\gamma\text{-}\delta}+\ldots .$$
(6)
This hierarchical set-up allows the model to aggregate subtype information upward, improving consistency across annotations with varying granularity.
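A small numerical sketch may help to contrast the two losses for a coarsely annotated cell. The probabilities and the reduced four-class label set below are hypothetical and chosen only for illustration; the helper names are not taken from the original work.

```python
import math

# Hypothetical predicted probabilities for one cell (they sum to 1).
toy_probs = {"T cell": 0.10, "CD4+ T cell": 0.60, "CD8+ T cell": 0.20, "B cell": 0.10}
toy_descendants = {"T cell": ["CD4+ T cell", "CD8+ T cell"]}

def adjusted_score(label):
    """s_i = p_i plus the probabilities of all descendants of i."""
    return toy_probs[label] + sum(toy_probs[d] for d in toy_descendants.get(label, []))

def ce_loss(label):
    """Standard cross-entropy for a single sample."""
    return -math.log(toy_probs[label])

def hce_loss(label):
    """Hierarchical cross-entropy for a single sample."""
    return -math.log(adjusted_score(label))

ce_t_cell = ce_loss("T cell")    # penalizes mass placed on the subtypes
hce_t_cell = hce_loss("T cell")  # credits mass placed on the subtypes
```

For a cell annotated as `T cell`, the standard CE sees only the 0.10 assigned to the parent label, whereas the HCE sees an adjusted score of about 0.90, so the model is not penalized for committing to a subtype. For a leaf label such as `CD4+ T cell`, the two losses coincide because there are no descendants to aggregate.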
Implementation details for the HCE loss
We implemented the HCE loss using a reachability matrix \(R \in \{0, 1\}^{C \times C}\), where element \(R_{ij} = 1\) if the jth class is reachable from the ith class (meaning j is either i itself or j is a descendant of i in the hierarchy) and \(R_{ij} = 0\) otherwise. The reachability relation encoded in this matrix is a partial order and has the following mathematical properties:
Reflexive: every class is reachable from itself (diagonal elements are 1).
Antisymmetric: if class i can reach j and j can reach i, then i = j.
Transitive: if class i can reach j and j can reach k, then i can reach k.
Specifically, the reachability matrix represents the transitive closure of the inverted adjacency matrix of the hierarchical DAG structure. Since the original DAG encodes is_a relationships from child to parent, we invert the edge directions to enable parent-to-descendant reachability, ensuring reflexivity by setting the diagonal to 1. Each trained model outputs a raw probability distribution \(p = (p_{1}, \ldots, p_{C})\) over the class labels. The adjusted scores are computed via matrix–vector multiplication, \(s = Rp\), which efficiently aggregates descendant probabilities for each class. We then apply a log transformation with a small constant for numerical stability, \(\log(s + \epsilon)\), where \(\epsilon = 10^{-6}\). The final loss uses a weighted negative log-likelihood as implemented in PyTorch, with class weights computed following scikit-learn's compute_class_weight approach: \(w_{i} = N/(C n_{i})\), where N is the total number of samples, C is the number of classes and \(n_{i}\) is the count of samples for class i. The complete loss for a single training sample x with true label t is
$$\mathcal{L}_{\mathrm{HCE}}(x)=-w_{t}\,\log (s_{t}+\epsilon ).$$
(7)
This formulation maintains consistency with the models trained with the weighted CE, while incorporating hierarchical structure through efficient matrix operations.
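The steps above can be sketched in NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the PyTorch code used in the study: classes are assumed to be integer-indexed, every class is assumed to occur at least once in the training labels, and the function names and toy hierarchy are hypothetical.

```python
import numpy as np

def reachability_matrix(is_a_edges, n_classes):
    """R[i, j] = 1 if j is i itself or a descendant of i."""
    R = np.eye(n_classes, dtype=np.int64)      # reflexive: diagonal set to 1
    for child, parent in is_a_edges:
        R[parent, child] = 1                   # invert child -> parent edges
    for k in range(n_classes):                 # Warshall-style transitive closure
        R |= np.outer(R[:, k], R[k, :])
    return R

def class_weights(labels, n_classes):
    """w_i = N / (C * n_i); assumes every class has at least one sample."""
    counts = np.bincount(labels, minlength=n_classes).astype(np.float64)
    return len(labels) / (n_classes * counts)

def hce_loss(probs, label, R, weights, eps=1e-6):
    """Weighted hierarchical cross-entropy for a single sample."""
    s = R @ probs                              # aggregate descendant probabilities
    return -weights[label] * np.log(s[label] + eps)

# Toy hierarchy: 0 = lymphocyte, 1 = T cell, 2 = CD4+ T cell, 3 = B cell.
toy_edges = [(1, 0), (2, 1), (3, 0)]           # (child, parent) is_a pairs
toy_R = reachability_matrix(toy_edges, 4)
toy_weights = class_weights(np.array([0, 1, 1, 2, 2, 2, 2, 3]), 4)
toy_probs = np.array([0.1, 0.2, 0.6, 0.1])
toy_loss = hce_loss(toy_probs, 1, toy_R, toy_weights)
```

In the toy hierarchy, the row of `toy_R` for `lymphocyte` is all ones because every other class descends from it, and the adjusted score for `T cell` is 0.2 + 0.6 = 0.8. A batched version would compute `probs @ R.T` instead of a per-sample matrix–vector product.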
Statistical evaluation of performance differences across loss functions
To assess changes in predictive performance induced by the ontology-aware training strategy, we computed per-cell-type differences in macro F1 score between models trained with standard CE and HCE across four independent training runs. For each cell type, a paired t-test was performed and P values were adjusted using the Holm–Bonferroni method to correct for multiple hypothesis testing. Statistically significant differences indicate cell types for which ontology-aware training produces consistent changes beyond random variability.
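The two statistical ingredients can be sketched in plain Python as below. This is an illustration only: the scores and P values are hypothetical, the function names are not from the original analysis, and the conversion from the t statistic to a P value (a t distribution with n − 1 degrees of freedom, as provided for example by `scipy.stats.ttest_rel`) is deliberately left out.

```python
import math

def paired_t_statistic(a, b):
    """t statistic of the paired differences a - b (needs non-zero variance)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)

def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjustment, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Hypothetical F1 scores for one cell type over four runs (HCE versus CE).
toy_hce = [0.80, 0.82, 0.81, 0.83]
toy_ce = [0.78, 0.79, 0.80, 0.80]
toy_t = paired_t_statistic(toy_hce, toy_ce)

# Hypothetical raw P values for four cell types.
toy_adjusted = holm_adjust([0.01, 0.04, 0.03, 0.005])
```

With only four runs per model, each paired test has three degrees of freedom, so only cell types with consistent differences across runs are likely to remain significant after adjustment.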
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.



