Bimodal VAE
A Variational Autoencoder (VAE) is a generative model that performs variational inference on a latent variable z. It is formally defined by the joint distribution p(x,z) = p(z)p(x|z). Since the true posterior q(z|x) is intractable, both it and the conditional distribution p(x|z) are approximated using neural networks, trained by optimizing the Evidence Lower Bound (ELBO) loss function:
$$beginxx=q(z_zleft[log pleft(right)right]-mathrmzleft(qleft(proper)zzleft(0,Iright)proper)endz$$
(1)
The bimodal VAE extends the standard VAE to handle datasets where (1) inputs consist of two distinct modalities and (2) one modality may be absent for some samples. Note that although VAMB is trained on both tetranucleotide frequencies (TNFs) and abundance data, we do not classify it as bimodal in this context because both TNFs and abundances are available for every sample and can be combined into a single input modality by concatenating their respective vectors.
While a standard VAE uses a neural network encoder to approximate the posterior q(z|x) from input x, a bimodal VAE generalizes this by modeling three posteriors: q(z|x1,x2), q1(z|x1), and q2(z|x2), replacing the single q(z|x). It also employs two separate decoders to model the distributions p(x1|z) and p(x2|z). Different multimodal VAE architectures vary in (1) how they implement the neural networks for q(z|x1,x2), q1(z|x1), and q2(z|x2) and/or (2) the design of their loss function.
TaxVAMB incorporates the VAEVAE50 model from the bimodal VAE family, which implements q(z|x1,x2), q1(z|x1), and q2(z|x2) using dedicated neural networks. The following ELBO-based loss L is minimized during training:
$$beginzz_zz_[z_[log _(z_z)+log z_(z_z)] -mathrm(q(_z,z_)Vert p(z_z))-mathrmz(q(z_,z_)Vert p(z_z))] +{{mathbbz}}_{_{textz}(_z)}[{{mathbb}}_,z_zright)[log _{1}({x}_{1}z)]-mathrm{KL}(q({x}_{1})Vert p(z))] +{{mathbb{E}}}_{{p}_{textual content{paired}}({x}_{2})}[{{mathbb{E}}}_{q(z| {x}_{2})}[log {p}_{2}({x}_{2}z)]-mathrm{KL}(q({x}_{2})Vert p(z))] { {mathcal L} }_{1}{=}{{mathbb{E}}}_{{p}_{textual content{unpaired}}({x}_{1})}[{{mathbb{E}}}_{q(z| {x}_{1})}[log {p}_{1}({x}_{1}z)]-mathrm{KL}(q({x}_{1})Vert p(z))] { {mathcal L} }_{2}{=}{{mathbb{E}}}_{{p}_{textual content{unpaired}}({x}_{2})}[{{mathbb{E}}}_{q(z| {x}_{2})}[log {p}_{2}({x}_{2}z)]-mathrm{KL}(q({x}_{2})Vert p(z))] {mathcal L} {=}{ {mathcal L} }_{textual content{paired}}+{{mathcal{L}}}_{1}+{{mathcal{L}}}_{2}finish{array}$$
Here, DKL (p(x)||q(x)) denotes the Kullback–Leibler divergence between two probability distributions p(x) and q(x), defined as:
$$start{array}{l}mathrm{KL}left(pleft(xright)mathrmqleft(xright)proper)={int }_{-infty }^{infty }pleft(xright)log left(frac{pleft(xright)}{qleft(xright)}proper){rm{d}}xend{array}$$
(2)
The training process begins by constructing a dataset containing both paired and unpaired samples. Let C represent the complete list of contigs. Three independent shuffled copies of C are created: Cpaired, C1, and C2. Paired samples are ordered tuples (x1,x2) where x1 is the concatenated vector of TNFs and abundances (the standard VAMB input), x2 is the taxonomy label vector, and both correspond to the same contig from Cpaired. An unpaired TNF and abundance vector x1 is drawn from the list C1, while an unpaired taxonomy label x2 comes from C2. The forward pass follows the steps outlined in Algorithm 1.
Algorithm 1Loss computation (forward pass)
Input: Paired sample (({x}_{1},{x}_{2})), unpaired sample ({x{prime} }_{1}), unpaired sample ({x{prime} }_{2})
1: (z{prime} sim qleft({x}_{1},{x}_{2}proper))
2: ({z}_{{x}_{1}}sim {q}_{1}left({x}_{1}proper))
3: ({z}_{{x}_{2}}sim {q}_{2}left({x}_{2}proper))
4: ({d}_{1}={D}_{mathrm{KL}}(q(z{prime} |{x}_{1},{x}_{2})parallel {q}_{1}({z}_{{x}_{1}}|{x}_{1}))+{D}_{mathrm{KL}}({q}_{1}({z}_{{x}_{1}}|{x}_{1})parallel p(z)))
5: ({d}_{2}={D}_{mathrm{KL}}(q(z{prime} |{x}_{1},{x}_{2})parallel {q}_{2}({z}_{{x}_{2}}|{x}_{2}))+{D}_{mathrm{KL}}({q}_{2}({z}_{{x}_{2}}|{x}_{2})parallel p(z)))
6: ({L}_{mathrm{paired}}=log {p}_{1}({x}_{1}z)+log {p}_{2}({x}_{2}z)+log {p}_{1}({x}_{1}|{z}_{{x}_{1}})) (+log {p}_{2}({x}_{2}|{z}_{{x}_{2}})+{d}_{1}+{d}_{2})
7: ({L}_{{x}_{1}}=log {p}_{1}({x{prime} }_{1}|{z}_{{x}_{1}})+{D}_{mathrm{KL}}({q}_{1}({z}_{{x}_{1}}|{x{prime} }_{1})parallel p(z)))
8: ({L}_{{x}_{2}}=log {p}_{2}({x{prime} }_{2}|{z}_{{x}_{2}})+{D}_{mathrm{KL}}({q}_{2}({z}_{{x}_{2}}|{x{prime} }_{2})parallel p(z)))
9: (L={L}_{mathrm{paired}}+{L}_{{x}_{1}}+{L}_{{x}_{2}})
Data preprocessing
The data preprocessing pipeline follows the same procedure as used in Taxometer (version 5.0.4)42 and VAMB (version 5.0.4)11. Synthetic short paired-end reads from each sample were aligned using bwa-mem (version 0.7.15)73. The resulting BAM files were sorted with SAMtools (version 1.14)74. Contigs shorter than or equal to 2,000 bp were filtered out from all datasets. For long-read datasets, sequencing was performed using Pacific Biosciences HiFi technology. Each sample was assembled with hifiasm-meta (version 0.3.1)3, reads were mapped using minimap2 (version 2.24)75 with the ‘-ax map-hifi’ option, and then processed through the same workflow as the short-read data.
Abundances and TNFs
The computation of abundances and tetranucleotide frequencies (TNFs) follows the identical pipeline as in Taxometer (version 5.0.4)42 and VAMB (version 5.0.4)11. Abundance and TNF calculations were performed using the VAMB metagenome binning tool11. To compute TNFs, tetramer frequencies of unambiguous bases were calculated for each contig, projected into a 103-dimensional orthonormal space, and normalized by z-scaling each tetranucleotide across all contigs. To determine the abundances
Here is the paraphrased HTML content:
To determine the abundance of each pattern, we utilized pycoverm (version 0.6.0). The abundances had been first normalized inside every pattern by the full variety of mapped reads, after which throughout all samples in order that they summed to 1. To acquire absolute abundance, the sum of abundances for a contig was calculated earlier than the cross-sample normalization step. After this, the dimensionality of the function desk was Nc × (103 + Ns + 1), the place Nc represents the variety of contigs and Ns represents the variety of samples.
Neighborhood construction and hyperparameters
The encoder architectures for the mixed vector of abundances and TNFs mirror these utilized in Taxometer (model 5.0.4)42 and VAMB (model 5.0.4)11. The enter vector of dimensionality Nc × (103 + Ns + 1) was processed by 4 totally related layers ((103 + Ns + 1) × 512, 512 × 512, 512 × 512, 512 × 512) with Leaky ReLU activation (detrimental slope of 0.01), every incorporating batch normalization (ϵ 1 × 10−5, momentum 0.1) and dropout (P = 0.2).
The encoder community for taxonomy labels accepted an enter of dimensions Nl, the place Nl is the variety of leaves within the taxonomic tree. This enter vector was processed by 4 totally related layers (Nl × 512, 512 × 512, 512 × 512, 512 × 512) with Leaky ReLU activation (detrimental slope of 0.01), every making use of batch normalization (ϵ 1 × 10−5, momentum 0.1) and dropout (P = 0.2).
The encoder community for the concatenation of each modalities had enter dimensions of (103 + Ns + 1) + Nl, the place Ns is the variety of samples and Nl is the variety of leaves within the taxonomic tree. This enter vector was processed by 4 totally related layers ((103 + Ns + 1) × 512, 512 × 512, 512 × 512, 512 × 512) with Leaky ReLU activation (detrimental slope of 0.01), every utilizing batch normalization (ϵ 1 × 10−5, momentum 0.1) and dropout (P = 0.2).
The bimodal VAE options two decoder networks, one for every modality. Each observe the identical architectures as their corresponding encoders, with the enter vector having the dimensionality of the latent area and the output having the dimensionality of the respective modality.
For short-read datasets, the community was educated for 300 epochs with a batch measurement of 256 and a latent area dimensionality of 32. For long-read datasets, the community was educated for 1,000 epochs with a batch measurement of 1,024 and a latent area dimensionality of 64. All fashions used the Adam optimizer with studying charges set by D-Adaptation76. The mannequin was applied in PyTorch (model 1.13.1)77, and CUDA (model 11.7.99) was used when operating on a V100 GPU.
Hierarchical loss
The hierarchical loss follows the identical formulation as in Taxometer (model 5.0.4)42. A phylogenetic tree was constructed for every dataset based mostly on the taxonomy classifier annotations assigned to the contig set. The ensuing taxonomy tree T due to this fact shaped a subgraph of the total taxonomy, and the doable predictions had been restricted to those who appeared within the search outcomes. Within the experiments described above, we used a flat softmax loss. Let Nl be the variety of leaves in tree T. The possibilities for leaf nodes of the taxonomy tree had been computed through softmax over the community output layer of dimensionality 1 × Nl. The likelihood of every inner node was then calculated because the sum of its kids’s possibilities, computed recursively from the underside up. The mannequin output consisted of a likelihood vector for every doable label. Throughout backpropagation, the detrimental log-likelihood loss was computed for the true node and all of its ancestors. Predictions had been made at each taxonomic stage, and at every stage, the descendant node with the best likelihood was chosen. If no descendant node had a likelihood exceeding 0.5, predictions from that stage and all decrease ranges had been excluded from the output.
Taxonomic classifiers
We obtained taxonomic annotations for contigs from all 9 short-read and two long-read datasets utilizing MMseqs2 (model 17.b804f)33, Metabuli (model 1.1.0)78, Centrifuge (model 1.0.4.2)35, and Kraken2 (model 2.1.3)32. For MMseqs2, we used the mmseqs taxonomy command. For Metabuli, we used the metabuli classify command with the ‘—seq-mode 1’ flag. For Centrifuge, we used the centrifuge command with the ‘-okay 1’ flag. For Kraken2, we used the kraken command with the ‘—minimum-hit-groups 3’ flag. MMseqs2 and Metabuli had been arrange to make use of GTDB model 220 as the reference database. Centrifuge and Kraken2 had been configured to make use of NCBI identifiers, launch 229. All taxonomic annotations had been first refined with Taxometer42 (model 5.0.4) utilizing default parameters (epochs 100, batch measurement 1,024). For the datasets in Figs. 4 and 5, the MMseqs2 classifier configured with GTDB was used for all datasets; for the wheat phyllosphere dataset, we used Kraken2 (model 2.1.3) configured with NCBI.
Benchmarked binners
Metabat (model 2.17-66-ga512006) was run utilizing the ‘metabat’ command with default parameters. Metadecoder (model 1.0.19) was utilized with the ‘protection’, ‘seed’, and ‘cluster’ instructions as described on GitHub. Comebin (model 1.0.4) was run with the ‘run comebin.sh’ command and default parameters. ‘Comebin (single)’ refers to operating Comebin in single-sample mode. SemiBin2 (model 2.2.1) was run utilizing the ‘multi_easy_bin’ command with the flags ‘—engine gpu —separator C -t 20 —write-pre-reclustering-bins and —self-supervised’. VAMB, AVAMB, and TaxVAMB had been run as a part of the VAMB codebase (model 5.0.4), with the instructions ‘vamb bin default’, ‘vamb bin avamb’, and ‘vamb bin taxvamb‘. The workflows are publicly accessible on GitHub. Log information for failed runs are additionally obtainable on GitHub (https://github.com/RasmussenLab/TaxVamb-Benchmarks/tree/main/log_files_for_crashed_runs).
Reclustering utilizing SCGs
The short-read and long-read reclustering algorithms based mostly on SCGs had been similar to these launched in SemiBin2 (ref. 21). The code was tailored from the SemiBin2 codebase for the TaxVAMB codebase. TaxVAMB used the identical 107 single-copy marker genes as SemiBin2 for estimating the completeness, contamination, and F1 rating of every bin. Completeness for a given bin was calculated as (N/107), the place 107 is the variety of distinctive SCGs current in that bin. Contamination was calculated as (G − N)/G, the place G is the full variety of sequences matching any SCG. The F1 rating was calculated as (2 × completeness × (1 − contamination)) / (completeness + (1 − contamination)).
For the short-read datasets, ok-means-based reclustering of TaxVAMB and VAMB clusters was carried out. Bins containing two or extra marker genes of the identical kind had been reclustered utilizing the weighted ok-means strategy, with contigs holding the repeated marker genes serving because the
*Note: The unique paragraph seems to be minimize off mid-sentence (“the contigs containing the repeated marker gene because the…”), so I’ve preserved that truncation within the paraphrased model.*
To refine the preliminary centroids, we used a contamination-reduction strategy. This led to cleaner bins with less unwanted genetic material. For the long-read datasets, we performed fresh clustering using the DBSCAN algorithm from the Python scipy library (version 1.10.0), completely ignoring any prior clusters created by TaxVAMB or VAMB. Just like SemiBin2, we ran DBSCAN across a range of ϵ values: 0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, and 0.55. From all resulting bins, we repeatedly selected the one with the highest F1 score, removed its contigs from the pool, and repeated this selection process for the remaining bins. This loop continued until no further bins met our strict quality thresholds (greater than 90% completeness and less than 5% contamination). One key modification made during the TaxVAMB long-read reclustering process was performing the above steps separately for each group of contigs that the Taxometer refinements had assigned to the same genus.
CAMI2 benchmarks
To benchmark short-read performance, we utilized five different CAMI2 datasets: Airways (ten samples), Oral (ten samples), Skin (ten samples), Gastrointestinal (ten samples), and Urogenital (nine samples). Each dataset consisted of sample-specific assemblies. The CAMI2 sets contain the following numbers of genomes with nonzero abundance: Oral (799), Skin (610), Urogenital (254), Gastrointestinal (268), and Airways (828). The distinct genome counts per individual sample are detailed in Supplementary Table 5. We compared several binners using the artificial CAMI2 toy human microbiome dataset, including Metabat10, MetaDecoder79, COMEBin22, SemiBin2 (ref. 21), AVAMB28, and VAMB11. Taxonomic labels from four different classifiers were fed into TaxVAMB: MMSeqs2 (ref. 33), Metabuli77, Kraken2 (ref. 35), and Centrifuge78. Post-processing steps for AVAMB, VAMB, and TaxVAMB included reclustering based on single-copy genes (SCs). Our primary metric was the number of High-Quality (HQ) bins and assemblies, as predicted by BinBencher (version 0.3.0)54. For the MAG taxonomic annotation experiment, we used CheckM2 (version 1.0.2)40. All comparisons were run against the CAMI2 ground truth using BinBencher (version 0.3.0)54. Key metrics included the count of HQ assemblies (precision ≥ 0.95, recall ≥ 0.9) and HQ genomes. As defined in the BinBencher documentation, precision is calculated as the ratio of true positive mapping positions to the total positions in a bin. For genomes, recall is calculated relative to the total size of the source genome, whereas for assemblies, recall is calculated relative to only the portion of the genome covered by the input contig. The number of HQ genomes reflects MAG quality relative to the biological organism itself (and is thus limited by dataset quality), while the assembly metric better isolates the performance gains of the algorithm itself.
Short-read real-world benchmarks
To evaluate real-world short-read data, we used the following six datasets: seawater (5 samples)80, bee gut (18 samples)81, forest soil (12 samples)82, rhizosphere (10 samples)83, human saliva (15 samples)84, and vaginal microbiome (10 samples)85. Assemblies were generated for each sample using metaSPAdes (version 4.2.6)86, and read mapping was performed using minimap2 (version 2.24)87 with the ‘-ax sr’ parameter. Just as with the benchmarking step, we used taxonomic labels from four classifiers as input for TaxVAMB: MMSeqs2 (ref. 33), Metabuli78, Kraken2 (ref. 32), and Centrifuge35. MMSeqs2 was tested against three different databases: GTDB, TrEMBL88 (January 2025 release), and Kalamari89 (version 3.7). Completeness and contamination of the resulting MAGs were checked using CheckM2 (version 1.0.2)40. Chimeric genomes were detected using GUNC (version 1.0.61)55 via the ‘gunc run’ command. Supplementary Table 6 provides the read counts for each sample within these datasets.
Long-read benchmarks
For long-read evaluation, we used a human gut metagenome dataset (four samples) and an anaerobic digester sludge dataset (three samples)90. Both were sequenced using Pacific Biosciences HiFi technology. Individual samples were assembled with hifiasm-meta (version 0.3.1)3, and reads were mapped using minimap2 (version 2.24)87 with the ‘-ax map-hifi’ setting, following the same workflow as the short-read pipeline. MAG quality (completeness and contamination) was evaluated using CheckM2 (version 1.0.2)40. We identified chimeric genomes by running GUNC (version 1.0.61)55 with the ‘gunc run’ command.
Multisample scaling
For the experiment measuring the number of bins generated as we increased the number of samples, we used a large human gut dataset containing 1,000 samples (Almeida et al.)58 and a wheat phyllosphere dataset with 211 samples. We divided all samples from each dataset into three groups: chunks of 100 samples, chunks of 10 samples, and single-sample chunks. Each chunk was processed as an independent dataset, and we then summed the total number of HQ bins across the different chunk sizes. Taxonomic annotations for the human gut dataset were generated using MMseqs2, while the wheat phyllosphere dataset used Kraken2.
Taxonomic annotation validation: k-fold analysis
We applied a 5-fold cross-validation to verify the accuracy of our taxonomic annotations. All contigs in a dataset with classifier annotations were randomly divided into five non-overlapping groups (folds). Taxometer was trained five times, each time using a different fold as the validation set and the remaining four folds as the training set. After training on the remaining folds, it generated predictions for the unseen validation fold. Finally, we reassembled the five predictions to reconstruct the full dataset. This method guarantees that every contig is predicted without the model ever having seen its specific classifier annotation during training.
Our evaluation metric was defined as the fraction of correctly predicted contigs at the domain level (e.g., Bacteria, Archaea), accounting for cases where ground-truth annotations might be missing. Essentially, the score reflects how many predictions align with the available ground-truth labels, normalized by the total number of contigs in the set.(text{Accuracy} = frac{text{No. of correct predictions (where ground truth exists)}}{text{Total no. of contigs in dataset}}) A ‘correct’ prediction is one where the Taxometer output matches the ground-truth classifier annotation. If no ground truth existed for a contig, that contig was excluded from the numerator (since correctness cannot be assessed) but was still counted in the denominator to reflect the presence of missing data. This calculation gives the overall percentage of correct predictions across the entire dataset. This benchmarking tool is accessible in the codebase as the vamb taxonomy_benchmark command.
Bin annotations for CAMI2 MAGs
To perform taxonomic classification in Supplementary Figs. 9 and 10, Kraken2 with the NCBI database was employed alongside GTDBtk, which provides GTDB-based annotations. Rather than merely comparing these two annotation systems against each other, each one was benchmarked independently against ground-truth annotations from CAMI2 datasets. The original ground-truth taxonomy labels were provided as NCBI identifiers within the CAMI2 challenge dataset. To obtain corresponding GTDB annotations for CAMI2 data, the CAMI2 ground-truth genomes were processed through GTDBtk, producing complete classifications down to the species level. Several genomes were manually checked to confirm alignment with NCBI-based annotations. Since the CAMI2 genomes are sourced from public databases, the accuracy of the GTDBtk-derived annotations is considered highly reliable, and they were therefore adopted as the reference benchmark. In this setup, Kraken2 results were evaluated against CAMI2’s supplied ground truth, while GTDBtk-based bin annotations were assessed using GTDBtk’s own genome-level classifications as the gold standard.
Human gut (irritable bowel syndrome) dataset: sample collection and processing
Fecal samples were obtained from 4 healthy volunteers and 20 individuals diagnosed with irritable bowel syndrome (IBS). Each person contributed between 1 and 5 samples, resulting in a total of 70 samples. DNA was isolated from the fecal material using a bead-beat micro AX Gravity Kit (A&A Biotechnology), following the manufacturer’s protocol. Further purification was performed using phenol–chloroform extraction and ethanol precipitation, following an established procedure91. Sequencing was performed by NovoGene using PCR-based library preparation and 150-nucleotide paired-end reads generated on the Illumina NovaSeq 6000 platform. Raw sequencing reads underwent quality filtering and adapter removal using trim_galore (v1.15), which internally uses cutadapt (v3.6.9)92. A default Phred quality score threshold of 20 was applied, and an additional 16 and 6 nucleotides were removed from the 5′ and 3′ ends of reads, respectively, as this trimming strategy improved assembly continuity in prior benchmarking tests89. Human DNA contamination was filtered out using BMtagger (v1.1.0)93 with the GRCh38.p13 human genome as reference. Metagenomic assembly was carried out using SPAdes (v3.15.4)94.
Wheat phyllosphere dataset: sample collection and processing
Composite samples consisting of 30 flag leaves were collected from 24 field plots of Triticum aestivum (bread wheat) on nine occasions between June 7 and July 14, 2022, at a field trial site in Ringsted, Denmark. The experimental layout included three wheat cultivars, four biological replicates, and two treatment conditions: untreated (unsprayed) and fungicide-treated (sprayed). Leaves were washed in 100 ml of wash solution (0.9% NaCl + 0.05% Tween-80), subjected to vigorous shaking for 2 minutes, sonication for 2 minutes, another round of shaking for 2 minutes, filtration through a 10 µm filter, and centrifugation at 4,000g for 15 minutes. The resulting pellet was resuspended in 1 ml of 1× PBS and stored at −20°C until DNA extraction. Genomic DNA was extracted using the FastDNA SPIN Kit for Soil (MP Biomedicals) according to the manufacturer’s instructions, with final elution in 100 µl of DES buffer. Sequencing libraries were prepared using the Illumina Nextera XT DNA Library Prep Kit (Illumina); however, for samples with DNA concentrations below 0.1 ng/µl, a modified protocol was used involving a one-fold dilution of the amplicon tagment mix, 20 PCR cycles, and an increased ratio of AMPure XP beads (Beckman Coulter)95. Libraries were sequenced on the Illumina NovaSeq 6000 S4 platform using paired-end 150 bp chemistry (v1.5).
Wheat phyllosphere dataset: data analysis
Raw sequencing reads were processed using fastp (v0.23.2)96 with the following parameters: ‘–trim_tail1 –cut_tail –trim_poly_g –dedup –length_required 80’. Quality assessment of the cleaned reads was conducted using MultiQC (v1.12)97. To eliminate reads derived from wheat or potential human contamination, reads were aligned to reference genomes GCF_018294505.1, MG958554.1, and GCF_000001405.40 (GRCh38.p14) using Bowtie2 (v2.5.3)98. Only paired-end reads where both mates remained unmapped were retained using SAMtools (v1.18)73 with the ‘fastq -f 13’ option. Individual metagenomic assemblies were generated for each sample using SPAdes (v3.15.4)94 in metagenomic mode (‘–meta’) with k-mer sizes set to 21, 33, 55, 77, and 99. Assembly quality metrics were evaluated using QUAST (v5.2.0)99. Metagenome-assembled genomes (MAGs) were reconstructed using TaxVAMB, incorporating Kraken2 taxonomic annotations based on the NCBI database. MAGs predominantly composed of eukaryotic contigs were assessed for completeness using BUSCO (v5.8.2)67. For phylogenetic tree construction, MAGs were taxonomically classified using GTDBtk (v2.4.0) with GTDB release 220. Phylogenetic trees were visualized using the R packages ggtree (v3.19)100, tidytree (v0.4.6), and treeio (v1.24.1). A two-sample Mann–Whitney U-test was performed to compare P. agglomerans abundance levels between two temporal groups: 143 samples collected during early time points (June 7, 10, 14, 17, and 21, 2022) and 103 samples from later dates (June 28 and July 4, 7, and 14, 2022), using scipy (v1.10.0).
Reporting summary
Additional details regarding the experimental and analytical design are available in the Nature Portfolio Reporting Summary associated with this article.



