Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
Dupree, E. J. et al. A critical review of bottom-up proteomics: the good, the bad, and the future of this field. Proteomes 8, 14 (2020).
Matthiesen, R. Methods, algorithms and tools in computational proteomics: a practical point of view. Proteomics 7, 2815–2832 (2007).
Chen, C., Hou, J., Tanner, J. J. & Cheng, J. Bioinformatics methods for mass spectrometry-based proteomics data analysis. Int. J. Mol. Sci. 21, 2873 (2020).
Szabo, Z. & Janaky, T. Challenges and developments in protein identification using mass spectrometry. Trends Anal. Chem. 69, 76–87 (2015).
Mudge, J. M. et al. Standardized annotation of translated open reading frames. Nat. Biotechnol. 40, 994–999 (2022).
Zhu, C. et al. Identification of non-canonical peptides with moPepGen. Nat. Biotechnol. 44, 568–573 (2026).
Keenan, E. K., Zachman, D. K. & Hirschey, M. D. Discovering the landscape of protein modifications. Mol. Cell 81, 1868–1878 (2021).
Kleikamp, H. B. C. et al. Database-independent de novo metaproteomics of complex microbial communities. Cell Syst. 12, 375–383 (2021).
Minegishi, Y., Haga, Y. & Ueda, K. Emerging potential of immunopeptidomics by mass spectrometry in cancer immunotherapy. Cancer Sci. 115, 1048–1059 (2024).
Muth, T. & Renard, B. Y. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief. Bioinform. 19, 954–970 (2017).
Wen, B. et al. Deep learning in proteomics. Proteomics 20, 1900335 (2020).
Eloff, K. et al. InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments. Nat. Mach. Intell. 7, 565–579 (2025).
Zhang, X. et al. π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing. Nat. Commun. 16, 267 (2025).
Melendez, C. et al. Accounting for digestion enzyme bias in Casanovo. J. Proteome Res. 23, 4761–4769 (2024).
Grimes, M. et al. Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks. Sci. Signal. 11, eaaq1087 (2018).
Schwämmle, V. et al. Systems level analysis of histone H3 post-translational modifications (PTMs) reveals features of PTM crosstalk in chromatin regulation. Mol. Cell. Proteomics 15, 2715–2729 (2016).
Liu, J., Qian, C. & Cao, X. Regulation of innate immunity through post-translational modifications. Immunity 45, 15–30 (2016).
Sui, Y., Shen, Z., Wang, Z., Feng, J. & Zhou, G. Lactylation in cancer: metabolic mechanisms and therapeutic approaches. Cell Death Discov. 11, 68 (2025).
He, X. et al. Lysine vitcylation: a vitamin C-driven protein modification that boosts STAT1-mediated immune responses. Cell 188, 1858–1877 (2025).
Paik, Y.-K. et al. The Chromosome-Centric Human Proteome Project: cataloging proteins encoded in the genome. Nat. Biotechnol. 30, 221–223 (2012).
Su, J. et al. RoFormer: transformer enhanced with rotary position embeddings. Neurocomputing 568, 127063 (2024).
Mao, Z., Zhang, R., Xin, L. & Li, M. Addressing the missing-fragmentation issue in de novo peptide sequencing with a two-stage graph-based deep learning approach. Nat. Mach. Intell. 5, 1250–1260 (2023).
Zolg, D. P. et al. ProteomeTools: systematic analysis of 21 post-translational protein modifications using liquid chromatography tandem mass spectrometry (LC-MS/MS) with synthetic peptides. Mol. Cell. Proteomics 17, 1850–1863 (2018).
Zhang, J. et al. PEAKS DB: database search supported by de novo sequencing for sensitive and accurate peptide identification. Mol. Cell. Proteomics 11, M111.010587 (2012).
Tran, N. H. et al. NovoBoard: a comprehensive framework for assessing false discovery rates and accuracy in de novo peptide sequencing. Mol. Cell. Proteomics 23, 100849 (2024).
Yu, F. et al. Detecting modified peptides through localization-aware open search. Nat. Commun. 11, 4065 (2020).
Rebak, A. S. et al. A quantitative, site-specific map of the citrullinome reveals widespread citrullination and provides insights into PADI4 substrates. Nat. Struct. Mol. Biol. 31, 977–995 (2024).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search tools. Nucleic Acids Res. 25, 3389–3402 (1997).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) 5998–6008 (Curran Associates, 2017).
Dao, T., Fu, D. Y., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast, memory-efficient exact attention with IO-awareness. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 16344–16359 (Curran Associates, 2022).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning alignment and translation. In 3rd International Conference on Learning Representations (ICLR 2015), Conference Track Proceedings (eds Bengio, Y. & LeCun, Y.) (2015).
Tran, N. H., Zhang, X., Xin, L., Shan, B. & Li, M. De novo peptide sequencing using deep learning. Proc. Natl Acad. Sci. USA 114, 8247–8252 (2017).
Qiao, R. et al. De novo peptide sequencing independent of instrument resolution for high-resolution devices. Nat. Mach. Intell. 3, 420–425 (2021).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
Treen, D. G. C. et al. SIMILE enables alignment of tandem mass spectra with statistical significance. Nat. Commun. 13, 2510 (2022).
Zhong, H., Marcus, S. L. & Li, L. Two-dimensional mass spectra generated from the analysis of 15N-labeled and unlabeled peptides for efficient protein identification and de novo peptide sequencing. J. Proteome Res. 3, 1155–1163 (2004).
Mao, Z. RNovA test datasets. Zenodo (2026).
Perez-Riverol, Y. et al. The PRIDE database at 20 years: 2025 update. Nucleic Acids Res. 53, D543–D553 (2024).
Zhang, Q. RNovA. GitHub (2026).



