Experiment #324 ended nicely. 😉 This time I built a small project around log anomaly detection. In about two days, I went from roughly 60% effectiveness in the first runs to a final F1 score of 0.9975 on the HDFS benchmark. Under my current preprocessing and evaluation setup, LogAI reaches F1=0.9975, which is slightly above the 0.996 HDFS result reported for LogRobust in a recent comparative study. What that means in practice:
What I discover particularly attention-grabbing is that that is in all probability the primary log anomaly detection mannequin constructed on high of Mamba-3 / SSM, which was solely revealed a number of weeks in the past. The mannequin is small:
For comparison, my previous approach took around 20 hours to train. The dataset here is the classic HDFS benchmark from LogHub / Zenodo, based on Amazon EC2 logs:
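For anyone who hasn't worked with this dataset: HDFS logs are conventionally grouped into per-block sessions via the `blk_` ID, and each block carries a normal/anomalous label. Here's a minimal sketch of that grouping step (the regex and function names are my own illustration of the standard preprocessing, not the author's code):

```python
import re
from collections import defaultdict

# HDFS lines reference block IDs like "blk_-1608999687919862906";
# the standard benchmark groups lines into one session per block.
BLOCK_ID = re.compile(r"blk_-?\d+")

def group_by_block(lines):
    """Map each block ID to the ordered list of log lines mentioning it."""
    sessions = defaultdict(list)
    for line in lines:
        for blk in BLOCK_ID.findall(line):
            sessions[blk].append(line)
    return dict(sessions)

logs = [
    "081109 203518 INFO dfs.DataNode: Receiving block blk_-160899 src ...",
    "081109 203518 INFO dfs.FSNamesystem: blk_-160899 is added to invalidSet",
    "081109 203519 INFO dfs.DataNode: Receiving block blk_7503 src ...",
]
sessions = group_by_block(logs)
print(len(sessions))                  # → 2 distinct blocks
print(len(sessions["blk_-160899"]))   # → 2 lines in the first session
```

Each resulting session is one labeled sequence, which is what the model below actually consumes.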
This benchmark has been used in plenty of papers since 2017, so it's a useful place to test ideas. The part that surprised me most was not just the score, but what actually made the difference. I started with a fairly standard NLP-style approach:
That got me something like 0.61–0.74 F1, depending on the run. It looked reasonable at first, but I kept hitting a wall. Hyperparameter tuning helped a bit, but not enough. The breakthrough came when I stopped treating logs like natural language. Instead of splitting lines into subword tokens, I switched to template-based tokenization: one log template = one token representing an event type. So instead of feeding the model raw text, I feed it sequences like this: [5, 3, 7, 5, 5, 3, 12, 12, 5, …] Where, for example:
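To make the template idea concrete: real pipelines usually rely on a template miner such as Drain, but the core mapping can be hand-rolled in a few lines. This toy version (entirely my illustration, not the author's tokenizer) masks variable fields and assigns each distinct template a fresh integer ID:

```python
import re

class TemplateTokenizer:
    """Map raw log lines to integer event-type IDs (one template = one token)."""

    def __init__(self):
        self.template_to_id = {}

    def encode(self, line):
        # Mask variable parts (hex values, numbers) so lines that differ
        # only in their parameters collapse onto the same template.
        template = re.sub(r"0x[0-9a-fA-F]+|\d+(\.\d+)*", "<*>", line)
        if template not in self.template_to_id:
            self.template_to_id[template] = len(self.template_to_id)
        return self.template_to_id[template]

tok = TemplateTokenizer()
lines = [
    "Received block blk_123 of size 67108864",
    "Received block blk_456 of size 67108864",
    "Deleting block blk_123",
]
print([tok.encode(l) for l in lines])  # → [0, 0, 1]
```

The first two lines differ only in the block ID, so they share a template and therefore a token; the vocabulary shrinks from subword units to a few hundred event types.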
That one change did a lot at once:
The second important change was matching the classifier head to the architecture. Mamba is causal, so the last token carries a compressed summary of the sequence context. Once I respected that in the pooling/classification setup, the model started behaving the way I had hoped. The training pipeline was simple:
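The pooling point can be shown without any Mamba code at all. With a causal backbone, the hidden state at the final position already summarizes the whole sequence, so the head should read that position rather than, say, mean-pool over time. A numpy sketch (shapes and the logistic head are illustrative assumptions, not the post's actual architecture):

```python
import numpy as np

def classify_last_token(hidden, w, b):
    """hidden: (batch, seq_len, d_model) states from a causal backbone.

    Because the model is causal, hidden[:, -1, :] has seen the entire
    sequence, so the classifier reads only that position."""
    last = hidden[:, -1, :]               # (batch, d_model)
    logits = last @ w + b                 # (batch,)
    return 1.0 / (1.0 + np.exp(-logits))  # anomaly score in (0, 1)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 128, 32))         # 4 sequences, 128 steps, d_model=32
scores = classify_last_token(h, rng.normal(size=32), 0.0)
print(scores.shape)  # → (4,)
```

Mean-pooling would mix in early positions whose states only saw a prefix, which is presumably what the author means by "respecting" the causal setup.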
The data split was 70% train / 10% val / 20% test, so the reported F1 is on sessions the model didn't see during training. Another useful thing is that the output isn't just binary. The model gives a continuous anomaly score from 0 to 1. So in production this could be used with multiple thresholds, for example:
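The post doesn't spell out the thresholds, so here is one hypothetical two-tier policy built on that continuous score (the 0.5 / 0.9 cutoffs and action names are placeholders to tune per system):

```python
def triage(score, warn=0.5, alert=0.9):
    """Map a continuous anomaly score in [0, 1] to an action tier."""
    if score >= alert:
        return "page-oncall"      # high-confidence anomaly
    if score >= warn:
        return "log-for-review"   # suspicious, but not page-worthy
    return "ignore"

print([triage(s) for s in (0.12, 0.61, 0.97)])
# → ['ignore', 'log-for-review', 'page-oncall']
```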
Or with an adaptive threshold that tracks the baseline noise level of a particular system. A broader lesson for me: skills and workflows I developed while playing with AI models for chess transfer surprisingly well to other domains. That's not exactly new – plenty of AI labs started with games, and many still do – but it's satisfying to see it work in practice. Also, I definitely didn't get here alone. This is a mixture of:
A very rough split:
Now I'll probably build a dashboard and try this on my own Astrography / Astropolis production logs. Or I might push it further first on BGL, Thunderbird, or Spirit. Honestly, I still find it pretty wild how much can now be done on a gaming PC if you combine decent hardware, public research, and newer architectures quickly enough. Curious what people here think:
If there's interest, I can share more about the preprocessing, training loop, and the mistakes that got me stuck at 60–70% before it finally clicked. P.S. I also tested its effectiveness and reproducibility across different seeds. On most of them, it actually performed slightly better than before. submitted by /u/Adam_Jesion
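For the P.S., a seed sweep usually looks something like the harness below. `train_and_eval` is a hypothetical stand-in for the full pipeline, stubbed with noise around the reported score purely so the harness runs; the real function would retrain the model under each seed:

```python
import random
import statistics

def train_and_eval(seed):
    """Hypothetical stand-in for the real train/evaluate pipeline.

    Stubbed: returns the reported F1 plus seed-dependent noise so the
    harness is runnable without the actual model."""
    rng = random.Random(seed)
    return 0.9975 + rng.uniform(-0.001, 0.001)

seeds = [0, 1, 2, 3, 4]
f1s = [train_and_eval(s) for s in seeds]
print(f"mean F1 = {statistics.mean(f1s):.4f} ± {statistics.pstdev(f1s):.4f}")
```

Reporting mean ± spread across seeds, rather than a single best run, is what makes the headline number believable.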