Hermes Agent already supports persistent memory out of the box. The Nous Research open-source agent includes curated memory embeddings and full-text retrieval across sessions. But one community contributor argues that the native memory capabilities fall short for demanding workloads. Enter Memory OS, a new MIT-licensed project built by ClaudioDrews, which extends Hermes to a six-tier memory stack. It introduces a vector database, structured fact storage, and a self-maintaining knowledge wiki. The project is still early, but its architecture offers a glimpse into how future agent memory systems might be designed.
Memory OS
Memory OS isn’t a simple add-on for Hermes. It’s a full supplementary framework running parallel to Hermes Agent’s built-in memory. Hermes natively uses workspace directories and a session log. Memory OS preserves those two layers and introduces four additional layers on top, for six in total. Everything runs locally via Docker, Qdrant, Redis, and Python 3.11+. It integrates with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama. The README describes it as a memory operating system, not just a single enhancement.
The Six Layers, From Files to Vectors
- Layer 1 — Workspace: Stores MEMORY.md, USER.md, and CREATIVE.md, embedding them directly into the system prompt each turn.
- Layer 2 — Sessions: Leverages state.db, a SQLite database with FTS5 full-text search spanning previous conversations.
- Layer 3 — Structured Facts: Captures lasting knowledge in memory_store.db using SQLite, HRR, FTS5, and trust scores. A feedback loop continuously adjusts those confidence ratings over time, paired with entity resolution.
- Layer 4 — Fabric: A heavily modified fork of the Icarus Plugin. This adaptation adds LLM-driven session extraction not present in the upstream esaradev/icarus-plugin. Cross-session recall is managed through 16 tools, including fabric_recall, fabric_write, and fabric_brief.
- Layer 5 — Vector Database: Powered by Qdrant. Uses 4096-dimensional dense vectors compared by cosine similarity, combined with BM25 sparse search for keyword-style matching.
- Layer 6 — LLM Wiki: A continuously updated vault of topics, entities, and comparisons. The wiki is ingested back into Qdrant on an ongoing basis through a process named wiki-continuous-ingest.
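To make Layer 2 more concrete, here is a minimal sketch of full-text search over past conversations using SQLite's FTS5 extension from Python. The table name, column names, and sample rows are hypothetical; the article does not describe the actual schema of state.db, and this assumes your Python build ships SQLite with FTS5 enabled.

```python
import sqlite3


def build_session_index(rows):
    """Create an in-memory FTS5 table and load (session_id, role, content) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only `content` is indexed for full-text search.
    conn.execute(
        "CREATE VIRTUAL TABLE messages "
        "USING fts5(session_id UNINDEXED, role UNINDEXED, content)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    return conn


def search_sessions(conn, query, limit=5):
    """Return (session_id, content) pairs, best match first (FTS5 `rank` is BM25-based)."""
    return conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = build_session_index([
    ("s1", "user", "Set up Qdrant with Docker on the staging box"),
    ("s1", "assistant", "Qdrant is now listening on port 6333"),
    ("s2", "user", "Remind me to renew the domain next week"),
])
hits = search_sessions(conn, "qdrant")
for session_id, content in hits:
    print(session_id, content)
```

The default FTS5 tokenizer is case-insensitive, so the lowercase query matches both messages from session s1 and skips the unrelated one in s2.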
How the Retrieval Flow Works
The pipeline governs when memory is read and when it is written. During pre_llm_call, Memory OS performs what it calls surgical recall: it queries four sources simultaneously (Fabric, Qdrant, Sessions, and Facts), and each source must pass a relevance threshold check before its contents reach the model. Per-session deduplication keeps the same context from being injected twice. A social-noise filter skips trivial exchanges like a simple “thanks.” During post_llm_call and on_session_end, the system automatically extracts and records new insights. The primary goal is to keep token usage lean rather than flood the context window.
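The gating logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Memory OS's actual code: the Hit type, the surgical_recall signature, the 0.5 threshold, and the noise phrase list are all invented for the example, and the sources are queried one after another here rather than simultaneously.

```python
from dataclasses import dataclass

# Assumed phrase list; the article only gives "thanks" as an example.
SOCIAL_NOISE = {"thanks", "thank you", "ok", "okay", "hi", "hello", "bye"}


@dataclass(frozen=True)
class Hit:
    source: str
    text: str
    score: float  # relevance in [0, 1], higher is better


def is_social_noise(message):
    """True for trivial exchanges that should not trigger any recall."""
    return message.strip().lower().rstrip("!.") in SOCIAL_NOISE


def surgical_recall(message, sources, seen, threshold=0.5):
    """Query every source, keep hits above the threshold, drop repeats.

    sources: dict mapping a source name to a callable(message) -> list[Hit]
    seen:    set of normalized texts already injected during this session
    """
    if is_social_noise(message):
        return []
    selected = []
    for search in sources.values():
        for hit in search(message):
            if hit.score < threshold:
                continue  # relevance gate
            key = hit.text.strip().lower()
            if key in seen:
                continue  # per-session deduplication
            seen.add(key)
            selected.append(hit)
    return sorted(selected, key=lambda h: h.score, reverse=True)


# Stub sources standing in for Fabric, Qdrant, Sessions, and Facts.
sources = {
    "fabric": lambda q: [Hit("fabric", "User prefers dark mode", 0.81)],
    "qdrant": lambda q: [
        Hit("qdrant", "User prefers dark mode", 0.77),
        Hit("qdrant", "Unrelated note", 0.12),
    ],
    "sessions": lambda q: [Hit("sessions", "Discussed theme settings on Monday", 0.64)],
    "facts": lambda q: [],
}
seen = set()
first = surgical_recall("What theme do I like?", sources, seen)
second = surgical_recall("What theme do I like?", sources, seen)
noise = surgical_recall("Thanks!", sources, seen)
```

On the first call, the duplicate from Qdrant and the low-scoring note are filtered out, leaving two hits. The second call returns nothing because both texts are already in the session's seen set, and the "Thanks!" message is skipped before any source is queried.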
The Fallback Cascade and Cleanup
Layer 5’s retrieval follows a four-stage fallback chain. It attempts hybrid search first, then dense vectors, then lexical search, then SQLite. If one approach fails, the next steps in, so retrieval keeps working even when the vector database is degraded. Memory OS also runs a weekly decay scanner to phase out outdated entries, and semantic dedup merges nearly identical memories when their cosine similarity exceeds 0.92. These maintenance routines aim to prevent memory bloat over months of continuous use.
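A rough sketch of both ideas, the fallback chain and the similarity-based dedup, might look like the following. The function names and stub stages are hypothetical. Two simplifications are assumptions of this sketch rather than documented behavior: an empty result is treated the same as a failure, and near-duplicates are dropped in favor of the first copy instead of being merged.

```python
import math


def retrieve_with_fallback(query, stages):
    """Try each (name, search) stage in order; return the first non-empty result."""
    for name, search in stages:
        try:
            results = search(query)
        except Exception:
            continue  # stage unavailable, fall through to the next one
        if results:
            return name, results
    return None, []


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(memories, threshold=0.92):
    """Keep a (text, vector) memory only if it is not too similar to one already kept."""
    kept = []
    for text, vec in memories:
        if all(cosine(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept


# Stub stages simulating an unreachable vector database.
def hybrid(query):
    raise ConnectionError("vector database unreachable")


def dense(query):
    raise ConnectionError("vector database unreachable")


def lexical(query):
    return []


def sqlite_search(query):
    return ["row recovered from SQLite"]


stage, results = retrieve_with_fallback(
    "dark mode",
    [("hybrid", hybrid), ("dense", dense), ("lexical", lexical), ("sqlite", sqlite_search)],
)

kept = dedup([
    ("User likes dark mode", [1.0, 0.0]),
    ("User prefers dark mode", [0.99, 0.05]),  # near-duplicate of the first
    ("Domain renews in March", [0.0, 1.0]),
])
```

Here the first two stages raise, the lexical stage returns nothing, and the SQLite stage answers, so stage ends up as "sqlite". The second memory is dropped because its vector is almost parallel to the first.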
Local-First, And Deliberately So
Memory OS positions itself in contrast to cloud memory providers such as mem0, Zep, and Letta. Its pitch is that memory infrastructure should run entirely on your machine. Memory data stays local, and no subscription is required. API calls still go to whichever LLM provider you select. Hermes already integrates eight external memory partners, including mem0 and Honcho. Memory OS is not one of those official integrations. It’s an independent, community-built framework layered directly on Hermes. For teams subject to data-residency requirements, a local-first memory store can make a real difference.
Strengths and Limitations
Strengths:
- Well-defined layered architecture that cleanly separates files, sessions, facts, vectors, and a wiki
- Entirely local infrastructure with no cloud memory service dependency
- Works with any LLM provider, matching Hermes Agent’s flexibility
- Designed for token efficiency through gated retrieval and per-session deduplication
Limitations:
- Very early stage, with limited commit history
- A forked Icarus Plugin that the author confirms is not compatible with upstream
- Heavier setup: Docker, Qdrant, Redis, and an ARQ Worker required
- No public benchmarks for recall accuracy, response latency, or token savings
Key Takeaways
- Memory OS is a community-built, MIT-licensed framework adding six memory tiers on top of Hermes Agent.
- It integrates workspace files, FTS5 session search, trust-scored facts, a forked Icarus fabric, Qdrant vectors, and a self-maintaining LLM wiki.
- Retrieval triggers on pre_llm_call with gated, deduplicated pulls from four sources; capture fires on post_llm_call and on_session_end.
- The memory layer runs fully local and is provider-agnostic, though LLM API calls still route to your chosen provider.
The post Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent appeared first on MarkTechPost.



