Building a Retrieval-Augmented Generation (RAG) pipeline is easy; building one that doesn’t hallucinate during a 10-K audit is nearly impossible. For devs in the financial sector, the ‘standard’ vector-based RAG approach (chunk the text and hope for the best) often results in a ‘text soup’ that loses the vital structural context of tables and balance sheets.
VectifyAI is trying to close this gap with the launch of Mafin 2.5, a multimodal financial agent, and PageIndex, an open-source framework that pushes the industry toward ‘Vectorless RAG.’
The Problem: Why Vector RAG Fails Finance
Traditional RAG relies on semantic similarity. If you ask about ‘Net Income,’ a vector database looks for chunks of text that sound like net income. However, financial documents are layout-dependent. A number in a cell is meaningless without its header, and those headers are often stripped away during conventional PDF-to-text conversion.
This is the ‘garbage in, garbage out’ trap: even the smartest LLM cannot reason correctly if the input data has lost its hierarchical structure.
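To make the failure mode concrete, this sketch flattens a made-up income-statement table into plain text and splits it with a naive fixed-size word chunker. The figures and the `word_chunks`/`keyword_retrieve` helpers are hypothetical stand-ins (a substring match replaces real embedding similarity), not any particular library’s splitter or vector store.

```python
from typing import Optional

# Illustrative only: a made-up income-statement table after a lossy
# PDF-to-text step, where rows and headers collapse into one run of words.
FLATTENED = (
    "Consolidated Statements of Operations (in millions) "
    "Fiscal Year 2023 2022 "
    "Revenue 5,210 4,870 "
    "Operating expenses 3,900 3,750 "
    "Net income 1,020 860"
)


def word_chunks(text: str, size: int) -> list[str]:
    """Naive chunker: consecutive windows of `size` words, blind to layout."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def keyword_retrieve(chunks: list[str], query: str) -> Optional[str]:
    """Toy stand-in for similarity search: first chunk containing the query."""
    q = query.lower()
    return next((c for c in chunks if q in c.lower()), None)


chunks = word_chunks(FLATTENED, size=8)
hit = keyword_retrieve(chunks, "Net income")
print(hit)  # 3,750 Net income 1,020 860
```

The chunk that matches ‘Net income’ carries neither the fiscal-year header nor the ‘in millions’ unit, and it opens with a stray operating-expenses figure, so nothing in it says which number belongs to 2023. Production splitters add overlap and smarter boundaries, which may soften this, but they still operate on text that has already lost its table structure.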
Mafin 2.5: Accuracy at Scale
Mafin 2.5 isn’t just a fine-tuned model; it’s a reasoning engine that achieved 98.7% accuracy on FinanceBench, significantly outperforming GPT-4o and Perplexity on financial retrieval tasks.
What sets it apart for devs is its native integration with high-fidelity data sources:
- Complete SEC Access: Direct indexing of 10-K, 10-Q, and 8-K filings.
- Earnings Intel: Real-time and historical earnings call transcripts.
- Market Data: Live tickers across the Russell 3000 and Nasdaq.

PageIndex: The Move to ‘Vectorless’ RAG
The ‘secret sauce’ behind Mafin 2.5’s precision is PageIndex, which replaces traditional flat embeddings with a hierarchical tree index.
Instead of searching through arbitrary chunks, PageIndex lets an LLM ‘reason’ through a document’s structure. It builds a semantic tree, essentially an intelligent map of the document, that enables the agent to locate the exact section, page, and line item required.
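The tree-walking idea can be sketched in a few lines, assuming a hand-built tree whose fields (`title`, `pages`, `summary`, `children`) are hypothetical and not PageIndex’s actual schema. At each level, `choose` picks the most relevant child; in reasoning-based RAG that decision would come from an LLM reading the node titles and summaries, so the word-overlap scorer here is only a deterministic placeholder. The page numbers are invented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One section of a document tree (hypothetical schema, for illustration)."""
    title: str
    pages: tuple[int, int]  # inclusive start/end page
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


def keyword_choose(question: str, candidates: list[Node]) -> Node:
    """Placeholder for an LLM call: pick the child whose title and summary
    share the most words with the question."""
    q = set(question.lower().split())

    def score(n: Node) -> int:
        return len(q & set(f"{n.title} {n.summary}".lower().split()))

    return max(candidates, key=score)


def navigate(root: Node, question: str,
             choose: Callable[[str, list[Node]], Node] = keyword_choose) -> list[Node]:
    """Descend from the root, choosing one child per level, until reaching a leaf.
    The returned root-to-leaf path doubles as an explanation of the answer."""
    path = [root]
    while path[-1].children:
        path.append(choose(question, path[-1].children))
    return path


report = Node("Form 10-K", (1, 120), children=[
    Node("Item 1. Business", (3, 20), "company overview products and segments"),
    Node("Item 7. MD&A", (35, 60), "management discussion and analysis liquidity and risk"),
    Node("Item 8. Financial Statements", (61, 110),
         "audited balance sheets income and cash flow statements", children=[
             Node("Balance Sheets", (63, 64), "assets liabilities equity"),
             Node("Statements of Operations", (65, 66), "revenue expenses net income"),
             Node("Statements of Cash Flows", (67, 68), "operating investing financing cash"),
         ]),
])

path = navigate(report, "what was net income in the statements of operations")
print(" > ".join(f"{n.title} (p. {n.pages[0]}-{n.pages[1]})" for n in path))
```

Each step reads only the summaries of one branch rather than scoring every chunk in the document, and the returned path shows exactly where the answer came from. Replacing `keyword_choose` with an actual LLM prompt is left as an assumption; this illustrates the technique, not PageIndex’s API.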
Key technical features include:
- Vision-Native Support: PageIndex supports vision-based RAG, allowing models to ‘see’ the global layout of a page (charts, complex grids) rather than relying solely on OCR text.
- Hierarchical Navigation: It transforms PDFs into a navigable tree structure, ensuring the relationship between headers and data stays intact.
- Traceability: Unlike the ‘black box’ of vector similarity, every answer has a clear path through the document tree, providing a much-needed audit trail for regulated financial environments.
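The hierarchical-navigation and traceability points can be illustrated together: if retrieval returns the root-to-node path along with the value, the answer keeps its headers (period, units) and a page citation. In this minimal sketch the `Section` class, the `cite` helper, and the figures are hypothetical, and a plain depth-first lookup stands in for the LLM reasoning step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    """A node in a document tree (hypothetical schema for illustration)."""
    title: str
    page: int
    values: dict[str, str] = field(default_factory=dict)  # line item -> reported figure
    children: list["Section"] = field(default_factory=list)


def find_with_path(node: Section, item: str,
                   trail: tuple = ()) -> Optional[tuple[list[Section], str]]:
    """Depth-first search for a line item; returns the root-to-node path and the value."""
    trail = trail + (node,)
    if item in node.values:
        return list(trail), node.values[item]
    for child in node.children:
        found = find_with_path(child, item, trail)
        if found is not None:
            return found
    return None


def cite(root: Section, item: str) -> Optional[dict]:
    """Answer with the value plus the header chain and page it came from."""
    found = find_with_path(root, item)
    if found is None:
        return None
    path, value = found
    return {
        "answer": f"{item}: {value}",
        "context": " > ".join(s.title for s in path),
        "page": path[-1].page,
    }


tenk = Section("Form 10-K", 1, children=[
    Section("Item 8. Financial Statements", 61, children=[
        Section("Statements of Operations (in millions, FY2023)", 65,
                values={"Net income": "1,020", "Revenue": "5,210"}),
    ]),
])

print(cite(tenk, "Net income"))
```

Unlike a flat text chunk, the returned record carries the fiscal year and units alongside the number, and the `context` and `page` fields give a reviewer something concrete to verify.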
Key Takeaways
- Unprecedented Financial Accuracy (98.7%): Mafin 2.5 has set a new state-of-the-art record on the FinanceBench benchmark, achieving 98.7% accuracy. This significantly outperforms general-purpose models like GPT-4o (~31%) and Perplexity (~45%) by focusing on specialized financial reasoning rather than generic retrieval.
- The Shift to ‘Vectorless RAG’: Moving away from the “vibe-based” search of traditional vector databases, PageIndex introduces Reasoning-based RAG. It uses an LLM to ‘reason’ its way through a document’s structure, mimicking how a human analyst navigates a report to find specific data points.
- Hierarchical ‘Tree’ Indexing vs. Chunking: Instead of chopping documents into arbitrary, contextless text chunks, PageIndex organizes PDFs into a semantic tree structure (an intelligent Table of Contents). This preserves the critical relationship between headers, nested tables, and footnotes that traditional RAG often destroys.
- Vision-Native & OCR-Free Workflows: The framework supports Vision-based Vectorless RAG, allowing the AI to ‘see’ and retrieve information directly from page images. This is a game-changer for financial documents, where the visual layout of a balance sheet or complex grid is as important as the numbers themselves.
- Enterprise-Grade Traceability: Unlike the ‘black box’ of vector similarity, PageIndex provides a fully auditable reasoning path. Every response is linked to specific nodes, pages, and sections, providing the transparency required for high-stakes financial audits and compliance.
Check out the Technical details and Repo.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.




