Liquid AI has released LFM2-24B-A2B, a model optimized for local, low-latency tool dispatch, alongside LocalCowork, an open-source desktop agent application available in their Liquid4All GitHub Cookbook. The release provides a deployable architecture for running enterprise workflows entirely on-device, eliminating API calls and data egress for privacy-sensitive environments.
Architecture and Serving Configuration
To achieve low-latency execution on consumer hardware, LFM2-24B-A2B uses a sparse Mixture-of-Experts (MoE) architecture. While the model contains 24 billion parameters in total, it activates only roughly 2 billion parameters per token during inference.
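The activation pattern works through a gating network that routes each token to a small subset of experts. The following is a minimal illustrative sketch of top-k expert routing; the expert count, top-k value, and logits here are assumptions for illustration, not LFM2-24B-A2B's actual configuration.

```python
# Minimal sketch of sparse MoE token routing (illustrative only; the
# expert count and top-k value are assumed, not the model's real config).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, top_k=2):
    """Return (expert_index, weight) pairs for the top-k experts.

    Only these experts run a forward pass for this token, so the active
    parameter count is a small fraction of the total parameter count."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(probs[i] for i in chosen)
    # Renormalize the selected experts' weights so they sum to 1.
    return [(i, probs[i] / total) for i in chosen]

# Example: 8 hypothetical experts, only 2 activated for this token.
selection = route_token([0.1, 2.0, -1.0, 0.5, 1.8, 0.0, -0.3, 0.2], top_k=2)
```

Because the unselected experts are skipped entirely, per-token compute scales with the active parameters (~2B) rather than the full 24B, which is what makes the latency figures below attainable on a laptop.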
This structural design allows the model to maintain a broad knowledge base while significantly reducing the computational overhead required for each generation step. Liquid AI stress-tested the model using the following hardware and software stack:
- Hardware: Apple M4 Max, 36 GB unified memory, 32 GPU cores.
- Serving Engine: llama-server with flash attention enabled.
- Quantization: Q4_K_M GGUF format.
- Memory Footprint: ~14.5 GB of RAM.
- Hyperparameters: Temperature set to 0.1, top_p to 0.1, and max_tokens to 512 (optimized for deterministic, strict outputs).
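With llama-server running locally, the hyperparameters above map directly onto its OpenAI-compatible /v1/chat/completions endpoint. The sketch below builds such a request; the port (llama-server's default 8080) and the example prompt are assumptions, and the final send is left commented out since it requires a running server.

```python
# Hedged sketch: building a request to a local llama-server instance
# using the reported sampling settings. The base URL and prompt are
# assumptions; llama-server serves /v1/chat/completions by default.
import json
import urllib.request

def build_request(prompt, base_url="http://127.0.0.1:8080"):
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,  # near-greedy sampling for deterministic output
        "top_p": 0.1,        # tight nucleus, keeps tool-call syntax strict
        "max_tokens": 512,
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_request("List the PDF files in the contracts folder")
# urllib.request.urlopen(req) would dispatch it to a running llama-server.
```

The low temperature and top_p settings matter here: tool dispatch needs the model to emit exact tool names and well-formed argument JSON, so the sampling distribution is deliberately collapsed toward the argmax.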
LocalCowork Tool Integration
LocalCowork is a fully offline desktop AI agent that uses the Model Context Protocol (MCP) to execute pre-built tools without relying on cloud APIs or compromising data privacy, logging every action to a local audit trail. The system includes 75 tools across 14 MCP servers capable of handling tasks like filesystem operations, OCR, and security scanning. However, the provided demo focuses on a highly reliable, curated subset of 20 tools across 6 servers, each rigorously tested to achieve over 80% single-step accuracy and verified multi-step chain participation.
LocalCowork acts as the practical implementation of this model. It operates completely offline and comes pre-configured with a set of enterprise-grade tools:
- File Operations: Listing, reading, and searching across the host filesystem.
- Security Scanning: Identifying leaked API keys and personally identifiable information (PII) within local directories.
- Document Processing: Executing Optical Character Recognition (OCR), parsing text, diffing contracts, and generating PDFs.
- Audit Logging: Recording every tool call locally for compliance monitoring.
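The dispatch-plus-audit pattern described above can be sketched in a few lines: every tool is a local function, and every invocation is appended to a local log before the result is returned. The tool name, function body, and audit-log path below are hypothetical placeholders, not LocalCowork's actual implementation.

```python
# Hedged sketch of local tool dispatch with an audit trail. Tool names
# and the log location are illustrative assumptions.
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")  # assumed local audit-trail file

def list_files(directory="."):
    """Example local tool: list entries in a directory, no network I/O."""
    return sorted(p.name for p in Path(directory).iterdir())

TOOLS = {"list_files": list_files}  # registry of locally executed tools

def dispatch(tool_name, **kwargs):
    """Run a registered tool and append the call to the local audit log."""
    result = TOOLS[tool_name](**kwargs)
    entry = {"ts": time.time(), "tool": tool_name, "args": kwargs}
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return result

files = dispatch("list_files", directory=".")
```

Because both the tool execution and the log write happen on the host machine, nothing in this loop requires a network connection, which is the property the privacy claims rest on.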
Performance Benchmarks
The Liquid AI team evaluated the model against a workload of 100 single-step tool-selection prompts and 50 multi-step chains (requiring 3 to 6 discrete tool executions, such as searching a folder, running OCR, parsing data, deduplicating, and exporting).
Latency
The model averaged ~385 ms per tool-selection response. This sub-second dispatch time is well suited to interactive, human-in-the-loop applications where immediate feedback is necessary.
Accuracy
- Single-Step Executions: 80% accuracy.
- Multi-Step Chains: 26% end-to-end completion rate.
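The gap between the two numbers is roughly what a simple independence model would predict (this is a back-of-envelope check, not an analysis from the report): if each step in a chain succeeds with the measured 80% single-step accuracy, end-to-end completion decays geometrically with chain length, and a 6-step chain lands almost exactly on the reported 26% figure.

```python
# Back-of-envelope check: assuming each tool call in a chain
# independently succeeds at the measured 80% single-step rate,
# end-to-end completion is 0.8 ** n for an n-step chain.
step_accuracy = 0.80
chain_success = {n: step_accuracy ** n for n in range(3, 7)}
# 3 steps -> ~51%, 4 -> ~41%, 5 -> ~33%, 6 -> ~26%
```

In other words, the chain results don't necessarily imply the model degrades on later steps; compounding per-step error alone is enough to explain most of the drop.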
Key Takeaways
- Privacy-First Local Execution: LocalCowork operates entirely on-device without cloud API dependencies or data egress, making it well suited for regulated enterprise environments requiring strict data privacy.
- Efficient MoE Architecture: LFM2-24B-A2B uses a sparse Mixture-of-Experts (MoE) design, activating only ~2 billion of its 24 billion parameters per token, allowing it to fit comfortably within a ~14.5 GB RAM footprint using Q4_K_M GGUF quantization.
- Sub-Second Latency on Consumer Hardware: When benchmarked on an Apple M4 Max laptop, the model achieves an average latency of ~385 ms for tool-selection dispatch, enabling highly interactive, real-time workflows.
- Standardized MCP Tool Integration: The agent leverages the Model Context Protocol (MCP) to connect with local tools (including filesystem operations, OCR, and security scanning) while automatically logging all actions to a local audit trail.
- Strong Single-Step Accuracy with Multi-Step Limits: The model achieves 80% accuracy on single-step tool execution but drops to a 26% success rate on multi-step chains due to 'sibling confusion' (selecting a similar but incorrect tool), indicating it currently works best in a guided, human-in-the-loop setting rather than as a fully autonomous agent.
Check out the Repo and technical details.



