Poolside AI launched the first two models in its Laguna family: Laguna M.1 and Laguna XS.2. Alongside these, the company is releasing pool, a lightweight terminal-based coding agent and dual Agent Client Protocol (ACP) client-server: the same environment Poolside uses internally for agent RL training and evaluation, now available as a research preview.
What Are These Models, and Why Should You Care?
Both Laguna M.1 and Laguna XS.2 are Mixture-of-Experts (MoE) models. Instead of activating all parameters for every token, MoE models route each token through only a subset of specialized sub-networks called 'experts.' This gives the model a large total parameter count, and the capacity headroom that comes with it, while paying only the compute cost of a much smaller "activated" parameter count at inference time.
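The routing idea can be sketched in a few lines. This is a generic top-k MoE forward pass, not Poolside's implementation (XS.2 uses sigmoid gating, and all names and shapes here are illustrative): only k of the E experts run per token, which is where the compute saving comes from.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, router_w, expert_ws, k=2):
    """Top-k MoE routing sketch: each token activates only k experts.

    x:         (tokens, d) activations
    router_w:  (d, num_experts) router weights
    expert_ws: list of (d, d) expert weight matrices
    """
    logits = x @ router_w                        # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = softmax(logits[t, topk[t]])      # renormalize over the chosen k
        for g, e in zip(gates, topk[t]):
            out[t] += g * (x[t] @ expert_ws[e])  # only k of E experts execute
    return out
```

A dense model of the same total size would multiply every token through every expert; here each token touches k expert matrices regardless of how many exist.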
Laguna M.1 is a 225B total parameter MoE model with 23B activated parameters, trained from scratch on 30T tokens using 6,144 interconnected NVIDIA Hopper GPUs. It completed pre-training at the end of last year and serves as the foundation for the entire Laguna family. On benchmarks, it reaches 72.5% on SWE-bench Verified, 67.3% on SWE-bench Multilingual, 46.9% on SWE-bench Pro, and 40.7% on Terminal-Bench 2.0.
Laguna XS.2 is the second-generation MoE and Poolside's first open-weight model, built on everything learned since training M.1. At 33B total parameters with 3B activated per token, it is designed for agentic coding and long-horizon work on a local machine, compact enough to run on a Mac with 36 GB of RAM via Ollama. It scores 68.2% on SWE-bench Verified, 62.4% on SWE-bench Multilingual, 44.5% on SWE-bench Pro, and 30.1% on Terminal-Bench 2.0. Poolside will also release Laguna XS.2-base soon for practitioners who want to fine-tune.
Architecture: The Efficiency Choices in XS.2
XS.2 uses sigmoid gating with per-layer rotary scales, enabling a mixed Sliding Window Attention (SWA) and global attention layout in a 3:1 ratio across 40 total layers: 30 SWA layers and 10 global attention layers. Sliding Window Attention limits each token's attention to a local window of 512 tokens rather than the full sequence, dramatically cutting KV cache memory. The global attention layers at a 1-in-4 ratio preserve long-range dependencies without paying the full cost everywhere. The model also quantizes the KV cache to FP8, further reducing memory per token.
Under the hood, XS.2 uses 256 experts with 1 shared expert, supports a context window of 131,072 tokens, and features native reasoning support: interleaved thinking between tool calls with per-request control over enabling or disabling thinking.
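The memory effect of the mixed layout is easy to estimate. The sketch below uses only the layer layout (30 SWA layers with a 512-token window, 10 global layers) and the FP8 KV cache (1 byte per element) from the article; the head counts and head dimension are hypothetical placeholders, since they are not stated here.

```python
def kv_cache_bytes(seq_len, n_layers, n_swa_layers, window,
                   n_kv_heads, head_dim, bytes_per_elem):
    """Rough per-sequence KV cache size for a mixed SWA/global layout.

    SWA layers only cache the last `window` tokens; global layers cache
    the full sequence. Each cached token stores K and V vectors.
    """
    n_global_layers = n_layers - n_swa_layers
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem   # K and V
    swa_cost = n_swa_layers * min(seq_len, window) * per_token
    global_cost = n_global_layers * seq_len * per_token
    return swa_cost + global_cost
```

With a full 131,072-token context, 30 of the 40 layers cache only 512 tokens each, so most of the KV memory comes from the 10 global layers; FP8 then halves what BF16 would cost on top of that.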




Training: Three Areas Poolside Pushed Hard
The Poolside team trains all its models from scratch using its own data pipeline, its own training codebase (Titan), and its own agent RL infrastructure. Three areas saw particular investment for Laguna.
AutoMixer: Optimizing the Data Mix Automatically. Data curation, and the mix that goes into training, has an outsized impact on final model performance. Rather than relying on manual heuristics, Poolside developed an automixing framework that trains a swarm of roughly 60 proxy models, each on a different data mix, and measures performance across key capability groups: code, math, STEM, and common sense. Surrogate regressors are then fit to approximate how changes in dataset proportions affect downstream evaluations, giving a learned mapping from data mix to performance that can be directly optimized. The approach is inspired by prior work including Olmix, MDE, and RegMix, adapted to Poolside's setting with richer data groupings.
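The surrogate-regressor step can be illustrated with a toy version. Everything below is synthetic and hypothetical (four domains, a linear surrogate, random search); the real AutoMixer fits richer regressors over richer data groupings, but the shape of the method is the same: proxy runs give (mix, score) pairs, a regressor learns the mapping, and the mix is optimized against the regressor instead of against expensive full runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical proxy-run results: each row is a data mix (proportions over
# 4 domains), each score is that proxy model's aggregate eval result.
mixes = rng.dirichlet(np.ones(4), size=60)          # ~60 proxy models
true_effect = np.array([0.5, 0.2, 0.2, 0.1])        # unknown to the method
scores = mixes @ true_effect + rng.normal(0, 0.01, size=60)

# Surrogate regressor: least-squares fit from mix -> score.
coef, *_ = np.linalg.lstsq(mixes, scores, rcond=None)

# Optimize the learned mapping: search candidate mixes, keep the best one.
candidates = rng.dirichlet(np.ones(4), size=10_000)
best_mix = candidates[np.argmax(candidates @ coef)]
```

The payoff is that the expensive part (60 proxy trainings) is paid once, while the cheap regressor can be queried thousands of times to steer the final mix.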
On the data side, both Laguna models were trained on more than 30T tokens. Poolside's diversity-preserving data curation approach, which keeps portions of mid- and lower-quality buckets alongside top-quality data to avoid STEM bias, yields roughly 2x more unique tokens compared to precision-focused pipelines, with the gain persisting at longer training horizons. A separate deduplication analysis also showed that global deduplication disproportionately removes high-quality data, informing how the team tuned its pipeline. Synthetic data contributes about 13% of the final training mix in Laguna XS.2, with the Laguna series using roughly 4.4T+ synthetic tokens in total.
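Bucket-preserving curation is straightforward to picture. The article says only that mid- and lower-quality buckets are partially retained rather than discarded; the tiers and retention fractions below are invented for illustration.

```python
import random

def diversity_preserving_sample(buckets, keep_rates, seed=0):
    """Sample from every quality bucket instead of keeping only the top tier.

    buckets:    dict quality_tier -> list of documents
    keep_rates: dict quality_tier -> fraction to retain (hypothetical values)
    """
    rng = random.Random(seed)
    kept = []
    for tier, docs in buckets.items():
        n = int(len(docs) * keep_rates[tier])
        kept.extend(rng.sample(docs, min(n, len(docs))))
    return kept
```

Compared with a precision-focused filter that keeps only the top bucket, partial retention of the lower tiers is what produces the larger pool of unique tokens the article describes.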
Muon Optimizer. Rather than AdamW, the most common optimizer in large-model training, Poolside used a distributed implementation of the Muon optimizer through all training stages of both models. In initial pre-training ablations, the research team achieved the same training loss as an AdamW baseline in roughly 15% fewer steps, with large absolute evaluation uplifts on the final model, and achieved learning-rate transfer across model scales. An additional benefit: Muon requires only one state per parameter rather than two, reducing memory requirements for both training and checkpointing. During pre-training of Laguna M.1, the overhead from the optimizer was less than 1% of the training step time.
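For readers unfamiliar with Muon: its core move is to orthogonalize the momentum matrix via a Newton-Schulz iteration before applying it as the update. The sketch below follows the published Muon recipe (including its polynomial coefficients), not Poolside's distributed implementation; the learning rate and beta are placeholder values. Note the single momentum buffer, which is the "one state per parameter" memory advantage over AdamW's two.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a 2D gradient/momentum matrix.

    This quintic Newton-Schulz iteration (coefficients from the public
    Muon recipe) drives the singular values of G toward 1.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = G.shape[0] > G.shape[1]
    if transposed:                       # iterate on the short-fat orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a matrix parameter."""
    momentum = beta * momentum + grad    # the single per-parameter state
    update = newton_schulz_orthogonalize(momentum)
    return param - lr * update, momentum
```

AdamW would carry both a first- and second-moment buffer per parameter; dropping the second moment is what cuts optimizer memory for training and checkpointing.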
Poolside also runs periodic hash checks on model weights across training replicas to catch silent data corruption (SDC) from faulty GPUs: specifically errors in arithmetic logic units and pipeline registers, which, unlike DRAM and SRAM, are not covered by ECC protection.
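The check itself is simple in principle: data-parallel replicas should hold bit-identical weights, so a digest that disagrees across replicas flags a likely SDC event. This is a minimal single-process sketch of that idea; Poolside's actual mechanism, cadence, and hash choice are not described beyond "periodic hash checks."

```python
import hashlib
import numpy as np

def weight_digest(weights):
    """Deterministic digest of a name -> array weight mapping."""
    h = hashlib.sha256()
    for name in sorted(weights):         # fixed order so digests are comparable
        h.update(name.encode())
        h.update(weights[name].tobytes())
    return h.hexdigest()

def check_replicas(digests):
    """Return indices of replicas whose digest disagrees with replica 0."""
    reference = digests[0]
    return [i for i, d in enumerate(digests) if d != reference]
```

In a real run the digests would be gathered across nodes (e.g. via a collective) and a mismatch would trigger quarantining the suspect GPU and restoring from a checkpoint.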
Async On-Policy Agent RL. This is arguably the most complex piece of the Laguna training stack. Poolside built a fully asynchronous online RL system where actor processes pull tasks from a dataset, spin up sandboxed containers, and run the production agent binary against each task using the freshly deployed model. The resulting trajectories are scored, filtered, and written to Iceberg tables, while the trainer continuously consumes these records and produces the next checkpoint: inference and training running asynchronously in parallel, with throughput tuned to balance off-policy staleness.
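The producer/consumer shape of that pipeline can be sketched with a queue standing in for the Iceberg tables. Everything here is an illustrative stub (the sandboxed agent run, the scorer, and the training step are all placeholders); the point is the structure: actors write scored, filtered trajectories while the trainer drains them concurrently.

```python
import queue
import threading

trajectory_queue = queue.Queue(maxsize=64)   # stand-in for the Iceberg tables

def run_agent(task):
    # Real system: spin up a container, run the agent binary, collect the trace.
    return {"task": task, "trace": f"solved-{task}"}

def score(traj):
    # Real system: run tests/verifiers against the trajectory.
    return 1.0

def actor(tasks, min_score=0.5):
    """Producer: roll out the current policy, score, filter, write."""
    for task in tasks:
        traj = run_agent(task)
        reward = score(traj)
        if reward >= min_score:              # filter before persisting
            trajectory_queue.put((traj, reward))

def trainer(num_batches, batch_size, results):
    """Consumer: continuously read trajectories and take update steps."""
    for _ in range(num_batches):
        batch = [trajectory_queue.get() for _ in range(batch_size)]
        results.append(len(batch))           # real system: gradient step + publish checkpoint
```

Because the two sides run concurrently, the trainer is always slightly behind the actors' policy; tuning relative throughput is what keeps that off-policy staleness bounded, as the article notes.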
Key Takeaways
- Poolside releases its first open-weight model: Laguna XS.2 is a 33B total parameter MoE model with only 3B activated parameters per token, available under an Apache 2.0 license and compact enough to run locally on a Mac with 36 GB of RAM via Ollama.
- Strong benchmark performance at small scale: Laguna XS.2 scores 68.2% on SWE-bench Verified and 44.5% on SWE-bench Pro, while the larger Laguna M.1 (225B total, 23B activated) reaches 72.5% on SWE-bench Verified and 46.9% on SWE-bench Pro; both were trained from scratch on 30T tokens.
- Muon optimizer beats AdamW by 15% in training efficiency: Poolside replaced AdamW with a distributed implementation of the Muon optimizer, achieving the same training loss in roughly 15% fewer steps, with lower memory requirements: only one state per parameter instead of two.
- AutoMixer replaces manual data mixing with learned optimization: Instead of handcrafted data recipes, Poolside trains a swarm of ~60 proxy models on different data mixes and fits surrogate regressors to optimize dataset proportions, with synthetic data making up ~13% of Laguna XS.2's final training mix from a total of 4.4T+ synthetic tokens.
- Fully asynchronous agent RL with GPUDirect RDMA weight transfer: Poolside's RL system runs inference and training in parallel, transferring hundreds of gigabytes of BF16 weights between nodes in under 5 seconds via GPUDirect RDMA, using a token-in, token-out actor design and the CISPO algorithm for off-policy training stability.
Check out the model weights and technical details.



