MiniMax M3 Debuts With Revolutionary MSA Architecture: Unleashing 1M-Token Context, Native Multimodality, And Powerful Agentic Coding

On June 1, 2026, MiniMax officially launched its M3 model, featuring MSA, a new sparse attention architecture that supports a context window of up to 1 million tokens. M3 also natively handles image and video inputs and can operate desktop computers. The API is already live.

The model is accessible through MiniMax Code, the MiniMax Token Plan, and the MiniMax API. As the successor to M2.7, M3 aims to be the first open-source model to merge advanced coding performance, massive context capacity, and native multimodal processing in one unified system. The technical report and model weights will be released within 10 days.

MSA: MiniMax Sparse Attention

The core innovation of M3 is the MSA (MiniMax Sparse Attention) architecture. Unlike traditional full attention, which requires quadratically increasing computational power as context length grows, MSA introduces a pre-filtering stage to manage this cost.

According to the MiniMax team, MSA achieves better context coverage than methods such as DSA and MoBA by dividing the KV cache into finer-grained blocks. The architecture also uses a “KV outer gather Q” operator: queries that select the same KV block are gathered together, so each block is read only once. MiniMax reports that this makes the operator more than 4x faster than current open-source implementations such as Flash-Sparse-Attention.
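The pre-filtering idea described above can be illustrated with a generic block-sparse attention sketch. MiniMax has not yet published how MSA scores or selects blocks, so everything here is an assumption made for illustration: the mean-key block summary, the `top_k` budget, and the function names `attend`, `select_blocks`, and `sparse_attend` are hypothetical, and causal masking is omitted for brevity.

```python
import math

def attend(q, keys, values):
    """Softmax attention of one query vector over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def select_blocks(q, keys, block_size, top_k):
    """Pre-filter: rank KV blocks by q dotted with each block's mean key.

    The mean-key summary is an assumption, not MSA's published scoring rule.
    """
    n_blocks = math.ceil(len(keys) / block_size)
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((sum(a * c for a, c in zip(q, mean_key)), b))
    # Highest score first; ties broken by lower block index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return sorted(b for _, b in scored[:top_k])

def sparse_attend(q, keys, values, block_size, top_k):
    """Run attention only over tokens inside the selected blocks."""
    idx = [i
           for b in select_blocks(q, keys, block_size, top_k)
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    return attend(q, [keys[i] for i in idx], [values[i] for i in idx])
```

When `top_k` covers every block, `sparse_attend` reduces to full attention. Smaller values trade coverage for compute, which is the trade-off the article attributes to block granularity.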

At 1 million tokens, M3’s per-token computational cost is just 1/20th that of the previous M2 models. The prefill stage is more than 9x faster, and decoding is over 15x faster. Despite these efficiency gains, MiniMax says its tests show MSA matching full attention on most tasks.

Coding and Agentic Benchmarks

M3 demonstrates strong performance in coding and autonomous agent tasks. Results were gathered from a mix of internal testing and public leaderboards:

SWE-Bench Pro: 59.0% (outperforms GPT-5.5 and Gemini 3.1 Pro)
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8% (tested on NVIDIA Blackwell GPUs)
MCP Atlas: 74.2%
Claw-Eval: Top score among models in the General Task Group
SVG-Bench: Beats Opus 4.7

For multimodal understanding, M3 scores above Gemini 3.1 Pro on OmniDocBench. In computer usage tasks (OSWorld-Verified), it achieves a 70.06% completion rate.

The MiniMax team also developed an interactive training framework to better simulate real-world workflows. This involves multi-turn collaboration, including requirement discussions, feedback loops, and iterative project development, moving beyond typical single-turn evaluations.

Native Multimodality

M3 was trained from the start with text, images, and video together. The team emphasizes that this mixed-modality training is vital for performance. After rebuilding the pipeline for this approach, the training dataset was expanded to 100 trillion tokens.

The model natively supports image and video inputs and can operate desktop environments.

Real-World Example: MiniMax Tasks

Academic Paper Reproduction: M3 was asked to reproduce the experiments for an award-winning ICLR paper. The model worked for nearly 12 hours, producing 18 code commits and 23 charts, successfully completing the core experiments without human help.

CUDA Kernel Optimization: Starting from a basic task description and a non-functional skeleton, M3 optimized an FP8 matrix multiplication kernel on Hopper GPUs. Through 147 submissions over 24 hours, it increased hardware utilization from 7.6% to 71.3% (a 9.4x improvement). Most competing models stop improving within the first 30 submissions, whereas M3 continued to iterate.

PostTrainBench (Autonomous Training): M3 was tasked with running a full training cycle for four base models to improve their performance across various skills like math and code generation. After 12 hours of independent work, M3 achieved a score of 0.37, placing it ahead of several other models but slightly below Opus 4.7.

Marktechpost’s Visual Guide

Overview

MiniMax M3: Frontier Coding, One-Million-Token Context, Built-In Multimodal Support

MiniMax launched M3 on June 1, 2026. The API is currently available for use. Model weights and a detailed technical report are scheduled to be made open-source within the next 10 days.

M3 is the latest addition to the M-series lineup, succeeding M2.7. MiniMax presents it as the first openly available model that integrates frontier coding, a one-million-token context, and built-in multimodal support within a single unified architecture.

Token Context Window: 1M
SWE-Bench Pro Score: 59.0%
Sparse Attention Architecture: MSA
OSWorld-Verified (Computer Use): 70.06%

Architecture

MSA: MiniMax Sparse Attention

Traditional full attention scales quadratically in computation — as the input sequence gets longer, processing costs rise proportional to the square of the sequence length. MSA addresses this bottleneck at the operator level.
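To make the scaling argument concrete, the sketch below counts query-key pairs for causal full attention and for a sparse scheme in which each query scores at most a fixed number of earlier tokens. The `budget` parameter and both function names are hypothetical. MiniMax has not published MSA's actual per-query budget, so this shows the shape of the curves rather than M3's reported figures.

```python
def full_attention_pairs(n_tokens):
    """Query-key pairs scored by causal full attention: 1 + 2 + ... + n."""
    return n_tokens * (n_tokens + 1) // 2

def sparse_attention_pairs(n_tokens, budget):
    """Pairs scored when each query attends to at most `budget` tokens.

    `budget` is a hypothetical knob, not a published MSA setting.
    """
    if n_tokens <= budget:
        return full_attention_pairs(n_tokens)
    # The first `budget` queries see everything before them; the rest are capped.
    return full_attention_pairs(budget) + (n_tokens - budget) * budget

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000, 1_000_000):
        full = full_attention_pairs(n)
        sparse = sparse_attention_pairs(n, budget=1_000)
        print(f"{n:>9} tokens: full/sparse pair ratio = {full / sparse:.1f}")
```

Doubling the sequence length roughly quadruples the full-attention count, while the capped count only about doubles. This is why a fixed per-query budget becomes more valuable as context grows.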

When compared to methods such as DSA and MoBA, MSA breaks down KV cache blocks with greater precision, resulting in broader effective context coverage.

MSA follows a “KV outer gather Q” strategy — each KV block is fetched just once, memory reads happen sequentially in a contiguous pattern, and computational throughput is substantially higher than what is typically achieved with conventional approaches.
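One way to read “KV outer gather Q” is as a loop inversion: rather than each query fetching its own blocks, the outer loop walks KV blocks and each block is applied to every query that selected it. The sketch below is an illustrative CPU version under that reading, not MiniMax's kernel. The function names, the `selected` input format, and the use of a running (online) softmax to merge per-block results are all assumptions, and every query is assumed to select at least one block.

```python
import math
from collections import defaultdict

def gather_queries_by_block(selected):
    """Invert a query -> blocks mapping into block -> queries."""
    by_block = defaultdict(list)
    for qi, blocks in enumerate(selected):
        for b in blocks:
            by_block[b].append(qi)
    return dict(by_block)

def kv_outer_attention(queries, keys, values, selected, block_size):
    """Attention with KV blocks in the outer loop; each block is loaded once."""
    dim = len(values[0])
    scale = 1.0 / math.sqrt(len(queries[0]))
    # Running softmax state per query: max score, normalizer, weighted sum.
    run_max = [-math.inf] * len(queries)
    run_sum = [0.0] * len(queries)
    acc = [[0.0] * dim for _ in queries]
    block_reads = 0
    for b, q_ids in sorted(gather_queries_by_block(selected).items()):
        lo = b * block_size
        k_blk = keys[lo:lo + block_size]    # contiguous slice, read once
        v_blk = values[lo:lo + block_size]
        block_reads += 1
        for qi in q_ids:
            for k, v in zip(k_blk, v_blk):
                s = scale * sum(a * c for a, c in zip(queries[qi], k))
                new_max = max(run_max[qi], s)
                rescale = math.exp(run_max[qi] - new_max)
                w = math.exp(s - new_max)
                run_sum[qi] = run_sum[qi] * rescale + w
                acc[qi] = [a * rescale + w * x for a, x in zip(acc[qi], v)]
                run_max[qi] = new_max
    outputs = [[a / run_sum[qi] for a in acc[qi]]
               for qi in range(len(queries))]
    return outputs, block_reads
```

With three queries selecting blocks `[0, 2]`, `[2]`, and `[1, 2, 3]`, a query-outer loop would fetch six blocks in total, while this version fetches four, one per distinct block.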

Prefill speedup at 1M context: >9×
Decoding speedup at 1M context: >15×
Per-token compute vs. M2 at 1M context: 1/20
Speedup over Flash-Sparse-Attention: >4×

Benchmarks

Coding and Agentic Performance

Results shared by MiniMax. SWE-Bench Verified was run with Claude Code scaffolding, averaged over 4 independent runs. SWE-Bench Pro was also run with Claude Code scaffolding, aligned with the official evaluation protocol.

SWE-Bench Pro: 59.0% — exceeds GPT-5.5 and Gemini 3.1 Pro; nears Opus 4.7
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8% — evaluated on NVIDIA Blackwell GPUs (sm_120)
MCP Atlas: 74.2%
Claw-Eval: Top-scoring model among all evaluated (161 tasks)
SVG-Bench: Outperforms Opus 4.7
OmniDocBench: Beats Gemini 3.1 Pro
OSWorld-Verified: 70.06% — 361 samples, Max Steps = 200

Multimodality

Native Multimodal Training from the Ground Up

M3 was trained on mixed modalities starting from the very first step. Text, images, and video are all learned together from the outset — rather than being bolted on during a later fine-tuning stage.

MiniMax reports that interleaved data — sequences where text and images are woven together naturally — matters more to overall model quality than most people think.

After completely rebuilding the data pipeline to support these interleaved formats, the total training data was expanded to roughly 100 trillion tokens.

Image input
Video input
Desktop computer operation (computer use)

Real-World Tasks

Three Internal Tasks Documented by MiniMax

Paper Reproduction — On its own over approximately 12 hours, M3 recreated the ICLR 2025 paper Learning Dynamics of LLM Finetuning, producing 18 code commits and 23 experimental figures without any human involvement.
CUDA Kernel Optimization — Over roughly 24 hours, M3 improved an FP8 GEMM kernel on NVIDIA Hopper GPUs: 147 benchmark submissions, 1,959 tool calls, and six major optimization milestones. Peak Hopper FP8 utilization climbed from 7.6% to 71.3% (a 9.4× improvement). The best-performing solution came at submission number 145.
PostTrainBench — Over 12 hours, M3 independently executed the full cycle of data synthesis, training, evaluation, and iteration across four base models. It earned a score of 0.37, placing it below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of all other tested models. Targets included: AIME2025, BFCL, GPQA Main, GSM8K, and HumanEval.
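The CUDA kernel figures above can be sanity-checked with a few lines of arithmetic. The sketch uses only the numbers quoted in the article. The run is described as lasting "roughly 24 hours", so the per-hour rate is approximate, and the helper name is hypothetical.

```python
def improvement_factor(before_pct, after_pct):
    """Ratio between two utilization percentages."""
    return after_pct / before_pct

utilization_gain = improvement_factor(7.6, 71.3)
submissions_per_hour = 147 / 24          # assumes exactly 24 hours
tool_calls_per_submission = 1959 / 147

print(f"{utilization_gain:.1f}x utilization gain")             # 9.4x
print(f"{submissions_per_hour:.1f} submissions per hour")      # 6.1
print(f"{tool_calls_per_submission:.1f} tool calls per submission")  # 13.3
```

The ratio matches the 9.4× figure MiniMax reports.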

MiniMax Code

MiniMax Code: An Agent Product Built and Trained with M3

MiniMax Code is an agent product co-designed and trained alongside M3. It is available at agent.minimaxi.com/download and works with MiniMax Token Plans.

Agent Teams — multiple agents working simultaneously through multi-stage workflows that can be adjusted on the fly
Producer + Verifier loop — an adversarial harness that lets the system self-correct during task execution
Computer use — M3’s built-in multimodal abilities allow for cross-application desktop automation
Built on OpenCode and Pi — MiniMax has stated its intention to release MiniMax Code as open-source down the road
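The Producer + Verifier loop in the list above is described only at a high level. A minimal sketch of the general pattern, assuming a producer that accepts feedback and a verifier that returns a pass/fail flag plus feedback, might look like the following. The function name, callable signatures, and `max_rounds` limit are hypothetical and are not taken from MiniMax Code.

```python
def producer_verifier_loop(task, produce, verify, max_rounds=5):
    """Alternate between producing a candidate and checking it.

    produce(task, feedback) -> candidate
    verify(task, candidate) -> (ok, feedback)
    Returns (candidate, rounds_used) on success, or (None, max_rounds).
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = produce(task, feedback)
        ok, feedback = verify(task, candidate)
        if ok:
            return candidate, round_no
    return None, max_rounds

# Toy usage: the producer only sorts after the verifier complains.
def toy_produce(task, feedback):
    return list(task) if feedback is None else sorted(task)

def toy_verify(task, candidate):
    ok = candidate == sorted(task)
    return ok, None if ok else "output is not sorted"

result = producer_verifier_loop([3, 1, 2], toy_produce, toy_verify)
print(result)  # ([1, 2, 3], 2)
```

Capping the number of rounds keeps a producer that never satisfies the verifier from looping indefinitely.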

// Example use case
User (on phone): “Open the local ERP client
and batch-enter invoice data from this Excel file.”
→ MiniMax Code manages all operations across
applications, files, and systems on the desktop.

API & Pricing

API Details and Token Plan Tiers

The M3 API is accessible at platform.minimax.io.

Pricing by input length: Requests using up to 512K tokens are billed at the standard rate. Requests exceeding 512K tokens incur the higher long-context rate.

Thinking mode: Can be toggled on or off per request. Both modes are priced identically.

Service tiers: standard (default) and priority (set via service_tier=priority) — priority access is available through sales and will be rolled out to all users in the near future.
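The billing and tier rules above can be captured in a small helper. This is a sketch under stated assumptions rather than the MiniMax API: "512K" is read as 512,000 tokens (it could also mean 524,288), the `thinking` key is a hypothetical name for the per-request toggle, and only `service_tier` with the values `standard` and `priority` comes from the article.

```python
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" means 512,000 tokens

def pricing_band(input_tokens):
    """Return which rate applies to a request of the given input length."""
    if input_tokens <= LONG_CONTEXT_THRESHOLD:
        return "standard"
    return "long_context"

def request_options(thinking=False, priority=False):
    """Build per-request options.

    `service_tier` is named in the article; the `thinking` key is a
    hypothetical placeholder for the thinking-mode toggle.
    """
    return {
        "service_tier": "priority" if priority else "standard",
        "thinking": thinking,
    }

print(pricing_band(200_000))           # standard
print(pricing_band(800_000))           # long_context
print(request_options(priority=True))  # {'service_tier': 'priority', 'thinking': False}
```

Because both thinking modes are priced identically, the toggle does not feed into `pricing_band`. Only input length does.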

Plus: ~1.7B tokens/mo, $20/mo
Max: ~5.1B tokens/mo, $50/mo
Ultra: ~9.8B tokens/mo, $120/mo

All types of usage—including text, images, speech, and music—share this same token allowance.
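For comparison across tiers, the quoted prices and allowances can be converted into an effective cost per million tokens. The token allowances are approximate in the source, so the results are rough estimates. The dictionary layout and function name are illustrative.

```python
# (monthly price in USD, approximate monthly token allowance) from the article
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}

def usd_per_million_tokens(price_usd, tokens):
    """Effective price per one million tokens if the allowance is fully used."""
    return price_usd / tokens * 1_000_000

for name, (price, tokens) in PLANS.items():
    print(f"{name}: ${usd_per_million_tokens(price, tokens):.4f} per 1M tokens")
```

On these approximate figures, Max works out cheapest per token at about $0.0098 per million, with Plus at about $0.0118 and Ultra at about $0.0122.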

Key Takeaways

Crucial Information for Engineers and Researchers

MiniMax M3 was officially released on June 1, 2026. Its API is currently active. The company has pledged to publish open model weights and a comprehensive technical report within a 10-day timeframe.
When handling 1M-token contexts, MSA achieves >9× faster prefill and >15× quicker decoding compared to M2, while only requiring 1/20th of the computational cost per token.
M3 achieves a 59.0% score on SWE-Bench Pro, outperforming both GPT-5.5 and Gemini 3.1 Pro.
It is inherently multimodal right from the start—capable of processing both image and video inputs—and secures 70.06% on OSWorld-Verified for tasks involving computer interaction.
The “Thinking” mode can be turned on or off whenever a request is made. Pricing for the Token Plan begins at $20 monthly (providing approximately 1.7B M3 tokens).

Key Takeaways

The MiniMax M3 model debuted on June 1, 2026, and its API is available now. MiniMax has committed to releasing the model’s open weights and a detailed technical report within the next 10 days.
With MiniMax Sparse Attention (MSA), the model delivers a more than 9× speedup in prefill and a more than 15× speedup in decoding at a 1M-token context compared to M2, at 1/20th of the per-token computational cost.
On the SWE-Bench Pro benchmark, M3 scored 59.0%, ahead of GPT-5.5 and Gemini 3.1 Pro.
Built to be multimodal from the ground up, M3 accepts image and video inputs and reaches 70.06% on OSWorld-Verified for computer-use tasks.

Presenting MiniMax M3: The Pioneering Open-Weights Model to Merge Three Cutting-Edge Capabilities
– Coding & Agentic Edge: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
– MiniMax Sparse Attention expands context capacity to 1M
-… pic.twitter.com/TF891iJukF
— MiniMax (official) (@MiniMax_AI) June 1, 2026

