On June 1, 2026, MiniMax officially launched its M3 model, featuring MSA—a new sparse attention architecture—that supports a context window of up to 1 million tokens. M3 also natively handles image and video inputs and can operate desktop computers. The API is already live.
The model is accessible through MiniMax Code, the MiniMax Token Plan, and the MiniMax API. As the successor to M2.7, M3 aims to be the first open-source model to merge advanced coding performance, massive context capacity, and native multimodal processing in one unified system. The technical report and model weights will be released within 10 days.
MSA: MiniMax Sparse Attention
The core innovation of M3 is the MSA (MiniMax Sparse Attention) architecture. Unlike traditional full attention, whose compute cost grows quadratically with context length, MSA adds a pre-filtering stage to keep that cost in check.
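To make the scaling argument concrete, the sketch below counts query-key score computations for full attention and for a generic block-sparse scheme with a pre-filter. It is a rough cost model under stated assumptions, not MSA's actual design: the block size, the number of blocks kept per query (`top_k`), and the idea of scoring one summary per block are all hypothetical choices made for illustration.

```python
def full_attention_scores(n_tokens: int) -> int:
    """Score computations when every query attends to every key (non-causal)."""
    return n_tokens * n_tokens


def block_sparse_scores(n_tokens: int, block_size: int, top_k: int) -> int:
    """Upper bound on score computations for a hypothetical pre-filtered scheme.

    Each query (1) scores one summary per KV block, then (2) attends only to
    the keys inside the top_k blocks it selected.
    """
    n_blocks = (n_tokens + block_size - 1) // block_size  # ceiling division
    kept = min(top_k, n_blocks)
    prefilter = n_tokens * n_blocks          # one score per (query, block)
    attention = n_tokens * kept * block_size  # one score per (query, kept key)
    return prefilter + attention


if __name__ == "__main__":
    # block_size and top_k are illustrative values, not M3's configuration.
    for n in (10_000, 100_000, 1_000_000):
        full = full_attention_scores(n)
        sparse = block_sparse_scores(n, block_size=64, top_k=16)
        print(f"{n:>9} tokens: full={full:.2e} sparse={sparse:.2e} "
              f"ratio={full / sparse:.1f}x")
```

In this toy model the pre-filter term still grows quadratically, but it is divided by the block size, so the gap over full attention widens as the context grows.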
According to the MiniMax team, MSA achieves better context coverage than methods like DSA and MoBA by dividing the KV cache into more precise blocks. The architecture also uses a “KV outer gather Q” operator, which groups the queries that match each KV block and reads each block only once. MiniMax reports that this makes the operator more than 4x faster than current open-source implementations such as Flash-Sparse-Attention.
At 1 million tokens, M3’s per-token computational cost is just 1/20th that of previous M2 models. The prefill stage is more than 9x faster, and decoding is over 15x faster. Despite these efficiency gains, tests show MSA matching full attention on most tasks.
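One way to read the “KV outer gather Q” description is as a change of loop order: instead of iterating over queries and re-reading every block each query selected, iterate over KV blocks and serve all the queries that selected a block in one pass. The pure-Python sketch below illustrates that reading only. It is not MiniMax's kernel; the function names are hypothetical, the block selection is random because the source does not describe the pre-filter's scoring, and partial results are merged with a standard running-softmax update.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_query_outer(queries, keys, values, selected, block_size):
    """Baseline: loop over queries; each query re-reads every block it selected."""
    outputs, block_reads = [], 0
    for q, blocks in zip(queries, selected):
        idx = [i for b in blocks for i in range(b * block_size, (b + 1) * block_size)]
        block_reads += len(blocks)
        logits = [dot(q, keys[i]) for i in idx]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        outputs.append([
            sum(w * values[i][d] for w, i in zip(weights, idx)) / total
            for d in range(len(values[0]))
        ])
    return outputs, block_reads


def attend_kv_outer(queries, keys, values, selected, block_size):
    """Loop over KV blocks; each block is read once and serves all its queries.

    Assumes every query selected at least one block.
    """
    n_blocks = len(keys) // block_size
    # Invert the selection: block id -> queries that picked this block.
    block_to_queries = [[] for _ in range(n_blocks)]
    for qi, blocks in enumerate(selected):
        for b in blocks:
            block_to_queries[b].append(qi)

    dim = len(values[0])
    run_max = [-math.inf] * len(queries)   # running max logit per query
    run_sum = [0.0] * len(queries)         # running softmax denominator
    acc = [[0.0] * dim for _ in queries]   # running weighted sum of values
    block_reads = 0
    for b, query_ids in enumerate(block_to_queries):
        if not query_ids:
            continue  # no query selected this block, so it is never read
        k_blk = keys[b * block_size:(b + 1) * block_size]
        v_blk = values[b * block_size:(b + 1) * block_size]
        block_reads += 1
        for qi in query_ids:
            logits = [dot(queries[qi], k) for k in k_blk]
            new_max = max(run_max[qi], max(logits))
            rescale = math.exp(run_max[qi] - new_max)  # 0.0 on a query's first block
            weights = [math.exp(l - new_max) for l in logits]
            run_sum[qi] = run_sum[qi] * rescale + sum(weights)
            acc[qi] = [
                a * rescale + sum(w * v[d] for w, v in zip(weights, v_blk))
                for d, a in enumerate(acc[qi])
            ]
            run_max[qi] = new_max
    outputs = [[a / run_sum[qi] for a in acc[qi]] for qi in range(len(queries))]
    return outputs, block_reads


def make_demo(n_queries=8, n_keys=32, dim=4, block_size=4, top_k=3, seed=0):
    """Build small random inputs; random selection stands in for the pre-filter."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1, 1) for _ in range(dim)]

    queries = [vec() for _ in range(n_queries)]
    keys = [vec() for _ in range(n_keys)]
    values = [vec() for _ in range(n_keys)]
    selected = [sorted(rng.sample(range(n_keys // block_size), top_k)) for _ in queries]
    return queries, keys, values, selected, block_size
```

Calling both functions on `make_demo()` should give the same outputs up to floating-point error, while the KV-outer version reads at most one block per KV block instead of `top_k` blocks per query. On a GPU the point of this ordering would be fewer loads of the same block from memory, which this toy version only counts rather than measures.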
Coding and Agentic Benchmarks
M3 demonstrates strong performance in coding and autonomous agent tasks. Results were gathered from a mix of internal testing and public leaderboards:
- SWE-Bench Pro: 59.0% (outperforms GPT-5.5 and Gemini 3.1 Pro)
- Terminal-Bench 2.1: 66.0%
- SWE-fficiency: 34.8%
- KernelBench Hard: 28.8% (tested on NVIDIA Blackwell GPUs)
- MCP Atlas: 74.2%
- Claw-Eval: Top score among models in the General Task Group
- SVG-Bench: Beats Opus 4.7
For multimodal understanding, M3 scores above Gemini 3.1 Pro on OmniDocBench. On computer-use tasks (OSWorld-Verified), it achieves a 70.06% completion rate.
The MiniMax team also developed an interactive training framework to better simulate real-world workflows. This involves multi-turn collaboration, including requirement discussions, feedback loops, and iterative project development, moving beyond typical single-turn evaluations.
Native Multimodality
M3 was trained from the start with text, images, and video together. The team emphasizes that this mixed-modality training is vital for performance. After rebuilding the pipeline for this approach, the training dataset was expanded to 100 trillion tokens.
The model natively supports image and video inputs and can operate desktop environments.
Real-World Example: MiniMax Tasks
Academic Paper Reproduction: M3 was asked to reproduce the experiments for an award-winning ICLR paper. The model worked for nearly 12 hours, producing 18 code commits and 23 charts, successfully completing the core experiments without human help.
CUDA Kernel Optimization: Starting from a basic task description and a non-functional skeleton, M3 optimized an FP8 matrix multiplication kernel on Hopper GPUs. Through 147 submissions over 24 hours, it increased hardware utilization from 7.6% to 71.3% (a 9.4x improvement). Most competing models stop improving within the first 30 submissions, whereas M3 continued to iterate.
PostTrainBench (Autonomous Training): M3 was tasked with running a full training cycle for four base models to improve their performance across various skills like math and code generation. After 12 hours of independent work, M3 achieved a score of 0.37, placing it ahead of several other models but slightly below Opus 4.7.
Marktechpost’s Visual Guide
- Max: roughly 5.1 billion tokens per month, $50 monthly
- Ultra: roughly 9.8 billion tokens per month, $120 monthly
All types of usage—including text, images, speech, and music—share this same token allowance.
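For a rough sense of what these tiers imply per token, the snippet below divides each listed price by its listed allowance. It takes the figures at face value and assumes the full monthly allowance is consumed; the helper name `usd_per_million_tokens` is just an illustrative choice.

```python
PLANS = {
    # name: (approx. tokens per month, price in USD per month), as listed above
    "Max": (5.1e9, 50.0),
    "Ultra": (9.8e9, 120.0),
}


def usd_per_million_tokens(tokens_per_month: float, usd_per_month: float) -> float:
    """Effective price per one million tokens if the full allowance is used."""
    return usd_per_month / (tokens_per_month / 1e6)


if __name__ == "__main__":
    for name, (tokens, price) in PLANS.items():
        rate = usd_per_million_tokens(tokens, price)
        print(f"{name}: ${rate:.4f} per million tokens")
```

Because the allowances are described as approximate, the resulting rates are estimates, and any unused allowance raises the effective price.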
Key Takeaways
- MiniMax M3 launched on June 1, 2026, and its API is available now. MiniMax says the open weights and a technical report will follow within 10 days.
- With MiniMax Sparse Attention (MSA), M3 is more than 9× faster in prefill and more than 15× faster in decoding than M2 at a 1M-token context, at 1/20th of the per-token computational cost.
- M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 and Gemini 3.1 Pro.
- Natively multimodal, M3 accepts image and video inputs and reaches a 70.06% success rate on OSWorld-Verified for computer-use tasks.