Moonshot AI Unveils Kimi K2.7-Code: A Leap Forward With +21.8% Boost On Kimi Code Bench V2 Over K2.6

Moonshot AI launched Kimi K2.7-Code this week. It’s a specialized coding and agentic model. The model weights are available on Hugging Face under a Modified MIT license. You can also access it via the Kimi API and Kimi Code.

K2.7-Code is built for complex, long-running software engineering tasks rather than casual conversation. It can plan, make edits, execute tools, and debug through multiple steps. Moonshot has paired the model with a subscription-based coding platform.

Kimi K2.7-Code

K2.7-Code uses a Mixture-of-Experts architecture. It has 1 trillion total parameters but only activates 32 billion per token. The setup includes 384 experts, with 8 chosen per token plus 1 shared expert. It features 61 layers, one of which is a dense layer.
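
To make the routing numbers concrete, here is a minimal sketch of top-k expert selection using the counts quoted above (384 routed experts, 8 selected per token, 1 shared expert). The random router scores, the softmax over the selected experts, and the function name `route_token` are illustrative assumptions; the article does not describe Moonshot's actual gating function.

```python
import heapq
import math
import random

NUM_EXPERTS = 384                  # routed experts (from the article)
TOP_K = 8                          # routed experts selected per token (from the article)
NUM_SHARED = 1                     # shared expert applied to every token
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated per token


def route_token(router_scores, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalize their weights.

    Softmax over the selected scores is an assumption made for illustration.
    """
    top = heapq.nlargest(top_k, enumerate(router_scores), key=lambda pair: pair[1])
    max_score = max(score for _, score in top)
    exps = [(idx, math.exp(score - max_score)) for idx, score in top]
    total = sum(value for _, value in exps)
    return {idx: value / total for idx, value in exps}


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router logits
weights = route_token(scores)

experts_used = len(weights) + NUM_SHARED
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"experts used per token: {experts_used}")
print(f"active parameter share: {active_fraction:.1%}")
```

The point of the sketch is the ratio: each token touches 9 experts and roughly 3.2% of the total parameters, which is why a 1-trillion-parameter model can run with the per-token compute of a much smaller one.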

The attention mechanism uses MLA, and the feed-forward network relies on SwiGLU. A MoonViT vision encoder contributes an additional 400 million parameters for handling image and video inputs. The model comes with built-in INT4 quantization. Its context window supports 256K tokens (262,144).

Two important limitations to note. First, thinking mode is always required; turning it off triggers an API error. Second, sampling settings are locked at temperature 1.0, top_p 0.95, n 1, and penalties 0.0. Separately, the default maximum output length is 32,768 tokens.
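
Because a wrong sampling value fails at the API, it can be cheaper to catch it client-side first. The following is a minimal sketch of such a guard, not part of any Kimi SDK. The helper name `check_request` is hypothetical, and mapping "penalties" to the OpenAI-style `presence_penalty` and `frequency_penalty` fields is an assumption.

```python
# Fixed values quoted in the article; the API is said to reject anything else.
# The two penalty field names are assumed from the OpenAI-compatible schema.
FIXED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def check_request(kwargs):
    """Raise ValueError if kwargs override a fixed sampling parameter."""
    conflicts = {
        name: value
        for name, value in kwargs.items()
        if name in FIXED_SAMPLING and value != FIXED_SAMPLING[name]
    }
    if conflicts:
        raise ValueError(f"fixed sampling parameters overridden: {conflicts}")
    return kwargs


# Passing the fixed value explicitly is harmless.
ok = check_request({"model": "kimi-k2.7-code", "max_tokens": 32768, "temperature": 1.0})

# Any other value is caught before a request is sent.
try:
    check_request({"model": "kimi-k2.7-code", "temperature": 0.2})
    rejected = False
except ValueError:
    rejected = True

print("override rejected:", rejected)
```

The simplest way to stay within the limits is to omit these parameters entirely and let the server apply its defaults.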

You can self-host the model using vLLM, SGLang, or KTransformers. The Hugging Face repository is substantial, taking up around 595 GB of disk space. This is designed for server-grade deployments, not for running on a laptop.

Benchmark

The Moonshot team shared results across six benchmarks. They compared K2.7-Code with K2.6, GPT-5.5, and Claude Opus 4.8. K2.7-Code outperforms K2.6 on every metric. The largest point gain is on Kimi Code Bench v2, jumping from 50.9 to 62.0 (+11.1 points, or +21.8% relative). The largest relative gain is on MLS Bench Lite (+31.5%).

Benchmark	Kimi K2.6	Kimi K2.7-Code	GPT-5.5	Claude Opus 4.8	K2.7 vs K2.6
Kimi Code Bench v2	50.9	62.0	69.0	67.4	+21.8%
Program Bench	48.3	53.6	69.1	63.8	+11.0%
MLS Bench Lite	26.7	35.1	35.5	42.8	+31.5%
Kimi Claw 24/7 Bench	42.9	46.9	52.8	50.4	+9.3%
MCP Atlas	69.4	76.0	79.4	81.3	+9.5%
MCP Mark Verified	72.8	81.1	92.9	76.4	+11.4%
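
The last column is a relative change, not a point difference. As a quick check, this short sketch recomputes it from the K2.6 and K2.7-Code scores in the table; the function name `relative_gain` is just an illustrative label.

```python
# (K2.6 score, K2.7-Code score) copied from the table above.
scores = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


gains = {name: round(relative_gain(old, new), 1) for name, (old, new) in scores.items()}
points = {name: round(new - old, 1) for name, (old, new) in scores.items()}

for name in scores:
    print(f"{name}: +{points[name]} points, +{gains[name]}% relative")
```

Reading both columns side by side matters here: Kimi Code Bench v2 has the largest gain in points, while MLS Bench Lite has the largest relative gain because it starts from a lower baseline.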

K2.7-Code beats Opus 4.8 on MCP Mark Verified, scoring 81.1 compared to 76.4. It also comes within 0.4 points of GPT-5.5 on MLS Bench Lite (35.1 vs. 35.5). In every other comparison it trails both closed models. The harnesses differed: K2.7-Code was tested in Kimi Code CLI, GPT-5.5 in Codex xhigh, and Opus 4.8 in Claude Code xhigh.

Reasoning-Token Efficiency: A Cost Claim, Not Just Quality

The Moonshot team claims roughly 30% fewer reasoning tokens are used compared to K2.6. They describe this as “less overthinking.”

Reasoning tokens are billed as output tokens on most pricing plans. Agentic coding workflows involve hundreds or thousands of steps. Every plan, retry, and verification incurs the thinking cost again. A 30% reduction adds up significantly over a long session.

This improvement shows up in three ways at once: lower output-token cost per task, quicker steps (which improves the experience in interactive CLI sessions), and more steps fitting within the context window before hitting limits.
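
A back-of-the-envelope calculation shows how the claim compounds over a session. The 30% figure is Moonshot's, the $4.00 output rate is the official price quoted in this article, and the context size is the 262,144 tokens from the specs. The per-step token count and the number of steps are hypothetical workload values chosen only for illustration.

```python
OUTPUT_RATE_PER_M = 4.00         # USD per 1M output tokens (rate quoted in the article)
REASONING_REDUCTION_PCT = 30     # Moonshot's reported reduction vs K2.6
CONTEXT_WINDOW = 262_144         # 256K tokens

# Hypothetical workload, not measured data.
baseline_reasoning_per_step = 2_000   # reasoning tokens per step on K2.6
steps = 500                           # agent steps in one long session

reduced_per_step = baseline_reasoning_per_step * (100 - REASONING_REDUCTION_PCT) // 100
tokens_saved = (baseline_reasoning_per_step - reduced_per_step) * steps
dollars_saved = tokens_saved / 1_000_000 * OUTPUT_RATE_PER_M

# Rough upper bound on steps per context window. It counts reasoning tokens
# only and ignores prompts, code, and tool output, which also use context.
steps_fit_before = CONTEXT_WINDOW // baseline_reasoning_per_step
steps_fit_after = CONTEXT_WINDOW // reduced_per_step

print(f"reasoning tokens saved: {tokens_saved:,}")
print(f"output cost saved: ${dollars_saved:.2f}")
print(f"steps per context window: {steps_fit_before} -> {steps_fit_after}")
```

The dollar figure per session is small at these rates; the savings become material when the same workload runs many times a day or across a team.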

Use Cases With Examples

Large-scale repository refactors are the primary use case. Direct the agent at a failing test suite. It reads through files, makes changes across multiple modules, and re-runs tests until they all pass.
Code reviews are another strong fit. Provide a pull request diff and request a risk analysis. The 256K context window can accommodate large diffs, logs, and related files all at once.
MCP tool-use workflows are a third good match. K2.7-Code scored 81.1 on MCP Mark Verified. That benchmark evaluates correct tool calls through the Model Context Protocol. Examples include CI checks, ticket updates, and file edits all within a single loop.
Long-context analysis is a fourth use case. The model handles text, image, and video inputs. Documentation, screenshots, and a recorded reproduction can all be included in one prompt.

Marktechpost’s Interactive Explorer

Kimi K2.7-Code — Interactive Explorer

Company-reported benchmarks and official API pricing. Released June 12, 2026. Verified June 12, 2026.

Source: Moonshot AI Kimi K2.7-Code model card. K2.7-Code ran in Kimi Code CLI; GPT-5.5 in Codex xhigh; Claude Opus 4.8 in Claude Code xhigh. First-party numbers, not an independent leaderboard.

Rates: cached input $0.19 / 1M, cache-miss input $0.95 / 1M, output $4.00 / 1M (official Kimi pricing). Savings line illustrates K2.7-Code’s reported ~30% lower reasoning-token usage vs K2.6, applied to the reasoning share of output. Estimate only.

Source: Kimi K2.7-Code Hugging Face model card and Kimi API docs.
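
The rates quoted above are enough to estimate a monthly bill by hand. The sketch below assumes a simple split of input tokens into cached and cache-miss portions; the function name `monthly_cost`, the token volumes, and the 50% cache-hit ratio are hypothetical. Treat the result as an estimate only.

```python
CACHED_INPUT_RATE = 0.19      # USD per 1M cached input tokens
CACHE_MISS_INPUT_RATE = 0.95  # USD per 1M cache-miss input tokens
OUTPUT_RATE = 4.00            # USD per 1M output tokens


def monthly_cost(input_tokens, output_tokens, cache_hit_ratio):
    """Return (input_cost, output_cost, total) in USD for one month."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    missed = input_tokens - cached
    input_cost = (cached * CACHED_INPUT_RATE + missed * CACHE_MISS_INPUT_RATE) / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical month: 200M input tokens (half served from cache), 20M output tokens.
input_cost, output_cost, total = monthly_cost(200_000_000, 20_000_000, 0.5)

print(f"input cost:  ${input_cost:.2f}")
print(f"output cost: ${output_cost:.2f}")
print(f"total:       ${total:.2f}")
```

Agentic sessions tend to resend long, mostly unchanged prefixes, so the cache-hit ratio is usually the input that moves the total most.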

A Minimal Quickstart

The Kimi API is compatible with the OpenAI SDK. The model identifier is kimi-k2.7-code. Don’t try to override the fixed sampling parameters, or the request will fail.

import os
from openai import OpenAI

# Base URL and key per the Kimi API docs at platform.moonshot.ai
client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
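    # The URL is missing from the source text. MOONSHOT_BASE_URL is a
    # hypothetical environment variable; set it to the base URL in the docs.
    base_url=os.environ["MOONSHOT_BASE_URL"],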
)

messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Refactor utils.py to remove duplicate code."},
]

resp = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=messages,
    max_tokens=32768,  # default cap; also the maximum
    # thinking is enabled by default and cannot be disabled.
    # temperature (1.0), top_p (0.95), n (1), and penalties (0.0) are
    # fixed server-side. Passing any other value returns an error.
)

msg = resp.choices[0].message
print(msg.content)

# Multi-step tool calls: append the full assistant message so that
# reasoning_content is preserved. Dropping it errors on the next turn.
# messages.append(msg.model_dump())

Two tool-use guidelines come from the documentation. First, retain reasoning_content from the current turn within the conversation context. Second, set tool_choice to either "auto" or "none".
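
Both guidelines can be enforced with a few small helpers that operate on plain message dictionaries. This is a minimal sketch under stated assumptions: the helper names, the `run_tests` tool, and the simulated assistant message are hypothetical, and the message shape follows the OpenAI chat format with an added reasoning_content field. No network call is made.

```python
ALLOWED_TOOL_CHOICE = {"auto", "none"}


def build_request(messages, tools, tool_choice="auto"):
    """Assemble request kwargs, rejecting unsupported tool_choice values."""
    if tool_choice not in ALLOWED_TOOL_CHOICE:
        raise ValueError(f"tool_choice must be one of {sorted(ALLOWED_TOOL_CHOICE)}")
    return {
        "model": "kimi-k2.7-code",
        "messages": messages,
        "tools": tools,
        "tool_choice": tool_choice,
    }


def append_assistant_turn(messages, assistant_message):
    """Append the full assistant message so reasoning_content is kept."""
    if assistant_message.get("tool_calls") and "reasoning_content" not in assistant_message:
        raise ValueError("assistant tool-call turn is missing reasoning_content")
    messages.append(dict(assistant_message))


def append_tool_result(messages, tool_call_id, content):
    """Append the output of a tool call for the next model turn."""
    messages.append({"role": "tool", "tool_call_id": tool_call_id, "content": content})


messages = [{"role": "user", "content": "Fix the failing test in utils.py."}]

# Simulated assistant reply; in practice this comes from msg.model_dump().
assistant_message = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "Run the tests first to see the failure.",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "run_tests", "arguments": "{}"}}
    ],
}

append_assistant_turn(messages, assistant_message)
append_tool_result(messages, "call_1", "1 failed, 12 passed")
request = build_request(messages, tools=[], tool_choice="auto")

print(len(request["messages"]), "messages ready for the next turn")
```

Appending the whole message rather than rebuilding it from `content` alone is the design choice that matters: it keeps reasoning_content in the history without special handling.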

K2.7-Code vs. the Competition

Model	License	Parameters	Context Window	API Cost (input / output per 1M tokens)
Kimi K2.7-Code	Modified MIT (open)	1T total / 32B active	256K	$0.95 / $4.00
Kimi K2.6	Open-weight	1T-class MoE	256K	~$0.67–0.95 / ~$3.39–4.00
GPT-5.5	Closed	Not disclosed	—	Not listed in Moonshot’s comparison
Claude Opus 4.8	Closed	Not disclosed	1M	$5.00 / $25.00
Qwen3-Coder-480B-A35B	Open (Qwen license)	480B / 35B active	256K native	Varies by provider

K2.7-Code offers cached input at $0.19 per 1M tokens.

Pros and Cons

What’s good:

Open weights under Modified MIT, with a genuine option to self-host.
Solid, across-the-board improvements over K2.6 on coding and agent benchmarks.
API costs are competitive compared to closed frontier models.
Outperforms Opus 4.8 on the MCP Mark Verified benchmark (per company data).

What to watch out for:

All top-line benchmark figures come from the vendor at launch.
Thinking mode is always on — there’s no way to turn it off.
Sampling parameters are fixed and can’t be adjusted.
Multi-step tool calls require keeping reasoning_content intact.
At 595 GB, the model weights demand significant resources for self-hosting.

The Bottom Line

All headline benchmarks are from Moonshot itself; third-party results are still awaited.
K2.7-Code is an open-weight model fine-tuned for coding, built on top of Kimi K2.6.
Moonshot claims a +21.8% improvement on Kimi Code Bench v2 compared to K2.6.
Moonshot reports that the model generates about 30% fewer reasoning tokens than K2.6.
