Moonshot AI launched Kimi K2.7-Code this week. It’s a specialized coding and agentic model. The model weights are available on Hugging Face under a Modified MIT license. You can also access it via the Kimi API and Kimi Code.
K2.7-Code is built for complex, long-running software engineering tasks rather than casual conversation. It can plan, make edits, execute tools, and debug through multiple steps. Moonshot has paired the model with a subscription-based coding platform.
Kimi K2.7-Code
K2.7-Code uses a Mixture-of-Experts architecture. It has 1 trillion total parameters but only activates 32 billion per token. The setup includes 384 experts, with 8 chosen per token plus 1 shared expert. It features 61 layers, one of which is a dense layer.
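To make the routing numbers concrete, here is a minimal sketch of top-k expert selection for a single token. The expert counts come from the paragraph above; the random router scores and the `route_token` helper are illustrative assumptions, not Moonshot's implementation.

```python
import random

NUM_EXPERTS = 384     # routed experts (from the article)
TOP_K = 8             # routed experts chosen per token (from the article)
SHARED_EXPERTS = 1    # shared expert that runs for every token (from the article)


def route_token(router_scores, top_k=TOP_K):
    """Return the indices of the top_k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:top_k])


random.seed(0)
# Stand-in router scores; a real router computes these from the token's hidden state.
scores = [random.random() for _ in range(NUM_EXPERTS)]
chosen = route_token(scores)

experts_run = len(chosen) + SHARED_EXPERTS
active_fraction = 32e9 / 1e12  # 32B active of 1T total parameters

print(f"experts run for this token: {experts_run} of {NUM_EXPERTS + SHARED_EXPERTS}")
print(f"active parameters per token: {active_fraction:.1%}")
```

Only 9 of the 385 experts run for any given token, which is why the active parameter count stays near 3% of the total.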
The attention mechanism uses MLA, and the feed-forward network relies on SwiGLU. A MoonViT vision encoder contributes an additional 400 million parameters for handling image and video inputs. The model comes with built-in INT4 quantization. Its context window supports 256K tokens (262,144).
Two limitations are worth noting. Thinking mode is always required, and turning it off triggers an API error. Sampling settings are locked at temperature 1.0, top_p 0.95, n 1, and penalties 0.0. Separately, the default maximum output length is 32,768 tokens.
You can self-host the model using vLLM, SGLang, or KTransformers. The Hugging Face repository is substantial, taking up around 595 GB of disk space. This is designed for server-grade deployments, not for running on a laptop.
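A quick back-of-the-envelope check shows why this is server territory. The sketch below estimates the size of 1 trillion INT4 weights and compares it with the reported repository size. The 80 GB per-device memory figure is a hypothetical value chosen for illustration, and the calculation covers weights only, not KV cache or activations.

```python
import math

TOTAL_PARAMS = 1e12          # 1 trillion total parameters (from the article)
REPO_SIZE_GB = 595           # approximate repository size (from the article)
BYTES_PER_INT4_PARAM = 0.5   # 4 bits per weight

# Size if every parameter were stored as a bare 4-bit value.
int4_estimate_gb = TOTAL_PARAMS * BYTES_PER_INT4_PARAM / 1e9
print(f"pure INT4 estimate: {int4_estimate_gb:.0f} GB vs {REPO_SIZE_GB} GB on disk")

# Hypothetical accelerator memory; adjust for your hardware.
DEVICE_MEMORY_GB = 80
min_devices = math.ceil(REPO_SIZE_GB / DEVICE_MEMORY_GB)
print(f"devices needed just to hold the weights: {min_devices}")
```

The article does not break down the gap between the 500 GB estimate and the 595 GB repository. It presumably reflects components kept at higher precision plus quantization metadata, but treat that as a guess.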
Benchmark
The Moonshot team shared results across six benchmarks, comparing K2.7-Code with K2.6, GPT-5.5, and Claude Opus 4.8. K2.7-Code outperforms K2.6 on every one. The largest absolute gain is on Kimi Code Bench v2, which rises from 50.9 to 62.0 (+11.1 points); the largest relative gain is on MLS Bench Lite (+31.5%).
| Benchmark | Kimi K2.6 | Kimi K2.7-Code | GPT-5.5 | Claude Opus 4.8 | K2.7 vs K2.6 |
|---|---|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 | 69.0 | 67.4 | +21.8% |
| Program Bench | 48.3 | 53.6 | 69.1 | 63.8 | +11.0% |
| MLS Bench Lite | 26.7 | 35.1 | 35.5 | 42.8 | +31.5% |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 | +9.3% |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 | +9.5% |
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 | +11.4% |
K2.7-Code does edge out Opus 4.8 on MCP Mark Verified, scoring 81.1 compared to 76.4. It also comes close to GPT-5.5 on MLS Bench Lite. K2.7-Code was tested in Kimi Code CLI, GPT-5.5 in Codex xhigh, and Opus 4.8 in Claude Code xhigh.
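The last column of the table is a relative gain over K2.6, not a point difference. The short script below recomputes it from the two Kimi columns so the figures can be checked. The scores are copied from the table; `relative_gain` is just a helper name used here.

```python
# (K2.6 score, K2.7-Code score), copied from the table above.
scores = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(old, new):
    """Percentage improvement of new over old."""
    return (new - old) / old * 100


for name, (k26, k27) in scores.items():
    print(f"{name}: {k27 - k26:+.1f} points, {relative_gain(k26, k27):+.1f}%")
```

Reading both columns side by side helps: a small absolute gain on a low baseline (MLS Bench Lite) produces the largest percentage.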
Reasoning-Token Efficiency: A Cost Claim, Not Just Quality
The Moonshot team claims roughly 30% fewer reasoning tokens are used compared to K2.6. They describe this as “less overthinking.”
Reasoning tokens are billed as output tokens on most pricing plans. Agentic coding workflows involve hundreds or thousands of steps. Every plan, retry, and verification incurs the thinking cost again. A 30% reduction adds up significantly over a long session.
This improvement shows up in three ways simultaneously. First, reduced output-token cost per task. Second, quicker steps, which improves the experience in interactive CLI sessions. Third, more steps can fit within the context window before hitting limits.
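A rough model makes the first and third effects easier to see. The sketch below applies the reported 30% reduction to a hypothetical session. The step count and per-step token figures are assumed numbers, not measurements, and the context estimate is a crude upper bound that ignores input tokens.

```python
CONTEXT_WINDOW = 262_144      # tokens (from the article)
REASONING_REDUCTION = 0.30    # Moonshot's reported figure vs K2.6


def output_tokens(steps, reasoning_per_step, other_per_step):
    """Total output tokens billed across a session."""
    return steps * (reasoning_per_step + other_per_step)


# Hypothetical workload (assumed values).
STEPS = 500
BASELINE_REASONING = 2_000    # reasoning tokens per step at the K2.6 level
OTHER_OUTPUT = 500            # visible output tokens per step

reduced_reasoning = round(BASELINE_REASONING * (1 - REASONING_REDUCTION))

baseline_total = output_tokens(STEPS, BASELINE_REASONING, OTHER_OUTPUT)
reduced_total = output_tokens(STEPS, reduced_reasoning, OTHER_OUTPUT)
overall_saving = 1 - reduced_total / baseline_total

# Crude upper bound on steps per window, assuming every step's output
# stays in context and nothing else occupies it.
steps_before = CONTEXT_WINDOW // (BASELINE_REASONING + OTHER_OUTPUT)
steps_after = CONTEXT_WINDOW // (reduced_reasoning + OTHER_OUTPUT)

print(f"output tokens: {baseline_total:,} -> {reduced_total:,} ({overall_saving:.0%} lower)")
print(f"steps per context window: {steps_before} -> {steps_after}")
```

Under these assumptions the overall output saving is 24%, not 30%, because only the reasoning share of the output shrinks.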
Use Cases With Examples
- Large-scale repository refactors are the primary use case. Direct the agent at a failing test suite. It reads through files, makes changes across multiple modules, and re-runs tests until they all pass.
- Code reviews are another strong fit. Provide a pull request diff and request a risk analysis. The 256K context window can accommodate large diffs, logs, and related files all at once.
- MCP tool-use workflows are a third good match. K2.7-Code scored 81.1 on MCP Mark Verified. That benchmark evaluates correct tool calls through the Model Context Protocol. Examples include CI checks, ticket updates, and file edits all within a single loop.
- Long-context analysis is a fourth use case. The model handles text, image, and video inputs. Documentation, screenshots, and a recorded reproduction can all be included in one prompt.
Marktechpost’s Interactive Explorer
Kimi K2.7-Code — Interactive Explorer
Company-reported benchmarks and official API pricing. Released June 12, 2026. Verified June 12, 2026.
Source: Moonshot AI Kimi K2.7-Code model card. K2.7-Code ran in Kimi Code CLI; GPT-5.5 in Codex xhigh; Claude Opus 4.8 in Claude Code xhigh. First-party numbers, not an independent leaderboard.
Rates (official Kimi pricing): cached input $0.19 / 1M tokens, cache-miss input $0.95 / 1M tokens, output $4.00 / 1M tokens. The explorer's savings estimate applies K2.7-Code’s reported ~30% lower reasoning-token usage vs K2.6 to the reasoning share of output. Estimate only.
Source: Kimi K2.7-Code Hugging Face model card and Kimi API docs.
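For readers without the interactive calculator, the same arithmetic can be sketched in a few lines. The three rates are the ones quoted above; the monthly token volumes and the cache-hit ratio are hypothetical inputs, and `monthly_cost` is an illustrative helper rather than anything from the Kimi API.

```python
# USD per 1M tokens, as quoted in the article.
RATE_CACHED_INPUT = 0.19
RATE_MISS_INPUT = 0.95
RATE_OUTPUT = 4.00


def monthly_cost(input_tokens, cache_hit_ratio, output_tokens):
    """Return (input_cost, output_cost, total) in USD."""
    cached = input_tokens * cache_hit_ratio
    missed = input_tokens - cached
    input_cost = (cached * RATE_CACHED_INPUT + missed * RATE_MISS_INPUT) / 1e6
    output_cost = output_tokens * RATE_OUTPUT / 1e6
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical month: 1B input tokens with half served from cache,
# and 100M output tokens (reasoning tokens included).
input_cost, output_cost, total = monthly_cost(1_000_000_000, 0.5, 100_000_000)
print(f"input ${input_cost:,.2f} + output ${output_cost:,.2f} = ${total:,.2f}")
```

Because cached input costs a fifth of cache-miss input, the hit ratio tends to matter more than the raw input volume in long agent sessions that resend the same context.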
A Minimal Quickstart
The Kimi API is compatible with the OpenAI SDK. The model identifier is kimi-k2.7-code. Don’t try to override the fixed sampling parameters, or the request will fail.
```python
import os

from openai import OpenAI

# Base URL and key per the Kimi API docs at platform.moonshot.ai.
# The base URL is missing from the source text, so it is read from an
# environment variable here (hypothetical name) rather than guessed.
client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url=os.environ["MOONSHOT_BASE_URL"],
)

messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Refactor utils.py to remove duplicate code."},
]

resp = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=messages,
    max_tokens=32768,  # default cap; also the maximum
    # Thinking is enabled by default and cannot be disabled.
    # temperature (1.0), top_p (0.95), n (1), and penalties (0.0) are
    # fixed server-side. Passing any other value returns an error.
)

msg = resp.choices[0].message
print(msg.content)

# Multi-step tool calls: append the full assistant message so that
# reasoning_content is preserved. Dropping it errors on the next turn.
# messages.append(msg.model_dump())
```

Two tool-use guidelines come from the documentation. Retain reasoning_content from the current turn within the conversation context, and set tool_choice to either "auto" or "none".
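Since these constraints fail only at request time, a small client-side guard can catch them earlier. The sketch below is a hypothetical helper, not part of the Kimi API or the OpenAI SDK. It assumes the "penalties" in the article map to the SDK's `presence_penalty` and `frequency_penalty` parameters, and that the assistant message is available as a plain dict such as the result of `msg.model_dump()`.

```python
# Fixed values per the article. The two penalty names are an assumption
# based on the OpenAI SDK's parameter names.
FIXED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}
ALLOWED_TOOL_CHOICE = {"auto", "none"}


def check_request(params):
    """Return a list of problems that would make the request fail."""
    problems = []
    for name, fixed in FIXED_SAMPLING.items():
        if name in params and params[name] != fixed:
            problems.append(f"{name} must stay at {fixed}, got {params[name]}")
    tool_choice = params.get("tool_choice", "auto")
    if not isinstance(tool_choice, str) or tool_choice not in ALLOWED_TOOL_CHOICE:
        problems.append('tool_choice must be "auto" or "none"')
    return problems


def append_assistant_turn(messages, assistant_message):
    """Append the assistant message unchanged so reasoning_content survives."""
    messages.append(dict(assistant_message))
    return messages


print(check_request({"temperature": 0.2, "tool_choice": "required"}))

history = [{"role": "user", "content": "Run the tests."}]
append_assistant_turn(history, {
    "role": "assistant",
    "content": "",
    "reasoning_content": "Plan: call the test tool first.",
})
print(history[-1]["reasoning_content"])
```

Calling `check_request` on the keyword arguments before `client.chat.completions.create` turns a server-side error into a local one.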
K2.7-Code vs. the Competition
| Model | License | Parameters | Context Window | API Cost (input / output per 1M tokens) |
|---|---|---|---|---|
| Kimi K2.7-Code | Modified MIT (open) | 1T total / 32B active | 256K | $0.95 / $4.00 |
| Kimi K2.6 | Open-weight | 1T-class MoE | 256K | ~$0.67–0.95 / ~$3.39–4.00 |
| GPT-5.5 | Closed | Not disclosed | — | Not listed in Moonshot’s comparison |
| Claude Opus 4.8 | Closed | Not disclosed | 1M | $5.00 / $25.00 |
| Qwen3-Coder-480B-A35B | Open (Qwen license) | 480B / 35B active | 256K native | Varies by provider |
K2.7-Code offers cached input at $0.19 per 1M tokens.
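To put the list prices in the table side by side, the sketch below prices one hypothetical workload on the two models with published rates. The token volumes are assumed, and the comparison covers list price only: it ignores caching, quality differences, and how many tokens each model actually needs per task.

```python
# (input, output) in USD per 1M tokens, copied from the table above.
PRICES = {
    "Kimi K2.7-Code": (0.95, 4.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """List-price cost in USD for a given token volume."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1e6


# Hypothetical workload: 10M input tokens and 2M output tokens.
kimi = workload_cost("Kimi K2.7-Code", 10_000_000, 2_000_000)
opus = workload_cost("Claude Opus 4.8", 10_000_000, 2_000_000)
print(f"Kimi K2.7-Code: ${kimi:.2f}, Claude Opus 4.8: ${opus:.2f}, ratio {opus / kimi:.1f}x")
```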
Pros and Cons
What’s good:
- Open weights under Modified MIT, with a genuine option to self-host.
- Solid, across-the-board improvements over K2.6 on coding and agent benchmarks.
- API costs are competitive compared to closed frontier models.
- Outperforms Opus 4.8 on the MCP Mark Verified benchmark (per company data).
What to watch out for:
- All top-line benchmark figures come from the vendor at launch.
- Thinking mode is always on — there’s no way to turn it off.
- Sampling parameters are fixed and can’t be adjusted.
- Multi-step tool calls require keeping reasoning_content intact.
- At 595 GB, the model weights demand significant resources for self-hosting.
The Bottom Line
- All headline benchmarks are from Moonshot itself; third-party results are still awaited.
- K2.7-Code is an open-weight model fine-tuned for coding, built on top of Kimi K2.6.
- Moonshot claims a +21.8% improvement on Kimi Code Bench v2 compared to K2.6.
- The model generates about 30% fewer reasoning tokens than K2.6.



