This week, Tokyo-based Sakana AI released its first commercial offering, ‘Sakana Marlin.’ The team behind it describes the tool as a Virtual CSO (Chief Strategy Officer)—an autonomous B2B research agent designed specifically for enterprise use.
Unlike a typical chatbot that responds in seconds, Marlin works differently. You provide a single research topic, and it operates independently for as long as eight hours. Once complete, it delivers a comprehensive report along with a set of presentation slides. According to Sakana, each session can involve hundreds or even thousands of queries to large language models.
What Is Sakana Marlin?
Marlin is not a conversational assistant—it’s an enterprise-grade research agent. Feed it a single topic or question, and it takes over from there: formulating hypotheses, scanning sources, and cross-checking findings on its own. The goal is to condense weeks of strategic analysis into just a few hours.
The output is built for executives and decision-makers. The Japanese press release mentions reports spanning dozens of pages, while the English version references reports reaching up to approximately 100 pages. During a hands-on press session, reports came in between 60 and 100 pages, citing 60 to 80 sources. Every report is structured with a main body, a reference section, and appendices. Presentation slides are automatically created using AI image-generation tools.
Sakana fine-tuned Marlin through a closed beta in April 2026, where roughly 300 professionals put it to work on real-world assignments. Those assignments covered strategy development, market research, risk assessment, and competitive analysis. Sakana has also formed a partnership with MUFG and secured strategic investment from Citigroup.
Inside AB-MCTS: Wider or Deeper
At the core of Marlin lies AB-MCTS—Adaptive Branching Monte Carlo Tree Search. This technique stems from Sakana’s earlier research paper titled “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”
AB-MCTS frames reasoning as a tree-search challenge. At every step, the algorithm faces a choice: it can go wider by producing a brand-new candidate answer, or it can go deeper by refining an answer that already looks promising. Traditional repeated sampling only goes wider in parallel and then hopes one of the answers turns out to be correct.
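The wider-or-deeper choice can be illustrated with a toy search. The sketch below is not Sakana's implementation: it assumes Thompson sampling over simple Beta posteriors as the decision rule, uses a random stand-in instead of an LLM, and every name in it (`Node`, `thompson_draw`, `toy_generate`, `search_step`) is hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    answer: str
    score: float  # normalized to [0, 1]
    children: list[Node] = field(default_factory=list)

    def subtree_scores(self) -> list[float]:
        """Scores of this node and everything refined from it."""
        scores = [self.score]
        for child in self.children:
            scores.extend(child.subtree_scores())
        return scores


def thompson_draw(scores: list[float], rng: random.Random) -> float:
    """Sample from a Beta posterior, treating each score as a fractional success."""
    alpha = 1.0 + sum(scores)
    beta = 1.0 + sum(1.0 - s for s in scores)
    return rng.betavariate(alpha, beta)


def toy_generate(parent: Node, rng: random.Random) -> tuple[str, float]:
    """Stand-in for an LLM call: refinements drift around the parent's score."""
    if parent.answer == "":  # the root has no answer, so this is a brand-new candidate
        return "draft", rng.random()
    score = min(1.0, max(0.0, parent.score + rng.uniform(-0.1, 0.2)))
    return parent.answer + "+", score


def search_step(root: Node, rng: random.Random) -> int:
    """Add one node to the tree and return the depth at which it was added."""
    node, depth = root, 0
    while True:
        # "Wider" arm: evidence is how well the direct children of this node scored.
        wider = thompson_draw([c.score for c in node.children], rng)
        # "Deeper" arms: one per existing child, judged on that child's whole subtree.
        deeper = [thompson_draw(c.subtree_scores(), rng) for c in node.children]
        if not deeper or wider >= max(deeper):
            answer, score = toy_generate(node, rng)
            node.children.append(Node(answer, score))
            return depth + 1
        node, depth = node.children[deeper.index(max(deeper))], depth + 1


rng = random.Random(0)
root = Node(answer="", score=0.0)
depths = [search_step(root, rng) for _ in range(24)]  # search budget of 24 steps
best = max(root.subtree_scores()[1:])  # skip the placeholder root
print("nodes added per depth:", {d: depths.count(d) for d in sorted(set(depths))})
print("best score:", round(best, 3))
```

Nodes added at depth 1 are "wider" moves (fresh candidates), while anything at depth 2 or more is a "deeper" move that refines an earlier answer. Repeated sampling would correspond to putting all 24 nodes at depth 1.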
A multi-LLM variation introduces another option—it can assign a step to a completely different model. In Sakana’s ARC-AGI-2 benchmark tests, this multi-model collaboration made a measurable difference. Using a combination of o4-mini, Gemini 2.5 Pro, and DeepSeek-R1, the system solved roughly 27.5% of tasks, compared to about 23% with o4-mini alone. Marlin leverages this same adaptive search approach for extended, long-horizon research tasks.
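The idea of routing steps to different models can be sketched the same way. The example below is an illustration under stated assumptions, not the method from Sakana's paper: the model names and their hidden quality levels are invented, `fake_model_call` replaces a real LLM call plus scoring, and the router is plain Thompson sampling over one Beta posterior per model.

```python
from __future__ import annotations

import random

# Hypothetical model names and hidden quality levels, purely for illustration.
HIDDEN_QUALITY = {"model_a": 0.35, "model_b": 0.55, "model_c": 0.45}


def fake_model_call(name: str, rng: random.Random) -> float:
    """Stand-in for an LLM call plus evaluation: returns a score in [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(HIDDEN_QUALITY[name], 0.15)))


def pick_model(history: dict[str, list[float]], rng: random.Random) -> str:
    """Thompson sampling: draw once from each model's Beta posterior, keep the max."""
    draws = {}
    for name, scores in history.items():
        alpha = 1.0 + sum(scores)
        beta = 1.0 + sum(1.0 - s for s in scores)
        draws[name] = rng.betavariate(alpha, beta)
    return max(draws, key=draws.get)


rng = random.Random(0)
history: dict[str, list[float]] = {name: [] for name in HIDDEN_QUALITY}
for _ in range(200):
    chosen = pick_model(history, rng)
    history[chosen].append(fake_model_call(chosen, rng))

for name, scores in history.items():
    print(name, "calls:", len(scores))
```

The router starts with no preference and shifts calls toward whichever model has been scoring well, while still occasionally trying the others.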
The second foundational piece behind Marlin is workflow automation drawn from Sakana’s AI Scientist project, which demonstrated fully autonomous scientific discovery and was published in the journal Nature.
Interactive demo: The embeddable widget (marlin-abmcts-demo.html) visualizes the “wider or deeper” decision-making process in real time. Hit Run and watch the tree expand. Nodes shaded in green represent higher scores, and the optimal path is highlighted. Switch on the “Multi-LLM” toggle to observe how steps get distributed across different models.
[Interactive demo: AB-MCTS, “Wider or Deeper?” search. A simplified visual of Sakana AI’s Adaptive Branching Monte Carlo Tree Search: at each step the policy chooses to widen (new candidate) or deepen (refine a promising line). The panel tracks budget used (out of 24), node count, best score, and the wider/deeper split; the legend marks low score, high score, and best path.]
How Marlin Compares
Marlin is built for depth, not speed. Standard deep-research tools return answers in minutes or tens of minutes. Marlin intentionally invests hours to produce higher-quality output. The competitor run times listed below are approximate and drawn from publicly reported figures, not official specifications.
| Tool | Typical run time | Output | Primary user |
|---|---|---|---|
| Sakana Marlin | Up to ~8 hours | Report (dozens to ~100 pages) + slides | Enterprise strategy teams |
| OpenAI Deep Research | ~Minutes to tens of minutes | Cited text report | General and pro users |
| Perplexity Deep Research | ~A few minutes | Cited text answer | General users |
| Google Gemini Deep Research | ~Minutes | Cited text report | General and workspace users |
The trade-off is straightforward: you wait longer and pay per session, but in exchange you receive more thorough hypothesis testing and a polished, ready-to-use deliverable. You can stop a run at any point, though credits are consumed regardless.
Pricing
Sakana provides pay-as-you-go access alongside Pro, Team, and Enterprise plans. Pay-as-you-go starts at 100 credits per run, priced at ¥98 per credit. The Pro plan costs ¥150,000 per month and includes 2,000 credits. The Team plan is ¥400,000 per month with 6,000 credits. Enterprise pricing is customized and comes with dedicated support.
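To put those figures side by side, here is a quick back-of-the-envelope calculation. It assumes every run costs the stated minimum of 100 credits (the article only says runs start there, so real runs may cost more) and that plan credits are used within the month; the variable names are illustrative.

```python
import math

PAYG_YEN_PER_CREDIT = 98
MIN_CREDITS_PER_RUN = 100  # assumption: runs "start at" 100 credits, so this is a floor
PLANS = {"Pro": (150_000, 2_000), "Team": (400_000, 6_000)}  # (yen per month, credits)

payg_yen_per_run = PAYG_YEN_PER_CREDIT * MIN_CREDITS_PER_RUN

summary = {}
for name, (monthly_yen, credits) in PLANS.items():
    summary[name] = {
        "yen_per_credit": round(monthly_yen / credits, 2),
        "max_runs": credits // MIN_CREDITS_PER_RUN,
        # Smallest number of monthly runs at which the plan beats pay-as-you-go.
        "break_even_runs": math.ceil(monthly_yen / payg_yen_per_run),
    }

print(f"Pay-as-you-go: ¥{payg_yen_per_run:,} per minimum-size run")
for name, row in summary.items():
    print(name, row)
```

Under these assumptions a minimum-size run costs ¥9,800 on pay-as-you-go. Pro works out to ¥75 per credit, covers up to 20 such runs, and becomes cheaper than pay-as-you-go from the 16th run in a month. Team works out to about ¥66.67 per credit, covers up to 60 runs, and breaks even from the 41st.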
Use Cases, With Examples
Marlin is best suited for high-stakes questions where research is the main bottleneck. Below are concrete examples aligned with its intended use cases.
- Market entry: “Evaluate Japan’s stablecoin and tokenized-payments market following recent regulatory changes.” Marlin identifies key drivers, risks, and structured strategic options in a detailed report.
- Risk analysis: “Model potential resolution scenarios for a Strait of Hormuz blockade.” Rather than simply summarizing information, it weighs competing hypotheses before arriving at conclusions.
- Competitive analysis: “Profile three competitors and rank our positioning gaps.” It produces presentation slides ready for a strategy review meeting.
Each of these scenarios fits within a single prompt and a single unattended run. A human should still review the cited output before making any decisions based on it.
Try the Engine Yourself: TreeQuest
Marlin itself is not available for self-hosting, but you can experiment with its core algorithm right now. Sakana has open-sourced AB-MCTS as TreeQuest under the Apache 2.0 license. Install it, define a generate function, then run the search for a fixed budget of steps.

```python
import random

import treequest as tq


# Each node contains a state you define; score must be between 0 and 1.
def generate(parent_state):
    if parent_state is None:  # None means start from the root node
        new_state = "First draft"
    else:
        new_state = f"Improved version of: {parent_state}"
    score = random.random()  # replace this with an LLM-based evaluation
    return new_state, score


algo = tq.ABMCTSA()  # Adaptive Branching MCTS (version A)
search_tree = algo.init_tree()
for _ in range(10):  # allocate 10 generation cycles
    search_tree = algo.step(search_tree, {"generate": generate})

best_state, best_score = tq.top_k(search_tree, algo, k=1)[0]
print("TOP RESULT:", best_state, round(best_score, 3))
```

Replace the random score with an LLM evaluator to match the real-world usage pattern. TreeQuest also provides multi-LLM search capabilities and checkpoint support for extended sessions. Checkpoint support matters because long runs may hit API failures partway through.
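One way to approach the evaluator swap is to build the generate function from two pluggable callables. This is a sketch under assumptions rather than TreeQuest's own API: `make_generate`, `parse_score`, and the stub functions are hypothetical, the evaluator is assumed to reply with a number between 0 and 1, and `ConnectionError` stands in for whatever exception a real LLM client raises on a transient failure. It uses only the standard library so it runs without TreeQuest installed.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

State = str


def parse_score(reply: str) -> float:
    """Pull the first number out of an evaluator reply and clamp it to [0, 1]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return 0.0  # assumption: an unparseable reply counts as the lowest score
    return min(1.0, max(0.0, float(match.group())))


def make_generate(
    propose: Callable[[Optional[State]], State],
    evaluate: Callable[[State], str],
    max_retries: int = 3,
) -> Callable[[Optional[State]], tuple[State, float]]:
    """Build a (parent_state) -> (new_state, score) function with simple retries."""

    def generate(parent_state: Optional[State]) -> tuple[State, float]:
        last_error = None
        for _ in range(max_retries):
            try:
                new_state = propose(parent_state)
                return new_state, parse_score(evaluate(new_state))
            except ConnectionError as error:  # assumed transient-failure signal
                last_error = error
        raise RuntimeError("LLM call kept failing") from last_error

    return generate


# Stubs standing in for real LLM calls; the evaluator fails once to exercise the retry.
calls = {"n": 0}


def stub_propose(parent_state: Optional[State]) -> State:
    return "First draft" if parent_state is None else f"Improved version of: {parent_state}"


def stub_evaluate(state: State) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated API failure")
    return "Score: 0.7"


generate = make_generate(stub_propose, stub_evaluate)
result = generate(None)
print(result)  # ('First draft', 0.7)
```

The returned function has the same shape as the `generate` in the snippet above, so it could be passed to the search in its place. Retries only cover a single call; surviving a crash in the middle of a long search is what checkpointing is for.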
Strengths and Weaknesses
Strengths
- Research backed by peer review: AB-MCTS was presented at NeurIPS, and AI Scientist was published in Nature.
- Complete outputs, including references, supplementary materials, and presentation slides.
- Smart resource allocation focuses compute power on the most promising paths.
- The open-source core (TreeQuest) allows AI researchers to explore the methodology.
Weaknesses
- Extended processing times make rapid iteration slower compared to research tools that operate in minutes.
- Automated reports may include subtle errors that require manual verification.
- Pricing and features are designed for enterprise clients, not solo developers.
- Marlin as a product is proprietary; only the core algorithm is publicly available.
Key Takeaways
- Sakana Marlin handles autonomous research tasks lasting up to roughly eight hours each.
- A single run generates a report spanning dozens of pages, along with presentation slides.
- It builds on AB-MCTS (NeurIPS 2025 Spotlight) and AI Scientist workflows (Nature).
- Pricing starts with a pay-per-use model: 100 credits per run at ¥98 per credit.
- It serves finance departments, corporate strategy teams, consulting firms, and think-tank organizations.
Sources
- Sakana AI — Sakana Marlin release:
- Sakana AI — Sakana Marlin product page:
- Sakana AI — AB-MCTS research and TreeQuest:
- SakanaAI/treequest (GitHub, Apache 2.0):



