Microsoft Research’s AI Frontiers lab has introduced Fara1.5, a series of computer-use agent (CUA) models designed for web browsing. The release includes three variants: Fara1.5-4B, Fara1.5-9B, and Fara1.5-27B. The models work with MagenticLite, Microsoft’s sandboxed browser environment built specifically for these agents.
Computer-use agents are visual-action systems that control an actual browser. They analyze screenshots and generate mouse and keyboard inputs to accomplish tasks. Modern agent tools such as OpenAI’s Operator and Google’s Gemini 2.5 Computer Use belong to this category.
Fara1.5-27B achieves a 72% task completion rate on Online-Mind2Web, a benchmark that evaluates performance across 300 tasks on 136 well-known websites. On the same benchmark, OpenAI’s Operator reaches 58.3%, Gemini 2.5 Computer Use reaches 57.3%, and Yutori’s Navigator n1 attains 64.7%. Fara1.5-9B scores 63.4%, nearly double the 34.1% achieved by its predecessor, Fara-7B.

Architecture and agent loop
The models are built on Qwen3.5 base checkpoints across 4B, 9B, and 27B sizes. They follow an observe-think-act cycle. At each stage, the model receives the previous conversation history along with the latest three browser screenshots. It then produces thoughts and determines the next action.
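The observe-think-act cycle described above can be sketched roughly as follows. This is a minimal illustration, not Microsoft's implementation: `AgentState`, `run_episode`, and the `observe`, `policy`, and `act` callables are hypothetical names, and the only detail taken from the article is that the model sees the conversation history plus the latest three screenshots.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

MAX_SCREENSHOTS = 3  # the article states the latest three screenshots are kept


@dataclass
class AgentState:
    # Hypothetical container: text history plus a bounded screenshot window.
    history: List[str] = field(default_factory=list)
    screenshots: Deque[bytes] = field(
        default_factory=lambda: deque(maxlen=MAX_SCREENSHOTS)
    )


def run_episode(
    task: str,
    observe: Callable[[], bytes],
    policy: Callable[[List[str], List[bytes]], Tuple[str, str]],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> AgentState:
    state = AgentState(history=[f"task: {task}"])
    for _ in range(max_steps):
        # Observe: the deque silently drops the oldest screenshot past three.
        state.screenshots.append(observe())
        # Think: the policy returns a thought and the next action.
        thought, action = policy(state.history, list(state.screenshots))
        state.history.append(f"thought: {thought}")
        state.history.append(f"action: {action}")
        if action == "stop":  # "stop" is an assumed terminal action name
            break
        # Act: execute the chosen action in the browser.
        act(action)
    return state
```

Using a bounded deque keeps the visual context fixed in size no matter how long the episode runs, while the text history continues to grow.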
The action set covers standard mouse and keyboard operations and browser-specific functions such as web search. It also includes meta-actions for managing context. These involve storing facts for future reference and requesting user clarification. These meta-actions enable the agent to handle extended tasks and collaborate effectively with users.
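One way to picture the split between browser actions and meta-actions is the sketch below. The action names (`CLICK`, `MEMORIZE`, `ASK_USER`, and so on) and the `Context` structure are assumptions made for illustration; the article does not list the actual action schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ActionKind(Enum):
    # Hypothetical names; the real Fara1.5 action schema is not given here.
    CLICK = "click"
    TYPE = "type"
    WEB_SEARCH = "web_search"
    MEMORIZE = "memorize"  # meta-action: store a fact for later
    ASK_USER = "ask_user"  # meta-action: request clarification


@dataclass
class Action:
    kind: ActionKind
    args: Dict[str, str] = field(default_factory=dict)


@dataclass
class Context:
    facts: List[str] = field(default_factory=list)
    pending_question: Optional[str] = None


def apply_meta_action(action: Action, ctx: Context) -> bool:
    """Handle meta-actions locally.

    Returns True if the action was a meta-action, False if it still has to
    be executed in the browser.
    """
    if action.kind is ActionKind.MEMORIZE:
        ctx.facts.append(action.args["fact"])
        return True
    if action.kind is ActionKind.ASK_USER:
        ctx.pending_question = action.args["question"]
        return True
    return False
```

Meta-actions never touch the page: they only update the agent's own context, which is why they help with long tasks where early screenshots have already scrolled out of the window.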
Training mix
Training involves supervised fine-tuning on approximately two million samples. The dataset consists of 60% web trajectories and 12.8% synthetic environments. Form completion and user interactions make up 12.5%. Grounding accounts for 8.8% and VQA 4.9%. Smaller portions include GUI drag, instruction following, and safety. Loss computation is restricted to the three most recent turns in each trajectory.
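To make the mix concrete, the sketch below converts the stated percentages into approximate sample counts and builds a per-turn loss mask. The category keys and function names are hypothetical, the two-million total is the article's approximation, and the mask is only one plausible reading of "the three most recent turns".

```python
TOTAL_SAMPLES = 2_000_000  # "approximately two million" per the article

# Shares as reported; the key names are illustrative.
MIX = {
    "web_trajectories": 0.60,
    "synthetic_environments": 0.128,
    "forms_and_user_interaction": 0.125,
    "grounding": 0.088,
    "vqa": 0.049,
}


def approx_counts(total: int = TOTAL_SAMPLES) -> dict:
    """Approximate number of samples per category."""
    return {name: round(total * share) for name, share in MIX.items()}


def remainder_share() -> float:
    """Share left for the smaller categories (GUI drag, instruction following, safety)."""
    return round(1.0 - sum(MIX.values()), 3)


def loss_mask(num_turns: int, last_k: int = 3) -> list:
    """True for turns that contribute to the loss: only the last `last_k` turns."""
    return [i >= num_turns - last_k for i in range(num_turns)]
```

The five listed categories sum to 99%, so the smaller categories together account for roughly 1%; how that remainder is divided is not stated.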


FaraGen1.5: The Synthetic Data Pipeline
FaraGen1.5 is the automated system used to generate training data. It includes three main components: environments, solvers, and verifiers.
Environments are divided into two types:
– Open-internet tasks run on regular websites (no login needed).
– Gated-domain tasks require user logins or involve actions that can’t be reversed, like sending an email.
For gated domains, Microsoft created six synthetic clones, called FaraEnvs, that simulate real apps: Mail, Calendar, Stream, ML, Stay, and Scheduler. Each clone has a realistic interface, a working API, and pre-loaded user data to mimic actual use.
These clones were built with GitHub Copilot CLI and refined iteratively by humans. Because Microsoft controls every detail of these clones, the correct outcome of each task is known in advance, which enables two kinds of checks:
– If a task changes the app’s state (e.g., sends a message), the system checks database snapshots before/after to verify.
– If a task leaves the app’s state unchanged (e.g., looking up information), the system compares the agent’s result to a preset correct answer.
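The two verification modes above could look something like the following sketch. It models a database snapshot as a plain dictionary of rows, which is an assumption; the article does not describe the FaraEnvs storage format, and all function names here are hypothetical.

```python
from typing import Any, Dict

# Hypothetical snapshot shape: row id -> row contents.
Snapshot = Dict[str, Dict[str, Any]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, Snapshot]:
    """Describe what changed between two snapshots."""
    added = {k: v for k, v in after.items() if k not in before}
    removed = {k: v for k, v in before.items() if k not in after}
    changed = {
        k: after[k]
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


def verify_state_change(
    before: Snapshot, after: Snapshot, expected: Dict[str, Snapshot]
) -> bool:
    """State-changing task: the observed diff must match the expected diff."""
    return diff_snapshots(before, after) == expected


def verify_read_only(agent_answer: str, reference_answer: str) -> bool:
    """Read-only task: compare to a preset answer (simple normalized match)."""
    return agent_answer.strip().lower() == reference_answer.strip().lower()
```

Comparing the full diff rather than just checking for the expected row also catches side effects, such as an agent that sends the right email but deletes another one along the way.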
The solver uses OpenAI’s GPT-5.4 with custom tools matching the actions Fara1.5 can take. On the Online-Mind2Web benchmark, this solver scores 83% (up from 67% for the earlier Fara-7B solver). If the solver needs more input, it calls a user simulator.
Before training, trajectories go through three verification checks:
– **Correctness:** Uses AI-generated rules for open tasks and database checks for synthetic data.
– **Efficiency:** Penalizes unnecessary extra steps.
– **User interaction:** Ensures the agent pauses at key moments, like when waiting for a decision.
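A rough sketch of how the three checks might gate which trajectories reach training is shown below. The `Trajectory` fields, the `reference_steps` baseline, and the hard `max_step_ratio` threshold are all assumptions; the article only says inefficiency is penalized, so the real pipeline may score trajectories rather than filter them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    # Hypothetical fields summarizing the verifier outputs for one trajectory.
    steps: int                       # steps the solver actually took
    reference_steps: int             # assumed step count of a known-good solution
    correct: bool                    # correctness verifier result
    paused_at_critical_points: bool  # user-interaction verifier result


def passes_verification(t: Trajectory, max_step_ratio: float = 1.5) -> bool:
    """Keep a trajectory only if it passes all three checks."""
    efficient = t.steps <= t.reference_steps * max_step_ratio
    return t.correct and efficient and t.paused_at_critical_points


def filter_trajectories(trajectories: List[Trajectory]) -> List[Trajectory]:
    return [t for t in trajectories if passes_verification(t)]
```

Treating the checks as a conjunction means a trajectory that reaches the right answer but skips a required confirmation is still excluded, which matches the emphasis on pausing at key moments.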
—
Critical Points and Safety
Fara1.5 automatically pauses and asks for user input in three cases:
1. The task needs personal info that hasn’t been shared.
2. The task description isn’t clear or lacks necessary details.
3. An irreversible action (e.g., deleting data or sending a message) is about to happen without confirmation.
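The three pause conditions can be written as a simple rule check, shown below purely for illustration. In Fara1.5 this behavior is learned by the model rather than hard-coded, and the `Step` fields and `PauseReason` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class PauseReason(Enum):
    MISSING_PERSONAL_INFO = "missing_personal_info"
    AMBIGUOUS_TASK = "ambiguous_task"
    IRREVERSIBLE_ACTION = "irreversible_action"


@dataclass
class Step:
    # Hypothetical description of what the next step needs and does.
    required_info: Set[str] = field(default_factory=set)
    task_is_ambiguous: bool = False
    action_is_irreversible: bool = False
    user_confirmed: bool = False


def critical_point(step: Step, known_info: Set[str]) -> Optional[PauseReason]:
    """Return why the agent should pause, or None if it may proceed."""
    if step.required_info - known_info:
        return PauseReason.MISSING_PERSONAL_INFO
    if step.task_is_ambiguous:
        return PauseReason.AMBIGUOUS_TASK
    if step.action_is_irreversible and not step.user_confirmed:
        return PauseReason.IRREVERSIBLE_ACTION
    return None
```

For example, a step that sends an email returns `IRREVERSIBLE_ACTION` until the user confirms, after which the same step returns `None` and the agent proceeds.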
For safety, Microsoft trains the agent on public safety datasets and internal data aligned with their Responsible AI standards. In MagenticLite, every action is logged for review. The sandboxed browser also keeps the agent separated from the user’s device for added security.
—
Other Benchmark Results
On the WebVoyager test:
– **Fara1.5-27B:** 88.6%
– **Fara1.5-9B:** 86.6%
– **Fara1.5-4B:** 80.8%
The 9B version also beats similarly sized models such as MolmoWeb 8B and GUI-Owl-1.5 8B.
To ensure consistency, all evaluations run on Browserbase and are repeated independently multiple times.
On WebTailBench v1.5, which tests rare web tasks:
– **Fara1.5-9B:** 64.5% process success, 32.3% outcome success
– GPT-5.4: 79.6% process, 57.4% outcome
—
Key Takeaways
Here are four quick highlights:
- Microsoft Research introduced Fara1.5: browser agents in 4B, 9B, and 27B versions, built on Qwen3.5.
- Fara1.5-27B achieves 72% on Online-Mind2Web, surpassing OpenAI Operator (58.3%), Gemini 2.5 CU (57.3%), and Yutori Navigator n1 (64.7%).
- The FaraGen1.5 pipeline enables training on gated-domain tasks (those requiring logins or irreversible actions) through six synthetic app clones, called FaraEnvs, built with GitHub Copilot CLI.
- Fara1.5 pauses to ask the user when personal information is missing, the task is unclear, or an irreversible action has not yet been confirmed.



