Fara1.5-27B achieved a 72% success rate on Online-Mind2Web, outperforming both OpenAI Operator (58.3%) and Gemini 2.5 Computer Use (57.3%).
The open-weight models come in three sizes (4 billion, 9 billion, and 27 billion parameters), all fine-tuned from Alibaba’s Qwen3.5.
Fara1.5-9B is currently available on Azure AI Foundry, with the 4B and 27B versions launching soon.
Picture this: you tell your computer to search for vacation rentals, browse five different websites, complete the booking form, and pick the option nearest to the beach. You head off to grab a coffee. By the time you return, everything is handled. That’s what “computer use agents” promise—AI that can see your browser screen and perform actions like clicking, scrolling, and typing on your behalf, just as a person would, without needing any extra browser extensions.
OpenAI made the first move with Operator, debuting it in January 2025 at a steep $200 per month, then later integrating it into ChatGPT Agent before discontinuing it in August. Google followed with Gemini 2.5 Computer Use. Both remain proprietary, rely on the cloud, and carry significant usage expenses.
This week, Microsoft Research introduced a compact model called Fara1.5—and on the key benchmarks, it surpasses them both.
The family spans three variants with 4 billion, 9 billion, and 27 billion parameters. All are built on Alibaba’s Qwen3.5 base model, which Microsoft fine-tuned specifically for web-browsing tasks, and the weights are publicly available. (Parameters are the numerical values a model learns during training; larger counts typically indicate greater capacity.)
Achieving this result meant rethinking every stage of the pipeline from the ground up. “We began with a straightforward question: What would it take to make a small model truly capable at agentic tasks?” the AI Frontiers team explained. “The answer covered the entire development lifecycle—data generation, training objectives, model architecture, and orchestration all needed to be redesigned as a unified effort rather than tackled independently.”
The benchmarks
Online-Mind2Web is the benchmark that matters most for the specific challenge Microsoft set out to tackle. It evaluates how successfully an AI agent can accomplish 300 varied, real-world tasks across 136 active, popular websites, such as comparing products, submitting forms, and reserving services. The score is the percentage of tasks completed correctly on the live, dynamic internet.
Fara1.5-27B reached 72%. OpenAI Operator came in at 58.3%. Google’s Gemini 2.5 Computer Use landed at 57.3%. Yutori’s Navigator n1, the leading proprietary competitor, scored 64.7%. Even the mid-range Fara1.5-9B hit 63.4%—edging past both OpenAI and Google.
Open-source competitors also lagged behind. Alibaba’s GUI-Owl-1.5, with 8 billion parameters, scored 48.6%. AI2’s MolmoWeb reached 35.3%. Microsoft’s earlier model, Fara-7B, scored 34.1%, which means Fara1.5-9B, at 63.4%, nearly doubles the performance of its predecessor at a similar scale.
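The gaps between these Online-Mind2Web scores are easy to check with a little arithmetic. The short Python sketch below uses only the percentages quoted in this article; the helper names `ratio` and `gap_in_points` are illustrative and do not come from Microsoft or the benchmark's authors.

```python
# Online-Mind2Web success rates (percent), copied from the figures quoted
# in this article. Nothing here is measured independently.
SCORES = {
    "Fara1.5-27B": 72.0,
    "Yutori Navigator n1": 64.7,
    "Fara1.5-9B": 63.4,
    "OpenAI Operator": 58.3,
    "Gemini 2.5 Computer Use": 57.3,
    "GUI-Owl-1.5": 48.6,
    "MolmoWeb": 35.3,
    "Fara-7B": 34.1,
}


def ratio(model: str, baseline: str) -> float:
    """Return how many times higher `model` scores than `baseline`."""
    return SCORES[model] / SCORES[baseline]


def gap_in_points(model: str, baseline: str) -> float:
    """Return the lead of `model` over `baseline` in percentage points."""
    return round(SCORES[model] - SCORES[baseline], 1)


if __name__ == "__main__":
    print(f"Fara1.5-9B vs Fara-7B: {ratio('Fara1.5-9B', 'Fara-7B'):.2f}x")
    print(f"Fara1.5-27B vs Operator: +{gap_in_points('Fara1.5-27B', 'OpenAI Operator')} points")
    # Rank every model from best to worst.
    for name, score in sorted(SCORES.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{score:5.1f}%  {name}")
```

By this calculation the 9B model scores about 1.86 times what Fara-7B did, which is where "nearly doubles" comes from, and the 27B model leads Operator by 13.7 percentage points.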
On WebVoyager, another benchmark that evaluates task completion on the live web, Fara1.5-27B achieved 88.6%, slightly surpassing OpenAI Operator’s 87.0% and outperforming H Company’s Holo2, which has 30 billion parameters, at 83.0%.
How it learned
The key lies in the training process. Microsoft employed a system called FaraGen1.5 to produce the training data. Here’s the interesting twist: they used GPT-5.4—OpenAI’s advanced model—as a “teacher agent” to show how to carry out browser tasks. Those demonstrations then became the training material for Fara1.5. Essentially, they leveraged OpenAI’s top-performing model to train a competing open-source alternative.
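To make the idea of teacher demonstrations becoming training material concrete, here is a minimal Python sketch of the general imitation-learning pattern. It is not FaraGen1.5's actual format, which the article does not describe. The `Step` type, the `to_training_examples` function, and the text-based prompt layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a teacher demonstration (hypothetical structure)."""
    observation: str  # stand-in for a screenshot or page description
    action: str       # what the teacher did, e.g. "click('Search')"


def to_training_examples(task: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """Turn one demonstration into (prompt, target action) pairs.

    Each pair asks the student model to predict the teacher's next action
    from the task, the actions taken so far, and the current observation.
    """
    examples: List[Tuple[str, str]] = []
    history: List[str] = []
    for step in steps:
        lines = [f"Task: {task}"]
        lines += [f"Previous: {past}" for past in history]
        lines.append(f"Observation: {step.observation}")
        examples.append(("\n".join(lines), step.action))
        history.append(step.action)
    return examples


if __name__ == "__main__":
    demo = [
        Step("search page with empty query box", "type('beach rentals')"),
        Step("query box filled in", "click('Search')"),
    ]
    for prompt, target in to_training_examples("Find a beach rental", demo):
        print(prompt)
        print("->", target)
        print()
```

A two-step demonstration yields two training pairs, and the second prompt includes the first action as history.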
Additionally, they built six simulated, fully operational replicas of real websites—including email clients, calendars, and marketplaces—so the model could practice tasks that need logins or irreversible actions (such as actually sending an email or booking a flight) without affecting real accounts. This approach is known as synthetic domain
Practice in these simulated environments plays a major role in why Fara1.5 manages restricted or sensitive tasks more effectively than earlier versions.
Every model is built to pause and request confirmation before taking any irreversible action. “Finding the right balance between strong safety measures like Critical Points and a smooth user experience is essential,” explained Yash Lara, Senior PM Lead at Microsoft Research, in an interview with VentureBeat. “A user interface such as Microsoft Research’s Magentic-UI is crucial because it allows users to step in when needed while minimizing the burden of constant approvals.”
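The pause-before-irreversible-action behavior can be pictured as a simple gate in the agent loop. The Python sketch below is a hypothetical illustration, not Microsoft's implementation. In Fara1.5 the model itself decides when to stop, whereas here an `irreversible` flag on each action stands in for that judgment. The names `Action` and `run_with_confirmation` are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    """A single browser action proposed by the agent (hypothetical type)."""
    kind: str                   # e.g. "click", "type", "scroll"
    target: str                 # element or text the action applies to
    irreversible: bool = False  # stand-in for the model flagging a critical point


def run_with_confirmation(
    actions: Iterable[Action],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
) -> List[str]:
    """Run actions in order, asking the user before any irreversible one.

    Returns a log of what happened so every step can be reviewed later.
    """
    log: List[str] = []
    for action in actions:
        if action.irreversible and not confirm(action):
            # The user declined, so stop here and leave the action undone.
            log.append(f"stopped before {action.kind} {action.target}")
            break
        execute(action)
        log.append(f"executed {action.kind} {action.target}")
    return log


if __name__ == "__main__":
    plan = [
        Action("type", "guest name"),
        Action("click", "Review booking"),
        Action("click", "Pay now", irreversible=True),
    ]
    done: List[Action] = []
    # A real interface would prompt the user; this demo always declines.
    for entry in run_with_confirmation(plan, done.append, lambda a: False):
        print(entry)
```

In this run the two reversible steps execute and the payment click is held back, because the stand-in confirmation callback declines.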
This safeguard matters because OpenAI made the risks very clear when it unveiled ChatGPT Agent. “Once you log ChatGPT Agent into websites or activate connectors, it gains access to private information from those platforms, including emails, files, and account details,” the company stated.
Fara1.5 runs inside MagenticLite, a sandboxed browser environment in which every action the agent takes is recorded and can be stopped by the user at any time.
The browser AI space has become increasingly competitive, with entries like Google’s Gemini integrated into Chrome, Perplexity’s Comet browser, and Anthropic’s Claude for Chrome. What sets Fara1.5 apart is its open nature: publicly available model weights, open-source inference code hosted on GitHub, and the ability to run on hardware you own. The Fara1.5-9B model is currently available on Azure AI Foundry, while the 4B and 27B versions are expected to launch soon. Microsoft has also indicated plans to extend Fara1.5’s reach beyond the browser to desktop applications and enterprise software in the near future.