Qwen3.7-Plus By Alibaba Vision, Deep Reasoning, Tool Use & Autonomous Iteration On Bailian

The Qwen team at Alibaba has officially launched Qwen3.7-Plus. Developers can now access the model via Alibaba Cloud’s Bailian platform, which international customers know as Model Studio. This platform provides API access for third-party developers. The launch comes after Alibaba initially introduced the Qwen3.7 lineup in May.

Qwen3.7-Plus

Qwen3.7-Plus operates as a multimodal large language model. It has the ability to process both video and images, in addition to standard text instructions. Conversely, its counterpart, Qwen3.7-Max, is limited strictly to text.

It’s important to note that this model focuses on visual comprehension rather than creation. It can observe and interpret media, but it cannot produce visuals. Alibaba handles image and video generation through entirely distinct model groups.

Alibaba characterizes this release as an advancement in multimodal hybrid agent technology. In this context, an agent is an AI that can strategize and execute actions over multiple stages. By integrating image and video understanding, Qwen3.7-Plus introduces five key capabilities: deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration.

With self-programming, the AI can author and tweak its own code. Tool invocation enables it to utilize outside APIs or functions. Through verification and testing, it can evaluate its own outputs for accuracy. Autonomous iteration allows it to repeat processes until a goal is fully met. Essentially, these features transform the model from a simple responder into an active participant.

The Vision Case

Qwen3.7-Plus serves as the multimodal component of the 3.7 series. Its preview version has already demonstrated impressive visual benchmarks. On the Vision Arena leaderboard, Qwen3.7-Plus-Preview claimed the 16th spot overall, positioning Alibaba as the fifth-leading laboratory for vision technology. It’s worth noting that the model’s ranking and the lab’s ranking are distinct metrics.

LM Arena operates the Vision Arena as an impartial rating system. Participants cast votes on model responses to image-based prompts without knowing which model they are evaluating. Although ranking 16th places it behind leading American labs, it remains highly competitive within the global landscape. For tasks involving heavy visual processing—such as large-scale optical character recognition, data chart interpretation, or video frame scrutiny—this metric is especially significant.

The text-exclusive Max variant provides the foundational reasoning for this generation. Upon its release, Max achieved a score of 56.6 on the Artificial Analysis Intelligence Index, marking the top performance by any Chinese model at that time.

The Agentic Loop

A prominent change in the Qwen3.7 series is its emphasis on agentic functionality. The Alibaba team is tailoring these models specifically for extended, complex tasks. The Bailian platform, which hosts these models, contributes two important features to support this.

First, it includes an Agentic Reinforcement Learning (RL) system. This mechanism leverages real-world execution data to progressively enhance the model’s precision. Second, it provides built-in safety constraints. These guardrails ensure that autonomous operations remain within predefined boundaries, a critical feature when an agent has the authority to execute commands or alter files.

Marktechpost’s Visual Explainer

AI Models · Field Guide
1 / 7

Alibaba Qwen · June 2, 2026

A multimodal large language model with image and video understanding, deep reasoning, and agentic features. Available via API on Alibaba Cloud’s Bailian platform, accessed internationally as Model Studio.

Use the arrows or swipe to explore →

01 · What it is

A multimodal large language model

MultimodalAlibaba’s answer to multimodal AI users, currently available via API, targets bottleneck resolution.Here is the rewritten version with simplified text while keeping the HTML structure intact:

Michal Sutter is a data scientist. He holds a Master’s degree in Data Science from the University of Padova. His expertise includes statistics, machine learning, and data engineering. Michal specializes in analyzing large datasets and turning findings into practical solutions.

Top Posts

Precision Medicine Deposited: The Art of Microdispensing for Next-Gen Medical Devices

When the World Cup Collided with the Cloud: 2026’s Digital Traffic Surge

Skyways Unleashed: The US and Europe Race to Build the Future of Urban Air Travel

Qwen3.7-Plus by Alibaba Vision, Deep Reasoning, Tool Use & Autonomous Iteration on Bailian

A multimodal large language model

5 No-Cost Courses to Transform from AI Newbie to Pro

The System76 Thelio Mira: My Dream Linux Desktop Come True

Google’s Gemini 3.6 Flash: Slashing Enterprise Agent Token Costs

Stop ML Chaos: Your Blueprint for Experiment Order

NVIDIA Cosmos 3 Edge: 4B-Power Robot Brains Thinking and Acting on Your Device

5 Premier MCP Servers to Supercharge Agentic Development

Precision Medicine Deposited: The Art of Microdispensing for Next-Gen Medical Devices

When the World Cup Collided with the Cloud: 2026’s Digital Traffic Surge

Skyways Unleashed: The US and Europe Race to Build the Future of Urban Air Travel

5 No-Cost Courses to Transform from AI Newbie to Pro

Beyond Guesswork: A Slurm-Powered Battle Plan for Benchmarking Distributed LLM Servers

The Magic of Friction: Engineering Smarter Robot World Models

Trump Mobilizes Defense Industry to Chart Software and Supplier Networks Nationwide

KuCoin Pay: Weaving Crypto Seamlessly Into Everyday Payments

Trending

Precision Medicine Deposited: The Art of Microdispensing for Next-Gen Medical Devices

When the World Cup Collided with the Cloud: 2026’s Digital Traffic Surge

Latest Posts

Not More Data, but Better World Models – Unite.AI

OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears

Subscribe to Updates

Top Posts

Qwen3.7-Plus by Alibaba Vision, Deep Reasoning, Tool Use & Autonomous Iteration on Bailian

Qwen3.7-Plus

The Vision Case

The Agentic Loop

Marktechpost’s Visual Explainer

A multimodal large language model

Related Posts