The Qwen team at Alibaba has officially launched Qwen3.7-Plus. Developers can now access the model via Alibaba Cloud’s Bailian platform, which international customers know as Model Studio. This platform provides API access for third-party developers. The launch comes after Alibaba initially introduced the Qwen3.7 lineup in May.
Qwen3.7-Plus
Qwen3.7-Plus operates as a multimodal large language model. It has the ability to process both video and images, in addition to standard text instructions. Conversely, its counterpart, Qwen3.7-Max, is limited strictly to text.
It’s important to note that this model focuses on visual comprehension rather than creation. It can observe and interpret media, but it cannot produce visuals. Alibaba handles image and video generation through entirely distinct model groups.
Alibaba characterizes this release as an advancement in multimodal hybrid agent technology. In this context, an agent is an AI that can strategize and execute actions over multiple stages. By integrating image and video understanding, Qwen3.7-Plus introduces five key capabilities: deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration.
With self-programming, the AI can author and tweak its own code. Tool invocation enables it to utilize outside APIs or functions. Through verification and testing, it can evaluate its own outputs for accuracy. Autonomous iteration allows it to repeat processes until a goal is fully met. Essentially, these features transform the model from a simple responder into an active participant.
The Vision Case
Qwen3.7-Plus serves as the multimodal component of the 3.7 series. Its preview version has already demonstrated impressive visual benchmarks. On the Vision Arena leaderboard, Qwen3.7-Plus-Preview claimed the 16th spot overall, positioning Alibaba as the fifth-leading laboratory for vision technology. It’s worth noting that the model’s ranking and the lab’s ranking are distinct metrics.
LM Arena operates the Vision Arena as an impartial rating system. Participants cast votes on model responses to image-based prompts without knowing which model they are evaluating. Although ranking 16th places it behind leading American labs, it remains highly competitive within the global landscape. For tasks involving heavy visual processing—such as large-scale optical character recognition, data chart interpretation, or video frame scrutiny—this metric is especially significant.
The text-exclusive Max variant provides the foundational reasoning for this generation. Upon its release, Max achieved a score of 56.6 on the Artificial Analysis Intelligence Index, marking the top performance by any Chinese model at that time.

The Agentic Loop
A prominent change in the Qwen3.7 series is its emphasis on agentic functionality. The Alibaba team is tailoring these models specifically for extended, complex tasks. The Bailian platform, which hosts these models, contributes two important features to support this.
First, it includes an Agentic Reinforcement Learning (RL) system. This mechanism leverages real-world execution data to progressively enhance the model’s precision. Second, it provides built-in safety constraints. These guardrails ensure that autonomous operations remain within predefined boundaries, a critical feature when an agent has the authority to execute commands or alter files.
Marktechpost’s Visual Explainer
AI Models · Field Guide
1 / 7
Michal Sutter is a data scientist. He holds a Master’s degree in Data Science from the University of Padova. His expertise includes statistics, machine learning, and data engineering. Michal specializes in analyzing large datasets and turning findings into practical solutions.




