Anthropic Unveils Claude Opus 4.8: Elevated AI Coding and Enhanced Safety at the Same Premium Price

In brief

Anthropic launched Claude Opus 4.8 on Thursday, only six weeks after releasing Opus 4.7.
The new version delivers improved performance on software engineering, reasoning, and computer use benchmarks, while keeping the same pricing of $5/$25 per million input/output tokens.
Opus 4.8’s safety alignment scores now match those of Claude Mythos Preview, Anthropic’s experimental flagship model, with significantly reduced rates of deceptive or potentially harmful behavior compared to the previous version.

Six weeks—that’s the short span between Anthropic’s launch of Opus 4.7 and the arrival of Opus 4.8.

The latest model performs faster and scores higher on benchmark evaluations, and it includes a range of new capabilities—yet the cost remains unchanged: $5 per million input tokens and $25 per million output tokens, just like its predecessor.

There’s also an accelerated mode that runs the identical model at 2.5x the speed for $10 per million input tokens and $50 per million output tokens. Anthropic notes this rate is three times cheaper than what the accelerated mode cost on earlier models, which is a diplomatic way of acknowledging it was considerably more expensive previously.
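To make the two price points concrete, here is a minimal sketch of the per-request arithmetic at the rates quoted above. The function name, the mode labels, and the sample token counts are illustrative assumptions, not part of any Anthropic API.

```python
# Prices in USD per million tokens, as quoted in the article.
# The mode labels are hypothetical keys chosen for this example.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "accelerated": {"input": 10.00, "output": 50.00},
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the USD cost of one request at the article's quoted rates."""
    rate = PRICES[mode]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 40,000 output tokens.
standard = request_cost(200_000, 40_000)
accelerated = request_cost(200_000, 40_000, mode="accelerated")
print(f"standard: ${standard:.2f}, accelerated: ${accelerated:.2f}")
```

Because both rates double in accelerated mode, any workload costs exactly twice as much there as in standard mode, whatever its mix of input and output tokens.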

SWE-bench Pro is arguably the most critical benchmark for gauging this model’s strength. It evaluates whether an AI can genuinely tackle complex, multi-language software engineering challenges sourced from actual production codebases—measured as the percentage of problems successfully resolved.

On that evaluation, Opus 4.8 achieved 69.2%, climbing from 64.3% with Opus 4.7. OpenAI’s GPT-5.5 reached 58.6%, and Google’s Gemini 3.1 Pro came in at 54.2%. For a model at an identical price level, that represents a notable improvement.

On Humanity’s Last Exam, a rigorous test of expert-level questions spanning dozens of academic fields and scored as a percentage of correct answers, Opus 4.8 reached 49.8% without tools and 57.9% with them, surpassing all three competitors. On OSWorld-Verified, which assesses real-world computer interaction tasks such as navigating software interfaces, it scored 83.4%, edging past Opus 4.7’s result of 82.8%.

The one setback: Terminal-Bench 2.1, which gauges AI performance on command-line operations. GPT-5.5 leads at 78.2%, while Opus 4.8 scores 74.6%—an improvement over Opus 4.7’s 66.1% and ahead of Gemini’s 70.3%, but finishing second still means falling short.
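The generation-over-generation gains are easier to compare as percentage-point differences. The sketch below uses only the Opus 4.7 and Opus 4.8 scores quoted in this article; the dictionary layout and function name are assumptions made for illustration.

```python
# (Opus 4.7 score, Opus 4.8 score) in percent, as quoted in the article.
SCORES = {
    "SWE-bench Pro": (64.3, 69.2),
    "OSWorld-Verified": (82.8, 83.4),
    "Terminal-Bench 2.1": (66.1, 74.6),
}


def point_gain(old: float, new: float) -> float:
    """Percentage-point difference between two scores, rounded to one decimal."""
    return round(new - old, 1)


gains = {name: point_gain(old, new) for name, (old, new) in SCORES.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

By this measure the largest jump comes on Terminal-Bench 2.1, the one benchmark here where Opus 4.8 still trails GPT-5.5.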

Five ways to think

Anthropic has introduced a new feature that lets users adjust how much reasoning effort the model applies. The default setting, “High,” suits the majority of tasks, while “Extra” (internally labeled “xhigh” within Claude Code) uses additional compute to tackle harder questions. “Max” is the highest setting, and “Low” and “Medium” use fewer tokens for the same task, trading precision for speed.

This new control is integrated into the model selection interface on claude.ai and Cowork and is accessible across all subscription tiers. According to Anthropic, the default “High” setting operates with a token consumption comparable to Opus 4.7’s standard configuration but delivers superior performance—a claim that appears to be both a technical achievement and a well-crafted message.

It’s also worth noting that Anthropic’s latest tokenizer for Opus produces more tokens per task, so the same work can cost more even at an unchanged per-token price. Users opting for Opus over Claude Sonnet, a less powerful alternative, will likely incur higher costs. Sonnet remains a cost-effective choice for everyday work and for complex problems that aren’t at the cutting edge of scientific research or software development.

Additionally, rate limits within Claude Code have been increased to accommodate the greater token usage associated with the “Extra” and “Max” configurations.
