**Watch The Latest AI Releases: Opus 4.8 Misalignment Rates Compared With Claude Mythos Preview Reveal Surprising Similarities**

Elyse Betters Picaro/ZDNET

Stay connected with ZDNET: Save us as a preferred source on Google.

AI labs keep churning out new models at a rapid pace. While these newer models are typically more powerful and efficient than earlier versions, not every release represents a genuine breakthrough — no matter how enthusiastically a company’s marketing team presents it. A model’s true strengths become apparent when placed alongside its competitors: Where do rival models fall short or shine? Which ones offer unique advantages, and which are simply meeting baseline expectations?

Also: How we evaluate AI models at ZDNET

Our Model Release Tracker is designed to help you understand how various models compare to one another and whether any particular release is worth a closer examination. While we don’t personally test every single model or update featured here, we always spotlight the essential details you should be aware of, plus any hands-on expert evaluations when available. For select models, we also include an Expert Score. Interested in how we test AI? Explore our methodology breakdown here.

Below are some of the most significant model launches of 2026 so far, along with the key takeaways for each. We’ll keep this list current whenever a noteworthy new model debuts.

Claude Opus 4.8

Anthropic | May 28, 2026

What it does: Claude Opus 4.8 officially takes over from Opus 4.7 today (at no change in pricing). Anthropic claims Opus 4.8 delivers accelerated reasoning modes at just one-third the cost of the prior version. Continuing Anthropic’s tradition, this model focuses heavily on coding, outperforming 4.7 on two coding benchmarks — though it doesn’t quite surpass OpenAI’s GPT 5.5. Anthropic also highlighted that 4.8 “sets new records on our measures of prosocial traits, such as respecting user autonomy and acting in the user’s best interest.” Exactly what that measurement entails remains somewhat unclear, however.

Also: Anthropic debuts Opus 4.8, with truthfulness as its standout quality

Why it matters: Safety and transparency have long been cornerstones of Anthropic’s philosophy, but this release suggests the company is doubling down on those commitments. Anthropic reported that Opus 4.7 achieved a 92% honesty rate, along with being less prone to sycophancy and hallucinations overall. The claim that 4.8 exhibits “dramatically” lower misalignment rates than 4.7 signals a fast-rising bar for AI safety — made even more notable by the fact that Anthropic benchmarked 4.8’s alignment against Mythos Preview.

GPT-5.5 Instant

OpenAI | May 5, 2026

What it does: OpenAI described GPT-5.5 Instant as a trimmer, faster iteration of the recently introduced GPT-5.5, cutting back on excessive verbosity compared to its predecessor, GPT-5.3 Instant. The company also reported a meaningful drop in hallucinations and a boost in factual precision, stating that “GPT‑5.5 Instant produced 52.5% fewer fabricated claims than GPT‑5.3 Instant on high-stakes prompts spanning fields like medicine, law, and finance.”

Also: Anthropic’s Mythos is advancing faster than anticipated, reports AI safety agency

Why it matters: GPT-5.5 Instant takes over as the default model within ChatGPT, replacing GPT-5.3. While every new model is naturally expected to be more efficient, user-friendly, and less prone to making things up, a major reduction in hallucinations for the model most people rely on for quick answers could significantly curb the spread of misinformation. This is particularly important given the large number of users who frequently turn to ChatGPT for health-related guidance, for instance.

(Disclosure: Ziff Davis, ZDNET’s parent company, filed a lawsuit against OpenAI in April 2025, alleging that OpenAI violated Ziff Davis copyrights in training and running its AI systems.)

Nemotron 3 Nano Omni

Nvidia | April 28, 2026

What it does: This newest addition to Nvidia’s open Nemotron lineup is built to give AI agents multimodal capabilities. That means they can “analyze and reason across images, audio, and text within a single unified system,” per Nvidia, bringing together what used to require separate models into one streamlined architecture.

Also: AI is an arms race, and the US is seeking $9 billion in Nvidia chips to stay ahead

Why it matters: Today’s agent systems generally depend on distinct models for speech, vision, and text, forcing them to shuffle across documents, video, and audio clips to tackle multi-step tasks. This slows workflows, degrades the context agents accumulate, and inflates inference costs. If Nvidia’s approach delivers as promised, it could streamline this entire process while cutting token usage — which translates to lower costs. You can try it out on Hugging Face.

GPT-5.5

OpenAI | April 23, 2026

Expert Score: 93/100

What it does: ZDNET’s in-house tester David Gewirtz technically assigned GPT-5.5 a grade of A-, though he summarized it simply as “an improved, speedier version of GPT-5.4” — which ought to be the bare baseline for any new release. More specifically, the model sharpened its skills in autonomous coding, concept identification, scientific reasoning, and factual accuracy.

Also: I ran GPT-5.5 through 10 rounds of testing: It earned a 93/100, losing points only for being overly enthusiastic

Why it matters: Even though GPT-5.5 may not represent a massive leap forward compared to its immediate predecessor, the remarkably short turnaround time between 5.4 and 5.5 — under two months — illustrates just how rapidly the agentic coding space is pushing OpenAI’s release velocity. As David Gewirtz details, the company, much like other top-tier labs leveraging AI to accelerate AI development itself, is shipping updates at a pace that’s accelerating exponentially.

ChatGPT Images 2

OpenAI | April 23, 2026

What it does: Shortly after shutting down its Sora video generation model and social platform, OpenAI somewhat puzzlingly unveiled Images 2. ZDNET’s model tester David Gewirtz previewed Images 2 before launch and came away very impressed. While he didn’t assign a formal Expert Score, he described it as enjoyable, a significant advancement, and genuinely practical for professional work.

Why it matters: OpenAI appeared to be shifting away from consumer-facing AI products when it discontinued Sora, having lost ground to Anthropic in the race for high-value enterprise contracts. The fact that OpenAI still pushed out Images 2 within that strategic pivot suggests the company views image generation as relevant enough to enterprise AI — especially given the timing right after Anthropic launched Claude Design.

Claude Opus 4.7

Anthropic | April 16, 2026

What it does: Following closely after Opus 4.6, this model claims record highs in honesty while reducing sycophancy and hallucinations. It also appears to have a strong aptitude for cybersecurity, as it underpins the newly launched Claude Security tool, released shortly after the model — though, to clarify, it is not Mythos, as many had speculated.

Also: Anthropic’s new Claude Security tool scans your codebase for vulnerabilities — and helps prioritize which issues to fix first

Why it matters: Hallucinations and trustworthiness remain among the most stubborn challenges facing even the top-performing models. For Anthropic to report meaningful progress in these areas is a considerable achievement for an AI lab that places safety at the forefront.

Claude Mythos (Preview)

Anthropic | April 7, 2026

What it does: This one is tricky because Mythos isn’t accessible to the public. Anthropic generated significant media buzzI notice you’ve included some AI model release summaries (GPT-5.4, Claude Opus 4.6, etc.) mixed with a reference to “Mythos” and “Project Glasswing.” Let me paraphrase this content while preserving the HTML structure.

Here’s the paraphrased version:

—

storm when it considered the new general-purpose model too powerful to release through standard channels. Although the model represents a notable leap from previous Anthropic models, the company was particularly concerned due the security risks it introduced, noting that “it excels remarkably at cybersecurity-related tasks.”

In response, Anthropic initiated Project Glasswing, a joint initiative with several competing AI labs — including Google, Nvidia, and Microsoft — alongside security firms like Palo Alto Networks — “to help protect the world’s most critical software, and to equip the industry for the practices we’ll all need to embrace to stay ahead of cyberattackers.”

**Also: Apple, Google, and Microsoft join Anthropic’s Project Glasswing to protect the world’s most critical software**

**Why it matters:** If we accept Anthropic’s position that Mythos represents a serious threat to the world’s software infrastructure — to the extent that access is limited to a handful of trusted partners — current cybersecurity frameworks may be insufficient to address the quickly advancing frontier of AI capabilities. Mythos may be the first model of its kind, but certainly not the last, as other research labs reach comparable milestones.

At this point, just weeks since its introduction, Mythos is already uncovering software vulnerabilities at an impressive rate.

—

**GPT-5.4**
OpenAI | March 5, 2026

**What it does**: OpenAI described this new model, launched just three months after GPT-5.2, as purpose-built for professional use. Based on the company’s internal benchmarks (which should be treated cautiously until independently validated by a third party), GPT-5.4 performs on par with or better than human professionals 83% of the time.

**Why it matters**: As AI companies increasingly aim to build enterprise credibility (and contracts) while promoting what agent-driven AI can accomplish, they need models that manage complex work-related tasks with minimal risk, delays, or unmanageable costs. Any model advancement that demonstrates proficiency in professional workflows stands a better chance of earning the trust of organizations looking to adopt AI, though seamless integration is never guaranteed.

**Also: OpenAI’s new GPT-5.4 outperforms humans on professional-level work in tests — by 83%**

—

**Claude Opus 4.6**
Anthropic | Feb. 5, 2026

**What it does**: This model quickly set a new benchmark for autonomous agent-driven work, particularly in coding. That’s unsurprising given Anthropic’s reputation in developing models excelling at programming tasks. Opus 4.6 also showed progress in managing complex, extended tasks more broadly.

**Why it matters**: Opus 4.6’s improved ability to handle tasks independently means you can confidently delegate more of your workflow to it — something agent-driven tools typically find challenging.

**Also: Anthropic says its new Claude Opus 4.6 can get your work deliverables right on the first attempt**

—

**GPT-5.3-Codex**
OpenAI | Feb. 5, 2026

**What it does**: This new coding model — which OpenAI claims helped build and debug itself — can be paused and redirected mid-task, which would be a major advantage for developers using it on complex or evolving projects that involve significant trial and error. GPT-5.3-Codex also offers run durations exceeding 24 hours and a sharper understanding of user intent.

**Also: OpenAI’s new Spark model codes 15x faster than GPT-5.3-Codex — but there’s a trade-off**

**Why it matters**: OpenAI is working to narrow Anthropic’s lead in agentic coding (and perhaps not coincidentally, released 5.3-Codex on the same day Anthropic launched Opus 4.6). While ZDNET experts frequently favor Claude Code over other tools for vibe coding, OpenAI’s reported pivot toward enterprise clients and away from consumer-focused products could eventually close that gap.

—

The HTML structure and script block remain unchanged.

Top Posts

Secret Sabotage: How Hidden Azure DevOps PR Comments Can Hijack AI Agents

AI Jailbreak: OpenAI Models Breach Test Prison, Rig Hugging Face Leaderboard with Cheat Code

Precision Medicine Deposited: The Art of Microdispensing for Next-Gen Medical Devices

Watch the Latest AI Releases: Opus 4.8 Misalignment Rates Compared with Claude Mythos Preview Reveal Surprising Similarities

5 No-Cost Courses to Transform from AI Newbie to Pro

The System76 Thelio Mira: My Dream Linux Desktop Come True

Google’s Gemini 3.6 Flash: Slashing Enterprise Agent Token Costs

Stop ML Chaos: Your Blueprint for Experiment Order

NVIDIA Cosmos 3 Edge: 4B-Power Robot Brains Thinking and Acting on Your Device

5 Premier MCP Servers to Supercharge Agentic Development

Secret Sabotage: How Hidden Azure DevOps PR Comments Can Hijack AI Agents

AI Jailbreak: OpenAI Models Breach Test Prison, Rig Hugging Face Leaderboard with Cheat Code

Precision Medicine Deposited: The Art of Microdispensing for Next-Gen Medical Devices

When the World Cup Collided with the Cloud: 2026’s Digital Traffic Surge

Skyways Unleashed: The US and Europe Race to Build the Future of Urban Air Travel

5 No-Cost Courses to Transform from AI Newbie to Pro

Beyond Guesswork: A Slurm-Powered Battle Plan for Benchmarking Distributed LLM Servers

The Magic of Friction: Engineering Smarter Robot World Models

Trending

Secret Sabotage: How Hidden Azure DevOps PR Comments Can Hijack AI Agents

AI Jailbreak: OpenAI Models Breach Test Prison, Rig Hugging Face Leaderboard with Cheat Code

Latest Posts

Not More Data, but Better World Models – Unite.AI

OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears

Subscribe to Updates

Top Posts

**Watch the Latest AI Releases: Opus 4.8 Misalignment Rates Compared with Claude Mythos Preview Reveal Surprising Similarities**

Claude Opus 4.8

GPT-5.5 Instant

Nemotron 3 Nano Omni

GPT-5.5

ChatGPT Images 2

Claude Opus 4.7

Claude Mythos (Preview)

Related Posts

Watch the Latest AI Releases: Opus 4.8 Misalignment Rates Compared with Claude Mythos Preview Reveal Surprising Similarities