ZDNET’s key takeaways
- OpenAI targets “conversational” coding, not slow, batch-style agents.
- Big latency wins: 80% faster roundtrips, 50% faster time-to-first-token.
- Runs on Cerebras WSE-3 chips for a latency-first Codex serving tier.
The Codex team at OpenAI is on fire. Less than two weeks after releasing a dedicated agent-based Codex app for Macs, and only a week after releasing the faster and more steerable GPT-5.3-Codex language model, OpenAI is counting on lightning striking a third time.
Also: OpenAI’s new GPT-5.3-Codex is 25% faster and goes way beyond coding now – what’s new
Today, the company announced a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex built for real-time coding in Codex. The company reports that it generates code 15 times faster while “remaining highly capable for real-world coding tasks.” There’s a catch, and I’ll get to that in a minute.
Also: OpenAI’s Codex just got its own Mac app – and anyone can try it for free now
Codex-Spark will initially be available only to $200/mo Pro tier users, with separate rate limits during the preview period. If it follows OpenAI’s typical rollout strategy for Codex releases, Plus users will be next, with other tiers gaining access fairly quickly.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Expanding the Codex family for real-time collaboration
OpenAI says Codex-Spark is its “first model designed specifically for working with Codex in real-time — making targeted edits, reshaping logic, or refining interfaces and seeing results immediately.”
Let’s unpack this briefly. Most agentic AI programming tools take a while to respond to instructions. In my programming work, I can give an instruction (and this applies to both Codex and Claude Code) and go off and work on something else for a while. Sometimes it’s just a few minutes. Other times, it can be long enough to get lunch.
Also: I got four years of product development done in four days for $200, and I’m still stunned
Codex-Spark is apparently able to respond much faster, allowing for quick, continuous work. This could speed up development considerably, especially for simpler prompts and queries.
I know I’ve occasionally been frustrated when I’ve asked an AI a super-simple question that should have generated an immediate response, but instead I still had to wait five minutes for an answer.
By making responsiveness a core feature, the model supports more fluid, conversational coding. Sometimes, using coding agents feels more like old-school batch-style coding. Spark is designed to overcome that feeling.
GPT-5.3-Codex-Spark isn’t meant to replace the base GPT-5.3-Codex. Instead, Spark was designed to complement high-performance AI models built for long-running, autonomous tasks lasting hours, days, or weeks.
Performance
The Codex-Spark model is intended for work where responsiveness matters as much as intelligence. It supports interruption and redirection mid-task, enabling tight iteration loops.
That’s something that appeals to me, because I always think of something more to tell the AI ten seconds after I’ve given it an assignment.
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
The Spark model defaults to lightweight, targeted edits, making quick tweaks rather than taking big swings. It also doesn’t automatically run tests unless asked.
OpenAI has been able to reduce latency (faster turnaround) across the full request-response pipeline. It says that overhead per client/server roundtrip has been reduced by 80%. Per-token overhead has been reduced by 30%. And time-to-first-token has been cut by 50% through session initialization and streaming optimizations.
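To get a rough feel for how those separate reductions stack up, here’s a back-of-the-envelope sketch. The baseline timings below are invented for illustration; only the percentage reductions come from OpenAI’s announcement, and this ignores the smaller model’s faster raw token generation, which is where most of the claimed 15x speedup would come from.

```python
# Back-of-the-envelope estimate of how OpenAI's stated pipeline reductions
# might combine for a single streamed response. Baseline figures are invented
# for illustration; only the percentage reductions come from the announcement.

baseline_roundtrip_overhead_s = 0.50   # hypothetical client/server roundtrip overhead
baseline_ttft_s = 2.00                 # hypothetical time-to-first-token
baseline_per_token_overhead_s = 0.010  # hypothetical overhead per generated token
tokens_generated = 300                 # hypothetical size of a small targeted edit

def total_time(roundtrip, ttft, per_token, tokens):
    return roundtrip + ttft + per_token * tokens

before = total_time(baseline_roundtrip_overhead_s, baseline_ttft_s,
                    baseline_per_token_overhead_s, tokens_generated)

# Apply the claimed reductions: 80% roundtrip, 50% TTFT, 30% per-token overhead.
after = total_time(baseline_roundtrip_overhead_s * 0.20,
                   baseline_ttft_s * 0.50,
                   baseline_per_token_overhead_s * 0.70,
                   tokens_generated)

print(f"before: {before:.2f}s, after: {after:.2f}s, speedup: {before / after:.1f}x")
```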
Another mechanism that improves responsiveness during iteration is the introduction of a persistent WebSocket connection, so the connection doesn’t have to be repeatedly renegotiated.
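To illustrate why that matters (this is a generic sketch, not OpenAI’s actual Codex protocol; the endpoint URL and message format are made up), compare opening a fresh connection for every request with keeping one WebSocket open for the whole session, using Python’s websockets library:

```python
# Persistent vs. per-request connections. The endpoint and message format
# are hypothetical; this is not OpenAI's actual Codex-Spark protocol.
import asyncio
import websockets

EDITS = ["rename this function", "add a docstring", "tighten the loop"]

async def per_request_connections(url):
    # Naive approach: a new TCP + TLS + WebSocket handshake for every edit.
    for edit in EDITS:
        async with websockets.connect(url) as ws:
            await ws.send(edit)
            print(await ws.recv())

async def persistent_connection(url):
    # Spark-style approach: negotiate once, then stream every edit over the
    # same open connection, avoiding repeated handshake overhead.
    async with websockets.connect(url) as ws:
        for edit in EDITS:
            await ws.send(edit)
            print(await ws.recv())

if __name__ == "__main__":
    asyncio.run(persistent_connection("wss://example.invalid/codex-spark"))
```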
Powered by Cerebras AI chips
In January, OpenAI announced a partnership with AI chipmaker Cerebras. We’ve been covering Cerebras for a while. We’ve covered its inference service, its work with DeepSeek, its work boosting the performance of Meta’s Llama model, and Cerebras’ announcement of a really big AI chip, meant to double LLM performance.
GPT-5.3-Codex-Spark is the first milestone for the OpenAI/Cerebras partnership announced last month. The Spark model runs on Cerebras’ Wafer Scale Engine 3, a high-performance AI chip architecture that boosts speed by putting all the compute resources on a single wafer-scale processor the size of a pancake.
Also: 7 ChatGPT settings tweaks that I can’t work without – and I’m a power user
Usually, a semiconductor wafer contains a whole bunch of processors, which later in the manufacturing process get cut apart and put into their own packaging. The Cerebras wafer contains just one chip, making it a very, very big processor with very, very closely coupled connections.
According to Sean Lie, CTO and co-founder of Cerebras, “What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”
The gotchas
Now, here are the gotchas.
First, OpenAI says that “when demand is high, you may see slower access or temporary queuing as we balance reliability across users.” So, fast, unless too many people want to go fast.
Here’s the kicker. The company says, “On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks evaluating agentic software engineering capability, GPT-5.3-Codex-Spark underperforms GPT-5.3-Codex, but can accomplish the tasks in a fraction of the time.”
Last week, in the GPT-5.3-Codex announcement, OpenAI said that GPT-5.3-Codex was the first model it classifies as “high capability” for cybersecurity, according to its published Preparedness Framework. On the other hand, the company admitted that GPT-5.3-Codex-Spark “does not have a plausible chance of reaching our Preparedness Framework threshold for high capability in cybersecurity.”
Also: I stopped using ChatGPT for everything: These AI models beat it at research, coding, and more
Think on these statements, dear reader. This AI isn’t as smart, but it does do those not-as-smart things a lot faster. 15x speed is certainly nothing to sneeze at. But do you really want an AI that makes coding mistakes 15 times faster and produces code that’s less secure?
Let me tell you this. “Eh, it’s good enough” isn’t really good enough when you have hundreds of frustrated users coming at you with torches and pitchforks because you suddenly broke their software with a new release. Ask me how I know.
Last week, we learned that OpenAI uses Codex to write Codex. We also know that it uses it to build code much faster. So the company clearly has a use case for something that’s way faster, but not as smart. As I get a better handle on what that is and where Spark fits, I’ll let you know.
What’s next?
OpenAI shared that it’s working toward dual modes of reasoning and real-time work for its Codex models.
The company says, “Codex-Spark is the first step toward a Codex with two complementary modes: longer-horizon reasoning and execution, and real-time collaboration for rapid iteration. Over time, the modes will blend.”
The workflow model it envisions is interesting. According to OpenAI, the intent is that eventually “Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don’t have to choose a single mode up front.”
Also: I tried a Claude Code rival that’s local, open source, and completely free – how it went
Essentially, it’s working toward the best of both worlds. But for now, you have to choose fast or accurate. That’s a tough choice. But the accurate is getting more accurate, and now, at least, you can opt for fast when you want it (as long as you keep the trade-offs in mind and you’re paying for the Pro tier).
What about you? Would you trade some intelligence and security capability for 15x faster coding responses? Does the idea of a real-time, interruptible AI collaborator appeal to you, or do you prefer a more deliberate, higher-accuracy model for serious development work?
How concerned are you about the cybersecurity difference between Codex-Spark and the full GPT-5.3-Codex model? And if you’re a Pro user, do you see yourself switching between “fast” and “smart” modes depending on the task? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.



