I Tried GPT-5.4, and Most Answers Were Really Good - But a Few Had Me Concerned

Elyse Betters Picaro / ZDNET

ZDNET’s key takeaways

GPT-5.4 Thinking delivers deeper analysis than earlier ChatGPT models.
It has strong reasoning, but it sometimes answers questions you didn’t ask.
Formatting and image generation lag behind the text quality.

It’s a new month, and a new AI version number. It’s called GPT-5.4 Thinking. This latest release, which OpenAI issued last week, isn’t your run-of-the-mill ChatGPT incremental update.

Oh, no. Instead of jumping from 5.2 to 5.3, for this release the company jumped all the way to 5.4. And instead of offering a general-purpose release, the company released GPT-5.4 Thinking, a more cognitively capable model designed for bigger thoughts and challenges.

GPT-5.4 Thinking is available for the Codex programming tool, the API, and paid ChatGPT plans. For this article, I used the $20-per-month ChatGPT Plus plan to put it through its paces.

That presented me with a bit of a challenge. Usually, when I test a ChatGPT version, I run it through a series of mixed tests. Some are quick, and some are a bit more detailed. The prompts are usually just a few lines long. The responses usually lend themselves to being included in an article.

But this Thinking model required deeper dives, with more comprehensive challenges. As such, not only are the prompts more involved, but the responses are far too extensive to include in the article. Instead, I’m providing links to each test session. When you follow the links, you can see the entire response in depth. Usually, a shared transcript opens at the end of the transcript, so scroll back to the top to get the full contents of that discussion.

Before we jump into the four challenges I presented to GPT-5.4 Thinking, I’ll give you a quick TL;DR conclusion about my experience. There’s some good and bad, but mostly good.

The good: Text-based responses are really good. Most of the challenges I gave it were answered thoughtfully. I didn’t catch it in any hallucinations. I got constructive value from every answer.
The bad: Unfortunately, sometimes it answered questions that differed from what I asked. Images and formatting left much to be desired. When it came to image generation, clearly the AI didn’t use an advanced model. You’ll see what I mean, but basically it’s like the model just didn’t listen. Formatting was weird. It likes very long numbered lists. You can see them in the chat transcripts.

Overall, I would definitely use the GPT-5.4 Thinking model for bigger challenges and questions. I was quite impressed, although I definitely wasn’t a fan of the formatting. It also needs continuous management to keep it on track.

Now, let’s dive into each of the tests.

Test 1: Aircraft carrier in the sky

I started off with an image generation challenge. The starting prompt was “Create an image of an aircraft carrier flying in the sky, held up by four upward-facing turbo-propellors in round fan housings, carrying a squadron of fighter jets on its deck.”

I started with this because previous image generation tests, across a number of AIs, didn’t get it right. They almost always face the propellors to the rear of the carrier. Gemini Nano Banana 2 oddly put the propellors in front, with the carrier moving into the forward-facing thrust. Sometimes, we just don’t want to know.

In any case, right out of the gate, with the model set to GPT-5.4 Thinking, ChatGPT returned this image.

As you can see, it has the same problem. Although if you look closely at it, the props face the back of the aircraft, and there are visible thrust beams shooting downward. You win some. You lose some.

But then, I had a thought. This is the thinking model, so what if I asked it to design a helicarrier? What would it come up with? I specified the characteristics of the craft, and then added on these instructions: “Design such a vehicle, particularly explaining its structure and how it will be held aloft, along with any constraints or issues, as well as any tactical advantages”

I got back a long, well-considered answer. I particularly liked the section where it explained why “four downward-facing turbo-propellers are a weak solution.” It said they look dramatic, but it outlined a series of solid engineering reasons why they’re a bad idea from an aircraft construction perspective.

It also went on to discuss flight deck operations and various constraints in terms of practicality. In particular, it properly focused on the weight-to-power issue, which basically means it’s going to take way too much power to hold something that big and heavy aloft.

Overall, the analysis and conclusions were great, although I was disappointed it didn’t mention either the USS Akron or USS Macon, which were early 20th century aircraft-launching dirigibles that actually worked (until they crashed). A modern dirigible would be a valid design option, yet GPT-5.4 Thinking didn’t mention that approach.

After GPT-5.4 Thinking created the detailed design spec, I again prompted for an image. I said, “Draw me a picture of the most probable design based on your existing analysis.”

And, wouldn’t you know it? The AI gave me back the exact same image as the one I got before it did any design work. That’s what I meant when I said the model just didn’t listen. I did try a bunch of different prompting approaches, but it never really worked out.

Although I tried a number of extremely detailed image specifications, none came out any better than the originals. My last attempt was to tell it I wanted an engineering-quality rendering.

The AI used a variation of the previous image, but simply added labels that didn’t quite match the picture or were made up of pure gibberish (as in “Retenuif truss fornaing. reueirid stucana tearsport”).

So, it gets points for good design analysis, but not so much for image generation.

You can follow the entire chat transcript here.

Test 2: Boston tech and history travel itinerary

I started this test with a prompt taken word-for-word from my previous sets of tests: “Imagine you are a travel advisor. I want a week-long vacation in Boston in March focused on technology and history. What itinerary would you recommend?”

I found the results workable, but uninspired. It initially divided the days into history-focused days and tech-focused days, rather than by location around Boston. After a few rounds of discussion, it did combine destinations by location, which made more sense.

In terms of places to visit, it hit all the highlights. It covered key historic locations, as well as the excellent science museums in Boston. I’ll give the AI credit. While there are a ton of interesting tech-related locations in the outer Boston area, it limited its selection to those in Boston and Cambridge proper.

I was happy to see the AI provide planning notes, including recommendations for how to replan the schedule for indoor-only activities if the weather turned bad. Since I asked for an itinerary in March, bad weather is certainly something important to plan for.

The Thinking model came into play when it was used to plan for both a fairly pricey vacation and an alternate one on a student budget. It did particularly well pointing out budget eating options, and provided a day-by-day cumulative cost estimate, as well as cost estimates for each category.

It did the same with where to stay. It recommended hotels based on a centralized location to all of the recommended stops, as well as a less expensive (less expensive for Boston) option for budget travelers.

My biggest complaint, initially, was formatting. The AI just presented a huge numbered list. You can see that in the session transcript. I had to specifically ask for better formatting. While the revised formatting it gave me was an improvement, it was still less than ideal.

Net-net: If you’re traveling, GPT-5.4 Thinking will give you good information. It will be up to you to parse that information and make travel decisions. You can follow the entire chat transcript here.

Test 3: Social media in society

Here’s where GPT-5.4 Thinking starts to really shine. When I asked GPT-5.2, “Do you think social media has improved or worsened communication in society?” I got back a two-line answer. Both thoughts were coherent and appropriate, but it was ultimately unfulfilling.

For GPT-5.4 Thinking, I extended the question, saying “Provide an analysis of both sides, improved or worsened in depth, and then take a side, take a position, and defend your position.”

I got back a very well-considered response. The AI started off with a TL;DR, saying that social media has both improved and worsened communication, but “on balance, I think it has worsened communication in society.”

It then goes into a 1,300-word detailed analysis about why. It explores where social media has strengthened societal communications and then looks at where social media has had a deleterious effect. I have to give props to GPT-5.4 Thinking. It’s a great read.

I gave the AI a follow-up question, asking how society should address the impact of social media. I specified it fairly clearly, and gave the AI a number of difficult-to-answer questions, difficult mostly because they’re essentially unanswerable.

Props again. GPT-5.4 Thinking deconstructed the prompt, explored the various issues, and knit together a compelling and supportable answer. I definitely recommend you read the entire transcript, which you can do right here.

Test 4: Explain GPT-5.4 using educational constructivism

The AI didn’t follow my instructions, but it did give a very interesting answer to a question I didn’t ask.

One of the tests I use for free chatbots is this prompt: “Explain educational constructivism to a five-year-old.” Very roughly speaking, educational constructivism is the theory of education that says you learn best by doing. I’ve long contended (and taught) that the only way you can learn programming is by actually writing code, which is a tangible example of educational constructivism in action.

In any case, I prompted GPT-5.4 Thinking, “Explain the new GPT 5.4 model using educational constructivism.”

Look at that prompt carefully, because GPT-5.4 Thinking clearly didn’t. The prompt invites the AI to explain GPT-5.4 through “doing” activities. Ideally, it would have proposed a series of exercises for the user to carry out, each of which would have helped demonstrate some of the model’s new capabilities.

But that’s not where GPT-5.4 Thinking went. Instead, it generated a 700-word thesis about how GPT-5.4 Thinking supports constructivism. It then offered to “recast this in one of three ways: as a classroom analogy, as a ZDNET-style plain-English explainer, or as a short comparison between GPT-4-era models and GPT-5.4.”

I let it do that. Its examples were adequate, and while they did answer the prompt GPT-5.4 Thinking suggested, the AI didn’t use “learn by doing” anywhere in its answers.

You know how a politician is often asked something in a debate, but rather than answering the question, goes off and just recites their own talking points? That’s what this response felt like. The answer it gave was good. It just wasn’t an answer to the question I asked.

You can follow the entire chat transcript here.

Overall recommendation

I’ve often characterized ChatGPT as a bright college student in need of good supervision. I would characterize GPT-5.4 Thinking as a very bright grad student who definitely needs good supervision.

Every answer I got back from GPT-5.4 Thinking was quite good in its own right. But in half my tests, the AI didn’t answer the question it was asked.

You can get it to give you good responses, but you have to fairly relentlessly correct the AI to keep it on point. That gets old. It can also lead to misinterpretation. Because the answers are so good and written so confidently, it can be easy to get caught up in the AI’s answer, even if it isn’t an answer to the question that was asked.

I don’t know if this my-way-or-the-highway approach to answering questions is an artifact of the “thinking” model or GPT-5.4 itself. I strongly recommend OpenAI carefully look at this issue, because the last thing we want is a super-popular chatbot unleashed on the world that insists on ignoring the questions it was asked, answering tangentially adjacent questions it was never asked, and taking on tasks that are fundamentally not what it was instructed to do.

Additionally, I’m concerned about the claim that GPT-5.4 Thinking can do professional tasks. If the AI can’t render an engineering-quality image, it’s hard to believe the AI can meet or exceed the performance of a human engineer. That said, there’s no doubt the model can help professionals get their work done, as long as they’re very diligent in monitoring results.

Whenever I see results like this, I become increasingly concerned about a world overrun by AI agents. Sure, the AI may sometimes know better. Humans definitely need help. But I’d really like AIs to follow our instructions. I’m not willing to accept it as our AI overlord just yet.

What do you think? Have you tried GPT-5.4 Thinking yet, or another “reasoning” style AI model? Did it give you deeper or more useful answers than earlier versions, or did you find yourself having to steer it back to the actual question?

How important are things like formatting and image generation compared to the quality of the analysis itself? Do you think more powerful “thinking” models will make AI more helpful or harder to control? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
