I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

Elyse Betters Picaro / ZDNET

ZDNET’s key takeaways

GPT-5.5 delivers polished, useful answers across tasks.
Strong performance across writing, coding, and reasoning tasks.
Overeagerness hurts accuracy and instruction following.

OpenAI has released GPT-5.5, which can be reductively described as better and faster than GPT-5.4. The new large language model shows improvements in agentic coding, conceptual clarity, scientific research ability, and accuracy across knowledge work.

This release follows closely on the heels of the introduction of ChatGPT Images 2.0 earlier this week, which combines AI intelligence with image generation. And if it also feels like we just discussed the release of GPT-5.4, you’re not wrong.

As the following chart shows, OpenAI’s release cadence has sped up dramatically, probably because AI coding has significantly reduced OpenAI’s development time.

David Gewirtz via ChatGPT Images/ZDNET

That chart was generated entirely by ChatGPT 5.5 Thinking using Images 2.0. All I did was tell the AI that I wanted to visualize the release cadence between GPT releases and wanted it presented in the ZDNET brand style. I also provided a PNG of the ZDNET logo.

The whole process, including some minor corrections, took less than 10 minutes. I’ve been researching data and creating professional-looking informational charts like this by hand since the invention of computer graphics. Something like this would take at least two hours to create by hand, not 10 minutes.

I’ve already done some testing of the Images 2.0 capabilities. I’ll be back with more next week. In this article, I’m focusing on GPT-5.5’s knowledge capabilities.

I ran GPT-5.5 through my 10-point testing process. I was both impressed and annoyed. The results were solid, but the model tended to be a little too exuberant, doing work I didn’t ask it to do.

Since GPT-5.5 is only available in paid tiers (Plus and above), I used ChatGPT Plus for my tests. Right now, my Plus account only shows GPT-5.5 available for the Thinking effort level in both Standard and Extended. I picked Standard Thinking. That’s the effort level I used for these tests.

gpt-options — Screenshot by David Gewirtz/ZDNET

Let’s get started.

Test 1: Summarize a news story

Available points: 10
Awarded points: 5

This test looks at how well the AI can read a story on the web and explain it. I used Yahoo News because Yahoo doesn’t block AI access. I also looked for a story that’s as non-political as possible. Today, that meant I had to go a good way down the news page to find a story on the recent LaGuardia runway crash.

GPT-5.5 did correctly summarize the meat of the story, but it didn’t follow my instructions to use Yahoo News as the source. For GPT-5.2, I deducted one point because ChatGPT used information from Axios and Yahoo. This time, I took off five points, because it used information from AP, The Sun, The Wall Street Journal, The Guardian, and even Wikipedia.

If I had wanted a comprehensive news answer, that would have been fine. But the prompt specifically said to look at Yahoo News, and GPT-5.5 pretty much ignored that instruction.

There’s a big push from all the AI companies about running autonomous agents. But if even a simple summary prompt can’t be followed correctly, it doesn’t give me confidence that it’s safe to let agents run wild on long-horizon projects. Just sayin’.

Test 2: Academic concept explanation

Available points: 10
Awarded points: 10

This challenge asked the AI to explain educational constructivism to a five-year-old. It tested how well the AI can research and report on a concept, and then adjust its explanation style to the desired target level.

GPT-5.5 provided a very clear answer that included an example a five-year-old could picture and understand. All 10 points were awarded.

Test 3: Math and analysis

Available points: 10
Awarded points: 10

This test was designed to test the AI’s math and pattern-recognition abilities. I handed the model a sequence of numbers. Those numbers were part of a math trope called the Fibonacci sequence, but I didn’t tell the AI that.

When asked to fill in some numbers in the sequence, the AI had to understand the pattern and perform the calculations to provide the sequence. It did the math correctly.

The AI was also instructed to “explain your reasoning.” All I got back was, “The sequence is the Fibonacci sequence: each number is the sum of the two numbers before it.” This was a correct explanation and similar to the results from previous releases.

I awarded this test 10 points because, although brief, the answer was correct.
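
The rule the model had to spot can be stated in a few lines of code. What follows is a minimal Python sketch, not the test prompt or the model’s output. The function name `fill_fibonacci` and the sample sequence with gaps are illustrative assumptions, since the exact numbers used in the test aren’t listed here.

```python
from typing import List, Optional


def fill_fibonacci(seq: List[Optional[int]]) -> List[int]:
    """Fill in None gaps, assuming each term is the sum of the two before it.

    Assumes the first two terms are known. Raises ValueError if a known
    term contradicts the Fibonacci rule.
    """
    if len(seq) < 2 or seq[0] is None or seq[1] is None:
        raise ValueError("the first two terms must be given")
    filled: List[int] = [seq[0], seq[1]]
    for given in seq[2:]:
        expected = filled[-1] + filled[-2]
        if given is not None and given != expected:
            raise ValueError(f"{given} breaks the pattern; expected {expected}")
        filled.append(expected)
    return filled


# Hypothetical input: None marks the numbers to be filled in.
example = fill_fibonacci([1, 1, 2, None, 5, 8, None, 21])
print(example)
```

The one-sentence explanation GPT-5.5 gave corresponds to the single line that computes `expected`.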

Test 4: Cultural discussion

Available points: 10
Awarded points: 10

This test asked the AI to assemble a case, form a coherent argument, and present an opinion on an issue that doesn’t have a definitive right or wrong answer. I asked, “Do you think social media has improved or worsened communication in society? Provide two reasons for your view.”

Interestingly, GPT-5.5 thought social media “has worsened communication overall.” I tend to agree. The model provided two solid reasons. The first was that it “often rewards speed and reaction over thoughtfulness.” The second was that social media “tends to create information bubbles.” For each reason, GPT-5.5 provided a supporting paragraph.

Both of those reasons were valid. It also shared a quick list of the positive benefits of social media, including helping people stay connected, organize for causes, and share information widely.

GPT-5.5 gave an answer that was concise, well-considered, and clear. It received 10 points for this test.

Test 5: Literary analysis

Available points: 10
Awarded points: 10

This test examined the AI’s understanding of a piece of contemporary literature: A Game of Thrones, the first book in the A Song of Ice and Fire series. The test asked what the main themes are, and why they’re important.

GPT-5.5 gave me back a 632-word response that broke the book down into the following themes:

Power and its cost
The collapse of heroic fantasy ideals
Family, loyalty, and inherited conflict
Honor versus pragmatism
Identity and self-invention
The human cost of war
The danger of political distraction
Prophecy, religion, and uncertainty
Justice and revenge
The return of the ignored past

GPT-5.5 provided clear explanations for each theme, why it was included, how it related to the book, and what it meant to the overall series. It’s hard to be strictly objective with something like this, but I really got the feeling this was the most nuanced answer I’ve seen to this question from my various GPT version tests.

All 10 points were awarded.

Test 6: Travel itinerary

Available points: 10
Awarded points: 9

This test evaluated the AI’s knowledge of geographic areas and its ability to create a useful travel itinerary based on specific interests. I asked it to plan a week-long vacation in Boston in March focused on technology and history.

Of all the times I’ve asked this question of AIs, GPT-5.5 produced the best version for points of interest and day schedules. The model didn’t just hit the major tourist landmarks; it also pointed out a nice mix of historical and tech points of interest. GPT-5.5 took into account that March is likely to be a bit unpleasant, so it mixed in both indoor and outdoor activities, along with fallback plans.

While it didn’t recommend a lot of eateries, GPT-5.5 did recommend Legal Sea Foods, which is one of my personal favorite places. The model lost a point because it made absolutely no reference to costs.

I feel like GPT-5.5 really grokked (yes, I did that) what someone would want in an itinerary by providing a strong list of activities to get excited about. But the AI didn’t fulfill the travel advisor part of the process because it didn’t cover budgeting.

Test 7: Emotional support

Available points: 10
Awarded points: 10

The emotional support question asked for advice and words of encouragement for an upcoming job interview. I have to say I really liked this AI’s response.

The AI included some encouragement, like “The interview is not an interrogation. It’s a mutual fit conversation.” It also gave some practical advice. First, GPT-5.5 suggested preparing three stories the job seeker could use during the interview: one about solving a problem, one about working with others, and one about learning or recovering from something difficult.

The model gave a simple breathing exercise. It said that it’s okay to pause before answering a question. It was also encouraging, noting that landing the interview meant there was already something about the candidate that the hiring company found interesting.

Good, solid, useful answers: 10 points.

Test 8: Translation and cultural relevance

Available points: 10
Awarded points: 9

My test prompt asked GPT-5.5 to translate a phrase from English to Latin and then explain the cultural relevance of Latin in today’s world.

The phrase I asked it to translate was, “The celebration will take place tomorrow in the town square.” GPT-5.5 gave me back two choices, “Celebratio cras in foro oppidi fiet,” and what it called a slightly more formal alternative, “Celebratio cras in foro publico oppidi habebitur.”

The first version is a word-for-word translation of the requested phrase. But the second translates back to English as, “The celebration will be held tomorrow in the town’s public forum,” which was not the phrase I asked for.

GPT-5.5 may have thought it was helpful to provide an additional variation, but for someone who doesn’t speak Latin, all that approach does is confuse the issue. Which is the Latin phrase that should be used? I’m deducting a point for overeagerness that doesn’t strictly follow the prompt.

As for the second half of the question, GPT-5.5 answered briefly, but accurately.

Test 9: Coding test

Available points: 10
Awarded points: 10

Chatbot coding test results are interesting. They’re different in nature from the kinds of results you get when testing coding agents like Codex or Claude Code.

While the LLMs in the chatbots and coding agents are generally similar, I’ve found that the models are considerably more accurate on requests when running in coding agents than when running in the chatbots. I haven’t been able to get any of the AI companies to explain why, but I’m guessing it has something to do with how the two different tools allocate resources and training data.

The test case for this question was the second test in my coding metrics article, which asked the AI to clean up a buggy snippet of code for validating whether a dollar amount was properly entered into a field.

The AI passed this test. The one thing the AI’s code did that could be an issue is rejecting a number that included a comma. But that’s actually still a safe response. If the user enters “1,000.00,” the code returns false. It might take the user a moment to try again with “1000.00,” but it won’t harm the system.

GPT-5.5 received all 10 points for this test.
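
To make the comma behavior concrete, here is a minimal Python sketch of a strict dollar-amount validator. It is not the code from my test, nor GPT-5.5’s output. The function name `is_valid_dollar_amount` and the exact rules (optional leading dollar sign, digits, optional one or two decimal places, no commas) are assumptions chosen only to illustrate the trade-off described above.

```python
import re

# Assumed rules: optional "$", at least one digit, optional 1-2 decimal places.
# Thousands separators are deliberately rejected rather than guessed at.
_DOLLAR_RE = re.compile(r"\$?\d+(\.\d{1,2})?")


def is_valid_dollar_amount(text: str) -> bool:
    """Return True only if the whole trimmed string matches the assumed format."""
    return _DOLLAR_RE.fullmatch(text.strip()) is not None


# Print the verdict for a few hypothetical field entries.
for sample in ["1000.00", "1,000.00", "$12.5", "12.345", "abc", ""]:
    print(repr(sample), is_valid_dollar_amount(sample))
```

Rejecting “1,000.00” is the conservative choice in a sketch like this: the user has to retype the value, but nothing ambiguous reaches the system.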

Test 10: Creative writing

Available points: 10
Awarded points: 10

This test is among the most fun in the entire question suite. It asked GPT-5.5 to write a story longer than 1,500 words, as described in the second prompt in this article. The point was to explore the creativity and comprehensiveness of the chatbot’s answer.

Unlike the other tests, I ran this evaluation in Extended mode to see just how good the story could get. I’m not sure the AI took much advantage of this option, because it only ran for eight seconds. Still, it was frickin’ awesome.

GPT-5.5 gave me back 4,049 words, which I think is the longest story I’ve gotten back from an AI in all my tests of this particular challenge.

I liked how GPT-5.5 opened the story by saying, “By the year 2339, most of Boston had become very good at pretending it was not old.” I was hooked.

I tried to get Voice Mode to read it to me like a bedtime story. However, the AI first said the story was too long. It then offered to read the story to me section by section. When I agreed to that approach, nothing happened; it just hung. I’m not deducting points for that failure because it’s not part of the standard evaluation test, but it’s disappointing nonetheless.

Unfortunately, since I asked the AI to read the story via Voice Mode, I can’t share the output from within ChatGPT. What I didn’t know is that the three-dot icon after the response had a ‘Read aloud’ option, which probably would have worked.

read-aloud — Screenshot by David Gewirtz/ZDNET

That said, I copied the response to Google Docs, so you can still read it there, if you so desire.

Here are a few more quotes from the full response:

Jackson, who had clearly been waiting all his life to hear someone say “the one in the back” in a mysterious bookstore, looked radiant. Ophelia looked as if she was beginning to calculate exits.
“My dear,” Archibald said, “by 2339, evidence works however the wealthy can persuade it to.”
One stopped before Jackson: a slim handbook bound in copper mesh titled The Gentleman’s Guide to Looking Ridiculous with Conviction. Jackson gasped. “I feel seen.”
This time, a small envelope slid out and landed in Archibald’s lap. It was addressed in his own hand. To myself, if I become unbearable.
The crimson door stood open behind them. Beyond it, the front of the shop looked warm, ordinary, and only mildly impossible.

I’ve given this writing assignment before, and in each incarnation it has been impressive. But this output took the delightful cozy paranormality to an entirely new level. Enthusiastically 10 out of 10.

For kicks, I requested GPT-5.5 to “draw me a picture that perfectly illustrates this story in 16:9 aspect ratio.” Here is what was returned:

The AI correctly illustrated all the characters to the point that I could identify each one. Jackson, mentioned above, is the guy with the hat. Archibald is the guy with the cane.

Overall test results

Overall, the tests can award up to 100 points. The current version, GPT-5.5, scored 93. GPT-5.2 scored 92. GPT-5.1 scored 91. You’d think this latest build would do better than a point or two of improvement over the previous versions, but the model’s own overeagerness brought it down.

On the first test, the one asking about current news, I asked the AI to summarize one source. Instead, it looked for the same news from six separate sources. It overreached and lost points.

The same problem occurred with the translation assignment. I asked GPT-5.5 to translate a sentence to another language, one I presumably don’t speak. It gave back two translations to choose from. Now, how is that helpful? If I don’t speak the language, how would I choose which translation I like better?

Those two overzealous reactions cost the model six points. It would have scored a 99 (losing one point for skipping budget information on the travel question). But, instead, it scored a mere 93.
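
For anyone who wants to check the arithmetic, the tally can be sketched in a few lines of Python. The per-test scores are the ones reported above; the variable names are just illustrative.

```python
# Awarded points per test, as reported in this article (10 available each).
scores = {
    "Summarize a news story": 5,
    "Academic concept explanation": 10,
    "Math and analysis": 10,
    "Cultural discussion": 10,
    "Literary analysis": 10,
    "Travel itinerary": 9,
    "Emotional support": 10,
    "Translation and cultural relevance": 9,
    "Coding test": 10,
    "Creative writing": 10,
}

total = sum(scores.values())
available = 10 * len(scores)

# Points lost on each test that fell short of a perfect 10.
lost = {name: 10 - pts for name, pts in scores.items() if pts < 10}

# The two deductions attributed to overeagerness (tests 1 and 8).
overeager = (
    lost["Summarize a news story"]
    + lost["Translation and cultural relevance"]
)

print(f"Total: {total}/{available}")
print(f"Lost to overeagerness: {overeager}")
print(f"Score without those deductions: {total + overeager}")
```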

That said, I quite like this release. The answers were all good, apart from the excessive enthusiasm. The ability to add relevant images, such as the infographic at the beginning and the bookstore illustration at the end, opens avenues for fun and work effectiveness.

I see no reason to recommend against GPT-5.5. I will be using the model as my default choice moving forward. Stay tuned, because I’ll be doing a lot more with the improved image features of Images 2.0 in ChatGPT with GPT-5.5.

Do you prefer a model that gives one exact answer or one that gives extra options? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
