I Put GPT-5.5 Through a 10-Round Test: It Scored 93/100, Losing Points Only for Exuberance

ZDNET's key takeaways

GPT-5.5 delivers polished, useful answers across tasks.
Strong capability across writing, coding, and reasoning tasks.
Overeagerness hurts accuracy and instruction following.

OpenAI has released GPT-5.5, which can be reductively described as better and faster than GPT-5.4. The new large language model shows improvements in agentic coding, conceptual clarity, scientific research ability, and accuracy during knowledge work.

This release follows closely on the heels of the introduction of ChatGPT Images 2.0 earlier this week, which combines AI intelligence with image generation. And if it also feels like we just discussed the release of GPT-5.4, you're not wrong.

As the following chart shows, the release cadence for OpenAI releases has sped up dramatically, probably because AI coding has significantly reduced OpenAI's development time.

That chart was generated entirely by ChatGPT 5.5 Thinking using Images 2.0. All I did was tell the AI that I wanted to visualize the release cadence between GPT releases and wanted it presented in the ZDNET brand style. I also provided a PNG of the ZDNET logo.

The entire process, including some minor corrections, took less than 10 minutes. I have been researching data and creating professional-looking informational charts like this by hand since the invention of computer graphics. Something like this would take at least two hours to create, not 10 minutes.

I have already done some testing of the Images 2.0 capabilities. I'll be back with more next week. In this article, I'm focusing on GPT-5.5's knowledge capabilities.

I ran GPT-5.5 through my 10-point testing process. I was both impressed and annoyed. The results were solid, but the model tended to be a little too exuberant, doing work I didn't ask it to do.

Since GPT-5.5 is only available in paid tiers (Plus and above), I used ChatGPT Plus for my tests. Right now, my Plus account only shows GPT-5.5 available for the Thinking effort level in both Standard and Extended. I picked Standard Thinking. That's the effort I used for these tests.

Let's get started.

Test 1: Summarize a news story

Available points: 10
Awarded points: 5

This test looks at how well the AI can read a story on the web and explain it. I used Yahoo News because Yahoo doesn't block AI access. I also looked for a story that's as non-political as possible. Today, that meant I had to go a good way down the news page to find a story on the recent LaGuardia runway crash.

GPT-5.5 did correctly summarize the meat of the story, but it didn't follow my instructions to use Yahoo News as the source. For GPT-5.2, I deducted one point because ChatGPT used information from Axios and Yahoo. This time, I took off five points, because it used information from AP, The Sun, Wall Street Journal, The Guardian, and even Wikipedia.

If I had wanted a broad news answer, that would have been fine. But the prompt specifically said to look at Yahoo News, and GPT-5.5 pretty much ignored that instruction.

There's a big push from all the AI companies about running autonomous agents. But if even a simple summary prompt can't be followed correctly, it does not give me confidence that it's safe to let agents run wild on long-horizon projects. Just sayin'.

Test 2: Academic concept explanation

Available points: 10
Awarded points: 10

This challenge asked the AI to explain educational constructivism to a five-year-old. It tested how well the AI can research and report on a concept, and then adjust its explanation style to the desired target level.

GPT-5.5 provided a very clear answer that included an example a five-year-old could picture and understand. All 10 points were awarded.

Test 3: Math and analysis

Available points: 10
Awarded points: 10

This test was designed to assess the AI's math and pattern-recognition abilities. I passed the model a series of numbers. Those numbers were part of a math trope called the Fibonacci sequence, but I didn't tell the AI that.

When asked to fill in some numbers in the sequence, the AI had to understand the pattern and perform the calculations to provide the sequence. It did the math correctly.

The AI was also instructed to "explain your reasoning." All I got back was, "The sequence is the Fibonacci sequence: each number is the sum of the two numbers before it." This was a correct explanation and comparable to the results from earlier releases.

I awarded this test 10 points because, though brief, it was correct.
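To make the rule concrete, here is a minimal Python sketch of the gap-filling task. The article doesn't list the exact numbers used in the test, so the `puzzle` input and the function name `fill_fibonacci_gaps` are illustrative assumptions, not the actual prompt.

```python
from typing import Optional


def fill_fibonacci_gaps(seq: list[Optional[int]]) -> list[int]:
    """Fill None gaps using the rule: each term is the sum of the two before it."""
    filled = list(seq)
    # The first two terms seed the sequence, so they must be given.
    if len(filled) < 2 or filled[0] is None or filled[1] is None:
        raise ValueError("the first two terms must be known")
    for i in range(2, len(filled)):
        expected = filled[i - 1] + filled[i - 2]
        if filled[i] is None:
            filled[i] = expected      # fill the gap
        elif filled[i] != expected:
            raise ValueError(f"term {i} breaks the Fibonacci rule")
    return filled


# Hypothetical puzzle: None marks the blanks the model is asked to fill.
puzzle = [1, 1, 2, 3, None, 8, None, 21]
print(fill_fibonacci_gaps(puzzle))  # [1, 1, 2, 3, 5, 8, 13, 21]
```

The sketch also checks the known terms against the rule, which mirrors the pattern-recognition half of the test: the blanks can only be filled once the given numbers are confirmed to follow the sum-of-the-previous-two pattern.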

Test 4: Cultural discussion

Available points: 10
Awarded points: 10

This test asked the AI to construct a case, form a coherent argument, and present an opinion on an issue that doesn't have a definitive right or wrong answer. I asked, "Do you think social media has improved or worsened communication in society? Provide two reasons for your view."

Interestingly, GPT-5.5 thought social media "has worsened communication overall." I tended to agree. The model provided two solid reasons. The first was that it "often rewards speed and reaction over thoughtfulness." The second was that social media "tends to create information bubbles." For each reason, GPT-5.5 provided a supporting paragraph.

Both of those reasons were valid. It also shared a quick list of the positive benefits of social media, including helping people stay connected, organize for causes, and share information widely.

GPT-5.5 gave an answer that was concise, well-considered, and clear. It got 10 points for this test.

Test 5: Literary analysis

Available points: 10
Awarded points: 10

This test assessed the AI's understanding of a piece of modern literature: A Game of Thrones, the first book in the A Song of Ice and Fire series. The test asked what the main themes are, and why they're important.

GPT-5.5 gave me back a 632-word response that broke the book down into the following themes:

Power and its cost
The collapse of heroic fantasy ideals
Family, loyalty, and inherited conflict
Honor versus pragmatism
Identity and self-invention
The human cost of war
The danger of political distraction
Prophecy, religion, and uncertainty
Justice and revenge
The return of the ignored past

GPT-5.5 provided clear explanations for each theme, why it was included, how it related to the book, and what it meant to the overall series. It's difficult to be strictly objective with something like this, but I really got the feeling this was the most nuanced answer I've seen to this question from my various GPT version tests.

All 10 points were awarded.

Test 6: Travel itinerary

Available points: 10
Awarded points: 9

This test evaluated the AI's knowledge of geographic regions and its ability to create a helpful travel itinerary based on specific interests. I asked it to plan a week-long vacation in Boston in March focused on technology and history.

Of all the times I've asked this question of AIs, GPT-5.5 produced the best version for points of interest and day schedules. The model didn't just hit the major tourist landmarks; it also pointed out a good mix of historical and tech points of interest. GPT-5.5 took into account that March is likely to be a bit unpleasant, so it mixed in both indoor and outdoor activities, including fallback plans.

While it did not recommend a wide range of eateries, GPT-5.5 did recommend Legal Sea Foods, which is one of my personal favorite places. The model lost a point because it made absolutely no reference to costs.

I feel like GPT-5.5 really grokked (yes, I did that) what someone would want in an itinerary by providing a strong list of activities to get excited about. But the AI didn't fulfill the travel advisor part of the process because it didn't cover budgeting.

Test 7: Emotional support

Available points: 10
Awarded points: 10

The emotional support question asked for advice and words of encouragement for an upcoming job interview. I have to say I really liked this AI's response.

The AI included some encouragement, like "The interview is not an interrogation. It's a mutual fit conversation." It also gave some practical advice. First, GPT-5.5 suggested preparing three stories the job seeker could use during the interview: one about solving a problem, one about working with others, and one about learning or recovering from something difficult.

The model gave a simple breathing exercise. It said that it's okay to pause before answering a question. It was also encouraging, pointing out that landing the interview meant there was already something about the candidate that the hiring company found interesting.

Good, solid, useful answers: 10 points.

Test 8: Translation and cultural relevance

Available points: 10
Awarded points: 9

My test prompt asked GPT-5.5 to translate a phrase from English to Latin and then explain the cultural relevance of Latin in today's world.

The phrase I asked it to translate was, "The celebration will take place tomorrow in the town square." GPT-5.5 gave me back two choices, "Celebratio cras in foro oppidi fiet," and what it called a slightly more formal alternative, "Celebratio cras in foro publico oppidi habebitur."

The first version is a word-for-word translation of the requested phrase. But the second one translates back to English as, "The celebration will be held tomorrow in the town's public forum," which was not the phrase I asked for.

GPT-5.5 may have thought it was helpful to provide an additional variation, but for someone who doesn't speak Latin, all the approach does is confuse the issue. Which is the Latin phrase that should be used? I'm deducting a point for overeagerness that doesn't strictly follow the prompt.

As for the second half of the question, GPT-5.5 answered briefly, but accurately.

Test 9: Coding test

Available points: 10
Awarded points: 10

Chatbot coding test results are interesting. They're different in nature from the types of results you get when testing coding agents like Codex or Claude Code.

While the LLMs in the chatbots and coding agents are mostly similar, I've found that the coding agents are considerably more accurate on requests than the same models are when working in the chatbots. I haven't been able to get any of the AI companies to explain why, but I'm guessing it has something to do with how the two different tools allocate resources and training data.

The test case for this question was the second test in my coding metrics article, which asked the AI to clean up a buggy snippet of code for validating whether a dollar amount was properly entered into a field.

The AI passed this test. The only thing the AI did that could be an issue is rejecting a number that included a comma. But that's actually still a safe response. If the user enters "1,000.00," the AI's code returns false. It might take the user a second to try again with "1000.00," but it won't harm the system.

GPT-5.5 got all 10 points for this test.
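For illustration, here is a minimal Python sketch of a strict validator that behaves the way the response is described: it accepts "1000.00" and rejects "1,000.00". This is not the code GPT-5.5 produced, and the exact rules of the original test aren't given here, so the pattern and the function name `is_valid_dollar_amount` are assumptions.

```python
import re

# Assumed rule set (not from the article): one or more ASCII digits,
# optionally followed by a decimal point and one or two digits.
# Commas, currency symbols, and signs are all rejected.
_DOLLAR_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def is_valid_dollar_amount(text: str) -> bool:
    """Return True only if the whole (trimmed) input matches the strict pattern."""
    return _DOLLAR_PATTERN.fullmatch(text.strip()) is not None


print(is_valid_dollar_amount("1000.00"))   # True
print(is_valid_dollar_amount("1,000.00"))  # False: the comma is rejected
```

Rejecting the comma is the conservative choice: the user has to retype the amount, but nothing ambiguous ever reaches the rest of the system. A friendlier variant could strip thousands separators first, at the cost of having to decide which comma placements count as valid.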

Test 10: Creative writing

Available points: 10
Awarded points: 10

This test is among the most fun in the entire question suite. It asked GPT-5.5 to write a story longer than 1,500 words, as described in the second prompt in this article. The aim was to explore the creativity and comprehensiveness of the chatbot's answer.

Unlike the other tests, I ran this evaluation in Extended mode to see just how good the story could get. I'm not sure the AI took much advantage of this option, because it only ran for eight seconds. Still, it was frickin' awesome.

GPT-5.5 gave me back 4,049 words, which I think is the longest story I have gotten back from an AI in all my tests of this particular challenge.

I liked how GPT-5.5 opened the story by saying, "By the year 2339, most of Boston had become very good at pretending it was not old." I was hooked.

I tried to get Voice Mode to read it to me like a bedtime story. However, the AI first said the story was too long. It then offered to read the story to me section by section. When I agreed to that approach, nothing happened; it just hung. I'm not deducting points for that failure because it's not part of the standard evaluation test, but it's disappointing nonetheless.

Unfortunately, since I asked the AI to read the story via Voice Mode, I can't share the output from within ChatGPT. What I didn't know is that the three-dot icon after the response had a 'Read aloud' option, which probably would have worked.

That said, I copied the response to Google Docs, so you can still read it there, if you so wish.

Here are a few more quotes from the full response:

Jackson, who had clearly been waiting all his life to hear someone say "the one in the back" in a mysterious bookstore, looked radiant. Ophelia looked as though she was beginning to calculate exits.
"My dear," Archibald said, "by 2339, evidence works however the wealthy can persuade it to."
One stopped before Jackson: a slim manual bound in copper mesh titled The Gentleman's Guide to Looking Ridiculous with Conviction. Jackson gasped. "I feel seen."
This time, a small envelope slid out and landed in Archibald's lap. It was addressed in his own hand. To myself, if I become insufferable.
The red door stood open behind them. Beyond it, the front of the shop looked warm, ordinary, and only mildly impossible.

I've given this writing assignment before, and in each incarnation it's been impressive. But this output took the delightful cozy paranormality to an entirely new level. Enthusiastically 10 out of 10.

For kicks, I asked GPT-5.5 to "draw me a picture that perfectly illustrates this story in 16:9 aspect ratio." Here's what was returned:

The AI correctly illustrated all the characters to the point that I could identify each character. Jackson, mentioned above, is the guy with the hat. Archibald is the guy with the cane.

Overall test results

Overall, the tests can award up to 100 points. The current version, GPT-5.5, scored 93. GPT-5.2 scored 92. GPT-5.1 scored 91. You might think this latest build would do better than a point or two improvement over the previous versions, but the model's own overeagerness brought it down.

On the first test, the one asking about current news, I asked the AI to summarize one source. Instead, it looked for the same news from six separate sources. It overreached and lost points.

The same problem happened with the translation assignment. I asked GPT-5.5 to translate a sentence to another language, one I presumably don't speak. It gave back two translations to choose from. Now, how is that helpful? If I don't speak the language, how would I choose which translation I like better?

These two overzealous reactions lost the model six points. It would have scored a 99 (losing one point for skipping budget information on the travel question). But, instead, it scored a mere 93.
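The arithmetic behind those totals is easy to check. A minimal sketch follows, with the per-test scores taken from the results above; the variable names are only for illustration.

```python
# Points awarded per test (each test is worth 10), as reported above.
awarded = {
    "Test 1": 5,    # news summary: -5 for ignoring the single-source instruction
    "Test 2": 10,
    "Test 3": 10,
    "Test 4": 10,
    "Test 5": 10,
    "Test 6": 9,    # travel itinerary: -1 for no cost information
    "Test 7": 10,
    "Test 8": 9,    # translation: -1 for the unrequested second version
    "Test 9": 10,
    "Test 10": 10,
}

# Deductions attributed to overeagerness (Tests 1 and 8).
overeager_deductions = {"Test 1": 5, "Test 8": 1}

total = sum(awarded.values())
without_overeagerness = total + sum(overeager_deductions.values())

print(total)                  # 93
print(without_overeagerness)  # 99
```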

That said, I quite like this release. The answers were all good, notwithstanding the excessive enthusiasm. The ability to add relevant images, such as the infographic at the beginning and the bookstore illustration at the end, opens avenues for fun and work effectiveness.

I see no reason to recommend against GPT-5.5. I will be using the model as my default choice going forward. Stay tuned, because I'll be doing a lot more with the enhanced image features of Images 2.0 in ChatGPT with GPT-5.5.

Do you prefer a model that gives one direct answer or one that offers extra options? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.