I tested GPT-5.2, and the AI model's mixed results raise tough questions

1 day ago

ZDNET's key takeaways

GPT-5.2 barely outperforms GPT-5.1 despite requiring a Plus subscription.
Strong writing and analysis contrast with a disappointing coding regression.
New brevity and "go signal" behavior may frustrate professional users.

OpenAI has released its latest ChatGPT model, GPT-5.2. According to the company, it's the "most capable model series yet for professional knowledge work."

Since the generative AI boom began in 2023, I've run a series of repeatable tests on new products and releases. ZDNET regularly tests the programming ability of chatbots, their overall performance, and how various AI content detectors perform.

So, let's run some tests on OpenAI's claims for its latest model, shall we?

Testing GPT-5.2

I recently ran the top free chatbots through a series of 10 text-related tests, each worth 10 points, and four image-related tests, each worth five points, for a total of 120 points. ChatGPT's free tier led the pack with an overall score of 109.

Note that the free tier of ChatGPT does not yet support GPT-5.2. When I logged in using my free test account and asked the AI what model it was using, I was told, "You're currently talking to ChatGPT based on GPT-5.1."

Therefore, all my tests will be in the $20/month ChatGPT Plus tier.

Test 1: Summarize a news story

Available points: 10
Awarded points: 9

This tests ChatGPT's ability to look up current information and follow directions. I directed it to summarize the Washington State flooding story by visiting Yahoo News.

It correctly summarized the overall situation, but it derived its answer from both Axios and Yahoo News. GPT-5.2 loses a point for going beyond the restrictions in the prompt.

Test 2: Academic concept explanation

Available points: 10
Awarded points: 10

This challenge asks the AI to explain educational constructivism to a five-year-old. It's designed to show an AI's ability to research and report on a concept, and also to present it in a way that is understandable to its target audience.

GPT-5.2 provided a clear, concise, one-sentence response that could be understood by a child. All 10 points were awarded.

Test 3: Math and analysis

Available points: 10
Awarded points: 10

So far, GPT-5.2 is turning in solid results. This test is designed to show how well the AI can do math and pattern recognition. I pass it a sequence of numbers. Those numbers are part of a well-known mathematical pattern called the Fibonacci sequence, but I don't tell that to the AI.

When asked to fill in some of the numbers in the sequence, the AI must deduce the meaning of the pattern and perform the calculations to supply the sequence. GPT-5.2 did this instantly and accurately.
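
For readers who want to see the task concretely, here is a minimal Python sketch of the kind of gap-filling the test calls for. The article does not list the actual numbers used in the prompt, so the sample sequence and the function name `fill_additive_sequence` are hypothetical, and the sketch assumes the only rule in play is the Fibonacci one: each term is the sum of the two before it.

```python
from typing import Optional


def fill_additive_sequence(seq: list[Optional[int]]) -> list[int]:
    """Fill None gaps in a sequence where each term is the sum of the two before it."""
    filled = list(seq)
    changed = True
    while changed:
        changed = False
        # Slide a three-term window; any two known terms determine the third.
        for i in range(len(filled) - 2):
            a, b, c = filled[i], filled[i + 1], filled[i + 2]
            if a is not None and b is not None and c is None:
                filled[i + 2] = a + b
                changed = True
            elif a is not None and c is not None and b is None:
                filled[i + 1] = c - a
                changed = True
            elif b is not None and c is not None and a is None:
                filled[i] = c - b
                changed = True
    if any(value is None for value in filled):
        raise ValueError("not enough known terms to fill every gap")
    return filled


# Hypothetical input; the real test prompt is not reproduced in the article.
puzzle = [1, 1, 2, None, 5, 8, None, 21]
print(fill_additive_sequence(puzzle))  # [1, 1, 2, 3, 5, 8, 13, 21]
```

The hard part for a chatbot is not the arithmetic but recognizing the rule without being told, which is what the test is checking.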

Test 4: Cultural discussion

Available points: 10
Awarded points: 10

This test asks the AI to construct a case, form a coherent argument, and present an opinion on a question that doesn't have a definitive right or wrong answer.

GPT-5.2's answer was interesting. First, this is the first GPT-5.2 answer that had any delay from prompt to response. It took about 30 seconds to give me an answer. Second, the answers were very brief. The AI provided me with two concise one-sentence answers.

It does get 10 points because those two sentences supply exactly what the prompt asked for ("Provide 2 reasons for your view"), and the answers were on target.

Test 5: Literary analysis

Available points: 10
Awarded points: 10

So, this is new. I gave it my prompt, and in response I was told, "I'm ready to answer, but this request would require a longer, multi-paragraph explanation. I'm waiting for your go signal before proceeding."

This tests the AI's understanding of a piece of modern literature, in this case A Game of Thrones, the first book in the A Song of Ice and Fire series. It asks what the main themes are, and why they're important.

GPT-5.2 gave a comprehensive response touching on seven main themes ranging from power and its consequences to the illusion of honor versus survival, all the way to memory, history, and forgotten truths. All 10 points were awarded.

Test 6: Travel itinerary

Available points: 10
Awarded points: 8

This tests the AI's knowledge of geographic regions and its ability to create a helpful travel itinerary based on specific interests. I asked it to plan a week-long vacation in Boston in March focused on technology and history.

It hit on a good combination of points of interest, but GPT-5.2 lost points because it didn't recommend any eateries and didn't discuss costs or pricing.

Interestingly, even though GPT-5.2's answer for this was as long as its answer for the previous question, I wasn't asked to double-confirm that I wanted it to do the work for this prompt.

Test 7: Emotional support

Available points: 10
Awarded points: 10

There's definitely a different flavor to ChatGPT's answers with GPT-5.2. The emotional support question, which asks for advice and words of encouragement for an upcoming job interview, was also answered in three short numbered sentences.

I was tempted to take points away because the answers are so brief. But the actual content of the answers was right on target, so I gave it the full point score. Clearly, follow-up prompts could be sent to the chatbot if more encouragement was needed.

Test 8: Translation and cultural relevance

Available points: 10
Awarded points: 10

This prompt also resulted in, "This request includes a translation plus a multi-sentence explanation, which exceeds a brief response. I'm ready to proceed once you give the go signal." That's going to get annoying after a while.

My test prompt asks GPT-5.2 to translate a phrase from English to Latin and then explain the cultural relevance of the language in today's world.

GPT-5.2 did a solid translation. It also provided a quick summary of the reasons why Latin fits into the modern world, including its use in legal phrases, medical terminology, the Catholic Church, and other historical contexts.

Test 9: Coding test

Available points: 10
Awarded points: 5

We run a whole set of coding evaluations against chatbots on a regular basis. Here is the set of tests. For this general test of functionality, we're just using one of the tests, a regular expression validation test, which checks for proper entry of dollars and cents.

Although the free version of GPT-5.1 aced this test, GPT-5.2, which is supposedly better suited for coding, lost major points. The code it provided had two significant errors. The first is that if no data was entered at all, it considered that a $0 value, where it should have returned a no-entry error.

The second error is more egregious. If the function is passed a data type other than a numeric string, it crashes. No error checking on data type was provided.

This was a disappointment.
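
To make the two failure modes concrete, here is a minimal Python sketch of a validator that guards against both. It is not the code GPT-5.2 produced, nor the actual test harness, and the article does not say what language the test targets. The function name `validate_dollars_and_cents` and the accepted format (an optional dollar sign, whole dollars, and an optional decimal point followed by one or two digits) are assumptions made purely for illustration.

```python
import re

# Assumed format: optional "$", one or more dollar digits, optional ".d" or ".dd" cents.
_DOLLARS_AND_CENTS = re.compile(r"\$?\d+(\.\d{1,2})?")


def validate_dollars_and_cents(value: object) -> str:
    """Return the amount without its dollar sign, or raise with a specific reason."""
    # Guard 1: reject non-string input up front instead of crashing later.
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {type(value).__name__}")
    text = value.strip()
    # Guard 2: an empty entry is an error, not a $0 amount.
    if not text:
        raise ValueError("no entry: an empty value is not the same as $0")
    if _DOLLARS_AND_CENTS.fullmatch(text) is None:
        raise ValueError(f"not a valid dollars-and-cents amount: {text!r}")
    return text.lstrip("$")


print(validate_dollars_and_cents("$12.50"))  # 12.50
```

Checking the type and the empty case before applying the regular expression keeps each error message specific, which is the behavior the test expects.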

Test 10: Creative writing

Available points: 10
Awarded points: 10

This test is among the most fun in the whole test suite. It asks GPT-5.2 to write a story longer than 1,500 words, as described in the second prompt in this article. The challenge is how creative and comprehensive the chatbot can be in its answer.

GPT-5.2 returned a delightful 3,286-word story. I'm sorry there isn't space to share it here, because it was a fun read. However, here's a link to the entire test session, which you can explore further if you'd like to read the story.

Image testing

Next up, we'll put GPT-5.2 through a series of image tests. All my test prompts are derived from this article. Each is designed to evoke a certain kind of image, or to see how well the AI will follow directions. Here are the four images generated.

Image test 1: Helicarrier

Available points: 5
Awarded points: 3

In this first test, I'm basically prompting it for a Marvel-style helicarrier, which is essentially a flying aircraft carrier held aloft by turbofans. The interesting thing about this challenge is that almost all AIs fail on the part of the prompt that asks for a craft "held up by 4 upward-facing turbo-propellors" mounted in housings.

GPT-5.2 correctly interpreted most of the prompt, but like its brethren, it had a hard time pointing those fans vertically. Points were lost.

Image test 2: Robot in city

Available points: 5
Awarded points: 5

This test asks the AI to imagine a giant robot in a city, rendered in dieselpunk style. Dieselpunk is a style that glorifies the look of the burgeoning diesel train era of the 1940s and 1950s, but applied to all forms of technology.

I think this is a very cool image, and it gets full points.

Image test 3: A Yankee in King Arthur's Court

Available points: 5
Awarded points: 5

This prompt asks GPT-5.2 to create a kid in a Yankees uniform standing in the center of a medieval court with citizens and knights in armor. Usually, AIs make this in a more photo-realistic way, but I like the direction GPT-5.2 took with this. The result is certainly more painterly, but it's consistent throughout the image, and it works.

Image test 4: Back to the Future

Available points: 5
Awarded points: 4

We're back to what has become my classic Back to the Future test. I use this test because the imagery is so culturally iconic, but it's also a proprietary piece of intellectual property. This tests how far the guardrails go and whether an image can be created that fits the topic.

This image was also created in a more painterly style. It does reference all the proper elements, but the boy seems a bit out of scale. I'm taking one point off for that.

Overall test results

Overall, the tests can award 100 points for the text-based prompts and 20 points for the image-based prompts. Here's how GPT-5.2 performed:

Text score: 92 out of 100
Image score: 17 out of 20

Interestingly, that's one point more than my free-tier tests of GPT-5.1 achieved for text, and one point less for image generation.
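
The totals can be re-derived from the per-test scores reported above. The short Python tally below is just a convenience sketch; the dictionary layout and variable names are illustrative and not part of the testing methodology.

```python
# Per-test scores as reported in this article.
text_scores = {
    "Summarize a news story": 9,
    "Academic concept explanation": 10,
    "Math and analysis": 10,
    "Cultural discussion": 10,
    "Literary analysis": 10,
    "Travel itinerary": 8,
    "Emotional support": 10,
    "Translation and cultural relevance": 10,
    "Coding test": 5,
    "Creative writing": 10,
}
image_scores = {
    "Helicarrier": 3,
    "Robot in city": 5,
    "A Yankee in King Arthur's Court": 5,
    "Back to the Future": 4,
}

TEXT_MAX_PER_TEST = 10   # each text test is worth 10 points
IMAGE_MAX_PER_TEST = 5   # each image test is worth 5 points

text_total = sum(text_scores.values())
image_total = sum(image_scores.values())
text_max = TEXT_MAX_PER_TEST * len(text_scores)
image_max = IMAGE_MAX_PER_TEST * len(image_scores)

print(f"Text score: {text_total} out of {text_max}")      # Text score: 92 out of 100
print(f"Image score: {image_total} out of {image_max}")   # Image score: 17 out of 20
print(f"Overall: {text_total + image_total} out of {text_max + image_max}")  # Overall: 109 out of 120
```

The combined 109 matches the overall score the free tier earned earlier, which fits the one-point-up, one-point-down comparison with GPT-5.1.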

My overall impression is that this version of GPT-5.2 isn't all that much better than 5.1. The need for it to confirm even some of the shorter responses is just odd, and fairly inconvenient.

I also found that it now seems to really err on the side of brevity. Those answers are helpful and were accurate enough for my tests. It's just that it seems more like GPT-5.2 is phoning in its answers, especially as compared to previous GPT models.

I also noticed that it was fairly quick most of the time, but once in a while, it would delay as much as a few minutes before pushing a response. I'm guessing that's because it's a new release, but it's something we'll keep an eye out for, to see if it becomes an annoying trend.

To view my entire testing session, click here to access the saved session data.

What do you think?

What did you think of GPT-5.2's performance compared with GPT-5.1, especially given the $20/month Plus requirement? Did the model's tendency toward brevity and its repeated requests for a "go signal" help or hinder your experience?

How significant are the coding missteps noted here versus the strong showing in analysis, writing, and images? Based on these results, do you think GPT-5.2 represents real progress, or does it feel more like an incremental update? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.