
ZDNET's key takeaways
- GPT-5 Pro delivers the sharpest, most actionable code analysis.
- A detail-focused prompt can push base GPT-5 toward Pro results.
- o3 remains a strong contender despite being a GPT-4 variant.
With the big news that OpenAI has released GPT-5, the team here at ZDNET is working to learn about and report on its strengths and weaknesses. In another article, I put its programming prowess to the test and came up with a less-than-impressive result.
Also: I tested GPT-5's coding skills, and it was so bad that I'm sticking with GPT-4o
When Deep Research first appeared with the OpenAI o3 LLM, I was quite impressed with what it could understand from examining a code repository. I wanted to know how well it understood the project just from the available code.
In this article, I'm examining how well the three GPT-5 variants do in examining that same code repository. We'll dig in and compare them. The results are quite interesting. Here are the four models.
- o3: A GPT-4-generation model optimized for reasoning.
- GPT-5: OpenAI's new main ChatGPT model, available to all tiers, including free.
- GPT-5 Thinking: A version of GPT-5 that OpenAI says is optimized for "architectural reflection." It is available to the $20/mo Plus and $200/mo Pro tiers.
- GPT-5 Pro: OpenAI's current $200/mo top-tier model, with the highest reasoning and context capabilities.
I gave all four models the same assignment. I connected them to my private GitHub repository for my free, open-source WordPress security plugin and its freemium add-on modules, selected Deep Research, and gave them this prompt.
Examine the repository and learn its structure and architecture. Then report back what you've learned.
For those models that asked me to choose areas of detail about what I wanted, I gave them this prompt.
Everything you can tell me, be as comprehensive as possible.
As you can see, I didn't provide any context other than the source code repo itself. That code has a README file, as well as comments throughout the code, so there was some English-language context. But most of the context has to be derived from the file structure, file names, and the code itself.
Also: The best AI for coding in 2025 (and what not to use)
From that, I hoped that the AIs would evaluate its structure, quality, security posture, and extensibility, and possibly suggest improvements. This should be relevant to ZDNET readers because it's the kind of high-judgment, detail-oriented work that AIs are being used for. It certainly can make coming up to speed on an existing coding project easier, or at least provide a foundation for initial understanding.
TL;DR summary
Other than the two prompts above, I didn't give the LLMs any guidance about what to tell me. I wanted to see how they evaluated the repository and what kind of analysis they could provide.
As you can see from this table, overall coverage varied quite a bit in scope. More checks mean more depth of coverage.
To create this aggregate, topics like "Project Purpose & Architecture," "System Architecture," and "Plugin Design & Integration" were all normalized under Purpose/Architecture. Directory/File Structure covers any section mapping folders and files. Execution Flow combines anything about how the software's code runs. Recommendations/Issues combines all discussions of modernization suggestions, open issues, and minor red flags.
In terms of overall value, I'd rank the four LLMs as follows (from best to least best).
- GPT-5 Pro: Most precise, engineering-ready, and actionable.
- GPT-5: Widest scope, excellent mapping, and defensive-coding insight.
- o3: Concise, modernization-focused, but lighter on underlying architecture.
- GPT-5 Thinking: Best onboarding narrative, least evaluative depth.
Pro, of course, is only available in the $200/mo ChatGPT Pro tier. Later in this article, I'll show one way to modify the above prompts to get GPT-5 (non-Pro) to provide a reasonably close approximation of the overall depth of the Pro response.
GPT-5 Thinking, which is a model available in the $20/mo Plus plan, was the least helpful of the group. The GPT-4-generation o3 Deep Research model still holds up, but you can see how its self-directed focus is a bit different from the other two.
Also: Google's Jules AI coding agent built a new feature I could actually ship - while I made coffee
My main conclusion is that I was a bit surprised at how close the models were to each other. GPT-5, as OpenAI promised, did seem to provide a jump in overall cognition and usefulness, but nothing I would consider game-changing.
With that, let's dive into some specific examples that help illustrate my conclusions. Each of these sections is pulled from the various reports generated and shows you how each model presented similar information.
Security posture, according to the models
Below, you can see how GPT-5 Pro names exact mechanisms (like the file-top guard, nonces, and manage_options). I've provided direct snippets here from the reports generated by all four models. GPT-5 affirms best practices but keeps it conceptual. o3 describes what happens (redirects/login flow) more than how it's hardened. GPT-5 Thinking gives the clearest "what runs when" narrative for new developers coming up to speed on the repo.
GPT-5 Pro: Most concrete, code-level
"It guards against direct file access (if (!defined('ABSPATH')) exit; at the top of PHP files). It sanitizes input where appropriate … using WordPress nonces in AJAX handlers. It uses capability checks (e.g., adding menu pages only for users with the manage_options capability … only admins see those tools). The code tries not to load unnecessary things … like only loading certain admin files in certain contexts."
GPT-5: Correct, but higher-level
"There are checks for WordPress functions before use … so the plugin behaves gracefully even on very old WordPress setups. The plugins often guard against direct file access by checking … to prevent security issues from accessing .php files directly. Add-ons verify the existence of the core before proceeding … and display an admin error if CMB2 isn't loaded."
o3: Runtime behavior, light on hardening specifics
"Purpose: My Private Site locks down an entire site so only logged-in users can view content … while protecting the rest. Overall architecture: [it] integrates deeply with WordPress's hook system and login/logout events to manage redirects and track login state."
GPT-5 Thinking: Clear execution flow, onboarding tone
"Admin vs Front-end: It checks is_admin() to determine context. If on the front-end (not admin), it retrieves the saved privacy setting and, when enabled, hooks at a point like template_redirect to redirect unauthorized visitors. Throughout this initialization, the plugin uses WordPress hooks (actions and filters) to integrate functionality."
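The mechanisms the models call out (the file-top guard, nonce verification, capability checks, and a template_redirect hook) are standard WordPress hardening idioms. Here is a minimal sketch of how they typically fit together; the hook and function names are WordPress's own, but the option name, AJAX action, and menu details are hypothetical and not taken from the actual plugin.

```php
<?php
// Block direct access: bail out unless WordPress itself loaded this file.
if ( ! defined( 'ABSPATH' ) ) {
    exit;
}

// Hypothetical enforcement hook: redirect anonymous front-end visitors.
add_action( 'template_redirect', function () {
    if ( ! is_user_logged_in() && get_option( 'example_privacy_enabled' ) ) {
        auth_redirect(); // Send the visitor to the login screen.
    }
} );

// Hypothetical admin menu page, visible only to administrators.
add_action( 'admin_menu', function () {
    add_menu_page( 'Example', 'Example', 'manage_options',
        'example-slug', 'example_render_page' );
} );

// Hypothetical AJAX handler combining a nonce check with a capability check.
add_action( 'wp_ajax_example_action', function () {
    check_ajax_referer( 'example_nonce' );
    if ( ! current_user_can( 'manage_options' ) ) {
        wp_die( 'Insufficient permissions' );
    }
    // ... perform the privileged operation ...
} );
```

This is the "what runs when" picture GPT-5 Thinking narrates: the guard executes at file load, the menu and AJAX hooks only matter in admin context, and template_redirect fires late enough on the front end to know whether the visitor is logged in.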
Licensing and update mechanism, according to the models
GPT-5 Pro didn't just describe the system; it walked through the process in sequential, operational steps, almost like a short runbook you could hand to a developer or QA tester. GPT-5 confirms the architecture but abstracts away the plumbing. GPT-5 Thinking adds a helpful "how add-ons plug into the Licenses tab" detail. o3 mostly leaves licensing internals on the cutting room floor in favor of a fairly unhelpful modernization critique.
GPT-5 Pro: Explains it step-by-step
"The core plugin provides utility functions to get and store license keys in a centralized option (jr_ps_licenses) and to contact the EDD license server for validation. Each extension plugin defines its own updater using EDD_SL_Plugin_Updater, passing the current version, the license key from the centralized store, and the EDD store URL. The core plugin's UI has a 'Licenses' tab, and extensions inject their own license fields via filters."
GPT-5: Conceptual, but accurate
"License integration: The core plugin centralizes license management … and the add-ons piggyback on the core's licensing mechanism, integrating their license fields into the core plugin's interface."
o3: Barely mentions this topic at all
The o3 report spends most of its time on modernization and architecture. It discusses configuration and update behavior but does not walk through option keys, updater classes, or the Licenses UI wiring with the same procedural detail as GPT-5 and GPT-5 Pro. So there's nothing here to quote as a demonstration.
GPT-5 Thinking: Good UI and extensibility analysis
"The add-ons rely heavily on hooks provided by the core or WordPress: They use add_filter/add_action calls to insert their logic … and use WordPress action hooks to integrate their license fields into the Licenses tab that the core plugin triggers when building the Licenses tab."
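The "centralized option plus filter injection" pattern the reports describe can be sketched in a few lines. Everything below is illustrative: the option name, filter name, and function names are hypothetical stand-ins, not the plugin's real identifiers (the actual EDD updater wiring is omitted).

```php
<?php
// Core: read one add-on's key out of the centralized license option array.
function example_get_license_key( $addon_slug ) {
    $licenses = get_option( 'example_licenses', array() );
    return isset( $licenses[ $addon_slug ] ) ? $licenses[ $addon_slug ] : '';
}

// Core: build the Licenses tab, letting extensions append their own fields.
function example_render_licenses_tab() {
    $fields = apply_filters( 'example_license_fields', array() );
    foreach ( $fields as $slug => $label ) {
        printf(
            '<label>%s <input name="example_licenses[%s]" value="%s"></label>',
            esc_html( $label ),
            esc_attr( $slug ),
            esc_attr( example_get_license_key( $slug ) )
        );
    }
}

// Add-on: inject its license field into the core plugin's Licenses tab.
add_filter( 'example_license_fields', function ( $fields ) {
    $fields['example-addon'] = 'Example Add-on License';
    return $fields;
} );
```

The design benefit GPT-5 Pro is pointing at: each add-on only registers a filter, while storage, rendering, and server validation live in one place in the core plugin.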
State management, according to the models
Both GPT-5 Pro and GPT-5 explicitly pointed out how my code uses "one option array + prune + no-op writes," which is a WordPress best practice for code maintainability. Both o3 and GPT-5 Thinking describe the lifecycle and effects (what's initialized, what loads when) rather than the exact option structure.
GPT-5 Pro: Looks at the specific storage pattern
"Settings are stored in a single serialized option … initialization routines add default keys, prune deprecated ones, and only update the option in the database if there is an actual change, avoiding unnecessary writes."
GPT-5: Also looks at the storage pattern, but more generally
"State Management: Plugin settings are stored in WordPress options as a central settings array, and the code ensures defaults are applied while removing deprecated ones on each load, but only writes to the database when changes occur."
o3: Identifies intent and behavior, but doesn't discuss internals
"The main plugin initializes defaults (installed version, first-run timestamp, etc.). On each run it ensures these options exist and, if the privacy feature is disabled, the enforcement hook is not added."
GPT-5 Thinking: Discusses basic flow and modules
"Module includes: includes admin and common modules in the back end; on the front end it retrieves the saved privacy setting and, when enabled, loads enforcement logic (e.g., in template_redirect). It registers a deactivation hook to clean up on deactivation (e.g., deleting a flag option)."
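For readers unfamiliar with the "one option array + prune + no-op writes" pattern both GPT-5 reports singled out, here is a minimal sketch of the idea. The option name, keys, and function name are hypothetical, not the plugin's real identifiers; get_option and update_option are standard WordPress Options API calls.

```php
<?php
// Hypothetical initialization routine demonstrating the pattern.
function example_init_settings() {
    $defaults = array(
        'private_site' => false,
        'landing_page' => 'login',
        'version'      => '3.0',
    );
    $deprecated = array( 'old_redirect_mode' ); // keys retired in newer versions

    $saved    = get_option( 'example_settings', array() );
    $settings = array_merge( $defaults, $saved ); // add any missing default keys

    foreach ( $deprecated as $key ) {
        unset( $settings[ $key ] );               // prune obsolete keys
    }

    // No-op writes: only touch the database when something actually changed.
    if ( $settings !== $saved ) {
        update_option( 'example_settings', $settings );
    }
    return $settings;
}
```

Keeping everything in one serialized option means a single autoloaded database row, and skipping the write when nothing changed avoids needless queries on every page load.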
What does this mean for GPT-5?
I was unimpressed with GPT-5 when it came to my coding tests. It failed half of them, an unprecedentedly bad result for a family of models that has previously been the gold standard at passing coding tests.
But GPT-5 was quite impressive in its analysis of the GitHub repository. It could be a powerful tool for onboarding new programmers, for someone adopting code, or simply for coming back up to speed on a project that's been untouched for a while.
Also: How I test an AI chatbot's coding ability - and you can, too
The GPT-4-generation o3 model is known to be a strong reasoning model, which is why it has been the basis for ChatGPT Deep Research. But GPT-5 was able to combine both breadth and detail, which is where o3 and GPT-4o were weak in previous tests.
The older models did give accurate summaries and useful suggestions, but they missed interconnections. For example, the older models were never able to show how UI flows, licensing, and update mechanisms work together.
Even the base version of GPT-5 was able to identify cross-cutting concerns without further prompting. Repository structure, backward compatibility, performance characteristics, and state management patterns all appeared in the first draft. Trying to get GPT-4 to bridge subjects is often an exercise in deep frustration.
I found GPT-5's ability to understand and explain a complex, interconnected system like my security product, all in one pass, to be a significant improvement over the GPT-4 generation.
Is GPT-5 Pro worth $200/mo?
Maybe. If you're in a real rush to get to know a project and want as much of an information dump as possible, as quickly as possible, yes. If you're operating on a large programming budget and $200/mo doesn't matter to you, yes.
But I find that cost hard to bear, especially when I have to subscribe to a wide range of AI services to evaluate them. So, now that I'm nearing the end of my one-month trial of the Pro tier, I'm planning on downgrading back to the $20/mo Plus plan.
Also: How to use GPT-5 in VS Code with GitHub Copilot
Pro's edge over GPT-5 wasn't about knowing more facts; it was about delivering those facts in a form you can act on immediately. The Pro report didn't just explain that security looked good; it cited the exact guards and checks in the code. It didn't just say licensing was centralized; it mapped the exact functions and database options involved.
Again, if you're on a time crunch, you might consider Pro. But I also think you can get the base GPT-5 to produce responses with detail like the Pro report delivered, simply by using better prompting.
That's next…
How to get Pro-level results from base GPT-5
I fed both the GPT-5 and GPT-5 Pro reports into GPT-5 and asked it for a prompt that would push the base-level GPT-5 to deliver GPT-5 Pro comprehensiveness as a result. This is that prompt, which you should add to any query where you want more complete coding information:
**High-Specificity Technical Mode:** In your answer, combine complete high-level coverage with exhaustive implementation-level detail.
- Always name exact constants, functions, classes, hooks, option names, database tables, file paths, and build tools where possible, quoting them precisely from the code or material provided.
- For each claim, explain why it's true and how you can tell (include reasoning tied to the evidence).
- For each improvement you suggest, make it actionable and reference where in the codebase it applies.
- Do not generalize when specifics are available.
- Structure the output so a developer could use it directly to verify findings or implement recommendations.
This worked fantastically well. It took ChatGPT's GPT-5 12 minutes to produce a 15,477-word document, complete with analysis and code blocks. For example, it describes how value initialization is done, and then shows the code that accomplishes it.
I think you could fine-tune this prompt and get Pro-level results without having to pay the $200/mo fee. I'm certainly going to tinker with this idea, possibly using GPT-5 to refine the specifics in the prompt for other areas I want to delve deeply into. I'll let you know how it goes.
See for yourself
I had some trouble setting up sharing for all of these long reports, so I just copied the results into Google Docs and shared them. Here are the links if you want to look at any of these reports.
- o3 Deep Research
- GPT-5 Deep Research
- GPT-5 Thinking Deep Research
- GPT-5 Pro Deep Research
- GPT-5 Deep Research with Detail Prompt
You are invited to dig into these documents and learn how my project is structured. While you may or may not care about my project, it's instructive to see how the various models perform. And while you can read the reports, my actual repo is restricted since it's my private development repository.
What about you? Have you tried using GPT-5 or GPT-5 Pro to analyze your own code? How did its insights compare to earlier models like GPT-4 or o3? Do you think the $200/month Pro tier is worth it for the extra precision, or could you get by with better prompts in the base version? Have you found AI code analysis useful for onboarding, refactoring, or improving security? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.