Inside OpenAI’s Quest to Make AI Do Anything for You


Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions.

Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.

“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told TechCrunch, describing MathGen’s early work.

OpenAI’s models are far from perfect today: the company’s latest AI systems still hallucinate, and its agents struggle with complex tasks.

But its state-of-the-art models have improved significantly at mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a math competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and eventually power the general-purpose agents that the company has always dreamed of building.

ChatGPT was a happy accident, a low-key research preview turned viral consumer business, but OpenAI’s agents are the product of a years-long, deliberate effort within the company.

“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”

OpenAI CEO Sam Altman speaks during the OpenAI DevDay event on November 6, 2023 in San Francisco, California. Image Credits: Justin Sullivan / Getty Images

Whether agents will fulfill Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.

Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.

The reinforcement learning renaissance

The rise of OpenAI’s reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices were correct or not in simulated environments.
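The core RL loop the article describes, an agent trying actions and learning from reward feedback alone, can be illustrated with a toy example. This is a minimal sketch, not OpenAI’s actual training setup: a simple bandit learner that discovers the best of three actions purely from the rewards a simulated environment hands back.

```python
import random

def train_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Learn the value of each action purely from reward feedback.

    `rewards` is the (hidden-to-the-agent) payoff of each action; the agent
    only observes the reward of the action it actually takes.
    """
    rng = random.Random(seed)
    values = [0.0] * len(rewards)   # the agent's running estimates
    counts = [0] * len(rewards)
    for _ in range(steps):
        # Mostly exploit the best-looking action, occasionally explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))
        else:
            action = max(range(len(rewards)), key=lambda i: values[i])
        reward = rewards[action]            # the environment scores the choice
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # incremental mean
    return values

values = train_bandit([0.1, 0.9, 0.3])
print(values.index(max(values)))  # the agent has learned action 1 pays best
```

The same feedback-driven idea, scaled up enormously and applied to language models, is what the rest of the article is about.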

RL has been used for decades. For instance, in 2016, about a year after OpenAI was founded in 2015, AlphaGo, an AI system created by Google DeepMind using RL, gained global attention after beating a world champion in the board game Go.

South Korean professional Go player Lee Se-dol (R) prepares for his fourth match against Google’s artificial intelligence program, AlphaGo, during the Google DeepMind Challenge Match on March 13, 2016 in Seoul, South Korea. Lee Se-dol played a five-game match against the program. (Photo by Google via Getty Images)

Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.

By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data and large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.

It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before providing an answer.

This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.
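At the prompting level, the chain-of-thought idea is simply to ask the model to write out intermediate steps before committing to an answer, rather than answering in one shot. The template below is a hypothetical illustration of that difference; the function name and wording are mine, not OpenAI’s.

```python
def direct_prompt(question: str) -> str:
    """A one-shot prompt: the model must answer immediately."""
    return f"Question: {question}\nAnswer:"

def cot_prompt(question: str) -> str:
    """A chain-of-thought prompt: the model is asked to reason step by step,
    checking its work, before giving a final answer. This is where the extra
    test-time compute gets spent."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, verifying each step, "
        "and give the final answer on the last line."
    )

print(cot_prompt("What is 12 * 13?"))
```

The payoff described in the article is that the tokens spent on those intermediate steps act as scratch space the model can check and backtrack over.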

“I could see the model starting to reason,” said El Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”

Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful for powering AI agents.

“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”

Scaling reasoning

With AI reasoning models, OpenAI discovered it had two new axes along which to improve AI models: using more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question.

“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.

Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an “Agents” team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called “Agents,” OpenAI didn’t initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.

Eventually, the work of Selsam’s Agents team became part of a larger project to create the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

Ilya Sutskever, Russian-Israeli-Canadian computer scientist and co-founder and chief scientist of OpenAI, speaks at Tel Aviv University in Tel Aviv on June 5, 2023. (Photo by JACK GUEZ / AFP) Image Credits: Getty Images

OpenAI would have to divert precious resources, chiefly talent and GPUs, to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.

“One of the core components of OpenAI is that everything in research is bottom-up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”

Some former employees say that the startup’s mission to create AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn’t always possible at competing AI labs.

The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.

What does it mean for an AI to “reason”?

In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT’s UX has been filled with more human-sounding features such as “thinking” and “reasoning.”

When asked whether OpenAI’s models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.

“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,” said El Kishky.

Lightman takes the approach of focusing on the model’s results, and not as much on the means or their relation to human brains.

The OpenAI logo on screen at the company’s developer day stage. Image Credits: Devin Coldewey

“If the model is doing hard things, then it is doing some necessary approximation of reasoning it needs in order to do that,” said Lightman. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”

OpenAI’s researchers note that people may disagree with their nomenclature or definitions of reasoning, and certainly, critics have emerged, but they argue it’s less important than the capabilities of their models. Other AI researchers tend to agree.

Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are man-made systems inspired by nature, human reasoning and bird flight, respectively, but they operate through entirely different mechanisms. That doesn’t make them any less useful, or any less capable of achieving similar outcomes.

A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and more research is needed. It may be too early to confidently claim what exactly is going on inside them.

The adjacent frontier: AI agents for subjective tasks

The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI’s Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic’s models have become particularly popular in AI coding tools like Cursor and Claude Code; these are some of the first AI agents that people are willing to pay up for.

However, general-purpose AI agents like OpenAI’s ChatGPT Agent and Perplexity’s Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I’ve found the agents take longer than I’d like and make silly mistakes.

Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.

AI applications (Photo by Jonathan Raa/NurPhoto via Getty Images)

“Like many problems in machine learning, it’s a data problem,” said Lightman, when asked about the limitations of agents on subjective tasks. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.”

Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques that allow them to teach AI models skills that aren’t easily verified. This was how the company built the model that achieved a gold medal at IMO, he said.

OpenAI’s IMO model was a newer AI system that spawns multiple agents, which simultaneously explore several ideas and then choose the best possible answer. These types of AI models are becoming more popular; Google and xAI have recently released state-of-the-art models using this technique.
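The spawn-many-agents-then-pick pattern the article describes is a form of best-of-n sampling: generate several candidate answers in parallel, score each with a verifier, and keep the highest-scoring one. Here is a minimal, generic sketch of that selection step; the function names and the toy square-root task are illustrative assumptions, not OpenAI’s implementation.

```python
import random

def best_of_n(solve, score, n=8, seed=0):
    """Sample n candidate answers and return the one the verifier scores highest.

    `solve(rng)` produces one candidate (one "agent's" attempt);
    `score(candidate)` is the verifier, where higher means better.
    """
    rng = random.Random(seed)
    candidates = [solve(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy usage: three "agents" guess sqrt(2); the verifier checks how close
# each guess squares back to 2, and the best guess wins.
guesses = iter([1.1, 1.45, 1.9])
best = best_of_n(lambda rng: next(guesses), lambda x: -abs(x * x - 2), n=3)
print(best)  # 1.45, since 1.45**2 = 2.1025 is closest to 2
```

The hard part in practice, as the researchers quoted here note, is building a reliable verifier for tasks that aren't as checkable as math.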

“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”

These techniques may help OpenAI’s models become more performant, gains that could show up in the company’s upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.

But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to create AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.

These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it to be done. That’s a much different product than what ChatGPT is today, but the company’s research is squarely headed in this direction.

While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a host of worthy opponents. The question is no longer just whether OpenAI can deliver its agentic future, but whether the company can do so before Google, Anthropic, xAI, or Meta beat them to it.
