OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among some software developers, their adoption has begun to touch every aspect of the process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was crafted in part by some who have left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”

The user interface for OpenAI’s Codex in ChatGPT.

Credit: OpenAI

It’s clear that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently like the command-line version. Embiricos said Codex usage among external developers jumped 20-fold after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5-Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I actually like this about our team,” Embiricos said. “The version of Codex that we use is actually the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft by hand. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

Describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to reduce anthropomorphism when discussing AI models as much as possible while also describing what these systems do using analogies that make sense to general readers. People can talk to Codex like a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “colleague” and assign it tasks through the same tools they use for human coworkers. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI colleagues”

According to our interviews, the most significant example of Codex’s internal impact came from OpenAI’s development of the Sora Android app. According to Embiricos, the development tool allowed the company to create the app in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total.” The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of in-house success with Codex, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases, though the researchers noted AI may perform better on simpler tasks.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is actually a colleague in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of colleague and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.

The command-line version of OpenAI Codex running in a macOS terminal window.

Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this colleague approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end in terms of frontier benchmarking progress as AI companies pivot to simulated reasoning models and to agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely seen that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Developers build things for other people, so we’re kind of fundamentally scaling through them.”

Will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” The two men don’t project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful as a more general agent.”

This article was updated on December 12, 2025 at 6:50 PM to mention the METR study.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with nearly twenty years of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
