GPT-5.1 Brings Better Steerability and Codex Models for Agentic Coding

OpenAI's GPT-5.1 improves instruction following and code generation. Dedicated Codex models ship alongside enhanced RBAC and extended prompt caching up to 24 hours.

Maya Johnson

Thursday, November 13, 2025·3 min read

OpenAI released GPT-5.1 on November 13, 2025, focusing on steerability — the model's ability to follow complex, multi-part instructions precisely. Alongside the main model, OpenAI shipped gpt-5.1-codex and gpt-5.1-codex-mini, purpose-built for agentic coding tasks, according to OpenAI's changelog.

Steerability Over Raw Power

Rather than chasing benchmark records, GPT-5.1 prioritizes instruction following. The model defaults to a reasoning effort of none, meaning it produces no chain-of-thought unless asked, which makes responses faster and cheaper for tasks that don't require deep thinking. Developers enable reasoning explicitly when needed.

This is a pragmatic choice. Most production API calls don't need frontier-level reasoning; they need the model to do exactly what it's told, quickly and reliably.
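To make the opt-in concrete, here is a minimal sketch of how a request might be assembled. It assumes the Responses API accepts a `reasoning` object with an `effort` field; the helper `build_request`, the set of effort levels, and the prompts are illustrative assumptions, not taken from OpenAI's documentation.

```python
from typing import Optional

# Effort levels assumed for this sketch; check the API reference for the accepted set.
ALLOWED_EFFORTS = {"none", "low", "medium", "high"}


def build_request(prompt: str, effort: Optional[str] = None) -> dict:
    """Build keyword arguments for a Responses API call (hypothetical helper).

    With effort=None the "reasoning" key is omitted, so the model's
    default (no reasoning, per the article) applies.
    """
    if effort is not None and effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    request = {"model": "gpt-5.1", "input": prompt}
    if effort is not None:
        request["reasoning"] = {"effort": effort}
    return request


# Fast path: rely on the default, no reasoning requested.
fast = build_request("Extract the invoice number from this email.")

# Opt in to reasoning only for the task that needs it.
deep = build_request("Plan a migration from REST to gRPC.", effort="high")

# With the official SDK installed and an API key configured, the call would
# look roughly like this (not executed here):
#   from openai import OpenAI
#   response = OpenAI().responses.create(**deep)
```

Keeping the reasoning setting out of the default path means the cheap, fast behavior is what a caller gets unless the code says otherwise.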

Codex Models

gpt-5.1-codex and gpt-5.1-codex-mini are dedicated coding variants. They're optimized for OpenAI's Codex product — the agentic coding tool that competes with Anthropic's Claude Code and Cursor.

gpt-5.1-codex-max followed on December 4, offering maximum compute for the hardest coding problems. That makes three tiers of coding model, matching Anthropic's approach of optimizing for different complexity levels.
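One way an application might route work across the three tiers is a simple lookup. The model names below come from the article; the tier labels and the `pick_codex_model` helper are hypothetical and only illustrate the idea.

```python
# Model names are from the article; the tier labels are hypothetical.
CODEX_TIERS = {
    "light": "gpt-5.1-codex-mini",    # faster and cheaper
    "standard": "gpt-5.1-codex",      # default coding model
    "heavy": "gpt-5.1-codex-max",     # maximum compute
}


def pick_codex_model(tier: str = "standard") -> str:
    """Return the Codex model name for a tier label (hypothetical helper)."""
    try:
        return CODEX_TIERS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


model_for_lint_fix = pick_codex_model("light")
model_for_refactor = pick_codex_model("heavy")
```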

Infrastructure Updates

Two enterprise features shipped alongside the model. Enhanced RBAC (Role-Based Access Control) capabilities let organizations manage API access at a granular level. Extended prompt cache retention, up to 24 hours with GPU-local storage offloading, lowers input costs for applications that reuse the same prompt prefix across requests.

The 24-hour cache is significant. A customer service application that sends the same long prompt prefix (for example, a shared system prompt) throughout the day pays the full input price for that prefix on the first request, then cached pricing on later requests for as long as the cache entry is retained.
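The savings can be estimated with simple arithmetic. The sketch below assumes every request after the first hits the cache and that only the shared prefix is billed at the cached rate; the function name, token counts, and prices are hypothetical placeholders, not OpenAI's actual rates.

```python
from typing import Optional


def input_cost(
    requests: int,
    prefix_tokens: int,
    suffix_tokens: int,
    price_per_mtok: float,
    cached_price_per_mtok: Optional[float] = None,
) -> float:
    """Estimate total input cost in dollars for a day of requests.

    prefix_tokens: shared prompt prefix (cacheable).
    suffix_tokens: per-request text that is never cached.
    cached_price_per_mtok: None means caching is not used.
    """
    if requests <= 0:
        return 0.0
    full = price_per_mtok / 1_000_000
    if cached_price_per_mtok is None:
        return requests * (prefix_tokens + suffix_tokens) * full
    cached = cached_price_per_mtok / 1_000_000
    # The first request pays full price for everything.
    first = (prefix_tokens + suffix_tokens) * full
    # Later requests pay the cached rate on the prefix, full rate on the suffix.
    later = (requests - 1) * (prefix_tokens * cached + suffix_tokens * full)
    return first + later


# Hypothetical workload: 1,000 requests, 2,000-token prefix, 100-token suffix,
# $1.00 per million input tokens, $0.10 per million cached tokens.
without_cache = input_cost(1000, 2000, 100, 1.00)
with_cache = input_cost(1000, 2000, 100, 1.00, cached_price_per_mtok=0.10)
```

Under these placeholder numbers the cached total is a small fraction of the uncached one; the real ratio depends on actual pricing, the cache hit rate, and how much of each prompt is shared.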

The Competitive Context

GPT-5.1 launched less than two weeks before Claude Opus 4.5, which took the coding benchmark crown. Coding was the area where OpenAI was trailing: Claude had led SWE-bench since Sonnet 4.5 in September, and the dedicated Codex models were a direct response to that competitive pressure.

Our Take

GPT-5.1 is OpenAI doing the boring, important work. Steerability improvements and RBAC don't generate headlines, but they're exactly what enterprise customers ask for. The dedicated Codex line signals that OpenAI sees agentic coding as a distinct product category, not just a model feature. Whether the Codex models can catch Claude on coding benchmarks is the question that matters.

FAQ

What's new in GPT-5.1? GPT-5.1 focuses on improved steerability (instruction following), agentic coding via dedicated Codex models, enhanced RBAC for enterprises, and extended prompt caching up to 24 hours.

What are the GPT-5.1 Codex models? Three dedicated coding models: gpt-5.1-codex (standard), gpt-5.1-codex-mini (faster/cheaper), and gpt-5.1-codex-max (maximum compute). They're optimized for OpenAI's Codex agentic coding product.

How does GPT-5.1 compare to Claude? GPT-5.1 improves on instruction following and general tasks. Claude Sonnet 4.5 and Opus 4.5 lead on coding benchmarks. The models are competitive on different dimensions.
