Claude Opus 4.5 Scores Highest on Engineering Exam, Leads Agentic Benchmarks
Anthropic's Opus 4.5 exceeded all human candidates on the company's internal engineering exam, leads SWE-bench Verified, and introduces an effort parameter for controlling the speed-capability tradeoff.
Maya Johnson
Anthropic released Claude Opus 4.5 on November 24, 2025, calling it the "best model in the world for coding, agents, and computer use." The claim is backed by numbers: it leads SWE-bench Verified, shows a 10.6% improvement over Sonnet 4.5 on the Aider Polyglot coding benchmark, and scored 29% higher on Vending-Bench for long-horizon tasks, according to Anthropic.
The Engineering Exam Result
The standout detail: Opus 4.5 exceeded all human candidates on Anthropic's internal engineering exam. This isn't a public benchmark designed for AI; it's the actual test Anthropic gives to engineering job applicants, and the model outscored every human who took it.
That's a different kind of milestone than benchmark leaderboards. It suggests the model has crossed a threshold where it can reliably perform professional-level software engineering work, not just solve isolated coding puzzles.
The Effort Parameter
Opus 4.5 introduced a new "effort parameter" that lets developers control the speed-capability tradeoff. Lower effort settings produce faster, cheaper responses for simple tasks. Higher settings enable deeper reasoning for complex problems. This makes Opus 4.5 more practical for production use where not every query needs maximum compute.
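A rough sketch of how a developer might wire this into a request. The exact field name and placement of the effort setting are assumptions for illustration; the payload shape otherwise follows the familiar Messages API request body, so check Anthropic's API reference before relying on it.

```python
# Sketch of a Messages API request payload with an effort level.
# NOTE: the "effort" field name and its valid values are assumptions
# based on the description above, not a confirmed API contract.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a hypothetical request body with a speed-capability knob."""
    assert effort in ("low", "medium", "high"), "unknown effort level"
    return {
        "model": "claude-opus-4-5-20251101",
        "max_tokens": 1024,
        # Hypothetical knob: lower effort = faster, cheaper responses;
        # higher effort = deeper reasoning for complex problems.
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

routine = build_request("Rename this variable across the file.", effort="low")
hard = build_request("Find the race condition in this module.", effort="high")
```

The practical pattern is the one Anthropic describes: default to a low setting for routine queries and reserve high effort for the problems that actually need the compute.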
Desktop App With Parallel Agent Sessions
The release included desktop app support with parallel agent sessions — multiple Claude agents running simultaneously on different tasks. This is a preview of the multi-agent architecture that would later become agent teams in Opus 4.6.
Pricing and Context
Opus 4.5 costs $5/$25 per million tokens — a significant drop from Opus 4's $15/$75. The 200K context window and 64K max output match Sonnet 4.5. Model ID: claude-opus-4-5-20251101.
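To make the price drop concrete, here is a small cost calculation at the list prices quoted above (the token counts in the example request are arbitrary, chosen only for illustration):

```python
# Per-million-token list prices in USD, from the announcement.
OPUS_4 = {"input": 15.00, "output": 75.00}
OPUS_4_5 = {"input": 5.00, "output": 25.00}

def cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given per-million-token prices."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# Example request: 100K input tokens, 10K output tokens.
old = cost(OPUS_4, 100_000, 10_000)    # 1.50 + 0.75 = $2.25
new = cost(OPUS_4_5, 100_000, 10_000)  # 0.50 + 0.25 = $0.75
print(f"${old:.2f} -> ${new:.2f} ({1 - new / old:.0%} cheaper)")
# prints "$2.25 -> $0.75 (67% cheaper)"
```

Because both input and output rates fell by the same factor, every request is uniformly two-thirds cheaper regardless of its input/output mix.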
For comparison: GPT-5 launched in August at competitive pricing, and Gemini 3 Pro was about to ship. The LLM market was entering its most competitive period, with three strong contenders releasing frontier models within weeks of each other.
Our Take
The pricing restructure is the real story here. Opus dropped from $15/$75 to $5/$25 — a 67% price cut — while getting significantly better. That's Anthropic acknowledging that the Opus tier needs to be accessible enough for production use, not just occasional hard problems. The effort parameter makes this practical: you can run Opus at low effort for routine work and high effort for the hard stuff, keeping costs manageable.
FAQ
How much does Claude Opus 4.5 cost?
Opus 4.5 costs $5 per million input tokens and $25 per million output tokens. This is a 67% reduction from Opus 4's pricing of $15/$75. The model ID is claude-opus-4-5-20251101.
What is the effort parameter?
The effort parameter lets developers control how much reasoning Opus 4.5 applies to each request. Lower settings produce faster, cheaper responses for simple tasks, while higher settings enable deeper reasoning for complex problems.
How does Opus 4.5 compare to Sonnet 4.5?
Opus 4.5 scores 10.6% higher on the Aider Polyglot benchmark and 29% higher on Vending-Bench for long-horizon tasks. However, Sonnet 4.5 at $3/$15 offers excellent value for tasks that don't require maximum capability.
Did Opus 4.5 really beat all human engineers?
Yes, according to Anthropic, Opus 4.5 exceeded all human candidates on the company's internal engineering exam — the same test used for hiring decisions. This is the actual Anthropic engineering interview, not a standardized benchmark.