Meta Ships Llama 4: Scout Fits on One GPU, Maverick Beats GPT-4o
Llama 4 introduces a Mixture of Experts architecture across three models. Scout has a 10M token context window. Maverick's 128 experts beat GPT-4o on LMArena. Behemoth is still training.
Maya Johnson
Meta released the Llama 4 family on April 5, 2025, marking a fundamental architectural shift: for the first time, Llama uses a Mixture of Experts (MoE) design, in which only a subset of the model's parameters is active for any given token. Three models were shipped or announced, Scout, Maverick, and Behemoth, each targeting a different scale point, according to Meta AI.
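The core MoE idea can be sketched in a few lines. This is a toy illustration, not Meta's implementation: a hypothetical router scores every expert for a token and forwards it to only the top-scoring ones, which is why "active parameters" (17B for Scout) is far smaller than total parameters.

```python
# Toy sketch of Mixture-of-Experts routing (illustrative only, not
# Meta's code). Each token is sent to its top-k experts, so only a
# fraction of the total parameters is "active" per token.
import math
import random

random.seed(0)

DIM, NUM_EXPERTS, TOP_K = 8, 16, 1  # Scout-like: 16 experts, route to 1

# Hypothetical router: one learned weight vector per expert.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def route(token):
    """Score every expert, keep the TOP_K highest, softmax-normalize."""
    scores = [sum(w * x for w, x in zip(expert, token)) for expert in router]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    z = sum(math.exp(scores[i]) for i in top)
    return [(i, math.exp(scores[i]) / z) for i in top]

token = [random.gauss(0, 1) for _ in range(DIM)]
chosen = route(token)
print(chosen)  # [(expert_index, weight)] — only 1 of 16 experts runs
```

With TOP_K set to 1, each token exercises a single expert's feed-forward block; the other 15 experts sit idle for that token, which is the source of MoE's compute savings.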
Llama 4 Scout: 10M Context on One GPU
Scout is the practical breakthrough. At 17B active parameters with 16 experts, it fits on a single H100 GPU while offering a 10M token context window, roughly 50 times the 128K–200K windows common among competitors at launch. That's enough to process entire codebases, book-length documents, or months of conversation history in a single prompt.
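A back-of-envelope calculation shows why the single-GPU claim is plausible. The figures below are assumptions for illustration: roughly 109B total parameters (the widely reported figure for Scout), Int4 weight quantization, and an 80 GB H100; KV cache and activations are ignored, so this is a floor on memory, not a full serving budget.

```python
# Rough check of why Scout can fit on one H100 (80 GB).
# Assumptions: ~109B total parameters and Int4 (0.5 bytes/param)
# weight quantization; KV cache and activations are ignored.
TOTAL_PARAMS = 109e9
BYTES_PER_PARAM_INT4 = 0.5
H100_MEMORY_GB = 80

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_INT4 / 1e9
print(f"Int4 weights: {weights_gb:.1f} GB of {H100_MEMORY_GB} GB")  # ~54.5 GB
assert weights_gb < H100_MEMORY_GB  # leaves headroom for cache/activations
```

At 16-bit weights the same model would need roughly four times that, which is why quantization is central to the one-GPU story.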
The 10M context window is particularly significant for enterprise applications that need to reason over massive document collections without retrieval-augmented generation (RAG) pipelines.
Llama 4 Maverick: Competing With Closed Models
Maverick keeps the same 17B active parameters but scales to 128 experts, for 400B total parameters. It beat GPT-4o and Gemini 2.0 Flash on LMArena with an Elo score of 1,417, making it the first open-source model to consistently outperform leading closed-source models on competitive benchmarks.
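For readers unfamiliar with arena scoring, an Elo gap translates into an expected head-to-head win rate via the standard logistic Elo formula. The opponent rating below is hypothetical, chosen only to show how a gap of a few dozen points reads in practice.

```python
# How an LMArena Elo gap maps to an expected win rate
# (standard Elo formula; the opponent rating is hypothetical).
def elo_win_prob(rating_a, rating_b):
    """Expected probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

maverick = 1417            # reported LMArena score
opponent = 1377            # hypothetical rival 40 points lower
p = elo_win_prob(maverick, opponent)
print(f"{p:.3f}")  # ≈ 0.557 — a 40-point gap is a modest but real edge
```

In other words, arena leads of this size mean winning somewhat more than half of blind pairwise comparisons, not dominance in every matchup.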
Maverick is natively multimodal — handling text, images, and video — and was pre-trained on more than 30T tokens spanning 200 languages, roughly double Llama 3's training data.
Behemoth: Still Training
Llama 4 Behemoth was announced but not yet released. At 288B active parameters and approximately 2T total parameters, Meta says it already outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks despite being mid-training.
The Open Source Statement
All released Llama 4 models are open-source, continuing Meta's strategy of undermining the commercial moat of closed-source providers. By March 2025, Llama had passed 1 billion cumulative downloads, a scale Meta says makes it the most widely deployed AI model family in history.
Llama models are used in government (GSA partnership for federal agencies), military (expanded to NATO allies and Five Eyes+ nations), and space (deployed on the International Space Station via a partnership with Booz Allen and HPE).
LlamaCon: The Ecosystem Event
Alongside the model launch, Meta held LlamaCon (April 29, 2025), where it announced the Llama API (limited preview), performance partnerships with Cerebras and Groq for faster inference, security tools (Llama Guard 4, LlamaFirewall), and the Meta AI app.
Our Take
Llama 4 is Meta's strongest argument that open-source AI can compete with and beat closed-source models. Maverick beating GPT-4o is a milestone — it means the best freely available model now outperforms what was the best model in the world just a year ago. Scout's 10M context window on a single GPU is the kind of practical innovation that enterprises actually need. The question is whether Behemoth, when it ships, can compete with Claude Opus and GPT-5 at the frontier.
FAQ
What is Llama 4 Scout? Llama 4 Scout is Meta's efficient open-source model with 17B active parameters, 16 experts, and a 10M token context window. It fits on a single H100 GPU.
How does Llama 4 Maverick compare to GPT-4o? Maverick beat GPT-4o and Gemini 2.0 Flash on LMArena with an Elo score of 1,417. It has 400B total parameters with 128 experts.
Is Llama 4 open source? Yes, all released Llama 4 models are open-source. Llama has surpassed 1 billion cumulative downloads as of March 2025.
What is Llama 4 Behemoth? Behemoth is the largest Llama 4 model with 288B active parameters and ~2T total parameters. It was announced but not yet released as of April 2025, already outperforming GPT-4.5 on STEM benchmarks during training.