
Grok 3 Unveiled: xAI’s leap in AI innovation

Grok 3 Shatters AI Benchmarks: xAI's Latest Model Sets New Industry Standards with Unprecedented 1400+ Arena Score


xAI, led by Elon Musk, has launched Grok 3, claiming it to be the world’s most advanced AI model. Following its live demo, the model has set new AI performance benchmarks—most notably becoming the first to exceed a score of 1400 on Chatbot Arena. Grok 3 outperforms established competitors like OpenAI’s GPT-4o and Google’s Gemini 2 Pro in reasoning, coding, and problem-solving capabilities. The tech community has praised this achievement, particularly noting xAI’s swift progress as a newcomer in the AI field. Within xAI, the model’s development has generated significant excitement, with both team members and industry observers anticipating its future applications and potential to advance AI technology further.

Key Insights from the Grok 3 Video

According to Elon, Grok 3 is an order of magnitude more capable than Grok 2.

The training cluster's capacity was doubled in 92 days!

Total GPUs: 200K

All of this compute was used to improve Grok, which has led to Grok 3.

Grok 3’s training was ten times more extensive than Grok 2’s. While its initial pretraining phase concluded in early January, the model continues to undergo training.

Here are the benchmark numbers:

Grok 3 significantly outperforms other models in its category, such as Gemini 2 Pro and GPT-4o. Even Grok 3 mini proves competitive.

Results of early Grok 3 in the Chatbot Arena (LMSYS)

It reached an Elo score of 1400, which no other model had achieved before.

The model score keeps improving.
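To give a feel for what a rating gap means, here is a minimal sketch of the textbook Elo formula. It is not the leaderboard's own code, and Chatbot Arena's actual statistical procedure may differ. The ratings and the K-factor of 32 are hypothetical values chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the article.
    """
    e_a = expected_score(rating_a, rating_b)
    e_b = 1.0 - e_a
    new_a = rating_a + k * (outcome_a - e_a)
    new_b = rating_b + k * ((1.0 - outcome_a) - e_b)
    return new_a, new_b


if __name__ == "__main__":
    # Hypothetical matchup: a 1400-rated model against a 1300-rated one.
    print(round(expected_score(1400, 1300), 3))
    print(update(1400, 1400, 1.0))
```

Under this model, a 100-point lead corresponds to being preferred in roughly 64% of head-to-head votes, so small differences near the top of the leaderboard translate into modest win-rate gaps.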

Grok 3 also has reasoning capabilities!

The Grok team has been testing these capabilities, which they unlocked using reinforcement learning (RL).

The model performs well, especially at coding.

Grok 3 Reasoning performance:

The results correspond to the beta version of Grok 3 Reasoning.

It outperforms o1 and DeepSeek-R1 when given more test-time compute (allowing it to think longer).
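The article does not say how Grok 3 spends its extra test-time compute beyond "thinking longer". One common way to trade compute for accuracy is to sample several answers and take a majority vote. The sketch below illustrates that general technique only, not xAI's method. The `sample` callable is a hypothetical stand-in for a model call.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def answer_with_budget(sample: Callable[[], str], n_samples: int) -> str:
    """Spend more test-time compute by drawing n_samples answers and voting.

    `sample` is a hypothetical function that returns one candidate answer
    per call, for example one model completion.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return majority_vote([sample() for _ in range(n_samples)])


if __name__ == "__main__":
    # Stand-in for a model: a fixed sequence of candidate answers.
    candidates = iter(["7", "3", "7", "7", "1"])
    print(answer_with_budget(lambda: next(candidates), 5))
```

A larger `n_samples` costs more inference but makes a single unlucky sample less likely to decide the final answer.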

The Grok 3 mini reasoning model is also very capable.

More on DeepSearch:

  • The model can think deeply about user intent
  • It decides what facts to consider
  • It decides how many websites to browse
  • It can cross-validate different sources

DeepSearch also exposes the steps that it takes to conduct the search itself.
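The article does not describe how DeepSearch cross-validates sources. As a rough illustration of the idea, the sketch below keeps only claims reported by at least two distinct sources. The function name, the data shape, and the threshold are all assumptions, not part of any xAI API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cross_validate(
    findings: Iterable[Tuple[str, str]], min_sources: int = 2
) -> Dict[str, List[str]]:
    """Keep claims reported by at least `min_sources` distinct sources.

    `findings` is a hypothetical list of (source, claim) pairs. Claims are
    compared after trimming whitespace and lowercasing, which is a crude
    stand-in for real semantic matching.
    """
    sources_by_claim: Dict[str, Set[str]] = defaultdict(set)
    for source, claim in findings:
        sources_by_claim[claim.strip().lower()].add(source)
    return {
        claim: sorted(sources)
        for claim, sources in sources_by_claim.items()
        if len(sources) >= min_sources
    }


if __name__ == "__main__":
    # Hypothetical search results from placeholder domains.
    results = [
        ("a.example", "The launch was in February"),
        ("b.example", "the launch was in february "),
        ("a.example", "The model has 1T parameters"),
    ]
    print(cross_validate(results))
```

Here the unconfirmed claim from a single source is dropped, while the claim seen on two sources is kept along with the list of sources that support it.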

What others are saying on X:

An early version of Grok-3 (codename “chocolate”) has claimed the #1 spot in Arena!

Grok-3 has achieved two major milestones:

  • First model ever to break the 1400 score barrier
  • #1 ranking across all categories—an increasingly challenging feat
