Elon Musk’s xAI has unveiled Grok 3, a new large language model positioned as a competitor to OpenAI’s GPT-4o, Google’s Gemini, and China’s DeepSeek. The company claims the model achieves “superhuman reasoning” through a combination of architectural upgrades and synthetic training data, though independent researchers urge caution until third-party evaluations validate its performance (TechCrunch).

At its core, Grok 3 introduces a “Big Brain” mode designed for multi-step problem-solving in fields like physics and mathematics. The system leverages self-correction mechanisms and a new “DeepSearch” tool that autonomously scans web sources—including X (formerly Twitter)—to generate summaries with citations. This feature mirrors functionality seen in OpenAI’s Deep Research and Hugging Face’s Open Deep Research agent frameworks, though xAI claims its implementation better handles real-time data from social platforms.

Performance claims meet skepticism

xAI asserts Grok 3 outperforms rivals on Chatbot Arena, where it reports a score above 1,400, and on demanding reasoning benchmarks such as AIME 2025 (competition mathematics) and GPQA (graduate-level science questions). The company attributes these gains to its “Colossus” supercomputer cluster, which uses 200,000 Nvidia H100 GPUs, which xAI describes as double the compute power used for Grok 2. Early adopters report the model solves complex thermodynamics problems and generates functional Python code more reliably than previous iterations (CNBC).

But the benchmarks come with asterisks. Researchers note xAI’s comparisons use older versions of competitor models, and creative tasks still produce inconsistent results. Until we see third-party evaluations under controlled conditions, these claims remain interesting but unproven.

The model’s “maximally truth-seeking” design also raises ethical questions. Early testers report that Grok 3 generates politically charged responses and plausible-sounding but inaccurate historical summaries, and that it has limited safeguards against deepfake creation. xAI engineers acknowledge these issues in a system-card-style document, stating they’re exploring mitigation strategies that do not compromise reasoning capabilities.

Availability and infrastructure

Grok 3 is rolling out first to X’s $50/month Premium+ subscribers, with a $30/month “SuperGrok” tier offering unlimited image generation. Enterprise API access remains limited to U.S.-based developers, though Musk promises EU availability pending regulatory approvals. The company plans to open-source Grok 2 once Grok 3 stabilizes, a move skeptics argue delays transparency for the newer model (Business Insider).

Behind the scenes, xAI is expanding its GPU infrastructure to 1.2 gigawatts of power capacity, reportedly to support a $10 billion funding round at a $75 billion valuation. Whether Grok 3 represents a genuine leap in AI reasoning or another salvo in the generative arms race may depend on how quickly external researchers can stress-test Musk’s latest “scary smart” creation.


For technical details on Grok 3’s architecture, see xAI’s Colossus.
