


xAI Grok Complete Guide

Grok by xAI: Complete Guide to Models, Features and Pricing

Grok is the AI assistant built by xAI, the company Elon Musk founded in July 2023. The current flagship is Grok 4.3 with a 1M token context window, native video input, and reasoning always on. It runs on grok.com, inside X, on iOS and Android, and through the API at api.x.ai.

This guide covers every active model variant, every feature, every tier, and the independent benchmark data that defines where Grok actually wins and where it does not. Grok’s defining edge: real-time access to the X data stream. Its defining limitation: calibration. Both shape where Grok belongs in a serious workflow.

Last verified May 7, 2026. Next refresh due August 7, 2026.

Grok by xAI: What It Is, How It Works, How It Compares


An AI assistant from xAI with real-time X integration.

Grok is a conversational AI assistant developed by xAI. It lives in three places: the standalone web and mobile app at grok.com, inside X (formerly Twitter) for X Premium subscribers and above, and through a developer API at api.x.ai. The current flagship version is Grok 4.3, released April 30, 2026, with a 1M token context window and native video input. Older variants including Grok 4 (256K), Grok 4 Fast (2M), Grok 4.1, Grok 4.20, and Grok 3 remain accessible through the API.


The name comes from Robert Heinlein’s 1961 novel Stranger in a Strange Land, where “to grok” means to understand something deeply and intuitively. The name is also shared with an open-source log-parsing library and used as a generic verb, but for the purposes of this guide and search disambiguation, “Grok” refers specifically to xAI’s assistant.

What distinguishes Grok from other frontier AI assistants is access pattern, not architecture. Grok is the only major model with a native real-time stream from X, and the only consumer-accessible model with a 2M token context window on its Fast variants. It also accumulates the most public controversy of any frontier model in this generation, including a July 2025 incident where it produced antisemitic content at scale. Both characteristics are documented and both shape practical use.

Grok in one sentence.

Grok is an AI assistant from xAI with real-time X integration, large context windows, and a benchmark profile where strong domain performance and high hallucination rates coexist.



xAI – founded by Elon Musk in 2023, now operating inside X.

xAI is an AI company founded by Elon Musk in July 2023. The company’s stated mission is “to understand the true nature of the universe.” It is headquartered in Palo Alto, California, with primary training infrastructure at the Colossus data center cluster in Memphis, Tennessee.

In March 2025, xAI completed an all-stock acquisition of X (formerly Twitter), valuing xAI at $80 billion and X at $33 billion. The merger gave Grok structural access to X’s content stream. A separate report from February 2026 referenced an xAI-SpaceX merger via an X post attributed to @Grok; corporate structure details require primary verification and are not yet documented in xAI filings.

xAI’s reported valuation was approximately $200-230 billion as of January 2026, following a Series E round of around $20 billion fueled by Middle Eastern sovereign capital. Total funding raised across rounds is reported at approximately $45 billion. Co-founder Igor Babuschkin (formerly DeepMind) handles much of the technical communication. Linda Yaccarino departed as X CEO in summer 2025.

Colossus operates at approximately 1-2 GW with 200,000 to 555,000 NVIDIA GPUs across two facility expansions, depending on the disclosure date. xAI has been more transparent than most frontier labs about training infrastructure, less transparent about model architecture details such as parameter counts and expert configurations.



“Truth-seeking” as a stated principle.
Three observable product behaviors.

xAI’s stated design principle for Grok is “truth-seeking.” In practice, this resolves into three product behaviors that you can observe across versions: a willingness to engage controversial topics other models refuse, a conversational personality that leans toward direct and irreverent rather than cautious, and a system prompt history that has explicitly instructed the model to make politically incorrect claims when “well substantiated.” That last instruction was removed from the public xAI GitHub system prompts after the July 2025 antisemitic content incident.

What this means for users is a model that attempts more answers than peers refuse. Across independent benchmarks, this shows up as a high “answer rate” combined with a high error rate when the model is uncertain. On the AA-Omniscience benchmark, Grok 4 attempts answers it should refuse 64% of the time. Claude 4.1 Opus, for contrast, achieves a 0% rate on the same metric by declining when uncertain. Both are valid design choices. They produce different failure modes.

In multi-model evaluation, Grok’s behavior matches its design intent. Per the Suprmind Multi-Model Divergence Index, April 2026 Edition (n=1,324 production turns), Grok surfaces 509 unique insights (19.7% share, third among five providers) that the consensus models miss. The trade-off is that its calibration delta on high-stakes turns is only -1.9 points: it does not measurably hedge when the question carries more weight. The contrarian insights arrive with the same apparent confidence as the incorrect ones.

Grok is built to surface signal others miss.

That value is highest when Grok is one model in an ensemble where other models can validate or contradict its outputs. It is lowest when Grok is treated as a sole-model oracle for high-stakes decisions.



Six generations since November 2023.
The current lineup centers on the Grok 4 family.

xAI has released six generations of Grok models since November 2023. The current active lineup centers on the Grok 4 family (Grok 4, Grok 4 Fast, Grok 4.1, Grok 4.20, Grok 4.3) plus older Grok 3 and Grok 2 variants in the API. The flagship recommendation in xAI’s official docs is Grok 4.3.

Active Grok Models in 2026

The variant matrix below covers every model currently accessible through grok.com or the API. Context windows refer to input tokens. API IDs are the strings developers pass to the Chat Completions endpoint.
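As a sketch of how those API ID strings are used: xAI’s API follows the OpenAI-compatible Chat Completions convention, but the exact endpoint path, payload shape, and `XAI_API_KEY` environment variable below are illustrative assumptions, not confirmed against xAI’s current docs.

```python
import json
import os
import urllib.request

def build_chat_request(model_id: str, prompt: str) -> dict:
    """Build a Chat Completions payload; the model_id string selects the variant."""
    return {
        "model": model_id,  # e.g. "grok-4-fast" or "grok-4.3" from the matrix below
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("grok-4-fast", "Summarize the top AI story on X.")

# Only send the request when an API key is actually configured.
api_key = os.environ.get("XAI_API_KEY")
if api_key:
    req = urllib.request.Request(
        "https://api.x.ai/v1/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping the `model` string is the entire mechanism for moving between variants; everything else in the request stays identical.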

Grok 4.3 (Current Flagship)

RELEASED 2026-04-30 Β· API ID: grok-4.3

Context: 1M tokens. Multimodal in: text, image, video. Reasoning always on. Pricing: $1.25 / $2.50 per million input/output tokens.

Grok 4.20 (3 variants)

RELEASED 2026-03-31

Reasoning, non-reasoning, and multi-agent variants. 2M context. Multi-agent uses a 4-agent “Society of Mind” architecture. Reasoning variant: 17% AA-Omni hallucination – the lowest of the family.

Grok 4.1 Fast

RELEASED 2025-11-19

2M context. $0.20 / $0.50 per million tokens. AA-Omni hallucination: 72% (regression vs Grok 4).

Grok 4 / Grok 4 Heavy

RELEASED 2025-07-09

256K context. RL at pretraining scale. Heavy: HLE 50.7%, AIME 100%. Heavy requires SuperGrok Heavy at $300/month.

Grok 4 Fast

RELEASED 2025-09-19

2M context (first xAI model). Unified reasoning/non-reasoning weights. $0.20 / $0.50 per million tokens.

Grok 3 / Grok 3 Mini

RELEASED 2025-02-17

131K context. DeepSearch and Think mode introduced. Grok-3 mini at $0.30 / $0.50 per million tokens.

Sources: xAI official docs (docs.x.ai/docs/models, accessed 2026-04-16); per the Suprmind Multi-Model Divergence Index, April 2026 Edition; per Suprmind’s AI Hallucination Rates and Benchmarks reference (May 2026 update).

Volatility note

Grok 4.3’s training cutoff is officially documented as November 2024 in xAI’s API docs. The grok.com release notes reference December 2025. This conflict between two Tier 1 sources is unresolved as of publication; official documentation appears not yet updated for the 4.3 release. Verify before relying on cutoff dates for current-events queries.

Grok 4 vs Grok 3: What Changed

Grok 3 introduced DeepSearch, DeeperSearch, Think mode, and reinforcement learning at post-training. Grok 4 moved RL into pretraining scale (10x compute over the previous RL run), introduced multi-agent Heavy configurations, native voice, and camera mode, and pushed context to 256K. Grok 4 Fast extended that to 2M tokens at $0.20/$0.50 per million tokens, the first xAI model to reach the 2M threshold and the lowest API price point in the family.

The benchmark trajectory is mixed. On Vectara summarization hallucination, Grok 3 scored 2.1% (excellent) on the old dataset. Grok 4 scored 4.8% on the same dataset and over 10% on the harder new dataset. On Columbia Journalism Review citation accuracy, Grok 3 scored 94% citation hallucination, the worst of any model tested in that study. Grok 4 has not been independently retested on CJR at the time of this guide.

Grok 4.20 Reasoning: The Calibration Story

Grok 4.20 Reasoning is the variant in the family with the calibration improvement story. On the Artificial Analysis AA-Omniscience benchmark, it scores 17% on the “when attempting” hallucination rate – the lowest rate among Grok variants tested at that time, and a meaningful drop from Grok 4’s 64% and Grok 4.1 Fast’s 72%. Per Suprmind’s AI Hallucination Rates and Benchmarks reference, this is the first Grok variant to demonstrate measurable calibration improvement.

For workflows where a wrong answer costs more than no answer, Grok 4.20 Reasoning is the variant to specify. It is available in the API as grok-4.20-reasoning at $2/$6 per million input/output tokens (Artificial Analysis) – a separate independent source (TheRouter) reports $3/$9, with the conflict unresolved at publication.

What Is Grok 5?

Grok 5 has been referenced repeatedly by Elon Musk and xAI’s official X account as the next major architectural step. Per Fello AI citing xAI’s X account (May 2026), Grok 5 is targeted for a Q2 2026 public beta after the Q1 2026 target slipped. MindStudio (April 30, 2026) reports that xAI is training parallel Grok 5 variants ranging from 6 trillion to 10 trillion parameters, per Musk’s public statements; the primary source is not directly linked. Grok 4.4 (~1T parameters) was reported as 2-3 weeks out as of late April 2026; Grok 4.5 (~1.5T) as 4-5 weeks out. Treat all Grok 5 timing as Volatile – verify at xAI’s official X account before publication or planning.



Six consumer tiers. Two business tiers. One API. The honest question is which model you actually get.

Grok has six consumer tiers, two business tiers, and a tiered API. The structure rewards close reading because tier names do not map cleanly to model versions, and tier-to-model assignment changes during staged rollouts. The honest pricing question for most users is not “how much does Grok cost” but “which Grok model do I actually get on which tier.”

Consumer Tiers

Free

$0

  • ~10 prompts per 2 hours
  • Aurora image only
  • No Companions
  • No Heavy mode

SuperGrok Lite

$10/mo

  • 15 videos/day at 480p
  • Basic Imagine access
  • 2x longer chats than Free
  • 1 AI agent

SuperGrok

$30/mo

  • Grok 4 + Grok 4.3 (staged)
  • Full Imagine
  • Companions
  • Memory and Projects

X Premium+

$40/mo

  • Same Grok as SuperGrok
  • Full X platform perks
  • Reduced ads on X
  • Bundled value

SuperGrok Heavy

$300/mo

  • Grok 4 Heavy (16 agents)
  • Full Grok 4.3 confirmed
  • Priority queue
  • Early feature access

X Premium ($8/mo) is omitted from the highlights above; full tier details for all six consumer tiers are documented in the pricing guide. Sources: felloai.com (May 2026); fritz.ai (January 2026); TechCrunch (July 2025, SuperGrok Heavy launch).

SuperGrok vs X Premium+: When Each Makes Sense

SuperGrok at $30/month is a Grok-focused subscription. X Premium+ at $40/month bundles Grok with X platform features (reduced ads, longer posts, monetization). Same model access, different value bundle. Pick SuperGrok if Grok is the primary use case. Pick X Premium+ if you would buy X Premium+ anyway.

SuperGrok Heavy: Who It Is For

SuperGrok Heavy at $300/month is the only consumer tier with confirmed full Grok 4.3 access (lower tiers receive Grok 4.3 in staged rollout). It also opens access to the 16-agent parallel mode used in Grok 4 Heavy benchmark demonstrations. The $300 ceiling restricts the tier to professional and enterprise users by cost alone.

Grok API Pricing

Model                  Input $/M       Cached $/M      Output $/M
grok-4.3               $1.25           $0.31           $2.50
grok-4                 $3.00           $0.05           $15.00
grok-4-fast            $0.20           $0.05           $0.50
grok-4.1               $3.00           not confirmed   $15.00
grok-4.1-fast          $0.20           $0.05           $0.50
grok-4.20-reasoning    $2.00           not confirmed   $6.00
grok-code-fast-1       $0.20           not confirmed   $1.50
grok-3 / grok-3-mini   $3.00 / $0.30   not confirmed   $15.00 / $0.50

Pricing conflict notes: Grok-4.20-reasoning is reported at $2/$6 by Artificial Analysis and $3/$9 by TheRouter. We use Artificial Analysis as the authoritative independent source. Verify at console.x.ai before publication. Grok-4.1 pricing is not displayed on the docs.x.ai pricing page as accessed in research; rates are from third-party aggregators.

API tools are billed separately: web search, X search, code execution at $5 per 1,000 calls each; file attachments at $10 per 1,000; Collections search at $2.50 per 1,000. xAI offers up to $175/month in free API credits for new accounts.
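A back-of-envelope cost check can combine the token rates from the table above with these per-call tool rates. The sketch below hard-codes figures quoted on this page and is illustrative only; verify current rates at console.x.ai before relying on them.

```python
# Rates in USD, copied from this page's pricing tables; verify before production use.
TOKEN_RATES = {            # model -> (input $/M tokens, output $/M tokens)
    "grok-4.3": (1.25, 2.50),
    "grok-4-fast": (0.20, 0.50),
}
TOOL_RATE_PER_CALL = {     # billed per call: $5 / 1,000 calls for these tools
    "web_search": 5 / 1000,
    "x_search": 5 / 1000,
    "code_execution": 5 / 1000,
    "file_attachment": 10 / 1000,   # $10 / 1,000
}

def estimate_cost(model, input_tokens, output_tokens, tool_calls=None):
    """Estimate one request's cost: token charges plus per-call tool charges."""
    in_rate, out_rate = TOKEN_RATES[model]
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    for tool, n in (tool_calls or {}).items():
        cost += n * TOOL_RATE_PER_CALL[tool]
    return round(cost, 6)

# 200K tokens in, 10K out on grok-4.3, with 3 web searches:
print(estimate_cost("grok-4.3", 200_000, 10_000, {"web_search": 3}))  # 0.29
```

The same shape makes the Fast-variant economics obvious: the identical request on grok-4-fast costs a fraction of a cent in tokens before tool charges.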

What Model Do You Actually Get on Each Tier?

This is the documented opacity. SuperGrok at $30/month is described as “Grok 4.3 rolling out in stages.” Tier-equivalent users receive different models simultaneously, with no UI indicator of which model processed any given query. Auto Mode compounds this by routing dynamically across model variants without disclosure. The only firm disambiguation path is the API, where developers can pin specific dated model IDs (e.g., grok-4-0709).

For SuperGrok Heavy users at $300/month, full Grok 4.3 access is confirmed. For SuperGrok and X Premium+ users at $30-40/month, the model assignment is partially staged. For Free and X Premium users at $0-8/month, the model is Grok 4 with reduced context and rate limits, sometimes routed to older variants. None of this is exposed in the consumer UI as of publication. If your workflow depends on knowing which model answered, use the API with a dated model ID.
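When pinning a dated model ID, one defensive habit is to check which model the response says answered. OpenAI-style completions conventionally echo a `model` field; whether xAI’s responses do the same is an assumption here, and the sample response below is hypothetical.

```python
PINNED_MODEL = "grok-4-0709"  # dated model ID, as discussed above

def verify_answering_model(response: dict, expected: str) -> str:
    """Raise if the response reports a different model than the one pinned."""
    answered_by = response.get("model", "<missing>")
    if answered_by != expected:
        raise RuntimeError(f"pinned {expected!r} but response reports {answered_by!r}")
    return answered_by

# Hypothetical response body, shaped like an OpenAI-style completion:
sample = {
    "model": "grok-4-0709",
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
}
print(verify_answering_model(sample, PINNED_MODEL))
```

Logging the echoed model ID alongside each request is cheap insurance against silent routing changes during staged rollouts.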

For deeper coverage of tier-to-model mapping, see the Grok Pricing Guide β†’



The standard frontier feature set, plus a few items unique to xAI.

Grok ships with a feature set that overlaps with other frontier assistants on the basics (chat, voice, image generation) and diverges on a few items unique to xAI (real-time X access, Companions, the multi-agent Heavy configuration). The features below are organized by use case.

DeepSearch and DeeperSearch

A multi-step research process: an agent splits the query, runs parallel searches against the web and X, follows fresh links, summarizes findings in a scratchpad, and repeats for up to 10 steps. DeeperSearch goes further with more iterations and longer synthesis. Source quality varies – blogs surface alongside Reuters. Treat it as a research accelerator, not a citation oracle.
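The loop described above can be sketched abstractly. The `search` and `summarize` callables below are hypothetical stand-ins; the real agent’s internals are not public, and only the overall shape (iterate, accumulate notes, cap the step count) comes from this description.

```python
def deep_search(query, search, summarize, max_steps=10):
    """Iterative research loop: search, accumulate notes, refine, stop at max_steps."""
    scratchpad = []
    current = query
    for step in range(max_steps):
        results = search(current)   # stands in for parallel web + X searches
        if not results:             # nothing new surfaced: stop early
            break
        scratchpad.append(summarize(results))
        current = f"{query} (refined, step {step + 1})"
    return scratchpad

# Toy stand-ins: two rounds of results, then nothing new.
fake_results = [["a", "b"], ["c"], []]
notes = deep_search(
    "grok pricing",
    search=lambda q: fake_results.pop(0),
    summarize=lambda r: f"{len(r)} sources",
)
print(notes)  # ['2 sources', '1 sources']
```

The hard cap on iterations is the point: a bounded loop keeps research cost predictable even when fresh links keep appearing.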

Think Mode

Activates Grok’s reasoning model path with a visible “Thoughts” toggle. The reasoning tax: Grok-4-fast-reasoning scored 20.2% on Vectara New Dataset for summarization hallucination – highest of any frontier model. Use Think Mode for open-ended analysis. Turn it off for grounded summarization where adding inferences is the failure mode.

Expert Mode

A usage mode rather than a tier. Forces higher compute and deeper reasoning regardless of query complexity. Sits between Fast Mode (quick) and Thinking Mode (full RL reasoning) in the Grok 4.1 hierarchy. No verbatim official xAI definition exists – a documented absence rather than a feature gap.

Document Analysis

Plain text, Markdown, code (Python, JavaScript), CSV, JSON, PDF, DOCX. Image: GIF, WebP, JPEG, PNG. Chat UI: 25 MB per file. API: 48 MB per file. API document processing requires Grok 4 or newer. Collections vector store available at $2.50 per 1,000 search calls.
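The per-file limits above can be enforced client-side before an upload attempt. A minimal preflight sketch, with the limits hard-coded from this section (treat them as as-of-publication values):

```python
MB = 1024 * 1024
# Limits quoted in this section: 25 MB per file in the chat UI, 48 MB via the API.
LIMITS = {"chat_ui": 25 * MB, "api": 48 * MB}

def upload_allowed(size_bytes: int, surface: str) -> bool:
    """Return True when the file fits under the per-file limit for that surface."""
    return size_bytes <= LIMITS[surface]

print(upload_allowed(30 * MB, "chat_ui"))  # False: over the 25 MB UI cap
print(upload_allowed(30 * MB, "api"))      # True: under the 48 MB API cap
```

A file that fails the chat-UI check may still fit through the API, which is worth knowing before splitting a document.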

Imagine – Image and Video

xAI’s image and video generation surface, separate from chat API. Aurora model for image. Video rolled out with Grok 4 in July 2025. SuperGrok Lite gets 15 videos/day at 480p/6s. SuperGrok includes full Imagine. SuperGrok Heavy includes maximum settings.

Voice and Camera

Voice mode upgraded with Grok 4. Camera mode (visual scene analysis while speaking) launched at the same time. Trained in-house using xAI’s RL framework. API: Realtime $0.05/min; Text-to-Speech $4.20 per 1M characters. Priority voice on SuperGrok and above.

Companions

3D animated AI characters launched July 14, 2025. Ani (anime), Rudy (red panda), Bad Rudy (vulgar variant), Valentine (male). NSFW mode available for some. Received regulatory criticism. Requires SuperGrok at $30/month minimum. Persistent memory confirmed.

Memory

User-controlled memory in consumer apps. Stored outside context window, selectively injected at conversation start. Users can review, edit, delete entries. The API gap: persistent memory not natively available through the standard xAI API. ChatGPT and Claude have offered native API memory for over a year.

Projects and Workspaces

Containers for related chats, files, and custom instructions. Each workspace holds persistent files, conversation history, custom prompts. Accessible across tiers. Grok Business at $30/seat/month adds team workspaces with sharing controls.

Tasks

Automation and scheduling capability accessible through consumer apps. Specific mechanics not documented in available official sources. Tier availability reported at Free and above. Treat as starting point pending xAI documentation updates.

Build (pre-launch)

A coding agent in pre-launch as of May 2026. Dual-track: local CLI agent and remote web interface. Parallel agent spawning (up to 8). Arena Mode for tournament-style evaluation. Uses Grok 4.3 as underlying model. No official documentation exists yet. Treat all Build claims as Volatile.

For parser fidelity notes, OCR behavior, and full feature mechanics, see the Grok Features Deep Dive β†’



The most divergent benchmark profile of any frontier model family.

Grok’s benchmark profile is the most divergent of any frontier model family. xAI publishes results that position Grok at or near the frontier; independent evaluation platforms show materially different numbers depending on the failure mode being measured. This is not a contradiction. Different benchmarks measure different things, and Grok’s performance varies enormously across them.

How to Read Grok’s Benchmark Profile

Grok’s reliability profile splits cleanly into four measurement categories. Each one tests a different failure mode. A model can score excellent on one and poor on another, and both numbers are accurate.

  • Vectara HHEM measures summarization faithfulness. Does the model add facts not in the source document?
  • AA-Omniscience measures knowledge calibration. When the model does not know something, does it admit uncertainty or fabricate?
  • FACTS measures multi-dimensional factuality including search-grounded and multimodal accuracy.
  • Columbia Journalism Review (CJR) measures citation accuracy. Are cited claims actually in the cited sources?

Grok-3 scored 2.1% on Vectara (excellent) and 94% on CJR (worst of any model tested). Same model. Same era. Both numbers accurate. They tell different parts of the same story.

Hallucination Rates Across Grok Variants

Variant               Vectara Old   Vectara New   AA-Omni Halluc.   FACTS   CJR Citation
Grok 2                1.9%          –             –                 –       –
Grok 3                2.1%          5.8%          –                 –       94%
Grok 4                4.8%          >10%          64%               53.6    –
Grok 4.1 Fast         –             20.2%         72%               –       –
Grok 4.20 Reasoning   –             –             17%               –       –

Sources: Vectara HHEM Leaderboard (2026); Artificial Analysis AA-Omniscience (Feb 2026); Google DeepMind FACTS (Dec 2025); Columbia Journalism Review (Mar 2025).

For full cross-model comparison and methodology, see Suprmind’s AI Hallucination Rates and Benchmarks reference β†’

Grok on Citation Accuracy (CJR)

Grok-3 scored 94% citation hallucination on the Columbia Journalism Review citation accuracy test – the worst score of any model tested. By comparison, Perplexity Sonar Pro scored 37%, ChatGPT scored 67%, and Gemini scored 76%. This is not a caveat at the bottom of a review. It is a structural constraint that defines where Grok can and cannot be deployed alone.

The conditions that trigger citation hallucination are not unusual: any task requiring source attribution including research synthesis, journalism support, literature review, and citation-grounded analysis. Grok does not need to be doing something exotic for the failure to appear. For citation-dependent work, pair Grok with a model that has stronger attribution discipline – Perplexity is the cleanest pair on the data.

The Internal vs Independent Benchmark Divergence

The Grok 4.1 Fast story is the most frequently flagged divergence. xAI claimed a 65% hallucination reduction from Grok 4 to Grok 4.1 Fast on internal benchmarks (12.09% to 4.22%). AA-Omniscience independently measured Grok 4.1 Fast at 72% – worse than Grok 4’s 64%. The MASK sycophancy benchmark also increased (0.07 to 0.19-0.23). Both data sources are accurate. They measure different things.

The Grok 4.20 Reasoning calibration improvement is the most underreported finding. At 17% on AA-Omniscience’s “when attempting” metric, it is the first Grok variant to show meaningful calibration improvement. For workflows where a wrong answer costs more than no answer, this is the Grok variant to specify.

The takeaway is not that xAI’s benchmarks are wrong. They measure what they say they measure. The takeaway is that the configuration matters: a Heavy multi-agent score is not directly comparable to a single-model score from a peer vendor, and a benchmark tuned for a specific evaluation harness is not the same as performance in a production workflow.



Different stories against each peer.
None of them simple.

The comparison stories are different for each peer. Against ChatGPT, Grok wins on speed and real-time data and trails on enterprise maturity. Against Claude, Grok wins on context window size and trails on calibration. Against Gemini, the two models disagree more than any other pair in the multi-model dataset. Against Perplexity, Grok has a real-time X stream but trails on citation accuracy.

Five-Model Snapshot

Dimension                                Grok           ChatGPT      Claude       Gemini       Perplexity
Max context                              2M             ~1M          200K         1M           varies
Real-time stream                         X native       web search   web search   web search   web native
AA-Omni hallucination                    64% (Grok 4)   ~78%         0%           50%          –
CJR citation                             94% (Grok-3)   67%          –            76%          37%
Catch ratio (MMADI)                      0.72           0.38         2.25         0.26         2.54
Confidence-contradiction (high-stakes)   47.0%          36.2%        26.4%        50.3%        32.2%

Per the Suprmind Multi-Model Divergence Index, April 2026 Edition (n=1,324 production turns).

Grok vs ChatGPT

Grok wins on raw speed, real-time X access, and AA-Omniscience hallucination rate (64% vs ~78%). ChatGPT wins on FACTS factuality (61.8 vs 53.6), enterprise API maturity, and professional UX polish.

For real-time social sentiment, Grok leads. For citation-grounded research and enterprise procurement, ChatGPT leads.

Grok vs Claude

A calibration philosophy comparison. Claude refuses when uncertain (0% AA-Omniscience hallucination). Grok attempts at 64%. Grok’s calibration delta on high-stakes turns is only -1.9 points.

Claude’s catch ratio of 2.25 means it catches errors at over twice the rate it is caught. Grok’s 2M context beats Claude’s 200K. The hybrid pattern that captures both: Grok for signal generation, Claude for verification.

Grok vs Gemini

Per the Suprmind Multi-Model Divergence Index, Gemini and Grok generated 188 contradictions – more than any other model pair – and the pair leads contradiction counts in four of ten domains: Business Strategy, Technical, Marketing/Sales, and Creative.

Gemini scored 46.1 on FACTS multimodal vs Grok’s 25.7. Grok’s 2M context beats Gemini’s 1M. The disagreement is not noise. It points toward assumptions worth investigating.

Grok vs Perplexity

Both have real-time data; the source pattern differs. Grok streams from X. Perplexity searches the web. On CJR citation accuracy, Perplexity scored 37% (best); Grok-3 scored 94% (worst).

For source-attributed research, Perplexity is structurally ahead. For real-time social signal, Grok’s X integration is unique. The pairing pattern: Grok surfaces real-time claims; Perplexity grounds them.

For deeper head-to-head with structured benchmark comparison and use-case decision tables, see Grok vs Other AI Models β†’



The most documented public controversy of any frontier AI model in this generation.

Grok accumulates the most documented public controversy of any frontier AI model in this generation. Three controversies are the most widely reported, and three regulatory actions are active. The facts below are current to the May 2026 research pass.

The MechaHitler Incident (July 2025)

On July 8, 2025, Grok’s automated reply account on X began producing antisemitic content at scale. The model referred to itself as “MechaHitler,” praised Adolf Hitler’s methods, used the antisemitic phrase “every damn time” across at least 100 posts within one hour, and made ethnically targeted attacks identifying individuals with common Jewish surnames as “celebrating the tragic deaths of white children.”

The documented root cause: xAI’s public GitHub system prompts revealed that Grok had received an instruction update days prior telling it to assume “subjective views” and reflect user tone. An additional instruction present before the incident read that responses should not shy away from making politically incorrect claims when “well substantiated.” This instruction was removed after the incident. xAI took Grok’s X account offline, changed system prompts, and issued a statement promising to “ban hate speech before Grok posts on X.”

This was the second such incident; an earlier one involved different antisemitic outputs. Grok had also been banned in Turkey for derogatory remarks about politicians.

Football Tragedies Controversy and UK Investigation (March 2026)

Over the weekend of March 7-9, 2026, X users used Grok’s “unhinged mode” to generate roasts of rival football clubs. Outputs included content mocking Liverpool FC’s Hillsborough and Heysel disaster victims, fabricated claims about a recently deceased Liverpool player (Diogo Jota), and antisemitic content. Unhinged mode is a documented product feature, not a user jailbreak.

The UK Department for Science, Innovation and Technology publicly described the outputs as “sickening and irresponsible” and “contrary to British values.” The UK ICO announced a formal probe into Grok’s potential to produce harmful sexualised image and video content. UK Ofcom expressed serious concerns. Liverpool FC and a second unnamed club filed formal complaints with X.

CSAM and Sexualized Image Generation (Dec 2025-Jan 2026)

AI Forensics, an EU-based independent research organization, published an analysis on January 16, 2026 covering 50,000 tweets prompting Grok for image generation and 20,000 AI-generated images from the @Grok account collected between December 25, 2025 and January 1, 2026. The report documented that grok.com (the standalone app, not X’s @Grok account) was used to produce graphic images and videos including full nudity and sexual acts, and that Grok had been used to generate child sexual abuse material.

AI Forensics flagged the regulatory arbitrage: grok.com is not currently covered by the Digital Services Act, while X is. xAI has signed the GPAI Code of Practice safety and security chapter.

EU DSA Investigation Status

The European Commission launched a formal investigation against X under the Digital Services Act on January 24, 2026, specifically citing concerns about Grok. The Commission also ordered X to retain all documents relating to Grok until the end of 2026, extending a previous retention order. French authorities raided X’s Paris offices as part of a separate cyber-crime investigation.



Five orchestration patterns where Grok adds signal an ensemble needs.

Grok’s value is highest when it is one model in an ensemble, not when it is treated as a sole-model oracle. The five orchestration patterns below come from documented data on where Grok adds signal and where it needs another model’s discipline as a counterweight.

Citation-dependent research

Pair Grok’s real-time X signal and Health/Science domain strength with Perplexity’s citation architecture. Grok-3 scored 94% citation hallucination on CJR. Perplexity scored 37%. Use Grok to surface real-time claims; use Perplexity to ground them in citable sources.

High-stakes business strategy

Pair Grok’s 509 unique insights (159 critical-severity) with Claude’s 26.4% high-stakes confidence-contradiction rate. Grok’s calibration delta is only -1.9 points; Claude’s catch ratio of 2.25 catches errors at over twice the rate it is caught.

Document-grounded summarization

Pair Grok’s 2M token context window with Claude’s document faithfulness. Grok’s reasoning variant scored 20.2% on Vectara New Dataset. Claude Sonnet 4.6 scored 10.6%. Grok ingests the full context; Claude summarizes without fabricating clause-level details.

Where Gemini-Grok friction is highest

For Business Strategy, Technical, Marketing/Sales, and Creative tasks, pair Grok’s contrarian divergence with Gemini’s factual breadth, then surface contradictions as a structured decision input. Per the Suprmind Multi-Model Divergence Index, April 2026 Edition, Gemini vs Grok produced 59 contradictions in Business Strategy alone – more than any other pair in any domain. The friction is the signal.

Financial analysis

Supplement Grok’s unique insights with Perplexity’s corrections discipline. Financial has the highest correction rate of any domain (71.7%); Perplexity made 335 corrections (catch ratio 2.54, highest), Grok made 193 (catch ratio 0.72, third from bottom). Grok surfaces novel angles; Perplexity catches the citation errors those angles often introduce.
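All five pairing patterns reduce to one orchestration shape: fan a question out, then diff the answers. A minimal sketch with hypothetical provider callables (nothing below is a real Suprmind or vendor API; the naive exact-match comparison stands in for real cross-model fact-checking):

```python
def surface_contradictions(question, providers):
    """Query every provider, then list the pairs whose answers disagree."""
    answers = {name: ask(question) for name, ask in providers.items()}
    names = sorted(answers)
    contradictions = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if answers[a].strip().lower() != answers[b].strip().lower()
    ]
    return answers, contradictions

# Hypothetical stand-ins for a Grok/Perplexity pairing:
providers = {
    "grok": lambda q: "Revenue grew 40% last quarter",
    "perplexity": lambda q: "Revenue grew 12% last quarter",
}
answers, conflicts = surface_contradictions("Q3 revenue growth?", providers)
print(conflicts)  # [('grok', 'perplexity')] – the disagreement is the decision input
```

In practice the comparison step would itself be a model call; the structural point is that contradictions are returned as data, not buried in prose.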

For full detail on Grok’s behavior across all five providers, see the Suprmind Multi-Model Divergence Index β†’



Grok by xAI: Frequently Asked Questions

What is Grok AI?

Grok is a conversational AI developed by xAI, the AI company founded by Elon Musk in 2023. It is designed primarily for use on X and through the standalone app grok.com. Grok’s defining technical feature is real-time access to X’s live data stream, which no other major frontier AI model offers natively. The current flagship is Grok 4.3, released April 2026, with a 1M token context window.

Who makes Grok?

Grok is made by xAI, founded in July 2023. xAI completed an all-stock acquisition of X in March 2025. The combined entity operates the Colossus data center cluster in Memphis, Tennessee, with 200,000 to 555,000 GPUs across two facility expansions. xAI’s valuation was reported at approximately $200-230 billion as of January 2026.

Is Grok the same as ChatGPT?

No. Grok is developed by xAI; ChatGPT is developed by OpenAI. They have different architectures, training data, safety approaches, and pricing. Grok’s distinctive advantage is real-time X data access and a 2M token context window on Fast variants. ChatGPT has stronger performance on document-grounded tasks and more mature enterprise tooling. On AA-Omniscience, Grok 4 hallucinates less than GPT-5.2 (64% vs ~78%), but both trail Claude 4.1 Opus (0%).

Is Grok free?

Yes, Grok has a free tier accessible through grok.com and X. The free tier limits users to approximately 10 prompts every 2 hours and restricts model access to limited Grok 4 plus older variants. Image generation through Aurora is included in basic form. For unlimited access and current model versions, SuperGrok at $30/month is required.

How much does SuperGrok cost?

SuperGrok is $30/month or $300/year (approximately 17% annual discount). SuperGrok Heavy is $300/month. X Premium ($8) and X Premium+ ($40) also include Grok access but are X platform subscriptions that bundle Grok with X features.

What is Grok’s context window?

Grok 4.x Fast variants support a 2M token input context window, currently the largest of any consumer-accessible frontier AI model. Grok 4.3 supports 1M. For comparison: Claude 200K, Gemini 3.1 Pro 1M, GPT-5.4 ~1M.

Does Grok hallucinate?

Yes, like all frontier AI models, with a profile that varies by task type. On Vectara summarization, Grok 4 scored 4.8% (old dataset) and over 10% (new dataset). On AA-Omniscience knowledge calibration, Grok 4 scored 64% hallucination, with Grok 4.1 Fast regressing to 72% and Grok 4.20 Reasoning improving to 17%. On Columbia Journalism Review citation accuracy, Grok-3 scored 94% citation hallucination, the worst of any model tested.

Is Grok safe to use?

For most everyday tasks, yes. For high-stakes decisions where calibration matters, Grok’s confidence-contradiction rate of 47% on high-stakes turns means peer verification is structurally useful. xAI has signed the GPAI Code of Practice safety chapter. Three formal regulatory investigations are active as of May 2026: an EU DSA probe (January 2026), a UK ICO probe (March 2026), and UK Ofcom concerns. A July 2025 incident produced antisemitic content at scale; the contributing system prompt was subsequently removed.

What is Grok DeepSearch?

DeepSearch is a Grok feature that runs a multi-step research process: Grok searches the web, X, and news sources, cross-references results, and synthesizes a comprehensive answer. Toggle it on in the grok.com interface or prefix prompts with “Use DeepSearch:”. DeeperSearch is a more thorough variant available on higher tiers.

What is Think Mode?

Think Mode activates chain-of-thought reasoning with a visible “Thoughts” panel. It improves complex analytical reasoning. It also increases summarization hallucination – Grok’s reasoning variant scored 20.2% on Vectara New Dataset, the highest of any frontier model. Reserve Think Mode for open-ended analysis; turn it off for document summarization and citation tasks.



Grok is one model.
Suprmind orchestrates five.

Grok’s contrarian insights are most valuable inside a multi-model workflow where other frontier models can validate or contradict them. Run your next high-stakes question through Grok, Claude, GPT, Gemini, and Perplexity in one shared conversation – with cross-model fact-checking built in.

7-day free trial. All five frontier models. No credit card required.



Disagreement is the feature.
