Sarah Young

Which AI Should You Trust for the World Cup?

The favorite was unanimous. The fine print was another story.

A soccer ball hitting the back of the goal net and scoring a goal
Photo by Chaos Soccer Gear on Unsplash / Styling by Gemini

Whether you’re a lifelong lover of the game or one of the millions this tournament has turned into a brand-new soccer fan, your first stop for a quick answer about the 2026 FIFA World Cup might be a chatbot. Why is everyone wearing pink cleats? What’s offside again? Is Pulisic playing today? For someone just dipping their toes into the beautiful game, whatever AI says back might feel like the whole story.

So we put their knowledge to the test. We asked the four chatbots a casual user is most likely to open — the default tiers of ChatGPT (free tier), Gemini (3.5 Flash), Claude (Sonnet 4.6), and Grok (4.3) — over 300 carefully crafted World Cup questions, and ran every answer through our factual accuracy evaluator.

In the end, we evaluated over 26,000 checkable facts generated by four models across 1,240 responses, all from a single early-morning snapshot last Thursday after all 48 teams had played exactly once.

The question we’re all asking

We led with the one top of mind for every fan: Based on where things stand right now, which team do you think will win?

All four said France. That’s the bookmakers’ favorite, so it’s hard to call any of the answers baseless. But the broader responses are where it gets interesting.

ChatGPT was tidy and well-sourced, hedging that “the difference between France, Spain, England, Argentina, and Brazil is small enough that I wouldn’t be shocked if any of them lifted the trophy.” Grok was concise, linked every source, and added the most concrete caveat in the group: “it’s early — group stage results, injuries, and knockouts can shift things quickly.” Gemini gave the most structured answer, a ranked tier-list of contenders, and singled out a challenger to its ultimate pick, saying England’s “ceiling might be the highest in the tournament.” Finally, Claude, the most eager, opened with a flag-strewn odds board and the cheerful announcement that the tournament “kicked off on June 11, 2025” – a full year off, for an event it was in the middle of describing as underway.

But this whole line of questioning was speculative; what happens when actual answers do exist?

Can AI be trusted for World Cup team info?

This is the layer of questions beneath idle speculation: Who is the keeper, who are their best players? It’s also the information that changes fastest. Squads are finalized days before kickoff, injuries can take any player out on a minute’s notice, and stats move every night. So it’s where it would make sense for the models to struggle, and they do. But not how we expected.

We assumed the LLMs would be sharp on the giants and shaky on the little guys simply because so much more readily-scrapable information exists about the former, but the error rate was within about a point for Brazil and for Haiti. Fame wasn’t the fault line; the kind of fact was.

We noticed two types of errors. The first was when the model would confidently misstate certain (and sometimes critical) details within otherwise factual descriptions.

Asked for notable players on each team, Claude slotted Portugal’s Bernardo Silva under the Danish flag: “🇩🇰 Denmark | Bernardo Silva (Man City)… six Premier League winners medals.” This was a detailed, confident, and accurate player bio pinned to the wrong country; it’s worth noting that Denmark didn’t even qualify for the 2026 World Cup. Claude also correctly referenced Erling Haaland’s opening goals against Iraq, but labeled him “the first Norwegian to score at a World Cup” when five had done so previously.

Along similar lines, Gemini had Lamine Yamal “donning Spain’s iconic No. 10 jersey” – that number actually belongs to Dani Olmo. Lamine wears 19 for the Spanish national team, although he was upgraded to 10 for FC Barcelona last season. ChatGPT listed Eberechi Eze and Noni Madueke, who’d just won the Premier League with North London’s Arsenal FC, among the players “left out” of England’s squad. Both made it.

The second big issue: Many numbers ran a half-step stale from Thursday’s true stats. Claude credited Harry Kane with 79 goals for England (he’d reached 81 the day before); it also labeled Messi “three shy” of the World Cup scoring record he’d tied with a hat-trick hours earlier.

Does it know how the modern game works?

Ask how the tournament is structured, and the models are mostly excellent, covering the process of qualifying, the way groups are set, and how the bracket flows. Tournament structure was ChatGPT’s and Grok’s strongest category by far; for each, more than three-quarters of answers were rated completely true.

Where the models tend to trip up on mechanics is wherever the facts are fresh. This year’s World Cup is the first with 48 teams, and the first with a Round of 32. Claude described the knockouts as “single-elimination from the Round of 16 onward,” a round later than they will actually begin this year. The same lag showed up on rules that changed off the field: Claude told users each team gets “3 substitutions” (it’s been 5 in most leagues since 2022) and that a goalkeeper “must release the ball within 6 seconds” (raised to 8 last year), signing off that this is what makes “soccer the strategic and exciting sport it is!”

The verdict

No model wins outright; they win and lose in different places.

A chart showing factual accuracy scores for Grok, GPT, Gemini, and Claude. Claude is the least accurate; Grok and GPT are the most.

There are two ways to crown the winner, and they disagree. If you look at all the claims a model made across all its answers, Grok wins: only 3.2% of its claims were false. If you look at how many answers are error-free, ChatGPT wins, at 62%. But a clean average is no guard against a clean miss: the same Grok we’re “crowning” flatly stated, with a citation, that the 2026 World Cup “has no third-place match” (it does, scheduled for July 18).

The deeper pattern is volume: Claude made the most claims per answer, finished last on every measure, and by itself accounts for 44% of all the false claims we found. ChatGPT, the model with the most clean answers, also made the fewest claims. Say less, be wrong less, it seems.
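For readers who want to see why the two ways of crowning a winner can disagree, here is a minimal sketch in Python. It is not our evaluator: the `Answer` class, the function names, and the toy data are all invented for illustration, and the numbers are not taken from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    # Hypothetical record: one chatbot answer, with one verdict per checkable claim.
    # True means the claim was judged accurate, False means it was judged false.
    claims: List[bool]


def claim_false_rate(answers: List[Answer]) -> float:
    """Share of all claims, pooled across answers, that were judged false."""
    total = sum(len(a.claims) for a in answers)
    false = sum(a.claims.count(False) for a in answers)
    return false / total if total else 0.0


def clean_answer_rate(answers: List[Answer]) -> float:
    """Share of answers that contain no false claim at all."""
    if not answers:
        return 0.0
    clean = sum(1 for a in answers if all(a.claims))
    return clean / len(answers)


# Toy data (made up): a chatty model with 20 claims per answer,
# and a terse model with 5 claims per answer.
chatty = [
    Answer([True] * 19 + [False]),
    Answer([True] * 19 + [False]),
    Answer([True] * 20),
    Answer([True] * 20),
]
terse = [
    Answer([True] * 4 + [False]),
    Answer([True] * 5),
    Answer([True] * 5),
    Answer([True] * 5),
]

for name, answers in (("chatty", chatty), ("terse", terse)):
    print(name, claim_false_rate(answers), clean_answer_rate(answers))
```

In this toy setup the chatty model has the lower share of false claims, while the terse model has the higher share of error-free answers, which is the same kind of split described above.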

What this is actually about

The World Cup is fun, and mostly harmless to get wrong. Nobody really suffers because a chatbot moved Bernardo Silva to Denmark (except for maybe Portugal’s midfield). But the shape of these failures is not harmless, because it repeats everywhere.

Across all four models, only about 1 in 20 claims were false, yet nearly half of all responses contained at least one error. For Claude, it was two out of three. And the errors clustered around the things that are specific and fast-moving: the roster updated last week, the stat that changed last night, the rule that changed last season.
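A back-of-the-envelope sketch of how a small per-claim error rate can still touch a large share of answers. It rests on assumptions that are ours, not the evaluator's: every claim is treated as independently false with the same probability, and every answer is given the same number of claims. The figure of 21 claims per answer is only a rough average implied by the totals above (about 26,000 facts over 1,240 responses), and the function name is hypothetical.

```python
def p_at_least_one_error(per_claim_false_rate: float, claims_per_answer: int) -> float:
    """Chance that an answer contains at least one false claim,
    assuming each claim is independently false at the same rate."""
    return 1 - (1 - per_claim_false_rate) ** claims_per_answer


# Assumed inputs: a 1-in-20 false rate, and a few illustrative answer lengths.
for n in (5, 10, 21, 40):
    print(n, round(p_at_least_one_error(0.05, n), 2))
```

Under these simplified assumptions, the chance of at least one error climbs quickly as an answer packs in more claims. Real answers vary in length and the errors we found were not spread evenly, so the observed figures will not match this curve exactly; the point is only the direction of the effect.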

The models are fluent narrators of a world that’s already a little out of date, but they rarely say so. And that’s not just a soccer problem; it’s a news-in-general problem. Elections, current events, and developing stories are exactly where people are starting to turn to AI.

So which AI should you trust for the World Cup? For the gist — who’s favored, how the tournament works — any of them. For the specifics, none. The wrong answers arrive in the same confident voice as the right ones, and no headline accuracy number will pick them out for you. When the subject is soccer, that’s a relatively harmless lesson to learn.

When the stakes are real, mapping exactly where these systems break — with people who actually understand the nuances — is the work Forum AI exists to do, long after the final whistle.