The tournament kicks off on June 11, featuring 48 teams and 104 matches, along with the usual flood of bold opinions. My goal was to create a prediction I could truly stand behind—not just an impressive machine learning model with good-looking outcomes, but one where every figure stems from a clear assumption I could explain and debate.
This article walks through that prediction from the ground up. The approach is intentionally straightforward: assign a rating to each team, translate every matchup into a goal-scoring distribution, and then run the entire tournament simulation tens of thousands of times.
While this might sound specific to football, nearly everything covered here—from the method to how we interpret the findings—applies broadly to data science. Replace “teams” with sales representatives, shipment timelines, server demand, or customer segments, and these same three stages give you a defendable forecast instead of a single optimistic number.
The real skill worth taking away is designing a workflow where every output can be traced back to a deliberate assumption you’re willing to challenge, rather than one concealed inside a black-box machine learning system.
For our football project, this means no GPS tracking data, no neural networks, nothing you couldn’t rebuild in a single afternoon. But stick with me! The value isn’t in complexity. It’s about having an open, transparent workflow that makes you face the exact modeling decisions that black-box systems quietly hide away. We’ll construct our model step by step and question the assumptions at every stage.
Step 1: Rate every team with Elo
To forecast a match, you first need a number representing how strong each team is. The most accessible, ready-made option for national teams is the World Football Elo rating, a spin on Arpad Elo’s original chess ranking system.
Elo relies on a single self-adjusting formula. Every team holds a rating R. Before a match, the expected result for team A against team B (measured on a 0–1 scale, where 1 means a win) follows a logistic curve based on the gap in their ratings:
E_A = 1 / (1 + 10^(-(R_A - R_B) / 400))
Once the match is played, you shift the rating slightly toward what actually occurred:
R_A' = R_A + K * (S_A - E_A),
where S_A is the actual outcome (1 for a win, 0.5 for a draw, 0 for a loss) and K determines how quickly ratings respond to new results. The football-specific variant adds two important details: K scales with the victory margin (a 4–0 win shifts ratings more than a narrow 1–0), and competitive fixtures carry more weight than friendlies. The number 400 is simply a scaling constant: it makes a 400-point gap equate to roughly a 10-to-1 favorite (E ≈ 0.91).
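The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the World Football Elo implementation: the function names are my own, and k = 20 is an assumed constant that ignores the margin-of-victory and match-importance adjustments.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected result for A against B on a 0-1 scale (logistic in the rating gap)."""
    return 1.0 / (1.0 + 10 ** (-(r_a - r_b) / 400.0))


def update_rating(r_a: float, r_b: float, s_a: float, k: float = 20.0) -> float:
    """Move A's rating toward the observed result s_a (1 win, 0.5 draw, 0 loss).

    k = 20 is an illustrative constant, not the value World Football Elo uses.
    """
    return r_a + k * (s_a - expected_score(r_a, r_b))


# A 400-point gap makes the stronger side roughly a 10-to-1 favorite.
print(round(expected_score(1900, 1500), 3))      # 0.909
# An upset loss costs the favorite about 18 points at k = 20.
print(round(update_rating(1900, 1500, 0.0), 1))  # 1881.8
```

Because the expected score was already 0.91, a win would have added fewer than 2 points, while the loss removes about 18: the update is proportional to how surprising the result was.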
For this model, all we need are the current ratings, stored as a simple dictionary. I’m pulling them from a pre-tournament snapshot in early June 2026, sourced from a freely available Kaggle dataset that compiles these ratings:
# World Football Elo Ratings, pre-tournament snapshot (early June 2026).
# Source: "2026 FIFA World Cup — Historical Elo Ratings" (Kaggle, CC BY-SA 4.0),
# compiling data from World Football Elo Ratings (eloratings.net).
ELO = {
    "Spain": 2155, "Argentina": 2113, "France": 2062,
    "England": 2020, "Brazil": 1988, "Portugal": 1984,
    "Colombia": 1977, "Netherlands": 1944, "Germany": 1925,
    # ... all 48 qualified teams
}
Assumption check: Elo squeezes everything (recent form, roster quality, fatigue levels) into a single number and treats a team’s strength as roughly stable over a short window. It’s a significant simplification, but it’s a transparent and auditable one, and as a single-input feature, Elo is tough to outperform.
Step 2: Turn a rating gap into a goal distribution
A rating difference gives us a win likelihood, but to run a full tournament simulation we need actual scorelines—these determine goal difference, group tiebreakers, and the overall feel of the results. The go-to approach in soccer analytics is to model each team’s goals using a Poisson process.
The Poisson distribution describes the chance of seeing k events when events happen independently at a steady average rate λ:
P(k goals) = λ^k * e^(-λ) / k!
Goals fit this framework well in practice: they’re discrete, relatively infrequent, and roughly independent of one another within a single match. If we treat each team’s goal count as a separate Poisson draw with averages λ_home and λ_away, the full scoreline distribution is simply the outer product of their two probability functions, and we can extract win/draw/loss probabilities by summing the relevant cells:
from scipy.stats import poisson
import numpy as np

def match_probs(lam_home, lam_away, max_goals=10):
    # Probability of 0..max_goals goals for each side
    h = poisson.pmf(np.arange(max_goals + 1), lam_home)
    a = poisson.pmf(np.arange(max_goals + 1), lam_away)
    grid = np.outer(h, a)  # grid[i, j] = P(home i, away j)
    p_home = np.tril(grid, -1).sum()  # below the diagonal: home goals > away goals
    p_draw = np.trace(grid)           # diagonal: equal scores
    p_away = np.triu(grid, 1).sum()   # above the diagonal: away goals > home goals
    return p_home, p_draw, p_away
Assumption check: the independence assumption is handy but imperfect. Real scorelines exhibit some correlation and tend to produce more low-scoring draws (0–0, 1–1) than pure independence would suggest. The standard remedy is the Dixon–Coles adjustment, which introduces a correction for low-scoring outcomes and applies time-decay weighting to older matches. We’re omitting it here to keep things simple; it’s an easy upgrade and precisely the sort of refinement my upcoming book’s Poisson chapter walks through.
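For readers curious what the low-score correction looks like, here is a hedged sketch of the Dixon–Coles reweighting applied to the same kind of score grid. The function name is my own, and rho = -0.1 is an illustrative value rather than a fitted one; in practice rho is estimated from match data. The time-decay weighting is left out entirely.

```python
import numpy as np
from scipy.stats import poisson


def dixon_coles_grid(lam_home, lam_away, rho=-0.1, max_goals=10):
    """Independent-Poisson score grid with the Dixon-Coles low-score correction.

    rho = -0.1 is an assumed, illustrative value; it is normally fitted to data.
    """
    goals = np.arange(max_goals + 1)
    grid = np.outer(poisson.pmf(goals, lam_home), poisson.pmf(goals, lam_away))
    # Reweight only the four low-scoring cells; grid[i, j] = P(home i, away j).
    grid[0, 0] *= 1 - lam_home * lam_away * rho
    grid[0, 1] *= 1 + lam_home * rho
    grid[1, 0] *= 1 + lam_away * rho
    grid[1, 1] *= 1 - rho
    return grid


lam_h, lam_a = 1.5, 1.1  # hypothetical scoring rates
plain = dixon_coles_grid(lam_h, lam_a, rho=0.0)      # rho = 0 recovers independence
adjusted = dixon_coles_grid(lam_h, lam_a, rho=-0.1)
# The draw probability (the diagonal) rises once the correction is applied.
print(f"draw: {np.trace(plain):.3f} -> {np.trace(adjusted):.3f}")
```

With a negative rho, probability mass moves from 1–0 and 0–1 into 0–0 and 1–1, and the four adjustments cancel so the grid still sums to the same total.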
Step 3: Connect ratings to goals
We need λ_home and λ_away as functions of the Elo gap. A well-established rule of thumb in soccer modeling is that roughly a 400-point Elo edge translates to about one extra goal of expected margin. So we take a baseline of approximately 2.7 total goals (a typical figure for international matches) and divide it between the two teams based on their rating difference:
GOALS_BASE = 2.7
GOALS_PER_400_ELO = 1.0

def lambdas(elo_a, elo_b):
    # Expected goal margin implied by the rating gap
    diff = (elo_a - elo_b) / 400.0 * GOALS_PER_400_ELO
    # Split the baseline total so the two rates differ by exactly `diff`
    la = max(0.15, GOALS_BASE / 2 + diff / 2)
    lb = max(0.15, GOALS_BASE / 2 - diff / 2)
    return la, lb
The 0.15 floor prevents even the weakest team from receiving an unrealistic negative scoring rate. A more rigorous approach would fit log(λ) = β₀ + β₁·Δrating as a Poisson generalized linear model using historical match data; the linear-supremacy heuristic above is the quick mental-math equivalent and lands in the same neighborhood for most favorites.
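To see Steps 2 and 3 working together, here is a worked example for one fixture. The block repeats compact versions of the two helper functions so that it runs on its own; the ratings come from the Elo snapshot in Step 1, and the choice of Spain against Germany is only an illustration.

```python
import numpy as np
from scipy.stats import poisson

GOALS_BASE = 2.7
GOALS_PER_400_ELO = 1.0


def lambdas(elo_a, elo_b):
    diff = (elo_a - elo_b) / 400.0 * GOALS_PER_400_ELO
    return max(0.15, GOALS_BASE / 2 + diff / 2), max(0.15, GOALS_BASE / 2 - diff / 2)


def match_probs(lam_a, lam_b, max_goals=10):
    goals = np.arange(max_goals + 1)
    grid = np.outer(poisson.pmf(goals, lam_a), poisson.pmf(goals, lam_b))
    return np.tril(grid, -1).sum(), np.trace(grid), np.triu(grid, 1).sum()


# Spain (2155) vs Germany (1925): a 230-point gap, i.e. 0.575 goals of margin.
la, lb = lambdas(2155, 1925)
print(round(la, 4), round(lb, 4))  # 1.6375 1.0625

p_win, p_draw, p_loss = match_probs(la, lb)
print(f"Spain {p_win:.2f}, draw {p_draw:.2f}, Germany {p_loss:.2f}")
```

The two rates always sum to the 2.7-goal baseline unless the 0.15 floor kicks in, which only happens for rating gaps of more than 960 points.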
Step 4: Simulate the tournament 10,000 times
One simulation run is not a forecast—it’s just one possible version of 2026. The real forecast is the distribution that emerges across thousands of simulations. So we play through the entire bracket over and over, tracking how often each team wins.
The 2026 format is new and worth spelling out clearly: 48 teams split into 12 groups of four, where the top two from each group plus the eight best third-placed teams advance to a 32-team single-elimination knockout stage.
That third-place rule is the awkward part to implement.
Which third-placed teams advance cannot be determined until every group has finished. The simulation therefore tracks points and goal difference for all four teams in each group, compares the twelve third-placed teams across groups, and selects the best eight. Knockout matches that end level go to penalties, which are treated as close to a coin flip with a slight tilt toward the stronger team.
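The two mechanics described above can be sketched as follows. Everything here is an assumption for illustration: the function names, the ranking keys (points, goal difference, goals scored, which simplify the official tiebreak rules), the tilt value of 0.1, and the made-up standings.

```python
import random


def best_third_placed(third_placed, n_advance=8):
    """Pick the best third-placed teams across groups.

    third_placed: list of (team, points, goal_diff, goals_for), one per group.
    Ranking by points, then goal difference, then goals scored is a
    simplification of the official tiebreak rules.
    """
    ranked = sorted(third_placed, key=lambda row: (row[1], row[2], row[3]), reverse=True)
    return [team for team, *_ in ranked[:n_advance]]


def shootout_winner(team_a, elo_a, team_b, elo_b, rng, tilt=0.1):
    """Near coin flip, nudged toward the higher-rated side.

    tilt = 0.1 is a hypothetical value controlling how far the Elo
    expectation pulls the probability away from 0.5.
    """
    expected_a = 1.0 / (1.0 + 10 ** (-(elo_a - elo_b) / 400.0))
    p_a = 0.5 + tilt * (expected_a - 0.5)
    return team_a if rng.random() < p_a else team_b


# Made-up third-place standings for the 12 groups (placeholder team labels).
thirds = [
    ("A3", 4, 1, 4), ("B3", 3, -1, 2), ("C3", 4, 0, 3), ("D3", 2, -2, 1),
    ("E3", 3, 0, 3), ("F3", 6, 2, 5), ("G3", 1, -4, 1), ("H3", 4, 1, 3),
    ("I3", 3, -1, 3), ("J3", 3, 0, 2), ("K3", 2, -1, 2), ("L3", 4, -1, 2),
]
print(best_third_placed(thirds))
# ['F3', 'A3', 'H3', 'C3', 'L3', 'E3', 'J3', 'I3']

rng = random.Random(7)
print(shootout_winner("Strong", 1900, "Weak", 1500, rng))
```

With a 400-point gap and tilt = 0.1, the stronger side wins the shootout about 54% of the time, which keeps penalties close to the coin flip the text describes.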
N = 10_000
title = {t: 0 for t in ELO}
for _ in range(N):
    champion = simulate_one_tournament()  # groups -> R32 -> ... -> final
    title[champion] += 1
probs = {t: title[t] / N for t in ELO}
Why run it 10,000 times? Because every simulated probability is an estimate with a margin of error. If you estimate a title probability p from N independent simulated tournaments, the standard error is sqrt(p(1-p)/N). For a team with a 15% chance, that is roughly 0.36 percentage points at N = 10,000, stable enough that the rankings won’t jump around if you run it again. Cut it down to N = 500, and the standard error rises to about 1.6 points, plenty of room to shuffle the middle of the ranking. By vectorizing the simulation (running all N tournaments at once as array-based calculations instead of a Python for loop), you can push past 20,000 runs without any noticeable slowdown.
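The standard-error arithmetic is easy to verify, and the same block shows what "vectorized" means on the smallest possible scale: one match sampled N times in a single array call. The scoring rates 1.6 and 1.1 and the seed are arbitrary choices for illustration, and this is a sketch of the idea rather than the full tournament simulator.

```python
import math

import numpy as np


def standard_error(p, n):
    """Standard error of a probability estimated from n independent simulations."""
    return math.sqrt(p * (1 - p) / n)


# Standard error in percentage points for a 15% title chance.
for n in (500, 10_000, 20_000):
    print(n, round(100 * standard_error(0.15, n), 2))
# 500 1.6
# 10000 0.36
# 20000 0.25

# Vectorized sampling: all n simulated matches are drawn at once, with no Python loop.
rng = np.random.default_rng(42)
n = 10_000
home = rng.poisson(1.6, size=n)  # hypothetical scoring rates
away = rng.poisson(1.1, size=n)
p_hat = (home > away).mean()
print(f"estimated home-win probability: {p_hat:.3f} +/- {standard_error(p_hat, n):.3f}")
```

Doubling N from 10,000 to 20,000 only cuts the error by a factor of about 1.4, since it shrinks with the square root of N.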
What the Model Predicts
| Team | Win probability |
|---|---|
| Spain | 16.0% |
| Argentina | 11.9% |
| France | 7.9% |
| England | 7.0% |
| Brazil | 5.4% |
| Netherlands | 4.7% |
| Portugal | 4.3% |
| Germany | 3.7% |
Table 1: Projected World Cup Winners According to the Model. Source: author.
A couple of things jump off the page. First, the top favorite sits at only 16%, not 50%. Even the strongest squad on the planet is still far more likely not to lift the trophy in a 48-team tournament. That’s what happens when you stack up the randomness of a low-scoring game, modeled through Poisson variance, across three group games and five do-or-die knockout matches.
Second, these figures line up pretty closely with predictions from much more sophisticated models — the type that lean on years of historical match data and dozens of variables. The fact that they agree is reassuring: a straightforward Elo-and-Poisson framework manages to capture most of what a heavyweight forecasting setup produces. At the end of the day, both approaches are doing fundamentally the same thing — converting team strength ratings into win probabilities.
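The first observation can be checked with back-of-the-envelope arithmetic. The sketch below assumes a constant per-match win probability and ignores the group stage, both of which are simplifications; the round count follows from the 32-team single-elimination stage described in Step 4.

```python
import math

KNOCKOUT_TEAMS = 32
rounds = int(math.log2(KNOCKOUT_TEAMS))  # 5 rounds: R32, R16, QF, SF, final

# Chance of winning every knockout round at a fixed per-match win probability.
for p_match in (0.6, 0.7, 0.8):
    print(p_match, round(p_match ** rounds, 3))
# 0.6 0.078
# 0.7 0.168
# 0.8 0.328
```

Even a side that wins 70% of its knockout ties, penalties included, lifts the trophy only about one time in six, which is in line with the favorite's figure in Table 1.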
Where It Works Well, and Where It Falls Short
This model doesn’t pretend to be anything more than it is. Every simplification is an explicit knob you can adjust:
- Neutral venues assumed. Every match is treated as being played at a neutral site — no bonus is given to the hosts (the USA, Mexico, or Canada). Slapping on a home-field advantage adjustment (roughly +50 to +100 Elo points, historically translating to about a third of a goal) would take just a single line of code.
- Fixed ratings throughout. Elo scores are locked in before kickoff and don’t shift as the tournament progresses. Updating them after every round would tighten up predictions for the later stages.
- Independent Poisson scoring. No Dixon–Coles adjustment for low-scoring draws, no built-in mechanism to boost draw probabilities.
- Simplified seeded bracket. A generic seeded knockout structure is used instead of FIFA’s specific Round-of-32 pairing map. For the frontrunners’ overall title chances, this barely registers, but when you care about a team’s specific route through the bracket, it does matter.
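As an example of turning one of these knobs, here is a hedged sketch of the host adjustment from the first bullet. The function name and the bonus of 100 Elo points are assumptions (the upper end of the range quoted above), and giving hosts the bonus in every match regardless of venue is a further simplification.

```python
GOALS_BASE = 2.7
GOALS_PER_400_ELO = 1.0
HOME_BONUS_ELO = 100  # assumed value, upper end of the +50 to +100 range
HOSTS = {"USA", "Mexico", "Canada"}


def lambdas_with_hosts(team_a, elo_a, team_b, elo_b):
    """Scoring rates with a flat Elo bonus for host nations (illustrative only)."""
    elo_a += HOME_BONUS_ELO if team_a in HOSTS else 0
    elo_b += HOME_BONUS_ELO if team_b in HOSTS else 0
    diff = (elo_a - elo_b) / 400.0 * GOALS_PER_400_ELO
    la = max(0.15, GOALS_BASE / 2 + diff / 2)
    lb = max(0.15, GOALS_BASE / 2 - diff / 2)
    return la, lb


# Two equally rated sides (hypothetical rating of 1800), one of them a host.
la, lb = lambdas_with_hosts("Mexico", 1800, "Visitor", 1800)
print(round(la, 3), round(lb, 3))  # 1.475 1.225
```

Under this model's own mapping, 100 Elo points is worth a quarter of a goal of expected margin, so the bonus shifts an otherwise even match modestly toward the host.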
Each of these topics is covered in depth in a chapter of the book I co-wrote, Soccer Analytics with Machine Learning (O’Reilly, 2026): the Poisson goal model and its refinements in Chapter 6, team rating systems in Chapter 8, and converting probabilities into smarter betting decisions in Chapter 9. Think of this article as a streamlined version of that whole pipeline — one you can get up and running in an afternoon.
Give It a Go Yourself
The book’s GitHub repository is packed with extra examples. Fork it, plug in the latest Elo numbers, and you’ll have your own World Cup projections cooked up before you could even type out a prompt for Claude.
In a follow-up piece, I’ll walk through rebuilding this entire pipeline with eleven different models trained on real match outcomes, and we’ll see those models hand the trophy to four different winners.
For now, the model’s pick is Spain. The tournament kicks off June 11. Let’s see how it plays out.
Ari Joury is a co-author of Soccer Analytics with Machine Learning (O’Reilly, 2026).



