When I entered the tech world, there was one phrase everyone seemed to know, regardless of age or background:
“Correlation does not equal causation.”
It’s a catchy line you’ve probably quoted before, and you’ve likely given it an approving nod when someone else brought it up. It comes up most often when two datasets that have nothing to do with each other happen to line up, and someone implies a cause-and-effect link as a joke or a curiosity!
Here are two fun examples:
- Countries where people consume more pizza tend to have higher math test scores.
- As the number of sunglasses sold goes up, the number of shark attacks also tends to rise.
Now, if those were the only pieces of data you had… what would you jump to conclude?
Does eating pizza improve your math skills? Will purchasing sunglasses somehow trigger a shark attack?
As amusing as it is to imagine, the honest answer is “almost certainly not.”
Still, these are genuine examples of something real: Correlation.
The real question we should ask is: if correlation doesn’t equal causation, then what actually is it?
That’s where confusion often sets in.
Correlation often gets treated as a loose idea, like “These two things are somehow related” or “They sort of follow each other.” But correlation is not just a hunch; it’s a precise, numeric measure showing how two variables vary together.
Instead of just repeating the classic warning, we’re going to actually explore the meaning behind it. Once you grasp it, those oddball examples stop feeling surprising and start making logical sense.
So let’s dive in!
What Does Correlation Mean?
When people say two variables are “correlated,” they usually mean one of three things:
- “Those two things appear connected.”
- “Those two things tend to change together.”
- “There seems to be some sort of relationship.”
At a basic level, all three of these impressions are understandable, but they’re missing important detail.
Correlation is not a feeling. It’s a measurement. And like any measurement, it answers a specific question clearly.
To make this concrete, imagine gathering data on the number of hours students studied and the scores they received on a test.
After plotting it, you might see something like this:
Each dot represents a student. The horizontal axis shows hours studied, the vertical axis shows their score.
Looking at the chart, you can see that the dots generally rise from left to right. You might then say: “As study time increases, scores tend to go up as well”—which we refer to as a positive correlation.
But is that just a visual impression, or is the data revealing something more solid?
In this case, the pattern you’re seeing means: whenever one variable is above its average, the other tends to be above its average as well.
That’s the key idea most people overlook: correlation isn’t about raw numbers—it’s about how each variable moves in relation to its own average.
So, the core question correlation answers is:
Do two variables consistently move together in a predictable way?
That question leads to one of three possible outcomes:
- Both rise together → positive correlation
- One rises while the other falls → negative correlation
- No consistent movement → no correlation
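To make the "relative to its own average" idea concrete, here is a minimal sketch in Python with NumPy. The six data points and the names `hours` and `scores` are invented for illustration; they do not come from a real dataset.

```python
import numpy as np

# Hypothetical data: hours studied and test scores for six students.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 58.0, 61.0, 70.0, 74.0, 81.0])

# How far each student sits from the average of each variable.
hours_dev = hours - hours.mean()
scores_dev = scores - scores.mean()

# A positive product means both values fall on the same side of their averages.
same_side = hours_dev * scores_dev > 0

for h, s, same in zip(hours_dev, scores_dev, same_side):
    print(f"hours {h:+.1f}, score {s:+.1f}, same side of average: {same}")
```

In this made-up sample every student is either below average on both variables or above average on both, which is what a positive correlation looks like at the level of individual points.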
Understanding the Math Behind It
To make the idea of correlation easier to grasp, let’s look at the Pearson correlation coefficient, defined as:
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
I realize that formula doesn’t scream “intuitive” at first glance… But stay with me—we’ll break it down without making it feel like a textbook.
Step 1: Covariance — Do They Move Together?
Covariance checks how two variables deviate from their respective averages. If both are above average at the same time, covariance is positive; if one is high while the other is low, it’s negative.
In simple terms, covariance tells us: “Are these variables aligned in how they differ from their own averages?”
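As a rough sketch of this step, covariance can be computed by hand as the average product of the two sets of deviations. The data and the helper name `covariance` are assumptions made for illustration, and the sketch uses the sample convention (dividing by n - 1), which is also NumPy's default in `np.cov`.

```python
import numpy as np

# Hypothetical data: hours studied and test scores for six students.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 58.0, 61.0, 70.0, 74.0, 81.0])

def covariance(x, y):
    """Sample covariance: summed product of deviations, divided by n - 1."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return (x_dev * y_dev).sum() / (len(x) - 1)

cov_manual = covariance(hours, scores)
cov_numpy = np.cov(hours, scores)[0, 1]  # off-diagonal entry of the 2x2 matrix

print(f"manual: {cov_manual:.1f}, numpy: {cov_numpy:.1f}")  # both 20.2 for this data
```

The sign is the part to read here: it is positive because high study hours line up with high scores. The size of the number (20.2) is harder to interpret, since it is expressed in "hours times points".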
Step 2: Normalize It
On its own, covariance is hard to compare because it depends on the units of measurement. To fix that, we divide by the standard deviations of the two variables, $\sigma_X$ and $\sigma_Y$. This rescales everything into a clean, universal range: from –1 to 1. That gives us a standardized basis for comparison.
With these two steps in place, we can compute the Pearson coefficient! The result tells us:
- +1 → a perfect positive relationship.
- 0 → no linear relationship.
- –1 → a perfect negative relationship.
This number simply captures how consistently two variables move together—not how large they are, but how aligned their patterns are.
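The two steps can be sketched together as follows. The data and the function name `pearson_r` are illustrative assumptions. The sketch also re-expresses study time in minutes to show that covariance changes with the unit while the coefficient does not.

```python
import numpy as np

# Hypothetical data: hours studied and test scores for six students.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 58.0, 61.0, 70.0, 74.0, 81.0])

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    cov = np.cov(x, y)[0, 1]                      # sample covariance (n - 1)
    return cov / (x.std(ddof=1) * y.std(ddof=1))  # same n - 1 convention

minutes = hours * 60  # same study time, different unit

cov_hours = np.cov(hours, scores)[0, 1]
cov_minutes = np.cov(minutes, scores)[0, 1]
r_hours = pearson_r(hours, scores)
r_minutes = pearson_r(minutes, scores)

print(f"covariance: {cov_hours:.1f} (hours) vs {cov_minutes:.1f} (minutes)")
print(f"Pearson r:  {r_hours:.3f} (hours) vs {r_minutes:.3f} (minutes)")
```

Switching units multiplies the covariance by 60, but the standard deviation in the denominator grows by the same factor, so the ratio stays the same. The hand-rolled value should also agree with `np.corrcoef(hours, scores)[0, 1]`.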
What Different Correlations Look Like

- Left: strong positive correlation → clear upward trend
- Center: no correlation → points scattered randomly
- Right: strong negative correlation → clear downward trend
Correlation measures how consistently two variables move together—not merely whether they’re related or not.
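For readers who want to reproduce the three patterns numerically, here is a small simulation sketch. Everything in it is synthetic: the seed, the sample size, the slopes, and the noise level are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500

x = rng.normal(size=n)
noise = rng.normal(size=n)  # generated independently of x

y_positive = 2 * x + 0.5 * noise   # rises with x
y_none = noise                     # ignores x entirely
y_negative = -2 * x + 0.5 * noise  # falls as x rises

r_positive = np.corrcoef(x, y_positive)[0, 1]
r_none = np.corrcoef(x, y_none)[0, 1]
r_negative = np.corrcoef(x, y_negative)[0, 1]

print(f"positive: {r_positive:+.2f}")
print(f"none:     {r_none:+.2f}")
print(f"negative: {r_negative:+.2f}")
```

The first and third coefficients should land close to +1 and –1. The middle one should sit near 0, though rarely at exactly 0, because any finite sample carries some chance alignment.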
What Correlation Actually Reveals
Correlation tells you: these variables shift together in a predictable, organized way. It signals that there’s a meaningful pattern worth paying attention to.
However, it does NOT tell you why or how they move together—or whether one actually causes the other.
A classic illustration of this is the link between ice cream sales and drowning incidents.
In fact, if you plot the number of ice cream sales against drowning incidents, you’ll see:

There’s an obvious upward trend connecting these two… does more ice cream actually lead to more drownings?
Of course not—that would be misleading. The real explanation lies in temperature: hot weather drives up ice cream sales, increases beach visits, and results in more people swimming.
So even though the correlation itself is undeniable, the underlying reason remains hidden.
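A hidden driver like this can be imitated with a simulation. The sketch below is built on made-up numbers, not real sales or drowning statistics: `temperature` feeds both series, and neither series influences the other. The helper `residuals` is an illustrative name; it removes the straight-line effect of temperature so the two series can be compared again afterwards.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_days = 1000

# Synthetic data: temperature drives both series; they never touch each other.
temperature = rng.uniform(10, 35, size=n_days)
ice_cream = 20 * temperature + rng.normal(0, 60, size=n_days)
drownings = 0.3 * temperature + rng.normal(0, 1.5, size=n_days)

r_raw = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(y, x):
    """What is left of y after subtracting a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Correlate the parts of each series that temperature does not explain.
r_adjusted = np.corrcoef(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw correlation:          {r_raw:.2f}")
print(f"after removing temperature: {r_adjusted:.2f}")
```

The raw correlation should come out strongly positive even though the code contains no link between the two series. Once temperature is accounted for, the remaining correlation should fall to roughly zero. In real data the hidden variable is usually not known in advance, which is why the coefficient alone cannot settle the question.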
Correlation and Curved Relationships
Consider this mathematical relationship:
y = x²

This is clearly a strong and consistent relationship: as x moves away from zero in either direction, y rises! But if you calculate the correlation over x values spread evenly on both sides of zero:
np.corrcoef(x, y)[0,1]
The result is nearly zero.
That’s because correlation only evaluates how well a straight line fits the data. This is a critical limitation. If the true relationship follows a curve, correlation can completely miss it—even when the connection is strong and real.
So rather than thinking “Correlation equals relationship,” it’s more accurate to say: “Correlation measures how well a straight line can describe the relationship.”
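Here is a runnable sketch of that experiment. The ranges and the number of points are arbitrary choices. The second half is an added contrast: it restricts x to non-negative values to show that the near-zero result depends on the data covering both sides of the curve evenly.

```python
import numpy as np

# x spread symmetrically around zero: the two halves of the parabola cancel out.
x = np.linspace(-3, 3, 61)
y = x ** 2
r_symmetric = np.corrcoef(x, y)[0, 1]

# Only the right-hand half: y now rises steadily with x.
x_right = np.linspace(0, 3, 61)
y_right = x_right ** 2
r_right = np.corrcoef(x_right, y_right)[0, 1]

print(f"x in [-3, 3]: r = {r_symmetric:.3f}")
print(f"x in [0, 3]:  r = {r_right:.3f}")
```

On the symmetric range the coefficient is essentially zero, because the downward slope on the left offsets the upward slope on the right. On the right-hand half alone a straight line fits reasonably well, so the coefficient is high even though the relationship is still a curve.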
Common Misunderstandings
The abstract nature of correlation—and how it’s typically taught—leads to frequent misconceptions. Three widespread errors include:
- Mistaking correlation for causation: Just because two things move together doesn’t mean one causes the other.
- Overlooking hidden factors: A third unseen variable might be influencing both.
- Missing curved or complex patterns: Correlation only detects straight-line trends.
You might now be wondering: if correlation seems so limited, why does it still matter?
Because it’s incredibly valuable as a starting clue. It signals:
“There may be something interesting going on here.”
From that point, you dig deeper. Correlation identifies alignment; further research uncovers the real cause.
Key Takeaway
“Correlation doesn’t imply causation.” True, but here’s the problem: people hear that and conclude, “Correlation is useless.” That’s a mistake!
Correlation measures how consistently two variables move together, on a scale from –1 to 1. It captures linear association, but it never claims to prove cause and effect.
Correlation isn’t the culprit. We’re the problem—for expecting it to do something it was never designed to do. It’s simply an alert that says:
“Pay attention… this looks worth exploring.”
Now the real work begins—as we investigate what’s truly behind this intriguing pattern.



