It might seem like your social media feed has an uncanny understanding of your interests.
As you scroll through platforms like Instagram or TikTok, a fascinating pattern emerges: watch one video, and your entire stream quickly fills with nearly identical content. Half a decade ago, this felt almost magical. Today, we casually talk about the algorithm as though it’s some enigmatic force working behind the scenes in a Silicon Valley server room. The reality is far less cinematic, yet far more intriguing.
The algorithm isn’t inherently malicious; it doesn’t deliberately try to radicalize anyone. It’s simply a program computing cosine similarities and weighted averages, attempting to guess what you’ll interact with next. The problem is that whatever content we engage with keeps generating more engagement. And the most reliable way to hold human attention turns out to be the worst way to actually inform people (sensational clickbait, misleading headlines, or far worse).
This article explores how recommendation engines function and why they push us into filter bubbles. And because merely reading about something never quite compares to witnessing it firsthand, we’ll build one from scratch, feed it real news data, and observe how the bubble develops in real time.
The Engagement Machine: How Recommenders Function
Deep down, a social media algorithm acts like a curator sifting through an endless stream of content to deliver the posts you’re most likely to interact with: a tap, a watch, a like, a share, or an angry comment. Its entire operation hinges on one thing: data.
Every small interaction leaves a digital footprint:
- Which posts you pause on, even without clicking
- Which videos you watch, and for how long
- Which profiles you follow, mute, or block
- Which topics you search for late at night
Using machine learning, the algorithm uncovers patterns within this constant flood of behavioral data. Over and over, it asks one question: what keeps this person scrolling? Keep in mind that this is the number one priority for every social media platform: keeping you glued to your screen for as long as possible.
Two foundational techniques power most recommendation engines:
- Collaborative filtering identifies users whose behavior mirrors yours and surfaces content they enjoyed. For instance, if Alice and Bob both loved The Matrix and Inception, and Alice also appreciated Interstellar, the system steers Bob toward Interstellar. Straightforward enough.
- Content-based filtering analyzes the features of content you’ve engaged with and finds material that’s alike. If you regularly watch cooking videos, it recommends more content tagged “cooking”, “recipe”, or “knife skills”, pieces that resemble what you’ve already enjoyed.
Major platforms merge these approaches alongside hundreds of additional signals. At their core, the idea remains unchanged: study your behavior, then predict what else might catch your attention.
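To make the two techniques concrete, here is a minimal sketch rather than anything a real platform runs. The Alice and Bob data comes from the example above; the video titles and tags in the content-based half are made up for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# --- Collaborative filtering: lean on a user with similar taste ---
movies = ["The Matrix", "Inception", "Interstellar"]
likes = {
    "Alice": np.array([1, 1, 1]),  # 1 = liked
    "Bob":   np.array([1, 1, 0]),  # 0 = no signal yet
}
similarity = cosine(likes["Alice"], likes["Bob"])

# Suggest to Bob whatever his look-alike Alice liked that he has not.
suggestions_for_bob = [
    movie
    for movie, alice_liked, bob_liked in zip(movies, likes["Alice"], likes["Bob"])
    if alice_liked and not bob_liked
]

# --- Content-based filtering: lean on the features of the content ---
watched_tags = {"cooking", "recipe"}
candidates = {  # hypothetical videos and tags
    "Knife skills 101": {"cooking", "knife skills"},
    "NBA highlights":   {"sports", "basketball"},
}
# Score each candidate by how many tags it shares with what was watched.
best_match = max(candidates, key=lambda title: len(candidates[title] & watched_tags))

print(f"Alice/Bob similarity: {similarity:.2f}")
print("Collaborative suggestion for Bob:", suggestions_for_bob)
print("Content-based suggestion:", best_match)
```

The first half never looks at what the movies are about, only at who liked them. The second half never looks at other users, only at tags. Production systems blend both.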
The algorithm doesn’t deliberately push harmful or false material. It simply optimizes for engagement, and one of the surest ways to keep humans locked in is to tap into our emotions, particularly the intense, negative ones. Or sometimes, adorable cat videos.
Building a News Recommender on Real Data
Let’s move past theory and actually construct one. We’ll use real anonymized click records from Microsoft News. The dataset is known as MIND (Microsoft News Dataset), released for academic research by Microsoft Research. This sample includes 50,000 users, over 51,000 English news articles spanning 17 categories (news, sports, finance, lifestyle, health, travel, and more), and 156,000+ real impression sessions, each documenting what a user saw and what they clicked. The entire prototype fits in roughly 30 lines of Python, although you don’t need to know the technical details:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# clicks, user_rows, article_cols, n_users, n_articles and user_idx
# are assumed to come from loading the click logs (not shown here).

# Build a sparse user × article matrix (1 = clicked, 0 = didn't)
matrix = csr_matrix((np.ones(len(clicks)), (user_rows, article_cols)),
                    shape=(n_users, n_articles))

def recommend(user_id, matrix, top_n=15, n_neighbors=50):
    """Find the 50 most similar users and rank the articles
    they clicked that our user hasn't clicked yet."""
    u = user_idx[user_id]
    # Cosine similarity between this user and everyone else
    sims = cosine_similarity(matrix[u], matrix).flatten()
    sims[u] = 0  # a user can't be their own neighbour
    # Take the top 50 most similar users
    top_neighbors = np.argsort(sims)[-n_neighbors:][::-1]
    weights = sims[top_neighbors]
    # Score articles by weighted sum of neighbour clicks
    scores = np.asarray(matrix[top_neighbors].T.dot(weights)).flatten()
    # Zero out articles the user already clicked
    scores[matrix[u].toarray().flatten() > 0] = 0
    # Return the top-scoring articles
    top_articles = np.argsort(scores)[-top_n:][::-1]
    return top_articles

Cosine similarity identifies your fifty nearest neighbors: people who click on the same kinds of articles you do. The function gathers the articles they clicked, weights each by how closely that neighbor aligns with you, and serves up the fifteen highest-ranking items. This basic mechanism is what fuels a multibillion-dollar industry.
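The snippet takes names such as clicks, user_rows, article_cols and user_idx as given. Here is one way they might be built, sketched on a toy click log. The column names user_id and article_id are assumptions for illustration, not the actual layout of the MIND files.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy click log; column names are hypothetical, not the MIND file format.
clicks = pd.DataFrame({
    "user_id":    ["U1", "U1", "U2", "U2", "U3", "U1"],
    "article_id": ["N1", "N2", "N1", "N3", "N4", "N1"],  # last row repeats a click
}).drop_duplicates()  # one entry per (user, article) keeps the matrix binary

# factorize maps every distinct id to a dense integer index (0, 1, 2, ...)
user_rows, user_ids = pd.factorize(clicks["user_id"])
article_cols, article_ids = pd.factorize(clicks["article_id"])
n_users, n_articles = len(user_ids), len(article_ids)

# Lookup tables from the original ids to matrix positions
user_idx = {uid: i for i, uid in enumerate(user_ids)}
article_idx = {aid: j for j, aid in enumerate(article_ids)}

matrix = csr_matrix((np.ones(len(clicks)), (user_rows, article_cols)),
                    shape=(n_users, n_articles))

print(matrix.toarray())
```

Dropping duplicate rows first matters because building a sparse matrix from coordinates sums repeated entries, which would turn a double click into a 2.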
Cosine? What Similarity?
Cosine similarity might sound like a concept from a math textbook, but stick with me, it’s simpler than it seems. To illustrate how it works, let’s take a brief detour.
Picture a few data points spread across two axes: one ranging from mechanical to biological, and another measuring cuteness:
Cosine similarity calculates the angle between two arrows, each originating from the center (0,0) and pointing toward one of our data points. The narrower the angle between them, the more alike the two items are.
Here’s an intuitive way to think about it: when two arrows are nearly pointing the same way, the things they represent have a lot in common. Take cats and dogs as an example. Both rank high on “biological” and high on “cuteness”, so their arrows point in roughly the same direction, and cosine similarity gives a value close to 1 (its maximum).
But when we compare cats to teddy bears, the picture shifts. They might overlap on the cuteness axis but differ sharply on the biological one:

Both score high on the “cuteness” scale, yet they are worlds apart on the biological spectrum: a cat is a living creature, whereas a teddy bear registers as completely non-biological.
This divergence pushes their respective vectors further apart. As the angle between them grows, the cosine similarity score drops, illustrating that despite sharing one common attribute, these two items inhabit entirely different areas of our conceptual space.
Naturally, when you compare a cat to a car, the similarity is virtually nonexistent, as their vectors point in entirely different directions:
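For readers who want to see the numbers, the comparison can be sketched in a few lines. The coordinates are invented for illustration: each item gets a (biological, cuteness) pair on a scale from -1 to 1, where a negative "biological" value means mechanical.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# (biological, cuteness); made-up coordinates, not measured data
items = {
    "cat":        (0.9, 0.9),
    "dog":        (0.9, 0.8),
    "teddy bear": (-0.2, 0.9),
    "car":        (-0.9, 0.1),
}

for other in ("dog", "teddy bear", "car"):
    print(f"cat vs {other}: {cosine(items['cat'], items[other]):.2f}")
```

With these coordinates the cat and dog arrows nearly coincide, the teddy bear lands in between, and the car comes out lowest of all. On a signed scale like this one, the car’s score is actually negative, meaning the arrows point in opposing directions.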

AI systems leverage this mathematical logic to suggest content likely to evoke a similar reaction from you. Picture a two-dimensional plane where one axis represents the emotional impact of a video (calm, amused, outraged) and the other represents its subject matter. Every video is mapped to a specific point within that plane.
Suppose you click on a political video that sparks your anger and watch it to the end. The platform logs both data points: the subject matter and your emotional reaction. By applying cosine similarity, it identifies other videos whose “vector” aligns with that direction (outrage-inducing political content) and queues them up for you. The more you interact, the more precisely the algorithm identifies which segment of that plane keeps you engaged.
Introducing User U92876 (let’s call him Joe): The Die-Hard Sports Fan
I selected a profile from the MIND dataset whose entire reading history revolves around sports: NFL power rankings, NBA trade speculation, MLB suspensions. This person consumed twenty-five articles, every single one about sports.
Let’s see what the recommender system suggests for them:

Here’s how the categories break down:
- 40% sports
- 13% news
- 13% autos
- 34% a mix of various other topics.
The system correctly identifies this user’s passion for sports and reinforces it, yet it also delivers a reasonably balanced mix. Politics, entertainment, lifestyle, and finance all make an appearance. Seems fair, doesn’t it?
But watch what happens next.
A Fleeting Moment of Curiosity
I modeled something far more typical than a deep-dive binge: a brief spark of idle curiosity.
Our sports fanatic didn’t dedicate hours to political reading. Glancing at their initial feed, they simply clicked on three items that piqued their interest:
- The article covering Joe Biden.
- The piece on Mitch McConnell.
- The clip about Trump’s attacks.
Just three clicks within a span of under ten minutes. Three tiny digital crumbs dropped for the algorithm, and then Joe carried on with the rest of his day.
Now, if we fed those clicks through the basic 30-line Python script we built earlier, the impact would be negligible. Mathematically, 25 past sports clicks would still drown out 3 new political ones. The system would still classify this person as 89% sports-oriented, and the feed would remain largely unchanged.
But here lies the critical ingredient powering today’s social media: Recency Weighting (also known as Time Decay).
Real-world algorithms don’t value every click equally. A click from three years ago is practically ancient history, while a click from three minutes ago is pure gold. To keep you hooked during your current session, platforms apply a significant multiplier to your most recent activity.
A single line of code can implement this in the algorithm we discussed. If we decide that the freshest clicks should carry up to 100 times the weight of older ones, we could write something like this:
# is_historical(click) is a placeholder: True when the click predates the current session
time_decay_weights = np.array([0.1 if is_historical(click) else 10.0 for click in user_history])

With that adjustment, let’s regenerate the recommendations:

Here’s the impact that just 3 clicks had on our time-weighted recommendation engine:


Political content surged from 13% to 40% of the feed. A threefold increase, triggered by a single evening of clicking on just three news stories. Sports — the topic this person had consumed for years — was dethroned from the top spot and pushed to second place. The algorithm didn’t pause to reason: “Wait, this person has 25 sports articles in their history; one evening of politics doesn’t redefine them.”
It doesn’t reason at all. It simply recalculates the time-weighted similarity matrices, identifies a new cluster of neighbors, and serves what other users with similar clicks tended to enjoy.
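A toy calculation shows why the weighting flips the outcome. It assumes the 0.1 and 10.0 weights simply multiply each click in the user’s profile before similarities are computed, which is one plausible way to wire them in and not necessarily how any platform does it. The two neighbors are hypothetical: one clicked only the sports articles, the other only the political ones.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n_sports, n_politics = 25, 3
joe = np.ones(n_sports + n_politics)         # 25 old sports clicks, then 3 fresh political ones
is_recent = np.arange(joe.size) >= n_sports  # only the last 3 clicks count as recent

# Old clicks get 0.1, recent clicks get 10.0: a 100x gap
time_decay_weights = np.where(is_recent, 10.0, 0.1)
joe_weighted = joe * time_decay_weights

sports_share_unweighted = joe[~is_recent].sum() / joe.sum()
politics_share_weighted = joe_weighted[is_recent].sum() / joe_weighted.sum()

# Two hypothetical neighbors
sports_reader = np.where(is_recent, 0.0, 1.0)
politics_reader = np.where(is_recent, 1.0, 0.0)

before = (cosine(joe, sports_reader), cosine(joe, politics_reader))
after = (cosine(joe_weighted, sports_reader), cosine(joe_weighted, politics_reader))

print(f"Sports share, unweighted:  {sports_share_unweighted:.0%}")
print(f"Politics share, weighted:  {politics_share_weighted:.0%}")
print(f"Similarity (sports, politics) before: {before[0]:.2f}, {before[1]:.2f}")
print(f"Similarity (sports, politics) after:  {after[0]:.2f}, {after[1]:.2f}")
```

Unweighted, sports makes up 25 of 28 clicks and the sports reader is by far the closer neighbor. Once weighted, the three fresh clicks carry 30 of the 32.5 total weight units, and the politics reader becomes the near-perfect match.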
Two observations stand out:
- The speed. A single evening can completely reshape a user’s feed composition. Actual platforms recalculate even faster than this demo, updating in real time. You’ve likely noticed this with ads for products you searched for just moments ago.
- What gets lost. This isn’t merely about what the algorithm adds — it’s equally about what it takes away. The user’s informational diet didn’t just become more political; it became narrower. And that narrowing is the true danger.
Note: real platforms keep their decay constants under wraps, so this example is illustrative rather than empirical. The mechanism itself is real, and the direction of the effect is what counts. My 100x multiplier may overstate the actual recency bias.
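A two-level weight (old versus new) is the bluntest form of time decay. A smoother, commonly used alternative is exponential decay, where a click loses half its weight every fixed interval. The sketch below is only an illustration: the function name and the seven-day half-life are arbitrary choices, not values taken from any platform.

```python
import numpy as np

def recency_weights(ages_in_days, half_life_days=7.0):
    """Exponential time decay: a click's weight halves every half_life_days.

    The default half-life is an arbitrary illustrative value.
    """
    ages = np.asarray(ages_in_days, dtype=float)
    return 0.5 ** (ages / half_life_days)

# Clicks from today, one week ago, two weeks ago, and a year ago
weights = recency_weights([0, 7, 14, 365])
print(weights)
```

Here the half-life plays the role of the decay constant: shrinking it makes the feed react faster to whatever was clicked last, which is the same lever the 100x multiplier pulls in the demo.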
What the evidence shows
You now understand how your clicks shape what the algorithm serves you next.
But here’s the troubling part — content designed to provoke anger, fear, or shock keeps us glued to our screens far more effectively than uplifting or informative material. Social media companies didn’t deliberately set out to exploit this; their algorithms simply figured it out on their own through trial and error.
A major 2025 study that tracked the digital behavior of 25,000 SmartNews users confirmed that humans have an inbuilt “negativity bias” when choosing what news to consume. From an evolutionary standpoint, this makes sense — our ancestors survived by paying close attention to threats and dangers. So what happens when this primal survival mechanism collides with today’s machine learning systems? The study found that personalized news feeds actively magnify our natural negativity bias, turning up the volume on what already captures our attention.
On top of that, researchers who examined hundreds of millions of social media posts on Facebook and X (formerly Twitter) discovered that users are approximately 1.91 times more inclined to share negative headlines than positive ones. It turns out, negativity fuels virality, and this is how the outrage cycle takes hold.
The cognitive toll: It’s not just what you absorb, it’s how your brain changes
The effect of these algorithmic loops goes beyond the kind of content we end up consuming — it reaches into the very way our minds operate. A 2025 systematic review that synthesized findings from 71 individual studies involving 98,299 participants revealed significant cognitive consequences tied to short-form video feeds like TikTok, Instagram Reels, and YouTube Shorts. Spending more time on these endless-scroll platforms correlates with weaker cognitive performance, particularly when it comes to sustained focus and the ability to resist distractions.
Psychologists explain this through two simultaneous processes: habituation and sensitization. The fast-paced, high-stimulation format of short videos gradually dulls our tolerance for slower, more demanding tasks like reading or complex problem-solving. Meanwhile, the algorithm’s constant delivery of finely tuned content heightens our brain’s reward circuitry, reinforcing impulsive habits and training us to chase quick hits of satisfaction.
Frequent heavy users of these platforms show measurably reduced brain activity during tasks that require concentration. Some researchers have even identified physical differences in regions associated with cognitive control — notably the prefrontal cortex and reward circuits in the striatum — that appear linked to this relentless stream of algorithmically curated stimuli.
The cost to society
Thanks to complex mathematical systems, each of us now occupies our own personalized information bubble.
In the short run, it might feel like a minor inconvenience. But step back and the consequences grow darker. When algorithms continuously serve us content that aligns with our existing beliefs, we experience confirmation bias on an industrial scale.
These filter bubbles widen the fractures in our society. The divisions between us will only deepen, and there’s no end to this growing chasm in sight.
Misinformation flourishes inside these closed echo chambers because false claims rarely face scrutiny outside them. By the time a correction circulates, the original falsehood has already made its rounds and converted a loyal following of believers.
And democracy — which relies on a shared foundation of facts and a willingness among citizens to engage in open debate — suffers when people inhabit entirely separate realities.
Taking back control of your feed
You’re not powerless in this situation. The algorithm responds to your behavior, and while the steps below take some effort, they can help you break free from your bubble.
The very mechanism that walled you into an echo chamber can also be used to break you out. Here are some practical strategies:
- Broaden your sources. Make a point of following a handful of voices that challenge your assumptions. If your views lean one direction politically, seek out thoughtful perspectives from the other side.
- Reset from time to time. Wipe your watch history. Tap “Not Interested” on persistent suggestions. Browse the platform while logged out or in incognito mode — notice how dramatically the content shifts without your personal data.
- Switch to chronological feeds. Most platforms still offer the option to view posts in simple time order rather than algorithmic ranking.
- Think before you share. Every like, comment, and share signals to the algorithm “give me more of this.” If a post makes you furious, pause — that’s precisely when the algorithm has the most leverage over you.
- Cut down your usage. Set screen-time boundaries and carve out offline time. The less your information diet depends on an algorithmic feed, the less sway it has over your beliefs.
Breaking beyond the bubble
My goal with this post was to shed light on exactly how these recommendation systems and their bubbles operate. We built a working recommender model using real news data, and it took just three clicks to transform a sports fan’s feed: political content jumped from 13% to 40% of the recommendations, knocking sports out of the top spot.
The first step to liberating yourself is simply awareness. The next time you feel caught up in an online outrage spiral, stop and ask: Why am I seeing this? Who stands to gain from my reaction? The answer almost always leads back to an algorithm doing exactly what it was designed to do, and informing you well is rarely part of that design.
Stay informed, stay open-minded,
— Ivo



