Why bother tackling a tricky probability puzzle in my spare time when I could have been mindlessly scrolling through social media? The reason is simple: I want to keep my mind sharp during this era when we can easily offload most of our critical thinking to AI. If you’re reading articles on TDS, chances are we share that same goal.
In this article, we’ll work through an entertaining probability challenge recently shared by one of my favorite YouTubers, 3Blue1Brown. If you haven’t come across his channel yet, I highly recommend checking it out. He specializes in visual, intuitive explanations that will make you question why math is taught any other way.
Setting up the problem
The short video linked below provides the best introduction, but I’ll summarize the setup here as a quick reference.
Picture a box containing several strings. We randomly pick one end of a string, then randomly pick another end. We tie those two ends together. One of two outcomes occurs: (1) the two ends belong to different strings, creating one longer string, or (2) both ends belong to the same string, forming a loop.
When two separate strings are joined, the resulting longer string goes back into the box. When a loop is formed, it’s taken out of the box. This random selection process continues until the box is empty.
The central question is: how many loops should we expect this process to produce? Or, phrased more practically — if we ran this experiment many times, what would the average number of loops be?
Important observations about the problem
Understanding a problem thoroughly is always the first step toward solving it. Beyond grasping the basic mechanics described above, there are several key insights we need to keep in mind.
Observation #1
Each round of random selections involves two random draws — the first and the second. The first draw doesn’t carry much significance. The second draw is what matters, since it determines whether we create a loop or a longer string.
Observation #2
Every round reduces the number of strings in the box by exactly one, regardless of the outcome. If a loop forms, the string that created it is removed. If a longer string is produced, two strings merge into one, cutting the total count by 1.
Observation #3
The total number of draws is not random. Each round removes one string from the box no matter what (observation #2), so the number of rounds equals the number of strings, and since each round involves two draws, the number of draws is twice that. For instance, with 10 strings, there are 20 random draws across 10 rounds. Keep in mind that the final “round” involves just one remaining string and always produces a loop.
Observation #4
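This observation builds on the previous three and is the most crucial. When it comes to counting loops, each round of random draws is independent of the rounds before it: the chance of forming a loop depends only on how many strings are in the box, and that number is fixed in advance for every round (observation #2). This means we can decompose the problem into individual rounds rather than having to analyze the entire sequence as a whole.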
It’s worth noting that if we cared about something like the expected circumference of a loop, this observation wouldn’t hold. The lengths of strings (and therefore the circumferences of loops) do depend on what happened in earlier rounds.
With these observations in hand, let’s dive into solving the problem!
The “brute force” approach
Nearly all problems like this (and many real-world ones) have a brute force solution — an approach comparable to digging a swimming pool by hand.
For this puzzle, we could construct a probability tree and manually compute the expected number of loops. Let’s walk through that idea here.

This method is cumbersome but works fine for a small number of strings. In the video, Grant specifically challenges viewers to solve it for 50 strings, which would require a tree with on the order of 2^50 leaves! He did this to nudge his audience away from brute force and toward more elegant solutions.
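To make the brute force idea concrete, here is a minimal sketch that walks every branch of the probability tree for a small box. The function name `enumerate_leaves` is my own, and the branch probabilities assume the loop chance of 1/(2S − 1) for a box holding S strings, which is derived later in the article. Because the final round can only produce a loop, this sketch visits 2^(N − 1) leaves for N strings, which is still exponential.

```python
from fractions import Fraction


def enumerate_leaves(num_strings):
    """Yield (probability, loop_count) for every leaf of the probability tree."""

    def walk(strings_left, prob, loops):
        if strings_left == 0:
            # The box is empty: this path through the tree is complete.
            yield prob, loops
            return
        p_loop = Fraction(1, 2 * strings_left - 1)
        # Branch 1: the second end closes a loop.
        yield from walk(strings_left - 1, prob * p_loop, loops + 1)
        # Branch 2: the second end belongs to a different string.
        # This branch does not exist when only one string is left.
        if p_loop < 1:
            yield from walk(strings_left - 1, prob * (1 - p_loop), loops)

    yield from walk(num_strings, Fraction(1), 0)


leaves = list(enumerate_leaves(4))
expected = sum(prob * loops for prob, loops in leaves)
print(len(leaves), expected)
```

Exact fractions keep rounding error out of the picture, but the number of leaves doubles with every extra string, so this only stays practical for small boxes.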
Let’s see if we can find a smarter approach.
Divide and conquer approach
By carefully analyzing the properties of this random process, we discovered that each round of draws is independent of the others (observation #4). Thanks to this property, we can calculate the expected value for an individual round and then figure out how to combine those per-round results to solve the overall problem.
Expected loops from a single round
We’ve already established that the first random draw isn’t particularly important (observation #1) — everything hinges on the second draw.
Let’s work through a simple example with 4 strings. We perform our first random draw to grab one end (it doesn’t matter which one). For the second draw, we can pick any end in the box except the one we already selected.
With 4 strings in the box, there are 8 ends total. After picking the first end, we can’t choose it again, leaving 7 possible ends. Only one of those 7 will create a loop; the other 6 won’t. The image below illustrates this setup more clearly than a verbal description alone.

So, the probability of forming a loop is 1/7, and the probability of not forming one is 6/7 — giving us an expected value of 1/7 loops (1 × 1/7 + 0 × 6/7).
Now let’s generalize this into a formula using the number of strings as the input. If S represents the number of strings, the total number of ends is 2S (two ends per string). After the first selection, there are 2S − 1 ends to choose from, and only one of those produces a loop. So, the formula for the expected number of loops is 1/(2S − 1).
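As a quick sanity check on that formula, the sketch below counts ends directly instead of trusting the algebra. The function `loop_probability` and the way ends are labeled are illustrative choices of mine, not something taken from the video.

```python
from fractions import Fraction


def loop_probability(num_strings):
    # Label each end by (string index, side); every string has two ends.
    ends = [(s, side) for s in range(num_strings) for side in (0, 1)]
    first = ends[0]  # by symmetry, the choice of first end does not matter
    remaining = [e for e in ends if e != first]
    # Only the other end of the same string closes a loop.
    closing = [e for e in remaining if e[0] == first[0]]
    return Fraction(len(closing), len(remaining))


for s in (1, 2, 4, 10):
    assert loop_probability(s) == Fraction(1, 2 * s - 1)

print(loop_probability(4))  # 1/7
```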

Merging separate rounds to address the overall problem
With our formula for the expected number of loops in a single round now established, let’s explore how to handle multiple rounds together. Thanks to observation #4 (rounds are independent when it comes to counting loops) and observation #3 (the number of rounds is known in advance), we can simply sum up the expected loops from each individual round. Naturally, we need to adjust the string count as we progress through each round, which a summation handles: starting with N strings, S takes every value from N down to 1, and we add up 1/(2S − 1) for each.
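Written out, and assuming N denotes the number of strings we start with and S the number in the box at the start of a round, the sum of the per-round expectations would look like this:

```latex
E[\text{loops}] \;=\; \sum_{S=1}^{N} \frac{1}{2S - 1}
\;=\; 1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2N - 1}
```

Summing expected values this way is always valid because expectations add (linearity of expectation).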

With the formula ready, wrapping up the challenge is as straightforward as substituting 50 for N, which yields approximately 2.94 loops — and that solves it!
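For anyone who would rather let the computer do the substitution, here is a small sketch that adds up 1/(2S − 1) for every string count from 1 to N. The function name `expected_loops` is just an illustrative choice.

```python
from fractions import Fraction


def expected_loops(num_strings):
    # Sum the per-round loop probability 1/(2S - 1) for S = 1, ..., num_strings.
    return sum(Fraction(1, 2 * s - 1) for s in range(1, num_strings + 1))


exact = expected_loops(50)
print(float(exact))  # approximately 2.94
```

Using `Fraction` keeps the sum exact until the final conversion to a float, which is convenient for comparing against other methods.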
Using a Monte Carlo approach
Since this problem has an exact analytical solution, we technically could have wrapped things up in the previous section. Still, it’s worth discussing how a Monte Carlo simulation could also tackle this problem. While unnecessary for straightforward cases, Monte Carlo methods become invaluable when we introduce additional complexities.
Monte Carlo simulations approximate outcomes by repeatedly running random experiments. In this case, we’d replicate the random drawing process many times and then average the loop counts across all simulations.
The law of large numbers ensures that as we increase the number of simulations, the Monte Carlo estimate converges toward the true expected value. I’ve linked the complete code below — the loop handling the actual simulation is shown here:
```python
import numpy as np

from monte_carlo_funcs import create_strings, select_ends, tie_ends

# Execute the Monte Carlo simulation
list_of_circles = []
num_strings = 50
num_simulations = 10000

if __name__ == "__main__":
    for _ in range(num_simulations):
        # set up the simulated starting collection of strings
        strings = create_strings(num_strings)
        # initialize the circle counter for this run
        circle_counter = 0
        # keep drawing until all strings are used up
        while len(strings) > 0:
            end_1, end_2, strings = select_ends(strings)
            strings, circle_bool = tie_ends(strings, end_1, end_2)
            circle_counter += circle_bool
        # record the total circles found in this simulation run
        list_of_circles.append(circle_counter)

    # average loop count across all simulation runs
    print(np.mean(list_of_circles))
```

My run produced 2.95, quite close to the correct answer of 2.94 (results may vary slightly each time). This illustrates a key point: Monte Carlo techniques provide solid approximations, but the trade-off for their flexibility is a loss of exact precision.
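The snippet above relies on helper functions from the linked code. For a version that runs on its own, here is a minimal self-contained sketch of the same idea. It is not the author's implementation: the function `simulate_once`, the representation of the box as a flat list of end labels, and the fixed seed are all assumptions of mine.

```python
import random


def simulate_once(num_strings, rng):
    # Each string contributes two free ends, labeled by the string's id.
    ends = [s for s in range(num_strings) for _ in range(2)]
    loops = 0
    while ends:
        # Draw two distinct ends uniformly at random.
        i, j = rng.sample(range(len(ends)), 2)
        a, b = ends[i], ends[j]
        # Remove both tied ends (larger index first so positions stay valid).
        for idx in sorted((i, j), reverse=True):
            ends.pop(idx)
        if a == b:
            loops += 1  # both ends belonged to the same string
        else:
            # Merge: the leftover end of string b now belongs to string a.
            ends[ends.index(b)] = a
    return loops


rng = random.Random(0)  # fixed seed so the run is repeatable
num_simulations = 2000
estimate = sum(simulate_once(50, rng) for _ in range(num_simulations)) / num_simulations
print(estimate)
```

The estimate should land near the analytical value, with the gap shrinking as `num_simulations` grows.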
Adding complexity to the problem
Let’s take a moment to showcase where Monte Carlo truly excels by introducing a harder variant. Instead of seeking the expected count of loops, what if we wanted the expected average circumference of those loops? This version is significantly more involved because it introduces dependencies across rounds of random draws.
I wasn’t able to derive a closed-form solution for this one (though one might exist). In cases like this — which make up the bulk of real-world problems — Monte Carlo comes to the rescue! We could easily extend the simulation code to track each string’s length and then use those lengths to compute circumferences whenever a loop forms. With Monte Carlo, what would otherwise be a daunting mathematical challenge becomes a fairly manageable programming task.
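Here is one way that extension might look, as a rough sketch rather than a definitive implementation. It assumes every string starts with length 1 and that "expected average circumference" means the mean loop circumference within a run, averaged over runs. The function `simulate_circumferences` and the data layout are my own choices.

```python
import random


def simulate_circumferences(num_strings, rng):
    # Two free ends per string, labeled by string id; unit starting lengths (assumed).
    ends = [s for s in range(num_strings) for _ in range(2)]
    length = {s: 1.0 for s in range(num_strings)}
    circumferences = []
    while ends:
        i, j = rng.sample(range(len(ends)), 2)
        a, b = ends[i], ends[j]
        for idx in sorted((i, j), reverse=True):
            ends.pop(idx)
        if a == b:
            # Loop closed: its circumference is the string's accumulated length.
            circumferences.append(length.pop(a))
        else:
            # Merge b into a and carry b's length along.
            ends[ends.index(b)] = a
            length[a] += length.pop(b)
    return circumferences


rng = random.Random(0)
runs = [simulate_circumferences(50, rng) for _ in range(2000)]
avg_circumference = sum(sum(c) / len(c) for c in runs) / len(runs)
print(avg_circumference)
```

A handy correctness check is that the circumferences in any single run add up to the total starting length, since tying ends never creates or destroys string.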
The key insight is that when a closed-form solution is tough or impossible to obtain, Monte Carlo offers a practical and accessible alternative.
In summary
The ability to deeply analyze a problem and methodically craft a solution has always set strong data scientists apart — and in an era increasingly dominated by generative AI, that skill is becoming even rarer. I found this puzzle to be an enjoyable way to sharpen those abilities.
You will probably never need to calculate expected loop counts from a box of strings in professional practice. But you will regularly face scenarios where the solution isn’t immediately apparent. The habits of deeply understanding the problem, decomposing it into manageable pieces, and deliberately constructing a focused solution are capabilities that carry over directly into real-world data science and analytics work.



