In my first TDS article, I explained how to transform a real-world challenge into an integer linear program. A second piece addressed how to make that model resilient to uncertainty. Both built on the same principle: take an ambiguous practical problem, express it as a linear program, and let a solver handle the rest.
Yet at some point, every modeler hits a wall where the LP feels artificially tidy. You feed it a demand figure, a travel time, a wind speed. The model takes those numbers, outputs a solution, and moves on. But the real world they represent—unpredictable, noisy, occasionally astonishing—never really enters the picture.
Stochastic programming takes that gap seriously. Instead of assuming the data is precise, it incorporates uncertainty directly into the model. The cost is slightly more notation; the reward is decisions that survive when reality refuses to play along.
This post offers an introductory overview. We will see why the most obvious approach fails, explore the four standard methods for incorporating uncertainty into a linear program, and end with a quick evaluation of whether the added complexity is justified. There is some mathematics, but it builds on the LP concepts you are already familiar with, just with one additional symbol.
Foundation: a fashion retailer with unpredictable forecasts
To ground the discussion, we’ll use the recurring example from Dr. Ruben van Beesten’s course material (acknowledged in the credits below). Here is the setup:
You operate a fashion company marketing winter apparel in Germany. Manufacturing occurs in Bangladesh, where costs are low but transit is slow, requiring weeks for delivery. Each autumn, you must determine production quantities for the approaching winter season.
Two potential problems emerge: underproducing leads to lost sales; overproducing leaves you holding worthless inventory. The central question is how much to manufacture today, and the answer depends on a factor still unknown: winter demand.
If we briefly set aside uncertainty and treat demand as a known constant, we could formulate a basic LP:

min cᵀx  subject to  Tx ≥ h,  x ≥ 0
Here x represents your production quantity, c is the per-unit production cost, h is demand, and T is simply the identity matrix (each unit produced fulfills one unit of demand). The constraint ensures production meets or exceeds demand.
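To make this concrete, here is the deterministic toy model in code. The numbers are my own, purely for illustration; for a model this small no solver is needed, because with T the identity and a positive unit cost the optimum is simply to produce exactly the demand:

```python
# Deterministic toy LP: min c*x subject to x >= h, x >= 0.
# Because T is the identity and c > 0, the optimal choice is x* = h.
# All numbers are invented for illustration.

def solve_deterministic(c, h):
    """Solve min c*x s.t. x >= h, x >= 0 for scalar c > 0."""
    x_star = max(h, 0.0)
    return x_star, c * x_star

x_star, cost = solve_deterministic(c=4.0, h=7.0)   # produce 7 units, pay 28
```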
This works perfectly if h is truly known. The issue is that demand is not a fixed number—it is a random variable. We will denote it by ξ. An accurate version of the model would then appear as:
min cᵀx  subject to  T(ξ)x ≥ h(ξ),  x ≥ 0
At this point we encounter a fundamental problem. What does it mean for x to satisfy a constraint grounded in a random variable? Is x = 100 acceptable if demand could be 80, could be 120, or anything in between? The difficulty is not that the problem is computationally hard—it is poorly defined. The solver has no clear understanding of the actual problem you want solved.
Stochastic programming is, essentially, a set of structured approaches to answering just that question. We will examine the four most widely used methods.
Four approaches to managing uncertainty
Each method converts the ambiguous LP above into a clearly defined optimization problem. They differ in the assumptions they make about your knowledge of uncertainty and in how protective they are against unfavorable outcomes.
1. Robust optimization: brace for the worst
The most protective strategy. You don’t need the complete probability distribution of ξ, only its support, meaning the range of values it might assume. We refer to this range as the uncertainty set, denoted U. Then you ask: what is the best decision that remains feasible regardless of which ξ ∈ U actually materializes?
min cᵀx  subject to  T(ξ)x ≥ h(ξ) for all ξ ∈ U,  x ≥ 0
The constraint must now be satisfied for every possible ξ in the uncertainty set. In our fashion company scenario with U = [0, 10], you would always be preparing for a demand of 10, representing the worst case.
That single observation captures both the power and the limitation of robust optimization. The solution is completely safeguarded, but it is also overly cautious: you will often end up with surplus inventory because you behaved as if the improbable worst case were certain. If you have read my previous article on hardening linear programs, this is the very logic underlying those four steps.
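In code, the robust version of the toy model just replaces the known demand with the worst value in the uncertainty set (again with invented numbers):

```python
# Robust toy LP: x must satisfy x >= xi for every xi in U = [lo, hi],
# which collapses to the single constraint x >= hi. Numbers are illustrative.

def solve_robust(c, lo, hi):
    """Solve min c*x s.t. x >= xi for all xi in [lo, hi], x >= 0."""
    x_star = max(hi, 0.0)   # always prepare for the worst case
    return x_star, c * x_star

x_star, cost = solve_robust(c=4.0, lo=0.0, hi=10.0)   # produce 10, the worst case
```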
2. Chance constraints: softening the worst case
Robust optimization plans for every conceivable outcome. Chance constraints ease that by requiring preparation for most of them. You select a confidence threshold α, such as 95%, and require the constraint to hold with at least that probability:
min cᵀx  subject to  P( T(ξ)x ≥ h(ξ) ) ≥ α,  x ≥ 0
This is termed a joint chance constraint: every element in the constraint vector must be met at the same time, with a combined probability no less than α. A less strict alternative addresses each row individually:
min cᵀx  subject to  P( Tᵢ(ξ)x ≥ hᵢ(ξ) ) ≥ αᵢ for each row i,  x ≥ 0
These are called individual chance constraints: each constraint i must hold with probability no less than αᵢ, with no consideration for their combined behavior. Consider this quick question: if you assign every αᵢ the same value as the joint α, which version is more protective?
Answer: the joint formulation. Satisfying all constraints at once is more demanding than satisfying each independently, so the joint version allows a smaller feasible set and results in a higher optimal cost. Regardless of the approach, chance constraints give you a parameter, α, to set your desired level of caution. Set it to 1, and you approach (but do not quite reach) robust optimization. Lower it to 0.5, and you are essentially gambling on feasibility. Most practical implementations settle somewhere in the 0.9 to 0.99 range.
One important caveat deserves mention: chance constraints are generally computationally difficult. The probability expression within the constraint is a nonlinear, frequently non-convex function of x, meaning you typically cannot submit this formulation directly to a standard LP solver. There are manageable special cases (Gaussian noise, certain distribution mixtures, sampling-based approximations), but in general, what looks like a one-line constraint hides far more complexity than it initially seems.
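As a sketch of one tractable special case, suppose (purely for illustration) that demand is Gaussian. Then the individual chance constraint P(x ≥ ξ) ≥ α reduces to a plain linear constraint x ≥ μ + σ·z_α, where z_α is the standard-normal quantile; the function name and numbers below are my own:

```python
from statistics import NormalDist

# Tractable special case: with demand xi ~ N(mu, sigma^2), the chance
# constraint P(x >= xi) >= alpha becomes x >= mu + sigma * z_alpha.
# All numbers are invented for illustration.
def min_production(mu, sigma, alpha):
    z_alpha = NormalDist().inv_cdf(alpha)   # standard-normal quantile
    return mu + sigma * z_alpha

x_95 = min_production(mu=6.0, sigma=1.5, alpha=0.95)   # about 1.645 sigma above the mean
```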
3. Two-stage recourse models: act, observe, adapt
The first two methods deal with constraint violations as things to sidestep entirely (robust LP) or risk only to a limited degree (chance-constrained LP). This framing misses the mark at times. In our fashion case, failing to satisfy demand isn’t fatal. It just means a bit of trouble: you can print a rush run locally at added expense, air freight to customers, or simply live with unfulfilled orders and move forward.
This concept—that breaking a constraint isn’t a catastrophe, that you can take corrective measures down the line—drives the whole recourse model approach. In a two-stage recourse framework, here’s how things unfold:
- Stage 1 (you are here): you select decision x while ξ remains unpredictable.
- Then: ξ becomes clear—the random variable reveals its actual value.
- Stage 2 (down the line): you choose decision y after seeing ξ.
From a mathematical standpoint, the first stage resembles a standard linear program, apart from the fact that the objective includes an expected corrective cost down the line:
min cᵀx + E[ v(ξ, x) ]  subject to  Ax ≥ b,  x ≥ 0
The function v(ξ, x) stands for the optimal outcome of the second-stage problem, given your first-stage pick x and the true realization of ξ:
v(ξ, x) = min  q(ξ)ᵀy  subject to  W(ξ)y ≥ h(ξ) − T(ξ)x,  y ≥ 0
Pause and think through this. The expression h(ξ) − T(ξ)x on the right-hand side captures the gap—the extent to which your first-stage plan fell short once ξ is known. The recourse action y then bridges that gap, costing q(ξ)ᵀy. So the setup is straightforward: you pay the initial cost cᵀx, then add on the expected cost of dealing with whatever surprise ξ throws your way.
That’s the essence of it. Two-stage recourse models show up constantly in real-world use, both because they mirror the actual order of decisions in practical settings (production, inventory, logistics, energy dispatch) and because their math stays fairly manageable.
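The fashion example makes the recourse structure easy to evaluate by hand. Here is a small sketch with invented scenario data, where the recourse action is emergency production at a higher unit cost q; the expected cost is computed directly and minimized over the breakpoints of the objective:

```python
# Toy two-stage recourse model, with made-up data: produce x now at unit
# cost c; any shortfall under scenario demand d is covered later by
# emergency production at unit cost q > c (the recourse action y).
scenarios = [(0.3, 4.0), (0.5, 7.0), (0.2, 10.0)]   # (probability, demand)
c, q = 4.0, 9.0

def expected_total_cost(x):
    """First-stage cost plus expected second-stage (recourse) cost."""
    recourse = sum(p * q * max(d - x, 0.0) for p, d in scenarios)
    return c * x + recourse

# The objective is piecewise linear and convex in x, so the optimum sits
# at zero or at one of the scenario demands:
candidates = [0.0] + [d for _, d in scenarios]
best_x = min(candidates, key=expected_total_cost)   # best first-stage decision
```

For this data, producing 7 units is optimal: covering the rare high-demand scenario up front costs more than occasionally paying for emergency production.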
Here are two terms worth knowing for further reading:
- A model has fixed recourse if the recourse matrix W remains constant regardless of ξ. Much of the existing theory only holds under this assumption.
- A model has (relatively) complete recourse if, regardless of what ξ reveals and what x you picked, a feasible recourse decision y always exists. When this breaks down, the second stage can have no feasible solution, which acts as a hidden restriction on the first-stage choice. (This gap is precisely where Benders’ feasibility cuts come in, but that’s another topic.)
4. Multi-stage recourse models: repeating the cycle
Sometimes decisions don’t follow a simple two-step rhythm. You don’t just choose, watch, adjust once—you keep cycling through choose-watch-adjust-choose. Multi-stage recourse models build on the same foundation naturally.
Extending our fashion story, imagine that instead of one big autumn choice, you select three times: cheap production in Bangladesh in the fall, mid-priced production in Romania in early winter, and the priciest production in Germany in late winter. Demand slowly reveals itself across the season, and each stage draws on everything learned so far.
The math gets heavier—you write out recursive value functions Qₜ, with histories ξ[t] = (ξ₁, …, ξₜ) trailing behind them—but the core idea is unchanged. Each stage nests a recourse problem within the prior one. The best visual is a scenario tree: each node refers to a possible world-state, each edge is a possible outcome for the next random variable, and a scenario traces one root-to-leaf route.

A subtle point worth noting: a scenario covers the full path of ξ through time, not just one snapshot. Realizing that ξ₂ = 10 doesn’t pin down your scenario, since ξ₃ still lies ahead. This becomes essential when formulating the deterministic equivalent (covered next), because every decision can only rely on information available up to that stage. This requirement—called non-anticipativity—means the model must forbid peeking into the future. Without explicit enforcement, it would exploit foreknowledge freely.
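To make the "scenario = full path" point concrete, here is a tiny sketch (with invented outcomes) that enumerates the root-to-leaf paths of a three-stage tree with two branches per stage:

```python
from itertools import product

# Tiny scenario tree sketch: after the first-stage decision, demand is
# revealed in two later stages, each either low or high. A scenario is a
# complete root-to-leaf path, so there are 2 x 2 = 4 of them.
stage_outcomes = [
    [("lo", 0.5), ("hi", 0.5)],   # stage-2 outcome: (label, probability)
    [("lo", 0.4), ("hi", 0.6)],   # stage-3 outcome
]

scenarios = []
for path in product(*stage_outcomes):
    labels = [label for label, _ in path]
    prob = 1.0
    for _, p in path:
        prob *= p                 # path probability: product along the path
    scenarios.append((labels, prob))

# e.g. (["lo", "hi"], 0.3): stage-2 demand low, then stage-3 demand high
```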
How can we actually solve a recourse model?
Up to now the focus has been on writing down models. To solve them, we typically convert them into something an off-the-shelf LP solver can process. The key technique here is the deterministic equivalent formulation.
Imagine ξ follows a discrete distribution with a finite set of outcomes ξ¹, ξ², …, ξˢ (referred to as scenarios), each occurring with probability pₛ. In that case, the expected second-stage cost collapses into a finite sum, and the whole two-stage problem transforms into one large LP by creating a separate copy of y for each scenario:
min cᵀx + Σₛ pₛ q(ξˢ)ᵀyˢ  subject to  Ax ≥ b;  T(ξˢ)x + W(ξˢ)yˢ ≥ h(ξˢ) for every scenario s;  x ≥ 0, yˢ ≥ 0
The result is a standard LP. Large, certainly—S scenarios mean S copies of the second stage—but still an LP. Feed it to HiGHS, Gurobi, CPLEX, or whichever solver you prefer, and it handles the rest.
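To see the "one copy of the second stage per scenario" structure concretely, here is a sketch that assembles the deterministic-equivalent data for the toy problem (illustrative numbers; variables ordered [x, y₁, …, y_S], with constraint s reading x + yₛ ≥ dₛ). Any LP solver could then be handed these arrays:

```python
# Assemble the deterministic equivalent of the toy two-stage problem.
# Variables: [x, y_1, ..., y_S]; constraint s reads x + y_s >= d_s.
probs   = [0.3, 0.5, 0.2]        # scenario probabilities (made up)
demands = [4.0, 7.0, 10.0]       # scenario demands (made up)
c, q = 4.0, 9.0                  # first-stage and recourse unit costs
S = len(probs)

obj = [c] + [p * q for p in probs]   # objective: c*x + sum_s p_s * q * y_s
A, b = [], []
for s in range(S):
    # coefficient 1 on x, 1 on this scenario's y_s, 0 on every other copy
    A.append([1.0] + [1.0 if j == s else 0.0 for j in range(S)])
    b.append(demands[s])
# A has S rows and 1 + S columns: the LP grows linearly with the scenarios
```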
Two practical questions arise naturally.
First: what happens when ξ’s distribution isn’t discrete? The deterministic equivalent then involves infinitely many scenarios and loses finite dimensions. The go-to fix is sample average approximation: draw S samples from the actual distribution, solve the resulting deterministic equivalent, and increase S until results stabilize. An entire body of research studies how large S must be and what quality guarantees apply.
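Here is a minimal sample average approximation sketch for the toy problem, assuming (purely for illustration) normally distributed demand: draw S samples, weight each by 1/S, and solve the resulting deterministic equivalent, here by enumerating the breakpoints of the piecewise-linear objective:

```python
import random

# SAA sketch: approximate the true demand distribution by S equally
# weighted samples, then solve the sampled two-stage problem exactly.
# Demand distribution and costs are invented for illustration.
c, q = 4.0, 9.0
rng = random.Random(42)   # fixed seed so runs are repeatable

def saa_solution(S):
    demands = [max(0.0, rng.gauss(7.0, 2.0)) for _ in range(S)]
    def cost(x):
        return c * x + q * sum(max(d - x, 0.0) for d in demands) / S
    # convex piecewise-linear objective: the optimum is a sample (or 0)
    return min([0.0] + demands, key=cost)

for S in (10, 100, 1000):
    print(S, round(saa_solution(S), 2))   # estimates stabilize as S grows
```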
Second: what if the deterministic equivalent grows too enormous to tackle head-on? That’s where decomposition methods enter. Benders’ decomposition splits things into a master problem over the first-stage variables with a subproblem for each scenario, passing information back and forth iteratively. For multi-stage setups spanning many periods, the corresponding approach is stochastic dual dynamic programming (SDDP), which relies on sampling and approximate value functions to keep the full scenario tree manageable. Both methods are advanced enough to warrant separate treatment; I’ll return to them in future posts.
Is all this effort really justified?
That’s a fair concern. Stochastic programs are more complex to set up, more challenging to solve, and take longer to compute than their deterministic counterparts. If your real-world situation isn’t particularly sensitive to uncertainty, a simpler approach — plugging average values into a standard linear program — might be perfectly adequate.
The encouraging part is that you can precisely measure what the stochastic approach is worth. There are two well-established metrics for this, and understanding both is valuable.
Start by defining four key values:
SP  = min { cᵀx + E[ v(ξ, x) ] }
EV  = min { cᵀx + v( E[ξ], x ) }, with optimal solution x̄
EEV = cᵀx̄ + E[ v(ξ, x̄) ]
WS  = E[ min { cᵀx + v(ξ, x) } ]

(each minimum is taken over the feasible first-stage decisions x)
Here’s what each one means: SP is the best possible outcome from the true stochastic program. EV is the result you’d get by substituting the average (expected) value of the uncertain parameter ξ and solving the simpler deterministic version — let’s call its solution x̄. EEV is what it would actually cost to implement that deterministic solution x̄ when you face the real uncertain world. And WS (“wait-and-see”) represents the cost if you could magically see the actual realization of ξ before making your decision — the ideal, cheat-code scenario.
From these four values, you can derive two particularly insightful measures:
VSS = EEV − SP        EVPI = SP − WS
VSS stands for the Value of the Stochastic Solution: it tells you how much you’d lose by simply solving the deterministic problem with average values and going with that solution. When VSS is small, the extra effort of stochastic modeling isn’t paying off much — the simpler deterministic approach is fine.
EVPI is the Expected Value of Perfect Information: it measures how much you’d benefit if a crystal ball revealed the true value of ξ before you committed to a decision. A small EVPI means your current forecasts already capture most of what matters — pouring resources into better predictions likely won’t help much. A large EVPI, on the other hand, signals that investing in improved data could be genuinely worthwhile.

These two metrics fit neatly into a chain of inequalities (assuming the uncertainty only affects the right-hand side of your constraints):
EV ≤ WS ≤ SP ≤ EEV
Reading left to right: the plug-in-the-mean objective (EV) is never larger than the perfect-foresight cost (WS), which is never larger than the honest stochastic optimum (SP), which in turn is never larger than the cost of locking in the deterministic solution and living with it (EEV). This chain gives you a handy upper bound on VSS that you can calculate even before solving the full stochastic program: since VSS = EEV − SP and SP ≥ EV, we have VSS ≤ EEV − EV. If that gap is small, the deterministic shortcut is good enough, and you can skip the extra complexity.
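These definitions are easy to check numerically. Here is a sketch that computes all four values, and then VSS and EVPI, for the same kind of toy two-stage problem used above (invented scenario data; recourse means emergency production at unit cost q > c):

```python
# Compute SP, EV, EEV, WS and then VSS / EVPI for a toy two-stage
# problem with made-up data. Recourse: cover shortfall at cost q > c.
scenarios = [(0.3, 4.0), (0.5, 7.0), (0.2, 10.0)]   # (probability, demand)
c, q = 4.0, 9.0

def total_cost(x, d):
    return c * x + q * max(d - x, 0.0)

def expected_cost(x):
    return sum(p * total_cost(x, d) for p, d in scenarios)

candidates = [0.0] + [d for _, d in scenarios]       # breakpoints of the objective

SP = min(expected_cost(x) for x in candidates)       # honest stochastic optimum

mean_d = sum(p * d for p, d in scenarios)
x_bar = min(candidates + [mean_d], key=lambda x: total_cost(x, mean_d))
EV  = total_cost(x_bar, mean_d)                      # solve with the mean demand
EEV = expected_cost(x_bar)                           # mean solution in the real world

# With demand d known in advance, the best move is to produce exactly d
# (q > c makes any shortfall more expensive than producing up front):
WS = sum(p * total_cost(d, d) for p, d in scenarios)

VSS  = EEV - SP   # what deliberate uncertainty modeling buys you
EVPI = SP - WS    # what a crystal ball would buy you
# The chain EV <= WS <= SP <= EEV holds (up to floating-point rounding)
```

For this particular data, VSS comes out small (about 0.69) while EVPI is large (about 6.6): the stochastic model barely beats plugging in the mean, but better demand forecasts would be genuinely valuable.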
What comes next
This article covered the fundamentals — how to formulate a stochastic program. The natural next question is how to solve large-scale ones efficiently. The two main approaches are:
- Benders’ decomposition — well-suited for two-stage problems, this method breaks the deterministic equivalent into a master problem (involving x) plus a separate subproblem for each scenario, tying them together through cutting planes. It works especially well when you have many scenarios but a relatively small first stage.
- Stochastic Dual Dynamic Programming (SDDP) — designed for multi-stage problems, this approach uses sampling and piecewise-linear approximations of future cost-to-go functions. It’s famously applied in hydropower scheduling, where the scenario tree grows so huge that direct enumeration becomes impractical.
Each of these methods deserves a deep dive of its own. If there’s demand, I’ll cover them in future articles.
Key takeaways
Whenever you’re applying linear programming to a setting where input data carries genuine uncertainty — whether it’s forecasted demand, weather patterns, market prices, travel times, or any other variable — your model is implicitly making a decision about how to handle that uncertainty. “Just use the mean value” is one choice. “Plan for the worst outcome” is another. Stochastic programming gives you the language to make that decision deliberate, and the analytical tools to judge whether your approach was the right one (enter VSS).
Here’s a quick recap of the four main ways to incorporate uncertainty into a linear program:
- Robust optimization — prepare for the worst case within a defined uncertainty set.
- Chance constraints — ensure feasibility holds with at least a specified probability α.
- Two-stage recourse — make a decision, observe what happens, then adjust; the adjustment carries an expected cost.
- Multi-stage recourse — the same decide-observe-adjust logic, applied repeatedly across time via a scenario tree.
And tuck these two metrics away for later reference: VSS (is the stochastic model actually adding value?) and EVPI (would better forecasts make a meaningful difference?).
Most real-world problems don’t come with certainty baked in. The good news is your modeling toolkit doesn’t have to pretend they do either.
Acknowledgements and references
This article draws on lectures delivered by Dr. Ruben van Beesten at the Norwegian University of Science and Technology, from his Stochastic Programming course held in October 2023, which I had the privilege of attending in Trondheim, Norway. The fashion-company example, the four-part taxonomy of formulations, and the entire VSS/EVPI framework are all directly sourced from his slides; any inaccuracies in my interpretation are entirely my own.
The foundational modeling exercise that underpins much of the recourse-model intuition comes from
- Higle, J. L. (2005). Stochastic Programming: Optimization When Uncertainty Matters. In INFORMS TutORials in Operations Research, pp. 30–53.
A few additional resources worth knowing about:
- Kleywegt, A. J., Shapiro, A., and Homem-de-Mello, T. (2002). The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12(2), 479–502. The go-to reference for SAA.
- Higle, J. L., and Sen, S. (1991). Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16(3), 650–669. One of the few methods capable of handling continuous (non-discrete) distributions directly.
And naturally, the two earlier articles in this series: Five questions that will help you model integer linear programs better and Four steps to robustify your linear program.