In my earlier post, I outlined four key innovations that make ORPilot a production-ready, open-source tool for using large language models in operations research (OR): the interview agent, data collection agent, parameter computation agent, and intermediate representation (IR). Of these, the IR stands out as the most critical—it’s what elevates ORPilot beyond a mere academic prototype and gives it real potential as a production-grade solution. Why? Because it directly addresses two top priorities in any production setting: reproducibility and portability. In this post, I’ll take a detailed look at how ORPilot’s IR is structured.
What Exactly Is an IR?
When people discuss AI-generated optimization models, there’s a glaring gap almost no one addresses: what do you do after solving the model the first time?
Say you’ve built your model, run it, and gotten an optimal result. Three weeks later, you need to rerun it with updated demand figures. Or maybe a colleague on another machine wants to replicate your results. Perhaps your company switches from Gurobi to an open-source solver to cut licensing fees. Or you’d like to explore a “what-if” scenario—like increasing a facility’s capacity by 20%. With most current LLM-based OR tools, the answer to all these situations is the same: start from scratch. You’d have to call the LLM again, pay the API cost again, regenerate the solver-specific code, and hope the model structure stays consistent.
ORPilot offers a better way: its Intermediate Representation (IR). The IR is a solver-independent, strongly typed JSON schema that captures the full mathematical essence of an optimization model—not the code to solve it, but the model itself, expressed in a universal format that doesn’t depend on any specific solver.
ORPilot’s IR is organized into five main sections:
(1) Sets: Named groups of entities—like Workers, Tasks, Plants, or Periods. Each set specifies how its members are defined: whether they come from a CSV file, a fixed count, or a hardcoded list.
(2) Parameters: Numerical data indexed over sets, pulled from CSV files. Each parameter links to its indexing domain and the exact column names needed to load it.
(3) Variables: Decision variables with their type (continuous, binary, integer), domain, bounds, and other structural properties.
(4) Objective: A symbolic expression tree built from variables and parameters—using operations like sums, differences, products, and indexed sums—all in a solver-neutral format.
(5) Constraints: Named constraints, each with a domain, a symbolic expression tree, and a relational sense (<=, =, or >=). Every constraint is a self-contained, fully described object.
To make this clearer, let’s walk through a concrete example: a worker-task assignment problem.
Example: Worker-Task Assignment Problem
Imagine you need to assign four workers to four tasks—one task per worker, one worker per task. Each (worker, task) pairing has an associated cost stored in a CSV file. Your goal is to minimize the total assignment cost. This is a classic integer programming assignment problem.
The input data is stored in two files:
(1) sets.csv (contains all set members):
set_name element
workers w1
workers w2
workers w3
workers w4
tasks t1
tasks t2
tasks t3
tasks t4
(2) assignment_costs.csv (the cost matrix):
worker_id task_id cost
w1 t1 2.0
w1 t2 4.0
… … …
Below is the complete IR for this problem:
{
  "problem_class": "AssignmentProblem",
  "model_type": "Mixed Integer Program",
  "sense": "minimize",
  "sets": {
    "Workers": {
      "size": null,
      "index_symbol": "w",
      "source": "sets.csv",
      "column": "element",
      "filter_column": "set_name",
      "filter_value": "workers",
      "ordered": false
    },
    "Tasks": {
      "size": null,
      "index_symbol": "t",
      "source": "sets.csv",
      "column": "element",
      "filter_column": "set_name",
      "filter_value": "tasks",
      "ordered": false
    }
  },
  "parameters": {
    "assignment_cost": {
      "domain": ["Workers", "Tasks"],
      "type": "float",
      "source": "assignment_costs.csv",
      "column": "cost",
      "index_columns": ["worker_id", "task_id"],
      "missing_default": "inf"
    }
  },
  "variables": {
    "assign": {
      "description": "1 if worker w is assigned to task t, 0 otherwise",
      "label": "assignments",
      "domain": ["Workers", "Tasks"],
      "type": "binary",
      "lower_bound": 0,
      "upper_bound": 1,
      "upper_bound_set": null,
      "exclude_diagonal": false,
      "domain_filter": null
    }
  },
  "constraints": {
    "one_task_per_worker": {
      "domain": ["Workers"],
      "expression": {
        "operation": "indexed_sum",
        "over": ["Tasks:t"],
        "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
      },
      "sense": "=",
      "rhs": {"type": "constant", "value": 1}
    },
    "one_worker_per_task": {
      "domain": ["Tasks"],
      "expression": {
        "operation": "indexed_sum",
        "over": ["Workers:w"],
        "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
      },
      "sense": "=",
      "rhs": {"type": "constant", "value": 1}
    }
  },
  "objective": {
    "sense": "minimize",
    "expression": {
      "operation": "indexed_sum",
      "over": ["Workers:w", "Tasks:t"],
      "body": {
        "operation": "multiply",
        "left": {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
        "right": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
      }
    }
  }
}
Now, let’s break down each section and explain the reasoning behind the design choices.
Sets
The sets section defines where the members of each set originate. A key design decision here is the data source convention: ORPilot requires all set members to be stored in a single file named sets.csv, using a two-column format: set_name and element. Every set—whether it represents entities (like workers or tasks) or time periods—is derived by filtering rows from this file. For instance, the Workers set instructs the system to load data from sets.csv, read the element column, and keep only rows where the set_name column equals "workers". At compile time, this yields Workers = ["w1", "w2", "w3", "w4"].
This approach offers two major advantages. First, all master data is centralized: adding a new worker just means appending a row to sets.csv, not editing multiple files. Second, the filter_value is validated against the actual distinct values in sets.csv during IR generation, catching typos early before they lead to empty sets in the solver. The index_symbol (e.g., "w" for Workers, "t" for Tasks) becomes the loop variable in the generated solver code (e.g., for w in Workers, for t in Tasks). It must be chosen carefully to avoid naming conflicts in nested loops (more on the shadow rule later). The ordered flag is false here, but it plays a vital role in time-indexed models, where it enables temporal lag references such as inventory[t-1] from within a period-t constraint.
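The filtering rule for sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ORPilot's actual loader: the function name load_set is hypothetical, and the in-memory text stands in for sets.csv (assumed here to be comma-separated). Only the field names column, filter_column, and filter_value come from the IR.

```python
import csv
import io

# Stand-in for sets.csv; comma-separated layout is an assumption.
SETS_CSV = """set_name,element
workers,w1
workers,w2
workers,w3
workers,w4
tasks,t1
tasks,t2
tasks,t3
tasks,t4
"""

def load_set(csv_text, set_spec):
    """Return the members of one IR set by filtering rows of the sets file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Check filter_value against the distinct values actually present,
    # so a typo fails loudly instead of silently producing an empty set.
    known = {row[set_spec["filter_column"]] for row in rows}
    if set_spec["filter_value"] not in known:
        raise ValueError(
            f"unknown filter_value {set_spec['filter_value']!r}; found {sorted(known)}"
        )
    return [
        row[set_spec["column"]]
        for row in rows
        if row[set_spec["filter_column"]] == set_spec["filter_value"]
    ]

workers_spec = {"column": "element", "filter_column": "set_name", "filter_value": "workers"}
Workers = load_set(SETS_CSV, workers_spec)
print(Workers)  # ['w1', 'w2', 'w3', 'w4']
```

A misspelled filter_value such as "worker" raises a ValueError here rather than returning an empty list, which mirrors the early validation described above.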
Parameters
The parameters section connects external data to the model. The assignment_cost parameter includes six structural components.
(1) domain: ["Workers", "Tasks"] — this parameter is indexed by both sets, creating a two-dimensional table.
(2) type: "float" — the parameter stores floating-point numbers.
(3) source: "assignment_costs.csv" — the specific file (including extension) that contains the data.
(4) column: "cost" — the column in the CSV that holds the numeric values to be loaded.
(5) index_columns: ["worker_id", "task_id"] — the CSV columns used as lookup keys, listed in the same order as the domain sets. This field is one of the most critical parts of the IR. Without it, the compiler cannot determine which CSV columns map to which domain sets. A frequent source of errors in earlier versions was the compiler silently guessing the wrong key column name and loading incorrect data. The IR now requires that the correct column names always be provided explicitly.
(6) missing_default: "inf" — instructs the compiler to treat any (worker, task) pair absent from the CSV as having infinite cost, effectively marking that assignment as unavailable. This is the appropriate semantic for cost and penalty parameters.
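To make the roles of index_columns and missing_default concrete, here is a minimal sketch of how a parameter table could be loaded. The helper names load_parameter and lookup are hypothetical, the CSV text is an in-memory stand-in, and the w2 row is illustrative data rather than part of the original cost matrix.

```python
import csv
import io
import math

# Stand-in for assignment_costs.csv; the pair (w2, t1) is deliberately absent.
# The w2 row is made-up illustrative data.
COSTS_CSV = """worker_id,task_id,cost
w1,t1,2.0
w1,t2,4.0
w2,t2,3.0
"""

param_spec = {
    "domain": ["Workers", "Tasks"],
    "column": "cost",
    "index_columns": ["worker_id", "task_id"],
    "missing_default": "inf",
}

def load_parameter(csv_text, spec):
    """Build {(key1, key2, ...): value} using the explicit index_columns."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Key order follows index_columns, which matches the order of domain.
        key = tuple(row[col] for col in spec["index_columns"])
        table[key] = float(row[spec["column"]])
    return table

def lookup(table, key, spec):
    """Return the stored value, or the missing_default for absent keys."""
    default = math.inf if spec["missing_default"] == "inf" else spec["missing_default"]
    return table.get(key, default)

assignment_cost = load_parameter(COSTS_CSV, param_spec)
print(lookup(assignment_cost, ("w1", "t2"), param_spec))  # 4.0
print(lookup(assignment_cost, ("w2", "t1"), param_spec))  # inf
```

Because the key columns are named explicitly, nothing in this sketch has to guess which CSV column corresponds to which domain set.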
Variables
The variables section defines the decision variables for the optimization model. The assign variable is binary and indexed over domain: ["Workers", "Tasks"]. At compile time, the compiler generates (assuming the PuLP backend):
assign = {(w, t): pulp.LpVariable(f"assign_{w}_{t}", cat="Binary") for w in Workers for t in Tasks}
Some important structural flags not used here but worth knowing are exclude_diagonal, domain_filter, and upper_bound_set.
For variables indexed over the same set twice, such as arc[Location, Location] in a routing model, setting exclude_diagonal=true tells the compiler to skip the (i, i) diagonal entries, since no location routes to itself. The compiler inserts the guard
if l1 == l2:
    continue
and uses .get(key, 0) for all lookups so that missing keys never trigger a KeyError.
When a cost table has fewer rows than the full Cartesian product of its domain sets (for example, only valid routes appear in the CSV), setting domain_filter to that parameter’s name restricts the variable to only those existing combinations. The compiler generates the comprehension with if (i, j) in transport_cost, ensuring that non-existent routes are never created as variables.
For integer variables whose natural upper bound equals the size of a set (such as MTZ position variables in subtour elimination), setting upper_bound_set="Customers" causes the compiler to emit len(Customers) as the upper bound, keeping the model data-agnostic even when the set size changes between runs.
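The effect of the three flags on the generated index space can be sketched without any solver library. The function variable_keys and the sample data below are hypothetical; the sketch only shows which index tuples would receive a variable, not how ORPilot emits them.

```python
from itertools import product

def variable_keys(domain_sets, exclude_diagonal=False, domain_filter=None):
    """Enumerate the index tuples for which a variable would be created.

    domain_sets: list of member lists, one per set in the variable's domain.
    domain_filter: optional loaded parameter table whose keys are the valid combinations.
    """
    keys = []
    for key in product(*domain_sets):
        if exclude_diagonal and len(key) == 2 and key[0] == key[1]:
            continue  # skip (i, i) entries
        if domain_filter is not None and key not in domain_filter:
            continue  # skip combinations absent from the sparse table
        keys.append(key)
    return keys

Locations = ["a", "b", "c"]
transport_cost = {("a", "b"): 5.0, ("b", "c"): 7.0}  # made-up sparse cost table

arc_keys = variable_keys([Locations, Locations], exclude_diagonal=True)
route_keys = variable_keys([Locations, Locations], domain_filter=transport_cost)
print(len(arc_keys))  # 6
print(route_keys)     # [('a', 'b'), ('b', 'c')]

# upper_bound_set: the bound is derived from the set at compile time,
# so it tracks the data instead of being hardcoded.
Customers = ["c1", "c2", "c3", "c4", "c5"]
position_upper_bound = len(Customers)
print(position_upper_bound)  # 5
```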
Constraints
The constraints section is where the IR differs most from a code file: constraints are not stored as strings or code but as expression trees. Each constraint has four fields:
(1) domain: the sets the compiler will iterate over to generate one constraint instance per combination. For example, domain: ["Workers"] means one constraint per worker.
(2) expression: the left-hand side, represented as a recursive tree of nodes.
(3) sense: the inequality or equality sign for this constraint: "=", "<=", or ">=".
(4) rhs: the right-hand side, also an expression tree, but one containing only constants and parameters. Variables never appear here; they must always be on the LHS.
Let’s examine the one_task_per_worker constraint in detail.
"one_task_per_worker": {
  "domain": ["Workers"],
  "expression": {
    "operation": "indexed_sum",
    "over": ["Tasks:t"],
    "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
  },
  "sense": "=",
  "rhs": {"type": "constant", "value": 1}
},
In the expression node above, the over field uses the alias "Tasks:t" to name the loop variable t for the inner sum explicitly. The constraint’s domain is Workers only, so the outer loop binds w but not t; the alias is what brings t into scope inside the sum. The shadow rule covers the opposite case: when a set in over also appears in the constraint’s domain with the same index_symbol, the inner sum needs an alias that differs from the outer loop variable. Otherwise the inner t would shadow the outer t, and the sum would always compute assign[t, t] (a self-loop diagonal) rather than the intended sum.
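One way to see how domain, over, and the alias interact is to expand the one_task_per_worker constraint by hand. The sketch below is an assumption about how such an expansion could work, not ORPilot's compiler: the function expand and the lookup tables SETS and INDEX_SYMBOL are hypothetical, and the set members are those of the worked example.

```python
from itertools import product

SETS = {"Workers": ["w1", "w2", "w3", "w4"], "Tasks": ["t1", "t2", "t3", "t4"]}
INDEX_SYMBOL = {"Workers": "w", "Tasks": "t"}

one_task_per_worker = {
    "domain": ["Workers"],
    "expression": {
        "operation": "indexed_sum",
        "over": ["Tasks:t"],
        "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]},
    },
    "sense": "=",
    "rhs": {"type": "constant", "value": 1},
}

def expand(constraint):
    """Return one (outer binding, LHS variable keys, sense, rhs) row per domain member."""
    expr = constraint["expression"]
    # Each entry of "over" is "SetName:alias"; the alias names the inner loop variable.
    inner = [entry.split(":") for entry in expr["over"]]
    rows = []
    for outer_vals in product(*(SETS[s] for s in constraint["domain"])):
        # Outer loop variables come from each domain set's index_symbol.
        scope = {INDEX_SYMBOL[s]: v for s, v in zip(constraint["domain"], outer_vals)}
        keys = []
        for inner_vals in product(*(SETS[name] for name, _ in inner)):
            local = dict(scope)
            for (_, alias), value in zip(inner, inner_vals):
                local[alias] = value  # an alias equal to an outer symbol would overwrite it
            keys.append(tuple(local[i] for i in expr["body"]["indices"]))
        rows.append((scope, keys, constraint["sense"], constraint["rhs"]["value"]))
    return rows

rows = expand(one_task_per_worker)
first_scope, first_keys, first_sense, first_rhs = rows[0]
print(first_scope, first_keys, first_sense, first_rhs)
```

The first row reads as assign[w1,t1] + assign[w1,t2] + assign[w1,t3] + assign[w1,t4] = 1. The comment on the alias assignment marks the exact spot where shadowing would occur if an inner alias reused an outer symbol.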
Objective
In the IR, the objective is written as follows.
"objective": {
  "sense": "minimize",
  "expression": {
    "operation": "indexed_sum",
    "over": ["Workers:w", "Tasks:t"],
    "body": {
      "operation": "multiply",
      "left": {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
      "right": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
    }
  }
}
The outer indexed_sum iterates over both Workers and Tasks simultaneously, using aliases "Workers:w" and "Tasks:t" to name both loop variables explicitly. The body is a multiply node (parameter × variable), which is the only form of multiplication the IR permits in a linear model. The result is one term per (worker, task) pair, summed into the total cost.
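Because the objective is a plain tree, it can be walked recursively. The following sketch evaluates the objective expression for a fixed assignment on a reduced 2×2 instance. The evaluate function is hypothetical and handles only the node kinds shown in this post; the w2 costs and the chosen assignment are made-up illustrative values.

```python
from itertools import product

SETS = {"Workers": ["w1", "w2"], "Tasks": ["t1", "t2"]}
# w1 costs follow the sample rows of assignment_costs.csv; w2 costs are made up.
PARAMS = {"assignment_cost": {("w1", "t1"): 2.0, ("w1", "t2"): 4.0,
                              ("w2", "t1"): 3.0, ("w2", "t2"): 1.0}}
# A fixed candidate solution: w1 -> t1, w2 -> t2.
VALUES = {"assign": {("w1", "t1"): 1, ("w1", "t2"): 0,
                     ("w2", "t1"): 0, ("w2", "t2"): 1}}

objective_expression = {
    "operation": "indexed_sum",
    "over": ["Workers:w", "Tasks:t"],
    "body": {
        "operation": "multiply",
        "left": {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
        "right": {"type": "variable", "name": "assign", "indices": ["w", "t"]},
    },
}

def evaluate(node, scope):
    """Recursively evaluate an expression tree under the given index bindings."""
    if node.get("operation") == "indexed_sum":
        pairs = [entry.split(":") for entry in node["over"]]
        total = 0.0
        for vals in product(*(SETS[name] for name, _ in pairs)):
            inner = dict(scope)
            inner.update({alias: v for (_, alias), v in zip(pairs, vals)})
            total += evaluate(node["body"], inner)
        return total
    if node.get("operation") == "multiply":
        return evaluate(node["left"], scope) * evaluate(node["right"], scope)
    if node["type"] == "constant":
        return node["value"]
    # Leaf lookup: parameters read from data, variables from the candidate solution.
    key = tuple(scope[i] for i in node["indices"])
    table = PARAMS if node["type"] == "parameter" else VALUES
    return table[node["name"]][key]

total_cost = evaluate(objective_expression, {})
print(total_cost)  # 3.0
```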
This is the simplest objective shape: a single indexed sum. More complex objectives combine multiple indexed sums using subtract. Suppose the model had both an assignment cost and a bonus for certain assignments: maximize sum(bonus[w,t] × assign[w,t]) – sum(cost[w,t] × assign[w,t]). That would be encoded as:
subtract(indexed_sum(over Workers, Tasks: bonus[w,t] × assign[w,t]), indexed_sum(over Workers, Tasks: cost[w,t] × assign[w,t]))
One critical rule about subtract: never nest a subtract on the right side of another subtract. Because subtract is a binary operation (left minus right), placing another subtract on the right flips the inner term’s sign:
subtract(A, subtract(B, C))
= A – (B – C)
= A – B + C ← C was supposed to be subtracted but ends up added
Suppose the objective is revenue – shipping_cost – holding_cost. A common failure mode for LLMs is grouping the two costs together on the right:
subtract(revenue, subtract(shipping_cost, holding_cost))
= revenue – (shipping_cost – holding_cost)
= revenue – shipping_cost + holding_cost
This is incorrect: the holding cost effectively becomes revenue. The model still compiles and the solver still returns “optimal,” but the objective value is wrong, inflated by 2 × holding_cost. The correct form is a flat left-to-right chain:
subtract(subtract(revenue, shipping_cost), holding_cost)
= (revenue – shipping_cost) – holding_cost
= revenue – shipping_cost – holding_cost
ORPilot includes an IR semantic validator that identifies this right-side nesting pattern before compilation and pinpoints exactly which term had its sign reversed, allowing the LLM to correct the chain ordering.
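The right-side nesting check is easy to picture as a tree walk. The sketch below is not ORPilot's validator. It assumes that subtract nodes carry left and right children in the same way the multiply node does, and it uses a hypothetical leaf type "term" in place of full indexed sums.

```python
def find_right_nested_subtract(node, path="expression"):
    """Return the paths of subtract nodes that sit on the right of another subtract."""
    hits = []
    if not isinstance(node, dict):
        return hits
    if node.get("operation") == "subtract":
        right = node.get("right")
        if isinstance(right, dict) and right.get("operation") == "subtract":
            hits.append(path + ".right")
    # Keep walking so nested occurrences deeper in the tree are also reported.
    for field in ("left", "right", "body"):
        if field in node:
            hits.extend(find_right_nested_subtract(node[field], f"{path}.{field}"))
    return hits

def term(name):
    # Hypothetical leaf standing in for a full indexed_sum subtree.
    return {"type": "term", "name": name}

def sub(left, right):
    return {"operation": "subtract", "left": left, "right": right}

nested = sub(term("revenue"), sub(term("shipping_cost"), term("holding_cost")))
flat = sub(sub(term("revenue"), term("shipping_cost")), term("holding_cost"))

print(find_right_nested_subtract(nested))  # ['expression.right']
print(find_right_nested_subtract(flat))    # []

# Numeric check of the sign flip with made-up values.
revenue, shipping_cost, holding_cost = 100, 30, 10
wrong_value = revenue - (shipping_cost - holding_cost)
right_value = (revenue - shipping_cost) - holding_cost
print(wrong_value - right_value)  # 20, i.e. 2 x holding_cost
```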
From IR to Solver Code
The IR compiler is entirely deterministic — no LLM is involved at any point. Feed it the same ir.json and the same CSV data files, and it will always generate identical solver code. Every single time. The compiler currently supports five backends: PuLP, Pyomo, OR-Tools, Gurobi, and CPLEX. Switching between backends requires no changes to the model itself. The IR stays the same; only the compilation target changes. This means you can store ir.json alongside your data and exactly reproduce any past result without making a single API call. You can swap from Gurobi to PuLP by running: orpilot compile-ir output/ir.json --solver pulp --run. One command, zero LLM calls, identical model structure. You can integrate CI/CD validation of solver outputs by committing ir.json and running the compiler inside your pipeline. You can share ir.json with a teammate on a different machine, and they can solve the same model without needing your LLM API key or even understanding the problem from the ground up.
The IR Compilation Pipeline
Once you have a validated ir.json, ORPilot provides a streamlined compilation pipeline: ir.json + CSV Data → IR Compiler → Solver Code → Code Execution. This entire pipeline involves zero LLM calls from start to finish. It is fast, inexpensive, and fully deterministic. The only LLM call in the entire workflow was the one that generated the ir.json in the first place. The CLI command is: orpilot compile-ir output/ir.json --run. That compiles the IR, runs the model, and produces a solution report. To switch solvers: orpilot compile-ir output/ir.json --solver pyomo --run.
The IR Semantic Validator
Before an IR is saved and compiled, ORPilot runs a semantic validator that catches modeling mistakes that are structurally valid JSON but mathematically incorrect. The validator currently detects three major categories, all of which are common LLM failure modes observed during testing.
1. Inventory balance sign errors. It detects when all flow variables in a balance constraint end up on the same side (e.g., inv = inflow + outflow instead of inv = inflow – outflow). The correct identity is: ending_inv = beginning_inv + inflow – outflow. Violations of this produce models that are either infeasible (the over-constrained case) or unbounded (the under-constrained case), and the sign error is nearly impossible to catch in compiled code.
2. Missing init constraint. If a temporal-lag balance constraint exists, the validator requires a corresponding “_init” variant representing the constraint in the initial time period. A missing init constraint could leave the first period unconstrained, producing an unbounded model even when the subsequent-period constraint is correct.
3. Nested subtract in objective. Sometimes the IR-building LLM writes subtract(A, subtract(B, C)) when it actually intends to subtract both cost B and cost C sequentially from revenue A. Mathematically, however, this expression evaluates to A – (B – C) = A – B + C, flipping C’s sign from cost to revenue. The model still solves to “optimal,” but the objective value is inflated by 2 × C. The validator detects right-side nesting and identifies the affected term so the LLM can rewrite the objective as a flat left-to-right chain.
When validation fails, the specific error message is sent back to the LLM as a targeted retry prompt. The LLM does not see a generic “invalid IR” message — instead it receives something like “inventory_balance sign error: variable discharge appears to be negative (coefficient -1) but should be subtracted from inflow, not added to it.”
Why IR Matters for What-If Analysis
The IR’s reproducibility and portability properties extend naturally into systematic what-if analysis. Once a model is solved and its IR is saved, a business user typically wants to explore how the optimal solution shifts under different assumptions. What if demand increases by 20% in Q3? What if the cost of raw material rises to $15 per unit? What if we add a constraint that no single supplier accounts for more than 40% of total procurement? The IR structure makes two categories of what-if queries trivially inexpensive. The first category is data changes. If the question only modifies parameter values (leaving the model structure intact), you only need to update the CSV files. The IR JSON remains unchanged. Run the compiler against the new data and re-solve. This is a zero-LLM-call operation. You can run hundreds of scenarios this way with no API cost.
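A data-only scenario amounts to rewriting a CSV column. The sketch below scales the cost column by 20% on an in-memory stand-in for assignment_costs.csv. The helper scale_column is hypothetical, and in practice the result would be written to the scenario's data directory before recompiling.

```python
import csv
import io

# Stand-in for the first rows of assignment_costs.csv (comma-separated layout assumed).
BASE_CSV = """worker_id,task_id,cost
w1,t1,2.0
w1,t2,4.0
"""

def scale_column(csv_text, column, factor):
    """Return new CSV text with one numeric column multiplied by factor."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Round to avoid floating-point noise in the written file.
        row[column] = str(round(float(row[column]) * factor, 6))
        writer.writerow(row)
    return out.getvalue()

scenario_csv = scale_column(BASE_CSV, "cost", 1.2)
print(scenario_csv)
```

The IR file is untouched by this step; only the data changes, so rerunning orpilot compile-ir output/ir.json --run against the new file needs no LLM call.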
The second category is structural changes. If the question modifies a constraint, adds a new one, or changes the objective, you edit the IR JSON directly. Because the IR is a typed, schema-validated document with a well-defined expression tree, such edits are localized. Adding a constraint is simply a matter of appending a new constraint object — not searching through hundreds of lines of solver-specific code trying to figure out where to make the change. This represents a fundamentally different relationship with your optimization model than anything else available today. Instead of a one-shot artifact, you have a living, editable model structure that you can interrogate and modify independently of the LLM.
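A structural edit can be as small as changing one field of one constraint object. The sketch below copies an IR fragment and relaxes one_task_per_worker from "=" to "<=", which would let a worker stay unassigned when there are more workers than tasks. The helper with_sense is hypothetical, and the fragment contains only the constraint being edited.

```python
import copy
import json

# Fragment of the IR containing only the constraint being edited.
ir = {
    "constraints": {
        "one_task_per_worker": {
            "domain": ["Workers"],
            "expression": {
                "operation": "indexed_sum",
                "over": ["Tasks:t"],
                "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]},
            },
            "sense": "=",
            "rhs": {"type": "constant", "value": 1},
        }
    }
}

def with_sense(ir_doc, constraint_name, new_sense):
    """Return a copy of the IR with one constraint's relational sense replaced."""
    if new_sense not in ("<=", "=", ">="):
        raise ValueError(f"unsupported sense: {new_sense!r}")
    scenario = copy.deepcopy(ir_doc)  # leave the baseline IR untouched
    scenario["constraints"][constraint_name]["sense"] = new_sense
    return scenario

scenario_ir = with_sense(ir, "one_task_per_worker", "<=")
print(json.dumps(scenario_ir["constraints"]["one_task_per_worker"], indent=2))
```

Keeping the baseline and the scenario as separate documents means both can be compiled and compared side by side.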
The Bigger Picture
The IR addresses something fundamental about the relationship between AI and production software: AI outputs need to be verifiable, portable, and durable. A solver code file generated by an LLM is an opaque blob. If something is wrong, you need the LLM to fix it. If you want to change something, you either need to understand solver API syntax well enough to edit it yourself, or you call the LLM again. The model exists only as code. The IR decouples the modeling intelligence (which requires an LLM) from the computational step (which does not). The LLM’s job is to produce a clean, structured JSON artifact. Once that artifact exists and is validated, it belongs to you — not to the LLM. This design choice, more than anything else in ORPilot, is what makes it suitable for production deployment rather than academic demonstration.



