A Light Introduction To Nonlinear Constrained Optimization With Piecewise Linear Approximations

drawback, the aim is to seek out the most effective (most or minimal) worth of an goal perform by deciding on actual variables that fulfill a set of equality and inequality constraints.

A common constrained optimization drawback is to pick out $n$ actual determination variables $x_0,x_1,dots,x_{n-1}$ from a given possible area in such a method as to optimize (decrease or maximize) a given goal perform

[f(x_0,x_1,dots,x_{n-1}).]

We normally let $x$ denote the vector of $n$ actual determination variables $x_0,x_1,dots,x_{n-1}.$ That’s, $x=(x_0,x_1,dots,x_{n-1})$ and write the overall nonlinear program as:

[begin{aligned}text{ Maximize }&f(x)text{ Subject to }&g_i(x)leq b_i&&qquad(i=0,1,dots,m-1)&xinmathbb{R}^nend{aligned}]

the place every of the constraint capabilities $g_0$ by means of $g_{m-1}$ is given, and every $b_i$ is a continuing (Bradley et al., 1977).

This is just one attainable solution to write the issue. Minimizing $f(x)$ is equal to maximizing $-f(x).$ Likewise, an equality constraint $h(x)=b$ could be expressed because the pair of inequalities $h(x)leq b$ and $-h(x)leq-b.$ Furthermore, by including a slack variable, any inequality could be transformed into an equality constraint (Bradley et al., 1977).

Most of these issues seem throughout many utility areas, for instance, in enterprise settings the place an organization goals to maximise revenue or decrease prices whereas working on sources or funding constraints (Cherry, 2016).

If $f(x)$ is a linear perform and the constraints are linear, then the issue is known as a linear programming drawback (LP) (Cherry, 2016).

“The problem is called a nonlinear programming problem (NLP) if the objective function is nonlinear and/or the feasible region is determined by nonlinear constraints.” (Bradley et al., 1977, p. 410)

The assumptions and approximations utilized in linear programming can generally produce an acceptable mannequin for the choice variable ranges of curiosity. In different circumstances, nonetheless, nonlinear conduct within the goal perform and/or the constraints is important to formulate the applying as a mathematical program precisely (Bradley et al., 1977).

Nonlinear programming refers to strategies for fixing an NLP. Though many mature solvers exist for LPs, NLPs, particularly these involving higher-order nonlinearities, are sometimes tougher to unravel (Cherry, 2016).

Difficult nonlinear programming issues come up in areas reminiscent of digital circuit design, monetary portfolio optimization, gasoline community optimization, and chemical course of design (Cherry, 2016).

One solution to deal with nonlinear programming issues is to linearize them and use an LP solver to acquire a superb approximation. One want to do this for a number of causes. As an illustration, linear fashions are normally quicker to unravel and could be extra numerically secure.

To maneuver from idea to a working Python mannequin, we are going to stroll by means of the next sections:

This textual content assumes some familiarity with mathematical programming. After defining Separable Features, we current a separable nonlinear maximization Instance and description an strategy primarily based on piecewise linear (PWL) approximations of the nonlinear goal perform.

Subsequent, we outline Particular Ordered Units of Sort 2 and clarify how they help the numerical formulation.

We then introduce the theoretical background in On Convex and Concave Features, offering instruments used all through the rest of this work on nonlinear programming (NLP).

Lastly, we current a Python Implementation process that makes use of Gurobipy and PWL approximations of the nonlinear goal perform, enabling LP/MIP solvers to acquire helpful approximations for pretty massive NLPs.

Separable Features

This text’s answer process is for separable applications. Separable applications are optimization issues of the shape:

[begin{aligned}text{ Maximize }&sumlimits_{j=0}^{n-1}f_j(x_j)text{ Subject to }&sumlimits_{j=0}^{n-1} g_{ij}(x_j)leq 0qquad(i=0,1,dots,m-1)&xinmathbb{R}^nend{aligned}]

the place every of the capabilities $f_j$ and $g_{ij}$ are recognized. These issues are known as separable as a result of the choice variables seem individually: one in every constraint perform $g_{ij}$ and one in every goal perform $f_j$ (Bradley et al., 1977).

Instance

Contemplate the next drawback from Bradley et al. (1977) that arises in portfolio choice:

[begin{aligned}text{ Maximize }&f(x)=20x_0+16x_1-2x_0^2-x_1^2-(x_0+x_1)^2text{ Subject to }&x_0+x_1leq5&x_0geq0,x_1geq0.end{aligned}]

Goal perform (Animation by the writer).

Goal perform over the possible area (Animation by the writer).

As said, the issue isn’t separable due to the time period $(x_0+x_1)^2$ within the goal perform. Nevertheless, nonseparable issues can steadily be decreased to a separable type utilizing varied formulation tips. Specifically, by letting $x_2=x_0+x_1$ we will re-express the issue above in separable type as:

[begin{aligned}text{ Maximize }&f(x)=20x_0+16x_1-2x_0^2-x_1^2-x_2^2text{ Subject to }&x_0+x_1leq5&x_0+x_1-x_2=0&x_0geq0,x_1geq0,x_2geq 0.end{aligned}]

Clearly, the linear constraints are separable, and the target perform can now be written as $f(x)=f_0(x_0)+f_1(x_1)+f_2(x_2),$ the place

[begin{aligned}f_0(x_0)&=20x_0-2x_0^2f_1(x_1)&=16x_1-x_1^2end{aligned}]

and

[f_2(x_2)=-x_2^2.]

Nonlinear capabilities (Picture by the writer).

To type the approximation drawback, we approximate every nonlinear time period $f_j(x_j)$ by a piecewise linear perform $hat{f_j}(x_j)$ by utilizing a specified variety of breakpoints.

Every PWL perform $hat{f_j}(x_j)$ is outlined over an interval $[l,u]$ and consists of line segments whose endpoints lie on the nonlinear perform $f_j(x_j),$ beginning at $(l,f_j(l))$ and ending at $(u,f_j(u)).$ We characterize the PWL capabilities utilizing the next parametrization. One can write any $lleq x_jleq u$ as:

[x_j=sumlimits_{i=0}^{r-1}theta_j^i a_i,quad hat f_j(x_j)=sumlimits_{i=0}^{r-1}theta_j^if_j(a_i),quadsumlimits_{i=0}^{r-1}theta_j^i=1,quadtheta_j=(theta_j^0,theta_j^1,dotstheta_j^{r-1})inmathbb{R}_{geq0}^r]

for every $j=0,1,2,$ the place the PWL capabilities are specified by the factors ${(a_i,f_j(a_i)}$ for $i=0,1,dots,r-1$ (Cherry, 2016). Right here, we use 4 uniformly distributed breakpoints to approximate every $f_j(x_j)$ Additionally, since $x_0+x_1leq5,$ $x_0+x_1=x_2,$ and $x_0,x_1,x_2geq0$ we’ve got that

[0leq x_0leq5,qquad0leq x_1leq 5,quadtext{ and }quad0leq x_2leq5.]

And so, our approximations needn’t lengthen past these variable bounds. Due to this fact the PWL perform $hat{f_j}(x_j)$ can be outlined by the factors $(0,f_j(0)),$ $(5/3,f_j(5/3)),$ $(10/3,f_j(10/3))$ and $(5,f_j(5))$ for $j=0,1,2.$

Any $x_0in[0,5]capmathbb{R}$ could be written as

[begin{aligned}x_0&=theta_0^0cdot0+theta_0^1cdotfrac{5}{3}+theta_0^2cdotfrac{10}{3}+theta_0^3cdot5&=theta_0^1cdotfrac{5}{3}+theta_0^2cdotfrac{10}{3}+theta_0^3cdot 5text{ s.t. }sumlimits_{i=0}^3theta_0^i=1end{aligned}]

and $f_0(x_0)$ could be approximated as

[begin{aligned}hat f_0(x_0)&=theta_0^0f_0(0)+theta_0^1f_0left(frac{5}{3}right)+theta_0^2f_0left(frac{10}{3}right)+theta_0^3f_0(5)&=theta_0^1cdotfrac{250}{9}+theta_0^2cdotfrac{400}{9}+theta_0^3cdot50.end{aligned}]

The identical reasoning follows for $x_1$ and $x_2$ For instance, evaluating the approximation at $x_0=1.5$ offers:

[begin{aligned}hat f_0(1.5)&=frac{1}{10}f_0(0)+frac{9}{10}f_0left(frac{5}{3}right)&=frac{1}{10}cdot 0+frac{9}{10}cdotfrac{250}{9}&=25end{aligned}]

since

[1.5=frac{1}{10}cdot0+frac{9}{10}cdotfrac{5}{3}.]

PWL approximation curves (Picture by the writer).

Such weighted sums are known as convex mixtures, i.e., linear mixtures of factors with non-negative coefficients that sum to 1. Now, since $f_0(x_0),$ $f_1(x_1),$ and $f_2(x_2)$ are concave capabilities, one might ignore extra restrictions, and the issue could be solved as an LP (Bradley et al., 1977). Nevertheless, typically, such a formulation doesn’t suffice. Specifically, we nonetheless have to implement the adjacency situation: At most two $theta_j^i$ weights are constructive, and if two weights are constructive, then they’re adjoining, i.e., of the shape $theta_j^i$ and $theta_j^{i+1}.$ An analogous restriction applies to every approximation. As an illustration, if the weights $theta_0^0=2/5$ and $theta_0^2=3/5$ then the approximation offers

[begin{aligned}hat f_0(x_0)&=frac{2}{5}cdot 0+frac{3}{5}cdotfrac{400}{9}=frac{80}{3}x_0&=frac{2}{5}cdot0+frac{3}{5}cdotfrac{10}{3}=2.end{aligned}]

In distinction, at $x_0=2,$ the approximation curve offers

[begin{aligned}hat f_0(x_0)&=frac{4}{5}cdotfrac{250}{9}+frac{1}{5}cdotfrac{400}{9}=frac{280}{9}x_0&=frac{4}{5}cdotfrac{5}{3}+frac{1}{5}cdotfrac{10}{3}=2.end{aligned}]

Want for adjacency situation (Picture by the writer).

One can implement the adjacency situation by introducing extra binary variables into the mannequin, as formulated in Cherry (2016). Nevertheless, we leverage Gurobipy‘s SOS (Particular Ordered Set) constraint object.

The entire program is as follows

[begin{alignat}{2}text{ Maximize }&hat f_0(x_0)+hat f_1(x_1)+hat f_2(x_2)text{ Subject to }&x_0=sumlimits_{i=0}^{r-1}theta_0^ia_i&x_1=sumlimits_{i=0}^{r-1}theta_1^ia_i&x_2=sumlimits_{i=0}^{r-1}theta_2^ia_i&hat f_0=sumlimits_{i=0}^{r-1}theta_0^if_0(a_i)&hat f_1=sumlimits_{i=0}^{r-1}theta_1^if_1(a_i)&hat f_2=sumlimits_{i=0}^{r-1}theta_2^if_2(a_i)&sumlimits_{i=0}^{r-1}theta_j^i=1&&qquad j=0,1,2&{theta_j^0,theta_j^1,dotstheta_j^{r-1}}text{ SOS type 2 }&&qquad j=0,1,2&x_0,x_1,x_2in[0,5]capmathbb{R}&hat f_0in[0,50]capmathbb{R}&hat f_1in[0,55]capmathbb{R}&hat f_2in [-25,0]capmathbb{R}&theta_j^iin[0,1]capmathbb{R}&&qquad j=0,1,2,;i=0,1,dots,{r-1}.finish{alignat}]

Particular Ordered Units of Sort 2

“A set of variables ${x_0,x_1,dots,x_{n-1}}$ is a special ordered set of type 2 (SOS type 2) if $x_ix_j=0$ whenever $|i-j|geq2,$ i.e., at most two variables in the set can be nonzero, and if two variables are nonzero they must be adjacent in the set.” (de Farias et al., 2000, p. 1)

In our formulation, the adjacency situation on the $theta_j^i$ variables matches precisely a SOS kind 2 situation: at most two $theta_j^i$ could be constructive, and so they should correspond to adjoining breakpoints. Utilizing SOS kind 2 constraints lets us guarantee adjacency straight in Gurobi, which applies specialised branching methods that robotically seize the SOS construction, as a substitute of introducing additional binary variables by hand.

On Convex and Concave Features

For the next dialogue, we take a common separable constrained maximization drawback occasion, as beforehand outlined within the Introduction part:

[begin{aligned}text{ Maximize }&sumlimits_{j=0}^{n-1}f_j(x_j)text{ Subject to }&sumlimits_{j=0}^{n-1} g_{ij}(x_j)leq 0qquad(i=0,1,dots,m-1)&xinmathbb{R}^n.end{aligned}]

the place every of the capabilities $f_i$ and $g_{ij}$ are recognized.

All through, the time period piecewise-linear (PWL) approximation refers back to the secant interpolant obtained by connecting ordered breakpoints pairs $(a_k,f(a_k))$ with line segments.

Convex and concave are among the many most important purposeful types in mathematical optimization. Formally, a perform $f(x)$ is known as convex if, for each $y$ and $z$ and each $0leqlambdaleq1,$

[f[lambda y+(1-lambda)z]leqlambda f(y)+(1-lambda)f(z).]

This definition implies that the sum of convex capabilities is convex, and nonnegative multiples of convex capabilities are additionally convex. Concave capabilities are merely the damaging of convex capabilities, for which the definition above follows aside from the reversed course of the inequality. In fact, some capabilities are neither convex nor concave.

Convex and concave capabilities (Eric W. Weisstein on MathWorld)

It’s straightforward to see that linear capabilities are each convex and concave. This property is important because it permits us to optimize (maximize or decrease) linear goal capabilities utilizing computationally environment friendly strategies, such because the simplex technique in linear programming (Bradley et al., 1977).

Moreover, a single-variable differentiable perform $f(x)$ is convex on an interval precisely when its by-product $f^prime(x)$ is nondecreasing — that means the slope doesn’t go down. Likewise, differentiable $f(x)$ is concave on an interval precisely when $f^prime(x)$ is nonincreasing — that means the slope doesn’t go up.

From the definition above, one can trivially conclude that PWL approximations overestimate convex capabilities since each line section becoming a member of two factors on its graph doesn’t lie beneath the graph at any level. Equally, PWL approximations underestimate concave capabilities.

Nonlinear Goal Perform

This notion helps us clarify why we will omit the SOS kind 2 constraints from an issue such because the one within the Instance part. By concavity, the PWL approximation curve lies above the section becoming a member of any two non-adjacent factors. Due to this fact, the maximization will choose the approximation curve with solely adjoining weights. An analogous argument applies if three or extra weights are constructive (Bradley et al., 1977). Formally, suppose that every $f_j(x_j)$ is a concave perform and every recognized $g_{ij}$ is a linear perform in $x.$ Then we will apply the identical process as within the Instance part — We let breakpoints fulfill

[l= a_0lt a_1lt dotslt a_{r-1}= u,qquad rgeq 2.]

For actual numbers $lleq x_jleq u.$ Additionally, let $theta_jinmathbb{R}_+^r$ such that,

[sumlimits_{i=0}^{r-1}theta_j^i=1]

and assign

[x_j=sumlimits_{i=0}^{r-1}theta_j^icdot a_i,qquadhat f_j=sumlimits_{i=0}^{r-1}theta_j^if_j(a_i)]

for every $j=0,1,dots,n-1.$ And so, the reworked LP is as follows

[begin{alignat}{2}text{ Maximize }&sumlimits_{j=0}^{n-1}hat f_jtext{ Subject to }&sumlimits_{j=0}^{n-1} g_{ij}(x_j)leq 0&&qquad(i=0,1,dots,m-1)&x_j=sumlimits_{i=0}^{r-1}theta_j^icdot a_i&&qquad(j=0,1,dots,n-1)&hat f_j=sumlimits_{i=0}^{r-1}theta_j^if_j(a_i)&&qquad(j=0,1,dots,n-1)&sum_{i=0}^{r-1}theta_j^i=1&&qquad(j=0,1,dots,n-1)&theta_jinmathbb{R}_{geq 0}^r&&qquad(j=0,1,dots,n-1)&xinmathbb{R}^n.end{alignat}]

Now, let $p_j(x_j)$ be the PWL curve that goes by means of the factors $(a_i,f_j(a_i))$ i.e., for every $x_jin[a_k,a_{k+1}],$ $j=0,1,dots,n-1:$

[p_j(x_j)=frac{a_{k+1}-x_j}{a_{k+1}-a_k}f_j(a_k)+frac{x_j-a_k}{a_{k+1}-a_k}f_j(a_{k+1})]

$f_j(x_j)$ concave implies that its secant slopes are non-increasing, and since $p_j(x_j)$ is constructed by becoming a member of ordered breakpoints on the graph of $f_j(x_j),$ we’ve got that $p_j(x_j)$ can be concave. And so, by Jensen’s inequality,

[p_j(x_j)=p_jleft(sumlimits_{i=0}^{r-1}theta_j^icdot a_iright)geqsumlimits_{i=0}^{r-1}theta_j^icdot p_j(a_i)=sumlimits_{i=0}^{r-1}theta_j^if_j(a_i)=hat f_j.]

Furthermore, for all $x_j$ in any possible answer $((x_0,hat{f_0},theta_0),(x_1,hat{f_1},theta_1),dots,(x_{n-1},hat{f}_{n-1},theta_{n-1})),$ one can select $ok$ such that $x_jin[a_k,a_{k+1}]$ and outline the vector $hat{theta_j}$ by

[hattheta_j^k=frac{a_{k+1}-x_j}{a_{k+1}-a_k},qquad hattheta_j^{k+1}=frac{x_j-a_k}{a_{k+1}-a_k},qquadhattheta_j^i=0,;(inotin {k,k+1})qquad]

with $hattheta_j^ok,hattheta_j^{ok+1}geq 0$ and $hattheta_j^ok+hattheta_j^{ok+1}=1.$ Then $hattheta_j^kcdot a_k+hattheta_j^{ok+1}cdot a_{ok+1}=x_j$ and offers

[hat f_j^text{new}=hattheta_j^k f_j(a_k)+hattheta_j^{k+1}f_j(a_{k+1})=p_j(x_j)geqhat f_j.]

As a result of the constraints rely upon $theta_j$ solely by means of the induced worth $x_j,$ changing every $theta_j$ by $hattheta_j$ preserves feasibility (as a result of it leaves every $x_j$ unchanged) and it weakly improves the target perform (as a result of it will increase every $hat f_j$ to $p_j(x_j)$ ). Due to this fact, the SOS kind 2 constraints that implement the adjacency situation don’t change the optimum worth in separable maximization issues with concave goal capabilities. An analogous argument reveals that the SOS kind 2 constraints don’t change the optimum worth in separable minimization issues with convex goal capabilities.

Word, nonetheless, that this reveals that there exists an optimum answer through which solely weights comparable to adjoining breakpoints are constructive. It doesn’t assure that each one optimum options of the LP fulfill adjacency. Due to this fact, one should embody SOS kind 2 constraints to implement a constant illustration.

In follow, fixing fashions with concave goal capabilities by way of PWL approximations can yield an goal worth that underestimates the true optimum worth. Conversely, fixing fashions with convex goal capabilities by way of PWL approximations yield an goal worth that overestimates the true optimum worth.

Nevertheless, though these approximations can distort the target worth, in fashions the place the unique linear constraints stay unchanged, feasibility isn’t affected by the PWL approximations. Any variable assignments discovered by fixing the mannequin by means of PWL approximations fulfill the identical linear constraints and subsequently lie within the authentic possible area. Consequently, one can consider the unique nonlinear goal $f(x)$ on the returned answer to acquire the target worth, though this worth needn’t equal the true nonlinear optimum.

Nonlinear Constraints

What requires completely different approaches are PWL approximations of nonlinear constraints, since these approximations can change the unique possible area. We outline the possible area/set of the issue, just like the one above, by:

[F=bigcaplimits_{i=0}^{m-1}{xinmathbb{R}^n:G_i(x)leq0},]

the place

[G_i(x):=sumlimits_{j=0}^{n-1}g_{ij}(x_j).]

Now, suppose (for notational simplicity) that each one $G_i(x)$ are nonlinear capabilities. Approximate every univariate $g_{ij}(x_j)$ by a PWL perform $hat g_{ij}(x_j)$ and outline:

[hat G_i(x):=sumlimits_{j=0}^{n-1}hat g_{ij}(x_j).]

Then

[hat F=bigcaplimits_{i=0}^{m-1}{xinmathbb{R}^n:hat G_i(x)leq0},]

is the possible set of the reworked LP that makes use of PWL approximations. To keep away from actually infeasible options, we would like

[hat Fsubseteq F.]

If, as a substitute, $Fsubseteqhat F$ then there might exist $xin hat Fsetminus F$ that means the mannequin that makes use of PWL approximations may settle for factors that violate the unique nonlinear constraints.

Given the best way we assemble the PWL approximations, a ample situation for $hat Fsubseteq F$ is that for constraints of the shape $G_i(x)leq0$ every $g_{ij}(x_j)$ is convex and for constraints of the shape $G_i(x)geq0$ every $g_{ij}(x_j)$ is concave.

To see why this holds, take any $xinhat F.$ First, think about constraints of the shape $G_i(x)leq0$ the place every $g_{ij}(x_j)$ is convex. As a result of PWL approximations overestimate convex capabilities, for any mounted $i=0,1,dots,m-1,$ we’ve got:

[(forall j,;hat g_{ij}(x_j)geq g_{ij}(x_j))rightarrowsumlimits_{j=0}^{n-1}hat g_{ij}(x_j)geqsumlimits_{j=0}^{n-1}g_{ij}(x_j)rightarrowhat G_i(x)geq G_i(x)]

Since $xinhat F,$ it satisfies $hat G_i(x)leq0.$ Along with $hat G_i(x)geq G_i(x),$ this suggests $G_i(x)leq0.$

Subsequent, think about constraints of the shape $G_i(x)geq0$ the place every $g_{ij}(x_j)$ is concave. As a result of PWL approximations underestimate concave capabilities, for any mounted $i=0,1,dots,m-1,$ we’ve got:

[(forall j,;hat g_{ij}(x_j)leq g_{ij}(x_j))rightarrowsumlimits_{j=0}^{n-1}hat g_{ij}(x_j)leqsumlimits_{j=0}^{n-1}g_{ij}(x_j)rightarrowhat G_i(x)leq G_i(x).]

Once more, since $xinhat F,$ it satisfies $hat G_i(x)geq0.$ Mixed with $hat G_i(x)leq G_i(x),$ this suggests $G_i(x)geq0.$

This motivates the notion of a convex set: A set $C$ is convex if, for any two factors $x,yin C$ the complete line section connecting them lies inside $C.$ Formally, $C$ is convex if for all $x,yin C$ and any $lambdain[0,1]$ the purpose $lambda x+(1-lambda)y$ can be in $C.$

Specifically, if $G_i:mathbb{R}^ntomathbb{R}$ is convex, then

[S_i:={xinmathbb{R}^n:G_i(x)leq b}]

is convex for any $binmathbb{R}.$ Proof: Take any $x_1,x_2in S,$ then $G_i(x_1)leq b$ and $G_i(x_2)leq b.$ Since $G_i(x)$ is convex, for any $lambdain[0,1]$ we’ve got that

[G_i(lambda x_1+(1-lambda)x_2)leqlambda G_i(x_1)+(1-lambda)G_i(x_2)leqlambda b+(1-lambda)b=b.]

Therefore, $lambda x_1+(1-lambda) x_2in S,$ and so, $S$ is convex. An analogous argument applies to indicate that

[T_i:={xinmathbb{R}^n:G_i(x)geq b}]

is convex for any $binmathbb{R}$ if $G_i:mathbb{R}^ntomathbb{R}$ is concave.

And since one may simply present that the intersection of any assortment of convex units can be convex,

“for a convex feasible set we want convex functions for less-than-or-equal-to constraints and concave functions for greater-than-or-equal to constraints.” (Bradley et al., 1977, p.418)

Nonlinear optimization issues through which the aim is to reduce a convex goal (or maximize a concave one) over a convex set of constraints are known as convex nonlinear issues. These issues are usually simpler to deal with than nonconvex ones.

Certainly, if $Fsubseteqhat F$ then variable assignments might result in constraint violations. In these circumstances, one would want to depend on different strategies, reminiscent of the straightforward modification MAP to the Frank-Wolfe algorithm offered in Bradley et al. (1977), and/or introduce tolerances that, to some extent, permit constraint violations.

Python Implementation

Earlier than presenting the code, it’s helpful to think about how our modeling strategy will scale as the issue measurement will increase. For a separable program with $n$ determination variables and $r$ breakpoints per variable, the mannequin introduces $ncdot r$ steady variables (for the convex mixtures) and $r$ SOS kind 2 constraints. Which means that each the variety of variables and constraints develop linearly with $n$ and $r,$ however their product can shortly result in massive fashions when both parameter is elevated.

As beforehand talked about, we are going to use Gurobipy to numerically resolve this system above. Typically, the process is as follows:

Select bounds $[l,u]$
Select breakpoints $a_i$
Add $theta,$ convex-combination constraints
Add SOS kind 2
Remedy
Consider the nonlinear goal on the returned $x$
Refine breakpoints if wanted

After importing the required libraries and dependencies, we declare and instantiate our nonlinear capabilities $f_0,$ $f_1,$ and $f_2:$

# Imports
import gurobipy as grb
import numpy as np

# Nonlinear capabilities
f0=lambda x0: 20.0*x0-2.0*x0**2
f1=lambda x1: 16.0*x1-x1**2
f2=lambda x2: -x2**2

Subsequent, we select the variety of breakpoints. To assemble the PWL approximations, we have to distribute them throughout the interval $[0,5]capmathbb{R}.$ Many methods exist to do that. As an illustration, one may cluster extra factors close to the ends and even depend on adaptive procedures as in Cherry (2016). For simplicity, we stick with uniform sampling:

lb,ub=0.0,5.0
r=4  # Variety of breakpoints
a_i=np.linspace(begin=lb,cease=ub,num=r)  # Distribute breakpoints uniformly

Then, we will compute the worth of the capabilities $f_j(x_j)$ on the chosen breakpoints to approximate the choice variables:

# Compute perform values at breakpoints
f0_hat_r=f0(a_i)
f1_hat_r=f1(a_i)
f2_hat_r=f2(a_i)

The subsequent step is to declare and instantiate a Gurobipy optimization mannequin:

mannequin=gp.Mannequin()

After creating of our mannequin, we have to add the choice variables. Gurobipy offers strategies so as to add a number of determination variables to a mannequin directly, and we leverage this function to declare and instantiate a few of our determination variables, specifically the coefficients of the PWL approximations. Word that each one determination variables in our mannequin are steady and could be assigned to any actual worth throughout the offered bounds. Such fashions could be solved effectively utilizing trendy software program packages reminiscent of Gurobi.

# Resolution variables
x0=mannequin.addVar(lb=0.0,ub=5.0,identify='x0',vtype=GRB.CONTINUOUS)
x1=mannequin.addVar(lb=0.0,ub=5.0,identify='x1',vtype=GRB.CONTINUOUS)
x2=mannequin.addVar(lb=0.0,ub=5.0,identify='x2',vtype=GRB.CONTINUOUS)

f0_hat=mannequin.addVar(lb=0.0,ub=50.0,identify='f0_hat',vtype=GRB.CONTINUOUS)
f1_hat=mannequin.addVar(lb=0.0,ub=55.0,identify='f1_hat',vtype=GRB.CONTINUOUS)
f2_hat=mannequin.addVar(lb=-25.0,ub=0.0,identify='f2_hat',vtype=GRB.CONTINUOUS)

theta=mannequin.addVars(vary(0,3),vary(0,r),lb=0.0,ub=1.0,identify='theta',vtype=GRB.CONTINUOUS)

Moreover, we have to add constraints that restrict the values our determination variables might take. Firstly, we add the constraints that guarantee the right determination of the variables that yield the approximations of the nonlinear capabilities:

# Constraints
mannequin.addConstr(x0==gp.quicksum(theta[0,i]*a_i[i] for i in vary(0,r)))
mannequin.addConstr(x1==gp.quicksum(theta[1,i]*a_i[i] for i in vary(0,r)))
mannequin.addConstr(x2==gp.quicksum(theta[2,i]*a_i[i] for i in vary(0,r)))

mannequin.addConstr(f0_hat==gp.quicksum(theta[0,i]*f0_hat_r[i] for i in vary(0,r)))
mannequin.addConstr(f1_hat==gp.quicksum(theta[1,i]*f1_hat_r[i] for i in vary(0,r)))
mannequin.addConstr(f2_hat==gp.quicksum(theta[2,i]*f2_hat_r[i] for i in vary(0,r)))

mannequin.addConstr(gp.quicksum(theta[0,i] for i in vary(0,r))==1)
mannequin.addConstr(gp.quicksum(theta[1,i] for i in vary(0,r))==1)
mannequin.addConstr(gp.quicksum(theta[2,i] for i in vary(0,r))==1)

Secondly, we add the constraints that outline the unique drawback, specifically these restrictions within the nonlinear formulation:

mannequin.addConstr(x0+x1<=5.0)
mannequin.addConstr(x0+x1-x2==0.0)

And eventually, we add the SOS Sort 2 constraints to make sure the adjacency situation:

# Particular ordered units
mannequin.addSOS(GRB.SOS_TYPE2,[theta[0,i] for i in vary(0,r)])
mannequin.addSOS(GRB.SOS_TYPE2,[theta[1,i] for i in vary(0,r)])
mannequin.addSOS(GRB.SOS_TYPE2,[theta[2,i] for i in vary(0,r)])

All that is still is to outline the target perform, which we want to maximize:

mannequin.setObjective(f0_hat+f1_hat+f2_hat,GRB.MAXIMIZE)  # Goal perform

The mannequin is now full and we will use the Gurobi solver to retrieve the optimum worth and optimum variable assignments. The code to do that is proven beneath:

mannequin.optimize()  # Optimize!

obj_val=mannequin.ObjVal

res={}

all_vars=mannequin.getVars()
values=mannequin.getAttr('X',all_vars)
names=mannequin.getAttr('VarName',all_vars)

# Retrieve values of variables after optimizing
for identify,val in zip(names,values):
    res[name]=val

After executing the code above and printing attributes to the display screen, the output is as follows:

- Standing: 2
- Variety of breakpoints: 4
- Optimum goal worth: 45.00
- Optimum x0, f0(x0) worth (PWL approximation): 1.67, 27.78
- Optimum x1, f1(x1) worth (PWL approximation): 3.33, 42.22
- Optimum x2, f2(x2) worth (PWL approximation): 5.00, -25.00

PWL approximation curves (3 segments) with optimum values (Picture by the writer).

The distinction from our computed answer to the optimum worth of 139/3 is 4/3. Now, to extend accuracy and procure a greater answer, one may add extra breakpoints, consequently rising the variety of segments used to approximate the nonlinear capabilities. As an illustration, the output beneath reveals how rising the variety of breakpoints to 16 results in an approximation with virtually no error to the true optimum worth:

- Standing: 2
- Variety of breakpoints: 16
- Optimum goal worth: 46.33
- Optimum x0, f0(x0) worth (PWL approximation): 2.33, 35.78
- Optimum x1, f1(x1) worth (PWL approximation): 2.67, 35.56
- Optimum x2, f2(x2) worth (PWL approximation): 5.00, -25.00

PWL approximation curves (15 segments) with optimum values (Picture by the writer).

Goal perform over the possible area with optimum worth utilizing PWL approximation curves consisting of 15 segments (Animation by the writer).

Yow will discover the whole Python code on this GitHub repository: hyperlink.

Additional Readings

For a extra superior, sturdy algorithm that makes use of PWL capabilities to approximate the nonlinear goal perform, Cherry (2016) is a helpful reference. It presents a process that begins with coarse approximations, that are refined iteratively utilizing the optimum answer to the LP leisure from the earlier step.

For a deeper understanding of nonlinear programming, together with strategies, utilized examples, and an evidence of why nonlinear issues are intrinsically tougher to unravel, the reader might discuss with Nonlinear Programming: Utilized Mathematical Programming (Bradley et al., 1977).

Lastly, de Farias et al. (2000) current computational experiments that use SOS constraints inside a branch-and-cut scheme to unravel a generalized project drawback arising in fiber-optic cable manufacturing scheduling.

Conclusion

Nonlinear programming is a related space in numerical optimization — with functions starting from monetary portfolio optimization to chemical course of management. This text offered a process for tackling separable nonlinear applications by changing nonlinear phrases with PWL approximations and fixing the ensuing mannequin utilizing LP/MIP solvers. We additionally offered theoretical background on convex and concave capabilities and defined how these notions relate to nonlinear programming. With these parts, a reader ought to be capable of mannequin and resolve LP/MIP-compatible approximations of many nonlinear programming situations with out counting on devoted nonlinear solvers.

Thanks for studying!

References

Cherry, A., 2016. Piecewise Linear Approximation for Nonlinear Programming Issues. A dissertation. Texas Tech College. Obtainable at: (Accessed: 27 January 2026).

Bradley, S. P., Hax, A. C. & Magnanti, T. L., 1977. Nonlinear Programming. In: Utilized Mathematical Programming. Studying, MA: Addison-Wesley Publishing Firm, pp. 410–464. Obtainable at: (Accessed: 27 January 2026).

de Farias, I. R., Johnson, E. L. & Nemhauser, G. L., 2000. A generalized project drawback with particular ordered units: a polyhedral strategy. Mathematical Programming, 89(1), pp. 187–203. Obtainable at: (Accessed: 27 January 2026).

Top Posts

Critical WordPress Zero-Day: Unauthenticated Code Execution Exposed in WP2Shell Flaw

Bolivia’s Bold Crypto Play: USDT Adoption Sparks AI Mining Debate

General Dynamics Fires Back: DISA’s Enclave Cloud Expansion Sparks Contract Clash

A Light Introduction to Nonlinear Constrained Optimization with Piecewise Linear Approximations

Unlock Peak Performance: Your Command Protocol for GPT-5.6 Synergy

Beyond the Main Branch: Streamlining AI Workflows with Git Worktrees

The AI Safety Capital Rising: Beyond Silicon Valley’s Shadow

The Agent Security Chasm: 54% of Enterprises Battling AI Breaches While Credentials Freely Roam

Unleashing Kimi K3: The 2.8 Trillion-Parameter Open MoE Powerhouse with Delta Attention and 1M Context Horizon

Unlock Peak AI Performance: 5 Essential Assets Before Scaling Your Team

Critical WordPress Zero-Day: Unauthenticated Code Execution Exposed in WP2Shell Flaw

Bolivia’s Bold Crypto Play: USDT Adoption Sparks AI Mining Debate

General Dynamics Fires Back: DISA’s Enclave Cloud Expansion Sparks Contract Clash

Wireless Logic Bolsters US IoT Reach with Strategic SIMETRY Acquisition

The Blackout Test: Crucial Mistakes I Made With Backup Power (And How You Can Avoid Them)

Unlock Peak Performance: Your Command Protocol for GPT-5.6 Synergy

Iran Hunts US Military Phones: CrashStealer macOS Malware & the CVD Blueprint Unmasked

Benjamin Cowen’s Bold Q4 Forecast: Bitcoin’s $44K Bottom is Imminent!

Trending

Critical WordPress Zero-Day: Unauthenticated Code Execution Exposed in WP2Shell Flaw

Bolivia’s Bold Crypto Play: USDT Adoption Sparks AI Mining Debate

Latest Posts

Not More Data, but Better World Models – Unite.AI

OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears

Subscribe to Updates

Top Posts

A Light Introduction to Nonlinear Constrained Optimization with Piecewise Linear Approximations

Separable Features

Instance

Particular Ordered Units of Sort 2

On Convex and Concave Features

Nonlinear Goal Perform

Nonlinear Constraints

Python Implementation

Additional Readings

Conclusion

References

Join With Me

Related Posts