Nonlinear Programming: Convex Programming

Convex Programming

We already have discussed some special cases of convex programming in Secs. 13.4 and 13.5 (unconstrained problems), 13.7 (quadratic objective function with linear constraints), and 13.8 (separable functions). You also have seen some theory for the general case (necessary and sufficient conditions for optimality) in Sec. 13.6. In this section, we briefly discuss some types of approaches used to solve the general convex programming problem [where the objective function f(x) to be maximized is concave and the gi(x) constraint functions are convex], and then we present one example of an algorithm for convex programming.

There is no single standard algorithm that always is used to solve convex programming problems. Many different algorithms have been developed, each with its own advantages and disadvantages, and research continues to be active in this area. Roughly speaking, most of these algorithms fall into one of the following three categories.

The first category is gradient algorithms, where the gradient search procedure of Sec. 13.5 is modified in some way to keep the search path from penetrating any constraint boundary. For example, one popular gradient method is the generalized reduced gradient (GRG) method. Solver uses the GRG method for solving convex programming problems. (As discussed in the next section, both Solver and ASPE now include an Evolutionary Solver option that is well suited for dealing with nonconvex programming problems.)

The second category—sequential unconstrained algorithms—includes penalty function and barrier function methods. These algorithms convert the original constrained optimization problem to a sequence of unconstrained optimization problems whose optimal solutions converge to the optimal solution for the original problem. Each of these unconstrained optimization problems can be solved by the kinds of procedures described in Sec. 13.5. This conversion is accomplished by incorporating the constraints into a penalty function (or barrier function) that is subtracted from the objective function in order to impose large penalties for violating constraints (or even being near constraint boundaries). In the latter part of this section, we will describe an algorithm from the 1960s, called the sequential unconstrained minimization technique (or SUMT for short), that pioneered this category of algorithms. (SUMT also helped to motivate some of the interior-point methods for linear programming.)

The third category—sequential-approximation algorithms—includes linear approximation and quadratic approximation methods. These algorithms replace the nonlinear objective function by a succession of linear or quadratic approximations. For linearly constrained optimization problems, these approximations allow repeated application of linear or quadratic programming algorithms. This work is accompanied by other analysis that yields a sequence of solutions that converges to an optimal solution for the original problem. Although these algorithms are particularly suitable for linearly constrained optimization problems, some also can be extended to problems with nonlinear constraint functions by the use of appropriate linear approximations.

As one example of a sequential-approximation algorithm, we present here the Frank-Wolfe algorithm21 for the case of linearly constrained convex programming (so the constraints are Ax ≤ b and x ≥ 0 in matrix form). This procedure is particularly straightforward; it combines linear approximations of the objective function (enabling us to use the simplex method) with a procedure for one-variable unconstrained optimization (such as described in Sec. 13.4).

A Sequential Linear Approximation Algorithm (Frank-Wolfe)

Given a feasible trial solution x', the linear approximation used for the objective function is the first-order Taylor series expansion of f(x) around x = x', namely,

f(x) ≈ f(x') + ∇f(x')(x − x').

Because f(x') and ∇f(x')x' have fixed values, they can be dropped to give an equivalent linear objective function

g(x) = c1x1 + c2x2 + . . . + cnxn,  where cj = ∂f/∂xj evaluated at x = x'.

The simplex method (or the graphical procedure if n = 2) then is applied to the resulting linear programming problem [maximize g(x) subject to the original constraints, Ax ≤ b and x ≥ 0] to find its optimal solution xLP. Note that the linear objective function necessarily increases steadily as one moves along the line segment from x' to xLP (which is on the boundary of the feasible region). However, the linear approximation may not be a particularly close one for x far from x', so the nonlinear objective function may not continue to increase all the way from x' to xLP. Therefore, rather than just accepting xLP as the next trial solution, we choose the point that maximizes the nonlinear objective function along this line segment. This point may be found by conducting a procedure for one-variable unconstrained optimization of the kind presented in Sec. 13.4, where the one variable for purposes of this search is the fraction t of the total distance from x' to xLP. This point then becomes the new trial solution for initiating the next iteration of the algorithm, as just described. The sequence of trial solutions generated by repeated iterations converges to an optimal solution for the original problem, so the algorithm stops as soon as the successive trial solutions are close enough together to have essentially reached this optimal solution.

Summary of the Frank-Wolfe Algorithm

Initialization: Find a feasible initial trial solution x(0), for example, by applying linear programming procedures to find an initial BF solution. Set k = 1.

Iteration k:
1. For j = 1, 2, . . . , n, evaluate ∂f(x)/∂xj at x = x(k−1) and set cj equal to this value.
2. Find an optimal solution xLP(k) for the following linear programming problem: Maximize g(x) = c1x1 + . . . + cnxn, subject to Ax ≤ b and x ≥ 0.
3. For the variable t (0 ≤ t ≤ 1), set h(t) = f(x) for x = x(k−1) + t[xLP(k) − x(k−1)], so that h(t) gives the value of f(x) on the line segment between x(k−1) (where t = 0) and xLP(k) (where t = 1). Use a procedure for one-variable unconstrained optimization to maximize h(t) over 0 ≤ t ≤ 1, and set x(k) equal to the corresponding x.
Stopping rule: If x(k−1) and x(k) are sufficiently close, stop and use x(k) (or some extrapolation of x(0), x(1), . . . , x(k)) as the estimate of an optimal solution. Otherwise, reset k = k + 1 and perform another iteration.

21M. Frank and P. Wolfe, “An Algorithm for Quadratic Programming,” Naval Research Logistics Quarterly, 3: 95–110, 1956. Although originally designed for quadratic programming, this algorithm is easily adapted to the case of a general concave objective function considered here.

 

Example. To illustrate the Frank-Wolfe algorithm, consider the following linearly constrained convex programming problem:

Maximize f(x) = 5x1 − x1² + 8x2 − 2x2²,

subject to

3x1 + 2x2 ≤ 6

and x1 ≥ 0, x2 ≥ 0, so that the partial derivatives are ∂f/∂x1 = 5 − 2x1 and ∂f/∂x2 = 8 − 4x2.

Iteration 1: Because x = (0, 0) is clearly feasible (and corresponds to the initial BF solution for the linear programming constraints), let us choose it as the initial trial solution x(0) for the Frank-Wolfe algorithm. Plugging x1 = 0 and x2 = 0 into the expressions for the partial derivatives gives c1 = 5 and c2 = 8, so that g(x) = 5x1 + 8x2 is the initial linear approximation of the objective function. Graphically, solving this linear programming problem (see Fig. 13.17a) yields xLP(1) = (0, 3). For step 3 of the first iteration, the points on the line segment between (0, 0) and (0, 3) shown in Fig. 13.17a are expressed by

x = (0, 0) + t[(0, 3) − (0, 0)] = (0, 3t)

for 0 ≤ t ≤ 1, so that f(x) = 8(3t) − 2(3t)² = 24t − 18t². Setting the derivative 24 − 36t equal to zero yields t = 2/3, so the new trial solution is x(1) = (0, 2), which completes iteration 1. Iteration 2 proceeds in the same way: evaluating the partial derivatives at (0, 2) gives g(x) = 5x1, solving the resulting linear programming problem yields xLP(2) = (2, 0), and maximizing f(x) along the line segment from (0, 2) to (2, 0) gives x(2) = (5/6, 7/6).

Figure 13.17b shows the trial solutions that are obtained from iterations 3, 4, and 5 as well. You can see how these trial solutions keep alternating between two trajectories that appear to intersect at approximately the point x = (1, 3/2). This point is, in fact, the optimal solution, as can be verified by applying the KKT conditions from Sec. 13.6.

This example illustrates a common feature of the Frank-Wolfe algorithm, namely, that the trial solutions alternate between two (or more) trajectories. When they alternate in this way, we can extrapolate the trajectories to their approximate point of intersection to estimate an optimal solution. This estimate tends to be better than using the last trial solution generated. The reason is that the trial solutions tend to converge rather slowly toward an optimal solution, so the last trial solution may still be quite far from optimal.
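The iterations just described can be sketched in a few lines of code. The following is a minimal Python sketch, assuming the illustrative example maximize f(x) = 5x1 − x1² + 8x2 − 2x2² subject to 3x1 + 2x2 ≤ 6 and x1, x2 ≥ 0. Because this small feasible region has only three vertices, the linear programming subproblem is solved by enumerating them rather than by the simplex method, and the one-variable search of step 3 is done in closed form since f is quadratic along any line segment.

```python
import numpy as np

# Frank-Wolfe sketch for: maximize f(x) = 5*x1 - x1**2 + 8*x2 - 2*x2**2
# subject to 3*x1 + 2*x2 <= 6, x1 >= 0, x2 >= 0.
VERTICES = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 3.0])]
H = np.diag([-2.0, -4.0])              # constant Hessian of f

def f(x):
    return 5*x[0] - x[0]**2 + 8*x[1] - 2*x[1]**2

def grad_f(x):
    return np.array([5 - 2*x[0], 8 - 4*x[1]])

def frank_wolfe(x, iterations=1000):
    for _ in range(iterations):
        c = grad_f(x)
        # Steps 1-2: maximize the linear approximation c @ x over the polytope
        # by checking its vertices (stand-in for the simplex method).
        x_lp = max(VERTICES, key=lambda v: c @ v)
        d = x_lp - x
        # Step 3: maximize the quadratic h(t) = f(x + t*d) over 0 <= t <= 1.
        curv = d @ H @ d               # h''(t), a constant since f is quadratic
        if curv >= 0 or np.allclose(d, 0):
            t = 1.0 if c @ d > 0 else 0.0
        else:
            t = np.clip((c @ d) / -curv, 0.0, 1.0)
        x = x + t * d
    return x

x_final = frank_wolfe(np.array([0.0, 0.0]))
print(x_final)   # slowly approaches the optimal solution (1, 3/2)
```

Running the first two passes of the loop by hand reproduces the trial solutions above: from (0, 0) the LP vertex is (0, 3) and t = 2/3 gives (0, 2); from (0, 2) the LP vertex is (2, 0) and t = 5/12 gives (5/6, 7/6).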

If you would like to see another example of the application of the Frank-Wolfe algorithm, one is included in the Solved Examples section of the book’s website. Your OR Tutor provides an additional example as well. IOR Tutorial also includes an interactive procedure for this algorithm.

Some Other Algorithms

We should emphasize that the Frank-Wolfe algorithm is just one example of sequential-approximation algorithms. Many of these algorithms use quadratic instead of linear approximations at each iteration because quadratic approximations provide a considerably closer fit to the original problem and thus enable the sequence of solutions to converge considerably more rapidly toward an optimal solution than was the case in Fig. 13.17b. For this reason, even though sequential linear approximation methods such as the Frank-Wolfe algorithm are relatively straightforward to use, sequential quadratic approximation methods now are generally preferred in actual applications. Popular among these are the quasi-Newton (or variable metric) methods. As already mentioned in Sec. 13.5, these methods use a fast approximation of Newton's method and then further adapt this method to take the constraints of the problem into account. To speed up the algorithm, quasi-Newton methods compute a quadratic approximation to the curvature of a nonlinear function without explicitly calculating second (partial) derivatives. (For linearly constrained optimization problems, this nonlinear function is just the objective function; whereas with nonlinear constraints, it is the Lagrangian function described in Appendix 3.) Some quasi-Newton algorithms do not even explicitly form and solve an approximating quadratic programming problem at each iteration, but instead incorporate some of the basic ingredients of gradient algorithms. (See Selected Reference 5 for further details about sequential-approximation algorithms.)
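The quasi-Newton idea of approximating curvature from gradient differences alone can be illustrated with the classic BFGS update. The following is a minimal unconstrained sketch (not the constrained quasi-Newton methods referred to above): it maximizes the quadratic objective f(x) = 5x1 − x1² + 8x2 − 2x2² with the constraints removed, whose unconstrained maximum is (2.5, 2), by minimizing g(x) = −f(x).

```python
import numpy as np

def g(x):
    # Minimize the negative of f to perform maximization.
    return -(5*x[0] - x[0]**2 + 8*x[1] - 2*x[1]**2)

def grad_g(x):
    return np.array([2*x[0] - 5, 4*x[1] - 8])

def bfgs(x, iterations=50):
    H = np.eye(2)                      # approximation of the inverse Hessian
    for _ in range(iterations):
        grad = grad_g(x)
        if np.linalg.norm(grad) < 1e-10:
            break
        d = -H @ grad                  # quasi-Newton search direction
        # Backtracking (Armijo) line search along d.
        t = 1.0
        while g(x + t*d) > g(x) + 1e-4 * t * (grad @ d):
            t *= 0.5
        x_new = x + t*d
        s = x_new - x                  # step taken
        y = grad_g(x_new) - grad       # change in gradient
        if y @ s > 1e-12:              # curvature condition for the update
            rho = 1.0 / (y @ s)
            I = np.eye(2)
            # BFGS update: curvature learned from (s, y) pairs only,
            # with no second derivatives computed.
            H = (I - rho*np.outer(s, y)) @ H @ (I - rho*np.outer(y, s)) \
                + rho*np.outer(s, s)
        x = x_new
    return x

x_max = bfgs(np.array([0.0, 0.0]))
print(x_max)   # converges to the unconstrained maximum (2.5, 2)
```

Note how H is built entirely from successive gradients; this is the "quadratic approximation to the curvature" described above, and it is the ingredient that constrained quasi-Newton methods then adapt to respect the constraints.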

We turn now from sequential-approximation algorithms to sequential unconstrained algorithms. As mentioned at the beginning of the section, algorithms of the latter type solve the original constrained optimization problem by instead solving a sequence of unconstrained optimization problems.

A particularly prominent sequential unconstrained algorithm that has been widely used since its development in the 1960s is the sequential unconstrained minimization technique (or SUMT for short).22 There actually are two main versions of SUMT, one of which is an exterior-point algorithm that deals with infeasible solutions while using a penalty function to force convergence to the feasible region. We shall describe the other version, which is an interior-point algorithm that deals directly with feasible solutions while using a barrier function to force staying inside the feasible region. Although SUMT was originally presented as a minimization technique, we shall convert it to a maximization technique in order to be consistent with the rest of the chapter. Therefore, we continue to assume that the problem is in the form given at the beginning of the chapter and that all the functions are differentiable.

Sequential Unconstrained Minimization Technique (SUMT)

As the name implies, SUMT replaces the original problem by a sequence of unconstrained optimization problems whose solutions converge to a solution (local maximum) of the original problem. This approach is very attractive because unconstrained optimization problems are much easier to solve (see Sec. 13.5) than those with constraints. Each of the unconstrained problems in this sequence involves choosing a (successively smaller) strictly positive value of a scalar r and then solving for x so as to

Maximize P(x; r) = f(x) − rB(x).

Here B(x) is a barrier function that has the following properties (for x that are feasible for the original problem):

1. B(x) is small when x is far from the boundary of the feasible region.

2. B(x) is large when x is close to the boundary of the feasible region.

3. B(x) → ∞ as the distance from the (nearest) boundary of the feasible region → 0.

Thus, by starting the search procedure with a feasible initial trial solution and then attempting to increase P(x; r), B(x) provides a barrier that prevents the search from ever crossing (or even reaching) the boundary of the feasible region for the original problem.

The most common choice of B(x) is

B(x) = Σ 1/[bi − gi(x)] + Σ 1/xj,

where the first summation runs over the functional constraints gi(x) ≤ bi and the second over the nonnegativity constraints xj ≥ 0. For feasible values of x, note that the denominator of each term is proportional to the distance of x from the constraint boundary for the corresponding functional or nonnegativity constraint. Consequently, each term is a boundary repulsion term that has all the preceding three properties with respect to this particular constraint boundary. Another attractive feature of this B(x) is that when all the assumptions of convex programming are satisfied, P(x; r) is a concave function.

Because B(x) keeps the search away from the boundary of the feasible region, you probably are asking the very legitimate question: What happens if the desired solution lies there? This concern is the reason that SUMT involves solving a sequence of these unconstrained optimization problems for successively smaller values of r approaching zero (where the final trial solution from each one becomes the initial trial solution for the next). For example, each new r might be obtained from the preceding one by multiplying by a constant θ (0 < θ < 1), where a typical value is θ = 0.01. As r approaches 0, P(x; r) approaches f(x), so the corresponding local maximum of P(x; r) converges to a local maximum of the original problem. Therefore, it is necessary to solve only enough unconstrained optimization problems to permit extrapolating their solutions to this limiting solution.

22See Selected Reference 4.

How many are enough to permit this extrapolation? When the original problem satisfies the assumptions of convex programming, useful information is available to guide us in this decision. In particular, if x̄ is a global maximizer of P(x; r), then

f(x̄) ≤ f(x*) ≤ f(x̄) + rB(x̄),

where x* is the (unknown) optimal solution for the original problem. Thus, rB(x̄) is the maximum error (in the value of the objective function) that can result by using x̄ to approximate x*, and extrapolating beyond x̄ to increase f(x) further decreases this error. If an error tolerance is established in advance, then you can stop as soon as rB(x̄) is less than this quantity.

Summary of SUMT

Initialization: Identify a feasible initial trial solution x(0) that is not on the boundary of the feasible region. Set k = 1 and choose appropriate strictly positive values for the initial r and for θ < 1 (say, r = 1 and θ = 0.01).23

Iteration k: Starting from x(k-1), apply a multivariable unconstrained optimization procedure (e.g., the gradient search procedure) such as described in Sec. 13.5 to find a local maximum x(k) of

Maximize P(x; r) = f(x) − r{Σ 1/[bi − gi(x)] + Σ 1/xj}.

Stopping rule: If the change from x(k−1) to x(k) is negligible, stop and use x(k) (or an extrapolation of x(0), x(1), . . . , x(k)) as the estimate of a local maximum of the original problem. Otherwise, reset r = θr and k = k + 1 and perform another iteration.

When the original problem also has equality constraints gi(x) = bi, each such constraint can be handled by including the term −[bi − gi(x)]²/√r

in the expression for P(x; r) given under “Summary of SUMT,” and then the same procedure is used. The numerator −[bi − gi(x)]² imposes a large penalty for deviating substantially from satisfying the equality constraint, and then the denominator tremendously increases this penalty as r is decreased to a tiny amount, thereby forcing the sequence of trial solutions to converge toward a point that satisfies the constraint.

SUMT has been widely used because of its simplicity and versatility. However, numerical analysts have found that it is relatively prone to numerical instability, so considerable caution is advised. For further information on this issue as well as similar analyses for alternative algorithms, see Selected Reference 6.
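To make the procedure concrete, here is a minimal Python sketch of the interior-point version of SUMT under stated assumptions: the problem is taken to be maximize f(x) = 5x1 − x1² + 8x2 − 2x2² subject to 3x1 + 2x2 ≤ 6 and x1, x2 ≥ 0, B(x) has one reciprocal boundary repulsion term per constraint, and each unconstrained subproblem is solved by plain gradient ascent with backtracking (a simple stand-in for the procedures of Sec. 13.5).

```python
import numpy as np

def f(x):
    return 5*x[0] - x[0]**2 + 8*x[1] - 2*x[1]**2

def grad_f(x):
    return np.array([5 - 2*x[0], 8 - 4*x[1]])

def slack(x):
    return 6 - 3*x[0] - 2*x[1]          # slack of 3*x1 + 2*x2 <= 6

def B(x):
    # One reciprocal term per constraint boundary.
    return 1/slack(x) + 1/x[0] + 1/x[1]

def grad_B(x):
    s = slack(x)
    return np.array([3/s**2 - 1/x[0]**2, 2/s**2 - 1/x[1]**2])

def maximize_P(x, r, steps=5000):
    """Gradient ascent on P(x; r) = f(x) - r*B(x) from a strictly feasible x."""
    for _ in range(steps):
        g = grad_f(x) - r*grad_B(x)
        step = 1.0
        while True:
            y = x + step*g
            # Backtrack until y is strictly feasible and P actually improves;
            # the barrier itself keeps the search inside the feasible region.
            if min(y[0], y[1], slack(y)) > 0 and \
               f(y) - r*B(y) > f(x) - r*B(x):
                break
            step *= 0.5
            if step < 1e-14:
                return x                 # no improving step: numerically optimal
        x = y
    return x

def sumt(x, r=1.0, theta=0.01, outer=4):
    for _ in range(outer):
        x = maximize_P(x, r)             # final solution seeds the next problem
        r *= theta                       # shrink r; r*B(x) bounds the error
    return x

x_final = sumt(np.array([0.5, 0.5]))
print(x_final)   # approaches the optimal solution (1, 3/2) as r -> 0
```

Every trial solution stays strictly inside the feasible region, yet as r shrinks the iterates are allowed ever closer to the binding constraint boundary 3x1 + 2x2 = 6, where the optimum lies.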


23A reasonable criterion for choosing the initial r is one that makes rB(x) about the same order of magnitude as f (x) for feasible solutions x that are not particularly close to the boundary.


The Solved Examples section of the book’s website provides another example that illustrates the application of SUMT to a convex programming problem in minimization form. You also can go to your OR Tutor to see an additional example. An automatic procedure for executing SUMT is included in IOR Tutorial.

Some Software Options for Convex Programming

As mentioned in Sec. 13.7, the standard Excel Solver includes a solving method called GRG Nonlinear for solving convex programming problems. The ASPE Solver also includes this solving method. The Excel file for this chapter shows the application of this solving method to the first example in this section.

LINGO can solve convex programming problems, but the student version of LINDO cannot except for the special case of quadratic programming (which includes the first example in this section). Details for this example are given in the LINGO/LINDO file for this chapter in your OR Courseware.

The professional version of MPL supports a large number of solvers, including some that can handle convex programming. One of these, called CONOPT, is included with the student version of MPL that is on the book’s website. CONOPT (a product of ARKI Consulting) is designed specifically to solve convex programming problems very efficiently. It can be used by adding the following statement at the beginning of the MPL model file.

OPTIONS

ModelType = Nonlinear

The convex programming examples that are formulated in this chapter’s MPL/Solvers file have been solved with this solver.
