Nonlinear Programming: Multivariable Unconstrained Optimization

Multivariable Unconstrained Optimization

Now consider the problem of maximizing a concave function f(x) of multiple variables x = (x1, x2, . . . , xn) when there are no constraints on the feasible values. Suppose again that the necessary and sufficient condition for optimality, given by the system of equations obtained by setting the respective partial derivatives equal to zero (see Sec. 13.3), cannot be solved analytically, so that a numerical search procedure must be used.

As for the one-variable case, a number of search procedures are available for solving such a problem numerically. One of these (the gradient search procedure) is especially important because it identifies and uses the direction of movement from the current trial solution that maximizes the rate at which f(x) is increased. This is one of the key ideas of nonlinear programming. Adaptations of this same idea to take constraints into account are a central feature of many algorithms for constrained optimization as well.

After discussing this procedure in some detail, we will briefly describe how Newton’s method is extended to the multivariable case.

The Gradient Search Procedure

In Sec. 13.4, the value of the ordinary derivative was used by the bisection method to select one of just two possible directions (increase x or decrease x) in which to move from the current trial solution to the next one. The goal was to reach a point eventually where this derivative is (essentially) 0. Now, there are innumerable possible directions in which to move; they correspond to the possible proportional rates at which the respective variables can be changed. The goal is to reach a point eventually where all the partial derivatives are (essentially) 0. Therefore, a natural approach is to use the values of the partial derivatives to select the specific direction in which to move. This selection involves using the gradient of the objective function, as described next.

Because the objective function f(x) is assumed to be differentiable, it possesses a gradient, denoted by ∇f(x), at each point x. In particular, the gradient at a specific point x = x' is the vector whose elements are the respective partial derivatives evaluated at x = x', so that

∇f(x') = (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn)   evaluated at x = x'.

The significance of the gradient is that the (infinitesimal) change in x that maximizes the rate at which f(x) increases is the change that is proportional to ∇f(x). To express this idea geometrically, the “direction” of the gradient ∇f(x') is interpreted as the direction of the directed line segment (arrow) from the origin (0, 0, . . . , 0) to the point (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn), where ∂f/∂xj is evaluated at xj = xj'. Therefore, it may be said that the rate at which f(x) increases is maximized if (infinitesimal) changes in x are in the direction of the gradient ∇f(x). Because the objective is to find the feasible solution maximizing f(x), it would seem expedient to attempt to move in the direction of the gradient as much as possible.
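This property is easy to check numerically. The short sketch below uses a hypothetical concave function f(x1, x2) = -(x1 - 1)^2 - (x2 - 2)^2, chosen only for illustration (it is not from the text): it samples many unit directions u and confirms that the rate of increase of f, the directional derivative ∇f·u, is largest when u points along the gradient.

```python
import math

def grad_f(x1, x2):
    """Gradient of the hypothetical f(x1, x2) = -(x1 - 1)**2 - (x2 - 2)**2."""
    return (-2.0 * (x1 - 1.0), -2.0 * (x2 - 2.0))

gx, gy = grad_f(0.0, 0.0)          # gradient at the point (0, 0): (2, 4)
norm = math.hypot(gx, gy)

# Sample 360 unit directions u and compute the rate of increase of f
# in each direction, i.e., the directional derivative grad . u.
best_rate, best_u = max(
    (gx * math.cos(a) + gy * math.sin(a), (math.cos(a), math.sin(a)))
    for a in (2.0 * math.pi * k / 360 for k in range(360))
)

print("normalized gradient:   ", (gx / norm, gy / norm))
print("best sampled direction:", best_u)   # essentially the same direction
print("best rate vs ||grad||: ", best_rate, norm)
```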

Because the current problem has no constraints, this interpretation of the gradient suggests that an efficient search procedure should keep moving in the direction of the gradient until it (essentially) reaches an optimal solution x*, where ∇f(x*) = 0. However, normally it would not be practical to change x continuously in the direction of ∇f(x), because this series of changes would require continuously reevaluating the ∂f/∂xj and changing the direction of the path. Therefore, a better approach is to keep moving in a fixed direction from the current trial solution, not stopping until f(x) stops increasing. This stopping point would be the next trial solution, so the gradient then would be recalculated to determine the new direction in which to move. With this approach, each iteration involves changing the current trial solution x' as follows:

Reset x' = x' + t*∇f(x'),

where t* is the positive value of t that maximizes f(x' + t∇f(x')) along this ray; that is,

f(x' + t*∇f(x')) = max f(x' + t∇f(x')) over t ≥ 0.

The iterations of this gradient search procedure continue until ∇f(x) = 0 within a small tolerance ε, that is, until

|∂f/∂xj| ≤ ε for all j = 1, 2, . . . , n.10

10This stopping rule generally will provide a solution x that is close to an optimal solution x*, with a value of f(x) that is very close to f(x*). However, this cannot be guaranteed, since it is possible that the function maintains a very small positive slope (less than ε) over a great distance from x to x*.

An analogy may help to clarify this procedure. Suppose that you need to climb to the top of a hill. You are nearsighted, so you cannot see the top of the hill in order to walk directly in that direction. However, when you stand still, you can see the ground around your feet well enough to determine the direction in which the hill is sloping upward most sharply. You are able to walk in a straight line. While walking, you also are able to tell when you stop climbing (zero slope in your direction). Assuming that the hill is concave, you now can use the gradient search procedure for climbing to the top efficiently. This problem is a two-variable problem, where (x1, x2) represents the coordinates (ignoring height) of your current location. The function f(x1, x2) gives the height of the hill at (x1, x2). You start each iteration at your current location (current trial solution) by determining the direction [in the (x1, x2) coordinate system] in which the hill is sloping upward most sharply (the direction of the gradient) at this point. You then begin walking in this fixed direction and continue as long as you still are climbing. You eventually stop at a new trial location (solution) when the hill becomes level in your direction, at which point you prepare to do another iteration in another direction. You continue these iterations, following a zigzag path up the hill, until you reach a trial location where the slope is essentially zero in all directions. Under the assumption that the hill [f(x1, x2)] is concave, you must then be essentially at the top of the hill.

The most difficult part of the gradient search procedure usually is to find t*, the value of t that maximizes f in the direction of the gradient, at each iteration. Because x and ∇f(x) have fixed values for the maximization, and because f(x) is concave, this problem should be viewed as maximizing a concave function of a single variable t. Therefore, it can be solved by the kind of search procedures for one-variable unconstrained optimization that are described in Sec. 13.4 (while considering only nonnegative values of t because of the t ≥ 0 constraint). Alternatively, if f is a simple function, it may be possible to obtain an analytical solution by setting the derivative with respect to t equal to zero and solving.
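As a minimal sketch of the analytic approach, the snippet below expresses f(x' + t∇f(x')) as a function of t alone, differentiates, and solves for t*. The concave quadratic f and the trial point are hypothetical, chosen only to make the computation concrete.

```python
import sympy as sp

x1, x2, t = sp.symbols('x1 x2 t')
f = 2*x1*x2 + 2*x2 - x1**2 - 2*x2**2       # hypothetical concave quadratic

grad = [sp.diff(f, v) for v in (x1, x2)]   # symbolic partial derivatives

point = {x1: 0, x2: 0}                     # current trial solution x' = (0, 0)
g = [expr.subs(point) for expr in grad]    # gradient there: (0, 2)

# f along the ray x' + t * grad f(x'), now a one-variable function of t
f_of_t = f.subs({x1: point[x1] + t*g[0], x2: point[x2] + t*g[1]})

t_star = sp.solve(sp.diff(f_of_t, t), t)[0]
print(t_star)                              # 1/4 for this f and x'
```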

Summary of the Gradient Search Procedure

Initialization: Select ε and any initial trial solution x'. Go first to the stopping rule.

Iteration:

1. Express f(x' + t∇f(x')) as a function of t by setting

xj = xj' + t(∂f/∂xj evaluated at x = x'), for j = 1, 2, . . . , n,

and then substituting these expressions into f(x).

2. Use a search procedure for one-variable unconstrained optimization (or calculus) to find t = t* that maximizes f(x' + t∇f(x')) over t ≥ 0.

3. Reset x' = x' + t*∇f(x'). Then go to the stopping rule.

Stopping rule: Evaluate ∇f(x') at x = x'. Check if |∂f/∂xj| ≤ ε for all j = 1, 2, . . . , n. If so, stop with the current x' as the desired approximation of an optimal solution x*. Otherwise, perform an iteration.
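The following is a straightforward sketch of this summary in code, under stated assumptions: the caller supplies f and its gradient grad_f (both names are ours), and t* is found by ternary search on an assumed bracket [0, t_max], which is valid because f(x' + t∇f(x')) is a concave function of t whenever f is concave. It is an illustration of the procedure, not a definitive implementation.

```python
def gradient_search(f, grad_f, x, eps=1e-4, t_max=8.0):
    """Maximize a concave differentiable f(x) by the gradient search
    procedure. f takes a list of floats; grad_f returns the gradient
    as a list of floats. t_max is an assumed upper bound on t*."""
    while True:
        g = grad_f(x)
        # Stopping rule: every partial derivative within eps of zero.
        if all(abs(gj) <= eps for gj in g):
            return x
        # Steps 1-2: maximize f(x + t g) over 0 <= t <= t_max by ternary
        # search, which is valid because this function of t is concave.
        lo, hi = 0.0, t_max
        for _ in range(100):
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            f1 = f([xj + m1 * gj for xj, gj in zip(x, g)])
            f2 = f([xj + m2 * gj for xj, gj in zip(x, g)])
            if f1 < f2:
                lo = m1
            else:
                hi = m2
        t_star = (lo + hi) / 2
        # Step 3: reset x' = x' + t* grad f(x') and repeat.
        x = [xj + t_star * gj for xj, gj in zip(x, g)]
```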

To illustrate the gradient search procedure, consider applying it to the example of maximizing

f(x) = 2x1x2 + 2x2 − x1² − 2x2²,

starting from the initial trial solution x' = (0, 0). The partial derivatives are

∂f/∂x1 = 2x2 − 2x1,   ∂f/∂x2 = 2x1 + 2 − 4x2.

Because ∇f(0, 0) = (0, 2), the first iteration moves along the ray x = (0, 2t). Substituting these expressions into f(x) gives

f(0, 2t) = 4t − 8t²,

which is maximized at t* = 1/4 (where its derivative 4 − 16t equals zero), so the new trial solution is x' = (0, 1/2). For the second iteration, ∇f(0, 1/2) = (1, 0), so the procedure moves along x = (t, 1/2), where

f(t, 1/2) = t − t² + 1/2

is maximized at t* = 1/2, yielding the new trial solution x' = (1/2, 1/2).

This completes the second iteration. With a typically small value of ε, the procedure now would continue on to several more iterations in a similar fashion. (We will forgo the details.)

A nice way of organizing this work is to write out a table such as Table 13.3, which summarizes the preceding two iterations. At each iteration, the second column shows the current trial solution, and the rightmost column shows the eventual new trial solution, which then is carried down into the second column for the next iteration. The fourth column gives the expressions for the xj in terms of t that need to be substituted into f(x) to give the fifth column.

TABLE 13.3 Application of the gradient search procedure to the example

Iteration | x'       | ∇f(x') | x' + t∇f(x') | f(x' + t∇f(x')) | t*  | x' + t*∇f(x')
1         | (0, 0)   | (0, 2) | (0, 2t)      | 4t − 8t²        | 1/4 | (0, 1/2)
2         | (0, 1/2) | (1, 0) | (t, 1/2)     | t − t² + 1/2    | 1/2 | (1/2, 1/2)

Continuing in this fashion, subsequent iterations generate the trial solutions (1/2, 3/4), (3/4, 3/4), (3/4, 7/8), (7/8, 7/8), . . . , and this sequence converges to x* = (1, 1), which is indeed the optimal solution.

However, because this converging sequence of trial solutions never reaches its limit, the procedure actually will stop somewhere (depending on ε) slightly below (1, 1) as its final approximation of x*.
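For instance, running the gradient_search sketch from earlier in this section on this example reproduces this behavior:

```python
# The example function and its gradient.
f = lambda x: 2*x[0]*x[1] + 2*x[1] - x[0]**2 - 2*x[1]**2
grad_f = lambda x: [2*x[1] - 2*x[0], 2*x[0] + 2 - 4*x[1]]

# Starting from (0, 0), the procedure stops close to, but slightly
# below, the optimal solution x* = (1, 1).
print(gradient_search(f, grad_f, [0.0, 0.0], eps=1e-4))
```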

As Fig. 13.14 suggests, the gradient search procedure zigzags to the optimal solution rather than moving in a straight line. Some modifications of the procedure have been developed that accelerate movement toward the optimal solution by accounting for this zigzag behavior.

[Figure 13.14: the zigzag path of trial solutions for the example, starting from (0, 0) and converging to x* = (1, 1).]

Additional examples of the application of the gradient search procedure are included in both the Solved Examples section of the book’s website and your OR Tutor. The IOR Tutorial includes both an interactive procedure and an automatic procedure for applying this algorithm.

Newton’s Method

Section 13.4 describes how Newton’s method would be used to solve one-variable unconstrained optimization problems. The general version of Newton’s method actually is designed to solve multivariable unconstrained optimization problems. The basic idea is the same as described in Sec. 13.4, namely, work with a quadratic approximation of the objective function f(x) being maximized, where x = (x1, x2, . . . , xn) in this case. This approximating quadratic function is obtained by truncating the Taylor series around the current trial solution after the second derivative term. This approximate function then is maximized exactly to obtain the new trial solution to start the next iteration.
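Concretely, writing the current trial solution as the column vector x, the truncated Taylor series is

f(y) ≈ f(x) + ∇f(x)ᵀ(y − x) + (1/2)(y − x)ᵀ∇²f(x)(y − x),

and setting the gradient of this quadratic function of y equal to zero gives ∇f(x) + ∇²f(x)(y − x) = 0. Solving for y yields the formula below.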

When the objective function is concave and both the current trial solution x and its gradient ∇f(x) are written as column vectors, the solution x' that maximizes the approximating quadratic function has the form

x' = x − [∇²f(x)]⁻¹∇f(x),

where ∇²f(x) is the n × n matrix (called the Hessian matrix) of the second partial derivatives of f(x) evaluated at the current trial solution x, and [∇²f(x)]⁻¹ is the inverse of this Hessian matrix.
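As a minimal sketch (the names are ours, and the caller is assumed to supply the gradient and Hessian), one Newton step can be computed by solving the linear system ∇²f(x) d = −∇f(x) rather than inverting the Hessian explicitly, which is the numerically preferable formulation of the same formula:

```python
import numpy as np

def newton_step(grad_f, hess_f, x):
    """One step of Newton's method: x - [Hessian]^-1 gradient."""
    g = np.asarray(grad_f(x), dtype=float)
    H = np.asarray(hess_f(x), dtype=float)
    return x + np.linalg.solve(H, -g)      # solves H d = -g, returns x + d

# Applied to the example above, Newton's method reaches x* = (1, 1) in a
# single step, because the quadratic approximation of a quadratic
# function is exact.
grad_f = lambda x: [2*x[1] - 2*x[0], 2*x[0] + 2 - 4*x[1]]
hess_f = lambda x: [[-2.0, 2.0], [2.0, -4.0]]
print(newton_step(grad_f, hess_f, np.array([0.0, 0.0])))   # [1. 1.]
```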

Nonlinear programming algorithms that employ Newton’s method (including those that adapt it to help deal with constrained optimization problems) commonly approximate the inverse of the Hessian matrix in various ways. These approximations of Newton’s method are referred to as quasi-Newton methods (or variable metric methods). We will comment further on the important role of these methods in nonlinear programming in Sec. 13.9.
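For illustration, a quasi-Newton method is readily available off the shelf in SciPy's BFGS implementation, applied here to the same example; since scipy.optimize.minimize minimizes, f and its gradient are negated:

```python
from scipy.optimize import minimize

neg_f = lambda x: -(2*x[0]*x[1] + 2*x[1] - x[0]**2 - 2*x[1]**2)
neg_grad = lambda x: [-(2*x[1] - 2*x[0]), -(2*x[0] + 2 - 4*x[1])]

result = minimize(neg_f, x0=[0.0, 0.0], jac=neg_grad, method='BFGS')
print(result.x)     # approximately [1. 1.]
```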

Further description of these methods is beyond the scope of this book, but more details can be found in books devoted to nonlinear programming.
