DYNAMIC PROGRAMMING:CHARACTERISTICS OF DYNAMIC PROGRAMMING PROBLEMS

CHARACTERISTICS OF DYNAMIC PROGRAMMING PROBLEMS

The stagecoach problem is a literal prototype of dynamic programming problems. In fact, this example was purposely designed to provide a literal physical interpretation of the rather abstract structure of such problems. Therefore, one way to recognize a situation that can be formulated as a dynamic programming problem is to notice that its basic struc- ture is analogous to the stagecoach problem.

These basic features that characterize dynamic programming problems are presented and discussed here.

1. The problem can be divided into stages, with a policy decision required at each stage.

The stagecoach problem was literally divided into its four stages (stagecoaches) that correspond to the four legs of the journey. The policy decision at each stage was which life insurance policy to choose (i.e., which destination to select for the next stage- coach ride). Similarly, other dynamic programming problems require making a sequence of interrelated decisions, where each decision corresponds to one stage of the problem.

2. Each stage has a number of states associated with the beginning of that stage.

The states associated with each stage in the stagecoach problem were the states (or territories) in which the fortune seeker could be located when embarking on that particular leg of the journey. In general, the states are the various possible conditions in which the system might be at that stage of the problem. The number of states may be either finite (as in the stagecoach problem) or infinite (as in some subsequent examples).

3. The effect of the policy decision at each stage is to transform the current state to a state associated with the beginning of the next stage (possibly according to a probability distribution).

The fortune seeker’s decision as to his next destination led him from his current state to the next state on his journey. This procedure suggests that dynamic programming

problems can be interpreted in terms of the networks described in Chap. 10. Each node would correspond to a state. The network would consist of columns of nodes, with each column corresponding to a stage, so that the flow from a node can go only to a node in the next column to the right. The links from a node to nodes in the next col- umn correspond to the possible policy decisions on which state to go to next. The value assigned to each link usually can be interpreted as the immediate contribution to the objective function from making that policy decision. In most cases, the objective cor- responds to finding either the shortest or the longest path through the network.

4. The solution procedure is designed to find an optimal policy for the overall problem, i.e., a prescription of the optimal policy decision at each stage for each of the possible states.

For the stagecoach problem, the solution procedure constructed a table for each stage (n) that prescribed the optimal decision (xn*) for each possible state (s). Thus, in addition to identifying three optimal solutions (optimal routes) for the overall problem, the results show the fortune seeker how he should proceed if he gets detoured to a state that is not on an optimal route. For any problem, dynamic programming provides this kind of policy prescription of what to do under every possible circumstance (which is why the actual decision made upon reaching a particular state at a given stage is referred to as a policy decision). Providing this additional information beyond simply specifying an optimal solution (optimal sequence of decisions) can be helpful in a variety of ways, including sensitivity analysis.

5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages. Therefore, the optimal immediate decision depends on only the current state and not on how you got there. This is the principle of optimality for dynamic programming.

Given the state in which the fortune seeker is currently located, the optimal life insurance policy (and its associated route) from this point onward is independent of how he got there. For dynamic programming problems in general, knowledge of the current state of the system conveys all the information about its previous behavior nec- essary for determining the optimal policy henceforth. (This property is the Markovian property, discussed in Sec. 29.2.) Any problem lacking this property cannot be for- mulated as a dynamic programming problem.

6. The solution procedure begins by finding the optimal policy for the last stage.

The optimal policy for the last stage prescribes the optimal policy decision for each of the possible states at that stage. The solution of this one-stage problem is usu- ally trivial, as it was for the stagecoach problem.

7. A recursive relationship that identifies the optimal policy for stage n, given the opti- mal policy for stage n + 1, is available.

For the stagecoach problem, this recursive relationship was

INTRODUCTION TO OPERATIONS RESEARCH-0061

INTRODUCTION TO OPERATIONS RESEARCH-0062

where fn(sn, xn) would be written in terms of sn, xn, f *n+1(sn+1), and probably some measure of the immediate contribution of xn to the objective function. It is the inclu- sion of f *n+1(sn+1) on the right-hand side, so that f *n (sn) is defined in terms of f *n+1(sn+1), that makes the expression for f *n (sn) a recursive relationship.

The recursive relationship keeps recurring as we move backward stage by stage. When the current stage number n is decreased by 1, the new fn*(sn) function is derived by using the f *n+1(sn+1) function that was just derived during the preceding iteration, and then this process keeps repeating. This property is emphasized in the next (and fi- nal) characteristic of dynamic programming.

8. When we use this recursive relationship, the solution procedure starts at the end and moves backward stage by stage—each time finding the optimal policy for that stage— until it finds the optimal policy starting at the initial stage. This optimal policy immedi- ately yields an optimal solution for the entire problem, namely, x1* for the initial state s1, then x2* for the resulting state s2, then x3* for the resulting state s3, and so forth to x*N for the resulting stage sN.

This backward movement was demonstrated by the stagecoach problem, where the optimal policy was found successively beginning in each state at stages 4, 3, 2, and 1, respectively.4 For all dynamic programming problems, a table such as the following would be obtained for each stage (n = N, N - 1, . . . , 1).

When this table is finally obtained for the initial stage (n = 1), the problem of interest is solved. Because the initial state is known, the initial decision is specified by x1* in this table. The optimal value of the other decision variables is then specified by the other tables in turn according to the state of the system that results from the preceding decisions.

Comments

Popular posts from this blog

DUALITY THEORY:THE ESSENCE OF DUALITY THEORY

NETWORK OPTIMIZATION MODELS:THE MINIMUM SPANNING TREE PROBLEM

NETWORK OPTIMIZATION MODELS:THE SHORTEST-PATH PROBLEM