DETERMINISTIC DYNAMIC PROGRAMMING
This section further elaborates upon the dynamic programming approach to deterministic problems, where the state at the next stage is completely determined by the state and policy decision at the current stage. The probabilistic case, where there is a probability distribution for what the next state will be, is discussed in the next section.
4. Actually, for this problem the solution procedure can move either backward or forward. However, for many problems (especially when the stages correspond to time periods), the solution procedure must move backward.
Deterministic dynamic programming can be described diagrammatically as shown in Fig. 11.3. Thus, at stage n the process will be in some state sn. Making policy decision xn then moves the process to some state sn+1 at stage n + 1. The contribution thereafter to the objective function under an optimal policy has been previously calculated to be f*n+1(sn+1). The policy decision xn also makes some contribution to the objective function. Combining these two quantities in an appropriate way provides fn(sn, xn), the contribution of stages n onward to the objective function. Optimizing with respect to xn then gives fn*(sn) = fn(sn, xn*). After xn* and fn*(sn) are found for each possible value of sn, the solution procedure is ready to move back one stage.
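The backward pass just described can be sketched in code. The sketch below is only a generic skeleton, not anything from the text: the `states`, `decisions`, `contribution`, and `transition` callables are assumed placeholders that each application must supply, and a sum-maximization objective is assumed.

```python
def solve_backward(states, decisions, contribution, transition, num_stages):
    """Generic backward pass for a deterministic DP (sum maximization).

    states(n)             -> iterable of possible states s_n at stage n
    decisions(n, s)       -> iterable of feasible decisions x_n in state s
    contribution(n, s, x) -> immediate contribution of decision x
    transition(n, s, x)   -> resulting state s_{n+1}
    Returns (f_star, x_star): f_star[n][s] is the optimal contribution of
    stages n onward, and x_star[n][s] is the decision achieving it.
    """
    f_star, x_star = {}, {}
    for n in range(num_stages, 0, -1):      # last stage first
        f_star[n], x_star[n] = {}, {}
        for s in states(n):
            best_val, best_x = None, None
            for x in decisions(n, s):
                # Contribution now plus optimal contribution thereafter.
                future = 0 if n == num_stages else f_star[n + 1][transition(n, s, x)]
                val = contribution(n, s, x) + future
                if best_val is None or val > best_val:
                    best_val, best_x = val, x
            f_star[n][s], x_star[n][s] = best_val, best_x
    return f_star, x_star
```

Replacing the addition with multiplication (and max with min) adapts the same skeleton to the other objective-function forms mentioned below.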
One way of categorizing deterministic dynamic programming problems is by the form of the objective function. For example, the objective might be to minimize the sum of the contributions from the individual stages (as for the stagecoach problem), or to maximize such a sum, or to minimize a product of such terms, and so on. Another categorization is in terms of the nature of the set of states for the respective stages. In particular, states sn might be representable by a discrete state variable (as for the stagecoach problem) or by a continuous state variable, or perhaps a state vector (more than one variable) is required. Similarly, the decision variables (x1, x2, . . . , xN) also can be either discrete or continuous.
Several examples are presented to illustrate some of these possibilities. More importantly, they illustrate that these apparently major differences are actually quite inconsequential (except in terms of computational difficulty) because the underlying basic structure shown in Fig. 11.3 always remains the same.
The first new example arises in a much different context from the stagecoach problem, but it has the same mathematical formulation except that the objective is to maximize rather than minimize a sum.
EXAMPLE 2 Distributing Medical Teams to Countries
The WORLD HEALTH COUNCIL is devoted to improving health care in the underdeveloped countries of the world. It now has five medical teams available to allocate among three such countries to improve their medical care, health education, and training programs. Therefore, the council needs to determine how many teams (if any) to allocate to each of these countries to maximize the total effectiveness of the five teams. The teams must be kept intact, so the number allocated to each country must be an integer.
The measure of performance being used is additional person-years of life. (For a particular country, this measure equals the increased life expectancy in years times the country’s population.) Table 11.1 gives the estimated additional person-years of life (in multiples of 1,000) for each country for each possible allocation of medical teams.
Which allocation maximizes the measure of performance?
Formulation. This problem requires making three interrelated decisions, namely, how many medical teams to allocate to each of the three countries. Therefore, even though there is no fixed sequence, these three countries can be considered as the three stages in
Six days after Saddam Hussein ordered his Iraqi military forces to invade Kuwait on August 2, 1990, the United States began the long process of deploying many of its own military units and cargo to the region. After developing a coalition force from 35 nations led by the United States, the military operation called Operation Desert Storm was launched on January 17, 1991, to expel the Iraqi troops from Kuwait. This led to a decisive victory for the coalition forces, which liberated Kuwait and penetrated Iraq.
The logistical challenge involved in quickly transporting the needed troops and cargo to the war zone was a daunting one. A typical airlift mission carrying troops and cargo from the United States to the Persian Gulf required a three-day round-trip, visited seven or more different airfields, burned almost one million pounds of fuel, and cost $280,000. During Operation Desert Storm, the Military Airlift Command (MAC) averaged more than 100 such missions daily as it managed the largest airlift in history.
To meet this challenge, operations research was applied to develop the decision support systems needed to schedule and route each airlift mission. The OR technique used to drive this process was dynamic programming. The stages in the dynamic programming formulation correspond to the airfields in the network of flight legs relevant to the mission. For a given airfield, the states are characterized by the departure time from the airfield and the remaining available duty for the current crew. The objective function to be minimized is a weighted sum of several measures of performance: the lateness of deliveries, the flying time of the mission, the ground time, and the number of crew changes. The constraints include a lower bound on the load carried by the mission and upper bounds on the availability of crew and ground-support resources at airfields.
This application of dynamic programming had a dramatic impact on the ability to deliver the necessary cargo and personnel to the Persian Gulf quickly to support Operation Desert Storm. For example, when speaking to the developers of this approach, MAC’s deputy chief of staff for operations and transportation is quoted as saying, “I guarantee you that we could not have done that (the deployment to the Persian Gulf) without your help and the contributions you made to (the decision support systems)—we absolutely could not have done that.”
Source: M. C. Hilliard, R. S. Solanki, C. Liu, I. K. Busch, G. Harrison, and R. D. Kraemer: “Scheduling the Operation Desert Storm Airlift: An Advanced Automated Scheduling Support System,” Interfaces, 22(1): 131–146, Jan.–Feb. 1992.
a dynamic programming formulation. The decision variables xn (n = 1, 2, 3) are the number of teams to allocate to stage (country) n.
The identification of the states may not be readily apparent. To determine the states, we ask questions such as the following. What is it that changes from one stage to the next? Given that the decisions have been made at the previous stages, how can the status of the situation at the current stage be described? What information about the current state of affairs is necessary to determine the optimal policy hereafter? On these bases, an appropriate choice for the “state of the system” is sn = number of medical teams still available for allocation to remaining countries (n, . . . , 3).
Thus, at stage 1 (country 1), where all three countries remain under consideration for allocations, s1 = 5. However, at stage 2 or 3 (country 2 or 3), sn is just 5 minus the number of teams allocated at preceding stages, so that the sequence of states is
s1 = 5, s2 = 5 - x1, s3 = s2 - x2.
With the dynamic programming procedure of solving backward stage by stage, when we are solving at stage 2 or 3, we shall not yet have solved for the allocations at the preceding stages. Therefore, we shall consider every possible state we could be in at stage 2 or 3, namely, sn = 0, 1, 2, 3, 4, or 5.
Figure 11.4 shows the states to be considered at each stage. The links (line segments) show the possible transitions in states from one stage to the next from making a feasible allocation of medical teams to the country involved. The numbers shown next to the links are the corresponding contributions to the measure of performance, where these numbers come from Table 11.1. From the perspective of this figure, the overall problem is to find the path from the initial state 5 (beginning stage 1) to the final state 0 (after stage 3) that maximizes the sum of the numbers along the path.
To state the overall problem mathematically, let pi(xi) be the measure of performance from allocating xi medical teams to country i, as given in Table 11.1. Thus, the objective is to choose x1, x2, x3 so as to

Maximize p1(x1) + p2(x2) + p3(x3),

subject to

x1 + x2 + x3 = 5,

and the xi are nonnegative integers.
Solution Procedure. Beginning with the last stage (n = 3), we note that the values of p3(x3) are given in the last column of Table 11.1 and these values keep increasing as we move down the column. Therefore, with s3 medical teams still available for allocation to country 3, the maximum of p3(x3) is automatically achieved by allocating all s3 teams; so x3* = s3 and f3*(s3) = p3(s3), as shown in the following table.
Since allocating x1 medical teams to country 1 leads to a state of 5 - x1 at stage 2, a choice of x1 = 0 leads to the bottom node on the right, x1 = 1 leads to the next node up, and so forth up to the top node with x1 = 5. The corresponding p1(x1) values from Table 11.1 are shown next to the links. The numbers next to the nodes are obtained from the f2*(s2) column of the n = 2 table. As with n = 2, the calculation needed for each alternative value of the decision variable involves adding the corresponding link value and node value, as summarized below:
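The entire three-stage calculation can also be carried out programmatically. In the sketch below, the payoff values are stand-ins for Table 11.1 (thousands of additional person-years of life) and should be checked against the table before trusting the numerical answer; the stage-by-stage logic is the backward procedure described above.

```python
# Backward DP for the World Health Council allocation (a distribution of
# effort problem).  Payoff values below are STAND-INS for Table 11.1;
# verify them against the table.
p = {
    1: [0, 45, 70, 90, 105, 120],   # country 1: payoff for 0..5 teams
    2: [0, 20, 45, 75, 110, 150],   # country 2
    3: [0, 50, 70, 80, 100, 130],   # country 3
}
TEAMS = 5

f_star = {4: {0: 0}}    # boundary condition: all 5 teams must be allocated
x_star = {}
for n in (3, 2, 1):     # stages (countries), solved backward
    f_star[n], x_star[n] = {}, {}
    states = [TEAMS] if n == 1 else range(TEAMS + 1)
    for s in states:    # s = teams still available for countries n, ..., 3
        best_val, best_x = None, None
        for x in range(s + 1):
            future = f_star[n + 1].get(s - x)
            if future is None:          # continuation cannot use up the teams
                continue
            val = p[n][x] + future
            if best_val is None or val > best_val:
                best_val, best_x = val, x
        f_star[n][s], x_star[n][s] = best_val, best_x

# Trace forward through the tables to recover the optimal allocation.
alloc, s = [], TEAMS
for n in (1, 2, 3):
    alloc.append(x_star[n][s])
    s -= alloc[-1]
print(alloc, f_star[1][TEAMS])   # [1, 3, 1] worth 170 under these stand-in numbers
```

Tracing forward after the backward pass mirrors the hand procedure: solve f3*, then f2*, then f1*, and read off x1*, x2*, x3* in turn.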
A Prevalent Problem Type—The Distribution of Effort Problem
The preceding example illustrates a particularly common type of dynamic programming problem called the distribution of effort problem. For this type of problem, there is just one kind of resource that is to be allocated to a number of activities. The objective is to determine how to distribute the effort (the resource) among the activities most effectively. For the World Health Council example, the resource involved is the medical teams, and the three activities are the health care work in the three countries.
Assumptions. This interpretation of allocating resources to activities should ring a bell for you, because it is the typical interpretation for linear programming problems given at the beginning of Chap. 3. However, there also are some key differences between the distribution of effort problem and linear programming that help illuminate the general distinctions between dynamic programming and other areas of mathematical programming.
One key difference is that the distribution of effort problem involves only one resource (one functional constraint), whereas linear programming can deal with thousands of resources. (In principle, dynamic programming can handle slightly more than one resource, but it quickly becomes very inefficient when the number of resources is increased because a separate state variable is required for each of the resources. This is referred to as the “curse of dimensionality.”)
On the other hand, the distribution of effort problem is far more general than linear programming in other ways. Consider the four assumptions of linear programming presented in Sec. 3.3: proportionality, additivity, divisibility, and certainty. Proportionality is routinely violated by nearly all dynamic programming problems, including distribution of effort problems (e.g., Table 11.1 violates proportionality). Divisibility also is often violated, as in Example 2, where the decision variables must be integers. In fact, dynamic programming calculations become more complex when divisibility does hold (as in Example 4). Although we shall consider the distribution of effort problem only under the assumption of certainty, this is not necessary, and many other dynamic programming problems violate this assumption as well (as described in Sec. 11.4).
Of the four assumptions of linear programming, the only one needed by the distribution of effort problem (or other dynamic programming problems) is additivity (or its analog for functions involving a product of terms). This assumption is needed to satisfy the principle of optimality for dynamic programming (characteristic 5 in Sec. 11.2).
Formulation. Because they always involve allocating one kind of resource to a number of activities, distribution of effort problems always have the following dynamic programming formulation (where the ordering of the activities is arbitrary): stage n = activity n (n = 1, 2, . . . , N); decision variable xn = amount of resource allocated to stage (activity) n; state sn = amount of resource still available for allocation to the remaining activities (n, . . . , N).
Note how the structure of this diagram corresponds to the one shown in Fig. 11.5 for the World Health Council example of a distribution of effort problem. What will differ from one such example to the next is the rest of what is shown in Fig. 11.5, namely, the relationship between fn(sn, xn) and f*n+1(sn - xn), and then the resulting recursive relationship between the fn* and f*n+1 functions. These relationships depend on the particular objective function for the overall problem.
5. This statement assumes that xn and sn are expressed in the same units. If it is more convenient to define xn as some other quantity such that the amount of the resource allocated to activity n is an xn, then sn+1 = sn - an xn.
The structure of the next example is similar to the one for the World Health Council because it, too, is a distribution of effort problem. However, its recursive relationship differs in that its objective is to minimize a product of terms for the respective stages.
At first glance, this example may appear not to be a deterministic dynamic programming problem because probabilities are involved. However, it does indeed fit our definition because the state at the next stage is completely determined by the state and policy decision at the current stage.
EXAMPLE 3 Distributing Scientists to Research Teams
A government space project is conducting research on a certain engineering problem that must be solved before people can fly safely to Mars. Three research teams are currently trying three different approaches for solving this problem. The estimate has been made that, under present circumstances, the probability that the respective teams—call them 1, 2, and 3—will not succeed is 0.40, 0.60, and 0.80, respectively. Thus, the current probability that all three teams will fail is (0.40)(0.60)(0.80) = 0.192. Because the objective is to minimize the probability of failure, two more top scientists have been assigned to the project.
Table 11.2 gives the estimated probability that the respective teams will fail when 0, 1, or 2 additional scientists are added to that team. Only integer numbers of scientists are considered because each new scientist will need to devote full attention to one team. The problem is to determine how to allocate the two additional scientists to minimize the probability that all three teams will fail.
Formulation. Because both Examples 2 and 3 are distribution of effort problems, their underlying structure is actually very similar. In this case, scientists replace medical teams as the kind of resource involved, and research teams replace countries as the activities. Therefore, instead of medical teams being allocated to countries, scientists are being allocated to research teams. The only basic difference between the two problems is in their objective functions.
With so few scientists and teams involved, this problem could be solved very easily by a process of exhaustive enumeration. However, the dynamic programming solution is presented for illustrative purposes.
In this case, stage n (n = 1, 2, 3) corresponds to research team n, and the state sn is the number of new scientists still available for allocation to the remaining teams. The decision
variables xn (n = 1, 2, 3) are the number of additional scientists allocated to team n.
Let pi(xi) denote the probability of failure for team i if it is assigned xi additional scientists, as given by Table 11.2. If we let Π denote multiplication, the government’s objective is to choose x1, x2, x3 so as to

Minimize Π pi(xi) = p1(x1)p2(x2)p3(x3),

subject to

x1 + x2 + x3 = 2,

and the xi are nonnegative integers.
Therefore, the optimal solution must have x1* = 1, which makes s2 = 2 - 1 = 1, so that x2* = 0, which makes s3 = 1 - 0 = 1, so that x3* = 1. Thus, teams 1 and 3 should each receive one additional scientist. The new probability that all three teams will fail would then be 0.060.
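The product-form recursion can be run the same way as the additive one; only the combination step changes (multiply instead of add, minimize instead of maximize). In the sketch below, the failure probabilities are stand-ins for Table 11.2 and should be verified against it; with these values they reproduce the 0.060 quoted above.

```python
# Backward DP minimizing a PRODUCT of stage terms: the probability that
# all three research teams fail.  Failure probabilities below are
# STAND-INS for Table 11.2; verify them against the table.
q = {
    1: [0.40, 0.20, 0.15],   # team 1: failure prob. with 0, 1, 2 new scientists
    2: [0.60, 0.40, 0.20],   # team 2
    3: [0.80, 0.50, 0.30],   # team 3
}
SCIENTISTS = 2

f_star = {4: {0: 1.0}}   # multiplicative identity: empty product after stage 3
x_star = {}
for n in (3, 2, 1):
    f_star[n], x_star[n] = {}, {}
    states = [SCIENTISTS] if n == 1 else range(SCIENTISTS + 1)
    for s in states:     # s = new scientists still available
        best_val, best_x = None, None
        for x in range(s + 1):
            future = f_star[n + 1].get(s - x)
            if future is None:
                continue
            val = q[n][x] * future                   # multiply, don't add
            if best_val is None or val < best_val:   # minimize, don't maximize
                best_val, best_x = val, x
        f_star[n][s], x_star[n][s] = best_val, best_x

alloc, s = [], SCIENTISTS
for n in (1, 2, 3):
    alloc.append(x_star[n][s])
    s -= alloc[-1]
print(alloc, round(f_star[1][SCIENTISTS], 3))   # [1, 0, 1] with failure probability 0.06
```

Note that the boundary condition changes with the objective: the empty product is 1.0 here, whereas the empty sum was 0 in the additive case.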
All the examples thus far have had a discrete state variable sn at each stage. Furthermore, they all have been reversible in the sense that the solution procedure actually could have moved either backward or forward stage by stage. (The latter alternative amounts to renumbering the stages in reverse order and then applying the procedure in the standard way.) This reversibility is a general characteristic of distribution of effort problems such as Examples 2 and 3, since the activities (stages) can be ordered in any desired manner.
The next example is different in both respects. Rather than being restricted to integer values, its state variable sn at stage n is a continuous variable that can take on any value over certain intervals. Since sn now has an infinite number of values, it is no longer possible to consider each of its feasible values individually. Rather, the solution for fn*(sn) and xn* must be expressed as functions of sn. Furthermore, this example is not reversible because its stages correspond to time periods, so the solution procedure must proceed backward.
Before proceeding directly to the rather involved example presented next, you might find it helpful at this point to look at the two additional examples of deterministic dynamic programming presented in the Solved Examples section of the book’s website. The first one involves production and inventory planning over a number of time periods. Like the examples thus far, both the state variable and the decision variable at each stage are discrete. However, this example is not reversible since the stages correspond to time periods. It also is not a distribution of effort problem. The second example is a nonlinear programming problem with two variables and a single constraint. Therefore, even though it is reversible, its state and decision variables are continuous. However, in contrast to the following example (which has four continuous variables and thus four stages), it has only two stages, so it can be solved relatively quickly with dynamic programming and a bit of calculus.
EXAMPLE 4 Scheduling Employment Levels
The workload for the LOCAL JOB SHOP is subject to considerable seasonal fluctuation. However, machine operators are difficult to hire and costly to train, so the manager is reluctant to lay off workers during the slack seasons. He is likewise reluctant to maintain his peak season payroll when it is not required. Furthermore, he is definitely opposed to overtime work on a regular basis. Since all work is done to custom orders, it is not possible to build up inventories during slack seasons. Therefore, the manager is in a dilemma as to what his policy should be regarding employment levels.
The following estimates are given for the minimum employment requirements during the four seasons of the year for the foreseeable future:
Employment will not be permitted to fall below these levels. Any employment above these levels is wasted at an approximate cost of $2,000 per person per season. It is estimated that the hiring and firing costs are such that the total cost of changing the level of employment from one season to the next is $200 times the square of the difference in employment levels. Fractional levels of employment are possible because of a few part-time employees, and the cost data also apply on a fractional basis.
Formulation. On the basis of the data available, it is not worthwhile to have the employment level go above the peak season requirements of 255. Therefore, spring employment should be at 255, and the problem is reduced to finding the employment level for the other three seasons.
For a dynamic programming formulation, the seasons should be the stages. There are actually an indefinite number of stages because the problem extends into the indefinite future. However, each year begins an identical cycle, and because spring employment is known, it is possible to consider only one cycle of four seasons ending with the spring season, as summarized below:
Note a key difference between the nature of this solution and those obtained for the preceding examples where there were only a few possible states to consider. We now have an infinite number of possible states (240 ≤ s3 ≤ 255), so it is no longer feasible to solve separately for x3* for each possible value of s3. Therefore, we instead have solved for x3* as a function of the unknown s3.
this value of x2 is the desired minimizing value if it is feasible (240 ≤ x2 ≤ 255). Over the possible s2 values (220 ≤ s2 ≤ 255), this solution actually is feasible only if 240 ≤ s2 ≤ 255.
Therefore, we still need to solve for the feasible value of x2 that minimizes f2(s2, x2) when 220 ≤ s2 < 240. The key to analyzing the behavior of f2(s2, x2) over the feasible region for x2 again is the partial derivative of f2(s2, x2). When s2 < 240,
Note that this region (240 ≤ x1 ≤ 255) includes x1 = 240, so that f1(s1, 240) ≥ f1(s1, 247.5). In the next-to-last paragraph, we found that x1 = 240 minimizes f1(s1, x1) over the region 220 ≤ x1 ≤ 240. Consequently, we now can conclude that x1 = 247.5 also minimizes f1(s1, x1) over the entire feasible region 220 ≤ x1 ≤ 255.
Therefore, by tracing back through the tables for n = 2, n = 3, and n = 4, respectively, and setting sn = xn-1* each time, the resulting optimal solution is x1* = 247.5, x2* = 245, x3* = 247.5, x4* = 255, with a total estimated cost per cycle of $185,000.
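Because the state and decision variables are continuous here, the text derives the minimizers in closed form with calculus. A brute-force numerical check is still possible: since fractional employment is allowed, a grid search in half-person steps over the feasible ranges can confirm the answer. The sketch below uses the requirements (summer 220, autumn 240, winter 200, spring 255) and cost coefficients from the example; the half-person grid is an assumed discretization that happens to contain the optimum.

```python
# Grid-search check of the closed-form solution.  Requirements and cost
# coefficients are those stated in the example; the half-person step is
# an ASSUMED discretization that happens to contain the true optimum.
REQ = {1: 220, 2: 240, 3: 200, 4: 255}   # summer, autumn, winter, spring
CAP = 255                                 # never staff above the peak requirement

def cycle_cost(x1, x2, x3):
    """Total cost of one annual cycle, with spring employment fixed at 255."""
    levels = [255, x1, x2, x3, 255]       # previous spring, then stages 1..4
    cost = 0.0
    for n in range(1, 5):
        cost += 200 * (levels[n] - levels[n - 1]) ** 2   # hiring/firing cost
        cost += 2000 * (levels[n] - REQ[n])              # cost of surplus labor
    return cost

def half_range(lo, hi):
    """lo, lo + 0.5, ..., hi (exact values via integer half-units)."""
    return [h / 2 for h in range(2 * lo, 2 * hi + 1)]

best = min(
    ((cycle_cost(x1, x2, x3), (x1, x2, x3, 255.0))
     for x1 in half_range(REQ[1], CAP)
     for x2 in half_range(REQ[2], CAP)
     for x3 in half_range(REQ[3], CAP)),
    key=lambda t: t[0],
)
print(best)   # (185000.0, (247.5, 245.0, 247.5, 255.0))
```

The cost function is strictly convex, so the grid minimum at (247.5, 245, 247.5, 255) with cost $185,000 matches the unique solution obtained by calculus.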
You now have seen a variety of applications of dynamic programming, with more to come in the next section. However, these examples only scratch the surface. For example, Chapter 2 of Selected Reference 2 describes 47 types of problems to which dynamic programming can be applied. (This reference also presents a software tool that can be used to solve all these problem types.) The one common theme that runs through all these applications of dynamic programming is the need to make a series of interrelated decisions and the efficient way dynamic programming provides for finding an optimal combination of decisions.