EXAMPLES OF PERFORMING SIMULATIONS ON SPREADSHEETS WITH ANALYTIC SOLVER PLATFORM:CHOOSING THE RIGHT DISTRIBUTION

CHOOSING THE RIGHT DISTRIBUTION

As mentioned in Sec. 20.6, ASPE’s Distributions menu provides a wealth of choices. Any of 46 probability distributions can be selected as the one to be entered into any uncertain variable cell. In the preceding sections, we have illustrated the use of five of these distributions (the integer uniform, uniform, triangular, normal, and binomial distributions). How- ever, not much was said about why any particular distribution was chosen.

In this section, we focus on the issue of how to choose the right distribution. We begin by surveying the characteristics of many of the 46 distributions and how these characteristics help to identify the best choice. We next describe a special feature of ASPE for creating one of the 7 available custom distributions when none of the other 39 choices in the Distributions menu will do. We then return to the example analyzed in Sec. 20.6 to illustrate another special feature of ASPE. When historical data are available, this feature will identify which of the available distributions provides the best fit to these data while also estimating the parameters of this distribution. If you do not like this choice, it will even identify which of the distributions provides the second best fit, the third best fit, and so on.

Characteristics of the Available Distributions

The probability distribution of any random variable describes the relative likelihood of the possible values of the random variable. A continuous distribution is used if any values are possible, including both integer and fractional numbers, over the entire range of possible values. A discrete distribution is used if only certain specific values (e.g., only the integer numbers over some range) are possible. However, if the only possible values are integer numbers over a relatively broad range, a continuous distribution may be used as an approximation by rounding any fractional value to the nearest integer. (This approximation was used in cells C10:C11 of the spreadsheet model in Fig. 28.16.) ASPE’s Distributions menu includes both continuous and discrete distributions. We will begin by looking at the continuous distributions.

The right-hand side of Fig. 28.19 shows the dialog box for three popular continuous distributions from the Common submenu of the Distributions menu. The dark figure in each dialog box displays a typical probability density function for that distribution. The height of the probability density function at the various points shows the relative likelihood of the corresponding values along the horizontal axis. Each of these distributions has a most likely value where the probability density function reaches a peak. Furthermore, all the other relatively high points are near the peak. This indicates that there is a tendency for one of the central values located near the most likely value to be the one that occurs. There- fore, these distributions are referred as central-tendency distributions. The characteristics of each of these distributions are listed on the left-hand side of Fig. 28.19.

The Normal Distribution

The normal distribution is widely used by both OR professionals and others because it describes so many natural phenomena. (Because of its importance, Appendix 5 provides a

table for this distribution.) One reason that it arises so frequently is that the sum of many random variables tends to have a normal distribution (approximately) even when the individual random variables do not. Using this distribution requires estimating the mean and the standard deviation. The mean coincides with the most likely value because this is a symmetric distribution. Thus, the mean is a very intuitive quantity that can be readily esti- mated, but the standard deviation is not. About two-thirds of the distribution lies within one standard deviation of the mean. Therefore, if historical data are not available for calculating an estimate of the standard deviation, a rough estimate can be elicited from a knowledgeable individual by asking for an amount such that the random value will be within that amount of the mean about two-thirds of the time.

One danger with using the normal distribution for some applications is that it can give negative values even when such values actually are impossible. Fortunately, it can give negative values with significant frequency only if the mean is less than three standard deviations. For example, consider the situation where a normal distribution was entered into an uncertain variable cell in Fig. 28.16 to represent the number of customers requesting a reservation. A negative number would make no sense in this case, but this was no problem since the mean (195) was much larger than three standard deviations (3 3 30 5 90) so a negative value essentially could never occur. (When normal distributions were entered into uncertain variable cells in Fig. 28.13 to represent cash flows, the means were small or even negative, but this also was no problem since cash flows can be either negative or positive.)

The Triangular Distribution

A comparison of the shapes of the triangular and normal distributions in Fig. 28.19 reveals some key differences. One is that the triangular distribution has a fixed minimum value and a fixed maximum value, whereas the normal distribution allows rare extreme values far into the tails. Another is that the triangular distribution can be asymmetric (as shown in the figure), because the most likely value does not need to be midway between the bounds, whereas the normal distribution always is symmetric. This asymmetry provides additional flexibility to the triangular distribution. Another key difference is that all its parameters— min (the minimum value), likely (the most likely value), and max (the maximum value)— are intuitive ones, so they are relatively easy to estimate.

These advantages have made the triangular distribution a popular choice for simulations. They are the reason why this distribution was used in previous examples to represent competitors’ bids for a construction contract (in Fig. 28.2), activity times (in Fig. 28.6), and cash flows (in Fig. 28.11).

However, the triangular distribution also has certain disadvantages. One is that, in many situations, rare extreme values far into the tails are possible, so it is quite artificial to have fixed minimum and maximum values. This also makes it difficult to develop meaningful estimates of the bounds. Still another disadvantage is that a curve with a gradually changing slope, such as the bell-shaped curve for the normal distribution, usually describes the true distribution more accurately than the straight line segments in the triangular distribution.

The Lognormal Distribution

The lognormal distribution shown at the bottom of Fig. 28.19 combines some of the advantages of the normal and triangular distributions. It has a curve with a gradually changing slope. It also allows rare extreme values on the high side. At the same time, it does not allow negative values, so it automatically fits situations where this is needed. This is par- ticularly advantageous when the mean is less than three standard deviations and the normal distribution should not be used.

This distribution always is “positively skewed,” meaning that the long tail always is to the right. This forces the most likely value to be toward the left side (so the mean is on its right), so this distribution is less flexible than the triangular distribution. Another disadvantage is that it has the same parameters as the normal distribution (the mean and the standard deviation), so the less intuitive one (the standard deviation) is difficult to estimate unless historical data are available.

When a positively skewed distribution that does not allow negative values is needed, the lognormal distribution provides an attractive option. That is why this distribution frequently is used to represent stock prices or real estate prices.

The Uniform and Integer Uniform Distributions

Although the preceding three distributions are all central-tendency distributions, the uniform distributions shown in Fig. 28.20 definitely are not. They have a fixed minimum value and a fixed maximum value. Otherwise, they say that no value between these bounds is any more likely than any other possible value. Therefore, these distributions have more variability than the central-tendency distributions with the same range of possible values (excluding rare extreme values).

The choice between these two distributions depends on which values between the minimum and maximum values are possible. If any values in this range are possible, including even fractional values, then the uniform distribution would be preferred over the integer uniform distribution. If only integer values are possible, then the integer uniform distribution would be the preferable one.

Either of these distributions is a particularly convenient one because it has only two parameters (lower and upper limit) and both are very intuitive. These distributions receive considerable use for this reason. In our examples of performing simulations on a spread- sheet, the integer uniform distribution was used to represent the demand for a newspaper (in Fig. 20.7 in Sec. 20.6), whereas the uniform distribution was used to generate the bid for a construction project by one competitor (in Fig. 28.2) and the future sale price for real estate property (in Fig. 28.13).

The disadvantage of this distribution is that it usually is only a rough approximation of the true distribution. It is uncommon for either the minimum value or the maximum value to be just as likely as any other value between these bounds while any value barely outside these bounds is impossible.

The Exponential Distribution

If you have studied Chap. 17 on queueing theory, you hopefully will recall that the most commonly used queueing models assume that the time between consecutive arrivals of customers to receive a particular service has an exponential distribution. The reason for this assumption is that, in most such situations, the arrivals of customers are random events and the exponential distribution is the probability distribution of the time between random events. Section 17.4 describes this property of the exponential distribution in some detail.

As first depicted in Fig. 17.3, this distribution has the unusual shape shown in Fig.

In particular, the peak is at 0 but there is a long tail to the right. This indicates that the most likely times are short ones well below the mean but that very long times also are possible. This is the nature of the time between random events.

Since the only parameter is the mean time until the next random event occurs, this distribution is a relatively easy one to use.

The Poisson Distribution

Although the exponential distribution (like most of the preceding ones) is a continuous distribution, the Poisson distribution is a discrete distribution, as shown in the bottom half of Fig. 28.21. The only possible values are nonnegative integers: 0, 1, 2. . . . However, it is natural to pair this distribution with the exponential distribution for the following reason. If the time between consecutive events has an exponential distribution (i.e., the events are occurring at random), then the number of events that occur within a certain period of time has a Poisson distribution. (This is property 4 of the exponential distribution described in Sec. 17.4.) The Poisson distribution has some other applications as well.

When considering the number of events that occur within a certain period of time, the mean to be entered into the one parameter field in the dialog box should be the average number of events that occur within that period of time.

The Bernoulli and Binomial Distribution

The Bernoulli distribution is a very simple discrete distribution with only two possible values (1 or 0) as shown in the top half of Fig. 28.22. It is used to simulate whether a particular event occurs or not. The only parameter of the distribution is the probability that the event occurs. The Bernoulli distribution gives a value of 1 (representing yes) with this probability; otherwise, it gives a value of 0 (representing no).

As shown in the bottom half of Fig. 28.22, the binomial distribution is an extension of the Bernoulli distribution for when an event might occur a number of times. The binomial distribution gives the probability distribution of the number of times a particular event occurs, given the number of independent opportunities (called trials) for the event to occur, where the probability of the event occurring remains the same from trial to trial. For exam- ple, if the event of interest is getting heads on the flip of a coin, the binomial distribution (with Prob. = 0.5) gives the distribution of the number of heads in a given number of flips

of the coin. Each flip constitutes a trial where there is an opportunity for the event (heads) to occur with a fixed probability (0.5). The binomial distribution is equivalent to the Bernoulli distribution when the number of trials is equal to 1.

You have seen another example in the preceding section when the binomial distribution was entered into the uncertain variable cell Number That Show (C17) in Fig. 28.16. In this airline overbooking example, the events are customers showing up for the flight and the trials are customers making reservations, where there is a fixed probability that a customer making a reservation actually will arrive to take the flight.

The only parameters for this distribution are the number of trials and the probability of the event occurring on a trial.

The Geometric and Negative Binomial Distributions

These two distributions displayed in Fig. 28.23 are related to the binomial distribution because they again involve trials where there is a fixed probability on each trial that the event will occur. The geometric distribution gives the distribution of the number of trials until the event occurs for the first time. After entering a positive integer into the suc field in its dialog box, the negative binomial distribution gives the distribution of the number of tri- als until the event occurs the number of times specified in the suc field (suc is the number of successful events that must occur). Thus, suc is a parameter for this distribution and the fixed probability of the event occurring on a trial is a parameter for both distributions.

To illustrate these distributions, suppose you are again interested in the event of get- ting heads on a flip of a coin (a trial). The geometric distribution (with Prob. = 0.5) gives the distribution of the number of flips until the first head occurs. If you want five heads, the negative binomial distribution (with Prob. = 0.5 and suc = 5) gives the distribution of the number of flips until heads have occurred five times.

Similarly, consider a production process with a 50 percent yield, so each unit pro- duced has an 0.5 probability of being acceptable. The geometric distribution (with Prob. = 0.5) gives the distribution of the number of units that need to be produced to obtain one acceptable unit. If a customer has ordered five units, the negative binomial distribution (with Prob. = 0.5 and suc = 5) gives the distribution of the production run size that is needed to fulfill this order.

Other Distributions

The Distributions menu includes many other distributions as well, such as beta, gamma, Weibull, Pert, Pareto, Erlang, and many more. These distributions are not as widely used in simulations, so they will not be discussed further.

There is also a Custom submenu that enables you to design your own distribution when none of the other distributions will do. The next subsection will focus on how this is done.

The Custom Distribution

Of the 46 probability distributions included in the Distributions menu, 39 of them are standard types that might be discussed in a course on probability and statistics. In most cases, one of these standard distributions will be just what is needed for an uncertain variable cell. However, unique circumstances occasionally arise where none of the standard distributions fit the situation. This is where the distributions in the Custom submenu of the Distributions menu enter the picture.

The custom distributions actually are not probability distributions until you make them one. Rather, choosing a member of the Custom submenu triggers a process that enables you to custom-design your own probability distribution to fit almost any unique situation you might encounter.

There are seven choices in the Custom submenu: Cumul (short for cumulative), Discrete, DisUniform (short for discrete uniform), General, Histogram, Sip, and Slurp. The

custom cumulative, custom general, and custom histogram distributions are all similar in that they are all used to create a continuous distribution with a fixed minimum and maximum value. With the custom cumulative distribution, you enter several values in between the minimum and maximum, along with the corresponding cumulative probability at those values. With the custom general distribution, you also enter several values in between the minimum and maximum, but instead of cumulative probabilities, you enter relative weights that represent how likely it should be for outcomes near the listed values to occur (relative to outcomes near the other values in the list). Finally, with the custom histogram distribution, the range between the minimum and maximum is divided into a number of equal-sized segments and weights are provided for each segment to indicate how likely it should be (relative to the other segments) for a random outcome to fall within that segment.

The custom discrete and the custom discrete uniform distributions are also similar. With both, you enter a set of discrete values and these values are assumed to be the only possible outcomes. With the custom discrete distribution, each value (or outcome) is assigned its own probability, whereas the custom discrete uniform distribution assumes that all the discrete values have the same probability.

Finally, the custom sip and custom slurp distribution are used when you have a set of historical data and you want the uncertain variable to sample directly from the historical data. This might be appropriate if you expect the future to behave similarly to the past.

We will show two examples that use distributions from the Custom submenu. The first utilizes the custom discrete distribution whereas the second utilizes the custom general distribution.

In the first example, a company is developing a new product but it is unclear which of three production processes will be needed to produce the product. The unit production cost will be $10, $12, or $14, depending on which process is needed. The probabilities for these individual discrete values of the cost are the following:

20 percent chance of $10 50 percent chance of $12 30 percent chance of $14

To enter this distribution, first choose Discrete from the Custom submenu under the Distributions menu on the ASPE ribbon. Each discrete value and weight (expressed as a decimal number representing the probability) is then entered in the values and weights boxes as a list within curly brackets, as shown in Fig. 28.24.

The second example also involves a company that is developing a new product. How- ever, the complication in this case is that our company’s management has learned that another firm is developing a competitive product. It is unclear which company will be able to bring its product to market first and thereby capture most of the sales. In this light, here are the predicted thousands of sales for our company’s new product:

0–20 (with 10 the most likely) if the competitive product comes to market first. 20–30 (all equally likely) if both products reach the market at the same time. 30–50 (with 40 the most likely) if our company’s product comes to market first.

The two products are believed to have an equal chance of reaching the market first. Each of the other two cases is considered about three times as likely as both products reaching the market at the same time.

To enter this distribution, first choose General from the Custom submenu of the Distributions menu to bring up the dialog box shown in Fig. 28.25. The first two parameters, min =0 and max =50, are used to specify the smallest and largest possible value for sales (in thousands). In the values box, any number of values between the minimum and

maximum can be entered as a list inside of curly brackets. For each value in this list, a cor- responding weight needs to be entered into the weights box. The weight is a relative value used to specify the likelihood of each value relative to the other values in the list. Since sales around 10 thousand or 40 thousand are each about three times as likely as sales between 20 thousand and 30 thousand, the weights for the values \{10, 20, 30, 40\} are entered as \{3, 1, 1, 3\}. The net result is what is known as a bimodal distribution, with two distinct peaks, as shown in the chart on the left side of the dialog box.

Identifying the Continuous Distribution That Best Fits Historical Data We now have at least mentioned most of the probability distributions in the Distributions menu and have described the characteristics of many of them. This brings us to the question of how to identify which distribution is best for a particular uncertain variable cell. When historical data are available, ASPE provides a powerful feature for doing this by using the Fit button on the ASPE ribbon. We will illustrate this feature next by returning to the example involving Freddie the newsboy that was presented in Sec. 20.6.

Recall that one of the most popular newspapers that Freddie the newsboy sells from his newsstand is the daily Financial Journal. Freddie purchases copies from his distributor early each morning. Since excess copies left over at the end of the day represent a loss for Freddie, he is trying to decide what his order quantity should be in the future. This led to the spreadsheet model in Fig. 20.7 that was presented in Sec. 20.6. This model includes the uncertain variable cell Demand (C12). To get started, a discrete uniform distribution between 40 and 70 has been entered into this uncertain variable cell.

To better guide his decision on what the order quantity should be, Freddie has been keeping a record of the demand (the number of customers requesting a copy) for this news- paper each day. Figure 28.26 shows a portion of the data he has gathered over the last 60 days in cells F4:F63, along with part of the original spreadsheet model from Fig. 20.7. These data indicate a lot of variation in sales from day to day—ranging from about 40 copies to 70 copies. However, it is difficult to tell from these numbers which distribution in the Distributions menu best fits these data.

ASPE provides the following procedure for fitting the best distribution to data:

1. Gather the data needed to identify the best distribution to enter into an uncertain vari- able cell.

2. Enter the data into the spreadsheet containing your simulation model.

3. Select the cells containing the data.

4. Click the Fit button on the ASPE ribbon, which brings up the Fit Options dialog box.

5. Make sure the Range box in this dialog box is correct for the range of the historical data in your worksheet.

6. Specify which type of distributions are being considered for fitting (continuous or dis- crete).

7. Indicate whether to allow shifted distributions and whether to run a sample indepen- dence test.

8. Also use this dialog box to select which ranking method should be used to evaluate how well a distribution fits the data.

9. Click Fit, which brings up the Fit Results chart that identifies the distribution that best fits the data.

10. If desired, check the box to select distributions to view that are lower on the list on the left side of the dialog box. This identifies the other types of distributions (including their parameter values) that are next in line for fitting the data well.

11. After choosing the distribution (from steps 9 and 10) that you want to use, close the dialog box by using the close box in the upper-right-hand corner and then click Yes to accept the distribution.

12. Click the cell where you want the uncertain variable cell to be. This then enters the chosen distribution into the uncertain variable cell.

Since Fig. 28.26 already includes the needed data in cells F4:F63, applying this proce- dure to Freddie’s problem begins by selecting the data. Then clicking the Fit button brings up the Fit Options dialog box displayed in Fig. 28.27. The range F4:F63 of the data in Fig. 28.26 is already entered into the Range box of this dialog box. When deciding which type of distributions should be considered for fitting, the default option of continuous

distributions has been selected here. Sales will always be integer, so discrete might seem the more logical choice. However, when all the integer values over a wide range are possi- ble (all 31 integer values between 40 and 70 in this case), the form of the distribution begins to resemble a continuous distribution. Furthermore, there are many more continu- ous distributions (31) available in ASPE than discrete distributions (8). Thus, there may a better chance of finding a continuous distribution that is a good fit. This continuous distri- bution can then be made to give only integer values by rounding each number in the uncer- tain variable cell to the nearest integer (as was done in the airline overbooking example of Sec. 28.5 with the ticket demand in cell C11 of Fig. 28.16). The chi-square test also has been selected for the ranking method. Clicking Fit then brings up the Fit Results chart dis- played in Fig. 28.28.

The left side of the Fit Results chart in Fig. 28.28 identifies the best-fitting distribu- tions, ranked according to the Chi-Square test. This is a widely used test in statistics where smaller values indicate a better fit. It appears that the uniform distribution would be a good fit. In combination with the fact that demand actually must be integer, this confirms that the choice made in Freddie’s original spreadsheet model in Fig. 20.7 to enter the integer

uniform distribution into the uncertain variable cell Demand (C12) was reasonable. In fact, if we had chosen Discrete instead of Continuous as the type of distribution to fit in Fig. 28.27, ASPE would have found the integer uniform distribution to be the best fit. Choosing either Continuous or Discrete (or both) would have been reasonable in this case and would have led to the same type of distribution (uniform).

Search This Blog

Operation Research course