Doing Some Simple Statistics on the Simulation Output


The output from the simulations in the previous section can provide us with a lot of information. We can apply simple statistics to the output to determine answers to a variety of questions. We might ask questions like:

What is the probability of winning or losing?
What is the average number of plays before the gambler loses?
What is the average number of plays before the gambler wins?
What is the median of the success and failure data?
What does the distribution of probabilities look like?

Even though we already know the answer to the first question, we can compute another approximation using the two output lists. We will estimate the answers to these questions using the output for the case when 5000 gamblers were simulated. The following are typical results. Keep in mind that each time the simulator is executed the output will be different, because Mathematica generates a new sequence of random numbers on each run. In the following calculations we assume that we have two lists, win5000 and lose5000, that are the output from the simulation above. Entry i of each list is the number of gamblers who won (or were ruined) after exactly i plays.

Estimating the Probability from the Output

To estimate the probability we add up the entries in each of the win and loss lists and then divide the number of winners by the total number of gamblers in the output lists. We can use the Sum and Part commands to do the work. First the sums can be computed using

  swin = Sum[ Part[win5000,i], {i,1,40}]

which returns a number like 1040 and

  slose = Sum[ Part[lose5000,i], {i,1,40}]

which returns 3319. Adding the two using

  totgam = swin + slose

gives 4359. Note that this number is not 5000. The reason why this is less than 5000 is that we neglected any gambling sequences that involved playing the game more than 40 times. In this case we are ignoring 641 sequences or 12.82% of the gamblers in the simulation. This number may be too large; see Problem 9.
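The arithmetic behind the 641 ignored sequences can be checked directly. The following is a minimal sketch that uses only the counts quoted above; the names ngamblers, ignored, and percentIgnored are introduced here for illustration and do not appear in the chapter notebook.

```mathematica
(* Counts quoted in the text for one typical run of 5000 gamblers *)
ngamblers = 5000;
swin = 1040;
slose = 3319;

totgam = swin + slose;                      (* finished within 40 plays: 4359 *)
ignored = ngamblers - totgam;               (* longer than 40 plays: 641 *)
percentIgnored = N[100 ignored/ngamblers]   (* 12.82 *)
```

Note that evaluating this cell overwrites swin and slose with the quoted values, so it should be run in a scratch notebook rather than alongside a live simulation.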

We can still get an estimate of the probability of success by using

  probability = swin / totgam

For the counts above, the value is $1040/4359 \approx 0.2386$ (wrap the expression in N to see the decimal form). This is similar to the values we have seen previously in the chapter. To get the probability of ruin we can subtract this number from 1.0.

Estimating the Average Playing Time

We can also compute the average number of times a successful or ruined gambler plays using the concept of an expected value. The expected value is computed using the formula

$\begin{displaymath} E[y_i] = {{\sum_{i=1}^n i y_i}\over{\sum_{i=1}^n y_i}} \end{displaymath}$

where the variable, $y_i$, is used to represent the data we are working on; here $y_i$ is the number of gamblers who finished after exactly $i$ plays. We can compute the expected value of the number of times the successful gambler will play using the following command

   winmean = N[Sum[ i * Part[win5000,i], {i,1,40}] / swin]

which implements the formula for the expected value. Note that swin is the total number of winning gamblers and has already been computed as part of the effort to compute the probability of being successful. This computes the numerical value of $E[y_i]$ given the data above. So we can say that a successful gambler should expect to play about 17 or 18 times. The same calculation can be done for the ruined gamblers. The command is

   losemean = N[Sum[ i * Part[lose5000,i], {i,1,40}] / slose]

For this data the value is close to winmean, so the ruined gambler will play about the same number of times as the successful gambler before being ruined.
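The expected value formula can be checked by hand on a short list. The following is a small sketch using a made-up list, counts, that is not simulation output; it assumes a version of Mathematica that provides Total. It evaluates the formula once with Sum and Part, as in the text, and once as a dot product.

```mathematica
(* Hypothetical toy data: counts[[i]] gamblers finished after exactly i plays *)
counts = {2, 0, 5, 0, 3, 0};
n = Length[counts];

(* Direct transcription of the formula, in the style used in the text *)
meanSum = N[Sum[i * Part[counts, i], {i, 1, n}] / Sum[Part[counts, i], {i, 1, n}]]

(* Same quantity written as a dot product of the play counts with the data *)
meanDot = N[(Range[n] . counts) / Total[counts]]

(* Both give (1*2 + 3*5 + 5*3)/10 = 3.2 *)
```

The dot product form avoids writing the index explicitly, but the Sum form stays closer to the formula and may be easier to read when first learning the idea.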

Estimating the Median Playing Time

The median divides the distribution into two equal parts. To find it we add up the entries of the list until the running total reaches half of swin. The commands for the data on the successful gamblers are

  icnt = 0
  value = 0
  While[ value < swin/2,
       icnt = icnt + 1;
       value = value + Part[win5000,icnt]
       ]
  icnt

The variable icnt is 15 at the end which means about half of the time the gambler will play 15 or fewer times and half the time the gambler will play 15 or more times. The same calculation can be done for the ruined gamblers in the simulation. The value from the data in this example is also 15.
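The loop above can be wrapped in a function so that it can be reused for the ruined gamblers. The following is only a sketch: medianIndex and the list toy are hypothetical names and data, and the cross-check assumes a version of Mathematica that provides Total, Accumulate, and LengthWhile.

```mathematica
(* Hypothetical helper: smallest index at which the running total of the
   counts reaches half of the overall total (the same loop as in the text) *)
medianIndex[counts_List] := Module[{icnt = 0, value = 0, half = Total[counts]/2},
  While[value < half,
    icnt = icnt + 1;
    value = value + Part[counts, icnt]];
  icnt]

(* Hypothetical toy data, not simulation output *)
toy = {2, 0, 5, 0, 3, 0};
medianIndex[toy]                                        (* 3 *)

(* Cross-check using the running totals {2, 2, 7, 7, 10, 10} *)
LengthWhile[Accumulate[toy], # < Total[toy]/2 &] + 1    (* 3 *)
```

With the simulation output, medianIndex[Take[win5000, 40]] and medianIndex[Take[lose5000, 40]] would mirror the calculation in the text.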

You should spend a little time looking at Figure (1.4) to make sure that these numbers make sense. There are a great number of additional statistics functions in Mathematica that we could use to analyze the output of the simulations. We will not spend a lot of time here on further analysis. Many of the things we have done in the analysis in this section could have been built into a more thorough or complicated module. For example, we could have written a module that automatically generated the histograms and the approximate probabilities and expected values. The way to do this should be clear, so we won't pursue this any further.
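For readers who want a starting point, one possible shape for such a module is sketched below. The name simStats and the form of its return value are assumptions made for this illustration and are not part of the chapter notebook; the sketch assumes a version of Mathematica that provides Total, and it does not guard against an empty win or loss list.

```mathematica
(* Hypothetical helper, not part of the chapter notebook.
   win and lose are lists whose i-th entries count the gamblers who
   won or were ruined after exactly i plays. *)
simStats[win_List, lose_List] := Module[{swin, slose},
  swin = Total[win];
  slose = Total[lose];
  {N[swin/(swin + slose)],                    (* estimated probability of winning *)
   N[(Range[Length[win]] . win)/swin],        (* mean number of plays for winners *)
   N[(Range[Length[lose]] . lose)/slose]}]    (* mean number of plays for ruined gamblers *)

(* Toy lists for illustration only *)
simStats[{2, 0, 5, 0, 3, 0}, {6, 0, 4, 0, 0, 0}]
(* {0.5, 3.2, 1.8} *)
```

Applied to the simulation output, the call would be simStats[Take[win5000, 40], Take[lose5000, 40]].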

Graphing an Approximate Probability Distribution

**Figure 1.5:** The distribution of probabilities of winning and losing as a function of the number of times the gambler plays the game. The two distributions are roughly the same. Note that this does not give the probability of winning or losing. These are conditional probabilities; e.g., given that the gambler wins, what is the probability of having played 12 times?

As a final tool in analyzing some of the questions raised in this section, we can go through the process of graphing the probability distributions. We can compute and graph the distribution of probabilities of winning as a function of the number of times a gambler will play the game. To do this for both winning and losing, we can use the following commands.

  Clear[data1,g1]
  data1 = Table[{i,N[win5000[[i]]/swin]}, {i,1,40,2}]
  g1 = ListPlot[data1, PlotJoined -> True,
                       PlotLabel -> " Winning PDF",
                       PlotStyle -> RGBColor[1,0,0]]

and

  Clear[data2,g2]
  data2 = Table[{i,N[lose5000[[i]]/slose]}, {i,1,40,2}]
  g2 = ListPlot[data2, PlotJoined -> True,
                       PlotLabel -> " Losing PDF",
                       PlotStyle -> RGBColor[0,0,1]]

To show both curves we can use the commands

  Clear[g3]
  g3 = Show[g1,g2,
            PlotLabel -> "Both PDFs"]

In the code generating the graphics, a step of 2 was used in the iterator that defines which points to graph. This is done to skip the zeros that come from sequence lengths that cannot occur in the problem. If these are included, a jagged curve will be seen. The results of these commands are shown in Figure (1.5).
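The effect of the step can be seen on a short list. The sketch below uses a made-up list, counts, with zeros at the even positions; it compares the points produced by a step of 2 with the result of simply discarding the zero entries, and assumes a version of Mathematica that provides Total.

```mathematica
(* Hypothetical toy counts with zeros at the play counts that cannot occur *)
counts = {2, 0, 5, 0, 3, 0};
total = Total[counts];

(* All points, including the zeros that make the curve jagged *)
all = Table[{i, N[counts[[i]]/total]}, {i, 1, Length[counts]}]
(* {{1, 0.2}, {2, 0.}, {3, 0.5}, {4, 0.}, {5, 0.3}, {6, 0.}} *)

(* Points kept by a step of 2, as in the text *)
stepped = Table[{i, N[counts[[i]]/total]}, {i, 1, Length[counts], 2}]
(* {{1, 0.2}, {3, 0.5}, {5, 0.3}} *)

(* Alternative: keep only the points with a nonzero probability *)
nonzero = Select[all, #[[2]] > 0 &]
(* {{1, 0.2}, {3, 0.5}, {5, 0.3}} *)
```

The two approaches agree here, but they are not interchangeable in general: Select also discards play counts that are possible but happened not to occur in a small simulation, while the step relies on knowing in advance which play counts can occur.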

Homework

Problem 1. Solve the following ordinary differential equations using DSolve. Discuss the behavior of the general solution of the differential equation. Use the Plot function to graph the pieces of the general solution (the independent functions) without the coefficients in your explanations.

d. $y''(x)+2y'(x)+ y(x)=e^{-x}$

e. $y'''(x)-2y''(x)+2y'(x)+ y(x)=e^{-x}$

Problem 2. Solve the following ordinary differential equations using DSolve. Discuss the behavior of the general solution of the differential equation. In this case you should graph the solutions using Mathematica to help explain the behavior.

a. with the initial condition .

b. with the initial condition

c. with the initial condition

d. with and $y(\pi)=1.3$ .

e. with and .

f. $y''(x)+2y'(x)+ y(x)=e^{-x}$ with the same boundary conditions as in d.

g. $y''(x)+2y'(x)+ y(x)=e^{-x}$ with the same initial conditions as in e.

Problem 3. Solve the linear second order ordinary differential equation

$\begin{displaymath} y''(t) - 3 y'(t) + 2 y(t) = 0 \end{displaymath}$

with

and

. Then solve the second order difference equation

$\begin{displaymath} y_{i+1} - 3 y_i + 2 y_{i-1} = 0 \end{displaymath}$

with

and

. Use the command DSolve to solve the ordinary differential equation and RSolve to solve the difference equation. To load the RSolve package, use the Help facility and the Function Browser in the menus. Graph the solutions of the differential equation and the recursion relationship.

Problem 4. Finish the solution process in the two point boundary value problem in the second example in the review section of solutions of ordinary differential equations. Compare the process to that for finding the solution of the finite difference equation in the following section.

Problem 5. Solve the following recursion relationships using RSolve (see Appendix ) and check the work by showing the solution by hand. Use the process of assuming the solution is of the form

$\begin{displaymath} a_n = z^n \end{displaymath}$

and perform the algebraic steps after substituting the form in the finite difference equation in each case. In your notebook type in the steps you used to solve the problems and give the intermediate results. Note that you will need to load in the RSolve command. Use the function/package browser in the Help menu in the notebook.

a. $a_n = 0.4 a_{n-1}$ with the initial condition .

b. $a_{n+1} = a_{n} - 2 a_{n-1}$ with the initial conditions and .

c. $a_{n+1} = - 3 a_{n-1}$ with the initial conditions and .

Problem 6. Suppose a gambler starts with $10 and wants to leave the casino with at least $10. That is, the gambler wants to come out ahead or at least break even. Using the notation in the notes we have and is any value greater than or equal to $10. If the gambler chooses roulette and always plays to win on black the odds of winning are 18 out of 38. This means that $p={{18}\over{38}}$ and $q={{20}\over{38}}$ . Define a function that represents the probability of being an arbitrary amount ahead while playing the game. Graph the function using the Plot command and discuss the behavior of the function. In addition, do the same with and and then again for and . Show the graphs of the three cases on the same plot. You can use the Show command to do this. Answer the following questions.

a. In the second situation above, the starting amount of money is increased and in the third case, the probability of winning is increased. Which of the second and third cases shows the biggest difference relative to the first? Why?

b. How do the answers returned from the SGR code compare with the probabilities from the exact solution? To answer this question, simulate each of the three cases.

Problem 7. For this problem you will need to copy a cell from the chapter notebook into your homework notebook. Search for the Module containing SGR and copy this into your notebook using the Edit menu on the notebook interface. Note that you must also retrieve the GamblersRuin Module. Let be the amount of money that you will start with, the amount of money that you want to end up with, and be the probability of winning at each play of the game. Go through each of the following cases.

First, compute the exact probability in each case using the exact solution developed in the chapter. Then use the module to get an estimate of the probability in each case. Describe the results that you see and compare the three cases. Do the probabilities make sense?

Problem 8. Use the cases defined in Problem 7 to do a more detailed analysis of the effect of the number of gamblers simulated on the accuracy of the estimation of the probability. In each case, vary the number of gamblers input to the SGR Module. Plot the results using ListPlot and discuss the accuracy results defined in the graph. For example, simulate 25, 50, 75, 100, ... gamblers and graph the results of these simulations and compare the output to the value returned by the simulator. Why is this an important analysis to perform?

Problem 9. In this problem you will need to do some full simulations and plot some graphs. To get the simulation module, copy the cell containing the module SGR to your homework notebook. In the simulations it is necessary to limit the length of the gambling sequences for practical reasons. Simulate the same cases in parts a, b, and c in Problem 7, keeping all sequences of no more than 50, 100, and 200 plays. Output bar charts of the results showing the number of winners and losers for different length sequences and discuss the results. Compare the probabilities of winning with those predicted by the exact solution and compare the approximations as the number of gamblers in the simulations is increased. Why is this analysis important to perform?

Problem 10. In the SGR code we ignore sequences that are longer than a specified value, maxnum. In this problem we will investigate how large this number should be. For each of the cases in Problem 7, determine a cutoff value for which less than 5% of the sequences are ignored in the simulation. It may pay to modify the SGR Module to do more work for this problem. If you do this describe the modifications to the code in detail. Why is this an important analysis to perform on the simulations?

Problem 11. A gambler wants to double the money in his pocket. Entering the casino the gambler asks you to advise him about the probability of doubling an initial amount of money by playing roulette as in Problem 7. Advise the gambler of the probability of achieving the goal using graphs of various quantities. You should use a few examples to illustrate the advice you give.

Problem 12. A good deal more information is contained in the simulation results each time a test is run. Suppose that , , and . Use the lists returned from the SGR simulator to answer the following questions.

a. Compute an approximation of the probability that a gambler in the given situation will lose in exactly 10 plays of the game.

b. Compute an approximation of the probability that a gambler in the given situation will win and will play the game at least 10 times.

c. Compute an approximation of the probability that a gambler in the given situation will win and will play the game fewer than 10 times.

d. Compute an approximation of the probability that a gambler in the given situation will lose and will play the game fewer than 10 times.

Problem 14. Consider a more complicated gambling situation. A gambler decides to play two different games. The gambler is equally likely to play either game while at the casino. That is, the gambler is equally likely to play either of the games during the sequence before the goal is achieved or the gambler is ruined. Suppose the probability of winning in the first game is and the probability of winning in the second game is . We do not have the tools to define an exact solution in this case. Write a code that will simulate this process. Test your simulator on the following three problems. Hint: You can modify GamblersRuin and SGR to do this.

Problem 15. (Continuation) Consider an even more complicated gambling situation. A gambler again decides to play two different games. The gambler plays the first game about 15 out of 100 times and the second game for the balance. That is, the gambler is not equally likely to play either of the games during the sequence before the goal is achieved or the gambler is ruined. As in Problem 14 suppose the probability of winning in the first game is and the probability of winning in the second game is . Write a code that will simulate this process. Test your simulator on the problems defined in Problem 14. Hint: You can modify GamblersRuin and SGR versions from Problem 14 to do this.

Problem 16. Suppose that a gambler has a strategy of playing a higher risk game when relatively wealthy and a lower risk game when relatively poor. This means that at some amount of money the gambler will change to the second lower risk game. Write a simulator that can be used to determine the chance that the gambler will achieve the goal under these circumstances. Use the problems defined in Problem 14 and set the change over point to be half the desired goal, .

Problem 17. (Continuation) Suppose that the change over point is changed to one fourth of the goal amount, , in Problem 16. How does this affect the chances of the gambler? What is the optimal change over point between the games? Hint: If you think a little the answer to the last question is obvious.

Problem 18. Write a module that will return a graph of the amount of money that a gambler has at any time in the process. For example, if a gambler starts with $25 and bets $1 on each play, keep track of in the definition of the Gambler's Ruin problem. Use the module to graph the function for values of , , and with and . Run the Module several times on this problem and discuss the results produced by the Module.

Problem 19. In the investment example at the beginning of the chapter, a simple investment game was described. Suppose that you make an initial investment of $1000 in the bond described. In addition, suppose that the interest is paid once per year on the bond (no quarterly compounding).

a. Set up a simulation of the process and use the simulator to estimate the probability that you will double your money given that the chance of default on the bond is 0.25 in any one year. Do the same simulation with the chance of default 0.10 and 0.50.

b. Consider the same investment as above, but include a quarterly compounding of the interest. That is, the interest should be compounded every 3 months (4 times per year) and the chance of failure in a given year is 0.25, 0.10, and 0.50, respectively. Compare your results to those in part a.

Problem 20. Let's consider a more complicated investment problem. In this case suppose that you put money into two types of investments. The following table describes the two investments. Set up the simulations and answer the questions below.

  Investment   Annual Return   Chance of Default
  #1           8.13%           5%
  #2           10.59%          35%

a. Given that we invest $500 in each type of investment, give the probability of doubling the investment.

b. Do the same investing $250 in #1 and $750 in #2.

c. Do the same investing $750 in #1 and $250 in #2.

d. Compare the answers in the parts a, b, and c.

Problem 21. Suppose that we try to attack the problem of the dryland farmer. Write a simulator that incorporates the following statements into the process:

To have a profitable crop the fields must receive a minimum of 10 inches of moisture in a given year. From the weather records, this happens about 15 out of 18 years.
In addition, in a 100 year period, about 6 crops will be hailed out. This means no profits for that year.
Pests such as locusts cause the farmer to lose a crop once in every 20 years.

If we consider the profit and loss in each year to be $10,000 and the farmer has a $100,000 bankroll to begin, what is the chance that the bankroll will be doubled? On average, how long will it take to double the initial amount of money?

Problem 22. (Continuation) Suppose that instead of losing a crop in a bad locust year, the farmer can pay $10,000 to spray pesticides on the fields to eliminate the pests. Incorporate this effect into the simulator developed in Problem 21 and answer the same questions as in Problem 21. What happens if the price of the application of the pesticides is cut in half or doubled?

Problem 23. The Module SGR in the text neglects any sequences that are longer than the input variable, maxnum. Inside the Module, GamblersRuin is called without modification, so it computes the entire sequence regardless of whether it is too long or not. If we only keep track of sequences that are less than or equal to maxnum in length, then the GamblersRuin Module can stop as soon as the number of plays exceeds maxnum. Modify the GamblersRuin Module to stop if the number of plays becomes too large. Discuss the difference in the performance of the modified code relative to the original code. Why is this analysis of the simulator important?


Joe Koebbe 2003-10-01
