Categories

## Probability in Python

Recently a friend sent me this probability problem.

“Which is more likely? A) You roll two dice 5 times and, every time, one of the two comes up as 1 and the other as 6. B) You roll 10 dice all at once. 5 come up as 1s and the other 5 come up as 6s. A, B or equal probability?”

The probability of A is easy to calculate. Rolling a die is an independent event. There are six possible outcomes for each die roll, so rolling two dice (lets call them red and green) gives 36 possible outcomes. The red die can be 1 or 6 and the green die has to be 1 if red is 6 and 6 if green is 1.

\frac{2}{6} \times \frac{1}{6} = \frac{2}{36}\\

We can visualize this in the table below.

We are rolling the red and green dice five times. So we calculate the probability as.

\bigg(\frac{2}{36}\bigg)^5 = \frac{32}{60466176}= \frac{1}{1889566} = 5.2922 \times \def\bar#1{#1^{-7}}\bar{10} .

That’s not very likely. Subtracting the above from 1 (which is certainty) we can see the chance of NOT getting five 1s and five 6s when rolling a pair of die five times is 0.9999994708.

Situation B is harder to calculate. When I first looked at this problem I made the mistake of thinking, “Well, each die has to have a mate. Every 1 has to have a 6 paired with it and every 6 has to have a one paired with it. Following the logic we used in A the first five die can be 1 or 6, and the next 5 have to be the mate of whatever the first five were. Mathematically this looks like;

\Big (\frac{2}{6}\Big)^5 \times \Big(\frac{1}{6}\Big)^5 = \frac{32}{60466176}= \frac{1}{1889566}.

This is the same probability as A. However this only covers two possible combinations of five 1s and five 6s. That is {1,1,1,1,1,6,6,6,6,6} and {6,6,6,6,6,1,1,1,1,1}. This worked as a solution for two dice because there are only two possible combinations; {1,6} and {6,1}. To work out every possible valid combination we could build another table.

However this is very time consuming and it’s easy to accidentally miss out a combination. Instead, we can imagine that 1 and 6 are the only possible outcomes and use the binomial theorem. This allows us to calculate the number of different combinations of number of k elements that can be chosen from an n-element set.

C(n,k) = \binom n k = \frac{n!}{k!(n-k)!}
C(10,5) = \binom {10} 5 = \frac{10!}{5!(10-5)!} = 252

So this means there are 252 out of 60466176 possible ways to get five 1s and five 6s from throwing ten dice all at once. Notice this also works with A.

This translates to a probability of NOT getting five 1s and five 6s of 0.9999958324.

\frac{252}{6^{10}} = \frac{252}{60466176} = 4.16761 \times \def\bar#1{#1^{-6}}\bar{10}

Another way of getting an answer to this problem ( remember we don’t need an exact probability, just what situation is more likely) is to roll each combination multiple times and see how often the desired outcome occurs.

Because the probability of A and B are so low we need A LOT of dice rolls. There are ways of working out how many rolls are needed to be x percent confident in the outcome, but for problems with a multinomial distribution it’s pretty complicated so we wont go into that here. If you’re interested check out this link.

In the interests of simplicity let’s just say we need one billion rolls to be reasonably confident. If I rolled one handful of dice every ten seconds and didn’t sleep or eat it would still take over 317 years to do this by hand. So not practical for humans, but luckily computers thrive on this kind of mind numbing computation.

To simulate this problem let’s break it down into steps and create some code to accomplish each one.

Simulate a dice. This is pretty easy in Python. We can just use the random int module to return random number between 1 and 6. I’m going to create a class that creates a dice object with a color and specified number of sides because I’ll probably simulate dice in the future and this way I’ve got a debugged dice creator ready to roll. When rolled the dice object returns an int between 1 and 6.

class Die:
"""
Class representing a die

"""

def __init__(self, colour, sides_count = 6):
self.__sides_count = sides_count
self.colour = colour

def description(self):
return "A {} die with {} sides".format(self.colour, self.__sides_count)

def get_sides_count(self):
"""
Returns the number of sides the die has
"""
return self.__sides_count

def roll(self):
"""
Rolls a die and returns a number from 1
to the sides count
"""
import random as _random
return _random.randint(1, self.__sides_count)

Simulate a handful of dice. We can create another class that ‘makes’ a handful of dice using the class we just made in step one. Each handful will be made up of a certain number of different colored dice and each handful can be rolled a specified number of times. It is possible to simulate this problem with just one dice object, but doing it this way is easier to understand and debug as it’s a virtual recreation of the real problem. The handful of dice can be rolled and the results are returned in an array.

class Handful:
"""
Class for a handful of dice.
"""
def __init__(self,num_dice=1,times_thrown=1):
self.num_dice = num_dice
self.times_thrown = times_thrown

def description(self):
return "A handful of {} die thrown {} times".format(self.num_dice, self.times_thrown)

def throw(self):

colour_list = ['Red', 'Green', 'White', 'Yellow', 'Blue','Black', 'Brown', 'Azure','Ivory', 'Teal']
die_list = []

for i in colour_list[0:self.num_dice]:
x = Die(i)
die_list.append(x)

throw_list = []

for i in range(0,self.times_thrown):
for x in die_list:
throw_list.append(x.roll())

throw_array = np.array(throw_list)
throw_array = np.reshape(throw_array, (self.times_thrown,self.num_dice))

return throw_array

Now we have a way to make handfuls of dice, let’s try to simulate A. We need to create a handful of two dice and roll them five times. That’s easy to do using the Die and Handful classes we just defined. What we get back from this is an array of 5 pairs of random numbers between 1 and 6. Something like this.

[[1,6],[6,1],[6,1],[1,6],[1,6]]

What we need to do now is check if this is a ‘win’.

The easiest way to do this is define another array called ‘A_win_array’ that looks like this.

[[1,6],[1,6],[1,6],[1,6],[1,6]]

Now we sort our roll array from lowest to highest and check if it matches the ‘win_array’. To do this we write a little function called ‘check_equal’ that will check if the arrays match.

def check_equal(a,b):
"Function to check if two arrays are equal"
if np.array_equal(a,b):
return 1
else:
return 0

If the roll is a ‘win’ then one is added to the win counter for A, if not it adds zero.

A_win_counter += check_equal(A, A_win_array)

Then we just repeat this process using a loop for however many rolls we want to test. I used one billion.

Then we can calculate the probability by dividing the index of each ‘win’ by the roll number.

If we plot the probability of every win we get this. You can see that due to the law of large numbers the probabilities start off pretty wild and then get closer and closer to the calculated value as the number of rolls increases.

Simulated probability of A is 5.4313e-07.
Calculated probability of A is 5.2922e-7.

Simulated probability of B is 4.29207e-06.
Calculated probability of B is 4.16761e-6.

Pretty close huh?

So now we can be pretty confident in answering the problem that sparked all this calculation. Although both probabilities are exceedingly small you’re almost 10 times more likely to get five 1s and five 6s by throwing a handful of ten dice once than two dice, five times.

If you’re interested in the complete code I used to simulate this problem you can find it here.