What are the odds?

Susan and I were talking about the Texas lottery the other day. The odds are so bad on most of the lottery games that I usually refer to them as a tax on stupidity. Well, I guess we’ve joined the ranks of the stupid because we finally decided to pay a dollar for a lottery ticket. I naturally got interested in how to calculate the odds of winning and how to pick a reasonably good number (or rather avoid picking a bad number).

Like most forms of gambling, they’ve made it as complex as possible to try to hide the terrible odds. The advertising is a bit deceptive too. I’m looking at a flyer from the Texas lottery commission that says my “overall” odds of winning are 1 in 57. However, it turns out the actual odds of winning the real jackpot are 1 in 47,784,352. That’s for the Lotto game. The Mega-millions game has odds so astronomically long they make Lotto look like a sure thing.

Trying to sort out how the odds are calculated was interesting. Six balls numbered from 1 to 44 are selected at each drawing. So my first thought was something like 44 * 43 * 42 * 41 * 40 * 39 but this turns out to be wrong for several reasons. The first five balls are drawn from a container that holds 44 sequentially numbered balls. The last ball, the “bonus ball”, is drawn from a second container that also holds 44 sequentially numbered balls, so it is not limited to the 39 numbers left over from the first five. The order of the first five balls isn’t important, so a combination rather than a permutation should be used. As I understand it, this means the number of combinations of the first five balls can be calculated like so:

C(44,5) = 44!/(5!*(44-5)!) = 1,086,008

Multiply that by the 44 possible ball 6 values and you get a total of 47,784,352 combinations, one of which is the winning number.
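
The arithmetic is easy to double-check with a few lines of code. Here is a minimal sketch in Python; the constant names are made up for illustration, and math.comb and math.perm need Python 3.8 or later.

```python
import math

MAIN_POOL = 44   # balls in the first container
MAIN_PICKS = 5   # balls drawn from it, order ignored
BONUS_POOL = 44  # balls in the second container (the bonus ball)

# Unordered selections of 5 balls out of 44: 44! / (5! * 39!)
main_combinations = math.comb(MAIN_POOL, MAIN_PICKS)

# Every main combination can pair with any of the 44 bonus balls.
total_combinations = main_combinations * BONUS_POOL

# The naive ordered count 44 * 43 * 42 * 41 * 40 * 39, for comparison.
naive_permutations = math.perm(44, 6)

print(f"C(44,5)            = {main_combinations:,}")
print(f"total combinations = {total_combinations:,}")
print(f"naive first guess  = {naive_permutations:,}")
```

The naive figure comes out roughly a hundred times too large, mostly because it counts every ordering of the same balls as a different ticket.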

I looked at a few websites that give alleged advice for winning the lottery but most ranged from wrong to absurd. They suggested things like avoiding the selection of numbers that form geometric patterns on the lottery forms and avoiding numbers based on birthdays. Some suggested avoiding sequential numbers or numbers that had been picked in previous drawings. None of these make any mathematical sense that I can see. I don’t know much about statistics but I decided I could probably come up with something better on my own.

Fortunately, the Texas lottery website offers CSV data of prior winning numbers going back to 2003. I grabbed the file and wrote a few lines of Perl to draw histograms of the differences between the winning ball values of each drawing (excluding the bonus ball). It turns out that small differences are as much as 20 times more likely than large differences. Clumps of two or three sequential numbers are very, very probable whereas evenly distributed numbers are highly unlikely. For example, 1 2 3 36 37 seems much more likely to occur than 1 11 21 31 41.
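
A histogram like that takes very little code. The sketch below is in Python rather than the original Perl and rests on a few assumptions: "difference" is taken to mean the gap between neighboring balls once a drawing is sorted, the CSV column layout is a guess (hence the first_ball_column parameter), and the two sample rows are made up rather than real results.

```python
import csv
import io
from collections import Counter

def gap_histogram(drawings):
    """Count the gaps between neighboring balls after sorting each drawing."""
    counts = Counter()
    for balls in drawings:
        ordered = sorted(balls)
        for low, high in zip(ordered, ordered[1:]):
            counts[high - low] += 1
    return counts

def read_drawings(handle, first_ball_column, balls_per_drawing=5):
    """Read the main balls from each CSV row; the column layout is an assumption."""
    drawings = []
    for row in csv.reader(handle):
        cells = row[first_ball_column:first_ball_column + balls_per_drawing]
        drawings.append([int(cell) for cell in cells])
    return drawings

# Made-up sample rows, NOT real results: date, five balls, bonus ball.
sample = io.StringIO(
    "2003-01-01,1,2,3,36,37,9\n"
    "2003-01-04,1,11,21,31,41,5\n"
)

histogram = gap_histogram(read_drawings(sample, first_ball_column=1))
for gap in sorted(histogram):
    print(f"{gap:2d} {'#' * histogram[gap]}")
```

To run it against the real file, swap the sample for open("yourfile.csv", newline="") and set first_ball_column to wherever the first ball sits in that file.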

Aside from picking a good number, there seem to be two other ways of improving the odds. The first is simply to play the same number in drawing after drawing. Play it in 10 drawings and the odds of winning at least once improve to about 1 in 4.78 million. Play it in 100 drawings and they improve to about 1 in 478,000. The second is to play multiple good numbers, which has the same effect.
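
Simply dividing the odds by the number of plays is an approximation, though a very close one when the odds are this long. A minimal Python sketch of the exact figure, assuming each play is a separate, independent drawing (the function name one_in is made up for illustration):

```python
import math

TOTAL_COMBINATIONS = math.comb(44, 5) * 44  # 47,784,352

def one_in(plays):
    """Return N such that the chance of at least one win in `plays` drawings is 1 in N."""
    p = 1 / TOTAL_COMBINATIONS
    # 1 - (1 - p)**plays, written with log1p/expm1 to avoid rounding error for tiny p
    chance = -math.expm1(plays * math.log1p(-p))
    return 1 / chance

for plays in (1, 10, 100):
    print(f"{plays:3d} plays: about 1 in {one_in(plays):,.0f}")
```

Buying the same number several times for a single drawing would not change the chance of winning at all, and buying different numbers for one drawing gives exactly plays / 47,784,352.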

One crazy idea that just occurred to me is to check all possible three and four ball combinations against the database of winning numbers to see if there are any that occur more frequently than others. I wrote another little Perl program to do that. It’s running right now. It will probably take several hours to complete. I don’t hold out much hope of finding anything as I’m sure others must have tried this before as well.
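
For anyone who wants to try the same experiment, here is one possible approach sketched in Python. It is not the Perl program described above, and the three drawings in it are made up rather than real results. Instead of testing every possible combination against the database, it tallies the three ball subsets that actually appear in each drawing.

```python
from collections import Counter
from itertools import combinations

def subset_counts(drawings, size):
    """Count how often each `size`-ball subset appears across all drawings."""
    counts = Counter()
    for balls in drawings:
        # Sorting first means the same subset always produces the same tuple.
        for subset in combinations(sorted(balls), size):
            counts[subset] += 1
    return counts

# Made-up drawings, NOT real results (five main balls each).
drawings = [
    [1, 2, 3, 36, 37],
    [2, 3, 10, 36, 44],
    [5, 17, 23, 30, 41],
]

triples = subset_counts(drawings, 3)
for subset, seen in triples.most_common(3):
    print(subset, seen)
```

Each drawing contributes only 10 three ball subsets and 5 four ball subsets, so looping over the drawings this way should finish quickly even with years of data.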