There was a period in my life when I used to play poker quite a lot, Texas Hold’em specifically. I loved everything about it — trying to anticipate your opponents’ next steps, calculating probabilities based on what little information I had available, placing bets small and large, and managing the psyche, including riding the highs and lows of the game.

In poker, there’s a popular saying, “play the (wo)man, not the cards.” In other words, you won’t have all the information you need about the other cards in the game, so you have to do the best with what you know about your opponents’ playing styles.

I was recently reminded of this when I read award-winning psychologist Maria Konnikova’s The Biggest Bluff, a memoir about her experience becoming a poker champion and what the game taught her about life. In her book, Konnikova discusses the critical thinking that goes into playing poker, how the game teaches you to make better choices with limited information, and strategies for handling the outcomes of those decisions.

As Konnikova notes, we’re often forced to operate with limited information in real life, too. You can predict with only so much certainty how the chips will fall — at the end of the day, it’s just a very educated guess. One of the primary ways we make these educated guesses is through statistics, and one of the most popular statistical methods is Monte Carlo simulations, fittingly named after Monte Carlo, Monaco, a poker capital of the world.

Monte Carlo simulations and unreliable data

As any statistician will tell you, Monte Carlo simulations have many compelling applications, from corporate finance and risk analysis to — you guessed it — poker.

In short, Monte Carlo methods use a large number of randomized trials to estimate an outcome in a complex situation, often when there’s no closed-form analytical model you can use instead.
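To make that concrete, here’s a minimal sketch (not from the original article) of a Monte Carlo estimate for a poker question with a known analytical answer: the probability of being dealt a pocket pair in Texas Hold’em, which works out to 3/51, or about 5.9 percent. The function name and parameters are illustrative.

```python
import random

def estimate_pocket_pair_probability(n_trials=100_000, seed=42):
    """Estimate the probability that two hole cards share a rank
    by repeatedly dealing random two-card hands."""
    rng = random.Random(seed)
    # Represent a 52-card deck as (rank, suit) pairs: 13 ranks x 4 suits.
    deck = [(rank, suit) for rank in range(13) for suit in range(4)]
    hits = 0
    for _ in range(n_trials):
        card1, card2 = rng.sample(deck, 2)  # deal two distinct cards
        if card1[0] == card2[0]:            # same rank -> pocket pair
            hits += 1
    return hits / n_trials

print(estimate_pocket_pair_probability())  # close to 3/51 ~ 0.0588
```

With 100,000 trials the estimate typically lands within a fraction of a percentage point of the true value; more trials tighten the estimate, which is the whole appeal of the method when no clean formula exists.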

Even with techniques like Monte Carlo, however, data is rarely complete or accurate, and even the most holistic dataset and experiment conditions should be met with scrutiny.

We refer to this all-too-common reality of unreliable data as data downtime. Strictly speaking, data downtime refers to periods of time when data is missing, inaccurate, or otherwise erroneous — and everyone from poker players to data teams is on the receiving end.

What this means for the U.S. elections

The upcoming 2020 U.S. election is an interesting event in and of itself, but also because the outcomes of such events are incredibly hard to predict (take the 2016 presidential election, for example).

In this diagram from FiveThirtyEight, Nate Silver and team visualize which presidential candidate, Democratic nominee Joe Biden or Republican nominee and incumbent Donald Trump, is most likely to win each state based on near real-time polling data. Image courtesy of FiveThirtyEight.

Every election cycle, FiveThirtyEight uses Monte Carlo methods to estimate election results. They run 40,000 Monte Carlo simulations across states to generate a range of possible outcomes, ranked by their likelihood of occurring. FiveThirtyEight’s team of data scientists and journalists draws the data for its model from state polls, which it combines with demographic, economic, and other data to forecast who will win. As the election draws closer and more polling information becomes available, the forecast becomes less uncertain.
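The core mechanic can be sketched in a few lines. Below is a deliberately toy version — hypothetical states, made-up win probabilities, and independent coin flips per state — whereas FiveThirtyEight’s real model also accounts for correlated polling error across states, among many other things.

```python
import random

def simulate_election(state_probs, n_sims=40_000, seed=7):
    """Toy Monte Carlo forecast: each state is an independent weighted
    coin flip for candidate A; return the share of simulations in which
    A reaches a majority of the total electoral votes."""
    rng = random.Random(seed)
    total_votes = sum(ev for ev, _ in state_probs.values())
    to_win = total_votes // 2 + 1
    wins = 0
    for _ in range(n_sims):
        votes = sum(ev for ev, p in state_probs.values() if rng.random() < p)
        if votes >= to_win:
            wins += 1
    return wins / n_sims

# Hypothetical three-state race: state -> (electoral votes, P(A wins state))
states = {
    "State X": (20, 0.60),
    "State Y": (16, 0.50),
    "State Z": (10, 0.45),
}
print(simulate_election(states))  # A's estimated overall win probability
```

The output is only as trustworthy as the per-state probabilities fed in, which is exactly where data downtime bites: stale or erroneous polling data corrupts every one of the 40,000 simulations downstream.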

I have yet to meet a data scientist who is willing to admit that their data is “perfect” or that their forecasts are 100 percent reliable. The elections, too, are not safe from data issues.

What this means for you

Election forecasts are not the only place where data downtime hits home and can affect us personally. An even starker case was the 2020 U.S. Census, the decennial count of the U.S. population that determines the number of seats each state holds in the U.S. House of Representatives, as well as how billions of dollars in federal funds are distributed to local communities. Like FiveThirtyEight, the U.S. Census Bureau also uses Monte Carlo simulations, in its case to evaluate the quality of new statistical methodology and to analyze measurement errors in demographic sample surveys.

In 2020, the U.S. Census data collection process was plagued with data downtime issues, such as outdated technology, duplicate addresses, and a shortened deadline as a result of COVID-19. Taken together, these factors affected the integrity and accuracy of the census, standing in the way of democracy and proving that data downtime isn’t just a business problem.

The bottom line: data is personal, and events like these make it even more evident that we need to treat data with the diligence it deserves. As the best statisticians will tell you, not even a great Monte Carlo model can save unreliable data.

Love poker and Monte Carlo simulations? Reach out to Barr Moses.