Machine Learning for March Madness Is a Competition In Itself


This year, 47 million Americans will spend an estimated $8.5 billion betting on the outcome of the NCAA basketball tournament, a cultural ritual appropriately known as March Madness. Before the tournament starts, anyone who wants to place a bet must fill out a bracket, which holds their predictions for each of the tournament's 63 games. The winner of a betting pool is the one whose bracket most closely mirrors the actual results of the tournament.

For most people, making a bracket is a way to flex their knowledge of collegiate basketball and maybe make a few bucks by outguessing their colleagues in the office betting pool. But for the mathematically inclined, accurately predicting March Madness brackets is a technical problem in search of a solution.

In the past few years, the proliferation of open source machine learning tools and robust, publicly available datasets has added a technological twist to March Madness: Data scientists and statisticians now compete to develop the most accurate machine learning models for bracket predictions. In these competitions, knowing how to wield random forests and logistic regression counts for more than court smarts. In fact, knowing too much about basketball might hurt your odds. Welcome to the world of Machine Learning Madness.

Betting and sports have always been closely linked, but as the size of professional and collegiate leagues ballooned during the latter half of the 20th century, predicting the outcomes of sporting competitions became exponentially more difficult. In 1939, just eight teams competed in the inaugural NCAA basketball tournament, which made the odds of filling out a perfect bracket one in 128. When the tournament expanded to 16 teams in 1951, those odds fell to one in 32,768, but that is still pretty good compared with your chances of filling out a perfect 64-team bracket today: around one in 9.2 quintillion.
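The arithmetic behind these figures is short. A single-elimination tournament with n teams needs n - 1 games, because each game eliminates exactly one team, and if every game is treated as a 50-50 coin flip, the chance of a perfect bracket is 1 in 2^(n - 1). The helper below is a hypothetical illustration that reproduces the numbers above; the 64-team figure ignores the First Four play-in games that have expanded the field to 68 teams since 2011.

```python
def perfect_bracket_odds(num_teams: int) -> int:
    """Return N such that a perfect bracket has a 1-in-N chance,
    assuming every game is an independent 50-50 coin flip."""
    games = num_teams - 1  # every game eliminates exactly one team
    return 2 ** games


for teams in (8, 16, 64):
    print(f"{teams} teams: 1 in {perfect_bracket_odds(teams):,}")
# 8 teams: 1 in 128
# 16 teams: 1 in 32,768
# 64 teams: 1 in 9,223,372,036,854,775,808
```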

There's an important caveat here, however. These odds are calculated as if each team had a 50-50 chance of winning each game in the tournament, but in reality, some teams have a clear advantage over their opponents. For example, in the first round of March Madness the highest-ranked teams (the first seeds) are pitted against the lowest-ranked teams (the sixteenth seeds) in each region. Given that a sixteenth seed has beaten a first seed only once in the history of March Madness, the outcomes of these games can be treated as a near certainty. Jonathan Mattingly, a math professor at Duke University, calculated that accounting for lopsided matchups like these improves the odds of selecting a perfect bracket by roughly six orders of magnitude, to a still-measly one in 2.4 trillion.
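Once games stop being coin flips, the chance of a perfect bracket becomes the product of the probabilities that each individual pick is right (for later rounds, the probability conditional on the earlier picks having been correct). The sketch below assumes games are independent and uses made-up probabilities; it is only an illustration of the mechanics, not a reproduction of Mattingly's calculation.

```python
import math


def perfect_bracket_probability(pick_probs):
    """Chance that every pick is correct, given the probability of each
    pick being right and assuming the games are independent."""
    return math.prod(pick_probs)


# Hypothetical inputs: the four 1-vs-16 games picked at 99% confidence,
# every other game still treated as a coin flip.
coin_flips_only = [0.5] * 63
with_favorites = [0.99] * 4 + [0.5] * 59

for label, probs in [("all coin flips", coin_flips_only),
                     ("favorites at 0.99", with_favorites)]:
    p = perfect_bracket_probability(probs)
    print(f"{label}: about 1 in {1 / p:,.0f}")
```

Fixing only four games at exact certainty (probability 1.0) divides the 64-team figure by 2^4 = 16; every additional pick made with better-than-even confidence shrinks it further.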

In short, you have a far better chance of winning the Powerball jackpot (about one in 300 million) than you do of filling out a perfect March Madness bracket. The challenge for statisticians, then, is developing mathematical models that improve these dismal odds as much as possible. Tournament modeling, or "bracketology," is a nearly alchemical process that involves identifying the most important factors in a team's success and combining them in a way that produces the most accurate possible prediction of the team's future performance.

These models will never be perfect, of course. There’s simply too much randomness in the system being modeled—players get injured, rosters change, coaches quit, and so on. This “noise” is something that no model will ever be able to fully anticipate. “The point is to try to find the trend and be more accurate than if you’re just going with your gut,” says Tim Chartier, an associate professor of mathematics at Davidson College, where he teaches a class on bracketology. “There’s only so much you can expect out of the model and then you just have to watch it play out with the randomness taking effect.”

The whole point of machine learning is to find meaningful trends among the noise, so using these techniques to predict NCAA champions makes perfect sense. Over the last few years, a steadily growing number of data scientists have competed in Machine Learning Madness, which invites participants to leverage machine learning techniques to create their NCAA tournament brackets. The contest is hosted on Kaggle, a Google-owned platform that is a cross between Stack Exchange and GitHub specifically designed for data scientists.

Machine Learning Madness was launched in 2014 by Jeff Sonas, the owner of a database consulting firm who also designed a chess ranking method, and Mark Glickman, a statistician at Harvard. Sonas and Glickman had previously organized Kaggle competitions around chess tournaments, but “it was a relatively obscure area so we [realized] we would have greater outreach if we did a more popular topic like March Madness,” Sonas says.

In the five years since Machine Learning Madness started, Sonas says the number of entrants to the competition has nearly tripled. This year, 955 competitors are vying for a total of $25,000 in prize money that will be distributed to the creators of the five most accurate brackets. But to take home the grand prize, it's not enough merely to have the most accurate bracket. Participants must also have predicted those outcomes with a high degree of confidence.

Before the NCAA tournament begins, Machine Learning Madness participants are given access to a massive trove of data that includes basic information like the scores for every Division I basketball game dating back to 1984, team box scores dating back to 2002, and the team rankings from dozens of different rating systems compiled by Kenneth Massey. This means that participants can use machine learning to do their own regression analyses and create their own rating systems. If they don't feel like digging into basketball stats, they can use machine learning "ensembling" techniques to combine the results of the dozens of existing rating systems.
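The simplest form of ensembling is averaging: take each team's rank across several rating systems, average them into a consensus rank, and turn the gap between two teams' consensus ranks into a win probability. The sketch below assumes hypothetical system names and ranks, and the logistic `scale` constant is an arbitrary assumption rather than a fitted value; real entries typically learn such weights from historical tournament results.

```python
import math
from statistics import mean

# Hypothetical ordinal rankings (1 = best) from three made-up rating systems.
rankings = {
    "system_a": {"Virginia": 1, "Purdue": 9, "Duke": 2},
    "system_b": {"Virginia": 2, "Purdue": 7, "Duke": 1},
    "system_c": {"Virginia": 1, "Purdue": 12, "Duke": 3},
}


def consensus_rank(team: str) -> float:
    """Average a team's rank across all rating systems (a simple ensemble)."""
    return mean(system[team] for system in rankings.values())


def win_probability(team: str, opponent: str, scale: float = 0.1) -> float:
    """Map the consensus-rank gap to a probability with a logistic curve.
    `scale` is an assumed tuning constant, not a value learned from data."""
    gap = consensus_rank(opponent) - consensus_rank(team)  # > 0 if team ranks better
    return 1 / (1 + math.exp(-scale * gap))


print(round(win_probability("Virginia", "Purdue"), 2))  # ≈ 0.69
```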

Regardless of their technique, participants must predict the outcome of each of the roughly 2,000 possible NCAA tournament matchups. For each matchup, competitors submit not just a winner but a probability between zero and one expressing how certain they are of that outcome. Entries are scored with log loss, which punishes confident predictions that turn out wrong far more severely than cautious ones, while rewarding confident predictions that turn out right. For example, if I predicted that Virginia would beat Purdue with 0.9 certainty and Purdue ended up winning, my penalty would be roughly two and a half times as large as if I had made the same pick with, say, 0.6 certainty.
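The penalty comparison above follows from the standard binary log loss, where lower scores are better: a wrong pick made at confidence p costs -ln(1 - p). The sketch below is a generic implementation of that formula, with a small clipping constant `eps` added as an assumption to avoid taking the log of zero; it is not Kaggle's scoring code.

```python
import math


def log_loss(predicted_prob: float, outcome: int, eps: float = 1e-15) -> float:
    """Binary log loss for one game.

    predicted_prob: stated probability that the first team wins.
    outcome: 1 if the first team won, 0 if it lost.
    eps: clipping bound so math.log never receives 0.
    """
    p = min(max(predicted_prob, eps), 1 - eps)
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))


def mean_log_loss(predictions, outcomes):
    """Average log loss over many games; this is the number to minimize."""
    return sum(log_loss(p, y) for p, y in zip(predictions, outcomes)) / len(outcomes)


# Virginia picked to beat Purdue, but Purdue wins (outcome = 0).
confident = log_loss(0.9, 0)  # -ln(0.1) ≈ 2.30
cautious = log_loss(0.6, 0)   # -ln(0.4) ≈ 0.92
print(f"{confident:.2f} vs {cautious:.2f}, ratio {confident / cautious:.2f}")
```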

Michael Todisco, a data scientist at the event marketing software company Splash, entered Machine Learning Madness for the first time last year. He says he’s always been an analytically minded sports fan and entered the competition on a whim. After Villanova trounced Michigan to win last year’s national championship, Todisco says he was surprised to learn that he had won Machine Learning Madness and would be taking home the $25,000 first prize.

According to Todisco, the hardest part about the contest was the small amount of data available to train machine learning algorithms and the outsized role that luck played in the predictions. When it comes to machine learning, more data is almost always better. And while Todisco bemoaned the lack of March Madness data for training machine learning algorithms relative to training them for other tasks, it’s a far more complete dataset than most sports statisticians were working with only a few decades ago.

Todisco says it did take a while to figure out which machine learning approach would work best for the relatively limited amount of training data. The approach he eventually chose was a random forest algorithm, which trains many decision trees on random subsets of the data and averages their outputs to estimate the probability of each outcome. Using the algorithm, Todisco was able to see how altering the values of various parameters affected the accuracy of his model's predictions; he could fine-tune the model by slightly altering the parameters each time it was run.
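To make the idea concrete, here is a deliberately tiny random forest in plain Python. Each tree is a one-split "stump" trained on a bootstrap sample of games and a randomly chosen feature, and the forest averages the trees' win rates into a probability. The toy data, feature meanings, and parameter values (`n_trees`, `seed`) are all hypothetical, and real libraries grow much deeper trees; this is a sketch of the technique, not Todisco's model.

```python
import random


def make_toy_games(n=150, seed=1):
    """Hypothetical games: x[0] is an informative gap (e.g. a rating
    difference), x[1] is pure noise; label 1 means the first team won."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-10, 10), rng.uniform(-10, 10)]
        rows.append((x, 1 if x[0] + rng.gauss(0, 3) > 0 else 0))
    return rows


def fit_stump(rows, feature):
    """One-split tree on one feature: (feature, threshold, left_rate, right_rate)."""
    overall = sum(y for _, y in rows) / len(rows)
    best = (len(rows) + 1, float("inf"), overall, overall)  # fallback: no split
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        if not left or not right:
            continue
        # Misclassifications if each side predicted its majority label.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if errors < best[0]:
            best = (errors, threshold, sum(left) / len(left), sum(right) / len(right))
    _, threshold, left_rate, right_rate = best
    return feature, threshold, left_rate, right_rate


def fit_forest(rows, n_trees=30, seed=0):
    """Each tree sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        forest.append(fit_stump(sample, rng.randrange(n_features)))
    return forest


def predict_proba(forest, x):
    """Average the trees' leaf win rates into a win probability."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


forest = fit_forest(make_toy_games())
print(predict_proba(forest, [8.0, 0.0]))   # favorite on the informative feature
print(predict_proba(forest, [-8.0, 0.0]))  # underdog on the informative feature
```

Changing `n_trees` or `seed` and re-scoring on held-out games is the kind of parameter tweaking the paragraph above describes.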

At the heart of any March Madness model is the team ranking, an ordinal list based on the ratings of the constituent teams. These ratings are based on a few variables. The most obvious is a team's win-loss record, and some rating systems are based entirely on this metric. But trying to predict the results of a game like basketball using only a team's win-loss record is a bit like trying to perform surgery with a hammer. It ignores a lot of details that are important for accurately assessing the relative strength of two teams. For example, a team that wins by only one point is much more evenly matched with its opponent than a team that wins by 30 points. If you were to make a prediction based only on who won a game without considering the margin of victory, you might overestimate the likelihood that the victor will win again.
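One classic way to build margin of victory into a rating is the least-squares approach often associated with Kenneth Massey: find ratings r such that r[winner] - r[loser] comes as close as possible to each game's margin. The sketch below solves the resulting normal equations with plain Gaussian elimination; the three teams and their scores are hypothetical, chosen so the exact answer is easy to check by hand.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def massey_ratings(games, teams):
    """Least-squares ratings from (winner, loser, margin) tuples."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]  # games played / head-to-head counts
    p = [0.0] * n                      # cumulative point differential
    for winner, loser, margin in games:
        w, l = idx[winner], idx[loser]
        M[w][w] += 1
        M[l][l] += 1
        M[w][l] -= 1
        M[l][w] -= 1
        p[w] += margin
        p[l] -= margin
    # Ratings are only defined up to a constant, so replace the last
    # equation with the constraint that all ratings sum to zero.
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


games = [("A", "B", 10), ("B", "C", 1), ("A", "C", 11)]  # hypothetical results
print(massey_ratings(games, ["A", "B", "C"]))  # about A=7, B=-3, C=-4
```

B and C each have one win, but the ratings separate them by only a point, while A sits ten points clear: information a pure win-loss record would miss.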

The tricky part for statisticians is determining not only which variables are relevant to predicting a team's performance, but also the importance, or weight, of each variable relative to the others. In this respect, Todisco says he found strength of schedule, a team's number of assists, and three-point defense percentage to be strong indicators of a team's future performance.
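Logistic regression, one of the techniques mentioned earlier, is a common way to learn such weights from data: it fits one coefficient per variable so that the predicted win probabilities minimize log loss. The sketch below generates synthetic games from known weights and checks that gradient descent recovers their signs. The feature names, true weights, learning rate, and epoch count are all hypothetical, and this is not the method Todisco used (he chose a random forest).

```python
import math
import random

# Hypothetical per-game feature gaps between the two teams.
FEATURES = ["strength_of_schedule_gap", "assists_gap", "opp_three_pt_pct_gap"]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def make_toy_games(n=500, seed=2):
    """Synthetic games drawn from known weights so the fit can be checked.
    A positive opponent three-point-percentage gap means worse defense."""
    rng = random.Random(seed)
    true_w = [1.0, 0.5, -0.8]
    rows = []
    for _ in range(n):
        x = [rng.uniform(-2, 2) for _ in FEATURES]
        p = sigmoid(sum(w * xi for w, xi in zip(true_w, x)))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def fit_logistic(rows, lr=0.5, epochs=300):
    """Batch gradient descent on mean log loss; returns (weights, bias)."""
    w = [0.0] * len(rows[0][0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


weights, bias = fit_logistic(make_toy_games())
for name, weight in zip(FEATURES, weights):
    print(f"{name}: {weight:+.2f}")
```

Because all three features share the same scale here, the fitted coefficients can be compared directly; with real statistics, features would need standardizing first.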

The biggest benefit of using machine learning to create his bracket, Todisco says, is that it “takes the human bias out of it.” For example, he says, “my model said [Loyola] had a 60 percent chance of beating University of Miami, which I would never have thought of without machine learning.”

The adoption of machine learning techniques isn't limited to the amateur bracketologists in the Kaggle competition, however. In August, the NCAA announced it was scrapping the Rating Percentage Index (RPI), a system it had used since 1981 to create the official ranking of the 353 Division I men's basketball teams. In its stead it would use the NCAA Evaluation Tool (NET), a new rating system that was developed using machine learning methods.

A team's RPI is a number that is supposed to quantify its relative strength compared with other teams in the division. This number is calculated by combining the team's winning percentage (the number of games won divided by the number of games played), its opponents' winning percentage, and the winning percentage of its opponents' opponents, while also taking into account whether those wins occurred at home or away (home wins count for less than away wins).
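In its standard form, the RPI weights these three ingredients 25/50/25: a quarter for the team's own winning percentage, half for its opponents', and a quarter for its opponents' opponents'. The location multipliers below (0.6 for home wins, 1.4 for road wins, with losses weighted the opposite way) are the values commonly cited for the men's game and are an assumption here, since the text above only says home wins count for less. The sketch also simplifies: the real opponents' percentage excludes games against the team being rated, and the caller is left to supply `owp` and `oowp`.

```python
# Assumed location multipliers for the team's own winning percentage.
WIN_WEIGHT = {"home": 0.6, "neutral": 1.0, "away": 1.4}
LOSS_WEIGHT = {"home": 1.4, "neutral": 1.0, "away": 0.6}


def weighted_win_pct(games):
    """games: list of (won, location) with location 'home', 'neutral' or 'away'."""
    wins = sum(WIN_WEIGHT[loc] for won, loc in games if won)
    losses = sum(LOSS_WEIGHT[loc] for won, loc in games if not won)
    return wins / (wins + losses)


def rpi(wp, owp, oowp):
    """25% own winning pct, 50% opponents', 25% opponents' opponents'."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Splitting two road games counts for more than splitting two home games.
road_split = weighted_win_pct([(True, "away"), (False, "away")])  # 1.4 / 2.0 = 0.7
home_split = weighted_win_pct([(True, "home"), (False, "home")])  # 0.6 / 2.0 = 0.3
print(rpi(road_split, 0.5, 0.5), rpi(home_split, 0.5, 0.5))
```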

The RPI was used by the NCAA championship selection committee to help determine which teams would compete in the tournament each year and how those teams would be seeded. In theory, anyone filling out a March Madness bracket could simply look at the NCAA's official ratings to determine how the tournament would play out. There would be upsets, of course, but if you just picked the NCAA's higher-ranked team in each matchup, your bracket should come pretty close to the actual results of the tournament.

The reality, however, was much different. In fact, the NCAA's official rating system produced the second-worst March Madness results of the 75 different rating systems tracked by the sports statistician Kenneth Massey in 2017. Although the official rating method had been criticized for its inaccuracy for years, it wasn't until just before the start of this year's collegiate basketball season that the NCAA revealed it would be using the NET rating system to help select teams for the tournament going forward.

The NCAA didn't respond to my request for comment, but according to a press release describing the new system, NET incorporates far more variables into a team's rating than the RPI did. In addition to winning percentages, NET also factors in a team's strength of schedule, game location, scoring margin (capped at 10 points), and "net offensive and defensive efficiency." In a break with tradition, the NCAA hasn't released the exact formula for the new rating system, but it did say the model was optimized using machine learning techniques that used late-season games, including tournament games, as training data.
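Since the NET formula is unpublished, the sketch below illustrates only what a 10-point cap on scoring margin means: a 30-point blowout counts the same as a 10-point win, which limits the reward for running up the score. Applying the cap symmetrically to losses, and averaging margins per game, are assumptions made for illustration; the scores are hypothetical.

```python
def capped_margin(points_for: int, points_against: int, cap: int = 10) -> int:
    """Scoring margin clipped to the range [-cap, cap]."""
    return max(-cap, min(cap, points_for - points_against))


games = [(90, 60), (70, 69), (55, 80)]  # hypothetical scores: (team, opponent)
raw = sum(f - a for f, a in games) / len(games)                  # (30 + 1 - 25) / 3 = 2.0
capped = sum(capped_margin(f, a) for f, a in games) / len(games)  # (10 + 1 - 10) / 3
print(raw, capped)
```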

Machine learning is a field that is both full of promise and woefully overhyped. We’ll have to wait to see the final results of the NCAA championship to determine whether it helped to create a more accurate official ranking, but if Machine Learning Madness has proved anything, it’s that the future of collegiate basketball is as much about building networks as cutting down the nets.
