One of my favorite times of the year falls between the middle of March and the beginning of April, also known as March Madness. I grew up hooked on college basketball: my parents are alumni of the University of Connecticut (home to arguably the best D1 women's basketball program in the country), and since my dad and his friend buy annual season tickets, I would go to games whenever his friend was unavailable. Attending a D1 school for my doctorate at UNC-Chapel Hill has reignited my love of the game.
As many other people do, I created brackets for both the women's and men's tournaments. I pored over statistics and team schedules, trying to use the best combination of factors (wins and losses against highly ranked teams, how often opponents stole the ball, average points scored per game, etc.) to support my picks. Unfortunately, both brackets were busted early on. In fact, I also made a set of joke brackets in which I simply picked my favorite schools to advance in each round, and embarrassingly enough, that essentially random women's bracket has so far gotten more games right than the one I painstakingly cobbled together.
For what it is worth, this is not a surprising outcome. In Dae Hee Kwak's study, "The Overestimation Phenomenon in a Skill-Based Gaming Context: The Case of March Madness Pools", people who reported more confidence in their picks were about as accurate as those who reported less. Regardless of how many games you watch or how much effort you put into tracking team statistics, it is tough to take every variable into account and weight each one appropriately by hand. A lot of effort, however, goes into building computational models that can give a more accurate answer.
One of the most well-known prediction models comes from FiveThirtyEight, which estimates the likelihood of each team winning each round before the tournament begins. Without going into too much detail, for the men's teams, ¾ of the input data comes from averaging multiple "power ratings" that assess the competitive strength of each team (in other words, how well each team performs relative to its overall schedule). For example, one source called LRMC (Logistic Regression/Markov Chain) records each game's location, the two teams playing, and the margin of victory. If we pick a team at random (call them A) and one of their games against another team (call them B), we use the game's outcome to estimate the probability that team A is better than team B, given that team A won by X points; the model is then tweaked to account for any home-court advantage one team may have over the other. Over many games, each team can be reduced to a "steady-state" probability, i.e., its overall likelihood of being the better team across all matchups, and teams are ranked in descending order of this probability.

The other ¼ of the input data comes from sources such as the NCAA selection committee's "S-curve" (the seeding list used to make bracket decisions) and pre-season team rankings. These pieces of information are combined to produce each team's pre-tournament rating. Adjustments to this rating account for major injuries and player suspensions, performance in real time (for example, a lower-ranked team that handily beats a higher-ranked team in the first round will see its rating improve), and travel fatigue going from game to game.
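To make the steady-state idea concrete, here is a minimal sketch of a Markov-chain rating in Python. This is not FiveThirtyEight's or LRMC's actual code: the teams and the pairwise "better team" probabilities below are made up, and a real model would estimate those probabilities from point margins and home-court adjustments as described above.

```python
import numpy as np

# Hypothetical teams and head-to-head estimates (for illustration only).
teams = ["A", "B", "C"]

# p[i][j] = estimated probability that team i is better than team j,
# which a real LRMC-style model would infer from margin of victory.
p = np.array([
    [0.00, 0.75, 0.60],
    [0.25, 0.00, 0.55],
    [0.40, 0.45, 0.00],
])

n = len(teams)
# Build a Markov chain: from team i, move to team j with probability
# proportional to the chance that j is better than i; otherwise stay put.
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = p[j, i] / (n - 1)   # chance of switching from i to j
    T[i, i] = 1.0 - T[i].sum()            # chance of sticking with team i

# The steady-state distribution pi satisfies pi = pi @ T;
# repeated multiplication (power iteration) converges to it.
pi = np.full(n, 1.0 / n)
for _ in range(1000):
    pi = pi @ T

# Rank teams in descending order of steady-state probability.
for team, score in sorted(zip(teams, pi), key=lambda t: -t[1]):
    print(f"{team}: {score:.3f}")
```

The intuition is that of a fickle fan who always backs one team but, after seeing a game, switches to the opponent with a probability reflecting how convincingly the opponent seemed better; the fraction of time the fan spends backing each team in the long run is that team's rating.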
Making predictions has even grown into an annual competition: each year, people from around the world participate in Kaggle's "March Machine Learning Mania". Entrants are given a wealth of information on every game spanning more than a decade, such as game locations, the teams playing each other, free throws attempted (and made) by each team, three-pointers attempted (and made), offensive and defensive rebounds, points scored, past tournament results, and much more; they are also encouraged to bring in outside information. Entrants use this information to build a model that calculates the probability of each team winning each matchup, scored first against the past five March Madness tournaments and then separately against the current one.
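As a toy illustration of what a bare-bones entry might look like, here is a sketch that predicts wins from a single feature, the difference in seeds, using logistic regression. The simulated data is entirely made up, and real entries use far richer features; the competition has been scored with log loss, which rewards well-calibrated probabilities over bold, overconfident picks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Simulated toy data: one row per historical game (hypothetical, not Kaggle's).
# Feature: seed difference (team1_seed - team2_seed); label: 1 if team1 won.
rng = np.random.default_rng(0)
seed_diff = rng.integers(-15, 16, size=500).reshape(-1, 1)
# Lower seed numbers mean better teams, so negative differences favor team 1.
win_prob = 1.0 / (1.0 + np.exp(0.15 * seed_diff.ravel()))
team1_won = rng.random(500) < win_prob

model = LogisticRegression()
model.fit(seed_diff, team1_won)

# Probability that a #3 seed beats a #14 seed (difference = 3 - 14 = -11).
print(model.predict_proba([[-11]])[0, 1])

# Log loss heavily penalizes confident wrong answers, so calibration matters.
preds = model.predict_proba(seed_diff)[:, 1]
print(log_loss(team1_won, preds))
```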
So, how are the models doing this year? Unsurprisingly, given that the number of upsets was unusually high, the models miscalled a lot of games; for example, the LRMC predictions did not foresee Loyola-Chicago taking out the top-ranked team in the Midwest region (Illinois), or Oral Roberts University (the #15 seed in the South region) beating Ohio State (the #2 seed). Furthermore, comparing the current Kaggle leaderboard to 2019's suggests that entrants are far less accurate this year. But in the end, when there are more than 100 billion possible brackets (even when making educated guesses about the outcomes), some wrong answers are understandable.
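For scale, the raw bracket count is a one-liner: 64 teams means 63 games in the main bracket, each with two possible winners. Estimates that fold in seed information (the "educated guesses" above) are commonly cited as shrinking the odds of a perfect bracket to roughly 1 in 100+ billion, which is the figure referenced here.

```python
# 64 teams -> 63 games in the main bracket, each with two possible winners.
total_brackets = 2 ** 63
print(f"{total_brackets:,}")  # 9,223,372,036,854,775,808 possible brackets
```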
Basketball Terminology:
Busted bracket: When someone picks the wrong team to win a matchup and advance to the next round, ending their streak of 100% correct predictions.
D1: Division I school; the most competitive division, with the best athletes and teams as well as the largest budgets for sports programs.
Seed: Ranking of the 64 college basketball teams that qualify for the NCAA tournament, determined by the NCAA selection committee based on each team's performance and competitiveness throughout the season. Seeding determines the first-round matchups in the tournament bracket; teams are seeded from #1 (best) to #16 (worst) within their region, with four regions in total.
Upsets: When a lower-seeded team beats a higher-seeded team (with the exception of a #9 seed beating a #8 seed, which is not counted as an upset).
Peer edited by Riya Gohil