Last week I posted my NCAA Bayesian Ratings and methodology. Today I thought I’d update the numbers quickly and add a new twist.
What is the objective in basketball? To win the game! When building a predictive rating system (like this Bayesian method), or even when trying to tell how good teams have been over this season (KenPom’s ratings), we account for margin of victory rather than just wins and losses. But if the goal is to be perfectly “just” in choosing who should be in the NCAA tournament, why should margin of victory matter at all? I’m going to craft a rating system here that doesn’t look at margin of victory. I’ll call it DSMRPI, since the RPI system is what it would supersede.
Ken Pomeroy lists each team’s opponent strength as a Pythagorean winning percentage (against average foes). We can also easily calculate each team’s own winning percentage. How does one merge the two into an overall win%? Convert each percentage into a Z-score, add the two Z-scores, and convert the sum back into a win%. In spreadsheet terms: Normsdist(Normsinv(Win%)+Normsinv(OppPyth%)). Easy!
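For anyone who wants to try this outside a spreadsheet, here is a minimal Python sketch of the same formula. The function name merged_win_pct and its argument names are mine, not anything from the spreadsheet, and I’m assuming both percentages are passed as fractions strictly between 0 and 1.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def merged_win_pct(win_pct: float, opp_pyth_pct: float) -> float:
    """Sketch of Normsdist(Normsinv(Win%) + Normsinv(OppPyth%)).

    Both inputs are assumed to be fractions strictly between 0 and 1;
    inv_cdf raises statistics.StatisticsError for 0 or 1.
    """
    # Normsinv: turn each winning percentage into a Z-score.
    z_record = _STD_NORMAL.inv_cdf(win_pct)
    z_schedule = _STD_NORMAL.inv_cdf(opp_pyth_pct)
    # Normsdist: turn the summed Z-score back into a winning percentage.
    return _STD_NORMAL.cdf(z_record + z_schedule)
```

A team facing an exactly average schedule (OppPyth% of 0.5) contributes a Z-score of zero for its schedule, so its merged win% is just its own win%. A tougher schedule pushes the number up, and an easier one pulls it down.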
The top 10 teams:
Except… what happens if a team has never lost or has never won? Then Normsinv() is undefined, because the Z-score is either +infinity or -infinity. Great.
So here’s an alternate approach: assign each win a constant positive value and each loss the same value with the sign flipped. Sum those values and add the result to the opponent efficiency differential to create an overall “efficiency differential” based only on opponent strength and win-loss record. To tune that constant, I maximized two things: the correlation between the new number and the Z-scores calculated previously, and the correlation between the rank orders of the new number and the Z-score.
A value between 18 and 21 per win works best, and there isn’t much of a difference within that range, so I’ll use 20 for simplicity’s sake. That means every win counts as +20 and every loss as -20, which amounts to assuming a margin of victory of 20 in every game. This harms teams with outlying records, but I think the Z-score is actually less valid than this method for teams with records like 29-1.
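Here is a rough Python sketch of that calculation. The function name dsmrpi_value and its arguments are placeholders of mine. One loud assumption: the text above says to sum the win and loss values, but this sketch averages them per game by default so the result stays on the scale of a single-game margin. Pass per_game=False to get the literal sum instead.

```python
def dsmrpi_value(
    wins: int,
    losses: int,
    opp_eff_diff: float,
    win_value: float = 20.0,
    per_game: bool = True,
) -> float:
    """Record-only "efficiency differential" sketch.

    Every win counts as +win_value and every loss as -win_value.
    per_game=True (an assumption, not stated in the post) averages
    that total over games played; per_game=False keeps the raw sum.
    """
    games = wins + losses
    if games == 0:
        raise ValueError("team has not played any games")
    # +win_value per win, -win_value per loss.
    record_value = win_value * (wins - losses)
    if per_game:
        record_value /= games
    # Add the opponent efficiency differential (schedule strength).
    return opp_eff_diff + record_value
```

Unlike the Z-score version, this stays finite for an unbeaten or winless team: a perfect record simply tops out at +20 per game plus whatever the schedule contributes.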
So the final DSMRPI ratings look like this:
Kansas is below San Diego St. because Kansas played more home games, which pulls its SoS and overall value downward.
So now, with DSMRPI in hand, we have a good idea of who SHOULD get into the NCAA tournament. I have added an automatic seeding capability to my spreadsheet, along with a few other cool tweaks.
I also added a “Heat” column, showing how the Bayesian Rating has changed over the last 10 games.
Open the FULL SPREADSHEET for all of the bells and whistles!
