APBRmetrics The statistical revolution will not be televised.
Ed Küpfer
Joined: 30 Dec 2004 Posts: 785 Location: Toronto
Posted: Tue Jan 25, 2005 8:03 pm Post subject: Off Topic: NFL Pythagorean Win%
I wrote the following for another forum, but figured someone here might be interested.
Pythagorean projections take the form
Code: | Win% = Pts^x / (Pts^x + OppPts^x) |
where x is some exponent. The exponent that minimizes errors for team seasons between 1991 and 2004 is 2.45, which gives an RMSE of 0.0766 in Win%. In English, this means that roughly 2/3 of all teams' projected win totals will fall within 1.2 wins of their actual win totals over a 16-game season. (BTW if you use 2 or 3 as the exponent, you'll get an RMSE of 1.32 games and 1.29 games respectively, meaning you don't gain much accuracy with the extra decimal places.) IIRC the RMSE for MLB Pythagorean projections was around 3 games, which makes them about 4 times more accurate than the NFL projections relative to season length. The RMSE for NBA was around 3.5 games, or about twice as accurate as the NFL Pythagorean projections.
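For anyone who wants to try this at home, here is a minimal Python sketch of the calculation. The function names and the sample team seasons are made up for illustration (they are not real NFL data), and the grid search is just one simple way to find the minimizing exponent, not necessarily how the 2.45 figure was obtained. The 16-game season length is the NFL regular season for 1991-2004.

```python
import math

def pythagorean_win_pct(pts, opp_pts, x=2.0):
    """Win% = Pts^x / (Pts^x + OppPts^x)."""
    return pts ** x / (pts ** x + opp_pts ** x)

def rmse(teams, x):
    """Root-mean-square error between actual and projected Win%.

    `teams` is a list of (pts, opp_pts, actual_win_pct) tuples.
    """
    sq_errors = [(actual - pythagorean_win_pct(pts, opp, x)) ** 2
                 for pts, opp, actual in teams]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def best_exponent(teams, lo=1.0, hi=4.0, step=0.01):
    """Grid-search the exponent in [lo, hi] that minimizes RMSE."""
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda x: rmse(teams, x))

# Made-up team seasons, for illustration only (not real NFL data).
sample = [(400, 300, 0.750), (350, 350, 0.500), (280, 390, 0.3125)]
print(best_exponent(sample))

# Converting an RMSE in Win% units to wins over a 16-game season.
print(round(0.0766 * 16, 2))  # 1.23
```

The last line is where the "within 1.2 wins" figure comes from: the RMSE in Win% multiplied by the number of games.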
Somewhere, someone named Patriot came up with PythagoPat, which takes the form of the Pythagorean equation, but with the following as the exponent:
Code: | x = ((Pts + OppPts) ^ 0.28) |
This outperforms the regular Pyth for MLB (as I recall), and does so for NFL ball as well. Using 0.14 instead of 0.28 in the exponent, we get an RMSE of 0.0761, a minimal improvement over the Pyth.
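A short Python sketch of the PythagoPat variant, following the formula exactly as written above. The function names and the parameter name `k` are mine. The post doesn't say whether Pts and OppPts are season totals or per-game figures, so the sketch simply uses whatever it is given; treat that choice as an open assumption.

```python
def pythagopat_exponent(pts, opp_pts, k=0.28):
    """x = (Pts + OppPts) ^ k, as written in the post."""
    return (pts + opp_pts) ** k

def pythagopat_win_pct(pts, opp_pts, k=0.28):
    """Pythagorean Win% with the exponent computed from total scoring."""
    x = pythagopat_exponent(pts, opp_pts, k)
    return pts ** x / (pts ** x + opp_pts ** x)

# Hypothetical inputs: 700 combined points with k = 0.14.
print(round(pythagopat_exponent(350, 350, 0.14), 2))
```

As a sanity check on scale: with a hypothetical 700 combined points and k = 0.14, the exponent comes out to roughly 2.5, which is in the same neighborhood as the fixed 2.45 exponent.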
Finally, Dean Oliver came up with what he called the Correlated Gaussian method. It's explained here. It also outperforms Pyth, although the gain in accuracy is again minimal: RMSE = 0.0725, or 1.16 wins.
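Since the linked explanation isn't reproduced here, the following Python sketch is only my reading of the method, so treat the formula as an assumption: Win% is the normal CDF of the average scoring margin divided by the standard deviation of the margin, where the margin's variance is Var(Pts) + Var(OppPts) - 2*Cov(Pts, OppPts). The inputs are per-game scores, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def correlated_gaussian_win_pct(scores, opp_scores):
    """Estimate Win% from per-game scores (assumed form of the method).

    Uses sample variances and the sample covariance (n - 1 denominators).
    """
    n = len(scores)
    if n < 2 or n != len(opp_scores):
        raise ValueError("need at least two games with paired scores")
    m_s, m_o = mean(scores), mean(opp_scores)
    cov = sum((s - m_s) * (o - m_o)
              for s, o in zip(scores, opp_scores)) / (n - 1)
    var_diff = variance(scores) + variance(opp_scores) - 2.0 * cov
    if var_diff <= 0:
        raise ValueError("scoring margin has no variance")
    return normal_cdf((m_s - m_o) / math.sqrt(var_diff))

# Made-up game scores, for illustration only.
print(correlated_gaussian_win_pct([24, 17, 31, 20], [10, 21, 14, 17]))
```

Note that the variance term is just the variance of the game-by-game margin, which is why the covariance between a team's score and its opponent's score matters.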
For my money, simplicity is the way to go: just use the old Pythagorean Method, with 2 as your exponent.
_________________
ed
Kevin Pelton Site Admin
Joined: 30 Dec 2004 Posts: 979 Location: Seattle
Posted: Wed Jan 26, 2005 11:59 pm
It's worth noting that the Pythagorean ratings work quite well in football, as "Dean Oliver of football" Aaron Schatz pointed out in the New York Times this Sunday:
Quote: | Only 2 of the past 16 Super Bowls were won by teams that did not finish first or second in projected victories: the 2002 game, won by the New England Patriots, who finished seventh in projected victories; and the 1989 game, won by the San Francisco 49ers, who finished sixth. |
In the WNBA, the regular-season leader in point differential has won all eight championships.
Ed Küpfer
Joined: 30 Dec 2004 Posts: 785 Location: Toronto
Posted: Thu Jan 27, 2005 1:39 am
admin wrote: | It's worth noting that the Pythagorean ratings work quite well in football, as "Dean Oliver of football" Aaron Schatz pointed out in the New York Times this Sunday: |
I want to be called the Dean Oliver Of [something], but all the good things are taken.
So what's the best predictor of playoff success? First, let me quantify success in a messy way. The highest playoff success is a Super Bowl win -- that will be worth 5 points. A SB loss is worth 4. A loss in the semifinals (the NFL calls them the conference championship games) is worth 3. A loss in the round before that (the divisional round) is worth 2. Making the playoffs is worth 1. Not making the playoffs is worth 0.
Because of the NFL playoff structure changes over the years, some years don't have teams getting 1 point, but in every season I looked at (1970-2003) 4 teams got 2 points, 2 teams got 3 points, 1 team got 4 points, and 1 team got 5 points.
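The scoring scheme can be written down as a small lookup table. This is a Python sketch; the result labels and function names are hypothetical, chosen only to mirror the description above.

```python
from collections import Counter

# Points for the furthest stage a team reached (labels are hypothetical).
PLAYOFF_POINTS = {
    "missed_playoffs": 0,
    "made_playoffs": 1,          # lost before the divisional round
    "lost_divisional_round": 2,
    "lost_conference_final": 3,
    "lost_super_bowl": 4,
    "won_super_bowl": 5,
}

def playoff_success(result):
    """Return the playoff success index for one team season."""
    return PLAYOFF_POINTS[result]

def season_is_consistent(results):
    """Check the per-season counts: 4 teams with 2 points, 2 with 3,
    1 with 4, and 1 with 5."""
    counts = Counter(playoff_success(r) for r in results)
    return (counts[2], counts[3], counts[4], counts[5]) == (4, 2, 1, 1)
```

The consistency check is handy when assembling the data by hand, since every season should have exactly one champion and one Super Bowl loser regardless of how many teams got 1 point.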
I regressed four different measures of team strength against my playoff success index: Correlated Gaussian, Pythagorean, PythagoPat, and W-L percentage. Which one predicted playoff success best? W-L%. Here's my results:
Code:
              Intercept   Slope   R^2
CorrGauss       -1.82     5.44    0.47
Pyth            -1.77     5.34    0.46
PythagoPat      -1.77     5.35    0.46
W-L%            -1.76     5.32    0.55
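For reference, a one-variable least-squares fit of this kind can be sketched in a few lines of Python. The function names are mine, and this is a generic ordinary least squares fit, not necessarily the exact procedure or software used for the table; the only numbers taken from the table are the W-L% intercept and slope in the last line.

```python
def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares; also return R^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

def predict(intercept, slope, x):
    """Predicted playoff success index for a given strength measure."""
    return intercept + slope * x

# Using the W-L% row: a 12-4 team (Win% = 0.750).
print(round(predict(-1.76, 5.32, 0.750), 2))  # 2.23
```

So by the W-L% fit, a 12-4 team projects to an index of about 2.23, i.e. a little better than a divisional-round exit on average.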
To put it in less abstract terms, here's how often the #1 ranked team in each category won the Super Bowl:
Code:
              Won SB   Did Not Win SB
CorrGauss       12          22
Pyth            15          19
PythagoPat      14          20
W-L%            20          34
(Numbers aren't the same for each category because of ties in certain years.)
_________________
ed
Kevin Pelton Site Admin
Joined: 30 Dec 2004 Posts: 979 Location: Seattle
Posted: Thu Jan 27, 2005 1:42 am
Ed Küpfer wrote: | I want to be called the Dean Oliver Of [something], but all the good things are taken. |
I'm pretty sure "Dean Oliver of lawn darts" is still up for grabs.
Quote: | I regressed four different measures of team strength against my playoff success index: Correlated Gaussian, Pythagorean, PythagoPat, and W-L percentage. Which one predicted playoff success best? W-L%. Here's my results: |
I wonder how much home-field advantage comes into play here, since it's generally tied to W-L percentage. That Pythagorean shows up as a slightly better predictor of the Super Bowl winner is more impressive because of this, though by the time one gets to the Super Bowl there is no longer home-field advantage.
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
Posted: Thu Jan 27, 2005 8:45 am
When you use the correlated Gaussian, how do you come up with the standard deviation? The ideal way would be to compute a Sagarin-like prediction for the point differential for every game in a given season and then the standard deviation would simply be the square root of 1/(n-1) times the sum of the squared deviations from that prediction. An even better method would be to do this separately for each team, so that we had a team-specific/season-specific standard deviation.
(This is all assuming that you use point differential in the numerator. If you use offensive minus defensive rating, then you really would need a Sagarin-like prediction of the game-by-game offensive minus defensive ratings. Not impossible to do, but A LOT more work.)
It does not appear to me that this is what DeanO's formula suggests doing, but his formula might work pretty well in cases where you don't have game-level data.
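To make the suggestion concrete, here is a Python sketch of that standard deviation, both pooled over all games and split by team. The predicted margins are assumed to come from some Sagarin-like rating system that is not implemented here; the function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict

def residual_sd(actual_margins, predicted_margins):
    """sqrt( 1/(n-1) * sum of squared deviations from the prediction )."""
    n = len(actual_margins)
    if n < 2 or n != len(predicted_margins):
        raise ValueError("need at least two games with paired margins")
    ss = sum((a - p) ** 2
             for a, p in zip(actual_margins, predicted_margins))
    return math.sqrt(ss / (n - 1))

def residual_sd_by_team(games):
    """Team-specific version.

    `games` is an iterable of (team, actual_margin, predicted_margin)
    for a single season.
    """
    actual = defaultdict(list)
    predicted = defaultdict(list)
    for team, a, p in games:
        actual[team].append(a)
        predicted[team].append(p)
    return {team: residual_sd(actual[team], predicted[team])
            for team in actual}
```

With only 16 games per team, the team-specific estimates will be noisy, so the pooled version may be the more practical starting point.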
Ed Küpfer
Joined: 30 Dec 2004 Posts: 785 Location: Toronto
Posted: Thu Jan 27, 2005 11:53 am
Dan Rosenbaum wrote: | When you use the correlated Gaussian, how do you come up with the standard deviation? The ideal way would be to compute a Sagarin-like prediction for the point differential for every game in a given season and then the standard deviation would simply be the square root of 1/(n-1) times the sum of the squared deviations from that prediction. An even better method would be to do this separately for each team, so that we had a team-specific/season-specific standard deviation. |
I'm not sure what you mean. I have a schedule for each team, which records their score and their opponent's score in each game. I simply use the standard deviation of the team's scores, the standard deviation of the opponent's scores, and the covariance of the two. (I normally don't use RTGs for these types of calculations -- it's complicated enough as it is.)
_________________
ed