Brian M
Joined: 25 Nov 2006 Posts: 20
Posted: Thu Jan 31, 2008 12:24 pm Post subject: relationship between raw and adjusted +/-
|
|
I was curious to see how reliable raw +/- is. I'm inherently skeptical about it because so many things other than a player's actual impact can determine raw +/-. But I have a bit more faith in adjusted +/-. So one simple way to judge raw +/- is to see how well it predicts adjusted +/-.
That's what I've plotted in the chart below. Data are from the 04/05 season, raw +/- from 82games.com and adjusted +/- from Lewin's 82games.com article. Only players with at least 800 minutes played were included. Red lines indicate the 95% prediction interval as computed by SPSS -- not sure why they are asymmetrical.
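A minimal sketch of the basic computation (minutes filter, then the correlation between raw and adjusted +/-), assuming the data sit in a list of (player, minutes, raw +/-, adjusted +/-) rows. The rows below are made-up placeholders, not the 04/05 numbers from 82games.com, and `pearson_r` and `MIN_MINUTES` are illustrative names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

MIN_MINUTES = 800  # minutes cutoff used in the post

# Hypothetical rows: (player, minutes, raw +/-, adjusted +/-). Placeholder values only.
players = [
    ("A", 2500, 6.0, 4.0),
    ("B", 1900, -3.0, -1.0),
    ("C", 600, 9.0, 2.0),  # under 800 minutes, so it is dropped
    ("D", 1200, 1.0, 3.0),
    ("E", 2100, -7.0, -5.0),
]

qualified = [p for p in players if p[1] >= MIN_MINUTES]
raw = [p[2] for p in qualified]
adj = [p[3] for p in qualified]
r = pearson_r(raw, adj)
print(f"{len(qualified)} players, r = {r:.3f}")
```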
The correlation is good, r = .75. Even so, a given level of raw +/- seems to span a range of about 10-15 points of adjusted +/-. So a guy with a raw +/- of +5 could range from being a bit of a drag on his team (negative adj +/-) to being an elite player (adj +/- ~ +10). Even a player with an apparently neutral raw +/- around 0 could be either quite good or quite bad. This highlights the haziness and confoundedness of raw +/- numbers.
Still, the data seem to suggest some utility in raw +/- for making broad probabilistic claims. For instance, a good rule of thumb seems to be that a raw +/- of +5 marks a threshold past which you can say a player is probably helping his team, while a raw +/- of around -5 is the point at which you start getting confident that the player is hurting his team. But it's still probabilistic, and you can't make much more than a binary helping/hurting judgment: you can't say how much a player helps or hurts. It seems doubtful that dependable conclusions drawn from raw +/- can have much higher resolution than these very broad kinds of claims.
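The rule of thumb amounts to a three-way classifier that returns a direction but never a magnitude. A short sketch; `rough_verdict` is a hypothetical helper, and the +/-5 cutoff is the rule of thumb stated above, not a value estimated from the data.

```python
def rough_verdict(raw_pm, threshold=5.0):
    """Coarse read of raw +/- following the +/-5 rule of thumb.

    Only a direction is returned, never a size, because raw +/- does not
    support finer claims. The threshold is a judgment call, not a fitted value.
    """
    if raw_pm >= threshold:
        return "probably helping"
    if raw_pm <= -threshold:
        return "probably hurting"
    return "can't tell"

for value in (7.5, 2.0, -1.0, -6.0):
    print(value, rough_verdict(value))
```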
|
Back to top |
|
|
Mountain
Joined: 13 Mar 2007 Posts: 437
Posted: Thu Jan 31, 2008 1:45 pm
|
|
Thanks for sharing this work. Good to see, as additional years of data would be. Your comments about the utility of raw +/- strike me as appropriate, and I have been using similar rules of thumb.
Could you clarify which axis is which? I guessed/assumed horizontal is raw +/- and vertical is adjusted +/-. If so, does the slope of the regression line show that, on average, players in the positive raw +/- range have a raw +/- a bit higher (better) than their adjusted +/-, while players in the negative raw +/- range have a raw +/- more strongly negative than their adjusted +/-? In other words, extremes aren't on average as extreme as they seem? Let me know if I am misinterpreting it.
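One way to frame the question: for a least-squares line, the predicted adjusted +/- in standard-deviation units equals r times the raw +/- in standard-deviation units, so with r = .75 every prediction is pulled toward the average. The sketch assumes that standardized form; comparing raw and adjusted values directly in points also depends on how spread out each stat is. `predicted_adj_z` is an illustrative name.

```python
def predicted_adj_z(raw_z, r):
    """Least-squares prediction in SD units: z_adj_hat = r * z_raw."""
    return r * raw_z

R = 0.75  # correlation reported in the original post

for raw_z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    # With |r| < 1, each prediction is closer to 0 (the average) than raw_z is.
    print(f"raw {raw_z:+.1f} SD -> predicted adjusted {predicted_adj_z(raw_z, R):+.2f} SD")
```

For example, a player 2 SD above average in raw +/- is predicted to be 1.5 SD above average in adjusted +/-, which is the "extremes aren't as extreme as they seem" pattern.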
|
|
tpryan
Joined: 11 Feb 2005 Posts: 72
Posted: Thu Jan 31, 2008 5:40 pm
|
|
There appear to be multiple things wrong with your graph. First, the limits should be symmetric, as you indicated. Second, the width of the interval is not constant; it depends on how far each value of X (the independent variable) is from its average. So the lines that designate the upper and lower limits cannot be straight lines.
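For reference, the usual simple-regression prediction interval at a new value x0 is fit +/- t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx), where s is the residual standard error and Sxx is the sum of squared deviations of X from its mean. A minimal sketch follows; it substitutes a normal quantile for the t quantile, an approximation that is only reasonable when n is fairly large, and `prediction_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def prediction_interval(xs, ys, x0, level=0.95):
    """Approximate prediction interval for a new y at x0 from simple OLS.

    Uses a normal quantile instead of Student's t (n - 2 df); that is an
    assumption that only holds up when n is large.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))            # residual standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # The half-width grows with (x0 - mx)^2, so the band is curved, not straight.
    half = z * s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    fit = intercept + slope * x0
    return fit - half, fit + half
```

Because (x0 - xbar)^2 enters the half-width, the band is narrowest at the average raw +/- and flares out toward the extremes, and the two limits are symmetric about the regression line.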
|
|
Brian M
Joined: 25 Nov 2006 Posts: 20
Posted: Thu Jan 31, 2008 7:16 pm
|
|
Mountain: yeah, X axis is raw +/- and Y-axis is adjusted.
tpryan: yes, the PI stuff came out screwy for some reason. But you can ignore it, and the basic points still seem to stand.
|
|
tpryan
Joined: 11 Feb 2005 Posts: 72
Posted: Fri Feb 01, 2008 1:41 am
|
|
Since the objective is to see how well raw +/- predicts adjusted +/-, we should be looking at r^2 and (.75)^2 = .5625, which is a mediocre value. It seems to me that it is of limited usefulness.
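For a one-predictor regression, r^2 is also the share of the variance in adjusted +/- that the fitted line accounts for, i.e. 1 - SS_res/SS_tot. A short sketch checking that the two routes agree; the data passed in are arbitrary placeholders and `r_squared_two_ways` is an illustrative name.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SS_res/SS_tot) for a simple least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # SS_tot
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - ss_res / syy

print(0.75 ** 2)  # the thread's r, squared: 0.5625
print(r_squared_two_ways([-6, -2, 0, 3, 7], [-4, -3, 1, 1, 5]))
```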
|
|
Eli W
Joined: 01 Feb 2005 Posts: 354
Posted: Fri Feb 01, 2008 9:06 am
|
|
tpryan wrote: | Since the objective is to see how well raw +/- predicts adjusted +/-, we should be looking at r^2 and (.75)^2 = .5625, which is a mediocre value. It seems to me that it is of limited usefulness. |
I disagree with that characterization. I find it very useful to see how predictive a publicly available statistic is of perhaps the most theoretically sound player rating. This kind of research can also help us understand the effects of teammate and opponent strength on player productivity (since adjusted +/- is raw +/- with those things factored in). And as for the r^2 being a "mediocre" value, I think that's a purely subjective and arbitrary classification.
_________________
Eli W. (formerly John Quincy)
CountTheBasket.com
|
|
Ben
Joined: 13 Jan 2005 Posts: 222 Location: Iowa City
Posted: Fri Feb 01, 2008 11:44 am
|
|
It would be interesting to see what happens if statistical +/- were thrown in as an additional independent variable.
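A minimal sketch of that idea as an ordinary least-squares fit with two predictors, using NumPy's `np.linalg.lstsq`. The statistical +/- column and all numbers here are hypothetical placeholders, `fit_two_predictors` is an illustrative name, and this is plain OLS, not the procedure used to build adjusted +/- itself.

```python
import numpy as np

def fit_two_predictors(raw_pm, stat_pm, adj_pm):
    """OLS of adjusted +/- on raw +/- and statistical +/-.

    Returns (coefficients [intercept, raw, stat], R^2).
    """
    raw_pm = np.asarray(raw_pm, dtype=float)
    stat_pm = np.asarray(stat_pm, dtype=float)
    y = np.asarray(adj_pm, dtype=float)
    X = np.column_stack([np.ones_like(raw_pm), raw_pm, stat_pm])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Placeholder inputs, not real player data.
raw = [6.0, -3.0, 1.0, -7.0, 4.0, 0.5]
stat = [3.0, -2.0, 2.5, -4.0, 1.0, -1.0]
adj = [4.0, -1.0, 3.0, -5.0, 2.0, 0.0]
coef, r2 = fit_two_predictors(raw, stat, adj)
print("intercept, raw, stat:", np.round(coef, 3), "R^2:", round(float(r2), 3))
```

Comparing this R^2 with the one from raw +/- alone would show how much statistical +/- adds, and the two slope coefficients show how the fit splits the credit between them.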
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 639 Location: Toronto
Posted: Fri Feb 01, 2008 3:58 pm
|
|
tpryan wrote: | Since the objective is to see how well raw +/- predicts adjusted +/-, we should be looking at r^2 and (.75)^2 = .5625, which is a mediocre value. It seems to me that it is of limited usefulness. |
I'm with Eli -- an r2 value of .56 sounds pretty good to me. At the player level, the stat with the highest y-t-y stability is probably FT%, which has an r2 value of something like .4.
In any case, the r2 value is not really useful for forecasting anything. What we need is the standard errors of the residuals. For FT%, that is something like 5%, which means that for 2/3 of all players, next season's FT% will be within +/- 5% of this season's.
_________________
ed
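The "2/3 of all players" figure follows if the residuals are roughly normal: about 68% of a normal distribution lies within one standard deviation of its mean. A small sketch under that normality assumption; `residual_se` and `share_within` are hypothetical helpers, and `n_params` counts the fitted coefficients (2 for a straight line).

```python
import math
from statistics import NormalDist

def residual_se(residuals, n_params=2):
    """Residual standard error: sqrt(sum of squared residuals / (n - n_params))."""
    n = len(residuals)
    return math.sqrt(sum(e * e for e in residuals) / (n - n_params))

def share_within(residuals, k=1.0, n_params=2):
    """Observed fraction of residuals within k residual standard errors of zero."""
    se = residual_se(residuals, n_params)
    return sum(abs(e) <= k * se for e in residuals) / len(residuals)

# Fraction of a normal distribution within +/- 1 SD of its mean (about 0.683).
normal_within_one = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)
print(round(normal_within_one, 3))
```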
|
|
tpryan
Joined: 11 Feb 2005 Posts: 72
Posted: Sat Feb 02, 2008 3:40 am
|
|
Ed Küpfer wrote: | tpryan wrote: | Since the objective is to see how well raw +/- predicts adjusted +/-, we should be looking at r^2 and (.75)^2 = .5625, which is a mediocre value. It seems to me that it is of limited usefulness. |
I'm with Eli -- an r2 value of .56 sounds pretty good to me. At the player level, the stat with the highest y-t-y stability is probably FT%, which has an r2 value of something like .4.
In any case, the r2 value is not really useful for forecasting anything. What we need is the standard errors of the residuals. For FT%, that is something like 5%, which means that for 2/3 of all players, next season's FT% will be within +/- 5% of this season's. |
Ed,
Relative to forecasting, yes, the r^2 value simply shows the quality of the fit for that particular set of data. The standard errors of those residuals could easily be computed if they are not given in the computer output.
Whether or not a particular value of r^2 is good, or at least acceptable, of course depends on what we are trying to predict. If we had a model to predict the stock market, we would be thrilled if r^2 were even .4, whereas in certain other applications anything less than .90 might be unacceptable.
|
|
|