This is Google's cache of viewtopic.php?p=21943&sid=3600f45feffd7b8db23eeb720a1a7747. It is a snapshot of the page as it appeared on Apr 11, 2011 16:58:22 GMT. The current page could have changed in the meantime. Learn more

APBRmetrics :: View topic - Celtics '08 pre(&post)diction etc.
The statistical revolution will not be televised.
 

Celtics '08 pre(&post)diction etc.
Mike G



Joined: 14 Jan 2005
Posts: 3625
Location: Hendersonville, NC

Posted: Tue Jul 22, 2008 4:51 pm

DLew wrote:
... I strongly disagree with the implication that any advocate of adjusted plus-minus ever said that Stojakovic was better than Paul, because that would be a gross misinterpretation of the numbers...

Ilardi wrote:
...there was only about a 92% probability that Peja was better and 8% chance that Paul was better...

Only 92% is less than 100, I guess. Most observers would say there's close to 0% chance that Peja > Paul.

Tempering the probability by including playoffs, prior seasons, etc., seems to save the day. Whether a player today is the same player he was 2-3 years ago is another issue. More uncertainty, perhaps.

DLew wrote:
..If you don't understand adjusted plus-minus that is fine, many people don't, but then you can't make statements like "It has to be more than just 'noise' "..

Ilardi wrote:
...the biggest challenge in using adjusted plus-minus is dealing with (and reducing) standard errors (noise) in the estimates. Aaron Barzilai and I have been working on this issue, though, and hope to have some considerably less noisy numbers to make available in the not-too-distant future.

Maybe only a few people understand. Less noise sounds good, so why is high noise inevitable? Why is 3000 minutes a small sample?

If it takes several years to get a reasonably-sized sample, and if in that time a player's value has changed, then at what point can we hope to evaluate him? After the fact? Just as history?

Could an effectively larger sample be obtained in one season, by simplifying the +/- adjustment to # of starters in the lineup, for and against?
_________________
36% of all statistics are wrong
Ilardi



Joined: 15 May 2008
Posts: 265
Location: Lawrence, KS

Posted: Tue Jul 22, 2008 5:24 pm

Mike G wrote:
Maybe only a few people understand. Less noise sounds good, so why is high noise inevitable? Why is 3000 minutes a small sample?

If it takes several years to get a reasonably-sized sample, and if in that time a player's value has changed, then at what point can we hope to evaluate him? After the fact? Just as history?

Could an effectively larger sample be obtained in one season, by simplifying the +/- adjustment to # of starters in the lineup, for and against?


If player minutes were distributed randomly on a team, 3000 minutes would be vastly more than enough to get non-noisy estimates for adjusted plus-minus ratings. The problem is that player minutes are often heavily inter-correlated with one another on the same team, introducing the problem of multicollinearity, which (as you probably know) inflates standard errors of parameter estimates in a general linear model.
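
A minimal sketch of the multicollinearity point, assuming the simplest possible case of a linear model with only two predictors whose correlation is r. This is a textbook illustration, not the adjusted plus-minus model discussed in this thread, and the function names are hypothetical. In that two-predictor case the variance of each coefficient is multiplied by 1 / (1 - r^2), so the standard error grows by the square root of that factor.

```python
import math


def variance_inflation(r: float) -> float:
    """Variance inflation factor for one of two predictors with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


def se_multiplier(r: float) -> float:
    """Factor by which the coefficient's standard error grows at correlation r."""
    return math.sqrt(variance_inflation(r))


if __name__ == "__main__":
    # Uncorrelated minutes leave the standard error unchanged;
    # heavily overlapping minutes inflate it sharply.
    for r in (0.0, 0.5, 0.9, 0.99):
        print(f"r = {r:4.2f}  SE multiplier = {se_multiplier(r):.2f}")
```

With more than two players the same idea applies, but each player's inflation depends on how well his minutes are predicted by everyone else's, which this sketch does not model.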

The most obvious way to address this is to use more data. We can add additional seasons to disentangle player estimates and still disproportionately weight the model toward the most recent season to address concerns about year-to-year change in player performance.

I'm also currently exploring a few creative (and statistically esoteric) modeling options that should improve the estimates further - even within a single season - but it would be premature to share anything right now.

Finally, yes, you could consider tossing out all the non-starters from the model, leaving only 150 player effects to consider . . . generally speaking, the fewer the estimated parameters, the lower the noise, but there's an inherent tradeoff, since discarding all non-starter player effects would introduce its own share of noise into the model. I'll take a look at some point, though, to see (empirically) exactly what that tradeoff looks like.
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Tue Jul 22, 2008 6:07 pm

Ilardi wrote:


[1] So, Peja was about +1.4 standard errors better than Paul ...


[2] Using data from the two prior seasons (05-07 combined) yields much less noisy estimates ...

Peja: +2.33 (se = 2.84)
Paul: +5.22 (se = 2.99)

...
[3] As I've mentioned before, the biggest challenge in using adjusted plus-minus is dealing with (and reducing) standard errors (noise) in the estimates. Aaron Barzilai and I have been working on this issue, though, and hope to have some considerably less noisy numbers to make available in the not-too-distant future.



Some questions...

Is there anything wrong with substituting for the part I labeled with [1] "the estimate for Peja is 1.4 standard errors higher than the estimate for Paul"?

Based on the 2-year sample in [2], what are the chances that Paul is really better?


With regard to [3], is the addition of a 3rd year of data being considered in addition to other more innovative steps, and if so, would it likely reduce errors by another 30-50%? Or does the age of the data reduce or raise questions about the validity of the error reduction somewhat?

Is it correct that the refined estimate of [2] is saying there is a 68% chance that the "true" impact of Peja on team +/- is between roughly +5 and -0.5? But roughly 1/3rd chance it is greater or less? And the true impact of Paul on team +/- is between +2 and +8 but roughly 1/3rd chance it is greater or less? And that the estimates are not "saying" anything more?

Is it also correct that on average, for a 13-15 man roster, 4.5-5 of the adjusted +/- estimates will be more than one standard error from the true value, and no one knows which player estimates are more than 1 standard error off or whether each is high or low?
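
The arithmetic behind the last two questions can be sketched as follows. This is only an illustration, assuming normally distributed estimation errors; the function names are hypothetical and not taken from any tool mentioned in the thread. The estimates and standard errors are the two-season figures quoted in [2].

```python
from statistics import NormalDist


def one_se_interval(estimate: float, se: float):
    """Return (low, high) for estimate +/- one standard error."""
    return (estimate - se, estimate + se)


# Share of a normal distribution within one standard error of its mean.
COVERAGE_1SE = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)


def expected_outside_one_se(roster_size: int) -> float:
    """Expected number of estimates more than one SE from the true value."""
    return roster_size * (1.0 - COVERAGE_1SE)


if __name__ == "__main__":
    print("Peja:", one_se_interval(2.33, 2.84))
    print("Paul:", one_se_interval(5.22, 2.99))
    print("Coverage within 1 SE:", round(COVERAGE_1SE, 3))
    for n in (13, 15):
        print(n, "players ->", round(expected_outside_one_se(n), 2))
```

Using the exact normal coverage instead of the rounded "2/3 inside, 1/3 outside" gives a slightly smaller expected count than a one-third rule of thumb.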
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Tue Jul 22, 2008 7:14 pm

DLew wrote:


-If five guys had true adjusted plus-minus ratings of +8 each and there was no diminishing returns for point differential (and they played all the minutes) then the projected point differential would be +40. Adjusted plus-minus is additive in this way, unlike raw plus-minus.



(edited based on DLew's response)

Yes.

But since we only know estimates, and estimates can of course fall higher or lower than the true values, a 68% confidence interval for the sum might run from somewhere around +20 to +60, with still about a 1/3rd chance that the true sum is higher or lower.
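
A rough sketch of how noise in five individual estimates carries over to their sum, assuming the five errors are independent. That is a strong assumption, since teammates' estimates are typically correlated with one another. The per-player standard error of 9 below is a hypothetical value chosen only to show what noise level would give a range of roughly +20 to +60; it is not a figure from this thread.

```python
import math


def sum_with_se(estimates, standard_errors):
    """Sum of estimates and the standard error of that sum,
    assuming independent errors (variances add)."""
    total = sum(estimates)
    se_total = math.sqrt(sum(se * se for se in standard_errors))
    return total, se_total


if __name__ == "__main__":
    # Five players estimated at +8 each, hypothetical SE of 9 apiece.
    total, se_total = sum_with_se([8.0] * 5, [9.0] * 5)
    print(f"sum = {total:+.1f}, SE of sum = {se_total:.1f}")
    print(f"one-SE range: {total - se_total:+.1f} to {total + se_total:+.1f}")
```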


Last edited by Mountain on Wed Jul 23, 2008 6:38 am; edited 1 time in total
DLew



Joined: 13 Nov 2006
Posts: 224

Posted: Tue Jul 22, 2008 10:33 pm

Mountain,

My statement was for if they had 'true' adjusted plus-minus ratings of 8, i.e. not our estimate of them with noise, but the actual reality of the situation. This helps to disentangle the various issues at play.
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 711
Location: Raleigh, NC

Posted: Tue Jul 22, 2008 11:37 pm

DLew, is the adj +/- rating per 48 minutes?
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Wed Jul 23, 2008 6:41 am

Thanks, DLew. My eye focused on "projected point differential" and I thought of estimates, but you did in fact state that you were referring to true adjusted plus-minus with regard to its additive quality. Sorry for the misstep.
Ilardi



Joined: 15 May 2008
Posts: 265
Location: Lawrence, KS

Posted: Wed Jul 23, 2008 10:56 am

Mountain wrote:
Ilardi wrote:


[1] So, Peja was about +1.4 standard errors better than Paul ...


[2] Using data from the two prior seasons (05-07 combined) yields much less noisy estimates ...

Peja: +2.33 (se = 2.84)
Paul: +5.22 (se = 2.99)

...
[3] As I've mentioned before, the biggest challenge in using adjusted plus-minus is dealing with (and reducing) standard errors (noise) in the estimates. Aaron Barzilai and I have been working on this issue, though, and hope to have some considerably less noisy numbers to make available in the not-too-distant future.



Some questions...

(1) Is there anything wrong with substituting for the part I labeled with [1] "the estimate for Peja is 1.4 standard errors higher than the estimate for Paul"?

(2) Based on the 2-year sample in [2], what are the chances that Paul is really better?


(3) With regard to [3], is the addition of a 3rd year of data being considered in addition to other more innovative steps, and if so, would it likely reduce errors by another 30-50%? Or does the age of the data reduce or raise questions about the validity of the error reduction somewhat?

(4) Is it correct that the refined estimate of [2] is saying there is a 68% chance that the "true" impact of Peja on team +/- is between roughly +5 and -0.5? But roughly 1/3rd chance it is greater or less? And the true impact of Paul on team +/- is between +2 and +8 but roughly 1/3rd chance it is greater or less? And that the estimates are not "saying" anything more?

(5) Is it also correct that on average, for a 13-15 man roster, 4.5-5 of the adjusted +/- estimates will be more than one standard error from the true value, and no one knows which player estimates are more than 1 standard error off or whether each is high or low?


In response to your questions:

(1) Yes, we're always talking about estimated values.

(2) I was merely pointing out that the less-noisy estimates from 05-07 have Paul's rating better than Peja's by about a full standard error. Since the distribution of error terms is approximately normal, that would mean roughly an 84% chance that Paul's "true value" was higher than Peja's over that time period.

(3) Yes, one approach to reducing noise is to add a 3rd year of data. I've got some other tricks up my sleeve, though. (I won't share them until I'm certain they pan out.)

(4) Yes, you're interpreting things correctly . . . we simply don't know the "true" adjusted plus-minus value of any player, and have to estimate it based on the available data. That means we are left making probabilistic statements about the likelihood any player's true value falls in a given range. Clearly, the more we can reduce standard errors, the more definitive the statements we can make about each player.

(5) Again, yes - on average. But again, if standard errors are low enough, that's not a major concern.
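
The probabilities quoted in this thread (about 92% for a gap of +1.4 standard errors, roughly 84% for a gap of one standard error) match the standard normal CDF evaluated at the gap, as the first function below sketches. The second function is an alternative that is not from the thread: it treats the two estimates as independent and uses the standard error of their difference. That independence is an assumption, the function names are hypothetical, and with the 05-07 figures it gives a lower probability than the rough one-standard-error reading.

```python
import math
from statistics import NormalDist


def prob_from_gap(gap_in_se: float) -> float:
    """P(true gap > 0) when the gap is expressed in standard errors."""
    return NormalDist().cdf(gap_in_se)


def prob_a_higher(est_a: float, se_a: float, est_b: float, se_b: float) -> float:
    """P(A's true value > B's), assuming independent normal errors."""
    se_diff = math.hypot(se_a, se_b)  # sqrt(se_a^2 + se_b^2)
    return NormalDist().cdf((est_a - est_b) / se_diff)


if __name__ == "__main__":
    print("gap of 1.4 SE:", round(prob_from_gap(1.4), 3))
    print("gap of 1.0 SE:", round(prob_from_gap(1.0), 3))
    # Paul vs. Peja, 05-07 combined, using the SE of the difference.
    print("Paul > Peja:", round(prob_a_higher(5.22, 2.99, 2.33, 2.84), 3))
```

If the two estimates are correlated, as teammates' estimates usually are, the standard error of the difference would need the covariance term as well, which is not given in the thread.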
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Wed Jul 23, 2008 4:57 pm

Thanks for the detailed responses. The continuing research & innovation, data sharing and public education is appreciated.
Ben F.



Joined: 07 Mar 2005
Posts: 391

Posted: Wed Jul 23, 2008 10:04 pm

Ilardi wrote:
The most obvious way to address this is to use more data. We can add additional seasons to disentangle player estimates and still disproportionately weight the model toward the most recent season to address concerns about year-to-year change in player performance.

If you can share this aspect of your methodology, how do you go about doing this? As I understand from Eli W's post about the subject, you are running a weighted regression on the number of possessions in each observation. To weight current years more, would you just increase the weights on observations from the current year?
Ilardi



Joined: 15 May 2008
Posts: 265
Location: Lawrence, KS

Posted: Wed Jul 23, 2008 11:47 pm

Ben F. wrote:
Ilardi wrote:
The most obvious way to address this is to use more data. We can add additional seasons to disentangle player estimates and still disproportionately weight the model toward the most recent season to address concerns about year-to-year change in player performance.

If you can share this aspect of your methodology, how do you go about doing this? As I understand from Eli W's post about the subject, you are running a weighted regression on the number of possessions in each observation. To weight current years more, would you just increase the weights on observations from the current year?


Yes, exactly. Influenced by Dan Rosenbaum, I weighted the current season by a factor of 3 for the numbers I published in 05-07. But in the future I'll try experimenting with different weighting schemes to see which one minimizes standard errors . . .
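
A minimal sketch of that weighting scheme, assuming each observation is a stint with a possession count and a point margin per 100 possessions. The `Stint` class, field names, and season labels are hypothetical. To stay short, the sketch computes a weighted mean, which is the weighted least squares estimate for an intercept-only model; the actual regression would also include an indicator column for every player.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    season: str            # e.g. "2007-08" (hypothetical label format)
    possessions: int       # base regression weight
    margin_per_100: float  # point margin per 100 possessions


def regression_weight(stint: Stint, current_season: str,
                      current_factor: float = 3.0) -> float:
    """Possessions, multiplied by a factor if the stint is from the current season."""
    factor = current_factor if stint.season == current_season else 1.0
    return stint.possessions * factor


def weighted_mean_margin(stints, current_season: str,
                         current_factor: float = 3.0) -> float:
    """Weighted mean margin (the intercept-only weighted least squares fit)."""
    weights = [regression_weight(s, current_season, current_factor) for s in stints]
    weighted_sum = sum(w * s.margin_per_100 for w, s in zip(weights, stints))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    stints = [Stint("2006-07", 100, 0.0), Stint("2007-08", 100, 4.0)]
    # Weights are 100 and 300, so the result leans toward the recent season.
    print(weighted_mean_margin(stints, current_season="2007-08"))
```

Trying different weighting schemes then amounts to changing `current_factor` and comparing the resulting standard errors.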
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Sat Jul 26, 2008 9:41 pm

As dominant as the Celtics were in the regular season, they had 10 losses in the playoffs, the most for a title winner ever (though there were fewer rounds in the distant past).
Mike G



Joined: 14 Jan 2005
Posts: 3625
Location: Hendersonville, NC

Posted: Sun Jul 27, 2008 6:47 am

Hey, that's a good point. The Lakers, at 14-7, had a 'better' record.

But the Celts' pythagorean estimate is closer to 18-8; the Lakers', 12-9. Over 82 games, that would be a 56W team vs a 47W team. Both are 10-12 wins under their season rates.
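
For readers unfamiliar with the method, a sketch of a pythagorean estimate and of scaling a playoff record to 82 games. The exponent of 14 is an assumption (values in the mid-teens are commonly used for basketball, and the post does not say which was used), the point totals in the example are hypothetical, and the function names are made up for illustration.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 14.0) -> float:
    """Expected winning percentage from points scored and allowed."""
    pf = float(points_for) ** exponent
    pa = float(points_against) ** exponent
    return pf / (pf + pa)


def wins_over_season(wins: float, losses: float, games: int = 82) -> float:
    """Scale a (possibly fractional) win-loss record to a full season."""
    return wins / (wins + losses) * games


if __name__ == "__main__":
    # Hypothetical point totals, only to show the call.
    print(round(pythagorean_win_pct(2500, 2400), 3))
    # The pythagorean playoff records quoted in the post, scaled to 82 games.
    print(round(wins_over_season(18, 8), 1))
    print(round(wins_over_season(12, 9), 1))
```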

The only other pyth. 'winners' were Det (9-8), NO (7.5-5.5), Cle (7.5-6.5), Orl (5.5-4.5). Bos opponents before and after running into the Celts:

pre-Bos : Opp : vs Bos
0-0 : Atl : 3-4
4-2 : Cle : 3-4
8-3 : Det : 2-4
12-3 : LA : 2-4

So, teams with a combined playoff record of 24-8 proceeded to go 10-16 vs Boston.


Last edited by Mike G on Mon Jul 28, 2008 9:26 am; edited 1 time in total
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Sun Jul 27, 2008 10:36 am

Good detail.

The Celtics were hot and cold, and maybe the blowout wins raised their expected W% significantly, but I don't know offhand how volatile their performance was compared to other teams or to the average. It could be computed, but I am not that interested at the moment.
Ben



Joined: 13 Jan 2005
Posts: 266
Location: Iowa City

Posted: Mon Jul 28, 2008 9:23 am

The Boston-Detroit series was 6 games.
All times are GMT - 5 Hours
Page 7 of 9

 