APBRmetrics :: View topic - Dwight Howard's Adjusted Plus/Minus
Dwight Howard's Adjusted Plus/Minus
DSMok1



Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains

Posted: Fri Apr 02, 2010 7:54 am

Ilardi wrote:
gabefarkas wrote:
DSMok1 - I brought up the same issue a few months ago in another thread. I don't think there's a good way to completely handle it. Ridge Regression (a la Joe Sill) is an interesting and potentially viable approach.


I've had some luck with simply using multiple seasons' worth of data to overcome the multicollinearity issue: once we have 5+ seasons in the model, the standard errors of estimate are quite small, and the Howard/Gortat problems tend to disappear. Of course, such an approach has its own inherent limitation: it assumes that player APM effects remain reasonably constant over long stretches of time, and we know this is not always the case (due to injury, maturation effects, age-related deterioration, etc.).

We can try, perhaps, to have the best of both worlds by using multiple seasons to disentangle player effects (i.e., address multicollinearity) and then weighting the model very heavily toward the most recent season, but there's no way to know a priori which of the infinitely many weighting schemes is optimal. Joe Sill's suggestion of using out-of-sample goodness of fit as the main arbiter of such choices strikes me as eminently reasonable, and one I hope the field will adopt.


Good to see you back over here, Ilardi!

Remember the discussion we had over in the thread about weightings? Have you explored that direction at all? An empirical curve of year-to-year change as a baseline for multiyear APM?
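
To make the weighting idea concrete, here is a minimal sketch of the approach Ilardi describes: pool several seasons, down-weight the older ones, and let out-of-sample error pick the weighting. The geometric decay, the ridge penalty `lam`, and all function names are assumptions for illustration only; this is not anyone's actual APM code, and an empirical year-to-year curve could stand in for `recency_weights`.

```python
import numpy as np

def recency_weights(season, latest, decay):
    """Weight 1.0 for the latest season and decay**k for a season k years older."""
    return decay ** (latest - np.asarray(season))

def weighted_ridge(X, y, w, lam):
    """Minimize sum_i w_i * (y_i - x_i @ b)**2 + lam * ||b||**2."""
    Xw = X * w[:, None]                      # each row scaled by its weight
    A = X.T @ Xw + lam * np.eye(X.shape[1])  # X' W X + lam * I
    return np.linalg.solve(A, Xw.T @ y)      # solves A b = X' W y

def pick_decay(X, y, season, holdout, decays, lam=1.0):
    """Return (best_decay, errors), scoring each decay by held-out mean squared error."""
    latest = season.max()
    train = ~holdout
    errors = {}
    for d in decays:
        w = recency_weights(season[train], latest, d)
        b = weighted_ridge(X[train], y[train], w, lam)
        resid = y[holdout] - X[holdout] @ b
        errors[d] = float(np.mean(resid ** 2))
    return min(errors, key=errors.get), errors
```

Here `holdout` is a boolean mask, for example a random subset of the most recent season's stints, so each candidate decay is judged on how well it predicts current-season data it has not seen.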
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 711
Location: Raleigh, NC

Posted: Fri Apr 02, 2010 8:53 am

My April Fools' joke worked! 8)
_________________
I am a basketball geek.
DSMok1



Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains

Posted: Fri Apr 02, 2010 11:02 am

Ryan J. Parker wrote:
My April Fools' joke worked! 8)


Drat. I read it on April 2, so my guard wasn't up. :evil:

:)
gabefarkas



Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC

Posted: Sat Apr 03, 2010 12:26 pm

Ryan J. Parker wrote:
DSMok1 wrote:
It is a Bayesian-style approach, but it has the issue of shrinking all players toward 0, including scrubs. If there were a way to pull toward a sliding scale based on, say, mpg or something...


this isn't mathematically possible. don't even try. it will never work.
Sounds like you know something you're not telling us...
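
For what it's worth, pulling toward a non-zero target is straightforward to write down. Below is a minimal sketch, assuming a ridge-style penalty on the distance from a prior mean rather than from zero. The linear minutes-based prior, its `slope`, and `league_avg_mpg` are made-up placeholders for illustration, not fitted values and not anyone's published method.

```python
import numpy as np

def ridge_toward_prior(X, y, prior_mean, lam):
    """Minimize ||y - X @ b||**2 + lam * ||b - prior_mean||**2.

    Setting the gradient to zero gives
    b = prior_mean + (X'X + lam*I)^-1 X'(y - X @ prior_mean),
    so the data only moves a player away from his prior where it disagrees.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return prior_mean + np.linalg.solve(A, X.T @ (y - X @ prior_mean))

def prior_from_minutes(mpg, slope=0.1, league_avg_mpg=24.0):
    """Hypothetical sliding-scale prior: more minutes per game, higher prior.

    slope and league_avg_mpg are illustrative assumptions only.
    """
    return slope * (np.asarray(mpg, dtype=float) - league_avg_mpg)
```

With `prior_mean` set to all zeros this reduces to ordinary ridge regression, and as `lam` grows the estimates collapse onto the prior, so low-minute players with little data end up near their minutes-based prior instead of near 0.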
gabefarkas



Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC

Posted: Sat Apr 03, 2010 12:32 pm

Ilardi wrote:
gabefarkas wrote:
DSMok1 - I brought up the same issue a few months ago in another thread. I don't think there's a good way to completely handle it. Ridge Regression (a la Joe Sill) is an interesting and potentially viable approach.


I've had some luck with simply using multiple seasons' worth of data to overcome the multicollinearity issue: once we have 5+ seasons in the model, the standard errors of estimate are quite small, and the Howard/Gortat problems tend to disappear. Of course, such an approach has its own inherent limitation: it assumes that player APM effects remain reasonably constant over long stretches of time, and we know this is not always the case (due to injury, maturation effects, age-related deterioration, etc.).

We can try, perhaps, to have the best of both worlds by using multiple seasons to disentangle player effects (i.e., address multicollinearity) and then weighting the model very heavily toward the most recent season, but there's no way to know a priori which of the infinitely many weighting schemes is optimal. Joe Sill's suggestion of using out-of-sample goodness of fit as the main arbiter of such choices strikes me as eminently reasonable, and one I hope the field will adopt.

I agree that, at first glance, using multiple seasons of data seems to address the multicollinearity (MC) issue. However, it raises another issue, which I mentioned in the post I linked to. For ease of reading, I'm re-pasting it here:

Quote:
Now, as for the age-related error, it has nothing to do with the problems that stem from MC. The age-related error comes from the fact that X1 (e.g., Chris Paul) through X400 (e.g., Mark Madsen) from 2008 represent different intrinsic values than X1 through X400 from 2009. That is assuming player abilities change over time, and can be seen as at least somewhat dependent on age (we do all agree on this one point, right?).

Since the different X1's don't all represent exactly the same intrinsic value, several questions emerge. First, is it really fair to lump them all together under the same PV? Second, if we do lump them together, do we get a coefficient estimate that represents anything in particular, or just a rough idea of that player's value over that timespan? Third, if we incorporate several years' worth of data without considering a time series application, is our model still reasonable? And fourth, if we do incorporate a time series into our regression, are we properly accounting for the problems that arise with autocorrelation, in addition to any other issues from MC or anything else?

You seem to be in agreement with my identification of the issue, and propose weighting schemes as a way to address it. I guess I would agree that's among the best options. Have you considered time series applications?
schtevie



Joined: 18 Apr 2005
Posts: 414

Posted: Sat Apr 03, 2010 2:54 pm

Is there a quick answer as to the potential implications of not using a time series application? Does this bias the estimates or their standard errors? Treated conventionally, can the estimates be understood as a weighted average of the various years' distinct values?

My sense of the matter, based on the aging curves produced by Ed, for example, is that players' box score statistical performance, on average, peaks in the mid-20s, but at levels only a few percentage points above where they were on entering the league. If we are talking about a 4-percentage-point improvement over the first five years, on average, as the biggest variation on account of time, is there much to worry about? Compared to individual variation, injuries, role changes, and teammate-specific interactions, are the time series issues apt to matter?
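
On the weighted-average question, one piece can be checked directly. Assuming a linear model in which each year has its own true coefficient vector and the errors average out to zero, pooled least squares recovers a matrix-weighted average of the yearly coefficients, with each year weighted by its X'X. The sketch below verifies that identity on noise-free data; the function names are hypothetical, and it says nothing about what happens to the standard errors.

```python
import numpy as np

def pooled_ols(X_by_year, y_by_year):
    """Ordinary least squares on all years stacked together."""
    X = np.vstack(X_by_year)
    y = np.concatenate(y_by_year)
    return np.linalg.lstsq(X, y, rcond=None)[0]

def matrix_weighted_average(X_by_year, beta_by_year):
    """(sum_t X_t'X_t)^-1 * sum_t X_t'X_t @ beta_t, the year-weighted average."""
    grams = [X.T @ X for X in X_by_year]          # one weight matrix per year
    total = sum(grams)
    rhs = sum(G @ b for G, b in zip(grams, beta_by_year))
    return np.linalg.solve(total, rhs)
```

Because the weights are matrices rather than scalars, a player's pooled coefficient is a clean average of his own yearly values only when his column is uncorrelated with the others; with correlated lineups, year-to-year changes in teammates can leak into it as well.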
DSMok1



Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains

Posted: Mon Apr 05, 2010 8:02 am

schtevie wrote:
Is there a quick answer as to the potential implications of not using a time series application? Does this bias the estimates or their standard errors? Treated conventionally, can the estimates be understood as a weighted average of the various years' distinct values?

My sense of the matter, based on the aging curves produced by Ed, for example, is that players' box score statistical performance, on average, peaks in the mid-20s, but at levels only a few percentage points above where they were on entering the league. If we are talking about a 4-percentage-point improvement over the first five years, on average, as the biggest variation on account of time, is there much to worry about? Compared to individual variation, injuries, role changes, and teammate-specific interactions, are the time series issues apt to matter?


I agree with this... can't we get the "average" value and adjust approximately for aging?

Perhaps even better: can we add a player-specific term for each year in this APM regression? Meaning each player's value in year N is (Val_P + Val_P_N), where Val_P_N can vary from year to year and Val_P is the player's baseline value? Or would that return to the multicollinearity problem...
gabefarkas



Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC

Posted: Mon Apr 05, 2010 12:30 pm

schtevie wrote:
Is there a quick answer as to the potential implications of not using a time series application? Does this bias the estimates or their standard errors? Treated conventionally, can the estimates be understood as a weighted average of the various years' distinct values?

Does the quote box in my previous post above, where I quote myself from a few months ago, answer your question about the implications?
schtevie



Joined: 18 Apr 2005
Posts: 414

Posted: Mon Apr 05, 2010 12:59 pm

Um, no. Your remarks motivated the explicit question, which is conditional on what I believe to be true: that the aging curves aren't very curvy.
gabefarkas



Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC

Posted: Mon Apr 05, 2010 1:57 pm

DSMok1 wrote:
Perhaps even better: can we add a player-specific term for each year in this APM regression? Meaning each player's value in year N is (Val_P + Val_P_N), where Val_P_N can vary from year to year and Val_P is the player's baseline value? Or would that return to the multicollinearity problem...

I believe it would reintroduce the problem, yes.
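
One way to see it, as a sketch under my own assumptions about how the columns would be built: if each player keeps his baseline column and also gets one column per season that is zero outside that season, the seasonal columns add up exactly to the baseline column. That is perfect collinearity, so Val_P and Val_P_N cannot be separated by least squares without a constraint or a penalty. The function names below are hypothetical.

```python
import numpy as np

def add_player_year_columns(X, season):
    """Return [X | X masked to season 1 | X masked to season 2 | ...].

    X holds the baseline (Val_P) columns; each masked copy holds the
    year-specific (Val_P_N) columns, zeroed outside its own season.
    """
    season = np.asarray(season)
    blocks = [X * (season == s)[:, None] for s in np.unique(season)]
    return np.hstack([X] + blocks)

def has_full_column_rank(Z):
    """True when every column of Z carries independent information."""
    return np.linalg.matrix_rank(Z) == Z.shape[1]
```

Dropping the baseline columns, forcing each player's yearly terms to sum to zero, or shrinking the yearly terms toward zero with a ridge penalty are standard ways to make such a model estimable; which of those, if any, works well for APM is not something this sketch settles.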
gabefarkas



Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC

Posted: Tue Apr 06, 2010 12:25 pm

In that case, no, I am not aware of a quick answer. I would defer to someone with more knowledge of Time Series applications (tpryan? mtamada?).
All times are GMT - 5 Hours
Page 2 of 2

 