APBRmetrics: The statistical revolution will not be televised.
DSMok1
Joined: 05 Aug 2009 Posts: 611 Location: Where the wind comes sweeping down the plains
Posted: Fri Apr 02, 2010 7:54 am

Ilardi wrote:
> I've had some luck with simply using multiple seasons' worth of data to overcome the multicollinearity issue: once we have 5+ seasons in the model, the standard errors of estimate are quite small, and the Howard/Gortat problems tend to disappear. Of course, such an approach has its own inherent limitation: it assumes that player APM effects remain reasonably constant over long stretches of time, and we know this is not always the case (due to injury, maturation effects, age-related deterioration, etc.).
> We can try, perhaps, to have the best of both worlds by using multiple seasons to disentangle player effects (i.e., address multicollinearity) and then weighting the model very heavily toward the most recent season, but there's no way to know a priori which of the infinitely many weighting schemes is optimal. Joe Sill's suggestion of using out-of-sample goodness of fit as the main arbiter of such choices strikes me as eminently reasonable, and one I hope the field will adopt.

Good to see you back over here, Ilardi!
Remember the discussion we had in the thread about weightings? Have you explored that direction at all? An empirical curve of year-to-year change as a baseline for multiyear APM?
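Ilardi's idea (pool several seasons, weight the most recent one most heavily, and let out-of-sample fit choose among weighting schemes) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not anyone's actual APM code: it assumes an exponential season decay (just one of the many possible schemes), a small ridge penalty, and a holdout of the final season as the out-of-sample test. The function names, decay grid, and simulated stints are all hypothetical.

```python
import numpy as np

def weighted_ridge(X, y, w, lam):
    """Minimize sum_i w_i * (y_i - x_i @ b)**2 + lam * ||b||**2."""
    Xw = X * w[:, None]                       # scale each stint row by its weight
    A = X.T @ Xw + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, Xw.T @ y)

# Hypothetical synthetic league: each stint puts 5 players at +1 and 5 at -1.
# Every row then sums to zero, so X'X is singular; lam > 0 keeps it solvable.
rng = np.random.default_rng(0)
n_players, n_seasons, stints_per_season = 30, 6, 400
base = rng.normal(0.0, 3.0, n_players)
drift = rng.normal(0.0, 0.5, n_players)      # slow, player-specific change per season
true_vals = np.array([base + drift * s for s in range(n_seasons)])

rows, ys, seasons = [], [], []
for s in range(n_seasons):
    for _ in range(stints_per_season):
        x = np.zeros(n_players)
        on_court = rng.choice(n_players, size=10, replace=False)
        x[on_court[:5]] = 1.0
        x[on_court[5:]] = -1.0
        rows.append(x)
        ys.append(x @ true_vals[s] + rng.normal(0.0, 10.0))
        seasons.append(s)
X, y, season = np.array(rows), np.array(ys), np.array(seasons)

def holdout_mse(decay, lam=5.0):
    """Fit on all but the last season with weights decay**(seasons back); score on the last."""
    train = season < n_seasons - 1
    test = ~train
    seasons_back = (n_seasons - 2) - season[train]
    w = decay ** seasons_back.astype(float)
    b = weighted_ridge(X[train], y[train], w, lam)
    resid = y[test] - X[test] @ b
    return float(np.mean(resid ** 2))

decays = [0.25, 0.5, 0.75, 1.0]              # 1.0 = all training seasons weighted equally
scores = {d: holdout_mse(d) for d in decays}
best_decay = min(scores, key=scores.get)
```

The holdout season plays the role of the out-of-sample arbiter; a fuller version would roll the holdout forward across several seasons and search a richer family of weights.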
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 711 Location: Raleigh, NC
Posted: Fri Apr 02, 2010 8:53 am

My April Fools' joke worked!
_________________
I am a basketball geek.
DSMok1
Joined: 05 Aug 2009 Posts: 611 Location: Where the wind comes sweeping down the plains
Posted: Fri Apr 02, 2010 11:02 am

Ryan J. Parker wrote:
> My April Fools' joke worked!

Drat. I read it on April 2, so my guard wasn't up.
gabefarkas
Joined: 31 Dec 2004 Posts: 1313 Location: Durham, NC
Posted: Sat Apr 03, 2010 12:26 pm

Ryan J. Parker wrote:
> DSMok1 wrote:
> > It is a Bayesian-style approach, but it has the issue of pulling all players toward 0, including scrubs. If there were a way to pull toward a sliding scale based on, say, minutes per game (mpg) or something...
>
> This isn't mathematically possible. Don't even try. It will never work.

Sounds like you know something you're not telling us...
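Pulling toward a sliding scale is in fact a standard variant of ridge regression: instead of shrinking every coefficient toward 0, shrink it toward a prior mean. Minimizing ||y − Xb||² + λ||b − b₀||² gives b = (XᵀX + λI)⁻¹(Xᵀy + λb₀). The sketch below assumes the "Bayesian style approach" under discussion is a ridge (Gaussian-prior) regression; the linear mpg-to-prior mapping, its intercept and slope, and the helper names are hypothetical placeholders that would have to be fit from data.

```python
import numpy as np

def ridge_toward_prior(X, y, prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - prior||^2.

    Setting the gradient to zero gives (X'X + lam*I) b = X'y + lam*prior.
    With prior = 0 this is ordinary ridge regression.
    """
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * prior)

def prior_from_mpg(mpg, intercept=-4.0, slope=0.2):
    """Hypothetical sliding scale: more minutes per game -> higher prior value."""
    return intercept + slope * np.asarray(mpg, dtype=float)

# Hypothetical five-player example with noisy synthetic stints.
rng = np.random.default_rng(1)
mpg = np.array([35.0, 30.0, 20.0, 10.0, 5.0])
X = rng.choice([-1.0, 0.0, 1.0], size=(50, 5))
y = X @ np.array([5.0, 3.0, 0.0, -2.0, -4.0]) + rng.normal(0.0, 5.0, 50)

prior = prior_from_mpg(mpg)                  # 3, 2, 0, -2, -3 for the mpg values above
b_toward_zero = ridge_toward_prior(X, y, np.zeros(5), lam=20.0)
b_toward_mpg = ridge_toward_prior(X, y, prior, lam=20.0)
```

With this setup, a low-minutes player with little data is pulled toward a below-average prior rather than toward 0; as λ grows, every estimate converges to its prior.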
gabefarkas
Joined: 31 Dec 2004 Posts: 1313 Location: Durham, NC
Posted: Sat Apr 03, 2010 12:32 pm

Ilardi wrote:
> I've had some luck with simply using multiple seasons' worth of data to overcome the multicollinearity issue: once we have 5+ seasons in the model, the standard errors of estimate are quite small, and the Howard/Gortat problems tend to disappear. Of course, such an approach has its own inherent limitation: it assumes that player APM effects remain reasonably constant over long stretches of time, and we know this is not always the case (due to injury, maturation effects, age-related deterioration, etc.).
> We can try, perhaps, to have the best of both worlds by using multiple seasons to disentangle player effects (i.e., address multicollinearity) and then weighting the model very heavily toward the most recent season, but there's no way to know a priori which of the infinitely many weighting schemes is optimal. Joe Sill's suggestion of using out-of-sample goodness of fit as the main arbiter of such choices strikes me as eminently reasonable, and one I hope the field will adopt.

I agree that, at first glance, using multiple seasons of data appears to address the multicollinearity (MC) issue. However, it raises another issue, which I mentioned in the post I linked to. For ease of reading, I'm re-pasting it here:

Quote:
> Now, as for the age-related error, it has nothing to do with the problems that stem from MC. The age-related error comes from the fact that X1 (e.g., Chris Paul) through X400 (e.g., Mark Madsen) from 2008 represent different intrinsic values than X1 through X400 from 2009. This assumes that player abilities change over time and are at least somewhat dependent on age (we do all agree on this one point, right?).
> Since the different X1's don't all represent exactly the same intrinsic value, several questions emerge. First, is it really fair to lump them all together under the same PV? Second, if we do lump them together, do we get a coefficient estimate that represents anything in particular, or just a rough idea of that player's value over that time span? Third, if we incorporate several years' worth of data without a time series application, is our model still reasonable? And fourth, if we do incorporate a time series into our regression, are we properly accounting for the problems that arise from autocorrelation, in addition to any other issues from MC or anything else?

You seem to agree with my identification of the issue, and you propose weighting schemes as a way to address it. I would agree that's among the best options. Have you considered time series applications?
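On the second question in the quote (what a pooled coefficient represents), the least-squares case can be worked out exactly. As a simplification, take one player's column as the only regressor, with seasons t = 1, ..., T contributing column x_t (±1 when the player is on court, 0 otherwise) and outcomes y_t. The pooled estimate is then an exposure-weighted average of the single-season estimates, with x_tᵀx_t counting that player's stints in season t:

```latex
\hat\beta_{\text{pooled}}
  = \frac{\sum_{t=1}^{T} x_t^\top y_t}{\sum_{t=1}^{T} x_t^\top x_t}
  = \sum_{t=1}^{T} w_t \,\hat\beta_t,
\qquad
\hat\beta_t = \frac{x_t^\top y_t}{x_t^\top x_t},
\qquad
w_t = \frac{x_t^\top x_t}{\sum_{s=1}^{T} x_s^\top x_s}.

% With many players (season design matrices X_t, each X_t^T X_t invertible),
% the same algebra gives a matrix-weighted average:
\hat\beta_{\text{pooled}}
  = \Big(\sum_{t=1}^{T} X_t^\top X_t\Big)^{-1} \sum_{t=1}^{T} X_t^\top X_t \,\hat\beta_t,
\qquad
\hat\beta_t = (X_t^\top X_t)^{-1} X_t^\top y_t .
```

Under these assumptions the pooled coefficient is a playing-time-weighted blend of the season-by-season values; it does not single out the most recent season unless weights are added.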
schtevie
Joined: 18 Apr 2005 Posts: 414
Posted: Sat Apr 03, 2010 2:54 pm

Is there a quick answer as to the potential implications of not using a time series application? Does this bias the estimates or their standard errors? Treated conventionally, can the estimates be understood as a weighted average of the various years' distinct values?
My sense of the matter, based on the aging curves Ed produced, for example, is that players' box score statistical performance, on average, peaks in the mid-20s, but at levels only a few percentage points above where they were when they entered the league. If we are talking about a 4-percentage-point improvement over the first five years, on average, as the biggest variation on account of time, is there much to worry about? Compared to individual variation, injuries, role changes, and teammate-specific interactions, are the time series issues apt to matter?
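One rough way to size this, assuming (hypothetically) that a player's true rating rises linearly by a total of 4 points over five seasons and that the box-score figure carries over to the rating scale: with noiseless data and a single player column, a pooled regression lands on the playing-time-weighted average of the five seasons. With even minutes that is the midpoint, 2 points below the current-season value; minutes that grow over time pull it closer. The helper name and the minutes profiles are illustrative assumptions; real lineups and noise would change the numbers.

```python
import numpy as np

def pooled_vs_latest(true_by_season, stints_per_season):
    """Pooled single-coefficient OLS on noiseless data; returns (estimate, lag behind latest)."""
    x = np.concatenate([np.ones(n) for n in stints_per_season])
    y = np.concatenate([np.full(n, v) for n, v in zip(stints_per_season, true_by_season)])
    pooled = float(np.linalg.lstsq(x[:, None], y, rcond=None)[0][0])
    return pooled, float(true_by_season[-1] - pooled)

true_by_season = np.linspace(0.0, 4.0, 5)    # hypothetical: +1 per season, 4 in total
even = pooled_vs_latest(true_by_season, [100] * 5)
rising = pooled_vs_latest(true_by_season, [50, 75, 100, 125, 150])
# even:   pooled about 2.0, lag about 2.0 (the five-season average)
# rising: pooled about 2.5, lag about 1.5 (more minutes later pull it forward)
```

Under a linear trend with even minutes, the lag is half the total change over the window, so whether it matters depends on how large that change is relative to the other sources of error listed above.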
DSMok1
Joined: 05 Aug 2009 Posts: 611 Location: Where the wind comes sweeping down the plains
Posted: Mon Apr 05, 2010 8:02 am

schtevie wrote:
> Is there a quick answer as to the potential implications of not using a time series application? Does this bias the estimates or their standard errors? Treated conventionally, can the estimates be understood as a weighted average of the various years' distinct values?
> My sense of the matter based upon the aging curves produced by Ed, for example, is that players' box score statistical performance, on average, peaks in the mid-20s, but at levels which are scant few percentage points above when they enter. If we are talking about a 4% point improvement over the first five years, on average, as the biggest variation on account of time, is there much to worry about? Compared to individual variation, injuries, role changes, teammate-specific interactions, are the time series issues apt to matter?

I agree with this... can't we get the "average" value and adjust approximately for aging?
Perhaps even better: can we add a player-specific term for each year in this APM regression? That is, each player's value in year N would be (Val_P + Val_P_N), where Val_P is the player's baseline value and Val_P_N can vary from year to year. Or would that bring back the multicollinearity problem...
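The Val_P + Val_P_N idea can be written out as a design matrix with one baseline column per player plus one column per player-season. A minimal sketch on random synthetic stints (all names, sizes, and penalty values are hypothetical) shows why the unpenalized version is a problem: each player's baseline column is exactly the sum of that player's season columns, so the matrix is rank-deficient. One common workaround, swapped in here as an assumption rather than anything proposed in the thread, is to penalize the yearly deviations more heavily than the baselines, shrinking Val_P_N toward 0 unless the data push it away.

```python
import numpy as np

def baseline_plus_season_design(X_by_season):
    """Columns: p baseline terms (Val_P), then p deviation terms per season (Val_P_N)."""
    T, p = len(X_by_season), X_by_season[0].shape[1]
    blocks = []
    for t, Xt in enumerate(X_by_season):
        dev = np.zeros((Xt.shape[0], p * T))
        dev[:, t * p:(t + 1) * p] = Xt            # season-t deviation columns
        blocks.append(np.hstack([Xt, dev]))       # baseline columns repeat Xt
    return np.vstack(blocks)

def fit_penalized(Z, y, p, lam_base, lam_dev):
    """Ridge with separate penalties on baselines and on season deviations."""
    penalty = np.concatenate([np.full(p, lam_base), np.full(Z.shape[1] - p, lam_dev)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    return theta[:p], theta[p:].reshape(-1, p)    # baselines, (season, player) deviations

rng = np.random.default_rng(2)
p, T, n = 8, 3, 200
X_by_season = [rng.choice([-1.0, 0.0, 1.0], size=(n, p)) for _ in range(T)]
Z = baseline_plus_season_design(X_by_season)
rank, n_cols = np.linalg.matrix_rank(Z), Z.shape[1]   # rank < n_cols: exact collinearity

true_base = rng.normal(0.0, 3.0, p)
true_dev = rng.normal(0.0, 0.5, (T, p))
y = np.concatenate([Xt @ (true_base + true_dev[t]) for t, Xt in enumerate(X_by_season)])
y = y + rng.normal(0.0, 2.0, y.size)
base_hat, dev_hat = fit_penalized(Z, y, p, lam_base=1.0, lam_dev=50.0)
```

Without the deviation penalty the normal equations have no unique solution; with it, the yearly deviations are identified only to the extent the data overcome the shrinkage.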
gabefarkas
Joined: 31 Dec 2004 Posts: 1313 Location: Durham, NC
Posted: Mon Apr 05, 2010 12:30 pm

schtevie wrote:
> Is there a quick answer as to the potential implications of not using a time series application? Does this bias the estimates or their standard errors? Treated conventionally, can the estimates be understood as a weighted average of the various years' distinct values?

Does the quote box in my previous post above, where I quote myself from a few months ago, answer your question regarding the implications?
schtevie
Joined: 18 Apr 2005 Posts: 414
Posted: Mon Apr 05, 2010 12:59 pm

Um, no. Your remarks motivated the explicit question, which is conditional on what I believe to be true: that the aging curves aren't very curvy.
gabefarkas
Joined: 31 Dec 2004 Posts: 1313 Location: Durham, NC
Posted: Mon Apr 05, 2010 1:57 pm

DSMok1 wrote:
> Perhaps even better: can we add a player-specific term for each year in this APM regression? Meaning each player's value in year N is (Val_P + Val_P_N), where Val_P_N can vary from year to year and Val_P is the player's baseline value? Or would that return to the multicollinearity problem...

I believe it would reintroduce the problem, yes.
gabefarkas
Joined: 31 Dec 2004 Posts: 1313 Location: Durham, NC
Posted: Tue Apr 06, 2010 12:25 pm

In that case, no, I am not aware of a quick answer. I would defer to someone with more knowledge of time series applications (tpryan? mtamada?).
Powered by phpBB © 2001, 2005 phpBB Group