Author |
Message |
Ilardi
Joined: 15 May 2008 Posts: 257 Location: Lawrence, KS
|
Posted: Mon Aug 10, 2009 10:20 am Post subject: Need Help Estimating Dan Rosenbaum's Standard Error Term |
|
|
As many of you know, Dan Rosenbaum pioneered the use of "statistical plus-minus" (APM estimates based on boxscore stats) to help bring down the error levels of traditional APM estimates.
In his seminal 2004 paper (http://www.82games.com/comm30.htm) he describes the generation of a composite Statistical + Pure APM measure, as follows:
Quote: | (3) OVERALL = a * PURE + (1 – a) * STATS, where
OVERALL is the overall plus/minus rating
PURE is the pure adjusted plus/minus rating from Table 1
STATS is the statistical plus/minus rating from Table 3
a is the share of the overall rating due to the pure rating (it is chosen to minimize the standard error of the overall rating with the restriction that it fall between 10% and 90%, note that this will result in the pure rating counting less when it is especially noisy) |
Ok, here's what I can't figure out: How does one go about calculating the standard error of the ensuing Overall rating?
As a research psychologist, I know just enough statistics to be dangerous, but not enough (apparently) to solve this little conundrum. (I actually have a pretty good hunch about how to go about it, but don't fully trust it.) If one of you true statisticians out there can help with this, I'd be most appreciative! (Dave Lewin, it also occurs to me that Dan may have shared this little gem with you . . . if so, perhaps you could pass it along?)
Thanks in advance,
Steve |
|
|
|
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 706 Location: Raleigh, NC
|
Posted: Mon Aug 10, 2009 12:52 pm Post subject: |
|
|
What was your idea Steve?
My thinking is you'd have a std err for both PURE and STATS, and when you multiply them by a and 1-a, and then add them together, your new standard error is:
SE(OVERALL) = sqrt( [a*SE(PURE)]^2 + [(1-a)*SE(STATS)]^2 )
Then you find the a that minimizes this.
I worry about how these might be correlated, but I'm no expert in adjusting for that. _________________ I am a basketball geek. |
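A minimal Python sketch of the formula above, for anyone who wants to try it. The function names `overall_se` and `best_weight` are made up for illustration, and the optional `cov` argument (the covariance between the PURE and STATS errors) is an assumption added to address the correlation worry; with `cov=0.0` it reduces to the independent-errors formula in the post. Because the variance is a convex quadratic in a, the minimizing weight has a closed form, which is then clipped to the 10%-90% range quoted from the paper.

```python
import math


def overall_se(a, se_pure, se_stats, cov=0.0):
    """Standard error of a*PURE + (1 - a)*STATS.

    cov is the covariance between the PURE and STATS estimation errors;
    cov=0.0 reproduces the independent-errors formula from the post.
    """
    var = (a * se_pure) ** 2 + ((1 - a) * se_stats) ** 2 + 2 * a * (1 - a) * cov
    return math.sqrt(var)


def best_weight(se_pure, se_stats, cov=0.0, lo=0.10, hi=0.90):
    """Weight a on PURE that minimizes overall_se, clipped to [lo, hi]."""
    # Var(PURE - STATS); must be positive for the minimum to be well defined.
    denom = se_pure ** 2 + se_stats ** 2 - 2 * cov
    if denom <= 0:
        raise ValueError("degenerate inputs: Var(PURE - STATS) must be positive")
    # Setting d/da of the variance to zero gives this closed form.
    a = (se_stats ** 2 - cov) / denom
    return min(hi, max(lo, a))


# Hypothetical player: noisy PURE estimate, tighter STATS estimate.
a = best_weight(se_pure=4.0, se_stats=2.0)
print(a, overall_se(a, 4.0, 2.0))
```

When PURE is especially noisy the unclipped weight falls below 0.10 and the clip takes over, which matches the paper's remark that the pure rating counts less when it is noisy.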
|
|
|
Ilardi
Joined: 15 May 2008 Posts: 257 Location: Lawrence, KS
|
Posted: Mon Aug 10, 2009 1:08 pm Post subject: |
|
|
Ryan,
Yes, that's what I wound up with, as well, but then I ran into a snag:
How do you find the se associated with each player's STAT measure?
With the PURE measure, each player is treated as a separate regression variable, so the regression output actually gives you the se for each estimate . . . but with STAT, you're simply calculating each player's STAT value based on a set of pre-existing one-size-fits-all regression weights applied to his boxscore stats (and/or other relevant stats from 82games, etc.). How does one derive an se estimate for each resulting STAT estimate? |
|
|
|
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 706 Location: Raleigh, NC
|
Posted: Mon Aug 10, 2009 1:12 pm Post subject: |
|
|
Well you should have a std error for each predictor, so I would probably do it with simulation (since you should have a covariance matrix to work with).
If you didn't want to do that, then you should be able to just add and subtract as necessary using the covariance matrix to come up with the overall std error on the STAT rating. _________________ I am a basketball geek. |
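A rough sketch of the simulation route, in plain Python. Everything numeric here is hypothetical: the three-stat model, the coefficient estimates `beta_hat`, their covariance matrix `beta_cov`, and the player's stat line `player_stats` are made up, and the helper names `cholesky` and `simulate_stat_se` are illustrative only. It also assumes the coefficient estimates are approximately multivariate normal, which is the usual large-sample assumption for regression output but is not something the paper states.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simulate_stat_se(x, beta, cov, n_draws=20000, seed=0):
    """Monte Carlo SE of STAT = sum(x[i] * b[i]) when b ~ Normal(beta, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(beta)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # b = beta + L z has mean beta and covariance cov.
        b = [beta[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        draws.append(sum(xi * bi for xi, bi in zip(x, b)))
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return math.sqrt(var)


# Hypothetical model with three stats (say points, rebounds, turnovers per 40 min).
beta_hat = [0.3, 0.2, -1.0]
beta_cov = [
    [0.0004, 0.0001, 0.0],
    [0.0001, 0.0009, -0.0002],
    [0.0, -0.0002, 0.0025],
]
player_stats = [20.0, 8.0, 3.0]
print(simulate_stat_se(player_stats, beta_hat, beta_cov))
```

The player's stats are held fixed and only the regression weights are redrawn, so the spread of the simulated STAT values reflects uncertainty in the weights alone.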
|
|
|
DSMok1
Joined: 05 Aug 2009 Posts: 547 Location: Where the wind comes sweeping down the plains
|
Posted: Mon Aug 10, 2009 1:20 pm Post subject: |
|
|
It looks like you all are on the right track. Here's a quick reference PDF on combining errors: Combining Errors. I don't know how to measure the error covariance for the PURE and STAT interaction. |
|
|
|
Ilardi
Joined: 15 May 2008 Posts: 257 Location: Lawrence, KS
|
Posted: Mon Aug 10, 2009 1:23 pm Post subject: |
|
|
Ryan J. Parker wrote: | Well you should have a std error for each predictor, so I would probably do it with simulation (since you should have a covariance matrix to work with).
If you didn't want to do that, then you should be able to just add and subtract as necessary using the covariance matrix to come up with the overall std error on the STAT rating. |
Yes, that makes sense - thanks!
I'll probably go the latter route. However, I don't have a good text on hand to guide me through the requisite steps of adding/subtracting my way through the covariance matrix . . . do you happen to know of a good online reference that lays it out clearly? |
|
|
|
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 706 Location: Raleigh, NC
|
Posted: Mon Aug 10, 2009 2:05 pm Post subject: |
|
|
I think you can use: http://en.wikipedia.org/wiki/Variance
You want to look at "In general, for the sum of N variables...", while making sure you keep track of negatives if the coefficients are < 0. _________________ I am a basketball geek. |
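For reference, the general result for a weighted sum can be written out as below. The notation is an assumption for illustration: the a_i are fixed constants and the X_i are the random quantities. The sign of each cross term comes from the product a_i a_j, which is where keeping track of negatives matters.

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{N} a_i X_i\right)
  = \sum_{i=1}^{N} a_i^{2}\,\operatorname{Var}(X_i)
  + 2 \sum_{1 \le i < j \le N} a_i a_j \,\operatorname{Cov}(X_i, X_j)
```

With N = 2 and zero covariance this reduces to the two-term sum under the square root in the SE(OVERALL) formula posted earlier in the thread.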
|
|
|
Ilardi
Joined: 15 May 2008 Posts: 257 Location: Lawrence, KS
|
Posted: Mon Aug 10, 2009 2:17 pm Post subject: |
|
|
Thanks - gotta love Wikipedia.
But wouldn't this method yield the exact same variance estimate for each player's STAT rating in any given model, or am I missing something important? |
|
|
|
DSMok1
Joined: 05 Aug 2009 Posts: 547 Location: Where the wind comes sweeping down the plains
|
Posted: Mon Aug 10, 2009 2:22 pm Post subject: |
|
|
Ilardi wrote: | Thanks - gotta love Wikipedia.
But wouldn't this method yield the exact same variance estimate for each player's STAT rating in any given model, or am I missing something important? |
Wouldn't you be using each player's individual standard error for each statistic included in the STAT calc? (If those are available... that's a lot of calculation...) |
|
|
|
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 706 Location: Raleigh, NC
|
Posted: Mon Aug 10, 2009 2:27 pm Post subject: |
|
|
Well the variance would also be a function of coefficients x stats, so that is part of the calculation of Var(STAT), no? Like in the example on Wikipedia, we have some constant a multiplied by the coefficient X. _________________ I am a basketball geek. |
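One way to read this point: if the player's stats are treated as fixed constants and the regression coefficients as the random quantities, then Var(STAT) is the quadratic form x' C x, where x is the player's stat line and C is the coefficient covariance matrix. A small sketch with made-up numbers follows; the three-stat model, the matrix `coef_cov`, the two stat lines, and the function name `stat_se` are all hypothetical. It shows that one covariance matrix still yields a different SE for each player, because each player's x differs.

```python
import math


def stat_se(x, cov):
    """SE of sum(x[i] * b[i]) from coefficient uncertainty alone: sqrt(x' cov x)."""
    n = len(x)
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


# Made-up covariance matrix for three regression weights
# (say points, rebounds, turnovers per 40 minutes).
coef_cov = [
    [0.0004, 0.0001, 0.0],
    [0.0001, 0.0009, -0.0002],
    [0.0, -0.0002, 0.0025],
]

scorer = [24.0, 5.0, 3.5]       # hypothetical high-usage player
role_player = [8.0, 9.0, 1.0]   # hypothetical low-usage rebounder

print(stat_se(scorer, coef_cov))
print(stat_se(role_player, coef_cov))
```

This only captures uncertainty in the estimated weights. If the goal is the uncertainty in a player's true rating, the residual variance of the STAT regression would presumably need to be added as well; that is an assumption on my part and not something taken from Rosenbaum's paper.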
|
|
|
Crow
Joined: 20 Jan 2009 Posts: 746
|
Posted: Mon Aug 10, 2009 2:42 pm Post subject: |
|
|
Steve, I hope you are heading to publishing new, multi-year overall plus/minus ratings. That is what is needed.
I've supported that in recent years, as that was the direction Dan immediately moved in over the progression of his first paper. Then for years we had just the pure adjusted ratings, and eventually the different flavors of pure, the offensive/defensive splits, and newer data for statistical by itself. All quite helpful for considering impact, but multi-year overall plus/minus ratings might give the closest estimate of true overall impact. But I guess you'll have more information on that when you compute the errors.
While I want to see the new roll-up, I'd keep all the layers though. It is about understanding a complex story. |
|
|
|
DLew
Joined: 13 Nov 2006 Posts: 222
|
Posted: Mon Aug 10, 2009 4:57 pm Post subject: |
|
|
Steve,
Footnote #4 on that page is relevant to this discussion. |
|
|
|
Ilardi
Joined: 15 May 2008 Posts: 257 Location: Lawrence, KS
|
Posted: Mon Aug 10, 2009 5:59 pm Post subject: |
|
|
Ryan J. Parker wrote: | Well the variance would also be a function of coefficients x stats, so that is part of the calculation of Var(STAT), no? Like in the example on Wikipedia, we have some constant a multiplied by the coefficient X. |
Yes, but for some reason (sleep deprivation?) I'm still a bit confused about where the variance actually comes from vis-a-vis the specific stats for any given player. I've always assumed the STAT result for each player is calculated on the basis of his full-season (aggregate) stats, but are you suggesting that we should actually compute a STAT estimate for each player on a per game basis (which would then make it quite easy and obvious to derive its variance for any given player)?
If that's not the case, would you be willing to provide a concrete (hypothetical) example of how the s.e. for STAT might be calculated for a specific player based on the set of 14 specific boxscore-based stats Dan used in his original paper? |
|
|
|
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 706 Location: Raleigh, NC
|
Posted: Mon Aug 10, 2009 6:17 pm Post subject: |
|
|
I think I'd want to try and reproduce his results to understand exactly what is going on. Without that, I'm not exactly sure how to construct the SE for each player using the STAT formula. _________________ I am a basketball geek. |
|
|
|
Ilardi
Joined: 15 May 2008 Posts: 257 Location: Lawrence, KS
|
Posted: Mon Aug 10, 2009 6:32 pm Post subject: |
|
|
Ryan J. Parker wrote: | I think I'd want to try and reproduce his results to understand exactly what is going on. Without that, I'm not exactly sure how to construct the SE for each player using the STAT formula. |
Scott Sereday recently did an updated version of Rosenbaum's statistical plus-minus model for offensive and defensive APM counterparts, even including some interesting non-boxscore stats: http://basketball-statistics.com/seredayanalysispartone.html
Unfortunately, he doesn't report any player SE terms or discuss how the SE might be derived for any given player, but he does provide lots of detail on the model itself if that would help . . .
Basically, unless the solution to the SE conundrum is to compute a STAT estimate for each player for each game, I'm still completely unclear about how to do the calculation. |
|
|
|
|