This is Google's cache of viewtopic.php?t=2295. It is a snapshot of the page as it appeared on Feb 12, 2011 15:41:02 GMT. The current page could have changed in the meantime.

APBRmetrics :: View topic - Need Help Estimating Dan Rosenbaum's Standard Error Term

Need Help Estimating Dan Rosenbaum's Standard Error Term
Ilardi



Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS

Posted: Mon Aug 10, 2009 10:20 am    Post subject: Need Help Estimating Dan Rosenbaum's Standard Error Term

As many of you know, Dan Rosenbaum pioneered the use of "statistical plus-minus" (APM estimates based on boxscore stats) to help bring down the error levels of traditional APM estimates.

In his seminal 2004 paper (http://www.82games.com/comm30.htm) he describes the generation of a composite Statistical + Pure APM measure, as follows:

Quote:
(3) OVERALL = a * PURE + (1 – a) * STATS, where

OVERALL is the overall plus/minus rating

PURE is the pure adjusted plus/minus rating from Table 1

STATS is the statistical plus/minus rating from Table 3

a is the share of the overall rating due to the pure rating (it is chosen to minimize the standard error of the overall rating with the restriction that it fall between 10% and 90%, note that this will result in the pure rating counting less when it is especially noisy)


Ok, here's what I can't figure out: How does one go about calculating the standard error of the ensuing Overall rating?

As a research psychologist, I know just enough statistics to be dangerous, but not enough (apparently) to solve this little conundrum. (I actually have a pretty good hunch about how to go about it, but don't fully trust it.) If one of you true statisticians out there can help with this, I'd be most appreciative! (Dave Lewin, it also occurs to me that Dan may have shared this little gem with you . . . if so, perhaps you could pass it along?)

Thanks in advance,
Steve
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 706
Location: Raleigh, NC

Posted: Mon Aug 10, 2009 12:52 pm

What was your idea Steve?

My thinking is you'd have a std err for both PURE and STATS, and when you multiply them by a and 1-a, and then add them together, your new standard error is:

SE(OVERALL) = sqrt( [a*SE(PURE)]^2 + [(1-a)*SE(STATS)]^2 )

Then you find the a that minimizes this.

I worry about how these might be correlated, but I'm no expert in adjusting for that.
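A minimal sketch of the minimization described above, and not necessarily what Rosenbaum did: with the variance written as a quadratic in a, the minimizing weight has a closed form, a = (SE(STATS)^2 - cov) / (SE(PURE)^2 + SE(STATS)^2 - 2*cov), which is then clipped to the 10% to 90% range from the quoted paper. The function name `blend_weight`, the optional `cov` argument (the covariance between the two estimates' errors, 0 under independence), and the example numbers are all hypothetical.

```python
import math

def blend_weight(se_pure, se_stats, cov=0.0, lo=0.10, hi=0.90):
    """Return (a, se_overall) minimizing the SE of a*PURE + (1-a)*STATS.

    cov is the covariance between the PURE and STATS estimation errors;
    cov=0.0 reproduces the independent-errors formula.
    """
    var_p, var_s = se_pure ** 2, se_stats ** 2
    denom = var_p + var_s - 2.0 * cov
    # Unconstrained minimizer of the quadratic; even split if degenerate.
    a = (var_s - cov) / denom if denom > 0 else 0.5
    # The variance is convex in a, so clipping gives the constrained minimum.
    a = min(hi, max(lo, a))
    var_overall = a ** 2 * var_p + (1 - a) ** 2 * var_s + 2 * a * (1 - a) * cov
    return a, math.sqrt(max(var_overall, 0.0))

# Hypothetical player: noisy PURE estimate, tighter STATS estimate.
a, se_overall = blend_weight(se_pure=4.0, se_stats=2.0)
```

With these made-up inputs the weight on PURE comes out at 0.2 and the overall SE at about 1.79, below either input SE. A much noisier PURE estimate (say SE 10 against 2) would push the unconstrained weight below 10%, so it gets clipped to 0.10.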
_________________
I am a basketball geek.
Ilardi



Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS

Posted: Mon Aug 10, 2009 1:08 pm

Ryan,

Yes, that's what I wound up with, as well, but then I ran into a snag:

How do you find the se associated with each player's STAT measure?

With the PURE measure, each player is treated as a separate regression variable, so the regression output actually gives you the se for each estimate . . . but with STAT, you're simply calculating each player's STAT value based on a set of pre-existing one-size-fits-all regression weights applied to his boxscore stats (and/or other relevant stats from 82games, etc.). How does one derive an se estimate for each resulting STAT estimate?
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 706
Location: Raleigh, NC

Posted: Mon Aug 10, 2009 1:12 pm

Well you should have a std error for each predictor, so I would probably do it with simulation (since you should have a covariance matrix to work with).

If you didn't want to do that, then you should be able to just add and subtract as necessary using the covariance matrix to come up with the overall std error on the STAT rating.
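A rough sketch of the simulation route, under assumptions that are mine rather than anything from Rosenbaum's paper: the STAT coefficients are treated as multivariate normal around their estimates with the regression's covariance matrix, a player's stat line is treated as fixed, and the spread of the simulated ratings is taken as the SE. The two-stat model, the numbers, and the function names are hypothetical, and this captures coefficient uncertainty only, not the residual error of the STAT regression.

```python
import math
import random

def cholesky(m):
    """Lower-triangular L with L * L^T == m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def simulate_stat_se(x, beta, cov, n_draws=20000, seed=0):
    """Monte Carlo standard deviation of x . b when b ~ Normal(beta, cov)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(beta)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # b = beta + L z has mean beta and covariance cov.
        b = [beta[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        draws.append(sum(xi * bi for xi, bi in zip(x, b)))
    mean = sum(draws) / n_draws
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / (n_draws - 1))

# Hypothetical two-stat model and one player's stat line.
beta = [0.8, -1.2]
cov = [[0.04, 0.01], [0.01, 0.09]]
player_x = [5.0, 2.0]
se_stat = simulate_stat_se(player_x, beta, cov)
```

The simulated value should land close to the "add and subtract" answer from the covariance matrix, sqrt(x' * cov * x), which is about 1.25 for these made-up numbers.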
_________________
I am a basketball geek.
DSMok1



Joined: 05 Aug 2009
Posts: 547
Location: Where the wind comes sweeping down the plains

Posted: Mon Aug 10, 2009 1:20 pm

It looks like you all are on the right track. Here's a quick reference PDF on combining errors: Combining Errors. I don't know how to measure the error covariance for the PURE and STAT interaction.
Ilardi



Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS

Posted: Mon Aug 10, 2009 1:23 pm

Ryan J. Parker wrote:
Well you should have a std error for each predictor, so I would probably do it with simulation (since you should have a covariance matrix to work with).

If you didn't want to do that, then you should be able to just add and subtract as necessary using the covariance matrix to come up with the overall std error on the STAT rating.


Yes, that makes sense - thanks!

I'll probably go the latter route. However, I don't have a good text on hand to guide me through the requisite steps of adding/subtracting my way through the covariance matrix . . . do you happen to know of a good online reference that lays it out clearly?
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 706
Location: Raleigh, NC

Posted: Mon Aug 10, 2009 2:05 pm

I think you can use: http://en.wikipedia.org/wiki/Variance

You want to look at "In general, for the sum of N variables...", while making sure you keep track of negatives if the coefficients are < 0.
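For reference, the general result being pointed to is the standard variance of a weighted sum, written out here with constants a_i and random variables X_i:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{N} a_i X_i\right)
  = \sum_{i=1}^{N} a_i^{2}\,\operatorname{Var}(X_i)
  + 2 \sum_{1 \le i < j \le N} a_i a_j \,\operatorname{Cov}(X_i, X_j)
```

The squared terms are never negative, so the sign bookkeeping only matters in the cross terms, where each covariance is multiplied by the product a_i a_j. How the a_i and X_i map onto the STAT model (which pieces are fixed and which are estimated) is an assumption the analyst has to make; the formula itself does not settle it.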
_________________
I am a basketball geek.
Ilardi



Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS

Posted: Mon Aug 10, 2009 2:17 pm

Thanks - gotta love Wikipedia.

But wouldn't this method yield the exact same variance estimate for each player's STAT rating in any given model, or am I missing something important?
DSMok1



Joined: 05 Aug 2009
Posts: 547
Location: Where the wind comes sweeping down the plains

Posted: Mon Aug 10, 2009 2:22 pm

Ilardi wrote:
Thanks - gotta love Wikipedia.

But wouldn't this method yield the exact same variance estimate for each player's STAT rating in any given model, or am I missing something important?


Wouldn't you be using each player's individual stderror for each statistic included in the STAT calc? (If those are available... that's a lot of calculation...)
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 706
Location: Raleigh, NC

Posted: Mon Aug 10, 2009 2:27 pm

Well the variance would also be a function of coefficients x stats, so that is part of the calculation of Var(STAT), no? Like in the example on Wikipedia, we have some constant a multiplied by the coefficient X.
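To make that concrete, here is a small sketch under one possible reading: the coefficient estimates are the random quantities (with a known covariance matrix from the STAT regression) and each player's stat line supplies the constants. The variance of a player's fitted STAT value is then the quadratic form x' * cov * x, so it changes with the player's stats. The two-stat model, the numbers, and the name `stat_se` are hypothetical, and the result reflects coefficient uncertainty only; a prediction SE would also add the regression's residual variance.

```python
import math

def stat_se(x, cov):
    """sqrt(x' * cov * x): SE of a fitted STAT value from coefficient uncertainty alone."""
    n = len(x)
    var = sum(x[j] * x[k] * cov[j][k] for j in range(n) for k in range(n))
    return math.sqrt(var)

# Hypothetical covariance matrix of two coefficient estimates.
cov = [[0.04, 0.01],
       [0.01, 0.09]]

player_a = [5.0, 2.0]  # made-up stat lines
player_b = [1.0, 6.0]

se_a = stat_se(player_a, cov)
se_b = stat_se(player_b, cov)
```

With these inputs the two players get different SEs (about 1.25 and 1.84) from the same coefficient covariance matrix, because the stat lines weight the coefficient variances differently.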
_________________
I am a basketball geek.
Crow



Joined: 20 Jan 2009
Posts: 746

Posted: Mon Aug 10, 2009 2:42 pm

Steve, I hope you are heading to publishing new, multi-year overall plus/minus ratings. That is what is needed.

I've supported that in recent years, as that was the direction Dan immediately moved in after his first paper. Then for years we had just the pure adjusted. And eventually the different flavors of pure, the offensive/defensive splits, and newer data for statistical by itself. All quite helpful for consideration of impact, but multi-year overall plus/minus ratings might give the closest estimate to true overall impact. But I guess you'll have more information on that when you compute the errors.

While I want to see the new roll-up, I'd keep all the layers though. It is about understanding a complex story.
DLew



Joined: 13 Nov 2006
Posts: 222

Posted: Mon Aug 10, 2009 4:57 pm

Steve,

Footnote #4 on that page is relevant to this discussion.
Ilardi



Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS

Posted: Mon Aug 10, 2009 5:59 pm

Ryan J. Parker wrote:
Well the variance would also be a function of coefficients x stats, so that is part of the calculation of Var(STAT), no? Like in the example on Wikipedia, we have some constant a multiplied by the coefficient X.


Yes, but for some reason (sleep deprivation?) I'm still a bit confused about where the variance actually comes from vis-a-vis the specific stats for any given player. I've always assumed the STAT result for each player is calculated on the basis of his full-season (aggregate) stats, but are you suggesting that we should actually compute a STAT estimate for each player on a per-game basis (which would then make it quite easy and obvious to derive its variance for any given player)?

If that's not the case, would you be willing to provide a concrete (hypothetical) example of how the s.e. for STAT might be calculated for a specific player based on the set of 14 specific boxscore-based stats Dan used in his original paper?
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 706
Location: Raleigh, NC

Posted: Mon Aug 10, 2009 6:17 pm

I think I'd want to try and reproduce his results to understand exactly what is going on. Without that, I'm not exactly sure how to construct the SE for each player using the STAT formula.
_________________
I am a basketball geek.
Ilardi



Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS

Posted: Mon Aug 10, 2009 6:32 pm

Ryan J. Parker wrote:
I think I'd want to try and reproduce his results to understand exactly what is going on. Without that, I'm not exactly sure how to construct the SE for each player using the STAT formula.


Scott Sereday recently did an updated version of Rosenbaum's statistical plus-minus model for offensive and defensive APM counterparts, even including some interesting non-boxscore stats: http://basketball-statistics.com/seredayanalysispartone.html

Unfortunately, he doesn't report any player SE terms or discuss how the SE might be derived for any given player, but he does provide lots of detail on the model itself if that would help . . .

Basically, unless the solution to the SE conundrum is to compute a STAT estimate for each player for each game, I'm still completely unclear about how to do the calculation.
All times are GMT - 5 Hours
Page 1 of 2

 