APBRmetrics :: View topic - Bayesian Adjusted Plus/Minus
APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

Bayesian Adjusted Plus/Minus

 
josh



Joined: 01 Aug 2007
Posts: 6

Posted: Thu Aug 02, 2007 8:35 am    Post subject: Bayesian Adjusted Plus/Minus

Hello everyone, I'm brand new here and admittedly not as up to speed on basketball as I am on stats. I like basketball, but I will be the first to admit I like stats in general more. But I really like applying stats to sports and competitions in general. I mention this so you'll forgive any statements that show my ignorance in the area.

Has anyone tried using Bayesian linear regression to solve for the adjusted plus/minus values?

I see two benefits to a Bayesian treatment:

1) It handles uncertainty well. That is, the more data there is on a player, the more confidence there is in that player's rating. This allows players with much less playing time to be included, since uncertainty about their plus/minus values will be handled naturally.

2) Let's say, for example, we use Dan's likelihood with a multivariate normal prior on the adjusted plus/minus values. Since this prior is conjugate to the normal likelihood in standard linear regression, it allows for efficient sequential updating. A new estimate for each player can be created after *every* substitution. It would require storing covariances between every pair of players, but that's still relatively small on today's computers. The nice thing is, the updates would require at most inverting a 10x10 matrix instead of, say, a 700x700 matrix, since only 10 players participate in each update. This would mean adjusted plus/minus values could be updated in real time after every substitution.
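A minimal sketch of the sequential update described in point 2, in plain Python. It assumes each stint between substitutions yields one scalar margin, coded with +1 for home players and -1 for away players, and a known noise variance. The function name, the argument names, and the toy numbers are all hypothetical. With a scalar observation, the "inversion" reduces to a single division.

```python
def sequential_update(mean, cov, players, signs, margin, noise_var):
    """One conjugate-normal update of all ratings from a single stint.

    mean, cov : current posterior mean (list) and covariance (list of lists)
    players   : indices of the players on court during the stint
    signs     : +1 for home players, -1 for away players (assumed coding)
    margin    : observed home-minus-away scoring margin for the stint
    noise_var : assumed observation noise variance
    """
    n = len(mean)
    on_court = list(zip(players, signs))
    # cov @ x, where x has +1/-1 at the on-court players and 0 elsewhere.
    cov_x = [sum(cov[i][p] * s for p, s in on_court) for i in range(n)]
    # Predictive variance of the margin: x' cov x + noise_var (a scalar).
    pred_var = sum(cov_x[p] * s for p, s in on_court) + noise_var
    # Observed margin minus the margin the current ratings predict.
    resid = margin - sum(mean[p] * s for p, s in on_court)
    new_mean = [m + c * resid / pred_var for m, c in zip(mean, cov_x)]
    new_cov = [[cov[i][j] - cov_x[i] * cov_x[j] / pred_var for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Toy league of 3 players with prior N(0, I): player 0 (home) beats
# player 1 (away) by 2 points; player 2 is not on court.
prior_mean = [0.0, 0.0, 0.0]
prior_cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
post_mean, post_cov = sequential_update(
    prior_mean, prior_cov, [0, 1], [1, -1], 2.0, 1.0)
```

In this toy case the two opponents end up with a positive posterior covariance, because only the difference between their ratings was observed. Note that although the inversion is small, the rank-one correction in this sketch still touches every entry of the stored covariance matrix.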

The prior and noise variance parameters can be inferred using either a hierarchical Bayesian model or (most Bayesians from the stats community will shudder) evidence maximization a.k.a. empirical Bayes, applied to an early data sample or season.
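As a sketch of what evidence maximization would optimize here, assume a zero-mean prior $\beta \sim \mathcal{N}(0, \tau^2 I)$ on the player values and independent noise with variance $\sigma^2$ (both simplifying assumptions, not something specified above). With $X$ the design matrix of stints and $y$ the observed margins, the marginal likelihood and the resulting estimates are:

```latex
p(y \mid \tau^2, \sigma^2)
  = \mathcal{N}\!\left(y \;\middle|\; 0,\; \tau^2 X X^\top + \sigma^2 I\right),
\qquad
(\hat\tau^2, \hat\sigma^2)
  = \arg\max_{\tau^2,\,\sigma^2} \; \log p(y \mid \tau^2, \sigma^2).
```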

This would let sites like Basketball Value easily post changes to Bayesian adjusted plus/minus values after every game (in theory after every substitution, if the data were available in real time), along with home-court advantages for each court. This could also include a variance estimate on each player's plus/minus value. Also, the covariance values would give insights into which players play well together, whether opponents or not.

To be fair, and as many of you may already know, I should mention that having fast sequential updates like this is not solely Bayesian. The data could also be fit sequentially using stochastic least mean squares (LMS).
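For comparison, a minimal sketch of one LMS step, under the same assumed +1/-1 coding of home and away players. The function name and the step size are hypothetical. Unlike the Bayesian update, it keeps no covariance and touches only the players on court.

```python
def lms_update(ratings, players, signs, margin, step):
    """One stochastic least-mean-squares step for a single stint.

    ratings : current rating estimates (list), updated in place
    players : indices of the players on court
    signs   : +1 for home players, -1 for away players (assumed coding)
    margin  : observed home-minus-away margin for the stint
    step    : learning rate (hypothetical value; it needs tuning)
    """
    on_court = list(zip(players, signs))
    predicted = sum(ratings[p] * s for p, s in on_court)
    error = margin - predicted
    # Gradient step on the squared error; only on-court players move.
    for p, s in on_court:
        ratings[p] += step * error * s
    return ratings


ratings = lms_update([0.0, 0.0, 0.0], [0, 1], [1, -1], 2.0, 0.1)
```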

Thanks in advance for your patience with my ignorance as a newcomer to these forums.
Ed Küpfer



Joined: 30 Dec 2004
Posts: 785
Location: Toronto

Posted: Thu Aug 02, 2007 1:30 pm

This is one of those things that sounds like a good idea, but....

The standard errors on a single season of player +/- regressions are very large. It's worthwhile to look at the numbers produced -- but those coefficients are useful mostly as descriptive of past performance. I have not seen any predictive value of these numbers based on within-season production. The amount of work involved in producing updated coefficients on the fly dwarfs the potential reward, I think.
_________________
ed
josh



Joined: 01 Aug 2007
Posts: 6

Posted: Thu Aug 02, 2007 1:50 pm

Ed Küpfer wrote:

The standard errors on a single season of player +/- regressions are very large. It's worthwhile to look at the numbers produced -- but those coefficients are useful mostly as descriptive of past performance. I have not seen any predictive value of these numbers based on within-season production. The amount of work involved in producing updated coefficients on the fly dwarfs the potential reward, I think.


I understand what you're saying, and if it really was difficult to get those "on-the-fly" coefficient updates, I would agree.

However, updating 10 coefficients for a period between substitutions can be done in 2 lines of, for example, R code, and takes very little processing time. It involves inverting a matrix or 3, but these are 10x10 matrices instead of the full 700x700 you would do in a standard pseudo-inverse (maximum likelihood) linear regression solution.

It's so cheap to do, it just screams, "Hey, why not try it?" I'm probably missing something here since I'm new and I tend to be excitable when I latch on to some fun application of statistics.
Statman



Joined: 20 Feb 2005
Posts: 242
Location: Arlington, Texas

Posted: Thu Aug 02, 2007 3:52 pm

josh wrote:
Ed Küpfer wrote:

The standard errors on a single season of player +/- regressions are very large. It's worthwhile to look at the numbers produced -- but those coefficients are useful mostly as descriptive of past performance. I have not seen any predictive value of these numbers based on within-season production. The amount of work involved in producing updated coefficients on the fly dwarfs the potential reward, I think.


I understand what you're saying, and if it really was difficult to get those "on-the-fly" coefficient updates, I would agree.

However, updating 10 coefficients for a period between substitutions can be done in 2 lines of, for example, R code, and takes very little processing time. It involves inverting a matrix or 3, but these are 10x10 matrices instead of the full 700x700 you would do in a standard pseudo-inverse (maximum likelihood) linear regression solution.

It's so cheap to do, it just screams, "Hey, why not try it?" I'm probably missing something here since I'm new and I tend to be excitable when I latch on to some fun application of statistics.


Well - you should definitely give it a try and show us the results... ;)
_________________
Dan

My current national college player rankings (and other stuff):
http://www.pointguardu.com/f136/statmans-ratings-56243/index6.html#post355594
basketballvalue



Joined: 07 Mar 2006
Posts: 208

Posted: Fri Aug 03, 2007 5:53 am

josh wrote:
Ed Küpfer wrote:

The standard errors on a single season of player +/- regressions are very large. It's worthwhile to look at the numbers produced -- but those coefficients are useful mostly as descriptive of past performance. I have not seen any predictive value of these numbers based on within-season production. The amount of work involved in producing updated coefficients on the fly dwarfs the potential reward, I think.


I understand what you're saying, and if it really was difficult to get those "on-the-fly" coefficient updates, I would agree.

However, updating 10 coefficients for a period between substitutions can be done in 2 lines of, for example, R code, and takes very little processing time. It involves inverting a matrix or 3, but these are 10x10 matrices instead of the full 700x700 you would do in a standard pseudo-inverse (maximum likelihood) linear regression solution.

It's so cheap to do, it just screams, "Hey, why not try it?" I'm probably missing something here since I'm new and I tend to be excitable when I latch on to some fun application of statistics.


Hey all, I am planning on updating my system this summer to calculate the adjusted +/- each time I update the numbers on basketballvalue.com. That would be nightly rather than in game, but it's close to what you've described, as it at least would be in season. One thing I'll need to balance, though, is that since the errors are a bit large, I might end up in a place where I use enough data (e.g. two seasons) that one more night's worth of games won't change the estimates much. Ideally I'd start the 2007-2008 regular season data with the first night of the season, but I think a week into the season the data could be a bit meaningless until there are more observations.

Thanks,
Aaron

www.basketballvalue.com
josh



Joined: 01 Aug 2007
Posts: 6

Posted: Fri Aug 03, 2007 7:05 am

basketballvalue wrote:

Hey all, I am planning on updating my system this summer to calculate the adjusted +/- each time I update the numbers on basketballvalue.com. That would be nightly rather than in game, but it's close to what you've described, as it at least would be in season. One thing I'll need to balance, though, is that since the errors are a bit large, I might end up in a place where I use enough data (e.g. two seasons) that one more night's worth of games won't change the estimates much. Ideally I'd start the 2007-2008 regular season data with the first night of the season, but I think a week into the season the data could be a bit meaningless until there are more observations.


Yeah, that's an artifact of not modeling non-stationary regression coefficients. Winval/Danval assumes that the coefficients have one set average value given all of the data you use to fit the model.

You could use weighted regression and weight newer samples more heavily. If you were using Bayesian updates, you could artificially increase the diagonal of the covariance matrix at the beginning of every season (Herbrich/Graepel do something similar using probit regression for the ranking system in Xbox Live on the Xbox 360).
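Both ideas can be sketched in a few lines of plain Python. The function names, the half-life form of the weights, and the numbers are hypothetical choices for illustration, not anything specified above.

```python
def recency_weights(ages, half_life):
    """Regression weights that halve every `half_life` units of sample age.

    ages      : how old each sample is (e.g. in games), 0 = newest
    half_life : age at which a sample counts half as much (assumed form)
    """
    return [0.5 ** (age / half_life) for age in ages]


def inflate_diagonal(cov, extra_var):
    """Add `extra_var` to every player's variance, leaving covariances alone.

    Applied at the start of a season, this says each rating may have
    drifted since the last update, so new games move it more.
    """
    return [[v + (extra_var if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(cov)]


weights = recency_weights([0, 82], 82)
new_cov = inflate_diagonal([[1.0, 0.5], [0.5, 2.0]], 0.25)
```

Inflating the diagonal amounts to assuming a random-walk drift in each rating between seasons, which is one simple way to relax the fixed-coefficient assumption.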

Maybe the most elegant solution would be to explicitly model the fact that player skill changes over time. This could require writing your own model fitting code though.
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Fri Aug 03, 2007 10:24 am

Aaron, news of your plans to produce adjusted data for 06-07 and 07-08 in real time is welcome.

Not sure if you saw the question in the defense thread, but is there any way you can provide offense/defense splits of the adjusted player impacts?
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Mon Oct 22, 2007 7:24 pm

Josh, any comments on the various papers using Bayesian techniques mentioned on the forum recently? Have you done anything new applying them to players? Just curious.
josh



Joined: 01 Aug 2007
Posts: 6

Posted: Tue Oct 23, 2007 1:10 pm

Yeah, I've been trying several models lately, and doing some sampling (MCMC).

The advisor of the student on the BYU paper, Shane Reese, is the guy I learned a large portion of my Bayesian analysis techniques from. I was a CS student at BYU and sat in on Shane's Bayesian Data Analysis class.

Did you have any particular questions? I understand what they did pretty well, and I've used a similar approach but for +/- ratings. I should be able to release my results soon enough, but I have to investigate a few more things first.
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Tue Oct 23, 2007 1:32 pm

I'll look forward to seeing your work.

As for questions, beyond the detailed ones out there in the various threads, I am wondering if anyone has any overarching comments about how much of the credit for the next action the immediately prior action should get, or is getting, in various systems. If that makes sense or has any use in the discussion.

Two specific questions, if you are willing:

1. What did you think of this study?

http://tinyurl.com/2yomsk
Environmental factors affecting Sam Cassell's shooting behavior and results

Is a next frontier to break apart the components of a single play with Bayesian techniques and other tools?

2. Did you have any comments on Kenny Shirley's Markov model or findings?


Last edited by Mountain on Tue Oct 23, 2007 1:44 pm; edited 2 times in total
josh



Joined: 01 Aug 2007
Posts: 6

Posted: Tue Oct 23, 2007 1:41 pm

When you're talking about prior action are you referring to Markov-Chain models like Shirley's? Where what happened in the previous play can affect the next?

Each data point in the model I am using is a sequence without substitutions, and so can include a number of plays.

I could model additional effects within there.
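To make the "one data point per sequence without substitutions" idea concrete, here is a hypothetical sketch of how such a data point might be stored and turned into a regression row. The class, its field names, and the per-100-possessions scaling are assumptions for illustration, not a description of the actual model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Stint:
    """A stretch of play with no substitutions (hypothetical layout)."""
    home: Tuple[int, ...]   # indices of the five home players
    away: Tuple[int, ...]   # indices of the five away players
    home_points: int
    away_points: int
    possessions: int

    def design_row(self, n_players):
        """Regression row: +1 for home players, -1 for away, 0 otherwise."""
        row = [0.0] * n_players
        for p in self.home:
            row[p] = 1.0
        for p in self.away:
            row[p] = -1.0
        return row

    def margin_per_100(self):
        """Home-minus-away margin scaled to 100 possessions (assumed target)."""
        return 100.0 * (self.home_points - self.away_points) / self.possessions


# Shortened to two players per side to keep the example readable.
stint = Stint(home=(0, 1), away=(2, 3), home_points=12, away_points=8,
              possessions=10)
row = stint.design_row(5)
target = stint.margin_per_100()
```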

Like Reese, I'm using hierarchical Bayesian linear regression.

I believe Liu uses Poisson regression, but I haven't dug too deep into the paper yet.


Last edited by josh on Tue Oct 23, 2007 1:54 pm; edited 2 times in total
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Tue Oct 23, 2007 1:52 pm

"When you're talking about prior action are you referring to Markov-Chain models like Shirley's? Where what happened in the previous play can affect the next?"

Yes, I was interested in that general topic and his work in particular and got around to saying both in my post eventually.

"Each data point in the model I am using is a sequence without substitutions, and so can include a number of plays.

I could model additional effects within there."

Interesting, will wait to hear more.


I try to pull stuff out of those weightier studies but it helps when those familiar with the techniques provide reaction. If you got anything notable out of Liu's study I'd appreciate a summary.

P.S. Did you see this thread?
http://sonicscentral.com/apbrmetrics/viewtopic.php?t=1498&start=105
And Gabe's comment on Berri's response to first question?

http://www.stat.columbia.edu/~cook/movabletype/archives/2006/06/

"AG: 1. Reading Gladwell's article, I assume that Berri et al. are doing regression analysis, i.e., estimating player abilities as a linear combination of individual statistics. I have the same question that Bill James asked in the context of baseball statistics: why restrict to linear functions? A function of the form A*B/C (that's what James used in his runs created formula, or more fully, something like (A1 + A2 +...)*(B1 + B2 +...)/C) could make more sense."
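For readers unfamiliar with the A*B/C form mentioned in the quote, the basic version of James's runs created formula is commonly written as (H + BB) * TB / (AB + BB). A minimal sketch follows. The function name and the season totals in the example are made up for illustration.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created in the A*B/C form.

    A = hits + walks      (getting on base)
    B = total_bases       (advancing runners)
    C = at_bats + walks   (opportunities)
    """
    return (hits + walks) * total_bases / (at_bats + walks)


# Made-up season line, not a real player.
rc = runs_created(hits=150, walks=50, total_bases=250, at_bats=500)
```

The point of the form is the interaction: the value of on-base events depends on how much advancement accompanies them, which a purely additive linear model cannot express.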

Anything further on this question? From anyone?

(I haven't explored this blog but it seems like it might be crack or crackerjack for an applied stat junkie.)


Last edited by Mountain on Tue Oct 23, 2007 2:03 pm; edited 3 times in total
josh



Joined: 01 Aug 2007
Posts: 6

Posted: Tue Oct 23, 2007 1:55 pm

I realized I let my fingers speak without consulting my brain in that "Let Reese" sentence fragment. I think I've cleaned it up.
Mountain



Joined: 13 Mar 2007
Posts: 1527

Posted: Tue Oct 23, 2007 2:39 pm

Talk about position impacts, skill sets, play sequences and interactions, player interactions, model function types, etc. jogged my memory of a few past discussions.

These threads have discussion that I looked back at and think could be germane:
http://sonicscentral.com/apbrmetrics/viewtopic.php?t=1251
http://sonicscentral.com/apbrmetrics/viewtopic.php?t=1268
All times are GMT - 5 Hours