APBRmetrics :: The statistical revolution will not be televised.
jsill
Joined: 19 Aug 2009 Posts: 73
Posted: Wed Nov 04, 2009 5:50 am Post subject: Regularized APM at hoopnumbers.com (twice as accurate)
I have some results at my website, hoopnumbers.com, which I'm hoping those of you with an interest in adjusted +/- (APM) will find interesting.
Mike Tamada and perhaps some others of you here have mentioned the idea of using ridge regression (a.k.a. regularization) in conjunction with APM. Coincidentally, this is what I've been working on, off and on, for the last few months, and I finally got it to the point where I'm ready to put it up on my website.
My main finding is that APM with a carefully chosen regularization parameter (which I'll call RAPM) is about twice as accurate as APM using standard regression and 3 years of data, even when the weighting of past years and the reference-player minutes cutoff have also been carefully optimized. Interestingly, this holds more or less even if you use only 1 year of data in conjunction with regularization, since the accuracy boost from adding 3 years of data is measurable but fairly minor once regularization is used. The parameter estimates from 3-year RAPM are more intuitively reasonable than the 1-year estimates, although I think even the 1-year RAPM estimates look more reasonable than the 1-year estimates you get with standard regression.
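For those who want to see the mechanics, the fit itself is conceptually just ridge regression on the usual APM design matrix. Here is a minimal sketch in Python with scikit-learn; the data shapes and the particular lambda are made up for illustration, so treat it as the technique rather than my actual pipeline:
Code:
import numpy as np
from sklearn.linear_model import Ridge

n_stints, n_players = 30000, 450
rng = np.random.default_rng(0)

# X[i, j] = +1 if player j is on the floor for the home team in stint i,
#           -1 if on the floor for the away team, 0 if off the floor
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players), p=[0.01, 0.98, 0.01])
y = rng.normal(0.0, 12.0, n_stints)   # stint margin per 100 possessions (placeholder)
w = rng.integers(1, 30, n_stints)     # possessions per stint, used as weights

# lambda (sklearn calls it alpha) shrinks every rating toward 0;
# I tuned it by cross-validation rather than picking it a priori
model = Ridge(alpha=2000.0)
model.fit(X, y, sample_weight=w)
rapm = model.coef_                    # one regularized rating per player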
My basis for claiming that RAPM is twice as accurate is explained at hoopnumbers.com, but I'll sketch it here. I evaluate the models by testing their predictions on unseen data, i.e., on games which were not included in the dataset used to fit the model. You can take the substitution history of a game, apply a previously fitted APM model, and generate a prediction for each game snippet. Then you can add up all the snippet predictions, appropriately possessions-weighted, to get a prediction for the game's final margin of victory, and compare that prediction to the actual margin to evaluate accuracy. I did this for the 342 games in March and April of last year, after fitting models on the games through February (and, in some cases, also the games from '07-'08 and '06-'07). The best I managed with standard regression-based APM, using 3 years of data, was an R-squared of about 9% on the March and April games. With regularization and 3 years of data, I got the R-squared up to 17%. Surprisingly, even with 1 year of data, I could get an R-squared of 16% if regularization was used appropriately, without even using a minutes cutoff, i.e., without lumping any players into the reference-player bucket.
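The game-level evaluation looks something like the following sketch, where model is the fitted ridge model from the sketch above and the per-game matrices are assumed to be built from the substitution logs:
Code:
import numpy as np

def predicted_game_margin(model, stint_rows, stint_poss):
    # stint_rows: lineup design matrix for one game's stints
    # stint_poss: possessions played in each stint
    per_100 = model.predict(stint_rows)          # predicted margin per 100 possessions
    return np.sum(per_100 * stint_poss) / 100.0  # possession-weighted game margin

def r_squared(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot  # ~0.17 for 3-year RAPM on the March/April games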
Again the claim of a near-doubling in accuracy is relative to a 3-year time-weighted APM using standard regression. If we were to compare RAPM to standard APM on 1 year of data, the boost would be bigger. In fact, I'm not even sure how to define the boost in that case, since the accuracy of 1 year standard APM is just not very good at all, according to my experiments.
In addition to introducing RAPM, a secondary goal of this post is to encourage the use of out-of-sample testing techniques like cross-validation, both as a way of evaluating methods to see what kind of predictive power they have and as a way to make choices which otherwise can seem somewhat arbitrary, like the minutes cutoff for the reference player or the weighting of past years of data. Looking at the standard errors around parameter estimates and so forth has its place, in my opinion, but ultimately these models are usually used to predict the future (implicitly and indirectly or otherwise), so I think it's important to gauge their success by testing them on a holdout set which the model was not fit on.
Here is the writeup of my results:
http://hoopnumbers.com/allAnalysisView?analysis=RAPM&discussion=True
Here are my 3-year RAPM results:
http://hoopnumbers.com/allAnalysisView?analysis=RAPM&discussion=False&leaders=True&year=2009multiYear
Here are my 1-year RAPM results:
http://hoopnumbers.com/allAnalysisView?analysis=RAPM&discussion=False&leaders=True&year=2009
Thanks for any feedback!
Crow
Joined: 20 Jan 2009 Posts: 817
Posted: Wed Nov 04, 2009 6:38 am
Very timely. Thanks and good luck with this and what might come out of it.
I've talked over time about a concern that traditional Adjusted +/- was overstretching the data. Your writeup also mentions that concern, and both 1-year and 3-year RAPM give a tighter range, between +8 and -8. I have just started looking at your data, but I like this a lot so far.
In the NESSIS thread I was essentially reaching for ways to achieve what cross-validation and regularization could achieve. I had temporarily neglected the ridge regression talk. I am glad to see it implemented.
The movement of players on good and bad teams under regularization, compared to without it, might address the issue Nick S. noted: the minutes-weighted team Adjusted +/- from existing Adjusted models shows notable variance from actual team performance. How much better does the regularized version do than the unregularized one at the minutes-weighted team level?
Even after regularization it isn't explaining a lot, is it? That sounds really low, lower than I expected. But this is at the stint or play-by-play level? I guess that shouldn't be so surprising or alarming. But how well does it explain at the game level, or over a season series between teams, or at the playoff series level?
What do you think about the idea of some sort of SPM-APM blend or SPM influenced input for APM?
Do you have interest / plans in taking this technique to the lineup level? Player pairs?
Anything further to say about the multicollinearity issue or possible improved ways to address and reduce its effects?
What if, instead of trying to find a single value for each player applied to every stint on the court, you allowed the model to assign each player a value drawn from a limited number of different values, say 3 or 5 of them, to model a player who doesn't perform exactly the same all the time? Would that help reduce average errors and outliers? Can that be made to work? Would it help get at where the good and bad contexts are, and how a player fits with role and context? Then the player's value, instead of being a single point estimate of + this or - that, would be, for example, 20% +4, 40% +2, and 40% -3, or some such. I think that could be useful. And I guess you could look at which of these partial Adjusted scores come during more win-meaningful moments versus less meaningful ones. Players will vary on that, and it would be worth gauging. Winston addresses this issue and uses it in determining his average win impact estimate, but seeing it at a lower split of the data might be useful for setting the rotation based on game situation.
Last edited by Crow on Wed Nov 04, 2009 7:24 am; edited 4 times in total
Mike G
Joined: 14 Jan 2005 Posts: 3604 Location: Hendersonville, NC
Posted: Wed Nov 04, 2009 6:46 am
Wow, 447 ranked players.
Misspelled "analyses" in the headline.
21 players are ranked as better than playing at home:
Code:
1 Lamar Odom 7.428
2 LeBron James 6.716
3 Ray Allen 5.956
4 Chris Paul 5.062
5 Dwyane Wade 4.966
6 Rashard Lewis 4.728
7 Yao Ming 4.608
8 Matt Bonner 4.468
9 Kevin Garnett 4.289
10 Jason Kidd 4.161
11 Jameer Nelson 4.026
12 J.R. Smith 3.996
13 Kirk Hinrich 3.976
14 Ronald Murray 3.756
15 Steve Nash 3.545
16 Brandon Roy 3.433
17 Rasheed Wallace 3.329
18 Tony Parker 3.216
19 Ben Wallace 3.215
20 Andre Iguodala 3.194
21 Kobe Bryant 3.192
22 Home Court Advantage 3.128
Here are the above-average Spurs (> 0):
Code:
Rank Player RAPM
1 Matt Bonner 4.468
2 Tony Parker 3.216
3 Ime Udoka 1.821
4 Tim Duncan 1.699
5 Kurt Thomas 0.986
6 Roger Mason 0.509
7 Pops Mensah-Bonsu 0.331
These are the one-year rates. The 3-year has Duncan as a +5.73, which is #5 in the league.
The 3-year list in general seems to show fewer surprises. But Dwight is just #55, below Amir, Hayes, Tim Thomas, etc.
_________________
36% of all statistics are wrong
DSMok1
Joined: 05 Aug 2009 Posts: 611 Location: Where the wind comes sweeping down the plains
Posted: Wed Nov 04, 2009 11:59 am
Good work, jsill!
Did you happen to calculate the standard error for each player? That would be immensely useful in understanding the confidence associated with each evaluation.
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 711 Location: Raleigh, NC
Posted: Wed Nov 04, 2009 12:32 pm
Great stuff Joe, but I have a few questions. I'm in the process of becoming more familiar with using cross-validation to measure prediction error, so I'm very much interested in some of the stuff you've done here.
Would it be appropriate to say you're using 10-fold cross-validation using the data up to February? Did you do any cross-validation using an entire season worth of data? Can you calculate standard errors for your cross-validation estimates?
I would also be interested in seeing the mean absolute error of the cross-validation instead of just RMSE. Lastly, would it be possible to succinctly describe the difference(s) between ridge regression and lasso?
_________________
I am a basketball geek.
jsill
Joined: 19 Aug 2009 Posts: 73
Posted: Wed Nov 04, 2009 12:42 pm
Crow:
Thanks for all the feedback.
Quote: | In the NESSIS thread I was essentially reaching for ways to achieve what cross-validation and regularization could achieve |
Yes, I think the gist of some of your comments was aimed at combating overfitting, which is indeed what regularization is intended to address.
Quote: | How much better does the regularized version do than the unregularized one at the minutes-weighted team level? |
I don't think I read Nick S.'s previous comments, but I guess the idea is that a minutes-weighted average of the APMs of the players on a team should roughly correspond to or track the team margin of victory (or success, more generally)? I haven't looked at that yet, but it's worth looking at.
I did do some preliminary experiments on a related topic, though. I need to go back and do these more carefully, so don't hold me to the results, but this is tentatively what I found. I ran a team-level version of APM which ignored the presence of individual players and essentially modelled things as if all games were 1-on-1 games, with Mr. Laker playing against Mr. Sixer or Mr. Bull playing versus Mr. Hornet, etc. The single APM number you get for each team corresponds pretty well to its average, season-long margin of victory, as you might expect. This approach actually beat standard APM by a healthy margin in its ability to predict the margin of victory on future, test-set games. Regularized APM was at least its equal in test-set accuracy. I was really hoping RAPM would do better, but at least it was roughly equal. Again, I need to double-check these results, though.
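For concreteness, that team-level model is just a regression with one column per team. A sketch with placeholder data (the column conventions are illustrative, not my actual code):
Code:
import numpy as np
from sklearn.linear_model import LinearRegression

n_games, n_teams = 1230, 30
rng = np.random.default_rng(1)

# X[i, t] = +1 if team t is the home team in game i, -1 if the away team
X = np.zeros((n_games, n_teams))
for i in range(n_games):
    home, away = rng.choice(n_teams, size=2, replace=False)
    X[i, home], X[i, away] = 1.0, -1.0
y = rng.normal(3.0, 12.0, n_games)  # home margin of victory; intercept absorbs HCA

team_model = LinearRegression().fit(X, y)
team_apm = team_model.coef_         # one "Mr. Laker"-style rating per team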
Quote: | But this is at the stint or play-by-play level? |
The R-squareds are at a game level, actually (predicting game-level margin of victory). I plan to look at stint level as well, but the R-squareds there are going to be even lower. Yes, I was hoping for better, too, but at least we appear to be making progress relative to standard APM.
Quote: | What do you think about the idea of some sort of SPM-APM blend or SPM influenced input for APM? |
I need to read about SPM (and your related ideas) some more before I can give an intelligent answer here.
Quote: | Do you have interest / plans in taking this technique to the lineup level? Player pairs? |
Certainly. I have done some preliminary work on player pairs without much success, but I plan to revisit it. With enhancements like these, I think it's valuable to stay within the framework of cross validation in order to test whether the additions to your model really boost your ability to predict out-of-sample. Otherwise, you can just keep adding gizmos to your model and you'll probably end up overfitting and harming things. So when I say I haven't had success yet, I mean I haven't yet been able to demonstrate a boost in prediction performance from using player pairs (versus the individual player, sum of 5 APM framework). I haven't given up on it, though.
Quote: | Anything further to say about the multicollinearity issue or possible improved ways to address and reduce its effects? |
Not specifically just yet, but in my experience generally, one of the best ways to improve performance when working with noisy, limited data is to incorporate prior information or domain knowledge. The regularization I did, which essentially tells the regression what a reasonable APM range is, is one way to do that, but there may be others.
Your idea of assigning players a limited number of probability-weighted values might be a form of regularization as well. It might be tricky to implement algorithmically, though.
I plan to look at concepts like win impact, as you mentioned, in the future.
schtevie
Joined: 18 Apr 2005 Posts: 412
Posted: Wed Nov 04, 2009 12:45 pm
All else equal, bigger R-squareds are nice. But do the collective results make sense?
Take an arbitrary cut-off of a pretty darn good player, someone who delivers a net 4 points per 100 possessions. In the three-year data shown, there are nine players who accomplished this. Just nine.
The greatest of these is KG. Following the estimate, when he was in the game for his approximately 30 minutes, his contribution on the scoreboard above that of an average (0 APM) player was about 4.5 points. And for Chauncey Billups at #9, playing 35 minutes per game, the contribution was about 2.9 points above average.
By contrast, straight APM gives a dramatically different result (one that is consistent whether it uses one season or several). Take Stephen Ilardi's stabilized results for last year (using six years of weighted data): there we have 43 players with APMs above 4. And the stars are starrier.
Never mind the particular players cited; that isn't the point. The issue is whether it is plausible that the biggest stars in the league have such small impacts on the scoreboard and that there are apparently so few of them.
I am skeptical.
basketballvalue
Joined: 07 Mar 2006 Posts: 208
Posted: Wed Nov 04, 2009 12:56 pm
Joe,
I think this looks very interesting, and I'm looking forward to reading through your links in detail. I particularly appreciate that you've used the estimates to predict segments not in the dataset used for estimation; I agree this is very important.
For reference, have you compared your predictions to predictions from other approaches (e.g. PER, Win Score, ...)? That would help set a reference point for how good 9% or 17% is. Of course, this is venturing into the territory of Dan's presentation at NESSIS a couple of years ago.
Thanks,
Aaron
_________________
www.basketballvalue.com
Follow on Twitter
jsill
Joined: 19 Aug 2009 Posts: 73
Posted: Wed Nov 04, 2009 12:58 pm
Mike G:
I agree that Dwight Howard's ranking is lower than we would expect. On the other hand, at least his APM based on '08-'09 alone is 2.515, or 34th in the league. At basketballvalue he is at 1.04, or barely above average, for '08-'09 after looking tremendous in '07-'08.
The Spurs results and Matt Bonner's APM in particular are a little funky. If you look at his raw plus/minus per 48 minutes relative to the other Spurs last year, though, he looks awfully good. It's amazing to me, in particular, that they defended so well with him on the floor (90.7 vs. 92.6 for Duncan). As I mention in my writeup, by no means do I think Bonner was a top 10 player last year or the best on the Spurs. The numbers are what they are, though.
DSMok1: I do not yet have the standard errors for each player. Because I'm using regularization, this becomes more complicated than getting standard errors in a classic regression. In theory, we should be able to get an "a posteriori" distribution for the parameters by combining the prior distribution (from which the regularization term stems) with the data. I need to do some research on how to do this, though.
jsill
Joined: 19 Aug 2009 Posts: 73
Posted: Wed Nov 04, 2009 1:46 pm
Ryan:
Quote: | Would it be appropriate to say you're using 10-fold cross-validation using the data up to February? |
Yes, that is exactly what I did.
Quote: | Did you do any cross-validation using an entire season worth of data? |
Yes, I can easily run this and have run it in the past. It would be a good way of getting an accurate estimate of the out-of-sample error of your model if it's already been tuned by other means, since you could get an estimate over the entire season (by holding out each 10th of the data and testing on it in succession).
However, if you're tuning a parameter like the lambda of the regularization or the reference-player minutes cutoff, then it's not quite legitimate to run the cross-validation for lots of values of the parameter and then report the performance of the best-performing value as an unbiased estimate of the actual performance. In that case, you've subtly and indirectly fit your parameter on the same data you're evaluating it on. Reporting the CV results for the best parameter choices is not nearly as egregious as reporting the in-sample results of a regression on noisy, limited data, of course. It's a minor sin, but it's still slightly dubious.
Also, a tougher and more realistic evaluation is to evaluate a model on data which happened chronologically after the data the model was fit on, since that's the situation in reality. So that's why I used the cross validation to tune the parameters and the later March/April data for a final evaluation.
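In code, the protocol looks roughly like this (a sketch; the lambda grid and the variable names are illustrative):
Code:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

def tune_lambda(X_fit, y_fit, lambdas=(100.0, 300.0, 1000.0, 3000.0, 10000.0)):
    # 10-fold CV on the fitting data only (the games through February)
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    mse = {lam: -cross_val_score(Ridge(alpha=lam), X_fit, y_fit,
                                 scoring="neg_mean_squared_error", cv=cv).mean()
           for lam in lambdas}
    return min(mse, key=mse.get)  # lambda with the lowest CV error

# best_lam = tune_lambda(X_through_feb, y_through_feb)
# final = Ridge(alpha=best_lam).fit(X_through_feb, y_through_feb)
# ...and `final` is then scored exactly once on the later March/April games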
When you ask about standard errors for the cross-validation estimates, do you mean estimates of the RMSEs or estimates of the APM values for each player?
Quote: | I would also be interested in seeing the mean absolute error of the cross-validation instead of just RMSE |
I might try to run this at some point. I'd be surprised if it yielded a significantly different picture, though.
Quote: | Lastly, would it be possible to succinctly describe the difference(s) between ridge regression and lasso? |
Ridge regression penalizes the square of the APM values, which corresponds to a Gaussian prior over the APM values in a Bayesian interpretation, with the regularization parameter (lambda) corresponding to the ratio of the variance of the noise in the problem to the variance of your prior distribution.
The lasso would instead minimize the squared error on the data subject to the sum of the absolute values of the APMs being below some constant. I don't have hands-on experience with the lasso, but my understanding is that it often sets the coefficients of many of the variables to exactly zero. So in our context, it would likely yield an APM of zero for a lot of players. I'm not sure that's desirable, but on the other hand, it's hard to say for sure how it would perform in terms of prediction accuracy until we try it.
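In symbols (standard textbook formulations, nothing specific to my implementation):

\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
\qquad
\hat{\beta}_{\mathrm{lasso}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

In the Bayesian reading, the squared (L2) penalty is a Gaussian prior \beta \sim \mathcal{N}(0, \sigma_{\mathrm{prior}}^2 I) with \lambda = \sigma_{\mathrm{noise}}^2 / \sigma_{\mathrm{prior}}^2, while the absolute-value (L1) penalty corresponds to a Laplace prior, whose sharp peak at zero is what drives many coefficients to exactly zero.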
deepak
Joined: 26 Apr 2006 Posts: 665
Posted: Wed Nov 04, 2009 2:05 pm
If you have the numbers readily available, could you publish the leaders in fast break points per game (team-wise, or even player-wise) over the last several years? I can't find that information elsewhere.
Crow
Joined: 20 Jan 2009 Posts: 817
Posted: Wed Nov 04, 2009 2:34 pm
Thanks jsill for the replies to my questions and the others.
Ryan J. Parker
Joined: 23 Mar 2007 Posts: 711 Location: Raleigh, NC
Posted: Wed Nov 04, 2009 2:46 pm
Thanks for the response Joe. Very insightful.
As for the standard error, I'm talking about the standard error of the RMSE. More specifically, in The Elements of Statistical Learning, Hastie et al. refer to "... the importance of reporting the estimated standard error of the CV estimate" (pg. 249). I'm still going through this section of the book, so I don't know exactly how you go about calculating it, but I figure you might know how to do so.
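If I'm reading the book right, one simple recipe is to treat the K per-fold errors as a sample and report their standard error. Something like this sketch, which is my guess at the idea rather than the book's exact formula:
Code:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def cv_rmse_with_se(X, y, lam, k=10):
    # treat the k per-fold RMSEs as a sample; report their mean and standard error
    fold_rmses = []
    for trn, val in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        fit = Ridge(alpha=lam).fit(X[trn], y[trn])
        resid = y[val] - fit.predict(X[val])
        fold_rmses.append(np.sqrt(np.mean(resid ** 2)))
    fold_rmses = np.asarray(fold_rmses)
    return fold_rmses.mean(), fold_rmses.std(ddof=1) / np.sqrt(k)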
_________________
I am a basketball geek.
DSMok1
Joined: 05 Aug 2009 Posts: 611 Location: Where the wind comes sweeping down the plains
Posted: Wed Nov 04, 2009 2:48 pm
I was considering your lambda (the a-priori distribution) and realized that what you are getting, because of its inclusion, is a "regressed to the mean" APM. Since you calculated your lambda based on one year of data, the regression to the mean is greater. If you used multiple years of data, the lambda should change such that there is a greater spread, or at least more outliers. That said, because of "regression to the mean," most players' APMs do balance out over several years, reducing outliers that way....
I would be interested if you looked into this.
Basically, this is analogous to a Bayesian "best estimate" of the player's true current APM, similar to what I discussed here. The issue, however, is that all players are regressed toward 0, which is not accurate. I would prefer to see the players regressed toward a value based on their minutes per game, which I see as approximately tracking APM and thus providing a good frame of reference.
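Implementation-wise, I believe shrinking toward a nonzero prior mean is just ridge regression on an offset target. A sketch, with a made-up mapping from minutes to the prior:
Code:
import numpy as np
from sklearn.linear_model import Ridge

def rapm_with_prior(X, y, prior_mean, lam):
    # penalizing ||beta - m||^2 instead of ||beta||^2 is equivalent to
    # fitting ridge on the residual target y - X @ m and adding m back
    delta = Ridge(alpha=lam, fit_intercept=False).fit(X, y - X @ prior_mean)
    return prior_mean + delta.coef_

# a crude, hypothetical prior: -3 for a 10 mpg player, rising to +2 at 38 mpg
# prior_mean = np.interp(minutes_per_game, [10.0, 38.0], [-3.0, 2.0])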
Crow
Joined: 20 Jan 2009 Posts: 817
Posted: Wed Nov 04, 2009 3:14 pm
DSMok1 wrote: | The issue, however, is that all players are regressed toward 0, which is not accurate. I would prefer to see the players regressed toward a value based on their minutes per game, which I see as approximately tracking APM and thus providing a good frame of reference. |
You said "based on" rather than explicitly just minutes per game, so what about regressing toward
minutes per game * (1 + (team win% - 0.5))
or something in that vein?