Author |
Message |
Mike G
Joined: 14 Jan 2005 Posts: 3615 Location: Hendersonville, NC
|
Posted: Wed Feb 09, 2011 7:32 am Post subject: |
|
|
back2newbelf wrote: | I did a test on how many years of data one should use to get the best prediction results.
I split this season's data into several (N) parts and computed player values on N-1 parts N times, always leaving out just one part. Then, using the computed player values, I computed the error on the part that was left out (N times, because N parts were left out).
Then I did the same thing but included data from prior seasons. All of this older data is used to compute player values, combined with the parts from the current season, always removing one part from the current season as described above.
If I use just this season, the error on out-of-sample 2010/2011 data is bigger than if I include 2009/2010. Including 2008/2009 on top of 09/10 reduces the error even more, and it's actually best when I include 07/08 too. From there on it always gets worse when I include older data.
From best to worst:
3.x year
4.x year
2.x year
5.x year
1.x year
0.x year |
Nice.
3-4 years sounds about right. _________________
36% of all statistics are wrong |
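A minimal sketch of the leave-one-part-out test described in the quote above, assuming the player values come from a ridge regression on a stint-level design matrix. The function names, the fold count and the use of plain squared error are illustrative assumptions, not back2newbelf's actual code.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cv_error_current_season(X_cur, y_cur, X_old, y_old, lam, n_parts=5, seed=0):
    """Mean squared out-of-sample error on the current season.

    Each of the n_parts parts of the current season is held out once.
    All older data (X_old, y_old) is always used for fitting and is
    never part of the held-out set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_cur)), n_parts)
    sq_err, n_obs = 0.0, 0
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y_cur)), held_out)
        X_train = np.vstack([X_cur[train], X_old])
        y_train = np.concatenate([y_cur[train], y_old])
        beta = ridge_fit(X_train, y_train, lam)
        resid = y_cur[held_out] - X_cur[held_out] @ beta
        sq_err += float(resid @ resid)
        n_obs += len(held_out)
    return sq_err / n_obs
```

Calling it once with empty `X_old`/`y_old` and again with one, two, three or more prior seasons stacked into `X_old` would give the kind of ranking listed in the quote.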
|
|
|
DSMok1
Joined: 05 Aug 2009 Posts: 611 Location: Where the wind comes sweeping down the plains
|
Posted: Wed Feb 09, 2011 8:57 am Post subject: |
|
|
I think it will be possible to apply aging curves, back2newbelf, as part of a pre-processing phase. Once that is done, I will be interested in what length of time is best...
Aging curves in preprocessing: for each player, convert his value in the past to a current value. If the player was 21 in the previous matchup and is now 25, take the aging from 21 to 25 and add it to the observed score in the previous matchup. Probably just do this on a yearly basis; I have a rough aging curve for APM that I use for ASPM. After the preprocessing, do the same calcs you just did; I expect to see maybe even the 5.x take over the best position, and the overall error to be significantly lower (maybe the lambdas lower as well). _________________ GodismyJudgeOK.com/DStats
Twitter.com/DSMok1 |
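A rough sketch of the preprocessing step described above, assuming a yearly aging curve expressed as the expected rating change from one age to the next. The numbers in `AGING_DELTA` are made up for illustration and are not DSMok1's curve; the function names are hypothetical too.

```python
# Hypothetical yearly aging deltas: expected rating change (points per 100
# possessions) when a player goes from age a to age a + 1.
AGING_DELTA = {21: 0.6, 22: 0.4, 23: 0.3, 24: 0.2, 25: 0.1, 26: 0.0, 27: -0.1}

def aging_shift(age_then, age_now, deltas=AGING_DELTA):
    """Cumulative expected change between two ages (ages outside the table count as 0)."""
    if age_now < age_then:
        raise ValueError("age_now must not be smaller than age_then")
    return sum(deltas.get(age, 0.0) for age in range(age_then, age_now))

def adjust_stint_margin(margin, home_ages_then, away_ages_then, years_elapsed):
    """Re-express an old matchup margin in terms of the players' current ages.

    Assumes the usual model margin = sum(home values) - sum(away values),
    so each home player's shift is added and each away player's shift
    is subtracted.
    """
    home = sum(aging_shift(a, a + years_elapsed) for a in home_ages_then)
    away = sum(aging_shift(a, a + years_elapsed) for a in away_ages_then)
    return margin + home - away
```

With these made-up deltas, the 21-to-25 example from the post accumulates four yearly steps (21, 22, 23 and 24), which is what `aging_shift(21, 25)` sums.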
|
|
|
Ilardi
Joined: 15 May 2008 Posts: 265 Location: Lawrence, KS
|
Posted: Wed Feb 09, 2011 11:43 am Post subject: |
|
|
back2newbelf,
A couple quick questions:
1) did you include playoff data in your models?
2a) did you weight each season equally?
2b) if so, have you explored the effect of differential weighting across seasons? |
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 275
|
Posted: Wed Feb 09, 2011 12:26 pm Post subject: |
|
|
Ilardi wrote: | 1) did you include playoff data in your models?
2a) did you weight each season equally?
2b) if so, have you explored the effect of differential weighting across seasons? |
No, yes, no. Very good points. I'll add everything to my todo list _________________ http://stats-for-the-nba.appspot.com/ |
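One way the differential weighting from question 2b could be explored is a weighted ridge fit where older seasons get smaller weights. This is only a sketch: the exponential decay and its default value are assumptions, not something proposed in the thread.

```python
import numpy as np

def season_weights(seasons_ago, decay=0.5):
    """Hypothetical scheme: weight = decay ** seasons_ago (0 = current season)."""
    return decay ** np.asarray(seasons_ago, dtype=float)

def weighted_ridge(X, y, w, lam):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta||^2.

    Closed form: (X'WX + lam*I)^-1 X'Wy with W = diag(w).
    """
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)
```

Setting `decay=1.0` reproduces the equal weighting used so far, so the decay can be tuned against the same out-of-sample error as the number of seasons.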
|
|
|
EvanZ
Joined: 22 Nov 2010 Posts: 298
|
Posted: Fri Feb 11, 2011 9:21 am Post subject: |
|
|
b2nb, question...
Is it possible to break out the rebounding component of the offensive and defensive RAPM?
The reason I ask is because I'm doing some validation of ezPM. One of the things I want to do is regress each individual component of ezPM (off100, def100, reb100) against the components of RAPM, if possible.
As a test of ezPM (and maybe a suggestion for you to do with RAPM), I have regressed ezPM100 against each of its internal components (O100, D100, REB100). Here's the summary for the REB100 regression:
Code: |
> summary(ezpm.reb100.lm)

Call:
lm(formula = ezPM100 ~ REB100 - 1, data = ezpm.2010, weights = POSS)

Residuals:
    Min      1Q  Median      3Q     Max
-265.18  -64.45   17.86   96.57  438.14

Coefficients:
       Estimate Std. Error t value Pr(>|t|)
REB100   0.9567     0.1490   6.422 7.83e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 126.9 on 226 degrees of freedom
Multiple R-squared: 0.1543, Adjusted R-squared: 0.1506
F-statistic: 41.24 on 1 and 226 DF, p-value: 7.83e-10
|
The R^2 for the rebounding component for the current season is about 0.15, which lines up very well with my previous regressions of point differential on the four factors. In that study, I found that rebounding accounted for about 15% of point differential. Therefore, it's comforting that ezPM is about the same, i.e. rebounding is not being given more weight than its involvement in winning.
Hopefully, this makes sense. Have you thought about or previously done these regressions?
-evan _________________ http://www.thecity2.com
http://www.ibb.gatech.edu/evan-zamir |
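For anyone wanting to reproduce this kind of component regression outside R, here is a minimal Python sketch of a possession-weighted fit through the origin, mirroring `ezPM100 ~ REB100 - 1` with `weights = POSS`. The function name is hypothetical. One assumption worth flagging: for a no-intercept model, R reports R^2 relative to zero rather than to the mean of the response, and the sketch follows that convention.

```python
import numpy as np

def weighted_origin_fit(x, y, w):
    """Weighted least squares of y on x with no intercept.

    Returns (slope, r_squared), where r_squared is the uncentered
    version: 1 - sum(w * resid^2) / sum(w * y^2).
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    slope = np.sum(w * x * y) / np.sum(w * x * x)
    resid = y - slope * x
    r_squared = 1.0 - np.sum(w * resid**2) / np.sum(w * y**2)
    return slope, r_squared
```

Here `x` would be the component (for example REB100), `y` the total rating (ezPM100) and `w` the possessions played.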
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 275
|
Posted: Wed Feb 23, 2011 9:06 am Post subject: |
|
|
updated this season's ranking http://stats-for-the-nba.appspot.com/ranking11
I still recommend the 4 year ranking though
By 1 year appr RAPM:
- Udoh is now the top rookie: the Warriors are +3.2 when he plays, -4.3 when he doesn't. Wall looks atrocious
- two 'no names' in the top 20: Keyon Dooling and Anthony Tolliver
A whole bunch of Bulls players are rated as above average defenders. I think it's all coaching _________________ http://stats-for-the-nba.appspot.com/ |
|
|
|
greyberger
Joined: 27 Sep 2010 Posts: 52
|
Posted: Sun Mar 06, 2011 1:49 am Post subject: |
|
|
On the subject of APM, Arturo Galletti has a post about the APM available at basketballvalue.com.
I didn't know that an SPM-type regression was involved until I read the details. That would seem to be a key point of distinction between the APM there and hypothetical public alternatives. Are there any good posts or links about this step in the Rosenbaum approach? I'm not even sure I'm asking the right question.
How about this one: any responses provoked by the Arturo post? |
|
|
|
DLew
Joined: 13 Nov 2006 Posts: 224
|
Posted: Mon Mar 07, 2011 11:13 am Post subject: |
|
|
Arturo clearly got a little confused there... Adjusted plus-minus is not easy to understand, especially for people who come in with a prior belief that it's not a good method. If Arturo actually wanted to understand APM you would think he would post in this forum or check out Eli's very informative work on the topic. |
|
|
|
greyberger
Joined: 27 Sep 2010 Posts: 52
|
Posted: Mon Mar 07, 2011 12:44 pm Post subject: |
|
|
In the post he says he's been working on this for 'months' (before 'breaking out his pimp hand')
Just to be clear and to satisfy my special curiosity:
Adjusted +/- at BValue.com does not use a SPM component as AG claims? |
|
|
|
bchaikin
Joined: 27 Jan 2005 Posts: 690 Location: cleveland, ohio
|
Posted: Mon Mar 07, 2011 2:18 pm Post subject: |
|
|
wait a minute...
you claimed in a previous thread:
Adjusted plus-minus... when compared to all other overall player rating systems... is clearly superior when done properly.
when questioned with some spurious results of adjusted plus minus, and whether you could substantiate them, you blew off that questioning with:
No, I have contractual obligations not to, and frankly I wouldn't care to if I was allowed. If you choose to make an effort to understand adjusted plus-minus then you'll likely come around, but I suspect you've already made up your mind about it.
but now when someone actually does make the effort to understand the process, you respond with:
Arturo clearly got a little confused there... Adjusted plus-minus is not easy to understand, especially for people who come in with a prior belief that it's not a good method. If Arturo actually wanted to understand APM you would think he would post in this forum or check out Eli's very informative work on the topic.
now you are blowing off the attempt to understand it with the caveat that they say it doesn't work because they don't want it to work and must have some grudge against it...
on the one hand you are saying people need to make an effort to understand the process, but when they actually do you say it's not easy to understand...
i read through his posting - several times - including the comments section. and my question for you is this - how do you respond to his statements?
Calculating Adjusted +/-
The final step is to take the Pure regression and the Stats model and adds them up by player like so:
APM = x* Pure +/- + (1-x)*Statistical +/-
And proceed to adjust x between 10% and 90% for each player to minimize the error. In essence he tweaks the rating to get a high R-Square. To summarize, the APM model calculates two variables with a low correlation to wins (R^2 <5%) and adds them up to minimize the error and guarantee a 90%+ Rsq. for the overall model.
Funny that.
What does this mean exactly? Well, the R^2 for the APM model is very much a fabrication. The correlation to point margin & wins of the model shown in Basketball value is artificially inflated by adding the error back in.
that last line is a pretty serious claim. if what he is saying is true, that the players' values (normalized to player minutes played) do not add up to team wins (or team average per game point differential) without a fudge factor, then the value of the process for player evaluation is severely weakened...
i am not saying he is correct, but am simply asking - how do you respond to that? |
|
|
|
greyberger
Joined: 27 Sep 2010 Posts: 52
|
Posted: Mon Mar 07, 2011 6:17 pm Post subject: |
|
|
Apparently this is much ado about nothing. Arturo is commenting on an SPM technique outlined in a 2004 Rosenbaum piece. He incorrectly describes this as being the foundation for BV.com's APM ratings.
Apologies for hijacking the thread. |
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 275
|
Posted: Mon Mar 07, 2011 7:10 pm Post subject: |
|
|
Quote: | if what he is saying is true, that the players' values (normalized to player minutes played) do not add up to team wins (or team average per game point differential) without a fudge factor, then the value of the process for player evaluation is severely weakened...
i am not saying he is correct, but am simply asking - how do you respond to that? |
Possession-weighted RAPM should line up pretty well with the teams' homecourt (and pace) adjusted SRS rating, and also with the teams' average point differential. And I sure didn't use any "fudge factor". _________________ http://stats-for-the-nba.appspot.com/ |
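A small sketch of the check described above, assuming player RAPM is expressed per 100 possessions and that each player's contribution is scaled by the share of team possessions he was on the floor for. The function name and the numbers in the example are hypothetical.

```python
def team_rating_from_rapm(rapm, player_poss, team_poss):
    """Possession-weighted sum of player ratings for one team.

    rapm:        player -> rating per 100 possessions
    player_poss: player -> possessions the player was on the floor
    team_poss:   total team possessions (on-floor shares sum to about 5)
    """
    return sum(rapm[p] * player_poss[p] / team_poss for p in rapm)

# Made-up example: two of a team's players, 100 team possessions.
example = team_rating_from_rapm(
    {"A": 2.0, "B": -1.0}, {"A": 50, "B": 100}, team_poss=100
)
```

Comparing this number per team against an adjusted SRS rating or the raw average point differential is the kind of sanity check the post refers to. Under these assumptions no tuning parameter appears anywhere in the sum.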
|
|
|
DLew
Joined: 13 Nov 2006 Posts: 224
|
Posted: Mon Mar 07, 2011 8:19 pm Post subject: |
|
|
Bob, I think greyberger's comment pretty much summed things up... |
|
|
|
Crow
Joined: 20 Jan 2009 Posts: 822
|
Posted: Mon Mar 07, 2011 9:47 pm Post subject: |
|
|
Looking again at back2newbelf's coaching-only Adjusted +/- for the most recent 5 years, I found a few trends:
Only Scott Brooks was +1 or better on both the offensive and defensive splits, aided by the favorable comparison to the previous stretch with PJ Carlesimo.
Of the top 10 actives, 6 are in the east, 4 in the west.
Of the bottom 17 out of the total of 67 Coaches active during the time period, only 2 are still active. Only three Coaches rated -1 or worse overall are active.
There were 10 Coaches estimated to have more than a 2 point helpful impact on defense. No Coach was +2 on offense. 2 active coaches are over +1.5.
Only 6 of the 67 were over +1 on offense. 13 were estimated to have more than a 1 point helpful impact on defense. 20 were estimated to have more than a 0.5 point helpful impact on defense, 14 on offense. Only 3 had better than a 0.5 positive impact on both, so except for those rare cases the better Coaches are estimated to be notably helpful on just one side of the court.
15 were estimated -1 or worse on offense. 13 on defense. None on both. |
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 275
|
Posted: Tue Mar 08, 2011 5:31 pm Post subject: |
|
|
Some of this stuff appeared here http://www.slate.com/id/2287339/
But I have to say, if that's the way magazines write about my work I don't want my stuff to appear anywhere.
Writing about a certain technique, then saying that it might actually be a bad technique because one person said so and citing the critique, while failing to realize that (1) the person who wrote the critique only comments on the original model and (2) it has been proven that using ridge regression instead greatly improves performance... that's just beautiful.
no wait, that's bad journalism _________________ http://stats-for-the-nba.appspot.com/ |
|
|
|
|