Snapshot of the Web page as it appeared on 4/10/2011.
APBRmetrics :: View topic - Advanced Statistical Plus/Minus
DSMok1



Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains

Posted: Wed Jul 14, 2010 4:39 pm    Post subject: Advanced Statistical Plus/Minus

Advanced Statistical Plus/Minus

I've been working on deriving a new SPM regression based purely upon "advanced" stats (like TS% and OR%) for some time now. I feel comfortable enough with the results thus far to release the first iteration of this SPM.

The data used: Neil Paine's collection of 1-year APMs (unfortunately without standard errors; I estimated them for weighting purposes); Joe Sill's 4-year RAPMs, with the regression toward 0 backed out; and finally (and most importantly) Steve Ilardi's 6-year APMs posted on this forum. The 6-year APMs had quite low errors and provided the groundwork for this regression. I weighted each player in the regression by 1/stderr^2, where stderr is the standard error of his APM.
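The 1/stderr^2 weighting is ordinary inverse-variance weighting. Here is a minimal sketch for a single predictor, just to show how the weights enter. The function name and the toy numbers are made up for illustration; the actual regression in this post is nonlinear and uses many more inputs.

```python
def weighted_linear_fit(x, y, stderr):
    """Fit y = intercept + slope * x by weighted least squares.

    Each observation is weighted by 1 / stderr**2, so noisy APM
    estimates pull on the fit less than precise ones.
    """
    weights = [1.0 / s ** 2 for s in stderr]
    total = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_mean) * (yi - y_mean)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data: one box-score stat (x) against APM (y).
stat = [1.0, 2.0, 3.0, 4.0]
apm = [0.5, 2.1, 3.4, 9.0]       # the last value is an outlier...
apm_se = [1.0, 1.0, 1.0, 10.0]   # ...but it has a huge standard error
intercept, slope = weighted_linear_fit(stat, apm, apm_se)
print(round(intercept, 2), round(slope, 2))
```

With these toy numbers the outlier barely moves the fitted slope, which is the point of weighting by the APM standard errors.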

I then compiled the advanced metrics from the Basketball Reference Play Index for each player, and weighted-averaged the multi-year data (including playoffs, for the APMs that included those). Thus I have 3 APM data sets and the associated advanced statistics.

I experimented with a number of constructions for the rebounding and especially the scoring parts of this regression. Finding a good way to relate turnovers, shooting, usage, and assists proved elusive for some time. I finally have a construction I am comfortable with, though (like with any construction) there are a few holes.

To avoid over-weighting steals and blocks for defense, I also included the teams' offensive and defensive ratings in the regression. These are not included in the final SPM, because the team adjustment (which makes each team's players sum to its efficiency differential) already accounts for them.

Here are the factors in this regression:

Code:
Factor         Value
TRB%           1.33823090
TRB^2         -0.08918572
TRB^3          0.00219790
     
STL%           1.43951052
BLK%           0.35237880
MPG            0.10099403
     
TO% Coeff      0.66920540
PPP Threshold  1.64758151
PPP USG Scale  0.01394727
PPP AST Scale  0.01005596
Scoring        0.55728095
USG Const      4.67604494
     
Intercept     -6.90680060


Let me explain.

First of all, note the rebounding terms. I discovered that the value of splitting rebounding into offensive and defensive was much less than that of adding this nonlinearity (which didn't work when ORB and DRB were split). Basically, in the neighborhood of 10%, there isn't a huge amount of change. A player that gets very few rebounds hurts the team a lot, and a player near 20% helps quite a bit. Here's a quick table:

Code:
TRB%   Pts
0      0.00
2.5    2.82
5      4.74
7.5    5.95
10     6.66
12.5   7.09
15     7.42
17.5   7.89
20     8.67
22.5   10.00
25     12.06
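
The table follows directly from the three TRB coefficients listed in the factor table. A short sketch that recomputes it, with TRB% entered as a whole number; the function and constant names are just illustrative:

```python
# Rebounding coefficients copied from the factor table above.
TRB_1 = 1.33823090
TRB_2 = -0.08918572
TRB_3 = 0.00219790


def rebound_points(trb_pct):
    """Cubic rebounding term; trb_pct is a whole-number percentage (e.g. 10)."""
    return TRB_1 * trb_pct + TRB_2 * trb_pct ** 2 + TRB_3 * trb_pct ** 3


for trb in (0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25):
    print(f"{trb:<6} {rebound_points(trb):.2f}")
```

The flat stretch around 10% and the upturn past 20% come from the negative squared term and the positive cubed term pulling against each other.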


Next: steals, blocks and MPG. These are all straightforward, linear terms. Be aware, though: I'm inputting these percentages throughout in their whole-number forms, like Basketball-Reference outputs them.

Charges taken would be added into the steals term--other research I've done shows them to be equivalent in SPM terms (1 ChgTkn = 1 Steal). I'm trying to make this SPM able to be applied historically; thus I've left that out.

Here's the complicated part: the scoring term. First the actual formula:

Code:
{TS%*2*(1-TO%/100) - TO%Coeff*(TO%/100) - (PPPThreshold - PPPUSGScale*USG% - PPPASTScale*AST%)}*(USG% + USGConst)*Scoring


What's going on here? First of all, this is basically an efficiency*USG term. It takes into account TS%, USG%, TO%, and AST% to create a composite scoring value.

Now, term by term. The True Shooting term is very basic. It gives the number of points scored per possession used by the player. Next, the turnover term provides the penalty for each turnover. These terms make up the efficiency side of the equation.

Next, the PPP (points per possession) threshold and its modifiers. The threshold is just a baseline constant. The usage modifier is then subtracted from it, which lowers the bar as usage rises: the regression indicates a clear benefit to higher usage, roughly 0.1 PPP for every 7 points of USG%. Finally, the assist modifier. This is the ONLY place in the regression that has assists included. It was not significant anywhere else I tried it, compared to this location in the regression. Assists also modify the PPP threshold; when everything is multiplied through, the assists basically take the form AST%*(USG%+Constant), which is a reasonable construction.

Finally, the whole (PPP - PPPThreshold) term is multiplied by (USG% + USGConst). Again, we're using whole percentages, everywhere but with TS% (I'm following Basketball Reference on this). Because of the USGConst, even if a player has NO usage, he still gets some credit for assists. Just not very much. In other words, Steve Blake just isn't that great.
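
For anyone who wants to plug numbers in, here is a sketch of the scoring term exactly as written in the formula above, using the posted coefficients. It assumes TS% is entered as a fraction (0.55) and USG%, TO%, and AST% as whole numbers, per the note about following Basketball Reference. The function names and the sample stat line are made up for illustration.

```python
# Coefficients copied from the factor table above.
TO_COEFF = 0.66920540
PPP_THRESHOLD = 1.64758151
PPP_USG_SCALE = 0.01394727
PPP_AST_SCALE = 0.01005596
SCORING = 0.55728095
USG_CONST = 4.67604494


def ppp_threshold(usg, ast):
    """Baseline PPP a player must beat, lowered by usage and assists."""
    return PPP_THRESHOLD - PPP_USG_SCALE * usg - PPP_AST_SCALE * ast


def scoring_term(ts, usg, tov, ast):
    """Scoring component of the raw SPM.

    ts  -- true shooting as a fraction (e.g. 0.55)
    usg, tov, ast -- USG%, TO%, AST% as whole numbers (e.g. 25, 12, 15)
    """
    # Points per possession: shooting value minus the turnover penalty.
    ppp = ts * 2 * (1 - tov / 100) - TO_COEFF * (tov / 100)
    return (ppp - ppp_threshold(usg, ast)) * (usg + USG_CONST) * SCORING


# Hypothetical stat line, not a real player.
print(round(scoring_term(ts=0.58, usg=28, tov=12, ast=20), 2))

# Benefit of 7 extra points of USG%, in PPP (the ".1 PPP" figure above).
print(round(ppp_threshold(20, 0) - ppp_threshold(27, 0), 3))
```

Because of USG_CONST, a player with zero usage still gets a small amount of credit for assists, which matches the Steve Blake comment above.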

Finally, after compiling the raw SPM, the team adjustment must be applied. This can range from negligible (Cleveland, Boston, and Utah had team adjustments of 0 this year) to quite large (+1.36 for ORL, -1.43 for GSW). The team adjustment mostly accounts for defense, which the regression does not capture well.

Here is a sample of the results--the top 20 in SPM, minimum 1000 minutes:

Code:
Rnk  Tm     Player           G     MP      SPM
1    CLE    LeBron James     76    2966   12.16
2    MIA    Dwyane Wade      77    2792    9.69
3    NOH    Chris Paul       45    1712    6.51
4    ORL    Dwight Howard    82    2843    6.31
5    SAS    Manu Ginobili    75    2150    5.57
6    LAL    Kobe Bryant      73    2835    5.38
7    OKC    Kevin Durant     82    3239    5.32
8    SAS    Tim Duncan       78    2438    5.21
9    BOS    Rajon Rondo      81    2963    4.82
10   LAC    Marcus Camby     51    1596    4.81
11   UTA    Deron Williams   76    2802    4.42
12   ATL    Josh Smith       81    2871    4.32
13   DAL    Dirk Nowitzki    81    3039    4.07
14   LAL    Pau Gasol        65    2403    4.01
15   UTA    Carlos Boozer    78    2673    3.99
16   WAS    Gilbert Arenas   32    1169    3.90
17   DEN    Nene Hilario     82    2755    3.77
18   TOR    Chris Bosh       70    2526    3.62
19   CHA    Gerald Wallace   76    3119    3.53
20   DEN    Carmelo Anthony  69    2634    3.51


The full results for 2009-2010 regular season are here: Google Spreadsheet: Advanced SPM 09-10

EDIT: See later in this thread for revisions to this method and a complete spreadsheet to play with.


Last edited by DSMok1 on Tue Oct 26, 2010 12:11 pm; edited 1 time in total
Ilardi



Joined: 15 May 2008
Posts: 265
Location: Lawrence, KS

Posted: Wed Jul 14, 2010 4:55 pm

Nice work, DSM: this looks like an important contribution.

A couple of quick questions:

a) Can you provide standard error (se) estimates for the SPM values?

b) Did you consider using any of the advanced metrics from 82games? I've always thought eFG% Allowed would be quite useful in an SPM model . . .

c) What is the correlation between your SPM values for each player and his corresponding APM value? (i.e., the zero-order correlation for the entire league)

d) Any plans for "out-of-sample testing" on this new SPM metric (a la Joe Sill)?
DSMok1

Posted: Wed Jul 14, 2010 5:38 pm

Ilardi wrote:
Nice work, DSM: this looks like an important contribution.

A couple of quick questions:

a) Can you provide standard error (se) estimates for the SPM values?

b) Did you consider using any of the advanced metrics from 82games? I've always thought eFG% Allowed would be quite useful in an SPM model . . .

c) What is the correlation between your SPM values for each player and his corresponding APM value? (i.e., the zero-order correlation for the entire league)

d) Any plans for "out-of-sample testing" on this new SPM metric (a la Joe Sill)?


Good to see you around, Ilardi!

a) How would I go about developing them for a nonlinear model? I would love to, but haven't figured out how. Another issue with the standard errors is that the APM against which we are regressing has error within it (which I think biases the error on the regression upwards).

b) I wanted to make this metric as useful historically as possible. Basketball Reference has all of the stats used in this regression back to 1977. A more intricate SPM is possible, using things like eFG% allowed, location of assists, etc.

c) I can run that... should I do it just on the low-error six season sample?

d) That would be tough for me to do. I don't have a lot of samples to work with.

Thanks for the input!
Ilardi

Posted: Wed Jul 14, 2010 6:01 pm

Thanks: and most guys on the forum call me 'Steve'.

I'd have to get a consult to figure out how to calculate se's on a nonlinear metric like that, but I know it must be do-able. Perhaps someone on this forum can point the way to a workable approach?

As for the correlation between SPM and APM, I might suggest using the 08-09 season, for which you have my 6-season estimates (weighted heavily toward 08-09), as well as your own SPM values.

On the out-of-sample test: presumably it would be possible to calculate SPM values for each player based on games through, say, the first 4 months of last season, and then use those estimates to predict results of the final 2 months. (Same basic approach Joe used with his ridge regression APM numbers.) It would be a fair amount of work, but should be easily do-able, at least in principle.
DSMok1

Posted: Wed Jul 14, 2010 11:47 pm

Ilardi wrote:
Thanks: and most guys on the forum call me 'Steve'.

I'd have to get a consult to figure out how to calculate se's on a nonlinear metric like that, but I know it must be do-able. Perhaps someone on this forum can point the way to a workable approach?

As for the correlation between SPM and APM, I might suggest using the 08-09 season, for which you have my 6-season estimates (weighted heavily toward 08-09), as well as your own SPM values.

On the out-of-sample test: presumably it would be possible to calculate SPM values for each player based on games through, say, the first 4 months of last season, and then use those estimates to predict results of the final 2 months. (Same basic approach Joe used with his ridge regression APM numbers.) It would be a fair amount of work, but should be easily do-able, at least in principle.


I'd love to figure out how to do standard errors on nonlinear metrics.

I'll look into the correlation for the data you suggested, when I have time.

I still have issues with the out of sample test, because it is replacing a descriptive stat with a predictive stat--which is why the ridge regression technique provided the best out-of-sample results. It's basically regression to the mean. When I do regression, I'm going to use the samples, with their error, and regress in a Bayesian manner toward a prior based on peripheral data.
Ilardi

Posted: Thu Jul 15, 2010 12:18 pm

DSMok1 wrote:
I still have issues with the out of sample test, because it is replacing a descriptive stat with a predictive stat--which is why the ridge regression technique provided the best out-of-sample results. It's basically regression to the mean. When I do regression, I'm going to use the samples, with their error, and regress in a Bayesian manner toward a prior based on peripheral data.


But isn't the utility of any metric linked in large part to its predictive ability? Certainly, in the natural sciences, the valid prediction of phenomena is regarded as the sine qua non of the entire enterprise, so I'm admittedly a bit biased, but suffice it to say that even NBA decision makers realize that it's much more valuable to have a stat that gives accurate prediction than one that merely provides accurate description.

Also, although ridge regression makes use of 'regression to the mean', it does so in a limited way - essentially by simply reining in outlier values via an a priori (Bayesian) determination that they are unlikely. In my view, it's an extremely clever technique for enhancing the 'signal' of player APM values via tamping down the 'noise' of extreme variations in efficiency from one low-minute lineup to another.
DSMok1

Posted: Thu Jul 15, 2010 1:00 pm

Ilardi wrote:
DSMok1 wrote:
I still have issues with the out of sample test, because it is replacing a descriptive stat with a predictive stat--which is why the ridge regression technique provided the best out-of-sample results. It's basically regression to the mean. When I do regression, I'm going to use the samples, with their error, and regress in a Bayesian manner toward a prior based on peripheral data.


But isn't the utility of any metric linked in large part to its predictive ability? Certainly, in the natural sciences, the valid prediction of phenomena is regarded as the sine qua non of the entire enterprise, so I'm admittedly a bit biased, but suffice it to say that even NBA decision makers realize that it's much more valuable to have a stat that gives accurate prediction than one that merely provides accurate description.

Also, although ridge regression makes use of 'regression to the mean', it does so in a limited way - essentially by simply reining in outlier values via an a priori (Bayesian) determination that they are unlikely. In my view, it's an extremely clever technique for enhancing the 'signal' of player APM values via tamping down the 'noise' of extreme variations in efficiency from one low-minute lineup to another.


I'm not disputing the value of prediction. However, I'd like to do that AFTER the SPM is calculated. In other words, construct a SPM, THEN apply the Bayesian regression to estimate "true talent", then combine with previous years to create a projection. I simply want the SPM itself to not be "biased" with information outside of actual production numbers.
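
One common way to do the "regress toward a prior" step, sketched here under the assumption of a normal prior and a normal error on the estimate, is a precision-weighted average. This is a textbook formula, not necessarily what will end up being used in this thread; the function name and the numbers are illustrative only.

```python
def shrink_toward_prior(estimate, stderr, prior_mean, prior_sd):
    """Posterior mean for a normal prior and a normally distributed estimate.

    The estimate and the prior are averaged with weights equal to their
    precisions (1 / variance), so a noisy estimate is pulled harder
    toward the prior than a precise one.
    """
    data_weight = 1.0 / stderr ** 2
    prior_weight = 1.0 / prior_sd ** 2
    return ((data_weight * estimate + prior_weight * prior_mean)
            / (data_weight + prior_weight))


# Hypothetical numbers: a +10 rating measured with two different errors,
# shrunk toward a prior of 0 with a spread of 3.
print(round(shrink_toward_prior(10.0, 1.0, 0.0, 3.0), 2))
print(round(shrink_toward_prior(10.0, 5.0, 0.0, 3.0), 2))
```

Keeping this as a separate step after the SPM is computed leaves the SPM itself a purely descriptive number.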

I agree that RAPM works very well, but it does have a few quirks. Like Anderson Varejao getting very highly rated because it is so unlikely (under the prior) that LeBron is really a +11 player.
Ilardi

Posted: Sat Jul 17, 2010 10:40 am

I've also had Varejao rated highly using a more traditional APM approach . . .
DSMok1

Posted: Sat Jul 17, 2010 12:27 pm

Ilardi wrote:
I've also had Varejao rated highly using a more traditional APM approach . . .


15th is pretty high. That's where the 4-year RAPM had him. Don't you think applying the Bayesian prior that way could cause some odd effects like that?

Also--would it be possible to get from you a 4 year, regular season only APM, through 09-10? Then I could use the advanced stats collected by Hoopdata in that time to run a more comprehensive SPM.
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Neil Paine



Joined: 13 Oct 2005
Posts: 774
Location: Atlanta, GA

Posted: Sat Jul 17, 2010 2:50 pm

Great work, DSMok!! I'm trying to replicate your work, and I had a question: how are you doing the team adjustment? What I always did was to find the minute-weighted average of each team's SPM and multiply by 5, then subtract that from the team's actual efficiency differential and divide the result by 5. But when I do that, my team adjustments don't match yours (ORL is +1.26, GSW is -1.70). Is it a rounding issue (I'm using the full, calculated versions of the BBR stats, while you used rounded versions), or is my team adjustment method incorrect?
_________________
http://www.basketball-reference.com/blog/
DSMok1

Posted: Sat Jul 17, 2010 4:02 pm

Neil Paine wrote:
Great work, DSMok!! I'm trying to replicate your work, and I had a question: how are you doing the team adjustment? What I always did was to find the minute-weighted average of each team's SPM and multiply by 5, then subtract that from the team's actual efficiency differential and divide the result by 5. But when I do that, my team adjustments don't match yours (ORL is +1.26, GSW is -1.70). Is it a rounding issue (I'm using the full, calculated versions of the BBR stats, while you used rounded versions), or is my team adjustment method incorrect?


I didn't use the team efficiency differential exactly. I summed to a blend of 2/3 SRS and 1/3 efficiency differential. Is the SRS calculated from efficiency differentials or point margins? If it is calculated from efficiency differentials per game, it should be the best thing to sum to. I think it's point differential, which is why I used that blend. But what you choose to sum to is up to you.
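
A sketch of the team adjustment as Neil describes it in the quote, with the 2/3 SRS + 1/3 efficiency differential blend as the target. The function names and the two-player roster are made up for illustration, and this assumes the adjustment is the same flat amount added to every player on the team.

```python
def blended_target(srs, eff_diff):
    """Target team margin: 2/3 SRS plus 1/3 efficiency differential."""
    return (2.0 * srs + eff_diff) / 3.0


def team_adjustment(players, target_margin):
    """Flat per-player adjustment so the team sums to target_margin.

    players -- list of (minutes, raw_spm) tuples for one team
    """
    total_minutes = sum(minutes for minutes, _ in players)
    weighted_avg = sum(minutes * spm for minutes, spm in players) / total_minutes
    # Five players are on the floor, so the team's raw total is 5 * average.
    return (target_margin - 5.0 * weighted_avg) / 5.0


# Hypothetical roster: equal minutes, raw SPMs of +2.0 and 0.0.
roster = [(1000, 2.0), (1000, 0.0)]
target = blended_target(srs=3.0, eff_diff=6.0)
print(target, team_adjustment(roster, target))
```

Summing to plain efficiency differential instead only means passing that number as target_margin.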

I'm glad you're doing this! You've got all of the data for compiling a full list and actually doing the team adjustments correctly.

I'm hoping this doesn't undervalue great centers--because there weren't any in the time period I used for the regression, I don't know if the top end of the regression can capture them. Then again, I don't know how much a great center truly contributed, either.
Ilardi

Posted: Sat Jul 17, 2010 4:59 pm

DSMok1 wrote:

Also--would it be possible to get from you a 4 year, regular season only APM, through 09-10? Then I could use the advanced stats collected by Hoopdata in that time to run a more comprehensive SPM.


I haven't run it yet, but maybe your request will be just the catalyst I need. Is it really the case that no one else out there has put out any publicly available multi-year APM stats? Are Aaron's 2-year APM stats on basketballvalue.com all there is? If that's the case, I really will try to carve out the time to work on this . . .
DSMok1

Posted: Sat Jul 17, 2010 5:28 pm

Ilardi wrote:
DSMok1 wrote:

Also--would it be possible to get from you a 4 year, regular season only APM, through 09-10? Then I could use the advanced stats collected by Hoopdata in that time to run a more comprehensive SPM.


I haven't run it yet, but maybe your request will be just the catalyst I need. Is it really the case that no one else out there has put out any publicly available multi-year APM stats? Are Aaron's 2-year APM stats on basketballvalue.com all there is? If that's the case, I really will try to carve out the time to work on this . . .


I don't know of any other APM's out there, now that the RAPM was taken down. The one I could use would be an "average" APM over the last 4 years.
DSMok1

Posted: Mon Jul 19, 2010 9:53 am

Neil Paine wrote:
Yeah, SRS is actually just SOS-adjusted point differential per game, which means it's not tempo-independent (we do it that way because we only have game possessions going back to 1986-87). If we had historical SOS-adjusted efficiency differential, that would definitely be the thing to sum to, but since we don't, I'm probably going to just sum to efficiency differential (which is what APM does anyway).

In case I didn't say so, I like this new regression a lot! The most glaring problem with the old regression was that it drastically overvalued assists (and therefore PGs -- I found that the average PG was +1 or so while every other position was near zero), but it looks like you fixed this by tying AST% to the scoring term instead of having it stand alone. I would imagine this retrodicts better than the old regression as a result.

...

One troubling result at first glance is that Dennis Rodman's 1995 is +10.91, one of the greatest seasons of all time... Maybe the rebounding term needs to be re-evaluated?
(Posted from an email)

It looks like the rebounding term will need some more work. The cubic works for just about everyone, except Rodman. He breaks the regression. The 30% TRB% is way out into the nonlinear term, and is worth like 9 points more than D-Howard's 22% TRB%.
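
The size of the Rodman problem can be checked directly from the posted cubic coefficients. A quick sketch, with illustrative names, comparing a 30% TRB% against a 22% TRB%:

```python
# Cubic rebounding coefficients from the first post.
TRB_COEFFS = (1.33823090, -0.08918572, 0.00219790)


def cubic_rebound_points(trb_pct):
    """Rebounding term with TRB% as a whole number (e.g. 30)."""
    c1, c2, c3 = TRB_COEFFS
    return c1 * trb_pct + c2 * trb_pct ** 2 + c3 * trb_pct ** 3


gap = cubic_rebound_points(30) - cubic_rebound_points(22)
print(f"{cubic_rebound_points(30):.2f} vs {cubic_rebound_points(22):.2f}, gap {gap:.2f}")
```

The gap comes out a little over 9 points, and nearly all of it is the cubed term, which is extrapolating well past the TRB% range the regression was fit on.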

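Here are a few possibilities:

[Chart not preserved in this copy; it compared the cubic, power + Sq, pure power, and linear rebounding curves.]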



The cubic is the best fit, but only by a hair. The power + Sq is very close in terms of fit, and would probably be my pick to use. The pure power curve can't capture the desired up-turn at higher TRB% rates, but in terms of fit is still very close to power+Sq (most of the difference is out where there aren't any observations). The linear is here for reference (incidentally, TRB% by itself is a BETTER fit than ORB% and DRB% split out).
Mike G



Joined: 14 Jan 2005
Posts: 3618
Location: Hendersonville, NC

Posted: Mon Jul 19, 2010 10:19 am

DSMok1 wrote:
...
I'm hoping this doesn't undervalue great centers...


This is reminiscent of a discussion starting with Nazr Mohammed (this year), that many of his rates resembled prime Moses, Gilmore, and others. Low versatility index the likely culprit.

It was asked then whether those known greats were also undervalued by the (then most current) SPM method.

If versatile less-than-great centers (Divac, Daugherty) seem to be better than Moses or Artis, then maybe it needs to be fixed?
_________________
36% of all statistics are wrong
All times are GMT - 5 Hours
Page 1 of 7