A Review of Adjusted Plus/Minus and Stabilization

May 20, 2011
As I prepare to release my first work based on the Adjusted Plus/Minus and derivative methods, I felt it would be wise to write a plain-English review of the state-of-the-art of Adjusted Plus/Minus and its derivatives, or at least what is known in the public domain.

What is Plus/Minus?

Plus/Minus, at its core, simply compares how the team fares with a player on the court against how it fares with that player off the court.  If the team does better with the player on the court than off, the player is probably pretty good.  If the team does better with the player on the bench, the player is probably not as good.
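As a minimal sketch of that on/off comparison, the snippet below computes a team's point differential with a player on and off the court.  The stint records are made up, and expressing the result per 100 possessions is an assumption for illustration.

```python
# Hypothetical stint records: (player_on_court, team_points, opp_points, possessions).
stints = [
    (True, 110, 100, 100),
    (True, 55, 50, 50),
    (False, 48, 50, 50),
]

def net_per_100(rows):
    """Point differential per 100 possessions over a set of stints."""
    margin = sum(team - opp for _, team, opp, _ in rows)
    possessions = sum(poss for _, _, _, poss in rows)
    return 100.0 * margin / possessions

on_court = net_per_100([s for s in stints if s[0]])       # team with the player playing
off_court = net_per_100([s for s in stints if not s[0]])  # team with the player sitting
on_off = on_court - off_court                             # raw on/off Plus/Minus
```

A positive `on_off` says the team did better with the player on the floor, which is all raw Plus/Minus can tell you.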

The concept of Plus/Minus originated, I believe, with hockey.  It was first used there in the 1950s, according to Wikipedia.  Since the 1960s it has been in common use.

Plus/Minus was adapted to basketball some time thereafter; I have not been able to track down when it was first used.

For the 2007-2008 season, the NBA began tracking Plus/Minus in box scores.  Results may be found on NBA.com (along with player pairs, trios, etc.) for every year since 05-06.  BasketballValue.com has complete results (termed “1 year unadjusted overall rating”) for every year since 07-08.

Issues with Raw Plus/Minus

It has long been recognized that raw Plus/Minus is not particularly indicative of each player’s actual performance or value.  The largest issue is that the stat is totally context-dependent: if I were on the court with LeBron James, Dwyane Wade, and Chris Bosh, I’d probably have a pretty good Plus/Minus despite the fact that I shouldn’t be playing at all.  And even if I were a really good player, I still wouldn’t have good raw Plus/Minus numbers if I were playing with Jamario Moon, Anthony Parker, Ramon Sessions, and J.J. Hickson.  In addition, the opponents matter as well.  Some players just play in garbage time against the other team’s scrubs.  Their Plus/Minus numbers may be pretty good, but only because they were playing against Brian Scalabrine!

This big issue, the dependence of Plus/Minus on the other 9 players on the court, gave rise to a new statistic aimed at adjusting for this problem: Adjusted Plus/Minus.

Adjusted Plus/Minus: the Basics

Each period of time when a given group of 10 players is on the floor (a stint) may be expressed as:

ObservedEfficiencyDifferential = P1 + P2 + P3 + P4 + P5
                                 - P6 - P7 - P8 - P9 - P10 + HCA

where P1 to P5 are on one team (the home team, given the sign on HCA), P6 to P10 are on the other, and HCA is the home-court advantage.

Next, compile that equation for EVERY matchup of lineups through the whole season.  You now have somewhere around 60,000 independent equations and somewhere around 400 unknowns (the number of players that played that year).  If you solve this system of equations for P1, P2 … Pn, weighting each equation by the number of possessions played in that matchup, you have normal, raw APM.
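A minimal sketch of that weighted regression, using NumPy.  Everything here is a toy: the league is 2-on-2 rather than 5-on-5, the ratings and stints are invented, and the observed differentials are generated without noise so that the fit can be checked against the inputs.

```python
import numpy as np

# Hypothetical 2-on-2 toy league (real stints are 5-on-5); ratings are made up.
true_rating = np.array([4.0, 2.0, 1.0, -1.0, -2.0, -4.0])  # sums to zero
true_hca = 3.0

# Each stint: (home players, away players, possessions).
stints = [
    ((0, 1), (2, 3), 40), ((2, 3), (0, 1), 35), ((0, 2), (1, 3), 20),
    ((0, 3), (1, 2), 25), ((0, 4), (1, 5), 30), ((4, 5), (0, 1), 15),
    ((2, 4), (3, 5), 10), ((0, 5), (2, 4), 12),
]

n_players = len(true_rating)
X = np.zeros((len(stints), n_players + 1))
poss = np.zeros(len(stints))
for row, (home, away, n_poss) in enumerate(stints):
    X[row, list(home)] = 1.0    # home players add to the differential
    X[row, list(away)] = -1.0   # away players subtract from it
    X[row, -1] = 1.0            # home-court advantage column
    poss[row] = n_poss

# Noise-free observed differentials, so the fit should recover the inputs.
y = X @ np.append(true_rating, true_hca)

# Weighted least squares: scale each equation by sqrt(possessions).
# rcond treats near-zero singular values as exactly zero.
w = np.sqrt(poss)
coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=1e-10)
apm, hca = coef[:-1], coef[-1]
```

One detail worth knowing: every row has as many +1 entries as -1 entries, so adding the same constant to every player changes nothing, and the ratings are only determined up to that constant.  `np.linalg.lstsq` returns the minimum-norm solution, which is the one where the player ratings sum to zero.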

However, some players have a very small sample size, so they may muddy the final results by basically taking up all of the residual in the few minutes they played.  Many APM approaches, such as that used by BasketballValue.com (the site to go to for APM results), lump the rarely-used players into an “other players” bucket and just average them out.  This should improve the estimates on the players that are actually rated.
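The bucketing step can be sketched as a pre-processing pass over the stint list.  The player names, the stints, and the 50-possession cutoff are all hypothetical; the cutoff in particular is not taken from BasketballValue.com or any other published APM.

```python
from collections import Counter

# Hypothetical stints: (home lineup, away lineup, possessions); names are made up.
stints = [
    (("A", "B"), ("C", "D"), 60),
    (("A", "E"), ("C", "D"), 5),
    (("A", "B"), ("C", "F"), 8),
]
MIN_POSSESSIONS = 50  # assumed cutoff, for illustration only

# Total possessions played by each player across all stints.
totals = Counter()
for home, away, poss in stints:
    for player in home + away:
        totals[player] += poss

def bucket(player):
    """Map rarely used players onto one shared regression variable."""
    return player if totals[player] >= MIN_POSSESSIONS else "OTHER"

bucketed = [
    (tuple(bucket(p) for p in home), tuple(bucket(p) for p in away), poss)
    for home, away, poss in stints
]
variables = sorted({p for home, away, _ in bucketed for p in home + away})
```

After this pass the regression has one unknown per regular player plus a single "OTHER" unknown, instead of one unknown for every player who ever stepped on the floor.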

Let me digress here to point the reader to the definitive series on Adjusted Plus/Minus, mostly over at 82games.com:

(Those last 2 provide nice “beginners” guides on how to do the actual construction of APM.)

Adjusted Plus/Minus: Collinearity & Sample Size

What are the issues with normal Adjusted Plus/Minus?

Well, the biggest one is that the sample size within one season is not enough for stability.  The “New and Improved” article above discusses some of the issues.  Basically, the big issue is one of collinearity.

Collinearity occurs in this data because of the way coaches use rotations.  P1 and P2 may always go into and come out of the game together, so which one causes the results?  Their numbers would be identical, except for the few times they don’t come in or go out together, and those few cases would dominate each player’s results.  Say they always played together except for 1 possession all year.  If the team scored 2 points on that possession (efficiency = 200), the player on the floor at that time would be rated 200 points above the other.  Suppose their ratings must sum to 0 (known from all of the other stints together): P1 is then rated +100 and P2 -100, purely because of the 1 time they did not play together.  Now obviously, this never happens to this degree, but it does happen somewhat.

A second case is when P1 and P2 are only substituted for each other.  Suppose P1 and P2 both play center.  Suppose P1 is Dwight Howard and P2 is Marcin Gortat.  They only sub for each other, for just about the whole season.  When this is the case, we only really can detect how they relate to each other, not how the 2 of them relate to their teammates.  For the season, the team may be +8.  There is no way to know whether the center position is +10 and the rest is -2, or the center position is -2 and the rest of the team is +10.  What numbers are returned for the team are subject to the vagaries of the few minutes when the situation is different.
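The substitution case can be checked numerically.  The sketch below builds the on-court indicator matrix for one hypothetical team with two centers who only sub for each other, then shows that the "+10 center, -2 rest" and "-2 center, +10 rest" stories predict exactly the same results.  Player labels are invented and opponents are left out to keep the matrix small.

```python
from itertools import combinations

import numpy as np

# Columns: two centers, then five other rotation players (all hypothetical).
players = ["C1", "C2", "w1", "w2", "w3", "w4", "w5"]

# Every lineup has exactly one center plus four of the five other players.
lineups = [(c,) + others for c in ("C1", "C2") for others in combinations(players[2:], 4)]
X = np.array([[1.0 if p in lineup else 0.0 for p in players] for lineup in lineups])

# Two very different stories about the same +8 team:
center_good = np.array([10.0, 10.0, -0.5, -0.5, -0.5, -0.5, -0.5])  # center +10, other four total -2
center_bad = np.array([-2.0, -2.0, 2.5, 2.5, 2.5, 2.5, 2.5])       # center -2, other four total +10

pred_good = X @ center_good   # predicted lineup ratings under story 1
pred_bad = X @ center_bad     # predicted lineup ratings under story 2
rank = np.linalg.matrix_rank(X)
```

Both stories predict +8 for all ten lineups, and the matrix has rank 6 with 7 unknowns, so no amount of data from these lineups alone can separate them.  Only stints that break the pattern carry that information.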

Why did I bring up D-Howard and Marcin Gortat?  Because the exact situation outlined above actually occurred.  In the 09/10 season, Howard and Gortat basically only subbed for each other.  Every lineup for the Magic that played more than 13 minutes total the whole year featured exactly 1 of Gortat or Howard manning the center position.  A similar situation occurred in 08/09, though not quite as drastic.  What happened?  Here’s a quick table:

Year         Dwight Howard   Marcin Gortat   Difference (Howard minus Gortat)
2010-2011            14.09           -2.13   16.22
2009-2010            24.97           13.73   11.24
2008-2009             1.04           -8.06    9.10
2007-2008            12.71             N/A     N/A

2010-2011 offers the clearest answer as to their actual ratings, because Gortat was traded midseason, breaking up the tandem.  Finally, we get a read that isn’t totally obscured by collinearity.  Note how the difference between Howard and Gortat stayed fairly consistent in 2008-2009 and 2009-2010 (9.1 and 11.24), while both players’ ratings jumped by more than 20 points from one year to the next; their ratings varied inversely with those of the rest of the team.

Interlude: Reliability vs. Validity

Before I go further, a quick review of reliability and validity is in order.  Columbia University has an excellent discussion of this subject online; if you don’t want to read it, I’ll attempt to summarize here.

“Reliability refers to a condition where a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeat measurements.”  This is a measure that quantifies how much random fluctuations can interfere with getting consistent results.  As we have seen above, the collinearity of APM samples greatly decreases APM’s reliability.

“Validity refers to the extent we are measuring what we hope to measure (and what we think we are measuring).”  APM is completely valid, because it is directly measuring the result.  However, it may not be as valid for players who switch teams, because their value will almost certainly change based on fit/role in the system.  Box-score based stats are not as valid, because they measure a proxy rather than the subject directly.

This image, from the Columbia article, sums it up:

[Figure: Reliability and Validity target chart]

Adjusted Plus/Minus: Stabilization Techniques

So, we have collinearity/reliability problems with APM.  What are the possible solutions?  Several were alluded to or discussed in the articles listed above.  I’ll take them 1 by 1.

1. Long-Term APM

The obvious one is just to use more years/data.  Many players change teams between years, and that greatly helps the collinearity problem.  Several long-term APMs have been calculated.  Steve Ilardi did in his “New and Improved” article; he posted a full, equally-weighted six-year APM on the APBRmetrics forum.  More recently, Jeremias Engelmann ran 4-year (07-10) APMs and posted them on APBRmetrics (results here).  Aaron Barzilai on BasketballValue.com gives 2-year APMs.

That helps.  However, the long-term APM may still have a few collinearity issues, and more importantly, it only tells us one thing: how good the given player was, on average, over the possessions he played in that span.  So yeah, it’s nice to know that KG was the best of the last 6 years, or that LeBron was the best of the last 4.  But perhaps I want more specificity?  Also, there can be issues with players that only played a portion of the time span: if a player plays only for the last year in the time span, and the other players have gone downhill from their average over the time span, the player that only played 1 year will be artificially and inaccurately inflated.  We’ve traded some validity for quite a bit more reliability.

There are at least 3 more approaches.

2. Weighted Long-Term APM

The next simplest approach is to take several years of data and weight the year desired more heavily than the other years.  This approach will give a more reliable estimate for the year in question, and doesn’t lose as much validity as the equally-weighted approach.  The collinearity is again reduced by the multi-year data.  Of course, we’re still dealing with, perhaps, 40% of our sample coming from years besides the one we wanted to measure.  This is the approach Steve Ilardi took in Adjusted Plus-Minus Ratings: New and Improved for 2007-2008. He posted updated numbers for this approach in a thread on APBRmetrics, and there was a long discussion on the method as well.
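One way to express the year weighting, as a sketch: multiply each stint's possession weight by a weight for its season before running the regression.  The season weights and stints below are hypothetical and are not Ilardi's actual values; they are picked so that 40% of the effective sample comes from the older seasons.

```python
# Hypothetical season weights; the target season counts fully, older ones less.
SEASON_WEIGHT = {"2010-11": 1.0, "2009-10": 0.5, "2008-09": 0.25}

# Hypothetical stints: (season, possessions).
stints = [("2010-11", 60), ("2009-10", 60), ("2008-09", 40)]

def regression_weight(season, possessions):
    """Weight for one stint's equation: possessions scaled by season weight."""
    return SEASON_WEIGHT[season] * possessions

weights = [regression_weight(season, poss) for season, poss in stints]

# Share of the effective sample that comes from outside the target season.
target = sum(w for (season, _), w in zip(stints, weights) if season == "2010-11")
other_share = 1.0 - target / sum(weights)
```

These weights would take the place of raw possession counts in the weighted regression; the rest of the APM calculation is unchanged.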

There are a few avenues for improvement with the “weighted years” approach that have not been explored, at least publicly.  The first would be to add an aging curve as a pre-processing adjustment.  If we’re trying to measure Kevin Garnett now, we should adjust his prior performance downward before running the regression.  Another area, one that’s quite basic, would be to run cross validation to determine the weights: take the prior year data and PART of the current year data, and explore what weights to use for best prediction of the rest of the current year data.  I’d like to see more research done on this sort of stabilization; I think it holds a lot of promise.

Okay, a few more approaches to go.

3. Statistically-Stabilized APM

In his seminal article on this subject, Dan Rosenbaum questioned the use of un-stabilized Adjusted Plus/Minus because of its very noisy nature.  His approach was to create a box score metric and use that to stabilize the regression.  The box score metric was generated by regressing box score data onto un-stabilized APM results; this sort of box score metric has become known as Statistical Plus/Minus (SPM) and has bred a whole series of versions.  That’s another story.  Anyway, Rosenbaum combined that SPM with his un-stabilized APM after each was run; the result was weighted toward either SPM or APM depending on which had the lower standard error.  I’m not entirely sure how this worked mathematically and how appropriate the approach was, but I’m pretty sure he knew what he was doing.  Thus far, I have not seen anyone else use a similar approach.
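For intuition only, here is one standard way to weight two estimates by their standard errors: an inverse-variance weighted average.  This is an assumption about the general idea, not a reconstruction of Rosenbaum's actual computation, and the player's numbers are invented.

```python
def blend(apm, apm_se, spm, spm_se):
    """Inverse-variance weighted average of two noisy estimates."""
    w_apm = 1.0 / apm_se ** 2   # less weight for a larger standard error
    w_spm = 1.0 / spm_se ** 2
    return (w_apm * apm + w_spm * spm) / (w_apm + w_spm)

# Hypothetical player: noisy APM of +8 (SE 4), steadier SPM of +2 (SE 2).
rating = blend(apm=8.0, apm_se=4.0, spm=2.0, spm_se=2.0)
```

Because the SPM estimate has half the standard error, it gets four times the weight, and the blended rating lands much closer to +2 than to +8.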

Now, we’ll really get math intensive:

4. Regularized Adjusted Plus/Minus

The last approach I have seen in public is Regularized Adjusted Plus/Minus (RAPM), first proposed by Joe Sill and presented at the MIT Sloan Sports Analytics Conference.  This approach takes advantage of a mathematical method known as Tikhonov Regularization or Ridge Regression.  The method essentially adds a penalty to the regression for ratings that are far from the league mean.  The size of this penalty, called lambda, is chosen by cross validation, usually K-fold cross validation: the data is broken into a number of segments (folds), each fold is held out in turn, the regression is fit on the remaining folds for various choices of lambda, and the fit is scored on the held-out fold.  Lambda is chosen to maximize out-of-sample accuracy.  This should remove most of the collinearity/noise while losing only a small amount of the validity of the measure.  There is one major issue, though: because all players are regressed toward the league mean, players with few minutes are rated as average, and a player must have played a lot of minutes before the method will rate him as really bad.
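A minimal sketch of ridge regression with K-fold selection of lambda, using NumPy.  The data is random and stands in for real stint data, the lambda grid is hypothetical, and the home-court term is left out for brevity; none of this is Sill's or Engelmann's actual code.

```python
import numpy as np

def ridge_fit(X, y, weights, lam):
    """Weighted ridge: minimize sum(w * (y - Xb)^2) + lam * sum(b^2)."""
    XtW = X.T * weights
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

def cv_error(X, y, weights, lam, n_folds=5):
    """Possession-weighted out-of-sample squared error for one lambda."""
    folds = np.arange(len(y)) % n_folds
    total = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef = ridge_fit(X[train], y[train], weights[train], lam)
        total += np.sum(weights[test] * (y[test] - X[test] @ coef) ** 2)
    return total / weights.sum()

# Synthetic stand-in for stint data (random, not real lineups).
rng = np.random.default_rng(0)
n_stints, n_players = 300, 20
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players))
true_rating = rng.normal(0.0, 2.0, n_players)
poss = rng.integers(5, 40, n_stints).astype(float)
y = X @ true_rating + rng.normal(0.0, 30.0, n_stints)

lambdas = [0.0, 10.0, 100.0, 1000.0, 10000.0]  # hypothetical grid
errors = [cv_error(X, y, poss, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(errors))]
rapm = ridge_fit(X, y, poss, best_lambda)
```

Setting lambda to 0 gives back ordinary weighted APM, and raising it pulls every rating toward zero.  That is also where the low-minutes issue comes from: a player with little data contributes little to the fit, so the penalty dominates and his rating stays near the league mean.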

Jeremias Engelmann’s site stats-for-the-nba.appspot.com contains several versions of RAPM for different purposes.  He’s got 1 year RAPM for the last 5 or 6 years, a 3.x year rating that has best out-of-sample accuracy for this year, some longer term ratings, ratings for the Euroleague, and ratings for several stats (such as rebounding).  The framework is all RAPM.  There are a few issues, as mentioned above: the regression of rarely-used players to league mean is a big one, the lack of aging adjustment for multi-year ratings another, and not weighting multi-year ratings towards recency for best predictive power a third.  APBRmetrics threads are here and here.

Other Potential Variations on APM

I think those 4 stabilization variations are the ones that I have seen in the public domain.  There are probably significantly more accurate/intricate models in NBA teams’ hands.  Winston, Rosenbaum, Lewin, Ilardi, Barzilai, Witus, and Sill are all either working for teams now or have worked for teams in the past.

There are a number of potential avenues for research that have not been explored.

  • One area is team lineup APM.  BasketballValue.com runs lineup APM’s, but due to small sample sizes and the tremendous number of lineups/variables used, using one of the stabilizing techniques is absolutely required to get good results.  Many lineups play just a few minutes a year, so lumping a bunch of the less-used lineups together may be necessary.  Wayne Winston also works with lineup APM’s quite a lot on his blog; I will note Winston often does not seem to respect enough the small sample size/error issues with his research.
  • Another major area would be looking at player pairs, trios, quartets, etc. to investigate synergistic effects between players.  Again, sample size issues are likely a risk here.
  • Bayesian Adjusted Plus/Minus, which is a generalization of the RAPM approach, has been talked about but has not been implemented in the public domain.  Whereas RAPM regresses toward league average, Bayesian APM could regress toward any orthogonal prior–toward a value based on any sort of independent information.  For instance, we could regress toward a value suggested by a player’s playing time, adjusted for how good his team is.  Or, we could regress toward a Statistical Plus/Minus rating.  Or–well, the possibilities are endless.  The key to Bayesian Adjusted Plus/Minus is proper validation of the methods used to avoid over-fitting the data.
  • Another area for research is the handling of positions–do some players play better in some positions than others?  The collinearity when researching this gets even worse, but it is an area that needs investigation.
  • As mentioned several times above, using aging adjustments to pre-process data looks like a promising approach for using long-term data sets to yield more of a point estimate (i.e. What’s KG like right now?)
  • A seemingly untouched area involves projections based on adjusted plus/minus numbers.  Once the difficulties with APM calculations have been ironed out, projections and good aging curves are a key area to study.  I discussed this subject once on the APBRmetrics forum, but never got very far.
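To make the Bayesian APM idea from the list above concrete: with a Gaussian prior centered on each player's prior rating, the estimate is a ridge regression that shrinks toward that prior instead of toward zero.  The sketch below assumes one shared lambda for all players, and the 1-on-1 toy stints and prior values are invented.

```python
import numpy as np

def ridge_toward_prior(X, y, weights, lam, prior):
    """Ridge that shrinks toward `prior` instead of toward zero."""
    XtW = X.T * weights
    A = XtW @ X + lam * np.eye(X.shape[1])
    # Fit the part of y the prior fails to explain, then add the prior back.
    return prior + np.linalg.solve(A, XtW @ (y - X @ prior))

# Hypothetical 1-on-1 toy stints; the third player appears in one short stint.
X = np.array([[1.0, -1.0, 0.0],
              [1.0, -1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])
y = np.array([6.0, 4.0, -5.0, 10.0])
poss = np.array([40.0, 40.0, 40.0, 2.0])

# Prior ratings, e.g. from a box-score metric (values are made up).
prior = np.array([1.0, -1.0, -3.0])

bayes_apm = ridge_toward_prior(X, y, poss, lam=50.0, prior=prior)
plain_rapm = ridge_toward_prior(X, y, poss, lam=50.0, prior=np.zeros(3))
```

With a prior of all zeros this is exactly RAPM, which is the sense in which RAPM is a special case.  With an informative prior, a rarely used player is pulled toward his prior rating rather than toward league average.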

I’m sure there are some other areas of potential research that I have overlooked, but those are what came to mind.

 

This wraps up my review of Adjusted Plus Minus and Stabilization Techniques.  I would like to make this a working document, so feel free to contact me to add something, point out an area that needs to be written better, or any other comment.  I hope this proves a useful reference to everyone in the advanced basketball stats community!

28 Responses to A Review of Adjusted Plus/Minus and Stabilization

  1. EvanZ on May 20, 2011 at 11:43 am

    Great review, Daniel!

    I want to make sure I understand how you would calculate “lineup level” RAPM. Am I right that instead of having P1+P2+…-P9-P10 + HCA, you would simply have something like U1-U2+HCA, where U1 and U2 are the two units? If so, how many unique units are there in the league? It looks to me like each team has on the order of 100 or so. Therefore, you’d still have the same 60,000 equations or so, but now you’d have 3000 unknowns. Is that right?

    • DanielM on May 20, 2011 at 12:03 pm

      Yes, you have the construction correct. There are more than 3000 lineups, though. 12,408, to be exact, most of which have played very few possessions. Some idealization must be done–the approach I have taken is to roll all lineups for each team that have below a threshold number of possessions into a single variable–“LAL_ETC” for instance. If you set the threshold at 70 possessions, you’re down to like 600 unknowns, which is manageable. You lose some validity, but otherwise most of the lineups are regressed completely to 0 or whatever and validity is lost that way as well.

      • EvanZ on May 20, 2011 at 1:20 pm

        Interesting. It seems to me an alternative approach (maybe better, maybe worse) is instead of rolling all those units into one “etc” unit, roll them into the most used units by some sort of similarity measure. What do you think?

        • DanielM on May 20, 2011 at 1:45 pm

          Most of the lesser used units use 3, 4, or 5 bench players. In my initial runs, they were always way negative (except a team or two). I don’t want to roll them in with the primary units, which I want to measure more accurately.

  2. bobbofitos on May 20, 2011 at 1:35 pm

    Hey great summary of APM/the history/etc. Nice work!

  4. Greyberger on May 20, 2011 at 2:06 pm

    Great read, I appreciate you took the time to write this all down.
