As I prepare to release my first work based on the Adjusted Plus/Minus and derivative methods, I felt it would be wise to write a plain-English review of the state of the art in Adjusted Plus/Minus and its derivatives, or at least what is known in the public domain.
What is Plus/Minus?
Plus/Minus, at its core, simply compares how the team fares with the player on the court with how the team fares with the same player off the court. If the team does better with the player on the court than off, the player is probably pretty good. If the team does better with the player on the bench, the player is probably not as good.
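As a minimal sketch of that comparison, the snippet below totals a team's point margin with a given player on and off the court and converts each total to a per-100-possession rate. The stint layout (a set of players, possessions, and point margin) and the sample numbers are assumptions made up for illustration, not real data.

```python
def raw_plus_minus(stints, player):
    """Return (on_court, off_court) team margin per 100 possessions.

    Each stint is (players_on_court, possessions, point_margin), where
    point_margin is team points minus opponent points during the stint.
    """
    on_pts = on_poss = off_pts = off_poss = 0
    for players, poss, margin in stints:
        if player in players:
            on_pts += margin
            on_poss += poss
        else:
            off_pts += margin
            off_poss += poss
    on_rate = 100.0 * on_pts / on_poss if on_poss else 0.0
    off_rate = 100.0 * off_pts / off_poss if off_poss else 0.0
    return on_rate, off_rate


# Hypothetical stints for one team.
stints = [
    ({"A", "B", "C", "D", "E"}, 50, 10),
    ({"A", "B", "C", "D", "F"}, 30, 3),
    ({"B", "C", "D", "E", "F"}, 20, -4),
]
on, off = raw_plus_minus(stints, "A")
net = on - off  # on/off difference for player A
```

Note that nothing here accounts for who else was on the floor, which is the weakness of the raw stat.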
The concept of Plus/Minus originated, I believe, with hockey. It was first used there in the 1950s, according to Wikipedia. Since the 1960s it has been in common use.
Plus/Minus was adapted to basketball some time thereafter; I have not been able to track down when it was first used.
For the 2007-2008 season, the NBA began tracking Plus/Minus in box scores. Results may be found on NBA.com (along with player pairs, trios, etc.) for every year since 05-06. BasketballValue.com has complete results (termed “1 year unadjusted overall rating”) for every year since 07-08.
Issues with Raw Plus/Minus
It has long been recognized that raw Plus/Minus is not particularly indicative of each player’s actual performance or value. The largest issue is that the stat is totally context-dependent: if I were on the court with LeBron James, Dwyane Wade, and Chris Bosh, I’d probably have a pretty good Plus/Minus despite the fact I shouldn’t be playing at all. And even if I were a really good player, I still wouldn’t have good raw Plus/Minus numbers if I were playing with Jamario Moon, Anthony Parker, Ramon Sessions, and J.J. Hickson. In addition, the opponents matter as well. Some players just play in garbage time against the other team’s scrubs. Their Plus/Minus numbers may be pretty good, but only because they were playing against Brian Scalabrine!
This big issue, the dependence of Plus/Minus on the other 9 players on the court, gave rise to a new statistic aimed at adjusting for this problem: Adjusted Plus/Minus.
Adjusted Plus/Minus: the Basics
Each period of time when a group of 10 players are on the floor may be expressed as:
ObservedEfficiencyDifferential = P1 + P2 + P3 + P4 + P5 - P6 - P7 - P8 - P9 - P10 + HCA
where P1 to P5 are on one team, and P6 to P10 are on the other.
Next, compile that equation for EVERY matchup of lineups through the whole season. You now have somewhere around 60,000 independent equations and somewhere around 400 unknowns (the number of players that played that year). If you solve this system of equations for P1, P2 … Pn, weighting each equation by the number of possessions played in that stint, you have normal, raw APM.
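The regression described above can be sketched as a possession-weighted least-squares fit. This is a toy under stated assumptions, not anyone's production code: the league is 2-on-2 with five made-up players so the example stays small, the function names are mine, and the data is noise-free so the known ratings can be recovered exactly. One detail worth noting is that each row has as many +1 entries as -1 entries, so ratings are only determined up to an additive constant; the sketch centers them on zero.

```python
import itertools
import numpy as np


def build_design(stints, players):
    """One row per stint: +1 home players, -1 away players, last column HCA."""
    index = {p: i for i, p in enumerate(players)}
    X = np.zeros((len(stints), len(players) + 1))
    for row, (home, away, _poss, _diff) in enumerate(stints):
        for p in home:
            X[row, index[p]] = 1.0
        for p in away:
            X[row, index[p]] = -1.0
        X[row, -1] = 1.0  # home-court advantage
    return X


def fit_apm(stints, players):
    """Possession-weighted least squares; returns (ratings dict, HCA)."""
    X = build_design(stints, players)
    y = np.array([diff for *_, diff in stints])
    poss = np.array([p for _, _, p, _ in stints], dtype=float)
    sw = np.sqrt(poss)  # scaling rows by sqrt(weight) gives weighted LS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    ratings = beta[:-1] - beta[:-1].mean()  # center on league average
    return dict(zip(players, ratings)), beta[-1]


# Hypothetical "true" ratings (they sum to zero) and home-court advantage.
players = ["A", "B", "C", "D", "E"]
true_ratings = {"A": 4.0, "B": 1.0, "C": 0.0, "D": -2.0, "E": -3.0}
true_hca = 3.0

stints = []
for home in itertools.combinations(players, 2):
    rest = [p for p in players if p not in home]
    for away in itertools.combinations(rest, 2):
        diff = (sum(true_ratings[p] for p in home)
                - sum(true_ratings[p] for p in away) + true_hca)
        stints.append((home, away, 10 + len(stints), diff))

ratings, hca_hat = fit_apm(stints, players)
```

With real data the observed differentials are extremely noisy, which is where the problems discussed later come from.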
However, some players have a very small sample size, so they may muddy the final results by basically taking up all of the residual in the few minutes they played. Many APM approaches, such as that used by BasketballValue.com (the site to go to for APM results), lump the rarely-used players into an “other players” bucket and just average them out. This should improve the estimates on the players that are actually rated.
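A rough sketch of that bucketing step follows. The threshold, the "OTHER" label, and the stint layout (home lineup, away lineup, possessions, differential) are my own assumptions for illustration; I do not know the exact cutoff any particular site uses.

```python
from collections import Counter


def bucket_rare_players(stints, min_possessions):
    """Relabel players below the possession threshold as a shared 'OTHER'."""
    totals = Counter()
    for home, away, poss, _diff in stints:
        for p in (*home, *away):
            totals[p] += poss

    def relabel(lineup):
        return tuple(p if totals[p] >= min_possessions else "OTHER"
                     for p in lineup)

    return [(relabel(home), relabel(away), poss, diff)
            for home, away, poss, diff in stints]


# Hypothetical data: player "Z" appears for only 5 possessions.
stints = [
    (("A", "B"), ("C", "D"), 100, 2.0),
    (("A", "Z"), ("C", "D"), 5, -1.0),
]
bucketed = bucket_rare_players(stints, min_possessions=50)
```

If two bucketed players share a lineup, the design matrix entry for "OTHER" should be accumulated (added to) rather than simply set to 1.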
Let me digress here to point the reader to the definitive series on Adjusted Plus/Minus, mostly over at 82games.com:
- Numbers Game (Washington Post, Winston & Sagarin, 4/2004)
- Measuring how NBA players help their teams win (Dan Rosenbaum, 4/2004)
- 2005/2006 Adjusted Plus Minus Results (David Lewin, 2006)
- 2004/2005 Adjusted Plus Minus Results (David Lewin, 2006)
- Adjusted Plus-Minus: An Idea Whose Time Has Come (Steve Ilardi, 10/2007)
- Adjusted Plus-Minus: 2007-2008 Midseason results (Ilardi & Barzilai, 2008)
- Adjusted Plus-Minus Ratings: New and Improved for 2007-2008 (Ilardi & Barzilai, 2008)
- Calculating Adjusted Plus/Minus (Eli Witus, 2008)
- Offensive and Defensive Adjusted Plus/Minus (Eli Witus, 2008)
(Those last 2 provide nice “beginners” guides on how to do the actual construction of APM.)
Adjusted Plus/Minus: Collinearity & Sample Size
What are the issues with normal Adjusted Plus/Minus?
Well, the biggest one is that the sample size within one season is not enough for stability. The “New and Improved” article above discusses some of the issues. Basically, the big issue is one of collinearity.
Collinearity occurs in this data because of the way coaches use rotations. P1 and P2 may always go into and come out of the game together–so which one causes the results? Their numbers would be identical, except for the few times they don’t come in or go out together–and those few cases would dominate each of the players’ results. So, say they always played together except 1 possession all year. If the team scored (efficiency = 200) then the player out there at that time would be rated +200 above the other player. They together may have to sum to 0 (known from all of the other stints together), but P1 is rated +100 and P2 -100 because of the 1 time they did not play together. Now obviously, this never happens to this degree, but it does happen somewhat.
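To put rough numbers on that, here is a small sketch that looks at the precision of the estimates rather than at any particular result. It assumes just two teammates, everyone else on the floor ignored, 1000 possessions together and a single possession apart; those figures are made up. Up to the noise variance, the covariance of the weighted least-squares estimates is the inverse of X'WX.

```python
import numpy as np

# Two teammates: 1000 possessions together, 1 possession with only P1 on.
X = np.array([[1.0, 1.0],   # P1 and P2 both on the floor
              [1.0, 0.0]])  # P1 on, P2 off
poss = np.array([1000.0, 1.0])

# Covariance of the estimates, up to the (unknown) noise variance.
cov = np.linalg.inv(X.T @ (X * poss[:, None]))

var_p1 = cov[0, 0]
var_p2 = cov[1, 1]
var_sum = cov[0, 0] + cov[1, 1] + 2 * cov[0, 1]   # variance of P1 + P2
var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]  # variance of P1 - P2
```

In this toy setup the sum of the two ratings is pinned down about a thousand times more tightly than either rating alone, and the difference between the two players is determined almost entirely by the one possession they spent apart.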
A second case is when P1 and P2 are only substituted for each other. Suppose P1 and P2 both play center. Suppose P1 is Dwight Howard and P2 is Marcin Gortat. They only sub for each other, for just about the whole season. When this is the case, we only really can detect how they relate to each other, not how the 2 of them relate to their teammates. For the season, the team may be +8. There is no way to know whether the center position is +10 and the rest is -2, or the center position is -2 and the rest of the team is +10. What numbers are returned for the team are subject to the vagaries of the few minutes when the situation is different.
Why did I bring up D-Howard and Marcin Gortat? Because the exact situation outlined above actually occurred. In the 09/10 season, Howard and Gortat basically only subbed for each other. Every lineup for the Magic that played more than 13 minutes total the whole year featured exactly 1 of Gortat or Howard manning the center position. A similar situation occurred in 08/09, though not quite as drastic. What happened? Here’s a quick table:
|Year||Dwight Howard||Marcin Gortat||Difference|
2010-2011 offers the clearest answer as to their actual rating, because Gortat was traded midseason, breaking up the tandem. Finally, we get a read that isn’t totally obscured by collinearity. Note how the difference between Gortat and Howard stayed pretty consistent in ’08 and ’09, but they varied inversely with the rest of the team tremendously.
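The substitution-only pattern can be shown with a tiny design matrix. This is a sketch with hypothetical numbers, not Orlando's actual data: I collapse the four other players into a single "rest of team" column, on the assumption that the same teammates surround whichever center is playing, and I assume the backup is 4 points worse than the starter.

```python
import numpy as np

# Columns: starting center, backup center, rest of the team (one unit).
X = np.array([[1.0, 0.0, 1.0],   # starter on, backup off
              [0.0, 1.0, 1.0]])  # backup on, starter off

rank = np.linalg.matrix_rank(X)  # fewer independent rows than unknowns

# Two very different splits of credit between the centers and the rest.
split_a = np.array([10.0, 6.0, -2.0])   # strong centers, weak teammates
split_b = np.array([-2.0, -6.0, 10.0])  # weak centers, strong teammates

pred_a = X @ split_a
pred_b = X @ split_b
```

Both splits predict exactly the same results for every lineup (+8 with the starter, +4 with the backup), so the data cannot tell them apart. Only the gap between the two centers is identified.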
Interlude: Reliability vs. Validity
Before I go further, a quick review of reliability and validity is in order. Columbia University has an excellent discussion of this subject online; if you don’t want to read it, I’ll attempt to summarize here.
“Reliability refers to a condition where a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeat measurements.” This is a measure that quantifies how much random fluctuation can interfere with getting consistent results. As we have seen above, the collinearity of APM samples greatly decreases APM’s reliability.
“Validity refers to the extent we are measuring what we hope to measure (and what we think we are measuring).” APM is completely valid, because it is directly measuring the result. However, it may not be as valid for players who switch teams, because their value will almost certainly change based on fit/role in the system. Box-score based stats are not as valid, because they measure a proxy rather than the subject directly.
This image, from the Columbia article, sums it up:
Adjusted Plus/Minus: Stabilization Techniques
So, we have collinearity/reliability problems with APM. What are the possible solutions? Several were alluded to or discussed in the articles listed above. I’ll take them 1 by 1.
1. Long-Term APM
The obvious one is just to use more years/data. Many players change teams between years, and that greatly helps the collinearity problem. Several long-term APMs have been calculated. Steve Ilardi did so in his “New and Improved” article; he posted a full, equally-weighted six-year APM on the APBRmetrics forum. More recently, Jeremias Engelmann ran 4-year 07-10 APMs and posted them on APBRmetrics (results here). Aaron Barzilai on BasketballValue.com gives 2-year APMs.
That helps. However, the long-term APM may still have a few collinearity issues, and more importantly, it only tells us one thing: how good the given player averaged over the possessions they played in that span. So yeah, it’s nice to know that KG was the best of the last 6 years, or that LeBron was the best of the last 4. But perhaps I want more specificity? Also, there can be issues with players that only played a portion of the time span–if a player plays only for the last year in the time span, and the other players have gone downhill from their average over the time span, the player that only played 1 year will be artificially & inaccurately inflated. We’ve traded some validity for quite a bit more reliability.
There are at least 3 more approaches.
2. Weighted Long-Term APM
The next simplest approach is to take several years of data and weight the year desired more heavily than the other years. This approach will give a more reliable estimate for the year in question, and doesn’t lose as much validity as the equally-weighted approach. The collinearity is again reduced by the multi-year data. Of course, we’re still dealing with, perhaps, 40% of our sample coming from years besides the one we wanted to measure. This is the approach Steve Ilardi took in Adjusted Plus-Minus Ratings: New and Improved for 2007-2008. He posted updated numbers for this approach in a thread on APBRmetrics, and there was a long discussion on the method as well.
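Mechanically, the year weighting can be sketched as one extra multiplier on each stint's regression weight. The function name, the single down-weight applied to every non-target season, and the toy numbers are assumptions for illustration; this is not Ilardi's actual weighting scheme.

```python
import numpy as np


def year_weighted_fit(X, y, poss, season, target_season, prior_weight):
    """Weighted least squares where non-target seasons count for less."""
    year_w = np.where(season == target_season, 1.0, prior_weight)
    sw = np.sqrt(poss * year_w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# One always-on "player", so the fit reduces to a weighted average.
X = np.ones((2, 1))
y = np.array([0.0, 10.0])         # efficiency differential in each stint
poss = np.array([100.0, 100.0])
season = np.array([2007, 2008])

beta = year_weighted_fit(X, y, poss, season,
                         target_season=2008, prior_weight=0.5)
```

Here the 2007 stint carries half the weight of the 2008 stint, so the estimate lands two thirds of the way toward the 2008 value.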
There are a few avenues for improvement with the “weighted years” approach that have not been explored, at least publicly. The first would be to add an aging curve as a pre-processing adjustment. If we’re trying to measure Kevin Garnett now, we should adjust his prior performance downward before running the regression. Another area, one that’s quite basic, would be to run cross validation to determine the weights: take the prior year data and PART of the current year data, and explore what weights to use for best prediction of the rest of the current year data. I’d like to see more research done on this sort of stabilization; I think it holds a lot of promise.
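The cross-validation idea could look something like the sketch below: fit on the prior-year data plus part of the current year, then score each candidate prior-year weight on the held-out remainder of the current year. Everything here (function names, candidate grid, the one-parameter toy data) is hypothetical and only meant to show the shape of the procedure.

```python
import numpy as np


def weighted_fit(X, y, w):
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def choose_prior_weight(prior, train, holdout, candidates):
    """prior/train/holdout are (X, y, poss) tuples. Returns (best, errors)."""
    Xp, yp, wp = prior
    Xt, yt, wt = train
    Xh, yh, wh = holdout
    errors = []
    for c in candidates:
        X = np.vstack([Xp, Xt])
        y = np.concatenate([yp, yt])
        w = np.concatenate([c * wp, wt])  # down-weight the prior year by c
        beta = weighted_fit(X, y, w)
        resid = yh - Xh @ beta
        errors.append(float(np.sum(wh * resid ** 2) / np.sum(wh)))
    return candidates[int(np.argmin(errors))], errors


# Toy data: a single always-on "player".
ones = np.ones((1, 1))
prior = (ones, np.array([4.0]), np.array([300.0]))    # last season
train = (ones, np.array([10.0]), np.array([50.0]))    # first part of this one
holdout = (ones, np.array([6.0]), np.array([50.0]))   # rest of this season

best, errors = choose_prior_weight(prior, train, holdout,
                                   [0.0, 0.25, 0.5, 1.0])
```

In this toy case, ignoring last season entirely overreacts to the small current sample, while counting it at full weight underreacts; a weight of 0.25 predicts the held-out stints best.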
Okay, a few more approaches to go.
3. Statistically-Stabilized APM
In his seminal article on this subject, Dan Rosenbaum questioned the use of un-stabilized Adjusted Plus/Minus because of its very noisy nature. His approach was to create a box score metric and use that to stabilize the regression. The box score metric was generated by regressing box score data onto un-stabilized APM results; this sort of box score metric has become known as Statistical Plus/Minus (SPM) and has bred a whole series of versions. That’s another story. Anyway, Rosenbaum combined that SPM with his un-stabilized APM after each were run; the result was weighted toward either SPM or APM depending on which had the lower standard error. I’m not entirely sure how this worked mathematically and how appropriate the approach was, but I’m pretty sure he knew what he was doing. Thus far, I have not seen anyone else use a similar approach.
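One standard way to combine two estimates according to their standard errors is inverse-variance weighting, sketched below. To be clear, this is my assumption about the general shape of such a blend, not a reconstruction of Rosenbaum's actual calculation, and it treats the two estimates as independent, which SPM and APM are not strictly. The numbers are made up.

```python
import math


def combine_by_precision(apm, apm_se, spm, spm_se):
    """Inverse-variance weighted average of two estimates.

    Returns (combined_estimate, combined_standard_error).
    """
    w_apm = 1.0 / apm_se ** 2
    w_spm = 1.0 / spm_se ** 2
    estimate = (w_apm * apm + w_spm * spm) / (w_apm + w_spm)
    std_err = math.sqrt(1.0 / (w_apm + w_spm))
    return estimate, std_err


# A noisy APM of +6 (SE 4) and a steadier SPM of +2 (SE 2).
estimate, std_err = combine_by_precision(apm=6.0, apm_se=4.0,
                                         spm=2.0, spm_se=2.0)
```

The blend sits much closer to the SPM value because its standard error is half as large.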
Now, we’ll really get math intensive:
4. Regularized Adjusted Plus/Minus
The last approach I have seen in public is the Regularized Adjusted Plus/Minus (RAPM) first proposed by Joe Sill and presented at the MIT Sloan Sports Analytics Conference. This approach takes advantage of a mathematical method known as Tikhonov Regularization or Ridge Regression. The method essentially adds a penalty to the regression for ratings that are far away from the mean. The strength of this penalty, called lambda, is chosen by cross validation, usually K-fold cross validation: the data is broken into a number of segments (folds), each fold is held out in turn, and various choices for lambda are tried on the remaining data. The lambda that gives the best out-of-sample accuracy is chosen. This should remove most collinearity/noise while losing only a small amount of the measure’s validity. One major issue remains, though: because all players are regressed toward the league mean, players with few minutes are rated as roughly average, and for a player to rate out really badly, he must have played a bunch of minutes to verify that he was indeed that bad.
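Here is a compact sketch of ridge regression with K-fold selection of lambda, using the closed-form solution (X'WX + lambda*I)^-1 X'Wy. It is a toy under assumptions of my own: simulated 2-on-2 stints, twelve made-up players, no home-court term, and an arbitrary lambda grid. It is not Sill's or Engelmann's implementation.

```python
import numpy as np


def ridge_fit(X, y, w, lam):
    """Solve (X'WX + lam*I) beta = X'Wy."""
    XtW = X.T * w  # scales column j of X' (stint j) by its weight
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)


def kfold_lambda(X, y, w, lambdas, k=5, seed=0):
    """Return (best_lambda, scores) by weighted out-of-fold squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        err = 0.0
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(y)), held_out)
            beta = ridge_fit(X[train], y[train], w[train], lam)
            resid = y[held_out] - X[held_out] @ beta
            err += np.sum(w[held_out] * resid ** 2)
        scores.append(err / w.sum())
    return lambdas[int(np.argmin(scores))], scores


# Simulated stints: two players at +1, two at -1 in each row.
rng = np.random.default_rng(42)
n_stints, n_players = 200, 12
X = np.zeros((n_stints, n_players))
for row in X:
    on = rng.choice(n_players, size=4, replace=False)
    row[on[:2]] = 1.0
    row[on[2:]] = -1.0
true_ratings = rng.normal(0.0, 3.0, n_players)
poss = rng.integers(5, 40, n_stints).astype(float)
y = X @ true_ratings + rng.normal(0.0, 10.0, n_stints)

lambdas = [0.1, 1.0, 10.0, 100.0, 1000.0]
best_lam, scores = kfold_lambda(X, y, poss, lambdas)
weak = ridge_fit(X, y, poss, 0.1)      # barely regularized
strong = ridge_fit(X, y, poss, 1000.0)  # heavily shrunk toward zero
```

Because ratings are expressed relative to league average, shrinking toward zero is the same thing as regressing toward the league mean, which is exactly why low-minute players end up looking average.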
Jeremias Engelmann’s site stats-for-the-nba.appspot.com contains several versions of RAPM for different purposes. He’s got 1 year RAPM for the last 5 or 6 years, a 3.x year rating that has best out-of-sample accuracy for this year, some longer term ratings, ratings for the Euroleague, and ratings for several stats (such as rebounding). The framework is all RAPM. There are a few issues, as mentioned above: the regression of rarely-used players to league mean is a big one, the lack of aging adjustment for multi-year ratings another, and not weighting multi-year ratings towards recency for best predictive power a third. APBRmetrics threads are here and here.
Other Potential Variations on APM
I think those 4 stabilization variations are the ones that I have seen in the public domain. There are probably significantly more accurate/intricate models in NBA teams’ hands. Winston, Rosenbaum, Lewin, Ilardi, Barzilai, Witus, and Sill are all either working for teams now or have worked for teams in the past.
There are a number of potential avenues for research that have not been explored.
- One area is team lineup APM. BasketballValue.com runs lineup APMs, but due to small sample sizes and the tremendous number of lineups/variables used, using one of the stabilizing techniques is absolutely required to get good results. Many lineups play just a few minutes a year, so lumping a bunch of the less-used lineups together may be necessary. Wayne Winston also works with lineup APMs quite a lot on his blog; I will note Winston often does not seem to respect enough the small sample size/error issues with his research.
- Another major area would be looking at player pairs, trios, quartets, etc. to investigate synergistic effects between players. Again, sample size issues are likely a risk here.
- Bayesian Adjusted Plus/Minus, which is a generalization of the RAPM approach, has been talked about but has not been implemented in the public domain. Whereas RAPM regresses toward league average, Bayesian APM could regress toward any orthogonal prior–toward a value based on any sort of independent information. For instance, we could regress toward a value suggested by a player’s playing time, adjusted for how good his team is. Or, we could regress toward a Statistical Plus/Minus rating. Or–well, the possibilities are endless. The key to Bayesian Adjusted Plus/Minus is proper validation of the methods used to avoid over-fitting the data.
- Another area for research is the handling of positions–do some players play better in some positions than others? The collinearity when researching this gets even worse, but it is an area that needs investigation.
- As mentioned several times above, using aging adjustments to pre-process data looks like a promising approach for using long-term data sets to yield more of a point estimate (i.e., what is KG like right now?).
- A seemingly untouched area involves projections based on adjusted plus/minus numbers. Once the difficulties with APM calculations have been ironed out, projections and good aging curves are a key area to study. I discussed this subject once on the APBRmetrics forum, but never got very far.
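Of the ideas in the list above, the Bayesian variant is the easiest to sketch, since in its simplest form it is ridge regression that shrinks toward a supplied prior instead of toward zero. The code below minimizes the weighted squared error plus lambda times the squared distance from the prior. The function name, the prior values, and the data are all hypothetical, and a real implementation would need the careful validation mentioned in that bullet.

```python
import numpy as np


def ridge_toward_prior(X, y, w, lam, prior):
    """Minimize sum(w * (y - X b)^2) + lam * ||b - prior||^2."""
    XtW = X.T * w
    A = XtW @ X + lam * np.eye(X.shape[1])
    # Fit the part of y the prior does not explain, then add the prior back.
    return prior + np.linalg.solve(A, XtW @ (y - X @ prior))


# Three players, three head-to-head style stints (made-up numbers).
X = np.array([[1.0, -1.0, 0.0],
              [1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])
y = np.array([6.0, 2.0, -3.0])
w = np.array([40.0, 25.0, 10.0])
prior = np.array([2.0, -1.0, 0.5])  # e.g. an SPM-style rating (hypothetical)

loose = ridge_toward_prior(X, y, w, lam=1.0, prior=prior)
tight = ridge_toward_prior(X, y, w, lam=1e9, prior=prior)
plain = ridge_toward_prior(X, y, w, lam=1.0, prior=np.zeros(3))
```

With a very large lambda the result collapses onto the prior, and with a zero prior the function reduces to ordinary RAPM.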
I’m sure there are some other areas of potential research that I have overlooked, but those are what came to mind.
This wraps up my review of Adjusted Plus Minus and Stabilization Techniques. I would like to make this a working document, so feel free to contact me to add something, point out an area that needs to be written better, or any other comment. I hope this proves a useful reference to everyone in the advanced basketball stats community!