31 Comments

  1. Great review, Daniel!

    I want to make sure I understand how you would calculate “lineup level” RAPM. Am I right that instead of having P1+P2+…-P9-P10 + HCA, you would simply have something like U1-U2+HCA, where U1 and U2 are the two units? If so, how many unique units are there in the league? It looks to me like each team has on the order of 100 or so. Therefore, you’d still have the same 60,000 equations or so, but now you’d have 3000 unknowns. Is that right?

    • DanielM

      Yes, you have the construction correct. There are more than 3000 lineups, though: 12,408, to be exact, most of which have played very few possessions. Some idealization must be done. The approach I have taken is to roll all lineups for each team that fall below a threshold number of possessions into a single variable, “LAL_ETC” for instance. If you set the threshold at 70 possessions, you’re down to about 600 unknowns, which is manageable. You lose some validity, but otherwise most of those lineups would be regressed completely to 0 (or close to it), and validity is lost that way as well.

      • Interesting. It seems to me an alternative approach (maybe better, maybe worse) would be, instead of rolling all those units into one “etc” unit, to roll them into the most-used units by some sort of similarity measure. What do you think?

        • DanielM

          Most of the lesser-used units use 3, 4, or 5 bench players. In my initial runs, they were always way negative (except for a team or two). I don’t want to roll them in with the primary units, which I want to measure more accurately.
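
For readers following the thread above, here is a minimal sketch of the lineup-level construction as described: lineups below a possession threshold are pooled into a per-team "ETC" variable, and each stint becomes a row of the form U1 - U2 + HCA. The function names, the sample stints, and the choice to treat U1 as the home unit are assumptions made for illustration; this is not DanielM's actual code.

```python
from collections import Counter

def roll_up_lineups(stints, threshold):
    """Map each (team, lineup) to a regression variable name.

    stints: iterable of (team, lineup_id, possessions).
    Lineups whose total possessions fall below `threshold` share
    one pooled variable per team, e.g. "LAL_ETC".
    """
    totals = Counter()
    for team, lineup, possessions in stints:
        totals[(team, lineup)] += possessions
    return {
        (team, lineup): f"{team}_{lineup}" if total >= threshold else f"{team}_ETC"
        for (team, lineup), total in totals.items()
    }

def design_row(home_unit, away_unit, variables, mapping):
    """Build one row: +1 for the home unit, -1 for the away unit, and a trailing 1 for HCA."""
    row = [0.0] * (len(variables) + 1)
    row[variables.index(mapping[home_unit])] += 1
    row[variables.index(mapping[away_unit])] -= 1
    row[-1] = 1.0  # home-court advantage term
    return row

# Hypothetical stints: (team, lineup id, possessions played)
stints = [
    ("LAL", "A", 50), ("LAL", "A", 40), ("LAL", "B", 30), ("LAL", "C", 10),
    ("BOS", "X", 80), ("BOS", "Y", 20),
]
mapping = roll_up_lineups(stints, threshold=70)
variables = sorted(set(mapping.values()))
# variables == ["BOS_ETC", "BOS_X", "LAL_A", "LAL_ETC"]

row = design_row(("LAL", "B"), ("BOS", "X"), variables, mapping)
# row == [0.0, -1.0, 0.0, 1.0, 1.0]
```

With the 70-possession threshold, six raw lineups collapse to four unknowns here; on real data the same pooling is what takes the count from 12,408 lineups to roughly 600.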

  2. [...] A week ago, I discussed overrated and underrated players over the past 12 years, evaluated by comparing popular box-score stats with Jeremias Engelmann’s 12 year average Regularized Adjusted Plus/Minus (RAPM) dataset. For a primer on the virtues of RAPM (in large samples), see my article reviewing the state-of-the-art of adjusted plus/minus and stabilization. [...]

  3. […] I used the xRAPM (Expected Regularized Adjusted Plus/Minus) ratings from Jeremias Engelmann’s stats-for-the-nba.appspot.com to take a look at some of the career trends for NBA players through superstars. It uses play-by-play data, box score data, a height adjustment, and some advanced statistical techniques to evaluate all parts of a player’s game, from simpler things like scoring and rebounding to the more complicated like spacing, pick setting, and defensive rotations. It then puts a numerical value on how the player affects a team’s points per 100 possessions and points allowed per 100 possessions, relative to the league average. For more detail on plus/minus metrics see Daniel M’s explanatory post. […]

  4. […] We can add a lot of complexity to this basic framework. We could change the unit of analysis, include other control variables (for coaches, playing back-to-back games, etc.) and use weighting in the regression to reflect that longer stints give us more/better information. Another common tweak is to use a technique called ridge regression to adjust for collinearity problems. Collinearity arises when two players on the same team always play with each other or never play with each other. Recall the toy example from above. What if P1 and P2 were never on the court together? This would lead to us having 3 variables and only 2 equations. If you remember your algebra, that means we can’t solve the system of equations. Something similar can happen with our regression, leading to biased (wrong) coefficients. Ridge regression shrinks coefficients to correct for some of this bias, while introducing bias of its own. On the whole, ridge regression coefficients are ‘more correct.’ When an APM uses ridge regression, as most modern ones do, it might be called RAPM, for ‘regularized adjusted plus-minus.’ Daniel Myers has a nice explanation of how collinearity can bias results, along with a bit more history for APM, on his…. […]
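
The collinearity point in the excerpt above can be illustrated with a small sketch. It assumes a made-up case, not the toy example the excerpt refers to: two players who always share the floor, so their columns in the design matrix are identical. The function name `ridge_2x2`, the stint margins, and the penalty value are all hypothetical. Plain least squares has no unique solution here, while ridge regression, which solves (XᵀX + λI)β = Xᵀy, returns a shrunken estimate split evenly between the two players.

```python
def ridge_2x2(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for a two-column design matrix."""
    a = sum(r[0] * r[0] for r in X) + lam   # top-left of X^T X + lam * I
    b = sum(r[0] * r[1] for r in X)         # off-diagonal entry
    d = sum(r[1] * r[1] for r in X) + lam   # bottom-right entry
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ZeroDivisionError("singular system: no unique solution")
    # Closed-form inverse of a symmetric 2x2 matrix applied to X^T y
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

# Hypothetical data: both players are on the court in every stint.
X = [[1, 1], [1, 1], [1, 1]]
y = [4.0, 6.0, 5.0]  # made-up point margins per stint

try:
    ridge_2x2(X, y, lam=0.0)  # ordinary least squares
    ols_solvable = True
except ZeroDivisionError:
    ols_solvable = False      # the two players cannot be separated

beta = ridge_2x2(X, y, lam=1.0)
# beta == (15/7, 15/7), about 2.14 each; the pair sums to about 4.29,
# shrunk from the observed average margin of 5.
```

The even split is ridge's way of admitting it cannot tell the two players apart, and the shrinkage toward zero is the bias the excerpt mentions.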
