APBRmetrics

Dan Rosenbaum · Posted: Mon Nov 13, 2006 7:46 pm Post subject:

Thanks to all; the point that I am making here is very subtle and I learn a lot hearing your reactions.

Similar to how Ed put it, Adjusted plus/minus ratings present "facts" about how teams play when particular players are on the floor. Now it is an open question when adjusted plus/minus ratings have good predictive power, but it is really hard to question their usefulness as a measure of "reality" that we can use to ground our player ratings and so much more.

DLew · Joined: 13 Nov 2006 Posts: 3

I agree with Dan that adjusted plus-minus is a fact, a statement of what has occcurred, to a much greater degree than Dean is giving it credit for. The test you propose--"Now that we do have play-by-play level data, why don't we look at different combinations of players and their net +/- (unadjusted) and see how close the different measures of value do for those combinations?"--will by definition say that adjusted plus-minus ratings are the most accurate because that is what the adjusted plus-minus regression did, it found THE most accurate way to predict how a team will do with certain players on the floor.

Adjusted Plus-Minus ratings have their limitations in individual cases but it is an indisputable fact (unless you prefer some other method of regression to the generally accepted least squares) that they are the most accurate descriptor of how a team did with a player/players on the court.

Ed Küpfer · Joined: 30 Dec 2004 Posts: 522 Location: Toronto

DeanO: I think you're mistaking my aim. I was looking to defend regression as the baseline against which models should be compared, as opposed to using it as a method to estimate value. (Your writings over the years have created in me a great fear of using regression to estimate value.)

Let me be more specific: I don't even see regression as a model of reality in this context, which is why I used the word "description" -- regression is a decription of a set of predictor variables and a repsonse in the same way the mean is a description of a single variable. Means and regression are even caluclated using the same principles, by minimising squared errors. I see the regression slope as a single number description of how the variables covary, not as a model of how the variables interact. To the extent that a mean describes the variable, we can accept regression results as equally descriptive.

That was horrible. What I mean is that regression should be a baseline precisely because it isn't a model of basketball.
_________________
ed

HoopStudies · Posted: Mon Nov 13, 2006 10:06 pm Post subject:

HoopStudies · Posted: Mon Nov 13, 2006 10:22 pm Post subject:

Dan Rosenbaum · Posted: Mon Nov 13, 2006 10:46 pm Post subject:

admin · Posted: Mon Nov 13, 2006 11:26 pm Post subject:

HoopStudies · Posted: Tue Nov 14, 2006 7:38 am Post subject:

mikez · Joined: 14 Mar 2005 Posts: 2

I think this (very interesting and valuable) thread has turned into a bit of a semantic argument, using two words (“descriptive” and “predictive”) to describe what is actually a 3-step process for using plus/minus statistics.

The first step is clearly descriptive: Assume for a second that during 48 minutes when Brian Scalabrine was on the court, the Celtics outscored their opponents by 10 points. (Numbers completely made up – this is a completely theoretical post that does not rely on any real data). If this is true, it is simply a fact that over the time when Scal was in the game, the Celtics scored 10 points more than their opponents per 48 minutes played. This, I believe is the “descriptive” fact of plus/minus that Ed is referring to, at least partially – it is simply an observation of the real world, without any manipulation of data other than the point of view of observation (i.e. the selection of Scal and the period of time observed). So I would say this step is entirely descriptive.

The second step, as I see it, is actually ascriptive. We ask what portion (if any) of the 10 point advantage was gained due to Scal, as opposed to other players. This step is not predictive, in that we’re not yet trying to say what effects Scal will have on games in the future. Instead we’re merely trying to look at our description of the world and say something about the causes of the real world effects we’ve seen. Perhaps Scal caused the 10 point differential by playing good defense and hitting a 3 or two. Perhaps he was playing with Paul Pierce the whole time, never touched the ball, never set a pick, and was guarding someone who just stood in the corner, thus having no effect on whether there was a 10 point differential or not. Perhaps he had a negative effect that was outweighed by the positive effect of others. Whatever. This step is where the noise in the plus/minus data becomes particularly problematic, since players don’t play an equal amount of time with (and against) every possible lineup – if Scal has only played with Pierce, we’ll have difficulty isolating Scal’s effects from Pierce’s. There are some excellent econometric techniques for dealing with these things, but in the end this step, where we try to ascribe portions of an observed outcome to the effect of each of several inputs, is by far the most difficult part of using plus/minus analysis in basketball, simply because of the noise in the particular data that we’re using. One interesting method for doing this ascription is, of course, to use other observations (for example box score stats) to assist in deciding how to credit the +/- stats, as Dan has done. But in the end crediting the observed outcome to particular players and/or actions that’s all we’re trying to do in the second step.

The third step is entirely predictive – it says “If Scal had this effect in the past, what does that say about what effects he might have in the future?” This is where we would hope that our ascription of past descriptive data would show some high correlation with future descriptive observations. I think the point that Dean’s trying to make is that the goal of player-rating systems is generally to find some ascriptive process that successfully generates such a correlation. If you can’t show this sort of correlation, then either (a) you’re not doing a perfect job of ascribing the observations to particular input factors (i.e. players), or (b) it turns out that the past observations just simply aren’t correlated with future ones – no matter how well we ascribe the outcomes to the effects of individual players, it won’t say anything about their effects in the future.

I think all of us in this business believe (b) just CAN’T be entirely true, or else we wouldn’t be doing this stuff. So we’re left with (a).* Dan wants to argue that Dean is confusing the “predictive” power of plus/minus-based ratings with their “descriptive” power, but Dean says (at least I think he says – correct me if I’m wrong), that since we believe players’ past performance has some correlation with their future performances, one of best ways to check to see if we’re ascribing past effects correctly should be to check to see if the ascription generates a set of data that correlates well with future performance. I agree with this.

It is true, as Dan seems to be arguing, and as I say in my “first step,” above, that you can’t deny the descriptive fact of the plus-minus statistic: the team with Scal was (in my example) +10 over the 48 minutes played. But when we’re doing a player “rating,” it is important to remember we’re really doing two other things besides just (1) describing an observation, in terms of stating exactly what our senses observe about the world: We’re then (2) using data-manipulation methods to ascribe credit for portions of the observed “descriptive” outcome to a particular set of players or actions, and (3) using the resulting ascription to predict something about the future. I think (Dean, correct me if I’m wrong) that Dean’s point is just that since we all believe that past performance is an indicator of future results in basketball, at least to some extent, and since there may not be another great way to see if a particular ascription is accurate in parceling out credit for the descriptive facts we observed, why wouldn’t you use (3), the predictive power of the output from a particular ascription, to determine how good you are at (2), parceling out credit? I’m guessing that Dan will agree with this point and that this discussion will turn out to have been purely semantic over the meaning of “predictive” and “descriptive.” But maybe not…

(Of course, the other reason to care about predictive power is that those of us who are not just trying to see who won last week’s fantasy matchup only care about past performance insofar as it helps us predict future performance. But that’s a separate point from arguing about whether plus/minus stats have “predictive” or “descriptive” power.)

-MZ (taking no public position on Scal’s actual overall value, of course)

* Of course, it’s really more complicated than this – a particular ascription may have certain effects for which it is predictive, and other effects for which it’s not, and a different ascription may predict other effects accurately, but some of the first ones inaccurately, and there are some random effects that no ascription could predict. So (b) is of course partially true, for any given ascription. But for the sake of this post, assume that either past performance is predictive of future results, or it’s not.

HoopStudies

Ed Küpfer · Joined: 30 Dec 2004 Posts: 522 Location: Toronto

Dan Rosenbaum · Posted: Tue Nov 14, 2006 3:27 pm Post subject:

Thanks again for a fascinating discussion. I especially love it when I have to go the dictionary to figure out what a word means – in this case “ascriptive.” For anyone else who might be vocabulary-challenged, Merriam-Webster defines ascribe as “inferring or conjecturing of cause, quality, authorship.”

That is a perfect word to nail down the crux of the issue here. I am arguing that adjusted plus/minus ratings can be used in such a way that there is no inference of causation. They can be used in a purely descriptive sense. When used for descriptive purposes perhaps we should not refer to them as “ratings,” since ratings imply attribution. Maybe they should be referred to as adjusted plus/minus statistics.

Economists have a long history of running regression models – sometimes quite complex regression models – that are purely descriptive. In fact, there is a large strain of economists who argue that the vast majority of regressions – including almost all OLS regressions – are purely descriptive. I do not agree with that claim, but I do believe that regressions can be purely descriptive.

So there is nothing inherent in running a regression model that implies an adjusted plus minus statistic is saying anything more than that when Big Bird was in the game, the team was 3 points better (or worse) than when an average player was in game, holding constant the quality of his teammates and opponents. That statement is just a description – a complicated description – but a description nonetheless. It does not say that Big Bird was the “cause” for the team being 3 points better; it is just saying that it was.

Now, of course, different regression models will lead to different descriptions, but they are still just descriptions. If I measure team efficiency weighting clutch time more heavily than non-clutch time or recent games more heavily than earlier games, my rankings by team efficiency will differ from others who do not do those same things. But my measure of team efficiency is still descriptive; I am just choosing to describe things in a way differently than most people do.

I suppose we could use these different ways of measuring team efficiency (or even team wins) to predict future team efficiency (or team wins) and validate team efficiency that way. But even if team efficiency (or team wins) weren’t very predictive, would we reject them as a good way to describe team performance? I don’t think we would.

I am just making the same argument for adjusted plus/minus statistics. Like with team efficiency, I think their usefulness in describing the past is a separate issue from whether or not they are useful in predicting the future. (I guess an exception could be if they were soooo noisy that they provided practically no useful information; then they would have no usefulness for describing the past. But even though adjusted plus/minus statistics are noisy, they are not that noisy.)

I would imagine that in describing the past, it would probably make sense to compute adjusted plus/minus statistics over short time periods, such as just one season. But if we want to infer causality and predict the future, we probably want to move to longer time periods and/or incorporate box score data (although still rooted in adjusted plus/minus statistics). When I do this, I find that predicting future adjusted plus/minus statistics with box score-based metrics is worse than predicting using a rating that combines (a) adjusted plus/minus ratings and (b) box-score based statistics rooted in adjusted plus/minus statistics.

But in my mind that does not validate using adjusted plus/minus statistics as a barometer. It only validates that method of predicting using this combination of adjusted plus/minus ratings and box score statistics.

So I guess I am disagreeing with Mike here. I do not think this argument is semantic, and I do not believe that using adjusted plus/minus ratings to predict the future does much to validate their use as a descriptor of the past. (I think adjusted plus/minus statistics are validated in much the same way as team efficiency or team wins – that is, through theory.) In fact, in the way I have tried to carve out separate niches for adjusted plus/minus statistics and adjusted plus/minus ratings, I could imagine that they likely should be computed in different ways. Their purposes are different and so their formulations likely should be as well.

Hopefully, this does a better job explaining my view on this. I have learned a ton through this discussion, and I hope others have as well.

John Quincy · Joined: 01 Feb 2005 Posts: 174

I must say that lots of this is over my head, but I find it very interesting. I especially appreciate guys like DeanO and Dan who have jobs with teams but still take the time to add to discussions on the board.

Harold Almonte · Joined: 04 Aug 2006 Posts: 40

I can understand for today wheather forecasting is necessary every day sharper supersoftwares, a lot, lot of data, and complex stats methods (and some satelites). But, is already necessary for today basketball (maybe for Vegas), more than an aerial approach picture from a little less aerial two lines formula-rating based on reality limited box score data?..If the answer is yes, I think everybody knows better than me, it´ll be needed first a more "descriptive" data than boxscore and raw +/-. Play by play video "ascriptive" aided data plus supersoftwares is the next. But is in the "ascriptive" part where the trouble is. You can´t pretend that only with global stats evidence and tendencies (I don´t agree with WOW here), without some expert scouting and coaching aid (I mean 4 different eyes and a camera watch better than 2, and they don´t need a thousand years) you can create the supermetrics you want to. The stat tricks box could be emptied one of this days.