Joined: 30 Dec 2004 Posts: 410 Location: Near Philadelphia, PA
Posted: Mon Nov 13, 2006 7:39 pm Post subject:
Ed Küpfer wrote:
ziller wrote:
Is this what you are now proposing, then, that adjusted +/- is an effective barometer of player rating systems?
Regression is a funny thing. There are limits to its usefulness, but one of the benefits is its pragmatism: it describes what actually happened within the range of its data set, and given a certain amount of error. If a player is a +70 over the season, that isn't just an estimate of his worth, that is an exact description of what took place. If you adjust for oncourt personnel (using regression) and find that player is a +2 per 100 possessions, then that is a description of what actually took place on the court (within statistical error). Now, an analyst can look at these results and say there's an explanation for why this doesn't properly describe the player's value -- and this explanation may really be true and make sense. But the point is that the regression itself is not an interpretation, it's a description. This is what makes it a good baseline against which to test other measures.
Never sure how to answer such broad brushstrokes. So I'll take a couple stabs.
Stab 1
Two things:
1. Value is subjective. It becomes objective when you give it an objective. The "objective" of adj +/- is marginal net pts contributed by a player per 100 possessions.
2. Any model is a description, using Ed's word, or an estimate of the reality or objective we're aiming for.
Stab 2
We can describe the game in many many ways. But many people are not thinking about stats as descriptions, rather as a way to get a "true measure of value" or a "rating system". For example, people ask me "what's better - TS% or eFG%" - and I always say that it depends on what you're trying to do. True Shooting Percentage isn't inherently better than Effective Field Goal Percentage. Nor vice versa. They are descriptions of exactly what their formulas capture. If you're trying to evaluate a player's ability to contribute "from the field" in one number, TS% makes more sense because "the field" implies some ability to get to the line. If you're just looking at ability to make a shot from the field, then eFG% is better. I use one more than the other, but does that make it better? Personally, yeah. But generally, no. I converse in both and don't preach either.
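To make the comparison concrete, here is a minimal sketch of the two formulas with a made-up stat line (the 0.44 free-throw weight is the usual convention; exact coefficients vary a little from source to source):

Code:
# eFG% adjusts field goals for the extra value of threes; TS% folds in
# trips to the line via the usual 0.44 weight on free throw attempts.
def efg_pct(fgm, fg3m, fga):
    return (fgm + 0.5 * fg3m) / fga

def ts_pct(pts, fga, fta):
    return pts / (2.0 * (fga + 0.44 * fta))

# Made-up season line: a modest shooter who gets to the line a lot.
pts, fgm, fg3m, fga, fta = 1600, 550, 100, 1250, 450
print(round(efg_pct(fgm, fg3m, fga), 3))  # ~0.480: shot-making from the field
print(round(ts_pct(pts, fga, fta), 3))    # ~0.552: overall scoring efficiency

Same player, two different (and both correct) descriptions, depending on whether you want the trips to the line counted.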
Along this line, we used to have the argument of using possessions or plays (or minor possessions) -- which was better? I believe that I always said that it was simpler to describe the game with possessions than with plays. And simplicity won out, which leads me to...
Let me put out my rules for evaluating models in the order of weight.
1. Reality. Is the model evaluating reality well?
2. Simplicity. Is the model simple?
3. Conservatism. If you're using a model to make a decision, is the model set up to give the conventional wisdom the benefit of the doubt, so that if it produces results that go against the CW, you can show how far you went to avoid that conclusion?
4. Consistency. Is the model consistent with what is done in practice?
Reality and simplicity are HUGE. Science is built around the simplest description of reality. But reality is where we often get into trouble. Reality is often defined in science as honoring physical principles based on prediction of closely controlled experiments. We don't have that. Economics doesn't have that. So we have Dave Berri talking about 70-80% of the variance in individual basketball performance being explained by the previous year. He's saying that is providing evidence of "reality". Dan is suggesting another test of "reality", I believe. I can't say I fully understand why adj +/- is a barometer, but other tests are vital.
So is regression a description? I'd say it's a model subject to the above evaluation criteria. The regression model behind adj +/- is trying to describe marginal net points contributed historically by an individual over some specified period of time.
- Is it realistically doing so? Yes, I think so. You can go through and find questionable aspects about its description of reality - mainly its assumption that player values are additive - but these seem minor.
- Is it simple? At its most basic, yeah, it's simple. Adding in different aspects - for clutch time, for positional offsets, for temporal variations - makes it less than simple and potentially even unrealistic.
- Conservatism and Consistency are policy decisions in using results from the adj +/- model. Those are things that Danny Ferry should care about, not us in a general discussion.
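For concreteness, here is a minimal sketch of the kind of regression sitting behind adj +/- -- simulated stint data, everything made up, and the real systems (Winval, Danval, etc.) differ in weighting, time periods, and other details:

Code:
import numpy as np

# Simulated stint data: each row is a stretch with the same ten players on
# the floor. Each column is a player: +1 if on the court for the home team,
# -1 if on for the away team, 0 if off. y is home net points per 100
# possessions over the stint; w is the number of possessions (a weight).
rng = np.random.default_rng(0)
n_stints, n_players = 2000, 60
X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    home = rng.choice(n_players // 2, size=5, replace=False)
    away = n_players // 2 + rng.choice(n_players // 2, size=5, replace=False)
    X[i, home], X[i, away] = 1.0, -1.0
w = rng.integers(5, 40, size=n_stints).astype(float)
true_value = rng.normal(0.0, 3.0, size=n_players)   # "true" marginal values
y = X @ true_value + rng.normal(0.0, 30.0, size=n_stints) / np.sqrt(w)

# Weighted least squares: scale rows by sqrt(possessions) and solve. The
# coefficients are each player's estimated marginal net points per 100
# possessions, holding teammates and opponents constant -- which is exactly
# the additivity assumption noted above.
coef, *_ = np.linalg.lstsq(X * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)
print(np.round(coef[:5], 1))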
Description implies to me an undeniable fact (though, having gone through a political season, it's clear that undeniable facts can be very disproportionately emphasized to give the perception that reality is different). With Danval yielding different results than Winval and AaronVal, I'd say it's not a description, but a model or analysis.
Anyway, this leads me to a test that may be better. Now that we do have play-by-play level data, why don't we look at different combinations of players and their net +/- (unadjusted) and see how close the different measures of value do for those combinations? I think that gets away from the adjusted (and controversial) part of +/- and provides another test. We can take the per-minute values of whatever measures and multiply by minutes played in that lineup to come up with expected wins or, equivalently, net points and we'll have actual undeniable (almost, see below) data to compare against.
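A minimal sketch of the test I have in mind -- the ratings, lineups, and minutes below are hypothetical, the real inputs would come from the play-by-play, and it ignores the quality of the opposing lineups (one of the caveats below):

Code:
# For each five-man lineup we know its minutes and its actual net points
# from play-by-play. Any rating system that gives a per-48-minute value per
# player implies an expected net for the lineup: sum the five values (the
# additivity assumption again) and scale by minutes played. Then compare.
def lineup_test(lineups, minutes, actual_net, value_per48):
    errors = []
    for players, mins, net in zip(lineups, minutes, actual_net):
        expected = sum(value_per48[p] for p in players) * mins / 48.0
        errors.append(expected - net)
    bias = sum(errors) / len(errors)                          # average miss
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5  # noisiness
    return bias, rmse

# Hypothetical usage -- value_per48 could come from adj +/-, Wins Produced
# per 48 minutes, or any other measure you want to test.
value_per48 = {"A": 3.1, "B": -0.5, "C": 1.2, "D": 0.0, "E": -2.0, "F": 4.0}
lineups = [("A", "B", "C", "D", "E"), ("A", "B", "C", "D", "F")]
minutes = [96.0, 240.0]
actual_net = [5.0, 31.0]
print(lineup_test(lineups, minutes, actual_net, value_per48))

An unbiased but noisy measure would show a bias near zero and a large mean-square error; a biased one would miss in the same direction over and over.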
This isn't a perfect test. A lineup that is +17 over the last 5 minutes of a game its team still loses by 15 isn't really piling up wins. The only way I can see to account for this is to note the approximate odds of winning/losing over the period that the numbers are posted. Some lineups will typically be used at times when the odds reflect garbage time.
Also, some lineups face "better" lineups. Starters face starters, but let's say a rating system says a sub should be starting, based on their measure. If that player's lineups don't perform as well as that measure says even against the subs he usually faces, that kind of suggests the measure may overrate that guy.
Further, multiple systems can pass this test. Even further, people have to choose whether they use career rating values or just last year's. Last year's should do best predicting last year's lineup combos, but probably not if we're looking over several years of lineups.
Even further, Danval is built upon some of these principles (clutch time is worth so much, garbage time is worth nothing, many years are better than just one year), so Dan could set up a test that Danval would win over Winval or Aaronval (multi-year test with garbage time getting thrown out and extra weight on clutch time, for instance). I think the question is how close do some of these measures actually get, regardless of the details of how it's set up...
And, finally, the data isn't undeniable. How you credit points scored on a foul call when a guy goes out of the game is somewhat subjective. This is fairly minor. _________________ Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com
Last edited by HoopStudies on Mon Nov 13, 2006 7:53 pm; edited 1 time in total
Joined: 03 Jan 2005 Posts: 413 Location: Greensboro, North Carolina
Posted: Mon Nov 13, 2006 7:46 pm Post subject:
Thanks to all; the point that I am making here is very subtle and I learn a lot hearing your reactions.
Similar to how Ed put it, adjusted plus/minus ratings present "facts" about how teams play when particular players are on the floor. Now it is an open question how much predictive power adjusted plus/minus ratings have, but it is really hard to question their usefulness as a measure of "reality" that we can use to ground our player ratings and so much more.
I agree with Dan that adjusted plus-minus is a fact, a statement of what has occurred, to a much greater degree than Dean is giving it credit for. The test you propose--"Now that we do have play-by-play level data, why don't we look at different combinations of players and their net +/- (unadjusted) and see how close the different measures of value do for those combinations?"--will by definition say that adjusted plus-minus ratings are the most accurate because that is what the adjusted plus-minus regression did, it found THE most accurate way to predict how a team will do with certain players on the floor.
Adjusted Plus-Minus ratings have their limitations in individual cases but it is an indisputable fact (unless you prefer some other method of regression to the generally accepted least squares) that they are the most accurate descriptor of how a team did with a player/players on the court.
DeanO: I think you're mistaking my aim. I was looking to defend regression as the baseline against which models should be compared, as opposed to using it as a method to estimate value. (Your writings over the years have created in me a great fear of using regression to estimate value.)
Let me be more specific: I don't even see regression as a model of reality in this context, which is why I used the word "description" -- regression is a description of a set of predictor variables and a response in the same way the mean is a description of a single variable. Means and regression are even calculated using the same principles, by minimising squared errors. I see the regression slope as a single number description of how the variables covary, not as a model of how the variables interact. To the extent that a mean describes the variable, we can accept regression results as equally descriptive.
That was horrible. What I mean is that regression should be a baseline precisely because it isn't a model of basketball. _________________ ed
Joined: 30 Dec 2004 Posts: 410 Location: Near Philadelphia, PA
Posted: Mon Nov 13, 2006 10:06 pm Post subject:
DLew wrote:
I agree with Dan that adjusted plus-minus is a fact, a statement of what has occurred, to a much greater degree than Dean is giving it credit for. The test you propose--"Now that we do have play-by-play level data, why don't we look at different combinations of players and their net +/- (unadjusted) and see how close the different measures of value do for those combinations?"--will by definition say that adjusted plus-minus ratings are the most accurate because that is what the adjusted plus-minus regression did, it found THE most accurate way to predict how a team will do with certain players on the floor.
Adjusted Plus-Minus ratings have their limitations in individual cases but it is an indisputable fact (unless you prefer some other method of regression to the generally accepted least squares) that they are the most accurate descriptor of how a team did with a player/players on the court.
I absolutely agree that they should be the best. But how different are the 3 different versions we have from each other? How different are they when you use adj +/- from multiple years to make projections on one year? Or use adj +/- from one year to project over multiple years? And how much worse are the others? Will they also be unbiased, but more noisy? How much more noisy? I personally would find all this pretty interesting.
But if adj +/- were fact, shouldn't there be just one of "THE most accurate" methods? Are you saying that we shouldn't run the test I propose? _________________ Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com
Joined: 30 Dec 2004 Posts: 410 Location: Near Philadelphia, PA
Posted: Mon Nov 13, 2006 10:22 pm Post subject:
Ed Küpfer wrote:
DeanO: I think you're mistaking my aim. I was looking to defend regression as the baseline against which models should be compared, as opposed to using it as a method to estimate value. (Your writings over the years have created in me a great fear of using regression to estimate value.)
I kinda got that -- and I know enough of your writing to be confident that you understand its use -- but I didn't fully get it.
Ed Küpfer wrote:
Let me be more specific: I don't even see regression as a model of reality in this context, which is why I used the word "description" -- regression is a description of a set of predictor variables and a response in the same way the mean is a description of a single variable. Means and regression are even calculated using the same principles, by minimising squared errors. I see the regression slope as a single number description of how the variables covary, not as a model of how the variables interact. To the extent that a mean describes the variable, we can accept regression results as equally descriptive.
Also keep in mind that -- and I hate to say this -- an average is essentially a model of belief. Or being less obtuse, there are lots of different types of averages -- linear, harmonic, geometric, sabermetric. An average or a regression result is a summary of data. Using it to make forecasts, in some way, makes it a model. OLS is a linear model built upon a Gaussian assumption. I worked with people who felt that the world was not indeed Gaussian and felt that too much time was spent using Gaussian tools for problems that deserved non-Gaussian solutions. I felt they went a bit too far at times, but their point did get to me. There is often a lack of checking of the assumptions underlying a regression. There can be overspecification (when I was taught that fitting data with an nth order polynomial was stupid, I learned a lot) or underspecification. There can be overgrouping (grouping bimodal data can make something look pretty Gaussian) or undergrouping of data in order to get Gaussian data...
That being said, the assumptions underlying a linear regression are fairly safe with what we're doing.
All this math stuff is hard to say. I don't know if I reflect what you're trying to say or if I had too much wine at dinner, but I know I'm gonna get some sleep now... _________________ Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com
Joined: 03 Jan 2005 Posts: 413 Location: Greensboro, North Carolina
Posted: Mon Nov 13, 2006 10:46 pm Post subject:
HoopStudies wrote:
DLew wrote:
I agree with Dan that adjusted plus-minus is a fact, a statement of what has occurred, to a much greater degree than Dean is giving it credit for. The test you propose--"Now that we do have play-by-play level data, why don't we look at different combinations of players and their net +/- (unadjusted) and see how close the different measures of value do for those combinations?"--will by definition say that adjusted plus-minus ratings are the most accurate because that is what the adjusted plus-minus regression did, it found THE most accurate way to predict how a team will do with certain players on the floor.
Adjusted Plus-Minus ratings have their limitations in individual cases but it is an indisputable fact (unless you prefer some other method of regression to the generally accepted least squares) that they are the most accurate descriptor of how a team did with a player/players on the court.
I absolutely agree that they should be the best. But how different are the 3 different versions we have from each other? How different are they when you use adj +/- from multiple years to make projections on one year? Or use adj +/- from one year to project over multiple years? And how much worse are the others? Will they also be unbiased, but more noisy? How much more noisy? I personally would find all this pretty interesting.
But if adj +/- were fact, shouldn't there be just one of "THE most accurate" methods? Are you saying that we shouldn't run the test I propose?
DeanO, I don't think you are getting the point made here. On past data adjusted plus/minus ratings would always predict better than non-adjusted plus/minus ratings. And the adjusted plus/minus rating that would best predict would be the one that most narrowly defined the period over which the adjusted plus/minus ratings were measured. For example, ratings re-estimated every month would do better than ratings re-estimated every year or every two years.
The differences between adjusted plus/minus ratings lie in how clutch/non-clutch time is weighted and how changes over time in player value are handled. (I don't know what you are talking about with positional adjustments. Positions are not a part of my adjusted plus/minus ratings.) But these differences are all about coming up with a rating that best predicts the future. And no one is arguing that adjusted plus/minus ratings are the best predictor of the future.
The argument here is that they are the best arbiter of the past, which can help us think about how to better predict the future.
And yes, WINVAL and my adjusted plus/minus ratings do differ (I think Aaron would tell you that his metric is not yet a full adjusted plus/minus measure), but that is only because we are describing different things, i.e. we differ in how much we value clutch/non-clutch time and/or we differ in how we want to handle changes over time in player value. But I can run adjusted plus/minus ratings in a million different ways and as long as I have not screwed up my data, the same story comes out in terms of how valuable rebounds are relative to assists relative to points. How individual players are rated changes, but what these ratings say about different rating systems (or about the average impact of players with particular statistics) changes very little.
The only thing that changes things a little bit is if I put a lot of value on clutch time play, since the way player qualities impact the game is a little different in clutch time versus non-clutch time.
DeanO, I think you are confounding the predictive power of adjusted plus/minus ratings with their descriptive power.
Net efficiency at the team level (offensive minus defensive efficiency) is a "fact" that we use to gauge our various metrics. It is not the only one or in all cases the best gauge, but no one would disagree that it is a useful barometer.
But suppose net efficiency in the past did not predict wins in the future very well. Would we then reject it as a barometer? No, because the descriptive and predictive uses are not the same. Net efficiency may be a great descriptor and terrible predictor. (It turns out that it is pretty good at both.)
And this is despite the fact that different people measure net efficiency differently. Those differences are relatively minor and would not result in significantly different valuations of the importance of shooting, rebounding, and avoiding turnovers in net efficiency.
That is the story with adjusted plus/minus ratings. They are great as a description of the past. Their value as a predictor of the future is greatly limited by the noisiness of the ratings. But that is a separate issue and does not take away from their value as a barometer.
But if adj +/- were fact, shouldn't there be just one of "THE most accurate" methods?
I don't think I buy this argument, Dean. We agree that Offensive and Defensive Ratings are fact, right? And they are the most accurate method for rating an offense or defense, right?
Yet if you look at KnickerBlogger's stat page and B-R.com, you'll see different Offensive and Defensive Ratings, because the two sites calculate possessions slightly differently.
Frankly, if adjusted plus-minus wasn't proprietary, we might not have different methods for calculating it.
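To illustrate with made-up team totals -- the two estimates below differ only in the free-throw coefficient, which is one of the small ways possession formulas vary from site to site:

Code:
# Two slightly different possession estimates with the same basic shape:
# possessions = FGA - OREB + TOV + k * FTA, with k = 0.44 or 0.40 here.
def offensive_rating(pts, fga, fta, oreb, tov, ft_coef):
    poss = fga - oreb + tov + ft_coef * fta
    return 100.0 * pts / poss

pts, fga, fta, oreb, tov = 8000, 6500, 2100, 950, 1150  # made-up season totals
print(round(offensive_rating(pts, fga, fta, oreb, tov, 0.44), 1))  # ~104.9
print(round(offensive_rating(pts, fga, fta, oreb, tov, 0.40), 1))  # ~106.1

The numbers differ a little, but the story they tell about the offense is the same -- which is the analogy being drawn to the different adjusted plus-minus implementations.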
Joined: 30 Dec 2004 Posts: 410 Location: Near Philadelphia, PA
Posted: Tue Nov 14, 2006 7:38 am Post subject:
Dan Rosenbaum wrote:
HoopStudies wrote:
DLew wrote:
I agree with Dan that adjusted plus-minus is a fact, a statement of what has occurred, to a much greater degree than Dean is giving it credit for. The test you propose--"Now that we do have play-by-play level data, why don't we look at different combinations of players and their net +/- (unadjusted) and see how close the different measures of value do for those combinations?"--will by definition say that adjusted plus-minus ratings are the most accurate because that is what the adjusted plus-minus regression did, it found THE most accurate way to predict how a team will do with certain players on the floor.
Adjusted Plus-Minus ratings have their limitations in individual cases but it is an indisputable fact (unless you prefer some other method of regression to the generally accepted least squares) that they are the most accurate descriptor of how a team did with a player/players on the court.
I absolutely agree that they should be the best. But how different are the 3 different versions we have from each other? How different are they when you use adj +/- from multiple years to make projections on one year? Or use adj +/- from one year to project over multiple years? And how much worse are the others? Will they also be unbiased, but more noisy? How much more noisy? I personally would find all this pretty interesting.
But if adj +/- were fact, shouldn't there be just one of "THE most accurate" methods? Are you saying that we shouldn't run the test I propose?
DeanO, I don't think you are getting the point made here. On past data adjusted plus/minus ratings would always predict better than non-adjusted plus/minus ratings. And the adjusted plus/minus rating that would best predict would be the one that most narrowly defined the period over which the adjusted plus/minus ratings were measured. For example, ratings re-estimated every month would do better than ratings re-estimated every year or every two years.
Actually, this is exactly what I do believe. Exactly, precisely, I think I said this "Last year's should do best predicting last year's lineup combos..."
And my point is, with those different estimates, based on different lengths of time, how different are the different adj +/- at evaluating the past? If you use a 3yr adj +/-, is it worse than a one year Wins Produced at evaluating lineups (since Dave uses previous year Wins Produced to project the future)? Is it better, by how much?
Dan Rosenbaum wrote:
The argument here is that they are the best arbiter of the past, which can help us think about how to better predict the future.
But which one is the best? And how different are they from one another given different databases?
Dan Rosenbaum wrote:
DeanO, I think you are confounding the predictive power of adjusted plus/minus ratings with their descriptive power.
We often use history as a test of predictive power. You're saying you modify what you do to improve predictive power. Look at the past as an example of making a prediction, not just a description. Don't use that period of time to calibrate your model, then make a prediction in that period of time. Do the same with other metrics, or not. Much of Dave's argument is predictive ability. Test it with something unambiguous.
Dan Rosenbaum wrote:
Net efficiency at the team level (offensive minus defensive efficiency) is a "fact" that we use to gauge our various metrics. It is not the only one or in all cases the best gauge, but no one would disagree that it is a useful barometer.
But suppose net efficiency in the past did not predict wins in the future very well. Would we then reject it as a barometer? No, because the descriptive and predictive uses are not the same. Net efficiency may be a great descriptor and terrible predictor. (It turns out that it is pretty good at both.)
And this is despite the fact that different people measure net efficiency differently. Those differences are relatively minor and would not result in significantly different valuations of the importance of shooting, rebounding, and avoiding turnovers in net efficiency.
That is the story with adjusted plus/minus ratings. They are great as a description of the past. Their value as a predictor of the future is greatly limited by the noisiness of the ratings. But that is a separate issue and does not take away from their value as a barometer.
But the lack of controversy is associated with the team version of net efficiency, not the individual. Regardless of what possession formula we use, the net efficiency is about the same at the team level. We know individual ratings change with context, but not team ratings. We know that you can get different __val ratings from the same team lineup data (for reasons you listed above). All I'm saying is that we should use the basic undeniable lineup data to test ratings that aren't even built from it.
Am I missing something obvious? Do we know already that Danval based on 2003 data will miss 2005 lineups by 3 pts/100 poss, but Danval based on 2002-2004 data will miss 2005 lineups by 1.5 pts/100 poss? Did I miss that literature? _________________ Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com
Posted: Tue Nov 14, 2006 12:07 pm Post subject: "Predictive" vs "descriptive"
I think this (very interesting and valuable) thread has turned into a bit of a semantic argument, using two words (“descriptive” and “predictive”) to describe what is actually a 3-step process for using plus/minus statistics.
The first step is clearly descriptive: Assume for a second that during 48 minutes when Brian Scalabrine was on the court, the Celtics outscored their opponents by 10 points. (Numbers completely made up – this is a completely theoretical post that does not rely on any real data). If this is true, it is simply a fact that over the time when Scal was in the game, the Celtics scored 10 points more than their opponents per 48 minutes played. This, I believe, is the “descriptive” fact of plus/minus that Ed is referring to, at least partially – it is simply an observation of the real world, without any manipulation of data other than the point of view of observation (i.e. the selection of Scal and the period of time observed). So I would say this step is entirely descriptive.
The second step, as I see it, is actually ascriptive. We ask what portion (if any) of the 10 point advantage was gained due to Scal, as opposed to other players. This step is not predictive, in that we’re not yet trying to say what effects Scal will have on games in the future. Instead we’re merely trying to look at our description of the world and say something about the causes of the real world effects we’ve seen. Perhaps Scal caused the 10 point differential by playing good defense and hitting a 3 or two. Perhaps he was playing with Paul Pierce the whole time, never touched the ball, never set a pick, and was guarding someone who just stood in the corner, thus having no effect on whether there was a 10 point differential or not. Perhaps he had a negative effect that was outweighed by the positive effect of others. Whatever. This step is where the noise in the plus/minus data becomes particularly problematic, since players don’t play an equal amount of time with (and against) every possible lineup – if Scal has only played with Pierce, we’ll have difficulty isolating Scal’s effects from Pierce’s. There are some excellent econometric techniques for dealing with these things, but in the end this step, where we try to ascribe portions of an observed outcome to the effect of each of several inputs, is by far the most difficult part of using plus/minus analysis in basketball, simply because of the noise in the particular data that we’re using. One interesting method for doing this ascription is, of course, to use other observations (for example box score stats) to assist in deciding how to credit the +/- stats, as Dan has done. But in the end, crediting the observed outcome to particular players and/or actions is all we’re trying to do in the second step.
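A toy illustration of the Scal-and-Pierce problem (all numbers invented): when two players' on-court indicators are identical, the regression can pin down only the sum of their effects, not the split.

Code:
import numpy as np

# Two players who are always on the floor together produce two identical
# columns in the design matrix, so least squares cannot separate them.
rng = np.random.default_rng(1)
n = 1000
together = rng.integers(0, 2, size=n).astype(float)  # Scal and Pierce share this indicator
other = rng.integers(0, 2, size=n).astype(float)      # a third, independent player
X = np.column_stack([together, together, other])
y = 4.0 * together + 1.0 * together + 2.0 * other + rng.normal(0.0, 1.0, size=n)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))  # first two coefficients: ~2.5 each (only their sum ~5 is identified)

In real data the columns are rarely identical, just highly correlated, so the individual estimates are not arbitrary but become very noisy -- which is why the fixes involve more data, shrinking the estimates, or (as Dan does) bringing in box score information.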
The third step is entirely predictive – it says “If Scal had this effect in the past, what does that say about what effects he might have in the future?” This is where we would hope that our ascription of past descriptive data would show some high correlation with future descriptive observations. I think the point that Dean’s trying to make is that the goal of player-rating systems is generally to find some ascriptive process that successfully generates such a correlation. If you can’t show this sort of correlation, then either (a) you’re not doing a perfect job of ascribing the observations to particular input factors (i.e. players), or (b) it turns out that the past observations just simply aren’t correlated with future ones – no matter how well we ascribe the outcomes to the effects of individual players, it won’t say anything about their effects in the future.
I think all of us in this business believe (b) just CAN’T be entirely true, or else we wouldn’t be doing this stuff. So we’re left with (a).* Dan wants to argue that Dean is confusing the “predictive” power of plus/minus-based ratings with their “descriptive” power, but Dean says (at least I think he says – correct me if I’m wrong), that since we believe players’ past performance has some correlation with their future performances, one of the best ways to check to see if we’re ascribing past effects correctly should be to check to see if the ascription generates a set of data that correlates well with future performance. I agree with this.
It is true, as Dan seems to be arguing, and as I say in my “first step,” above, that you can’t deny the descriptive fact of the plus-minus statistic: the team with Scal was (in my example) +10 over the 48 minutes played. But when we’re doing a player “rating,” it is important to remember we’re really doing two other things besides just (1) describing an observation, in terms of stating exactly what our senses observe about the world: We’re then (2) using data-manipulation methods to ascribe credit for portions of the observed “descriptive” outcome to a particular set of players or actions, and (3) using the resulting ascription to predict something about the future. I think (Dean, correct me if I’m wrong) that Dean’s point is just that since we all believe that past performance is an indicator of future results in basketball, at least to some extent, and since there may not be another great way to see if a particular ascription is accurate in parceling out credit for the descriptive facts we observed, why wouldn’t you use (3), the predictive power of the output from a particular ascription, to determine how good you are at (2), parceling out credit? I’m guessing that Dan will agree with this point and that this discussion will turn out to have been purely semantic over the meaning of “predictive” and “descriptive.” But maybe not…
(Of course, the other reason to care about predictive power is that those of us who are not just trying to see who won last week’s fantasy matchup only care about past performance insofar as it helps us predict future performance. But that’s a separate point from arguing about whether plus/minus stats have “predictive” or “descriptive” power.)
-MZ (taking no public position on Scal’s actual overall value, of course)
* Of course, it’s really more complicated than this – a particular ascription may have certain effects for which it is predictive, and other effects for which it’s not, and a different ascription may predict other effects accurately, but some of the first ones inaccurately, and there are some random effects that no ascription could predict. So (b) is of course partially true, for any given ascription. But for the sake of this post, assume that either past performance is predictive of future results, or it’s not.
Joined: 30 Dec 2004 Posts: 410 Location: Near Philadelphia, PA
Posted: Tue Nov 14, 2006 12:43 pm Post subject: Re: "Predictive" vs "descriptive"
mikez wrote:
I think this (very interesting and valuable) thread has turned into a bit of a semantic argument...
....
-MZ (taking no public position on Scal’s actual overall value, of course)
I believe I agree with everything you said. I would modify this:
mikez wrote:
why wouldn’t you use (3), the predictive power of the output from a particular ascription, to determine how good you are at (2), parceling out credit?
to be
not mikez wrote:
why wouldn’t you use (3), the predictive power of the output from a particular ascription, to determine how good you are at (2), parceling out credit (ascription), using (1), the descriptive data we have?
The errors in using season-level data (as Berri has, to get his Wins Produced) rather than play-by-play level data will be shown by this test, potentially along with other errors. And it wouldn't be described as a correlation, but by an average error and a mean-square error. _________________ Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com
Quote:
Also keep in mind that -- and I hate to say this -- an average is essentially a model of belief. Or being less obtuse, there are lots of different types of averages -- linear, harmonic, geometric, sabermetric. An average or a regression result is a summary of data.
Right. So far so good.
Quote:
Using it to make forecasts, in some way, makes it a model.
But I'm not using it for forecasting. My argument was that its value lies in description. If I say the mean height of an NBA center is 81 inches, that is an objective description of reality. Any predictive power it may have is completely secondary to my purposes.
Quote:
OLS is a linear model built upon a Gaussian assumption. I worked with people who felt that the world was not indeed Gaussian and felt that too much time was spent using Gaussian tools for problems that deserved non-Gaussian solutions.
I'm with you there. I'm a huge fan of quantile regression, which regresses arbitrary quantiles of the response (medians, for example) on the predictors. There are other regression methods that use non-Gaussian distributions (the so-called 'robust' family of regression techniques), and for every measure of central tendency for a univariate distribution, there's probably a regression analogue.
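A small sketch of the difference, on simulated data with a few wild points -- I'm using statsmodels' quantile regression here, assuming I have the call right:

Code:
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# Simulated data with a true slope of 2, plus high-leverage outliers.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 300)
y = 2.0 * x + rng.normal(0.0, 1.0, 300)
y[x > 9] += 50.0  # a handful of wild points at the right edge

X = sm.add_constant(x)
ols_slope = sm.OLS(y, X).fit().params[1]             # mean-like: minimizes squared error
median_slope = QuantReg(y, X).fit(q=0.5).params[1]   # median-like: minimizes absolute error
print(round(ols_slope, 2), round(median_slope, 2))   # OLS dragged upward; median stays near 2

The OLS slope is to the mean what the median-regression slope is to the median: the same kind of summary, just built on a different notion of "average".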
This is getting further and further away from the topic, but my main point is that if you feel comfortable describing a population by any average you choose, you should feel just as comfortable describing the relationship between predictors and a response using whatever regression you feel is appropriate, and for the exact same reasons.
mikez wrote:
-MZ (taking no public position on Scal’s actual overall value, of course)
Joined: 03 Jan 2005 Posts: 413 Location: Greensboro, North Carolina
Posted: Tue Nov 14, 2006 3:27 pm Post subject:
Thanks again for a fascinating discussion. I especially love it when I have to go the dictionary to figure out what a word means – in this case “ascriptive.” For anyone else who might be vocabulary-challenged, Merriam-Webster defines ascribe as “inferring or conjecturing of cause, quality, authorship.”
That is a perfect word to nail down the crux of the issue here. I am arguing that adjusted plus/minus ratings can be used in such a way that there is no inference of causation. They can be used in a purely descriptive sense. When used for descriptive purposes perhaps we should not refer to them as “ratings,” since ratings imply attribution. Maybe they should be referred to as adjusted plus/minus statistics.
Economists have a long history of running regression models – sometimes quite complex regression models – that are purely descriptive. In fact, there is a large strain of economists who argue that the vast majority of regressions – including almost all OLS regressions – are purely descriptive. I do not agree with that claim, but I do believe that regressions can be purely descriptive.
So there is nothing inherent in running a regression model that implies an adjusted plus minus statistic is saying anything more than that when Big Bird was in the game, the team was 3 points better (or worse) than when an average player was in game, holding constant the quality of his teammates and opponents. That statement is just a description – a complicated description – but a description nonetheless. It does not say that Big Bird was the “cause” for the team being 3 points better; it is just saying that it was.
Now, of course, different regression models will lead to different descriptions, but they are still just descriptions. If I measure team efficiency weighting clutch time more heavily than non-clutch time or recent games more heavily than earlier games, my rankings by team efficiency will differ from others who do not do those same things. But my measure of team efficiency is still descriptive; I am just choosing to describe things in a way differently than most people do.
I suppose we could use these different ways of measuring team efficiency (or even team wins) to predict future team efficiency (or team wins) and validate team efficiency that way. But even if team efficiency (or team wins) weren’t very predictive, would we reject them as a good way to describe team performance? I don’t think we would.
I am just making the same argument for adjusted plus/minus statistics. Like with team efficiency, I think their usefulness in describing the past is a separate issue from whether or not they are useful in predicting the future. (I guess an exception could be if they were soooo noisy that they provided practically no useful information; then they would have no usefulness for describing the past. But even though adjusted plus/minus statistics are noisy, they are not that noisy.)
I would imagine that in describing the past, it would probably make sense to compute adjusted plus/minus statistics over short time periods, such as just one season. But if we want to infer causality and predict the future, we probably want to move to longer time periods and/or incorporate box score data (although still rooted in adjusted plus/minus statistics). When I do this, I find that predicting future adjusted plus/minus statistics with box score-based metrics is worse than predicting using a rating that combines (a) adjusted plus/minus ratings and (b) box-score based statistics rooted in adjusted plus/minus statistics.
But in my mind that does not validate using adjusted plus/minus statistics as a barometer. It only validates that method of predicting using this combination of adjusted plus/minus ratings and box score statistics.
So I guess I am disagreeing with Mike here. I do not think this argument is semantic, and I do not believe that using adjusted plus/minus ratings to predict the future does much to validate their use as a descriptor of the past. (I think adjusted plus/minus statistics are validated in much the same way as team efficiency or team wins – that is, through theory.) In fact, in the way I have tried to carve out separate niches for adjusted plus/minus statistics and adjusted plus/minus ratings, I could imagine that they likely should be computed in different ways. Their purposes are different and so their formulations likely should be as well.
Hopefully, this does a better job explaining my view on this. I have learned a ton through this discussion, and I hope others have as well.
I must say that lots of this is over my head, but I find it very interesting. I especially appreciate guys like DeanO and Dan who have jobs with teams but still take the time to add to discussions on the board.
I can understand that today's weather forecasting requires ever-sharper supersoftware, lots and lots of data, and complex statistical methods (and some satellites). But is today's basketball (maybe for Vegas) already in need of more than an aerial-approach picture from a slightly less aerial two-line formula rating based on limited box score data? If the answer is yes, I think everybody knows better than me that it will first need more "descriptive" data than the box score and raw +/-. Play-by-play, video-aided "ascriptive" data plus supersoftware is the next step. But it is in the "ascriptive" part where the trouble is. You can't pretend that with only global stats evidence and tendencies (I don't agree with WOW here), without some expert scouting and coaching aid (I mean 4 different eyes and a camera watch better than 2, and they don't need a thousand years), you can create the supermetrics you want to. The stat tricks box could be emptied one of these days.