|
APBRmetrics The statistical revolution will not be televised.
|
kbche
Joined: 19 Jul 2005 Posts: 51 Location: washington d.c.
|
Posted: Wed Jul 27, 2005 9:58 pm Post subject: Question for Dan Rosenbaum regarding his WINVAL analysis |
|
|
Hi Dan,
I am working on mathematically characterizing the teamwork aspect of NBA basketball. I am a chemical engineer and an avid fan of NBA basketball (Washington Wizards and Miami Heat). I recently read your WINVAL analysis paper dated May 30, 2004. Thanks for sharing your creative methods for adjusting plus-minus statistics.
Have you done any more work on the OLS estimates? I noticed that the r-squared value was low. Have you looked at choosing a different set of variables?
kbche (Kimberly Brown)
I presume that you are talking about the estimates that relate box score statistics to adjusted plus/minus ratings. By the standards of my line of work, the R^2 values in those regressions are pretty high. In wage regressions in labor economics, R^2 values of 0.01 are not uncommon.
But in a lot of regressions, like these, we know that we are not explaining everything, so a "low" R^2 is to be expected. A more critical issue is whether we can get precise estimates, and in this case the estimates are not too bad. I have tried other combinations of variables, but R^2 probably will not rise much until I start using non-box score statistics. I suspect that variables such as opponents' PER would improve R^2. That said, the adjusted plus/minus ratings are themselves measured imprecisely, so no set of variables is going to fully explain them.
Rather than continuing this conversation in PM, we could just take it to the board. If you would like to, feel free to copy both your post and my response.
Welcome to the board.
Best wishes,
Dan |
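The distinction Dan draws between a low R^2 and a precise estimate can be illustrated with a small simulation. This is only a sketch on made-up data, not the WINVAL regression: the true slope of 0.5, the noise level, and the sample size are all assumptions chosen so that the regressor explains roughly 1% of the variance while the slope is still pinned down tightly.

```python
import math
import random

random.seed(0)
n = 20000          # assumed sample size (illustrative)
true_slope = 0.5   # assumed true effect (illustrative)

# One regressor with a real but small effect, buried in a lot of noise.
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + random.gauss(0, 5) for xi in x]

# Simple OLS of y on x with an intercept.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 and the conventional standard error of the slope.
ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
slope_se = math.sqrt(ss_res / (n - 2) / sxx)

print(f"R^2 = {r_squared:.3f}, slope = {slope:.3f} +/- {slope_se:.3f}")
```

R^2 stays small because most of the variation in y is noise, yet the standard error of the slope shrinks with the sample size, so the estimate can still be usable.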
|
|
|
Jon Cohodas
Joined: 08 Jul 2005 Posts: 31 Location: Richmond, VA
|
Posted: Thu Jul 28, 2005 10:10 am Post subject: |
|
|
How amusing. I too was composing questions about Dan's regressions for email when I decided that it might be better to ask them here.
My question is the following:
I see your base equation has each observation defined as no substitutions and you estimate Margin based on average points per possession. Have you tried running the data by possession?
In other words, each possession ends with an event such as a turnover, defensive rebound, foul, score, etc.
It would seem to me that one could get at the propensities for certain events to occur on offense or defense even if they do not get a directly tabulated statistic. For example, a player who boxes out well might not get a rebound, but the probability of a rebound occurring on his watch may be higher.
It also may be possible that "weighting" multiple observations of the same 10 player combos may affect the +/- statistics. |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
|
Posted: Thu Jul 28, 2005 10:35 am Post subject: |
|
|
Jon Cohodas wrote: | How amusing. I too was composing questions about Dan's regressions for email when I decided that it might be better to ask them here.
My question is the following:
I see your base equation has each observation defined as no substitutions and you estimate Margin based on average points per possession. Have you tried running the data by possession?
In other words, each possession ends with an event such as a turnover, defensive rebound, foul, score, etc.
It would seem to me that one could get at the propensities for certain events to occur on offense or defense even if they do not get a directly tabulated statistic. For example, a player who boxes out well might not get a rebound, but the probability of a rebound occurring on his watch may be higher.
It also may be possible that "weighting" multiple observations of the same 10 player combos may affect the +/- statistics. |
Weighting by the number of possessions gives exactly the same results as creating separate observations for each possession. The latter just takes the computer a lot longer to run. In essence, the weight is just saying this observation should be counted 10 times because of its 10 possessions, this one 5 times because of its 5 possessions, etc.
Now it is a little more complicated than that because I simultaneously weight for clutch/garbage time play, in essence saying that we should pretend that a possession in clutch time is like having 1.5 or 2 or 2.5 possessions and that one in garbage time is like having 0 or 0.5 possessions. |
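The equivalence between weighting by possessions and repeating each observation once per possession can be checked on a toy example. The sketch below is not Dan's code: the four shifts, their margins, and the possession counts are invented, the model has only a home-court intercept and a single player indicator, and `solve` and `wls` are hypothetical helper names. Exact fractions are used so the two sets of coefficients can be compared for strict equality.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

def wls(X, y, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy."""
    k = len(X[0])
    xtwx = [[sum(wi * row[a] * row[b] for row, wi in zip(X, w)) for b in range(k)]
            for a in range(k)]
    xtwy = [sum(wi * row[a] * yi for row, yi, wi in zip(X, y, w)) for a in range(k)]
    return solve(xtwx, xtwy)

# Invented shifts: [intercept, player on floor for home (+1) / away (-1) / off (0)].
X = [[1, 1], [1, -1], [1, 0], [1, 1]]
margin = [12, -5, 3, 20]       # home minus away, per 100 possessions
possessions = [10, 5, 7, 3]    # length of each shift in possessions

# Option 1: one row per shift, weighted by its possessions.
weighted = wls(X, margin, possessions)

# Option 2: one row per possession, all with weight 1.
X_rep = [row for row, n in zip(X, possessions) for _ in range(n)]
y_rep = [m for m, n in zip(margin, possessions) for _ in range(n)]
replicated = wls(X_rep, y_rep, [1] * len(X_rep))

print(weighted == replicated)
```

Only the coefficient estimates are compared here. The weighted form also accepts fractional weights, for example `possessions[i] * Fraction(3, 2)` for a clutch-time shift, which row replication cannot express.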
|
|
|
Jon Cohodas
Joined: 08 Jul 2005 Posts: 31 Location: Richmond, VA
|
Posted: Fri Jul 29, 2005 4:12 pm Post subject: |
|
|
Quote: | Weighting by the number of possessions gives exactly the same results as creating separate observations for each possession. The latter just takes the computer a lot longer to run. In essence, the weight is just saying this observation should be counted 10 times because of its 10 possessions, this one 5 times because of its 5 possessions, etc. |
Maybe I don't understand completely what you are doing then. When you say you use the average points per possession (APPP), is that APPP for that group of 10 players, the team, or for the entire dataset?
If it is for that cohort group, then I agree that pooling might not hurt. (The weighting of garbage, crunch, normal might be skewing things). If it is not, then could this explain why your estimates are so noisy? |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
|
Posted: Fri Jul 29, 2005 5:22 pm Post subject: |
|
|
Jon, every observation is a shift of a game with no substitutions. The dependent variable is the points scored by the home team during that shift minus the points scored by the away team. This point differential is expressed in points per 100 possessions.
The explanatory variables include variables for every (non-replacement) player in the league indicating whether he is playing during that shift for the home team, the away team, or not at all. In my latest version I also include some variables that account for the ages and experience of the home and away teams.
I weight each shift by the number of possessions (and factors that account for garbage/clutch play).
I don't know if this answers your question, because I am not quite sure what you are asking.
The results are so noisy because it is hard for the data to assign credit to all ten players during a given shift. Lots of players play together a lot and so it is difficult to statistically separate them. The regression does not give more credit to the player who gets more blocks, steals, points, assists, etc. during a shift like we do when we watch a game. So in that sense it is less efficient than a person watching a game.
In the extreme, if two players always played together, we would never be able to get separate adjusted plus/minus ratings for them. If players were assigned to teams randomly throughout the season (switching teams every game), we could get very precise estimates using adjusted plus/minus ratings. But they are not, so we have a lot of "noise." |
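The extreme case described above, two players who always play together, shows up as a rank-deficient design matrix. The sketch below is a deliberately shrunken illustration, not the actual WINVAL setup: lineups are 2-on-2 instead of 5-on-5, there is no home-court intercept, the players A through F and their shifts are invented, and F stands in for the reference player (so he gets no column). `design_row` and `rank` are hypothetical helper names.

```python
from fractions import Fraction

PLAYERS = ["A", "B", "C", "D", "E"]  # F is the reference player: no column

def design_row(home, away):
    """+1 if the player is on the floor for the home team, -1 for away, else 0."""
    return [1 if p in home else -1 if p in away else 0 for p in PLAYERS]

def rank(rows):
    """Matrix rank by Gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Shifts in which A and B are always on the floor together (or both off).
together = [
    design_row({"A", "B"}, {"C", "D"}),
    design_row({"A", "B"}, {"E", "F"}),
    design_row({"C", "E"}, {"A", "B"}),
    design_row({"C", "D"}, {"E", "F"}),
    design_row({"D", "F"}, {"C", "E"}),
    design_row({"C", "F"}, {"D", "E"}),
]
rank_together = rank(together)

# One extra shift that splits A and B onto opposite sides.
split = together + [design_row({"A", "C"}, {"B", "D"})]
rank_split = rank(split)

print(len(PLAYERS), rank_together, rank_split)
```

With A and B inseparable, their columns are identical and the rank falls one short of the number of players, so only the sum of their two ratings is identified. A single shift that separates them restores full rank, although in practice one such shift would still leave the individual estimates very noisy.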
|
|
|
Jon Cohodas
Joined: 08 Jul 2005 Posts: 31 Location: Richmond, VA
|
Posted: Mon Aug 01, 2005 7:50 am Post subject: |
|
|
Dan,
First of all, thanks for patiently answering my questions.
Dan Rosenbaum wrote: | Jon, every observation is a shift of a game with no substitutions. The dependent variable is the points scored by the home team during that shift minus the points scored by the away team. This point differential is expressed in points per 100 possessions. |
I think I understand the "normal" cases. I was just curious about the one-possession observations where you use the "average". Now that I've contemplated this and have seen more of the regression results, I'm sure that these account for very few observations.
Dan Rosenbaum wrote: |
I weight each shift by the number of possessions (and factors that account for garbage/clutch play). |
Does this adjustment dramatically change the results? Can you share your insight into how you chose your working definition of garbage/clutch time? Is it based on feel, or observation, or from the data, or all of the above?
Dan Rosenbaum wrote: |
The results are so noisy because it is hard for the data to assign credit to all ten players during a given shift. Lots of players play together a lot and so it is difficult to statistically separate them. The regression does not give more credit to the player who gets more blocks, steals, points, assists, etc. during a shift like we do when we watch a game. So in that sense it is less efficient than a person watching a game.
In the extreme, if two players always played together, we would never be able to get separate adjusted plus/minus ratings for them. If players were assigned to teams randomly throughout the season (switching teams every game), we could get very precise estimates using adjusted plus/minus ratings. But they are not, so we have a lot of "noise." |
I understand why pooling the data does not matter. I just had not realized how homogeneous the lineups were until I looked at the data a little more closely. |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
|
Posted: Tue Aug 02, 2005 4:12 am Post subject: |
|
|
I have not looked at the clutch/garbage time adjustment this year, but last season it really did not make that big of a difference. It is an ad hoc adjustment that could be improved upon. |
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 260
|
Posted: Wed Aug 03, 2005 4:49 am Post subject: |
|
|
i have a question too...
dan, i still don't fully understand your concept of reference players. in your analysis, do you only compare a specific player with a reference player or do you also compare a non-reference player with a non-reference player? also, since you have different reference players for each team, do you assume that all reference players are the same?
the thing is, i'm gonna start my pure adjusted +/- analysis soon, what i do is:
(A, B etc stands for a specific player)
A B C D E vs F G H I J
and
K(changed) B C D E vs F G H I J
to compare A and K. (basically just pure adjusted +/- if i'm not mistaken)
this does not allow for any cross team comparison of 2 players... i can only compare players from the same team...
[in some way it's probably not a bad thing not to be able to give a rating for all nba players in one chart since every player's value changes for whichever team he plays (an offensive oriented player might be better in a team with good defenders and vice versa). nevertheless, sometimes it might be pretty useful]
thanks for any explanations... |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
|
Posted: Wed Aug 03, 2005 8:02 am Post subject: |
|
|
In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player plays together or that sometimes they play against each other. |
|
|
|
Ben
Joined: 13 Jan 2005 Posts: 264 Location: Iowa City
|
Posted: Wed Aug 03, 2005 9:44 am Post subject: |
|
|
Dan Rosenbaum wrote: | In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player plays together or that sometimes they play against each other. |
Occasionally a superstar gets a season-ending injury after a few games. (I'm thinking of David Robinson the year before Tim Duncan was drafted.) Would that cause much of a problem or are there enough minutes to swamp this effect? |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
|
Posted: Wed Aug 03, 2005 11:08 am Post subject: |
|
|
Ben wrote: | Dan Rosenbaum wrote: | In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player plays together or that sometimes they play against each other. |
Occasionally a superstar gets a season-ending injury after a few games. (I'm thinking of David Robinson the year before Tim Duncan was drafted.) Would that cause much of a problem or are there enough minutes to swamp this effect? |
It is less than 250 minutes in the last three seasons combined. |
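The three-season rule can be sketched as a simple relabeling step before the regression. Everything below is illustrative: the player names and minute totals are made up, and `regression_label`, `REFERENCE`, and `CUTOFF_MINUTES` are hypothetical names. Only the 250-minute threshold applied to the three-season total comes from the posts above.

```python
REFERENCE = "REFERENCE"   # single pooled label for all low-minute players
CUTOFF_MINUTES = 250      # threshold on the three-season total, per the post

def regression_label(player, minutes_by_season):
    """Return the player's own label, or REFERENCE if his combined
    minutes over the three seasons fall below the cutoff."""
    total = sum(minutes_by_season[player])
    return player if total >= CUTOFF_MINUTES else REFERENCE

# Made-up minutes for three seasons, oldest first.
minutes_by_season = {
    "injured_star": [3000, 2900, 180],    # hurt early in the latest season
    "end_of_bench": [40, 0, 95],          # never reaches 250 combined
    "rotation_guard": [1500, 1800, 2100],
}

labels = {p: regression_label(p, minutes_by_season) for p in minutes_by_season}
print(labels)
```

Because the cutoff applies to the combined total, a star who misses most of one season keeps his own variable, while a player who barely plays in all three seasons is folded into the reference group.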
|
|
|
Ben
Joined: 13 Jan 2005 Posts: 264 Location: Iowa City
|
Posted: Wed Aug 03, 2005 11:17 am Post subject: |
|
|
Dan Rosenbaum wrote: | Ben wrote: | Dan Rosenbaum wrote: | In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player plays together or that sometimes they play against each other. |
Occasionally a superstar gets a season-ending injury after a few games. (I'm thinking of David Robinson the year before Tim Duncan was drafted.) Would that cause much of a problem or are there enough minutes to swamp this effect? | It is less than 250 minutes in the last three seasons combined. |
Oops, sorry I missed that. |
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 260
|
Posted: Wed Aug 03, 2005 5:29 pm Post subject: |
|
|
- you treat all reference players as equally good?
- you compare 2 lineups only when they're all the same except for the reference player? do you use lineups that differ in a lot more than just the reference player, but differ only in one position from each other? |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 541 Location: Greensboro, North Carolina
|
Posted: Wed Aug 03, 2005 7:39 pm Post subject: |
|
|
back2newbelf wrote: | - you treat all reference players as equally good?
- you compare 2 lineups only when they're all the same except for the reference player? do you use lineups that differ in a lot more than just the reference player, but differ only in one position from each other? |
Yes, I assume all replacement players have the same effectiveness. I use a regression framework that can simultaneously compare all line-ups. |
|
|
|
back2newbelf
Joined: 21 Jun 2005 Posts: 260
|
Posted: Wed Aug 03, 2005 9:30 pm Post subject: |
|
|
don't you think that hurts the objectivity of this research? |
|
|
|
|
|
|