This is Google's cache of viewtopic.php?t=304. It is a snapshot of the page as it appeared on Mar 20, 2011 01:39:44 GMT. The current page could have changed in the meantime. Learn more

APBRmetrics :: View topic - Question for Dan Rosenbaum regarding his WINVAL analysis
APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

Question for Dan Rosenbaum regarding his WINVAL analysis
kbche



Joined: 19 Jul 2005
Posts: 51
Location: washington d.c.

Posted: Wed Jul 27, 2005 9:58 pm    Post subject: Question for Dan Rosenbaum regarding his WINVAL analysis

Hi Dan,

I am working on mathematically characterizing the teamwork aspect of NBA basketball. I am a chemical engineer and an avid fan of NBA basketball (Washington Wizards and Miami Heat). I recently read your WINVAL analysis paper dated May 30, 2004. Thanks for sharing your creative methods for adjusting plus-minus statistics.

Have you done any more work on the OLS estimates? I noticed that the r-squared value was low. Have you looked at choosing a different set of variables?

kbche (Kimberly Brown)

I presume that you are talking about the estimates that relate box score statistics to adjusted plus/minus ratings. In my line of work the R^2 values in those regressions are pretty high. In wage regressions in labor economics R^2 values of 0.01 are not uncommon.

But in a lot of regressions, like these, we know that we are not explaining everything, so a "low" R^2 is to be expected. A more critical issue is whether we can get precise estimates, and in this case the estimates are not too bad. I have tried other combinations of variables, but R^2 probably will not rise a lot more until I start using non-box score statistics. I suspect that variables such as opponents' PER would improve R^2. That said, the adjusted plus/minus ratings are measured imprecisely themselves, so there is no way any set of variables is going to fully explain them.

Rather than continuing this conversation in PM, we could just take it to the board. If you would like to, feel free to copy both your post and my response.

Welcome to the board.

Best wishes,
Dan
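
The point above about imprecisely measured ratings can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the predictor `x`, the noise variance of 3, and the sample size are hypothetical and are not taken from the WINVAL paper. It shows that when the dependent variable carries its own estimation error, R^2 stays well below 1 even if the regressors explain the underlying "true" rating perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

x = rng.normal(size=n)                          # stand-in for a box score predictor
true_rating = 1.0 * x                           # "true" rating, fully explained by x
noise = rng.normal(scale=np.sqrt(3.0), size=n)  # estimation error in the rating
measured = true_rating + noise                  # what the regression actually sees


def r_squared(x, y):
    """R^2 from an OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


r2_true = r_squared(x, true_rating)   # essentially 1: nothing left unexplained
r2_measured = r_squared(x, measured)  # near var(signal) / (var(signal) + var(noise))

print(round(r2_true, 3), round(r2_measured, 3))
```

With a signal variance of 1 and a noise variance of 3, the ceiling on R^2 in this toy setup is about 1 / (1 + 3) = 0.25, no matter which variables are added.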
Jon Cohodas



Joined: 08 Jul 2005
Posts: 31
Location: Richmond, VA

Posted: Thu Jul 28, 2005 10:10 am

How amusing. I too was composing questions about Dan's regressions for email when I decided that it might be better to ask them here.

My question is the following:
I see your base equation has each observation defined as no substitutions and you estimate Margin based on average points per possession. Have you tried running the data by possession?

In other words, each possession ends with an event such as a turnover, defensive rebound, foul, score, etc.

It would seem to me that one could get at the propensities for certain events to occur on offense or defense even if they do not get a directly tabulated statistic. For example, a player who boxes out well might not get a rebound himself, but the probability of a rebound occurring on his watch may be higher.

It also may be possible that "weighting" multiple observations of the same 10 player combos may affect the +/- statistics.
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 541
Location: Greensboro, North Carolina

Posted: Thu Jul 28, 2005 10:35 am

Jon Cohodas wrote:
How amusing. I too was composing questions about Dan's regressions for email when I decided that it might be better to ask them here.

My question is the following:
I see your base equation has each observation defined as no substitutions and you estimate Margin based on average points per possession. Have you tried running the data by possession?

In other words, each possession ends with an event such as a turnover, defensive rebound, foul, score, etc.

It would seem to me that one could get at the propensities for certain events to occur on offense or defense even if they do not get a directly tabulated statistic. For example, a player who boxes out well might not get a rebound himself, but the probability of a rebound occurring on his watch may be higher.

It also may be possible that "weighting" multiple observations of the same 10 player combos may affect the +/- statistics.

Weighting by the number of possessions gives exactly the same results as creating separate observations for each possession. The latter just takes the computer a lot longer to run. In essence, the weight is just saying this observation should be counted 10 times because of its 10 possessions, this one 5 times because of its 5 possessions, etc.

Now it is a little more complicated than that because I simultaneously weight for clutch/garbage time play. In essence, I am saying that we should pretend that a possession in clutch time is like having 1.5 or 2 or 2.5 possessions and that one in garbage time is like having 0 or 0.5 possessions.
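
The equivalence described above can be checked numerically. The following is a minimal sketch with made-up toy data (the indicator column, margins, and possession counts are hypothetical): a least-squares fit weighted by possessions returns the same coefficients as a fit on a data set in which every shift is repeated once per possession.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i @ b)**2."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


# Toy data (made up): four shifts, an intercept plus one indicator column.
X = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [1.0, 0.0]])
y = np.array([12.0, -4.0, 3.0, 1.0])  # margin per 100 possessions
poss = np.array([10, 5, 2, 3])        # possessions in each shift

# One row per shift, weighted by its possession count.
weighted = wls(X, y, poss.astype(float))

# One row per possession, unweighted.
X_rep = np.repeat(X, poss, axis=0)
y_rep = np.repeat(y, poss)
replicated, *_ = np.linalg.lstsq(X_rep, y_rep, rcond=None)

print(np.allclose(weighted, replicated))
```

A clutch or garbage-time factor would simply multiply the weight (for example `poss * 2.0` for a clutch shift), which is why fractional "possessions" such as 0.5 or 1.5 pose no problem in the weighted form even though they cannot be expressed by repeating rows.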
Jon Cohodas



Joined: 08 Jul 2005
Posts: 31
Location: Richmond, VA

Posted: Fri Jul 29, 2005 4:12 pm

Quote:
Weighting by the number of possessions gives exactly the same results as creating separate observations for each possession. The latter just takes the computer a lot longer to run. In essence, the weight is just saying this observation should be counted 10 times because of its 10 possessions, this one 5 times because of its 5 possessions, etc.


Maybe I don't understand completely what you are doing then. When you say you use the average points per possession (APPP), is that APPP for that group of 10 players, the team, or for the entire dataset?

If it is for that cohort group, then I agree that pooling might not hurt. (The weighting of garbage, crunch, normal might be skewing things). If it is not, then could this explain why your estimates are so noisy?
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 541
Location: Greensboro, North Carolina

Posted: Fri Jul 29, 2005 5:22 pm

Jon, every observation is a shift of a game with no substitutions. The dependent variable is the points scored by the home team during that shift minus the points scored by the away team. This point differential is expressed in points per 100 possessions.

The explanatory variables include variables for every (non-replacement) player in the league indicating whether he is playing during that shift for the home team, the away team, or not at all. In my latest version I also include some variables that account for the ages and experience of the home and away teams.

I weight each shift by the number of possessions (and factors that account for garbage/clutch play).
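
As a rough illustration of the setup just described, here is a sketch on a hypothetical 2-on-2 league. The +1 / -1 / 0 coding for home / away / not playing is an assumption made for this example, as are the player names, possession counts, and ratings. "REF" stands for the pooled replacement player, who gets no column of his own, and the intercept plays the role of a home-court edge.

```python
import numpy as np

# Hypothetical players; column 0 of the design matrix is the intercept.
players = ["A", "B", "C", "D"]
col = {name: i + 1 for i, name in enumerate(players)}


def shift_row(home, away):
    """Intercept, +1 for rated home players, -1 for rated away players."""
    row = np.zeros(len(players) + 1)
    row[0] = 1.0
    for name in home:
        if name in col:      # "REF" has no column and is skipped
            row[col[name]] = 1.0
    for name in away:
        if name in col:
            row[col[name]] = -1.0
    return row


# (home lineup, away lineup, possessions) for each shift; made-up values.
shifts = [
    (("A", "B"), ("C", "D"), 10),
    (("A", "REF"), ("C", "D"), 5),
    (("A", "B"), ("C", "REF"), 8),
    (("C", "D"), ("A", "B"), 12),
    (("REF", "B"), ("D", "A"), 4),
    (("C", "REF"), ("B", "D"), 6),
]
X = np.array([shift_row(h, a) for h, a, _ in shifts])
weights = np.array([p for _, _, p in shifts], dtype=float)

# Made-up "true" values (home edge, then A..D) used to build noise-free margins
# in points per 100 possessions, so the fit can be checked against them.
true_coef = np.array([3.0, 5.0, -2.0, 1.0, 0.5])
margin = X @ true_coef

# Possession-weighted least squares via rescaling rows by sqrt(weight).
sw = np.sqrt(weights)
est, *_ = np.linalg.lstsq(X * sw[:, None], margin * sw, rcond=None)
print(np.round(est, 6))
```

Because the toy margins contain no noise and the lineups vary enough, the fit recovers the made-up ratings. Real shifts are noisy and far less varied, which is where the imprecision comes from.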

I don't know if this answers your question, because I am not quite sure what you are asking.

The results are so noisy because it is hard for the data to assign credit to all ten players during a given shift. Lots of players play together a lot and so it is difficult to statistically separate them. The regression does not give more credit to the player who gets more blocks, steals, points, assists, etc. during a shift like we do when we watch a game. So in that sense it is less efficient than a person watching a game.

In the extreme, if two players always played together, we would never be able to get separate adjusted plus/minus ratings for them. If players were assigned to teams randomly throughout the season (switching teams every game), we could get very precise estimates using adjusted plus/minus ratings. But they are not, so we have a lot of "noise."
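
The two extremes above can be seen in a toy design matrix. This sketch is an illustration only: it looks at just two hypothetical teammates, P and Q, ignores opponents, and uses made-up shift counts. When P and Q always share the floor their columns are identical and the matrix loses rank; the more often they appear apart, the smaller the variance factor (the diagonal of the inverse of X'X) on each rating.

```python
import numpy as np


def lineup_matrix(n_together, n_apart):
    """Indicator columns for teammates P and Q (toy setup, one team only)."""
    together = np.tile([1.0, 1.0], (n_together, 1))
    p_only = np.tile([1.0, 0.0], (n_apart, 1))
    q_only = np.tile([0.0, 1.0], (n_apart, 1))
    return np.vstack([together, p_only, q_only])


def variance_factor(X):
    """Diagonal of inv(X'X); a larger value means a noisier coefficient."""
    return np.diag(np.linalg.inv(X.T @ X))


always = lineup_matrix(20, 0)   # P and Q never separated
rarely = lineup_matrix(20, 1)   # each appears alone in one shift
often = lineup_matrix(10, 6)    # each appears alone in six shifts

rank_always = np.linalg.matrix_rank(always)  # below 2: ratings not separable
vf_rarely = variance_factor(rarely)[0]
vf_often = variance_factor(often)[0]

print(rank_always, vf_rarely > vf_often)
```

In the `always` case only the sum of the two ratings is identified, so `variance_factor` is deliberately not called on it (the inverse does not exist).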
Jon Cohodas



Joined: 08 Jul 2005
Posts: 31
Location: Richmond, VA

Posted: Mon Aug 01, 2005 7:50 am

Dan,
First of all, thanks for patiently answering my questions.

Dan Rosenbaum wrote:
Jon, every observation is a shift of a game with no substitutions. The dependent variable is the points scored by the home team during that shift minus the points scored by the away team. This point differential is expressed in points per 100 possessions.


I think I understand the "normal" cases. I was just curious about the one-possession observations where you use the "average". Now that I've contemplated this and have seen more of the regression results, I'm sure that these account for very few observations.

Dan Rosenbaum wrote:

I weight each shift by the number of possessions (and factors that account for garbage/clutch play).


Does this adjustment dramatically change the results? Can you share your insight into how you chose your working definition of garbage/clutch time? Is it based on feel, or observation, or from the data, or all of the above?

Dan Rosenbaum wrote:

The results are so noisy because it is hard for the data to assign credit to all ten players during a given shift. Lots of players play together a lot and so it is difficult to statistically separate them. The regression does not give more credit to the player who gets more blocks, steals, points, assists, etc. during a shift like we do when we watch a game. So in that sense it is less efficient than a person watching a game.

In the extreme, if two players always played together, we would never be able to get separate adjusted plus/minus ratings for them. If players were assigned to teams randomly throughout the season (switching teams every game), we could get very precise estimates using adjusted plus/minus ratings. But they are not, so we have a lot of "noise."


I understand why pooling the data does not matter. I just had not realized how homogeneous the lineups were until I looked at the data a little more closely.
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 541
Location: Greensboro, North Carolina

Posted: Tue Aug 02, 2005 4:12 am

I have not looked at the clutch/garbage time adjustment this year, but last season it really did not make that big of a difference. It is an ad hoc adjustment that could be improved upon.
back2newbelf



Joined: 21 Jun 2005
Posts: 260

Posted: Wed Aug 03, 2005 4:49 am

i have a question too...
dan, i still don't fully understand your concept of reference players. in your analysis, do you only compare a specific player with a reference player or do you also compare a non-reference player with a non-reference player? also, since you have different reference players for each team, do you assume that all reference players are the same?
the thing is, i'm gonna start my pure adjusted +/- analysis soon, what i do is:

(A, B etc stands for a specific player)
A B C D E vs F G H I J
and
K(changed) B C D E vs F G H I J

to compare A and K. (basically just pure adjusted +/- if i'm not mistaken)

this does not allow for any cross team comparison of 2 players... i can only compare players from the same team...

[in some way it's probably not a bad thing not to be able to give a rating for all nba players in one chart since every player's value changes for whichever team he plays (an offensive oriented player might be better in a team with good defenders and vice versa). nevertheless, sometimes it might be pretty useful]

thanks for any explanations...
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 541
Location: Greensboro, North Carolina

Posted: Wed Aug 03, 2005 8:02 am

In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player is on the floor for the same team or that sometimes they play against each other.
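
The pooling rule above could be sketched like this. The 250-minute cutoff over three seasons combined comes from the post; the function name, the "REFERENCE" label, and the players and minutes in the example are hypothetical.

```python
MINUTES_CUTOFF = 250  # from the post: under 250 minutes over the last three seasons


def pool_reference_players(minutes_by_season, cutoff=MINUTES_CUTOFF,
                           label="REFERENCE"):
    """Map each player to himself, or to one shared label if below the cutoff.

    minutes_by_season: dict of player name -> list of minutes, one per season.
    """
    return {
        player: (player if sum(minutes) >= cutoff else label)
        for player, minutes in minutes_by_season.items()
    }


# Made-up minutes for three seasons (oldest first).
minutes = {
    "Starter": [2800, 2900, 2700],
    "Injured Star": [3000, 2950, 150],   # hurt early in the latest season
    "Deep Bench": [40, 0, 120],
    "Ten-Day Signee": [0, 0, 35],
}

pooled = pool_reference_players(minutes)
print(pooled)
```

Because the cutoff applies to the three-season total, a star who misses most of one season still keeps his own variable, while the two fringe players share a single one.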
Ben



Joined: 13 Jan 2005
Posts: 264
Location: Iowa City

Posted: Wed Aug 03, 2005 9:44 am

Dan Rosenbaum wrote:
In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player is on the floor for the same team or that sometimes they play against each other.


Occasionally a superstar gets a season-ending injury after a few games. (I'm thinking of David Robinson the year before Tim Duncan was drafted.) Would that cause much of a problem, or are there enough minutes to swamp this effect?
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 541
Location: Greensboro, North Carolina

Posted: Wed Aug 03, 2005 11:08 am

Ben wrote:
Dan Rosenbaum wrote:
In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player is on the floor for the same team or that sometimes they play against each other.


Occasionally a superstar gets a season-ending injury after a few games. (I'm thinking of David Robinson the year before Tim Duncan was drafted.) Would that cause much of a problem, or are there enough minutes to swamp this effect?

It is less than 250 minutes in the last three seasons combined.
Ben



Joined: 13 Jan 2005
Posts: 264
Location: Iowa City

Posted: Wed Aug 03, 2005 11:17 am

Dan Rosenbaum wrote:
Ben wrote:
Dan Rosenbaum wrote:
In essence, I treat all players who played less than 250 minutes in the last three seasons as one player for estimating purposes. And it is no problem that occasionally more than one reference player is on the floor for the same team or that sometimes they play against each other.


Occasionally a superstar gets a season-ending injury after a few games. (I'm thinking of David Robinson the year before Tim Duncan was drafted.) Would that cause much of a problem, or are there enough minutes to swamp this effect?

It is less than 250 minutes in the last three seasons combined.


Oops, sorry I missed that.
back2newbelf



Joined: 21 Jun 2005
Posts: 260

Posted: Wed Aug 03, 2005 5:29 pm

- you treat all reference players as equally good?
- you compare 2 lineups only when they're all the same except for the reference player? or do you also use lineups that differ in a lot more than just the reference player, as long as they differ from each other in only one position?
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 541
Location: Greensboro, North Carolina

Posted: Wed Aug 03, 2005 7:39 pm

back2newbelf wrote:
- you treat all reference players as equally good?
- you compare 2 lineups only when they're all the same except for the reference player? or do you also use lineups that differ in a lot more than just the reference player, as long as they differ from each other in only one position?

Yes, I assume all replacement players have the same effectiveness. I use a regression framework that can simultaneously compare all line-ups.
back2newbelf



Joined: 21 Jun 2005
Posts: 260

Posted: Wed Aug 03, 2005 9:30 pm

don't you think that hurts the objectivity of this research?
All times are GMT - 5 Hours