This is Google's cache of viewtopic.php?p=30410&sid=111a6f20493cfc00bd896f1451145611. It is a snapshot of the page as it appeared on Mar 31, 2011 14:12:34 GMT. The current page could have changed in the meantime. Learn more

APBRmetrics :: View topic - Usage vs. efficiency in lineup data

Usage vs. efficiency in lineup data

 
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Thu Jan 28, 2010 11:31 pm    Post subject: Usage vs. efficiency in lineup data

I'm new to posting here, but I've been reading about APBRmetrics for a while. I know a lot of work has already been done on the usage versus efficiency debate, but I have not seen it studied on a lineup basis. I was not able to find any significant negative correlation between usage and efficiency, but maybe something useful can be found in all of this.

All data used for this can be found by downloading my spreadsheet from http://www.4shared.com/file/210055863/8c750ce/Lineup_Efficiency_Usage.html

I used the play by play data from http://www.basketballgeek.com/data/ for the 2008-09 season. While enormously useful, it was missing a few games for each team, so season totals won’t add up to the numbers seen elsewhere, but I don’t think this should cause too much error.

First, for every lineup that played at least one possession, I found how many points were scored and how many possessions were used by each player in each lineup. I counted a possession as being used by a player if he turned it over, made or missed a field goal, or made or missed the last of two or three free throws. This sheet is called TeamLinStats.
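
A minimal sketch of the counting rule above, in Python. The event dictionaries and their keys (`type`, `ft_num`, `ft_total`, `points`, `lineup`, `player`) are a hypothetical schema made up for illustration, not the actual basketballgeek.com column names.

```python
from collections import defaultdict

def uses_possession(event):
    """Return True if this event counts as a possession used by the player."""
    kind = event["type"]
    if kind in ("turnover", "fg_made", "fg_missed"):
        return True
    if kind in ("ft_made", "ft_missed"):
        # Only the last free throw of a two- or three-shot trip counts.
        return event["ft_total"] in (2, 3) and event["ft_num"] == event["ft_total"]
    return False

def tally(events):
    """Sum points and possessions used for each (lineup, player) pair."""
    stats = defaultdict(lambda: {"pts": 0, "poss": 0})
    for ev in events:
        key = (ev["lineup"], ev["player"])
        stats[key]["pts"] += ev.get("points", 0)
        stats[key]["poss"] += int(uses_possession(ev))
    return dict(stats)

# Made-up events: A scores an and-one; B splits two free throws, then turns it over.
events = [
    {"lineup": "L1", "player": "A", "type": "fg_made", "points": 2},
    {"lineup": "L1", "player": "A", "type": "ft_made", "ft_num": 1, "ft_total": 1, "points": 1},
    {"lineup": "L1", "player": "B", "type": "ft_missed", "ft_num": 1, "ft_total": 2},
    {"lineup": "L1", "player": "B", "type": "ft_made", "ft_num": 2, "ft_total": 2, "points": 1},
    {"lineup": "L1", "player": "B", "type": "turnover"},
]
totals = tally(events)
```

Under this reading of the rule, the and-one free throw adds a point but no extra possession, since the made field goal already used it.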

Next, I organized the data by player, keeping lineups separate. (By this, I mean that each player has points scored, player possessions used, and team possessions used for each lineup in which the player played.) I calculated the player’s “usage” as player possessions used divided by team possessions used, and “efficiency” as points scored divided by player possessions used. This “efficiency” is not Dean Oliver’s offensive efficiency, but rather something very similar to true shooting percentage. I think the only significant difference is that I include turnovers in the denominator. This is because I wanted player stats to sum up to team stats, and this allows player possessions to sum up to team possessions. Also, for the purposes of this study, a player who is relied on for more usage would likely see his turnover rate increase, and that should be reflected in the data. This sheet is called PlayerLinStats.
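
The two definitions above can be sketched in a few lines of Python. The function name and the example numbers are made up for illustration; returning NaN for an empty denominator is one possible choice, not something the spreadsheet is known to do.

```python
import math

def usage_and_efficiency(player_pts, player_poss, team_poss):
    """usage = player possessions / team possessions;
    efficiency = points / player possessions (turnovers are in the denominator)."""
    usage = player_poss / team_poss if team_poss else math.nan
    efficiency = player_pts / player_poss if player_poss else math.nan
    return usage, efficiency

# Hypothetical player-lineup: 12 points on 10 possessions used, out of 40 team possessions.
usage, eff = usage_and_efficiency(player_pts=12, player_poss=10, team_poss=40)
```

Here the player has a usage of .25 and an efficiency of 1.2 points per possession used.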

I then summed up all the stats for each player, creating season totals. This data is in PlayerTotalStats.

For each player-lineup in PlayerLinStats, I found a player’s “True” efficiency and usage by taking the season totals and subtracting out the data of the lineup in question. I did this to avoid having the same shots counted as both independent and dependent variables in the same regression.
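
As a sketch of this leave-one-lineup-out step, assuming each record is a plain dictionary with hypothetical keys `pts`, `poss`, and `team_poss` (the spreadsheet's real column names may differ):

```python
import math

def true_stats(season, lineup):
    """Season totals with one lineup's numbers subtracted out."""
    pts = season["pts"] - lineup["pts"]
    poss = season["poss"] - lineup["poss"]
    team_poss = season["team_poss"] - lineup["team_poss"]
    true_usage = poss / team_poss if team_poss else math.nan
    true_eff = pts / poss if poss else math.nan
    return true_usage, true_eff

# Made-up numbers for one player and one of his lineups.
season = {"pts": 1000, "poss": 900, "team_poss": 4000}
lineup = {"pts": 100, "poss": 100, "team_poss": 400}
true_usage, true_eff = true_stats(season, lineup)
```

With these numbers the player's "True" usage is 800/3600 and his "True" efficiency is 900/800 = 1.125, both computed without the lineup being predicted.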

I then did a multiple regression, using Player Lineup Efficiency (the player efficiency in a given lineup) as the dependent variable, and Lineup Usage, True Usage, and True Efficiency as the independent variables. The value I am most interested in is the coefficient for lineup usage. A negative coefficient for lineup usage would imply that, if a player is in a lineup where he is relied upon to shoot more than his usual amount (accounted for by True Usage), his efficiency goes down. A positive coefficient would be evidence that teams manage to get the ball to players in lineups where they have a better chance of scoring. The full results can be found in the “Multiple Regression” sheet, but the equation I got is:

Player Lineup Efficiency = 0.785 + 0.062*True Eff + 0.167*True Usage - 0.019*Lineup Usage
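
For readers who want to reproduce this kind of fit outside Excel, here is a minimal ordinary least squares sketch with NumPy. The data are synthetic, generated from the coefficients in the equation above purely so the example runs; they are not the spreadsheet data, and the noise level is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the three predictors (NOT the spreadsheet data).
true_eff = rng.uniform(0.8, 1.2, n)
true_usage = rng.uniform(0.1, 0.3, n)
lineup_usage = rng.uniform(0.05, 0.4, n)

# Coefficients borrowed from the posted equation, used only to generate fake data.
beta = np.array([0.785, 0.062, 0.167, -0.019])

# Design matrix: intercept column plus the three predictors.
X = np.column_stack([np.ones(n), true_eff, true_usage, lineup_usage])
lineup_eff = X @ beta + rng.normal(0.0, 0.01, n)

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, lineup_eff, rcond=None)
for name, value in zip(["intercept", "True Eff", "True Usage", "Lineup Usage"], coef):
    print(f"{name:>12}: {value: .3f}")
```

Swapping in the real PlayerLinStats columns for the synthetic arrays would give the unweighted fit; p values would need an extra step that is not shown here.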

This initially looks promising, but despite the 39,000 player-lineups, the p value of the coefficient for lineup usage is .35, which is not statistically significant. Considering that half of the data consisted of lineups with 11 total possessions or fewer, I don’t think this is a very reliable regression.

I next did a weighted least squares regression. I don’t know any automatic way to do this in Excel, so I used the Excel solver to minimize squared errors weighted by team possessions used. The equation I got for this is:

Player Lineup Efficiency = 0.819 + 0.071*True Eff + 0.047*True Usage + 0.087*Lineup Usage
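
The Solver approach minimizes squared errors weighted by team possessions. The same objective has a closed-form solution, sketched below with NumPy as an alternative (this is not the method used for the numbers above). The demo data are random placeholders, and the function name is made up for illustration.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimize sum_i w[i] * (y[i] - X[i] @ b)**2 and return b."""
    sw = np.sqrt(w)
    # Scaling each row by sqrt(weight) turns the weighted problem into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Placeholder data: 200 player-lineups, weights play the role of team possessions.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 3))])
y = rng.normal(1.0, 0.3, n)
w = rng.integers(1, 200, n).astype(float)

coef_wls = weighted_least_squares(X, y, w)

# Cross-check against the weighted normal equations (X' W X) b = X' W y.
coef_check = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Both routes should agree to numerical precision, and either should match what Solver converges to for the same weighted objective.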

It appears this data does not provide any evidence that the number of possessions a player uses in a lineup, relative to his normal usage rate, significantly affects his efficiency in that lineup.
_________________
Jimmy Ready
schtevie



Joined: 18 Apr 2005
Posts: 411

Posted: Fri Jan 29, 2010 10:02 am

jready, am I interpreting your results correctly? Looking at the second regression, if a given player, who in a given line-up uses one fifth of the "possessions" (0.2), doubles his usage in that line-up (to 0.4), the effect on his "efficiency" in that line-up is to increase it by 0.0174, or 1.74 points per 100 "possessions" (turnovers, shot attempts, and free throws).

But continuing the example, what about the other player(s) whose cumulative usage decreased by 0.2? According to your specification, does their resulting decrease in efficiency perfectly offset the increase?

These points of clarification aside, how does your approach address the fundamental problem in identifying the effect of usage on efficiency: the mismatch issue? Basically, we expect players who have advantageous mismatches relative to their defenders to take and make more shots. This effect needs to be accounted for to address the underlying issue.
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Fri Jan 29, 2010 4:40 pm

schtevie wrote:
jready, am I interpreting your results correctly? Looking at the second regression, if a given player, who in a given line-up uses one fifth of the "possessions" (0.2), doubles his usage in that line-up (to 0.4), the effect on his "efficiency" in that line-up is to increase it by 0.0174, or 1.74 points per 100 "possessions" (turnovers, shot attempts, and free throws).


You are interpreting that correctly. I know my efficiency measure isn't something standard, but am I doing something wrong to calculate possessions used? (I ask because of how you put it in quotes and list what I include). I think the only thing I am leaving out of traditional calculations of possessions is offensive rebounds, and I don't think a player can really control whether his shot is rebounded or not. Would it be better to use only shot attempts (FG and FT), or weight shot attempts less than turnovers, because of the chance of an offensive rebound?

Quote:
But continuing the example, what about the other player(s) whose cumulative usage decreased by 0.2? According to your specification, does their resulting decrease in efficiency perfectly offset the increase?


If player 1 increases his usage from .2 to .4, and all 4 other players adjust their usage from .2 to .15, the individual efficiencies of players 2-5 would decrease by a combined total of .0174 (4 × .087 × .05), but the team efficiency would not totally offset, because the points scored per possession is based on player efficiencies weighted by usage. This means the team's best possession distribution is to always have one player take all the shots, which is of course nonsense. I don't think this value should be considered significant. The math below illustrates this:

pts = (.2)(e1) + (.2)(e2) + (.2)(e3) + (.2)(e4) + (.2)(e5)
pts' = (.4)(e1 + .087*.2) + (.15)(e2 - .087*.05) + (.15)(e3 - .087*.05) + (.15)(e4 - .087*.05) + (.15)(e5 - .087*.05)
= (.4)(e1) + (.15)(e2) + (.15)(e3) + (.15)(e4) + (.15)(e5) + .00435

The difference in this case is an increase of about .4 points per 100 possessions, if all players' baseline efficiencies are the same. This difference is small, but it exists, if you trust the .087 number (which I don't think is significantly greater than 0).
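
The arithmetic can be checked with a short Python sketch. It assumes player 1 goes from .2 to .4 usage, the other four drop from .2 to .15, and every player starts at the same baseline efficiency; the baseline of 1.0 point per possession is a made-up value, and the result does not depend on it.

```python
COEF = 0.087  # lineup usage coefficient from the weighted regression

def team_points(usages, efficiencies):
    """Team points per possession: player efficiencies weighted by usage."""
    return sum(u * e for u, e in zip(usages, efficiencies))

base_eff = [1.0] * 5                    # hypothetical equal baselines
before = [0.2] * 5                      # even split of possessions
after = [0.4, 0.15, 0.15, 0.15, 0.15]   # player 1 doubles his usage

# Each player's efficiency shifts by COEF times his change in lineup usage.
adj_eff = [e + COEF * (ua - ub) for e, ub, ua in zip(base_eff, before, after)]

gain = team_points(after, adj_eff) - team_points(before, base_eff)
print(f"gain per possession: {gain:.5f}")
```

The gain works out to .00435 points per possession, i.e. roughly .4 points per 100 possessions.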

Quote:
These points of clarification aside, how does your approach address the fundamental problem in identifying the effect of usage on efficiency: the mismatch issue? Basically, we expect players who have advantageous mismatches relative to their defenders to take and make more shots. This effect needs to be accounted for to address the underlying issue.


Because of the positive coefficient of lineup usage, it seems like this effect was stronger than I anticipated. I do not do anything specific to control for this, but I thought (probably falsely, in retrospect) that the method used should make this effect small. Consider a player like Pau Gasol. In some lineups (generally with Kobe on the bench), he is a high usage player (with usages above .25); in other lineups (when Kobe is in the game) he is a relatively low usage player (with usages around .20 and below). I thought that the lineups where he had mismatches would be somewhat independent of the lineups where he had high usage. I might go back and try to control for the season usages of the other four players on the floor, and see if that affects things, if there is interest in that.
_________________
Jimmy Ready
schtevie



Joined: 18 Apr 2005
Posts: 411

Posted: Fri Jan 29, 2010 5:26 pm

No, no problem with the terminology, the "quotes" were merely included to emphasize that the terminology was yours.

Regarding the positive coefficient in the preferred specification, I guess it should also be mentioned that it is not just unidentified (but anticipated) mismatches that contribute to it, but randomness as well. Players take the easy buckets when defensive errors allow them. (On the other hand, bad players will take hard shots into the teeth of the defense.)

Finally, regarding your Pau Gasol explanation, I am not sure that I get your point. I think that my prior is that mismatches are independent of base usage, e.g. Pau Gasol is apt to be better than his primary defender, whether or not KB is on the floor. Then what determines the division of to-be-dispensed possessions between these two (or other) players has to do primarily with choices on the part of the defense. But whatever the case, the positive correlation between "line-up usage" and "line-up efficiency" should be expected to remain, no?
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Fri Jan 29, 2010 6:46 pm

schtevie wrote:

Finally, regarding your Pau Gasol explanation, I am not sure that I get your point. I think that my prior is that mismatches are independent of base usage, e.g. Pau Gasol is apt to be better than his primary defender, whether or not KB is on the floor. Then what determines the division of to-be-dispensed possessions between these two (or other) players has to do primarily with choices on the part of the defense. But whatever the case, the positive correlation between "line-up usage" and "line-up efficiency" should be expected to remain, no?


I thought the general idea in the usage vs. efficiency issue, as shown by skill curves and other studies, is that a player forced to increase his usage will have a decreased efficiency. There are other factors (such as mismatches) that will cause a player to have an increase in both efficiency and usage simultaneously, but the goal is to control for those, and find a negative correlation between lineup usage and lineup efficiency. It looks like you are assuming that the division of shots is entirely dependent on the defense, but I don't think this is true; certainly the coach and offensive players have some control over how they divide the shots. Certain players in each lineup will be stuck taking the bad shots, at the end of the shot clock and in other difficult situations, because they don't have teammates who can do so. This is the effect I am looking for. If Pau Gasol has extra usage in a specific lineup, that means the difficult shots that someone has to take, instead of hurting KB's efficiency, will hurt Pau's.
_________________
Jimmy Ready
IrishHand



Joined: 15 Jul 2009
Posts: 115

Posted: Sun Jan 31, 2010 8:55 am

jready wrote:
I think the only thing I am leaving out of traditional calculations of possessions is offensive rebounds, and I don't think a player can really control whether his shot is rebounded or not.


Players also can't really control a myriad of things that are included in the numbers you're using - not including the full possession (meaning any ORs plus, I'm assuming, any subsequent points arising post-OR) seems like it would adversely impact the accuracy and value of any conclusions you draw.

Shots taken from different locations and in different situations have relatively consistent and predictable OR rates. Why wouldn't you want that incorporated into the efficiency part of the analysis?

For instance, PlayerA regularly takes a certain kind of shot and hits 50% of those shots while his team recovers his misses 25% of the time. PlayerB regularly takes a different selection of shots, hitting at 45% while his team recovers his misses at a 40% rate. Do you believe that PlayerA is the more efficient offensive player?
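
One simple way to put numbers on this example, sketched in Python: count the share of shots that either go in or are rebounded by the offense. The function name is made up, and this ignores shot value and what happens after the rebound, so it is only a rough illustration.

```python
def kept_alive_rate(fg_pct, oreb_rate_on_miss):
    """Share of shots that either go in or are recovered by the offense."""
    return fg_pct + (1.0 - fg_pct) * oreb_rate_on_miss

player_a = kept_alive_rate(0.50, 0.25)  # 50% makes, 25% of misses recovered
player_b = kept_alive_rate(0.45, 0.40)  # 45% makes, 40% of misses recovered
print(f"PlayerA: {player_a:.3f}  PlayerB: {player_b:.3f}")
```

By this measure PlayerA's shots end well for the offense 62.5% of the time and PlayerB's 67% of the time, even though PlayerA has the higher shooting percentage.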
DLew



Joined: 13 Nov 2006
Posts: 224

Posted: Sun Jan 31, 2010 12:16 pm

How is this study different than the ones done by Ryan Parker and Eli Witus on this topic? Why are you getting different results?
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Sun Jan 31, 2010 3:28 pm

DLew wrote:
How is this study different than the ones done by Ryan Parker and Eli Witus on this topic? Why are you getting different results?


What Eli Witus did is different because he looked at all of the players together in each lineup and did his regression based on a value that predicted lineup ORtg for the team given the whole lineup on the floor. I did this more on an individual level. Rather than look at how high usage players affect the team's efficiency, I was trying to find how high usage players affect their own efficiency. I was hoping this would give me a relatively simple way of eventually finding individual skill curves, or even some generic skill curve that could be applied to all low usage players, or something like that, but because I could not find a negative coefficient, it did not. I got different results because of our different methods.

I'm not sure which Ryan Parker study you are referring to, but probably either this one or this one ?
If I am understanding his method correctly, this is almost identical to what Eli did, except applied to 3PTM in the first case, and I just covered how that's different.
_________________
Jimmy Ready
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Sun Jan 31, 2010 3:37 pm

IrishHand wrote:
jready wrote:
I think the only thing I am leaving out of traditional calculations of possessions is offensive rebounds, and I don't think a player can really control whether his shot is rebounded or not.


Players also can't really control a myriad of things that are included in the numbers you're using - not including the full possession (meaning any ORs plus, I'm assuming, any subsequent points arising post-OR) seems like it would adversely impact the accuracy and value of any conclusions you draw.

Shots taken from different locations and in different situations have relatively consistent and predictable OR rates. Why wouldn't you want that incorporated into the efficiency part of the analysis?

For instance, PlayerA regularly takes a certain kind of shot and hits 50% of those shots while his team recovers his misses 25% of the time. PlayerB regularly takes a different selection of shots, hitting at 45% while his team recovers his misses at a 40% rate. Do you believe that PlayerA is the more efficient offensive player?


I know all of this is true, but that would be very difficult to control for. No study I've seen of efficiency vs. usage has controlled for all those different factors (location of shot, time left on shot clock, frequency of a player's shot getting rebounded, etc.), and while those are all good ideas of things to incorporate into a future study by me or someone else, they are beyond the scope of what I am doing here.
_________________
Jimmy Ready
Crow



Joined: 20 Jan 2009
Posts: 816

Posted: Sun Jan 31, 2010 9:59 pm

If you split the data, are there any subgroups (by usage or efficiency level, age, shot distribution mix or some other player "type" or combinations of these) that behave differently, in a more statistically significant way?

When you can't find a pattern that fits everybody consistently, rather than end there (not saying you have or would, just trying to kick the ball forward) I'd think a next step would be to look for subgroups who have a more patterned behavior.
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Mon Feb 01, 2010 6:02 pm

Thanks for the input. That's a good idea. Because I already have usage rates in my spreadsheet, I started by separating players by their level of season usage, and did the weighted regressions again. The results this time were much more interesting.

For players who have a true usage of greater than .2 (which is, of course, more than 1/5 of the team's possessions):

Player Lineup Efficiency = 0.661 + 0.231*True Eff + 0.046*True Usage + 0.138*Lineup Usage

For players with true usage of less than .2:

Player Lineup Efficiency = 0.836 + 0.042*True Eff + 0.201*True Usage - 0.0128*Lineup Usage

Next, I tested the more extreme cases. First, the approximately 10% of players with usage over .260:

Player Lineup Efficiency = 0.857 + 0.012*True Eff + 0.026*True Usage + 0.227*Lineup Usage

Then, the approximately 10% of players with usage less than .13:

Player Lineup Efficiency = 0.595 + 0.153*True Eff + 1.54*True Usage - 0.224*Lineup Usage



These results are more interesting than my original results, and intuitively make sense to me. For players who are used to being high usage players (usage > .26), being in a lineup in which they have to be even higher usage won't affect them negatively at all. These players are generally the first option in any lineup, so they already are taking the tough shots that someone has to take. Increasing their usage further is thus generally a result of mismatches or other factors that would similarly increase efficiency.
For players who are low usage players (usage < .13), the lineup usage value has a large negative coefficient. These players tend to be the fourth or fifth option on the court, normally only taking open shots or layups. When they are in a lineup where they have an increased usage, this either means they are in a lineup where they are getting more open shots and layups, or taking shots that are above the difficulty level of the shots they normally take. This is the effect that I have been trying to isolate all along, and I think here, I finally found some evidence of it, in the -.224 lineup usage coefficient.
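
The subgroup procedure described in this post (split player-lineups by true usage, then rerun the weighted regression in each group) can be sketched with NumPy as follows. The data are synthetic and noise-free, built so the lineup usage slope is +0.2 for the high usage group and -0.2 for the low usage group; none of these numbers come from the spreadsheet, and the function names are made up for illustration.

```python
import numpy as np

def wls(X, y, w):
    """Least squares with each row weighted by w (e.g. team possessions)."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

def fit_by_usage_group(true_eff, true_usage, lineup_usage, lineup_eff, team_poss, cutoff=0.2):
    """Fit the same weighted regression separately above and at-or-below the cutoff."""
    X = np.column_stack([np.ones_like(true_eff), true_eff, true_usage, lineup_usage])
    high = true_usage > cutoff
    return {
        "high": wls(X[high], lineup_eff[high], team_poss[high]),
        "low": wls(X[~high], lineup_eff[~high], team_poss[~high]),
    }

# Synthetic player-lineups (NOT the spreadsheet data).
rng = np.random.default_rng(2)
n = 400
true_eff = rng.uniform(0.8, 1.2, n)
true_usage = rng.uniform(0.1, 0.3, n)
lineup_usage = rng.uniform(0.05, 0.4, n)
team_poss = rng.integers(1, 200, n).astype(float)

# Built-in effect: slope on lineup usage flips sign between the two groups.
slope = np.where(true_usage > 0.2, 0.2, -0.2)
lineup_eff = 0.8 + 0.1 * true_eff + 0.1 * true_usage + slope * lineup_usage

fits = fit_by_usage_group(true_eff, true_usage, lineup_usage, lineup_eff, team_poss)
print("high usage group, Lineup Usage coef:", round(fits["high"][3], 3))
print("low usage group,  Lineup Usage coef:", round(fits["low"][3], 3))
```

The last entry of each coefficient vector is the lineup usage coefficient. Changing `cutoff` to .26 or .13 and filtering to one side would mirror the more extreme splits.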
_________________
Jimmy Ready
schtevie



Joined: 18 Apr 2005
Posts: 411

Posted: Tue Feb 02, 2010 1:50 pm

I just want to go way back here and revisit the premise of usage vs. efficiency, and perhaps someone can remind me if I get the story incorrect. The belief that efficiency declines with increased usage is predicated on a big ALL ELSE EQUAL (apologies for the gratuitous use of caps). And the theory, solid theory at that, is that this result occurs because of the increasing predictability of the offense. If it is known that Player X is going to increase his offensive responsibility, then the defense can accordingly shift their attention and increase help defense, decreasing the resulting efficiency. I think that that is it.

One can then begin relaxing the ALL ELSE EQUAL condition a bit and say that some players might need some shots before they warm up, thereby supposing a positive relationship on that account. Equivalently, one can argue the flip side and say that some players begin to get fatigued past a certain usage rate, resulting in another negative relationship.

Other than this tweaking, one really needs to control for ALL ELSE to have faith in the resulting empirical estimates. In particular, there are a few important factors that I think have never been taken into account in any of the estimates I have seen produced (and again, someone please correct me if I am wrong).

(1) Defense. Players/Line-ups face opposing line-ups of dramatically dissimilar strength. And it cannot be assumed that they even out over a season.

(2) Possession/Scoring Opportunity Origin. For example, a player can increase his usage by increasing the fraction of fast breaks vs. increasing the number of shots in the half court; this is comparing apples and oranges.

(3) Shot Clock. A player may increase his usage by taking more shots at the end of the shot clock. These are expected to be of lower "efficiency" than those taken earlier on. But it is completely incorrect to say, in the context of this question, that these are more inefficient.
Crow



Joined: 20 Jan 2009
Posts: 816

Posted: Thu Feb 04, 2010 8:27 pm

Jready, your equations are digestible and show easy-to-see differences. I wonder how they would look for other subgroups or combos of subgroups as I mentioned before. Efficiency / usage subgroups, usage / age, usage for perimeter vs inside by position or style, etc. Up to you if you want to check more but I think there is more to understand about usage, efficiency, experience and style.


I also wonder if there could be things worth seeing and thinking about resulting from a comparison of what you found by this approach for various subgroups to 1) hoopnumbers.com's estimated Adjusted team eFG% impact for the sum of the same subsets of players, time-weighted, and 2) just the shooting / scoring components of Statistical +/- for the same group. How well do all 3 align in general, and when they differ, what does that say about players, lineups, play calls / shot distributions or about the metrics?
jready



Joined: 28 Jan 2010
Posts: 7
Location: University of Chicago

Posted: Fri Feb 05, 2010 1:24 am

I'm very busy this week, but I do plan to go further with this idea, and look for trends in the data in many of the ideas you have suggested, as soon as I can.

Could you explain what you mean in your second paragraph? The way I read it is that you are suggesting comparing the results if, instead of my current measure of efficiency, which is essentially points per possession, I try using different measures of efficiency instead. I'm not sure what this would accomplish, beyond what a basic comparison of different measures of efficiency would.
_________________
Jimmy Ready
Crow



Joined: 20 Jan 2009
Posts: 816

Posted: Fri Feb 05, 2010 1:47 am

I am sometimes not that successful at explaining or re-explaining but,

Ultimately the efficiency of a player, using your definition, over however many shots they take while on the court in all lineups is worth plus x or minus y points for time played per game or per 48 minutes compared to league average efficiency.

At least the Adjusted eFG% factor could be converted to a points for time played per game or per 48 minutes basis for player and probably FT/FGA could be roughly too.

The shooting / scoring components of Statistical +/- could be too.

That would make them more easily comparable, for all lineups or potentially you could do this for the major "Player Lineups".

It would allow an estimate of how much of a player's Adjusted eFG% factor was direct- because of their individual shooting / scoring, identified by your method or the Statistical +/- approach - while presumably the residual would be their indirect "shot creation impact", or at least one way to estimate it.

Your findings and the findings of the shooting / scoring components of Statistical +/- should have the same distribution but the values will be different if the "value" of better or worse shooting /scoring, based on the regression using Adjusted +/- data, is worth more than the surface raw value.

If that isn't helpful, don't worry about it. That's about as much as I can think to say about the idea right now.

If there are these three different ways to look at shooting / scoring impact I'd think it might be helpful to compare them.
All times are GMT - 5 Hours