APBRmetrics

Dan Rosenbaum

I moved this from the conference aftermath thread. We would love to hear any comments that folks have on this paper. We have tried to take many of the ideas developed in our community and bring them to the wider research community in this paper.

A Starting Point for Analyzing Basketball Statistics
(under review at the Journal of Quantitative Analysis in Sports)
http://www.uncg.edu/eco/rosenbaum/jqas1.doc

Abstract
The quantitative analysis of sports is a new branch of science and, in many ways, one that has grown through non-academic and non-traditionally peer-reviewed work. The aim of this paper is to bring to a peer-reviewed journal the generally accepted basics of the analysis of basketball, thereby providing a common starting point for future research in basketball. The possession concept, in particular the concept of equal possessions for opponents in a game, is central to basketball analysis. Estimates of possessions have existed for approximately two decades, but the various formulas have sometimes created confusion. We hope that by showing how most previous formulas are special cases of our more general formulation, we shed light on the relationship between possessions and various statistics. Also, we hope that our new estimates can provide a common basis for future possession estimation. In addition to listing data sources for statistical research on basketball, we also discuss other concepts and methods, including offensive and defensive ratings, plays, per-minute statistics, pace adjustments, true shooting percentage, effective field goal percentage, rebound rates, Four Factors, plus/minus statistics, counterpart statistics, linear weights metrics, individual possession usage, individual efficiency, Pythagorean method, and Bell Curve method. This list is not an exhaustive list of methodologies used in the field, but we believe that they provide a set of tools that fit within the possession framework and form the basis of common conversations on statistical research in basketball.

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

Man, I've always wanted to do one of these. I'm glad not everyone is as lazy as me.

I just started reading. Is this in press already, or should I be watching for typos?
_________________
ed

Dan Rosenbaum · Posted: Tue Feb 13, 2007 11:40 am

Dan Rosenbaum · Posted: Tue Feb 13, 2007 11:44 am

HeatherA · Joined: 03 Aug 2006 Posts: 55

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

HeatherA: Keep in mind that there are two types of player metric, production (e.g. Pts/G) and efficiency (e.g. FG%). Attempts to combine these are problematic. Because (as noted in the paper under discussion) the precise relationship between production and efficiency is unknown -- and largely unexplored -- I always look at both. Always.

NBA EFF, from what I recall, combines both. Yuck. For production, you'd do better, as a start, to regress team points on the various component stats (on a season-by-season basis) and apply that to the players. These so-called linear weights have a bad reputation among us statheads, but they should be adequate for your needs. Better still is to go with Dean Oliver's individual points per possession, as outlined in his book. He writes about a Points Produced concept that would be useful.
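To make the "regress team points on the component stats, then apply that to the players" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function names are hypothetical, the three predictors (made twos, made threes, made free throws) were picked only because their true weights are known, and the team rows are toy numbers, not real NBA data. A real linear-weights fit would add rebounds, turnovers, and so on, and the weights would no longer be obvious in advance.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_weights(rows, points):
    """Ordinary least squares of points on the stat columns plus an intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, points)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # [intercept, w1, w2, ...]


def apply_weights(weights, player_stats):
    """Weighted sum of a player's stats, ignoring the team-level intercept."""
    return sum(w * s for w, s in zip(weights[1:], player_stats))


# Toy per-game team rows: (made twos, made threes, made free throws).
teams = [(30, 6, 18), (32, 5, 20), (28, 8, 17),
         (31, 7, 22), (29, 9, 16), (33, 4, 19)]
# Points are built as an exact linear function so the fit can be checked.
team_points = [2 * fg2 + 3 * fg3 + ft for fg2, fg3, ft in teams]

weights = fit_linear_weights(teams, team_points)
player_value = apply_weights(weights, (6, 2, 5))  # hypothetical player line
print([round(w, 6) for w in weights], round(player_value, 6))
```

Because the toy points are an exact function of the predictors, the fit should recover weights very close to 2, 3, and 1 with an intercept near zero. With real season data the fit would be noisy, and the season-by-season refitting mentioned above would mean calling `fit_linear_weights` once per season.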

As far as efficiency, you can also use that individual points per possession. For most players, EFG% and TS% pretty much capture most of the variance in ORTG, so if you want to go simple, you can use those. Add turnovers to the denominator if you want.
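For anyone who wants the simple route, here is a small sketch of the usual EFG% and TS% definitions, plus one possible reading of "add turnovers to the denominator". The 0.44 free-throw weight is a common convention rather than something stated in this thread, the turnover variant and all function names are hypothetical, and the season line is made up for illustration.

```python
def efg_pct(fgm, fg3m, fga):
    """Effective field goal percentage: a made three counts as 1.5 made shots."""
    return (fgm + 0.5 * fg3m) / fga


def ts_pct(pts, fga, fta, ft_weight=0.44):
    """True shooting percentage; ft_weight=0.44 is an assumed convention."""
    return pts / (2 * (fga + ft_weight * fta))


def ts_pct_with_turnovers(pts, fga, fta, tov, ft_weight=0.44):
    """Hypothetical variant: turnovers count as used scoring chances too."""
    return pts / (2 * (fga + ft_weight * fta + tov))


# Made-up season totals for one player.
efg = efg_pct(fgm=600, fg3m=100, fga=1300)
ts = ts_pct(pts=1600, fga=1300, fta=500)
ts_tov = ts_pct_with_turnovers(pts=1600, fga=1300, fta=500, tov=200)
print(round(efg, 3), round(ts, 3), round(ts_tov, 3))
```

The turnover variant will always come out lower than plain TS% for a player with any turnovers, which is the point: it penalizes possessions that ended without a shot.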

Another approach is to ignore composite metrics altogether and look solely at the component stats: REB%, EFG%, etc. The advantage of this is that we are much more confident that these metrics actually measure what we think they do. Also, these measure things that map closely to player skills we find interesting. The downside is that analysis is more complicated, since you have increased the number of response variables.

I prefer the last approach. I'd rather know that a player is a good rebounder, poor shooter, great foul-drawer, etc, than know that he contributes an above average number of points per possession. That's just me, though.
_________________
ed

HeatherA · Joined: 03 Aug 2006 Posts: 55

The NBA Efficiency stat is definitely a kitchen-sink kind of stat:

compute efficiency = ((Pts + Reb + Asts + Stl + Blk) - ((fga - fgm) + (fta - ftm) + turnover))

And I would never consider it if I were trying to do any serious analysis of individual players. However, my purpose is to get a more general sense of "player performance" for a really stat-lite general article.
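For readers who want to reproduce the number outside a stats package, the formula quoted above translates directly into a short function. This is only a sketch: the function name and the season totals in the example are made up for illustration.

```python
def nba_efficiency(pts, reb, ast, stl, blk, fga, fgm, fta, ftm, tov):
    """Raw NBA Efficiency total, following the formula quoted above."""
    positives = pts + reb + ast + stl + blk
    # Missed field goals, missed free throws, and turnovers are subtracted.
    negatives = (fga - fgm) + (fta - ftm) + tov
    return positives - negatives


# Hypothetical season totals, for illustration only.
example = nba_efficiency(pts=1600, reb=700, ast=300, stl=100, blk=80,
                         fga=1300, fgm=600, fta=500, ftm=380, tov=200)
print(example)
```

Dividing the result by games played would turn the season total into a per-game figure.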

When I calculate NBA Efficiency for players' rookie years 1980-2005, the top 20 players come out to be:

David Robinson
Kevin Garnett
LeBron James
Michael Jordan
Shaquille O'Neal
Shawn Marion
Elton Brand
Kobe Bryant
Dirk Nowitzki
Dwyane Wade
Tim Duncan
Allen Iverson
Terry Cummings
Hakeem Olajuwon
Yao Ming
Larry Johnson
Chris Bosh
Paul Pierce
Alonzo Mourning
Gilbert Arenas

Eyeballing the list as a whole seems to make some sense as well. So I have some comfort that, while not by any means perfect, this stat is telling me something about players' relative performance in the NBA.

I like your thought about breaking down the individual pieces and looking at them separately, as well. I'll have to think about whether there is a way to include that without making the article completely unwieldy.

Thanks,
Heather

tpryan · Joined: 11 Feb 2005 Posts: 100

Dan,

First of all, congratulations on putting the paper together.

One comment I have is that the first sentence of your abstract creates the impression that quantitative analysis of sports is new, or at least a new branch of science. Certainly it is now being taken to a new level in professional basketball, but Bud Goode was modeling NFL games almost 50 years ago. In the peer-reviewed literature, David Harville had a paper on using linear model methodology for ranking college football teams in the Journal of the American Statistical Association in 1977 and many other papers have appeared in leading statistics journals.

And sophisticated statistical analyses have been applied to various other sports for many years.

I assume there are some moderately high correlations among the 7 variables in Table 1, so why not do the following.

There is now a routine in R (the freeware version of S-Plus, for anyone not familiar with it) that uses 6 different methods for assessing the relative importance of each variable in a regression model when the variables are correlated (see http://www.jstatsoft.org/v17/i01/v17i01.pdf ). Why not use this and compare the results with your Table 1?

Chronz1 · Joined: 22 May 2006 Posts: 201

Just a question: why would anyone use NBA EFF when PER is like an upgraded form that adjusts for pace and minutes played?

HoopStudies · Posted: Tue Feb 13, 2007 7:13 pm

Dan Rosenbaum · Posted: Tue Feb 13, 2007 8:26 pm

HeatherA · Joined: 03 Aug 2006 Posts: 55

Neil Paine · Joined: 13 Oct 2005 Posts: 774 Location: Atlanta, GA

mtamada · Joined: 28 Jan 2005 Posts: 377

tpryan · Joined: 11 Feb 2005 Posts: 100