Having just emailed the NESSIS organizers with a couple of questions, I got a response from Mark Glickman which included the following comment:
> It was great to see that we've piqued the interest
> of APBRmetricians. Basketball statistics is taking
> front and center stage at this conference. Who would
> have predicted ahead of time that baseball statistics
> (the conventional favorite of statisticians) would take
> back seat to basketball!
The abstract for a poster presentation by Kenny Shirley "A Markov model for basketball" caught my eye. What it finds about productivity after offensive and defensive rebounds and independence of events might add to the discussion. Anybody with thoughts about the approach? A report back on this topic after the presentation would be of interest.
It's similar to something I've been grappling with for a while. Markov chains for something like basketball are tough, because how specific do you want to be? Too specific and the model is too difficult to track and probably not instructive. Too vague and it doesn't tell you much.
An approximate model isn't too hard. This is something that I worked out almost 20 years ago, but I didn't take it any further because at the time the detailed data needed for the next step weren't available.
The simplest semi-reasonable (but probably not realistic enough to be useful) model would have 4 states. Suppose the Sonics are playing the Lakers. Then the 4 states are:
Sonics have the ball, after having given up a score.
Sonics have the ball, after having prevented a score (TO or DR).
Lakers have the ball, after having given up a score.
Lakers have the ball, after having prevented a score (TO or DR).
You can pretty much fill in the blanks from there. What are the transition probabilities from state to state? They can be estimated from standard box scores.
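To make "fill in the blanks" concrete, here is a minimal Python sketch of the 4-state model. The four scoring probabilities are made up for illustration (in practice they would come from box scores), and the function names are hypothetical. A possession that ends in a score hands the ball to the other team "after giving up a score"; anything else hands it over "after a stop". Offensive rebounds are ignored here, exactly as in the simple model above.

```python
# Minimal sketch of the 4-state possession model (Sonics = SEA, Lakers = LAL).
# The scoring probabilities used below are made up for illustration.

STATES = [
    "SEA ball after giving up a score",
    "SEA ball after a stop (TO or DR)",
    "LAL ball after giving up a score",
    "LAL ball after a stop (TO or DR)",
]


def transition_matrix(p_sea_after_score, p_sea_after_stop,
                      p_lal_after_score, p_lal_after_stop):
    """Row i gives the probabilities of moving from state i to each state.

    A score sends the ball to the other team "after giving up a score";
    anything else (TO or a defensive rebound) sends it to the other team
    "after a stop". Offensive rebounds are ignored in this simplest model.
    """
    return [
        [0.0, 0.0, p_sea_after_score, 1 - p_sea_after_score],
        [0.0, 0.0, p_sea_after_stop, 1 - p_sea_after_stop],
        [p_lal_after_score, 1 - p_lal_after_score, 0.0, 0.0],
        [p_lal_after_stop, 1 - p_lal_after_stop, 0.0, 0.0],
    ]


def stationary(P, iters=2000):
    """Long-run share of possessions spent in each state."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Averaging with the previous vector (a "lazy" chain) removes the
        # period-2 oscillation caused by possessions alternating teams,
        # without changing the stationary distribution.
        pi = [(a + b) / 2 for a, b in zip(pi, step)]
    return pi


P = transition_matrix(0.48, 0.52, 0.50, 0.55)
pi = stationary(P)
# Share of Sonics possessions that end in a score.
sea_rate = (pi[0] * P[0][2] + pi[1] * P[1][2]) / (pi[0] + pi[1])
for name, share in zip(STATES, pi):
    print(f"{name:35s} {share:.3f}")
print(f"SEA scoring rate per possession: {sea_rate:.3f}")
```

Because every possession flips to the other team, half the long-run mass sits in each team's pair of states; the interesting output is how the scoring rate shifts with the "after a stop" vs "after a score" probabilities.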
The trouble with the model above is that it lacks detail. There are, after all, at least two different ways of successfully preventing a score: via a TO, or by forcing a missed shot and getting the defensive rebound. Again, those probabilities can be estimated from standard box score data. What's harder to estimate is the next transition: what is the probability that the Sonics score after having forced a TO? What is the probability that they score after having grabbed a DR? In the past, those numbers were not available -- but now they are!
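Once possessions are tagged by how they started, the "next transition" is just a conditional tally. A minimal sketch, assuming a hypothetical list of already-labeled (start_type, scored) pairs; turning a real play-by-play feed into those labels is the hard part and is not shown.

```python
from collections import defaultdict


def scoring_rate_by_start(possessions):
    """Estimate P(score | how the possession started).

    `possessions` is an iterable of (start_type, scored) pairs, where
    start_type is a label such as "after_TO" and scored is a bool.
    These labels are hypothetical; producing them from a real
    play-by-play feed is the hard part and is not shown here.
    """
    tallies = defaultdict(lambda: [0, 0])  # start_type -> [scores, total]
    for start, scored in possessions:
        tallies[start][1] += 1
        if scored:
            tallies[start][0] += 1
    return {start: made / total for start, (made, total) in tallies.items()}


# A tiny made-up sample of labeled Sonics possessions.
sample = [
    ("after_TO", True), ("after_TO", True), ("after_TO", False),
    ("after_DR", True), ("after_DR", False),
    ("after_DR", False), ("after_DR", True),
    ("after_opp_score", False), ("after_opp_score", True),
]
rates = scoring_rate_by_start(sample)
print(rates)
```

With a full season of play-by-play the same tally gives the "score after a TO" and "score after a DR" transition probabilities directly.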
However, we need to add yet more detail (and more states to the Markov matrix): one can score simply by shooting and making it, but one can also score by missing, grabbing the offensive rebound, and then making the second shot attempt. So now we start taking offensive rebounding percentages into account. Those are easy to estimate; the part that used to be hard was estimating the probability of scoring off of an offensive rebound. But again, we now have that info.
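The arithmetic for the offensive-rebound branch is short. A sketch assuming every possession ends in a shot (no TOs, no FTs), with made-up rates and hypothetical function names; the post-OR scoring chance is modeled with its own make rate, since that is the number that used to be hard to get.

```python
def p_score(make_first, or_pct, q_after_or):
    """P(possession scores) = make first shot, or miss + OR + score later."""
    return make_first + (1 - make_first) * or_pct * q_after_or


def p_score_after_or(make_putback, or_pct):
    """Scoring chance once the offense has an offensive rebound.

    Assumes every later attempt is made with the same probability
    make_putback and that misses keep being rebounded at or_pct, so
    q = make_putback + (1 - make_putback) * or_pct * q, solved for q.
    """
    return make_putback / (1 - (1 - make_putback) * or_pct)


q = p_score_after_or(make_putback=0.55, or_pct=0.28)
p = p_score(make_first=0.45, or_pct=0.28, q_after_or=q)
print(f"P(score | OR) = {q:.3f}, P(score) = {p:.3f}")
```

With real data, `q_after_or` would simply be the measured scoring rate after an OR rather than this closed form.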
When you add the TO, DR, and OR states to the matrix, we're now up to, what, an 8 x 8 matrix? Again, one reason that I didn't pursue this further was that I didn't have a way of inverting an 8 x 8 matrix. But nowadays any full-featured math or statistical program can do that.
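The inversion step is the standard absorbing-chain calculation: with transient-to-transient probabilities Q and transient-to-absorbing probabilities R, the absorption probabilities are (I - Q)^-1 R. Below is a scaled-down, possession-level sketch with four transient states and two absorbing ones ("score", "stop"); all probabilities are made up, and the small Gauss-Jordan routine stands in for what a package such as `numpy.linalg.inv` would do.

```python
# Transient states of one (hypothetical) Sonics possession.
TRANSIENT = ["after_opp_score", "after_TO", "after_DR", "after_OR"]
ABSORBING = ["score", "stop"]

# Made-up probabilities. Q[i][j]: transient i -> transient j.
# R[i][k]: transient i -> absorbing k. Each row of Q plus R sums to 1.
Q = [
    [0.0, 0.0, 0.0, 0.14],
    [0.0, 0.0, 0.0, 0.10],
    [0.0, 0.0, 0.0, 0.13],
    [0.0, 0.0, 0.0, 0.12],
]
R = [
    [0.46, 0.40],
    [0.58, 0.32],
    [0.50, 0.37],
    [0.52, 0.36],
]


def invert(M):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


n = len(Q)
I_minus_Q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
             for i in range(n)]
N = invert(I_minus_Q)   # fundamental matrix: expected visits to each state
B = matmul(N, R)        # B[i][k]: P(ending in absorbing state k | start i)
for name, (p_sc, p_stop) in zip(TRANSIENT, B):
    print(f"{name:16s} P(score) = {p_sc:.3f}")
```

The same two lines (invert I - Q, multiply by R) scale unchanged to an 8 x 8 or larger matrix.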
There are also, of course, different ways to score: shoot a FG, or draw a foul and shoot FTs. Those events also have probabilities that can be estimated and incorporated into the model. The matrix does start getting big, but hey, that's what computers are for.
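Splitting scoring attempts into FG and FT branches just weights each branch by its share and its payoff. A hypothetical simplification with made-up rates: every attempt is a 2PA, a 3PA, or a two-shot foul, and and-ones, three-shot fouls and rebounds of missed FTs are ignored.

```python
def expected_points(share_2pa, share_3pa, share_foul, fg2, fg3, ft):
    """Expected points from one scoring attempt (toy model)."""
    if abs(share_2pa + share_3pa + share_foul - 1) > 1e-9:
        raise ValueError("branch shares must sum to 1")
    return (share_2pa * fg2 * 2       # made two-pointers
            + share_3pa * fg3 * 3     # made three-pointers
            + share_foul * 2 * ft)    # two free throws per shooting foul


ev = expected_points(0.55, 0.25, 0.20, fg2=0.48, fg3=0.35, ft=0.75)
print(f"{ev:.4f} points per attempt")
```

In the full Markov matrix each branch would become its own state, so a missed FT could, for example, lead into the OR/DR states.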
What I'm eager to see is what he's done with his model, and whether he's worked it out along the lines that I've described. In particular, I think this model can be used to answer the age-old, almost philosophical conundrum (debated here and on the previous apbrmetrics email list and probably on every hoops forum): what is the relative value of offensive and defensive rebounds? By tweaking the OR and DR percentages, I think one can work out a theoretically sound estimate.
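A toy version of the "tweak the percentages" idea, assuming a hypothetical possession model (turnover, else a shot; a miss is rebounded by the offense at OR%, after which the possession starts over) with made-up team rates and every score worth 2 points. It is not Shirley's model, just a sketch of the finite-difference approach: raise OR% or DR% by one percentage point and see how net points per 100 possessions move.

```python
def score_prob(tov, make, oreb):
    """P(a possession ends in a score) in a toy model.

    Hypothetical assumptions: the offense turns it over with probability
    tov; otherwise it shoots and scores with probability make; a miss is
    rebounded by the offense with probability oreb, after which the
    possession starts over (turnover chance included). Solving
    S = (1 - tov) * (make + (1 - make) * oreb * S) for S gives this.
    """
    return (1 - tov) * make / (1 - (1 - tov) * (1 - make) * oreb)


def net_per_100(team, opp, or_pct, dr_pct, pts_per_score=2.0):
    """Net points per 100 possessions; team/opp are (tov, make) pairs.

    The opponent's offensive rebound rate is 1 - our DR%.
    """
    ours = score_prob(team[0], team[1], or_pct)
    theirs = score_prob(opp[0], opp[1], 1 - dr_pct)
    return 100 * pts_per_score * (ours - theirs)


team, opp = (0.14, 0.47), (0.15, 0.46)   # made-up (TO rate, make rate)
base = net_per_100(team, opp, or_pct=0.28, dr_pct=0.72)
gain_or = net_per_100(team, opp, or_pct=0.29, dr_pct=0.72) - base
gain_dr = net_per_100(team, opp, or_pct=0.28, dr_pct=0.73) - base
print(f"baseline {base:+.2f}, +1 pt OR% {gain_or:+.3f}, +1 pt DR% {gain_dr:+.3f}")
```

Dividing each gain by the change in the number of rebounds it implies would turn these into per-rebound values; that step is left out of this sketch.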
What did you think of Kenny Shirley's presented materials for "A Markov model for basketball" and any discussion? From the abstract I got the impression the matrix wasn't going to be quite as big as you outlined.
How do the values compare to "a starting point for basketball analysis" or the coefficients of statistical +/-?
The latter two look quite different (values for discrete actions vs. per 40 minutes). Should these numbers somehow crosswalk or converge?
Since pure +/- from a single year is difficult to get reliable meaning out of or to blend in properly (except by the method's author, for internal use), aren't we largely left with various flavors of regression-based results that could be used as linear weights?
Where does advanced analysis in public go from here?
What stats should be included and with what weights?
Can / will defense be represented better / more fully even if still rough?
Quote:
What did you think of Kenny Shirley's presented materials for "A Markov model for basketball" and any discussion? From the abstract I got the impression the matrix wasn't going to be quite as big as you outlined.
Kenny Shirley's matrix, as presented, was 18 x 18 (18 states)! His "ultimate" matrix had 40 states, including deflections (that do not result in turnovers), 4-point plays ... I don't know what else, maybe flagrant fouls, technical fouls, TOs off of steals vs TOs w/o a steal, blocked shots vs ordinary FG misses, etc. He quickly added that much of that would be overkill because many of those plays are so rare or so similar to each other; if I recall correctly, he said that he thought a 30-state matrix would be a good balance of detail vs convenience. But the data to estimate those 30 states are not available, he said (however, I'm not sure of that, with 82games.com and BasketballValue.com now available). He got the data for his estimates of the transition probabilities by recording 4.5 taped games (he was quick to acknowledge that that's a small sample size; that sort of recording is very time-intensive, so he didn't have time to do more).
Quote:
How do the values compare to "a starting point for basketball analysis" or the coefficients of statistical +/-?
The latter two look quite different (values for discrete actions vs. per 40 minutes). Should these numbers somehow crosswalk or converge?
Good questions, but I don't remember any of the values, except for two: I asked him about the offensive vs defensive rebound value question and he instantly pointed to a column beside the matrix where he had calculated the average value of the transitions: .86 for an OR, and .66 for a DR. I presume that's measured in points, although it could be relative to the value of a possession. I also don't know if we can take those raw figures and call them the value of the rebound, as opposed to the average value of the transition; those might be different concepts. I didn't ask and haven't thought this through. Also, he was hesitant to apply any of his estimated values, given that they came from the "small" 18x18 matrix instead of his preferred 30x30 or 40x40. There's also the possibility that his 4.5-game sample could be skewed, although one could simply compare the stats from those games to the average NBA game to see if they differed much (I don't recall if he did that or not; I don't remember seeing such a comparison, but I didn't think to look).
Quote:
Since pure +/- from a single year is difficult to get reliable meaning out of or to blend in properly (except by the method's author, for internal use), aren't we largely left with various flavors of regression-based results that could be used as linear weights?
Where does advanced analysis in public go from here?
My raw guess is that +/- is subject to a lot of random error, and the other measures are subject to their own errors; perhaps the optimal estimator will be some sort of combination of them, perhaps using Bayesian techniques.
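One simple version of such a combination is inverse-variance weighting, which equals the posterior mean of a normal-normal Bayesian update with a flat prior. The sketch assumes both estimates measure the same quantity on the same scale (not automatic, given the "discrete actions vs. per 40 minutes" point above) and that their errors are independent and roughly normal; the ratings and standard errors are made up.

```python
import math


def combine(estimates):
    """Precision-weighted (inverse-variance) combination.

    Each element is (value, standard_error). Assuming independent,
    roughly normal errors, this is the posterior mean and standard
    deviation under a flat prior.
    """
    weights = [1 / se**2 for _, se in estimates]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1 / total)


# Hypothetical player: noisy +/- estimate vs tighter box-score regression.
mean, se = combine([(4.0, 3.0), (1.5, 1.0)])
print(f"combined {mean:+.2f} (se {se:.2f})")
```

The noisy +/- figure pulls the result only modestly away from the tighter box-score estimate, which is the intuitive behavior one would want.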
Quote:
What stats should be included and with what weights?
One interesting item which DeanO mentioned during the final panel discussion is that "more stats are on the way" from the NBA. I didn't get a chance to ask him for details, probably nothing has been decided yet, but he does think that the NBA will start to track (if it isn't tracking them already) and release new stats that it currently is not releasing.
Quote:
Can / will defense be represented better / more fully even if still rough?
To go back to Kenny Shirley's poster: it was at the team level, so defense is covered just as fully as offense. Each team has a FG%, and a defensive FG%, an OR% and a DR%, etc. etc. The model does not attempt to measure the contributions of any single individual player, or that player's offensive vs defensive contributions.
For the bigger question, of defensive stats, I don't know which if any of them might be on the list of oncoming NBA stats (DeanO implied that they're still at least a few seasons away).
I don't know when/if I'll have time to do a fuller write-up so let me say here that I was very glad that I went to this conference. I didn't go to the Feb. Sloan Conference so I can't compare them. Great to finally meet several of the apbrmetric contributors. Good overall quality of presentations. They emphasized research, rather than application to actual work with teams or organizations, so I was a bit surprised when one of the organizers said that one of the goals of the conference was to bring together academics and professionals (rather than just bring together academics from different fields and different sports interests). There were several people there who have positions with professional teams (or the US Olympic Committee, in the case of one presenter), but I imagine fewer than at the Sloan Conference.
Too bad that Boston/Cambridge is so far away for many of us (and expensive to stay in, unless one has a nearby friend to host you); next year's Sloan Conference has apparently already been scheduled and the organizers of this conference would like to make it an annual event also. Which is great, but that's two conferences, both in Cambridge. Which is a great place, but doesn't do much for geographic diversity or affordability.
Posted: Tue Oct 02, 2007 6:43 pm
mtamada wrote:
Too bad that Boston/Cambridge is so far away for many of us (and expensive to stay in, unless one has a nearby friend to host you); next year's Sloan Conference has apparently already been scheduled and the organizers of this conference would like to make it an annual event also. Which is great, but that's two conferences, both in Cambridge. Which is a great place, but doesn't do much for geographic diversity or affordability.
Hey, Mike, why don't you organize something there in LA? Opposite corner of the country, closer for some, farther for others.
Shirley's model really gave me a moment of clarity as to how lacking the NBA's basic stats are. The fact he couldn't do this off a play-by-play sheet is just wrong, wrong, wrong. I spent some time talking to him about the wonder of the league's catch-all sweep-it-under-the-rug stat, the mighty Team Rebound, since that was basically the reason he had to do his own PBP off tapes rather than scraping a year's worth of PBP off the web....
Ditto what Mike said -- great to see everyone there and finally meet Ed and Mike and a few of the other regulars; hopefully we do this again soon. Be sweet to do it on the west coast, since I know Boston is a haul for Mike, Kevin P and a few others on here.
"Bayesian Analysis of Dyadic Data Arising in Basketball
by Lucy Liu
The goal of this project is to use statistical methods to identify players and combinations of players which affect a basketball team's performance. The traditional statistics which are recorded tell us only about the contribution of individual players (e.g., points scored, rebounds, etc.). However, there are subtle aspects of play such as defensive help, setting screens and verbal communication that are known to be important but are not routinely recorded. The model we propose is based on the Bayesian social relations model. The results help us identify aspects of player performance."
It could be interesting for one of the leading voices here to contact the author, to open a dialogue and to disseminate or summarize the findings.
http://tinyurl.com/2yomsk
Environmental factors affecting Sam Cassell's shooting behavior and results under serious study
http://patriot.lib.byu.edu/ETD/image/etd998.pdf
Position / box score contributions to success: a thick document that could use a detailed review from a qualified reader. Tables on pages 54-56 present the main findings of this BYU study, and they are briefly described on the pages that follow.
How does the Cassell shot study compare with what NBA teams have, both the teams with strong analytic shops and those who may have data and tape but don't work it as hard with statistical techniques? Is the Cassell study representative of what is being done, or is it leading in any way?
"I asked him about the offensive vs defensive rebound value question and he instantly pointed to a column beside the matrix where he had calculated the average value of the transitions: .86 for an OR, and .66 for a DR. I presume that's measured in points although it could be relative to the value of a possession. I also don't know if we can take those raw figures and call them the value of the rebound, as opposed to the average value of the transition, those might be different concepts; I didn't ask and haven't thought this through"
I share the interest in sorting this through, if possible.
Would the author come to this forum and discuss it further?
Or, if anyone gets a hard copy or discusses it further with him, it would be helpful to hear more.
Table 4.3: Posterior means of parameters of assists, steals, turnovers, free throws made, and free throw percentage for each position (rest of table edited out for brevity)
Position         Assists (µ_1)      Steals (µ_2)       Turnovers (µ_3)
Center           0.2836             0.5689 (highest)   -0.3648
Power Forward    0.3721             0.1908             -0.5687
Small Forward    0.4013             0.3291             -0.3912
Point Guard      0.4007             0.2920             -0.3764
Shooting Guard   0.4354 (highest)   0.1957             -0.4281
Table 4.4: Posterior means for field goals made, field goal percentage, offensive rebounds, and defensive rebounds for each of the positions
Position         Field Goals Made (µ_6)   Field Goal Percentage (µ_7)   Defensive Rebounds (µ_9)
Center           0.2061 (highest)         0.0437                        0.2609
Power Forward    0.0820                   0.0700                        0.3948
Small Forward    0.0508                   0.1064                        0.3530
Point Guard      0.0340                   0.1418                        0.5414 (highest)
Shooting Guard   -0.0105                  0.1703 (highest)              0.4005
Last edited by Mountain on Sat Oct 06, 2007 12:37 pm; edited 3 times in total