APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

Some rules of thumb
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Thu Apr 07, 2005 1:55 am    Post subject: Some rules of thumb

Some hacks I'm always flipping through my notes to look up. Figured I'd just jot them down here and bookmark the page to save me some trouble. Feel free to add your own.

Assisted % = 0.75 - AST/MIN * 1.5
This one has a standard error of about 10%.

Potential Assists = AST * (0.5 * PTS/FGM) / TS%
Thanks to Dan for this one.

In-Game Home Team Win Probability = 1 / (1 + EXP(-(0.06 + MinutesRemaining * 0.01 + HomeTeamLead * 0.34)))
This one will be the subject of a massive study this offseason. It only works with less than one quarter remaining.

For every point in team point differential, add 3 games to a team's win total over the course of 82 games. Home court advantage is worth about 3 points per game.
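
A minimal Python sketch of the rules of thumb above. The function and argument names are hypothetical, TS% is assumed to be passed as a fraction, and the 41-win baseline for a team with zero point differential is an assumption that the post does not state.

```python
import math

def assisted_pct(ast, minutes):
    """Assisted % = 0.75 - AST/MIN * 1.5 (standard error about 10%)."""
    return 0.75 - (ast / minutes) * 1.5

def potential_assists(ast, pts, fgm, ts_pct):
    """Potential Assists = AST * (0.5 * PTS/FGM) / TS%, with TS% as a fraction."""
    return ast * (0.5 * pts / fgm) / ts_pct

def late_game_home_win_prob(minutes_remaining, home_lead):
    """Logistic in-game estimate; the post limits it to the final quarter."""
    z = 0.06 + minutes_remaining * 0.01 + home_lead * 0.34
    return 1.0 / (1.0 + math.exp(-z))

def expected_wins(point_diff_per_game, baseline_wins=41.0):
    """About 3 wins per point of differential; the 41-win baseline is assumed."""
    return baseline_wins + 3.0 * point_diff_per_game

print(round(assisted_pct(ast=200, minutes=2000), 3))
print(round(late_game_home_win_prob(minutes_remaining=5, home_lead=3), 3))
print(round(expected_wins(5.0), 1))
```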
_________________
ed
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Sat Apr 09, 2005 3:56 am

For estimating high possession players

  • Regress FT% and FTA/poss 20% to the mean
  • Regress eFG% 25% to the mean
  • Regress ORTG 25% to the mean
  • Regress DRTG 30% to the mean
  • Regress TO% 20% to the mean


Do not regress

  • DR%
  • OR%
  • AST%
  • BLK%
  • STL%

_________________
ed
HoopStudies



Joined: 30 Dec 2004
Posts: 410
Location: Near Philadelphia, PA

Posted: Sat Apr 09, 2005 10:36 am

Ed Küpfer wrote:
For estimating high possession players

  • Regress FT% and FTA/poss 20% to the mean
  • Regress eFG% 25% to the mean
  • Regress ORTG 25% to the mean
  • Regress DRTG 30% to the mean
  • Regress TO% 20% to the mean




What do you mean by this, Ed? What are you "estimating"?
_________________
Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Sat Apr 09, 2005 12:44 pm

HoopStudies wrote:
What do you mean by this, Ed? What are you "estimating"?

It's my best guess for the next season. Once upon a time I calculated the year-to-year correlation coefficients for these stats -- the numbers above represent the regression to the mean.

For example, FT% is pretty stable: r = 0.8 among players who shoot a lot of free throws. If a player shoots 90% in season 1, my best guess for season 2 is

90% - [(1 - r) * (90% - 75%)] = 87%.
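
The same calculation as a short Python sketch. The regression fractions are the (1 - r) values from the list earlier in the thread; the function name and dictionary keys are hypothetical, and the 75% figure is taken to be the mean FT% used in the example.

```python
# Fraction of the gap to the mean expected to disappear next season,
# i.e. (1 - r), as listed earlier in the thread.
REGRESSION_FRACTION = {
    "FT%": 0.20,
    "FTA/poss": 0.20,
    "eFG%": 0.25,
    "ORTG": 0.25,
    "DRTG": 0.30,
    "TO%": 0.20,
}

def next_season_estimate(observed, mean, stat):
    """Pull an observed value toward the mean by the stat's regression fraction."""
    return observed - REGRESSION_FRACTION[stat] * (observed - mean)

# The FT% example: a 90% shooter, with 75% taken as the mean.
print(round(next_season_estimate(0.90, 0.75, "FT%"), 3))
```

Stats on the "do not regress" list would simply be carried forward unchanged under this scheme.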
_________________
ed
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Tue Apr 19, 2005 8:12 pm

Team1 Vs Team2 predictors

Code:
EFG% = 1 / (1 + EXP (-(-3.1 + 0.05 * HOME + 3.2 * 1offEFG% + 3.0 * 2defEFG%)))

                                                  Odds     95% CI
Predictor       Coef    SE Coef        Z      P  Ratio  Lower  Upper
Constant    -3.09752  0.0114094  -271.49  0.000
HOME       0.0460184  0.0007412    62.09  0.000   1.05   1.05   1.05
1offEFG%     3.22721  0.0187841   171.81  0.000  25.21  24.30  26.15
2defEFG%     3.00915  0.0205559   146.39  0.000  20.27  19.47  21.10


Code:
TO% = 1 / (1 + EXP (-(-3.65 - 0.03 * HOME + 6.1 * 1offTO% + 6.3 * 2defTO%)))
Predictor        Coef    SE Coef        Z      P  Odds Ratio   Lower   Upper
Constant     -3.65004  0.0084608  -431.40  0.000
HOME       -0.0284064  0.0010110   -28.10  0.000        0.97    0.97    0.97
1offTO%       6.05658  0.0407233   148.73  0.000      426.91  394.16  462.39
2defTO%       6.29438  0.0384019   163.91  0.000      541.52  502.26  583.85


Code:
OR% = 1 / (1 + EXP (-(-3.0 + 0.08 * HOME + 3.8 * 1offOR% + 3.2 * 2defOR%)))
                                                  Odds     95% CI
Predictor       Coef    SE Coef        Z      P  Ratio  Lower  Upper
Constant    -2.99216  0.0057565  -519.79  0.000
HOME       0.0769602  0.0008089    95.14  0.000   1.08   1.08   1.08
1offOR%      3.78669  0.0151349   250.20  0.000  44.11  42.82  45.44
2defOR%      3.15742  0.0176015   179.38  0.000  23.51  22.71  24.34

2defOR% = 1 - 2defDR%



Code:

FTA/Poss = -0.22 + 0.01 * HOME + 0.9 * 1FTA/poss + 0.9 * 2FTA/poss

Predictor       Coef    SE Coef       T      P
Constant   -0.216195   0.006288  -34.38  0.000
Home       0.0108320  0.0008882   12.20  0.000
1FTA         0.88636    0.01859   47.69  0.000
2FTA         0.91714    0.01631   56.25  0.000

S = 0.0759282   R-Sq = 17.9%   R-Sq(adj) = 17.9%
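
A rough Python sketch of the four matchup predictors, using the rounded coefficients from the regression tables (the HOME coefficient for TO% is negative in its table, and the OR% model takes the opponent's allowed OR%, i.e. 1 - 2defDR%). It assumes HOME is 1 when team 1 is at home and 0 otherwise, and that all rates are fractions rather than percentages; neither is stated in the post. Function names are hypothetical.

```python
import math

def _logistic(x):
    """Standard logistic function, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_efg(home, off_efg, opp_def_efg):
    """Team 1's expected eFG% against team 2's defense."""
    return _logistic(-3.1 + 0.05 * home + 3.2 * off_efg + 3.0 * opp_def_efg)

def predict_to(home, off_to, opp_def_to):
    """Team 1's expected TO%; note the negative HOME term."""
    return _logistic(-3.65 - 0.03 * home + 6.1 * off_to + 6.3 * opp_def_to)

def predict_or(home, off_or, opp_def_dr):
    """Team 1's expected OR%, given team 2's defensive rebounding rate."""
    opp_def_or = 1.0 - opp_def_dr  # OR% allowed by team 2
    return _logistic(-3.0 + 0.08 * home + 3.8 * off_or + 3.2 * opp_def_or)

def predict_fta_per_poss(home, off_fta, opp_def_fta):
    """Team 1's expected FTA per possession (linear model, R-Sq = 17.9%)."""
    return -0.22 + 0.01 * home + 0.9 * off_fta + 0.9 * opp_def_fta

print(round(predict_efg(1, 0.48, 0.48), 3))
print(round(predict_to(1, 0.15, 0.15), 3))
print(round(predict_or(1, 0.29, 0.71), 3))
print(round(predict_fta_per_poss(1, 0.25, 0.25), 3))
```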

_________________
ed
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Wed Apr 20, 2005 4:12 pm

Linear Weights-style individual possession estimator:

POSS = 0.74 * FGA + 0.44 * FTA + 0.25 * OR + 0.25 * AST + TO

Points Produced estimator:

PtsProd = 1.45 * 2Made + 2.2 * 3Made + FTMade + 0.6 * OR + 0.6 * AST
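
Both estimators as a small Python sketch, in case it helps with checking spreadsheet work. The argument names are hypothetical and the box-score line in the example is made up for illustration.

```python
def possessions(fga, fta, orb, ast, tov):
    """POSS = 0.74*FGA + 0.44*FTA + 0.25*OR + 0.25*AST + TO."""
    return 0.74 * fga + 0.44 * fta + 0.25 * orb + 0.25 * ast + tov

def points_produced(fg2m, fg3m, ftm, orb, ast):
    """PtsProd = 1.45*2Made + 2.2*3Made + FTMade + 0.6*OR + 0.6*AST."""
    return 1.45 * fg2m + 2.2 * fg3m + ftm + 0.6 * orb + 0.6 * ast

# Illustrative single-game line (not real data).
poss = possessions(fga=10, fta=5, orb=2, ast=4, tov=3)
pts = points_produced(fg2m=4, fg3m=1, ftm=3, orb=2, ast=4)
print(round(poss, 2), round(pts, 2))
```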
_________________
ed
mtamada



Joined: 28 Jan 2005
Posts: 127

Posted: Mon Apr 25, 2005 7:00 pm

Ed Küpfer wrote:
For estimating high possession players

Do not regress

  • DR%
  • OR%
  • AST%
  • BLK%
  • STL%


Why would one not want to regress these statistics? Given that even FT% should be regressed .2 to the mean according to your figures, I would think that these should be also, e.g. an Elmore Smith might have a titanic shot-blocking year which he is unlikely to repeat.
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Mon May 09, 2005 4:18 pm

{Edited by ed}

Here's a cool formula for estimating a team's final winning percentage from the within-season win%, from DennisBoz.

Code:

Final Win % = ((1-F)^2)/2 + (2*F - F*F) * Win% To Date

where F = fraction of the season completed (GP/82).
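
A minimal Python sketch of the formula, with hypothetical function and argument names. One observation that may help: the two weights, (1-F)^2 and (2*F - F*F), always sum to 1, so the estimate is a weighted average of .500 and the record to date, with the weight on .500 shrinking as the season goes on.

```python
def boz_final_win_pct(win_pct_to_date, games_played, season_games=82):
    """Estimate final Win% from the record so far, per the formula above."""
    f = games_played / season_games
    weight_on_record = 2 * f - f * f  # equals 1 - (1 - f)**2
    return (1 - f) ** 2 / 2 + weight_on_record * win_pct_to_date

# A .600 team at the halfway point.
print(round(boz_final_win_pct(0.600, games_played=41), 3))
```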


I have no idea why this works -- you can see some discussion at the link above. It does work though, giving an overall RMSE of 6.1 games. The formula, of course, gets more accurate as the season progresses -- here's a comparison between Boz's estimate of final season Win% and the final Win% if you simply extrapolated from the present win% (that is, if you assumed your .600 record at game# 40 would produce a season ending record of .600):

Code:

         RMSE (games)
Game#    BOZ     Extrapolated
1-10     11.0    21.1
11-20    8.7     9.7
21-30    6.9     6.8
31-40    5.3     5.1
41-50    4.1     4.0
51-60    3.1     3.1
61-70    2.2     2.2
71-80    1.3     1.3





Pretty neat.
_________________
ed
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Mon May 09, 2005 4:30 pm

Ed Küpfer wrote:
Code:

         RMSE (games)
Game#    BOZ     Extrapolated
1-10     11.0    21.1
11-20    8.7     9.7
21-30    6.9     6.8
31-40    5.3     5.1
41-50    4.1     4.0
51-60    3.1     3.1
61-70    2.2     2.2
71-80    1.3     1.3



Pretty neat.


Hey, wait a minute -- that's no good! It only does better at games 1-20. Hmm. Maybe I can screw around with the exponent.
_________________
ed
Dan Rosenbaum



Joined: 03 Jan 2005
Posts: 413
Location: Greensboro, North Carolina

Posted: Mon May 09, 2005 4:59 pm

Hey Ed, I know things are slow around here once in a while, but it kind of makes us all look bad if you have to argue with yourself. :D
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Sun May 15, 2005 2:56 pm

A linear weights-style estimator for RTG, based on the four factors. I have no idea what it's good for, but anyway:

Code:
RTG = (1 + 5 * EFG% + FTA% - 4 * TO% + OR%) * 31

where
EFG% = (FGM + .5 * 3M) / FGA
FTA% = FTA / POSS
TO% = TO / POSS
OR% = OR / (OR + OppDR)
and POSS = FGA + FTA * 0.44 - OR + TO


Even these simplified weights -- 1, 5, 1, -4, 1 -- are very accurate: RMSE is 0.7 points per 100 possessions. Adding three decimal places only decreases the RMSE to 0.5.

One further note: I have found that standardising the four factors isn't helpful. I have tried standardising by season (i.e. using the season-by-season mean and SD) and also by the entire sample, and neither has improved upon the accuracy of the raw stats. I don't know why this is.
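
A short Python sketch of the RTG estimator with the simplified weights, assuming the inputs are team season totals. The function names, dictionary keys, and the totals in the example are hypothetical and for illustration only.

```python
def four_factors(fgm, fg3m, fga, fta, tov, orb, opp_drb):
    """Compute the four factors from team totals, as defined above."""
    poss = fga + fta * 0.44 - orb + tov
    return {
        "efg": (fgm + 0.5 * fg3m) / fga,
        "fta": fta / poss,
        "to": tov / poss,
        "or": orb / (orb + opp_drb),
    }

def estimated_rtg(efg, fta, to, or_):
    """RTG = (1 + 5*EFG% + FTA% - 4*TO% + OR%) * 31."""
    return (1 + 5 * efg + fta - 4 * to + or_) * 31

# Made-up season totals, roughly NBA-sized.
f = four_factors(fgm=3000, fg3m=400, fga=6600, fta=2000,
                 tov=1200, orb=1000, opp_drb=2400)
print(round(estimated_rtg(f["efg"], f["fta"], f["to"], f["or"]), 1))
```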

And while I'm at it, a four factors-based WIN% estimator:
Code:

WIN% = 1 / ( 1 + EXP (bX) )

where bX = - (  20 * oEFG%
              +  5 * oFTA% 
              - 16 * oTO%
              +  6 * oOR%
              - 20 * dEFG%
              -  5 * dFTA%
              + 16 * dTO%
              -  6 * dOR% )


This too is very accurate, with an RMSE of 3.5 games in an 82-game season, compared to an RMSE of 3.0 games for a pythagorean estimate based on exponents customised by season.
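
The WIN% estimator as a Python sketch, with hypothetical argument names (o_ for the team's own offense, d_ for what its defense allows) and all factors given as fractions. Because every offensive weight is mirrored by the opposite defensive weight, a team whose four factors match its opponents' comes out at exactly .500.

```python
import math

def four_factor_win_pct(o_efg, o_fta, o_to, o_or, d_efg, d_fta, d_to, d_or):
    """WIN% = 1 / (1 + EXP(bX)), with bX as defined above."""
    bx = -(20 * o_efg + 5 * o_fta - 16 * o_to + 6 * o_or
           - 20 * d_efg - 5 * d_fta + 16 * d_to - 6 * d_or)
    return 1.0 / (1.0 + math.exp(bx))

# Illustrative team: two points of eFG% better than its opponents, else equal.
win_pct = four_factor_win_pct(0.52, 0.25, 0.15, 0.30,
                              0.50, 0.25, 0.15, 0.30)
print(round(win_pct, 3), round(win_pct * 82, 1))
```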
_________________
ed
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Thu Feb 09, 2006 8:17 pm

After fuch mucking around, I finally came up with a logistic within-game win estimator.

Code:
p(Home Team Win) = 1/(1+exp(-(0.13 + 0.12 * HomeTmLead + 0.0044 * HomeTmLead2 + 0.0068 * MinutesRemaining)))

HomeTmLead = Home Team lead
HomeTmLead2 = HomeTmLead * ABS(HomeTmLead)
MinutesRemaining = Minutes remaining in the game.


Overtimes treated like the final five minutes of the 4th quarter, and the fourth quarter of overtime games treated as if they ended in regulation. What? What I mean is this: if a game goes to overtime, and the home team leads by five with two minutes left in the overtime, that situation is dealt with exactly the same as if it were a five point lead with two minutes remaining in the 4th quarter.

WARNING: I have not tested this at the <1 minute remaining level. In fact, I used times rounded off to the nearest minute, so I'm not exactly sure how well the equation performs during the final minutes of games, when time slows down.

The equation above gives a very nice fit to my data (r^2 = .96), which includes about 110,000 observations from almost every game in 04-05. I'm adding more observations soon, so the coefficients will be updated a little, and I'll post my methodology at that time. I'm not quite sure if logit is the way to go at the <1 minute remaining spots, but I'll see about that later.
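
A minimal Python sketch of the estimator as posted, with a hypothetical function name. For overtime, the caller is assumed to pass the minutes left in the overtime period, following the convention described above; the sketch makes no attempt to handle the untested sub-minute range differently.

```python
import math

def home_win_prob(home_lead, minutes_remaining):
    """Logistic within-game estimate of the home team's win probability."""
    lead_sq_signed = home_lead * abs(home_lead)  # HomeTmLead2
    z = (0.13 + 0.12 * home_lead + 0.0044 * lead_sq_signed
         + 0.0068 * minutes_remaining)
    return 1.0 / (1.0 + math.exp(-z))

for lead in (-10, -5, 0, 5, 10):
    print(lead, round(home_win_prob(lead, minutes_remaining=12), 3))
```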
_________________
ed
mtamada



Joined: 28 Jan 2005
Posts: 127

Posted: Thu Feb 09, 2006 9:10 pm

Fascinating stuff, but is there a typo in the formula? The coefficient on MinutesRemaining is positive, so the larger the number of minutes remaining, the more negative the argument to the exponentiation function, so the smaller the denominator, and the higher the probability of a Home Team Win.

That surely can't be right; a home team with a 10 point lead with 1 minute left should have a near guarantee of victory (I get 85.4% from your formula, assuming that I typed it in correctly). As the MinutesRemaining gets larger, shouldn't we see a decrease, not an increase, in the probability of this home team winning (while still remaining substantially above 50%)? Also 85.4% seems way too low, unless the opponent has Reggie Miller or Isiah Thomas on one of their heroic playoff rampages.

My in-my-head calculations in the first paragraph, as well as the numbers in my spreadsheet (again, assuming that I haven't made any typos) both show an ever-growing probability of victory as MinutesRemaining INCREASES. That makes perfect sense for a team that is behind, but not for a team that is ahead.
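
The pattern described here can be checked with a few lines of Python. This is only a sketch of the formula as posted (the function name is hypothetical); it prints the estimate for a 10-point home lead at several points in the game.

```python
import math

def home_win_prob(home_lead, minutes_remaining):
    """The posted logistic estimator, coefficients unchanged."""
    z = (0.13 + 0.12 * home_lead + 0.0044 * home_lead * abs(home_lead)
         + 0.0068 * minutes_remaining)
    return 1.0 / (1.0 + math.exp(-z))

minutes_grid = [1, 12, 24, 36, 48]
probs = [home_win_prob(10, m) for m in minutes_grid]
for m, p in zip(minutes_grid, probs):
    print(f"lead 10, {m:2d} min left: {p:.3f}")
```

Since the MinutesRemaining coefficient is positive, the estimate rises with time remaining for any lead, which is the behaviour being questioned.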
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Thu Feb 09, 2006 9:33 pm

I'm pretty sure your calculations are correct. The problem is, as you can see, that the logit fit breaks down at the extremes. It's hard for me to tell exactly what's going on here as these areas are represented by only a few observations. In a previous effort, I also had real problems fitting a logistic curve to the extreme areas in both Points Difference and Time Remaining. All I can say for now is, hang on. I'm still accumulating data. If the fit is still poor, I may have to mix curves somehow. I'll keep everyone posted.
_________________
ed
mtamada



Joined: 28 Jan 2005
Posts: 127

Posted: Thu Feb 09, 2006 10:33 pm

Maybe some pure curve-fitting technique, such as cubic splines, is the way to go.

http://mathews.ecs.fullerton.edu/n2003/CubicSplinesMod.html
http://www.zoology.ubc.ca/~schluter/splines.html
Using your formula, I was looking for the combinations at which a team would have a 90% probability of winning. If I typed in the formula correctly, a home team with a 12 point lead always has a better than 90% chance of winning, while a team with a lead of 11 or less never has a 90% probability of winning. The first part might be plausible (but not the way the probability rises with increased MinutesRemaining), but not the second.

A friend of mine once said that he used the following as a rule of thumb: if "m" is the number of minutes remaining, then a team with a lead of 2m+7 points is practically guaranteed to win the game ("game in the refrigerator" in Chick Hearn-speak). I don't know if his rule of thumb is a good one or not, but in trying to calibrate it against your formula, I found the patterns that I've pointed out, the ones which don't seem realistic. But if you've got the data and can present, say, a table of "MinutesRemaining" and "Points Ahead" stats and the corresponding observed probabilities of winning, maybe I'll discover that the true probabilities are not what I think they should be.
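
A rough Python sketch of that calibration exercise: it evaluates the posted logistic formula at a home-team lead of 2m+7 for a few values of m. The function names are hypothetical, and applying the rule to the home team only is a simplification, since the formula is written from the home team's side.

```python
import math

def home_win_prob(home_lead, minutes_remaining):
    """The posted logistic estimator, coefficients unchanged."""
    z = (0.13 + 0.12 * home_lead + 0.0044 * home_lead * abs(home_lead)
         + 0.0068 * minutes_remaining)
    return 1.0 / (1.0 + math.exp(-z))

def safe_lead(minutes_remaining):
    """Rule of thumb: a lead of 2m + 7 points is practically safe."""
    return 2 * minutes_remaining + 7

for m in (0, 1, 2, 5, 10):
    lead = safe_lead(m)
    print(f"{m:2d} min left, lead {lead:2d}: {home_win_prob(lead, m):.3f}")
```

At m = 0 the formula is far from a guarantee for a 7-point lead, which is consistent with the poor fit at the extremes noted earlier in the thread.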
All times are GMT - 5 Hours
Page 1 of 2