Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Thu Apr 07, 2005 1:55 am Post subject: Some rules of thumb |
|
|
Some hacks I'm always flipping through my notes to look up. Figured I'd just jot them down here and bookmark the page to save me some trouble. Feel free to add your own.
Assisted % = 0.75 - AST/MIN * 1.5
This one has a standard error of about 10%.
Potential Assists = AST * (0.5 * PTS/FGM) / TS%
Thanks to Dan for this one.
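A quick Python sketch of the two assist formulas above, in case anyone wants to paste them into a script. The function and argument names are made up for illustration, TS% is taken as an input because it isn't defined in this post, and rates are assumed to be fractions (0.55, not 55).

```python
def assisted_pct(ast, minutes):
    """Rule-of-thumb Assisted % from assists per minute (std. error about 10%)."""
    return 0.75 - (ast / minutes) * 1.5


def potential_assists(ast, pts, fgm, ts_pct):
    """Potential Assists = AST * (0.5 * PTS/FGM) / TS%.

    ts_pct is assumed to be a fraction, e.g. 0.54.
    """
    return ast * (0.5 * pts / fgm) / ts_pct


# Hypothetical season line: 400 assists in 2800 minutes.
example_assisted = assisted_pct(400, 2800)
```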
In-Game Home Team Win Probability = 1 / (1 + EXP(-(0.06 + MinutesRemaining * 0.01 + HomeTeamLead * 0.34)))
This one will be the subject of a massive study this offseason. It only works with less than one quarter remaining.
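The same win probability formula as a small Python function, just a sketch. The function name is hypothetical, and the 12-minute cutoff mentioned in the comment is my reading of "less than one quarter remaining".

```python
import math


def home_win_prob_late(minutes_remaining, home_lead):
    """In-game home win probability from the rule of thumb above.

    Only meant for the final quarter (assumed here to mean
    minutes_remaining <= 12); outside that range treat the result as unreliable.
    """
    z = 0.06 + 0.01 * minutes_remaining + 0.34 * home_lead
    return 1.0 / (1.0 + math.exp(-z))


# Tie game at the buzzer: only the constant term is left in the exponent.
tie_at_buzzer = home_win_prob_late(0, 0)
```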
For every point in team point differential, add 3 games to a team's win total over the course of 82 games. Home court advantage is worth about 3 points per game. _________________ ed |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Sat Apr 09, 2005 3:56 am Post subject: |
|
|
For estimating high possession players
- Regress FT% and FTA/poss 20% to the mean
- Regress eFG% 25% to the mean
- Regress ORTG 25% to the mean
- Regress DRTG 30% to the mean
- Regress TO% 20% to the mean
Do not regress
_________________ ed |
|
|
|
HoopStudies
Joined: 30 Dec 2004 Posts: 410 Location: Near Philadelphia, PA
|
Posted: Sat Apr 09, 2005 10:36 am Post subject: |
|
|
Ed Küpfer wrote: | For estimating high possession players
- Regress FT% and FTA/poss 20% to the mean
- Regress eFG% 25% to the mean
- Regress ORTG 25% to the mean
- Regress DRTG 30% to the mean
- Regress TO% 20% to the mean
|
What do you mean by this, Ed? What are you "estimating"? _________________ Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Sat Apr 09, 2005 12:44 pm Post subject: |
|
|
HoopStudies wrote: | What do you mean by this, Ed? What are you "estimating"? |
It's my best guess for the next season. Once upon a time I calculated the year-to-year correlation coefficients for these stats -- the numbers above represent the regression to the mean.
For example, FT% is pretty stable: r = 0.8 among players who shoot a lot of free throws. If a player shoots 90% in season 1, my best guess for season 2 is
90% - [(1 - r) * (90% - 75%)] = 87%, taking 75% as the mean. _________________ ed |
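The FT% example above can be sketched in Python as follows. The dictionary just restates the regression amounts listed earlier in the thread (taken as 1 - r); the names are made up for illustration, and the mean you regress toward is an input you have to supply yourself.

```python
# Share of the gap to the mean that is given back (1 - r), from the list
# earlier in the thread for high-possession players.
REGRESSION_AMOUNT = {
    "FT%": 0.20,
    "FTA/poss": 0.20,
    "eFG%": 0.25,
    "ORTG": 0.25,
    "DRTG": 0.30,
    "TO%": 0.20,
}


def regress_to_mean(observed, mean, r):
    """Best guess for next season, given year-to-year correlation r."""
    return observed - (1.0 - r) * (observed - mean)


def project(stat, observed, mean):
    """Apply the rule-of-thumb regression amount for a named stat."""
    return regress_to_mean(observed, mean, 1.0 - REGRESSION_AMOUNT[stat])


# The worked example: a 90% shooter, 75% mean, r = 0.8.
next_season_ft = regress_to_mean(0.90, 0.75, 0.8)
```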
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Tue Apr 19, 2005 8:12 pm Post subject: |
|
|
Team1 Vs Team2 predictors
Code: | EFG% = 1 / (1 + EXP (-(-3.1 + 0.05 * HOME + 3.2 * 1offEFG% + 3.0 * 2defEFG%)))
Predictor Coef SE Coef Z P Odds Ratio Lower Upper
Constant -3.09752 0.0114094 -271.49 0.000
HOME 0.0460184 0.0007412 62.09 0.000 1.05 1.05 1.05
1offEFG% 3.22721 0.0187841 171.81 0.000 25.21 24.30 26.15
2defEFG% 3.00915 0.0205559 146.39 0.000 20.27 19.47 21.10 |
Code: | TO% = 1 / (1 + EXP (-(-3.65 - 0.03 * HOME + 6.1 * 1offTO% + 6.3 * 2defTO%)))
Predictor Coef SE Coef Z P Odds Ratio Lower Upper
Constant -3.65004 0.0084608 -431.40 0.000
HOME -0.0284064 0.0010110 -28.10 0.000 0.97 0.97 0.97
1offTO% 6.05658 0.0407233 148.73 0.000 426.91 394.16 462.39
2defTO% 6.29438 0.0384019 163.91 0.000 541.52 502.26 583.85
|
Code: | OR% = 1 / (1 + EXP (-(-3.0 + 0.08 * HOME + 3.8 * 1offOR% + 3.2 * 2defOR%)))
Odds 95% CI
Predictor Coef SE Coef Z P Ratio Lower Upper
Constant -2.99216 0.0057565 -519.79 0.000
HOME 0.0769602 0.0008089 95.14 0.000 1.08 1.08 1.08
1offOR% 3.78669 0.0151349 250.20 0.000 44.11 42.82 45.44
2defOR% 3.15742 0.0176015 179.38 0.000 23.51 22.71 24.34
2defOR% = 1 - 2defDR%
|
Code: |
FTA/Poss = -0.22 + 0.01 * HOME + 0.9 * 1FTA/poss + 0.9 * 2FTA/poss
Predictor Coef SE Coef T P
Constant -0.216195 0.006288 -34.38 0.000
Home 0.0108320 0.0008882 12.20 0.000
1FTA 0.88636 0.01859 47.69 0.000
2FTA 0.91714 0.01631 56.25 0.000
S = 0.0759282 R-Sq = 17.9% R-Sq(adj) = 17.9% |
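A rough Python version of the four matchup predictors, using the rounded coefficients from the regression tables. A few assumptions of mine: HOME is coded 1 when Team1 is at home and 0 otherwise, all rates are fractions, the HOME term for TO% is negative (as in the coefficient table), and the OR% predictor takes the opponent's offensive rebound rate allowed (1 - DR%). Function names are hypothetical.

```python
import math


def _logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_efg(home, off_efg, def_efg):
    """Team1 eFG% against Team2's eFG% allowed."""
    return _logistic(-3.1 + 0.05 * home + 3.2 * off_efg + 3.0 * def_efg)


def predict_to(home, off_to, def_to):
    """Team1 TO% against Team2's forced TO%; home teams turn it over less."""
    return _logistic(-3.65 - 0.03 * home + 6.1 * off_to + 6.3 * def_to)


def predict_or(home, off_or, def_or_allowed):
    """Team1 OR%; def_or_allowed is 1 - Team2's DR%."""
    return _logistic(-3.0 + 0.08 * home + 3.8 * off_or + 3.2 * def_or_allowed)


def predict_fta_per_poss(home, off_fta, def_fta):
    """Linear (not logistic) fit; note the low R-Sq of about 18%."""
    return -0.22 + 0.01 * home + 0.9 * off_fta + 0.9 * def_fta
```

Plugging in roughly average inputs (0.48 eFG%, 0.15 TO%, 0.29 OR%, 0.30 FTA/poss) gives back numbers close to those averages, which is a decent sanity check on the scale of the inputs.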
_________________ ed |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Wed Apr 20, 2005 4:12 pm Post subject: |
|
|
Linear Weights-style individual possession estimator:
POSS = 0.74 * FGA + 0.44 * FTA + 0.25 * OR + 0.25 * AST + TO
Points Produced estimator:
PtsProd = 1.45 * 2Made + 2.2 * 3Made + FTMade + 0.6 * OR + 0.6 * AST _________________ ed |
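Both linear-weights estimators as a Python sketch. Argument names are made up; inputs are an individual player's box score totals.

```python
def est_possessions(fga, fta, oreb, ast, to):
    """POSS = 0.74*FGA + 0.44*FTA + 0.25*OR + 0.25*AST + TO."""
    return 0.74 * fga + 0.44 * fta + 0.25 * oreb + 0.25 * ast + to


def est_points_produced(two_made, three_made, ft_made, oreb, ast):
    """PtsProd = 1.45*2Made + 2.2*3Made + FTMade + 0.6*OR + 0.6*AST."""
    return (1.45 * two_made + 2.2 * three_made + ft_made
            + 0.6 * oreb + 0.6 * ast)


# Hypothetical single-game line.
poss = est_possessions(fga=15, fta=5, oreb=2, ast=4, to=3)
pts_prod = est_points_produced(two_made=5, three_made=2, ft_made=4, oreb=2, ast=4)
```

Dividing `pts_prod` by `poss` and multiplying by 100 gives a per-100-possession figure on the same scale as an offensive rating.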
|
|
|
mtamada
Joined: 28 Jan 2005 Posts: 127
|
Posted: Mon Apr 25, 2005 7:00 pm Post subject: |
|
|
Ed Küpfer wrote: | For estimating high possession players
Do not regress
|
Why would one not want to regress these statistics? Given that even FT% should be regressed .2 to the mean according to your figures, I would think that these should be also, e.g. an Elmore Smith might have a titanic shot-blocking year which he is unlikely to repeat. |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Mon May 09, 2005 4:18 pm Post subject: |
|
|
{Edited by ed}
Here's a cool formula for estimating a team's final winning percentage from the within-season win%, from DennisBoz.
Code: |
Final Win % = ((1-F)^2)/2 + (2*F - F*F) * Win% To Date
where F = %age of the season completed (GP/82). |
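The formula above as a Python sketch (function name is hypothetical). One way to read it: since 2F - F^2 = 1 - (1-F)^2, it is a weighted average of .500 and the current win%, with the weight on .500 shrinking as (1-F)^2.

```python
def boz_final_win_pct(win_pct_to_date, games_played, season_games=82):
    """Estimated final win% from the current win% and games played."""
    f = games_played / season_games
    return (1 - f) ** 2 / 2 + (2 * f - f * f) * win_pct_to_date


# A .600 team at the halfway mark (41 of 82 games).
halfway = boz_final_win_pct(0.600, 41)
```

At F = 0 it returns .500 regardless of record, and at F = 1 it returns the actual win%.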
I have no idea why this works -- you can see some discussion at the link above. It does work though, giving an overall RMSE of 6.1 games. The formula, of course, gets more accurate as the season progresses -- here's a comparison between Boz's estimate of final season Win% and the final Win% if you simply extrapolated from the present win% (that is, if you assumed your .600 record at game# 40 would produce a season ending record of .600):
Code: |
RMSE
Game# BOZ extrapolated
1-10 11.0 21.1
11-20 8.7 9.7
21-30 6.9 6.8
31-40 5.3 5.1
41-50 4.1 4.0
51-60 3.1 3.1
61-70 2.2 2.2
71-80 1.3 1.3
|
Pretty neat. _________________ ed |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Mon May 09, 2005 4:30 pm Post subject: |
|
|
Ed Küpfer wrote: | Code: |
RMSE
Game# BOZ extrapolated
1-10 11.0 21.1
11-20 8.7 9.7
21-30 6.9 6.8
31-40 5.3 5.1
41-50 4.1 4.0
51-60 3.1 3.1
61-70 2.2 2.2
71-80 1.3 1.3
|
Pretty neat. |
Hey, wait a minute -- that's no good! It only does better at games 1-20. Hmm. Maybe I can screw around with the exponent. _________________ ed |
|
|
|
Dan Rosenbaum
Joined: 03 Jan 2005 Posts: 413 Location: Greensboro, North Carolina
|
Posted: Mon May 09, 2005 4:59 pm Post subject: |
|
|
Hey Ed, I know things are slow around here once in a while, but it kind of makes us all look bad if you have to argue with yourself. |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Sun May 15, 2005 2:56 pm Post subject: |
|
|
A linear weights-style estimator for RTG, based on the four factors. I have no idea what it's good for, but anyway:
Code: | RTG = (1 + 5 * EFG% + FTA% - 4 * TO% + OR%) * 31
where
EFG% = (FGM + .5 * 3M) / FGA
FTA% = FTA / POSS
TO% = TO / POSS
OR% = OR / (OR + OppDR)
and POSS = FGA + FTA * 0.44 - OR + TO |
Even these simplified weights -- 1, 5, 1, -4, 1 -- are very accurate: RMSE is 0.7 points per 100 possessions. Adding three decimal places only decreases the RMSE to 0.5.
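A Python sketch of the simplified RTG estimator, working from team box score totals. Names are made up for illustration; the definitions of the four factors and POSS are the ones given in the code box above.

```python
def possessions(fga, fta, oreb, to):
    """POSS = FGA + FTA * 0.44 - OR + TO."""
    return fga + 0.44 * fta - oreb + to


def rtg_from_factors(efg, fta_rate, to_rate, or_pct):
    """RTG = (1 + 5*EFG% + FTA% - 4*TO% + OR%) * 31, inputs as fractions."""
    return (1 + 5 * efg + fta_rate - 4 * to_rate + or_pct) * 31


def rtg_from_box(fgm, three_m, fga, fta, oreb, to, opp_dreb):
    """Compute the four factors from totals, then apply the weights."""
    poss = possessions(fga, fta, oreb, to)
    efg = (fgm + 0.5 * three_m) / fga
    or_pct = oreb / (oreb + opp_dreb)
    return rtg_from_factors(efg, fta / poss, to / poss, or_pct)


# Hypothetical single-game team totals.
game_rtg = rtg_from_box(fgm=38, three_m=6, fga=80, fta=25,
                        oreb=12, to=14, opp_dreb=30)
```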
One further note: I have found that standardising the four factors isn't helpful. I have tried standardising by the season (ie using the season by season mean and SD) and also by the entire sample, and neither has improved upon the accuracy of the raw stats. I don't know why this is.
And while I'm at it, a four factors-based WIN% estimator:
Code: |
WIN% = 1 / ( 1 + EXP (bX) )
where bX = - ( 20 * oEFG%
+ 5 * oFTA%
- 16 * oTO%
+ 6 * oOR%
- 20 * dEFG%
- 5 * dFTA%
+ 16 * dTO%
- 6 * dOR% ) |
This too is very accurate, with an RMSE of 3.5 games in an 82-game season, compared to an RMSE of 3.0 games for a pythagorean estimate based on exponents customised by season. _________________ ed |
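The four factors WIN% estimator as a Python sketch. Because the offensive and defensive weights mirror each other, the exponent can be written as weighted differences, which is algebraically the same as the bX expression above. Function and argument names are hypothetical; inputs are fractions.

```python
import math


def four_factor_win_pct(o_efg, o_fta, o_to, o_or,
                        d_efg, d_fta, d_to, d_or):
    """WIN% = 1 / (1 + EXP(bX)), with bX = -(weighted four-factor edge)."""
    edge = (20 * (o_efg - d_efg)
            + 5 * (o_fta - d_fta)
            - 16 * (o_to - d_to)
            + 6 * (o_or - d_or))
    return 1.0 / (1.0 + math.exp(-edge))


# A team that is average everywhere except a 2-point eFG% edge each way.
win_pct = four_factor_win_pct(0.51, 0.30, 0.15, 0.29,
                              0.49, 0.30, 0.15, 0.29)
expected_wins = 82 * win_pct
```

A team with identical offensive and defensive factors comes out at exactly .500.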
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Thu Feb 09, 2006 8:17 pm Post subject: |
|
|
After fuch mucking around, I finally came up with a logistic within-game win estimator.
Code: | p(Home Team Win) = 1/(1+exp(-(0.13 + 0.12 * HomeTmLead + 0.0044 * HomeTmLead2 + 0.0068 * MinutesRemaining)))
HomeTmLead = Home Team lead
HomeTmLead2 = HomeTmLead * ABS(HomeTmLead)
MinutesRemaining = Minutes remaining in the game. |
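The estimator above as a Python function, a sketch only. The function name is made up, and the coefficients are copied as posted; they are expected to change as more data comes in.

```python
import math


def home_win_prob(home_lead, minutes_remaining):
    """Logistic within-game estimate of p(Home Team Win)."""
    # Signed square of the lead, so deficits push the estimate down.
    lead_sq = home_lead * abs(home_lead)
    z = (0.13 + 0.12 * home_lead + 0.0044 * lead_sq
         + 0.0068 * minutes_remaining)
    return 1.0 / (1.0 + math.exp(-z))


# Tip-off: tie game, 48 minutes remaining.
tipoff = home_win_prob(0, 48)
```

For an overtime situation, pass the minutes left in the overtime period as `minutes_remaining`.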
Overtimes treated like the final five minutes of the 4th quarter, and the fourth quarter of overtime games treated as if they ended in regulation. What? What I mean is this: if a game goes to overtime, and the home team leads by five with two minutes left in the overtime, that situation is dealt with exactly the same as if it were a five point lead with two minutes remaining in the 4th quarter.
WARNING: I have not tested this at the <1 minute remaining level. In fact, I used times rounded off to the nearest minute, so I'm not exactly sure how well the equation performs during the final minutes of games, when time slows down.
The equation above gives a very nice fit to my data (r^2 = .96), which includes about 110,000 observations from almost every game in 04-05. I'm adding more observations soon, so the coefficients will be updated a little, and I'll post my methodology at that time. I'm not quite sure if logit is the way to go at the <1 minute remaining spots, but I'll see about that later. _________________ ed |
|
|
|
mtamada
Joined: 28 Jan 2005 Posts: 127
|
Posted: Thu Feb 09, 2006 9:10 pm Post subject: |
|
|
Fascinating stuff, but is there a typo in the formula? The coefficient on MinutesRemaining is positive, so the larger the number of minutes remaining, the more negative the argument to the exponentiation function, so the smaller the denominator, and the higher the probability of a Home Team Win.
That surely can't be right; a home team with a 10 point lead with 1 minute left should have a near guarantee of victory (I get 85.4% from your formula, assuming that I typed it in correctly). As the MinutesRemaining gets larger, shouldn't we see a decrease, not an increase, in the probability of this home team winning (while still remaining substantially above 50%)? Also 85.4% seems way too low, unless the opponent has Reggie Miller or Isiah Thomas on one of their heroic playoff rampages.
My in-my-head calculations in the first paragraph, as well as the numbers in my spreadsheet (again, assuming that I haven't made any typos) both show an ever-growing probability of victory as MinutesRemaining INCREASES. That makes perfect sense for a team that is behind, but not for a team that is ahead. |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 522 Location: Toronto
|
Posted: Thu Feb 09, 2006 9:33 pm Post subject: |
|
|
I'm pretty sure your calculations are correct. The problem is, as you can see, that the logit fit breaks down at the extremes. It's hard for me to tell exactly what's going on here as these areas are represented by only a few observations. In a previous effort, I also had real problems fitting a logistic curve to the extreme areas in both Points Difference and Time Remaining. All I can say for now is, hang on. I'm still accumulating data. If the fit is still poor, I may have to mix curves somehow. I'll keep everyone posted. _________________ ed |
|
|
|
mtamada
Joined: 28 Jan 2005 Posts: 127
|
Posted: Thu Feb 09, 2006 10:33 pm Post subject: |
|
|
Maybe some pure curve-fitting technique, such as cubic splines, is the way to go.
http://mathews.ecs.fullerton.edu/n2003/CubicSplinesMod.html
http://www.zoology.ubc.ca/~schluter/splines.html
Using your formula, I was looking for the combinations at which a team would have a 90% probability of winning. If I typed in the formula correctly, a home team with a 12 point lead always has a better than 90% chance of winning, while a team with a lead of 11 or less never has a 90% probability of winning. The first part might be plausible (but not the way the probability rises with increased MinutesRemaining), but not the second.
A friend of mine once said that he used the following as a rule of thumb: if "m" is the number of minutes remaining, then a team with a lead of 2m+7 points is practically guaranteed to win the game ("game in the refrigerator" in Chick Hearn-speak). I don't know if his rule of thumb is a good one or not, but in trying to calibrate it against your formula, I found the patterns that I've pointed out, the ones which don't seem realistic. But if you've got the data and can present, say, a table of "MinutesRemaining" and "Points Ahead" stats and the corresponding observed probabilities of winning, maybe I'll discover that the true probabilities are not what I think they should be. |
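For anyone who wants to repeat the calibration, here is a small Python sketch that puts the 2m+7 rule next to the posted logistic formula. The function names are made up, the coefficients are the ones from Ed's post, and the choice of minutes to tabulate is arbitrary.

```python
import math


def home_win_prob(home_lead, minutes_remaining):
    """Ed's posted logistic estimator, coefficients as given in the thread."""
    z = (0.13 + 0.12 * home_lead
         + 0.0044 * home_lead * abs(home_lead)
         + 0.0068 * minutes_remaining)
    return 1.0 / (1.0 + math.exp(-z))


def refrigerator_lead(minutes_remaining):
    """Lead that the 2m+7 rule of thumb treats as practically safe."""
    return 2 * minutes_remaining + 7


# (minutes remaining, "safe" lead, formula's win probability at that lead)
table = [(m, refrigerator_lead(m), home_win_prob(refrigerator_lead(m), m))
         for m in (1, 3, 6, 12)]
```

Comparing the last column across rows shows how far the fitted curve is from "practically guaranteed" at short times remaining, which is the mismatch described above.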
|
|
|
|