jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 11:17 am Post subject: B-R updates (plus a question) |
|
|
I posted this to APBR_analysis, so my apologies to those who see this twice.
I wanted to let everyone know that I recently made some updates to Basketball-Reference.com. One major change includes adding Dean Oliver's Offensive Rating, Defensive Rating, Player Wins, and Player Losses to the player pages. Please see John Stockton's page for an example:
http://www.basketball-reference.com/players/s/stockjo01.html
The statistics mentioned are in the "Other" section of the player pages.
I also have a strange question: How is APBRmetrics pronounced? I've been reading it as "app-burr-metrics" in my head; is that correct? _________________ Regards,
Justin Kubatko
Basketball-Reference.com |
|
kjb
Joined: 03 Jan 2005 Posts: 865 Location: Washington, DC
|
Posted: Wed Jan 05, 2005 11:47 am Post subject: |
|
|
I have a question about the Individual wins and losses section -- which I think is a great idea to include. But, it looks like you're saying that Stockton's career record is 184 wins and 4 losses, which doesn't seem compatible with the year-by-year records. |
|
jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 11:54 am Post subject: |
|
|
WizardsKev wrote: | I have a question about the Individual wins and losses section -- which I think is a great idea to include. But, it looks like you're saying that Stockton's career record is 184 wins and 4 losses, which doesn't seem compatible with the year-by-year records. |
It's 184-24. Please check out the glossary:
http://www.basketball-reference.com/about/glossary.html
If that doesn't clear things up then please let me know. _________________ Regards,
Justin Kubatko
Basketball-Reference.com |
|
kjb
Joined: 03 Jan 2005 Posts: 865 Location: Washington, DC
|
Posted: Wed Jan 05, 2005 12:50 pm Post subject: |
|
|
jkubatko wrote: | WizardsKev wrote: | I have a question about the Individual wins and losses section -- which I think is a great idea to include. But, it looks like you're saying that Stockton's career record is 184 wins and 4 losses, which doesn't seem compatible with the year-by-year records. |
It's 184-24. Please check out the glossary:
http://www.basketball-reference.com/about/glossary.html
If that doesn't clear things up then please let me know. |
184-24 makes more sense, but when I load the page, the 2 in "24" is missing.
Like I said, I think it's a great idea to include these. I've begun computing them for this season -- it's great to have them available for other seasons.
Sorta related query -- how is the exponent in the Pythagorean formula arrived at? Dean once used 16.5. It's now been adjusted down to 14, and I've seen some suggestions that the "best" number may actually be lower. Is there a formula that can spit out a value based on pace, or is it just trying numbers until you get one that "works"? |
|
kjb
Joined: 03 Jan 2005 Posts: 865 Location: Washington, DC
|
Posted: Wed Jan 05, 2005 1:06 pm Post subject: |
|
|
Ignore what I just posted. I was looking ALL THE WAY at the bottom under the "Player Wins" section. I now realize that it's merely a listing of wins with what I think is his rank in the league for that season in personal wins. I'm guessing that the 184.0-4 at the end is saying that Stockton had 184 personal wins, which was 4th most in the league during his career. Correct? |
|
jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 1:17 pm Post subject: |
|
|
Quote: | 184-24 makes more sense, but when I load the page, the 2 in "24" is missing. |
That's really weird. I checked it on Firefox and IE and they both displayed 23.9 for career losses. What browser are you using?
Quote: | Sorta related query -- how is the exponent in the Pythagorean formula arrived at? Dean once used 16.5. It's now been adjusted down to 14, and I've seen some suggestions that the "best" number may actually be lower. Is there a formula that can spit out a value based on pace, or is it just trying numbers until you get one that "works"? |
The functional form of my model is:
Code: | log(WPct / (1 - WPct)) = B1*log(tmPTS / oppPTS) |
where WPct = Team Winning Percentage, tmPTS = Team Points Scored, and oppPTS = Opponent Points Scored. Fitting this model to numerous random samples of team-seasons, the estimated value of the parameter B1 is always around 14. Substituting into the formula above and solving for WPct yields:
Code: | WPct = exp(14*log(tmPTS / oppPTS)) / (1 + exp(14*log(tmPTS / oppPTS))) |
which simplifies to:
Code: | WPct = tmPTS^14 / (tmPTS^14 + oppPTS^14) |
I think the exponent that works best at a particular time depends on the scoring environment. I have found 14 to work well for almost all environments. _________________ Regards,
Justin Kubatko
Basketball-Reference.com |
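For anyone who wants to try the fit described in the post above, here is a minimal Python sketch. The post does not say which fitting method was used, so ordinary least squares through the origin is an assumption here, and the names `fit_b1`, `wpct_logistic`, and `wpct_pythagorean` are made up for illustration. The last lines check numerically that the logistic form and the power form give the same value.

```python
import math

def fit_b1(teams):
    """Least-squares slope through the origin for
    log(WPct / (1 - WPct)) = B1 * log(tmPTS / oppPTS).

    teams: iterable of (wpct, tm_pts, opp_pts). wpct must be strictly
    between 0 and 1 so that the log-odds are defined.
    Assumption: plain OLS with no intercept term.
    """
    sxy = sxx = 0.0
    for wpct, tm_pts, opp_pts in teams:
        x = math.log(tm_pts / opp_pts)
        y = math.log(wpct / (1.0 - wpct))
        sxy += x * y
        sxx += x * x
    return sxy / sxx

def wpct_logistic(tm_pts, opp_pts, b1=14.0):
    """WPct = exp(B1*log(ratio)) / (1 + exp(B1*log(ratio)))."""
    z = math.exp(b1 * math.log(tm_pts / opp_pts))
    return z / (1.0 + z)

def wpct_pythagorean(tm_pts, opp_pts, b1=14.0):
    """WPct = tmPTS^B1 / (tmPTS^B1 + oppPTS^B1)."""
    return tm_pts ** b1 / (tm_pts ** b1 + opp_pts ** b1)

# 1950 Anderson point totals, taken from the table later in this thread.
print(wpct_logistic(5589, 5346))
print(wpct_pythagorean(5589, 5346))
```

Both calls print the same value (about 0.651), which is the algebraic simplification from the post in numeric form.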
|
jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 1:18 pm Post subject: |
|
|
WizardsKev wrote: | Ignore what I just posted. I was looking ALL THE WAY at the bottom under the "Player Wins" section. I now realize that it's merely a listing of wins with what I think is his rank in the league for that season in personal wins. I'm guessing that the 184.0-4 at the end is saying that Stockton had 184 personal wins, which was 4th most in the league during his career. Correct? |
Yes, you got it. The Leaderboards section shows Year-Lg-Value-Rank. _________________ Regards,
Justin Kubatko
Basketball-Reference.com |
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 786 Location: Toronto
|
Posted: Wed Jan 05, 2005 3:30 pm Post subject: |
|
|
jkubatko wrote: | Code: | WPct = tmPTS^14 / (tmPTS^14 + oppPTS^14) |
I think the exponent that works best at a particular time depends on the scoring environment. I have found 14 to work well for almost all environments. |
I've done a lot of work on win estimators (I think I'm the one who originally suggested the 14 exponent). The best one so far is known as PythagoPat or something like that. The equation stays the same, but the exponent is
Code: | ((OffPts + DefPts)^0.27) |
which adjusts for the scoring environment. (Read more than you'd ever want to know about win estimators at Patriot's website.)
I may as well post these here. Here are the battered remains of an unfinished study I did once.
Code: |
Pyth14 - A^14 / (A^14 + B^14)
Pyth16.5 - A^16.5 / (A^16.5 + B^16.5)
PythagoPat - A^[(A + B)^0.27] / {A^[(A + B)^0.27] + B^[(A + B)^0.27]}
BBPro - [(A - B)/Games * 2.7 + 41]/82
CorrGauss - NORMSDIST{(A - B) / SQRT [VAR(A) + VAR(B) - 2*COVAR(A,B)]}
where A = Offensive points per game and B = Defensive points per game |
[Graph: RMSE of each win estimator over time]
[Graph: Mean absolute deviation of each win estimator, by actual team win percentage]
You can see that DeanO's Correlated Gaussian estimator outperforms the rest, with PythagoPat coming in second. Hollinger's estimator does surprisingly (to me) well. The differences are hardly worth fretting over, I think. _________________ ed |
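The five estimators listed in the post above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Ed's original code, and the function names are made up. One assumption matters a lot: the PythagoPat exponent is computed here from season point totals. With combined season totals of about 16,400 points, (A + B)^0.27 comes out near 13.7, close to the fixed exponent of 14, whereas per-game figures summing to 200 would give only about 4.2. The fixed-exponent formulas give the same answer either way. For the Correlated Gaussian estimator, VAR(A) + VAR(B) - 2*COVAR(A,B) is the variance of the per-game margin, so the sketch works from game-by-game margins directly, using the sample (n - 1) variance as a further assumption.

```python
from statistics import NormalDist, mean, stdev

def pyth(pts_for, pts_against, exponent):
    """Fixed-exponent Pythagorean estimate (Pyth14, Pyth16.5)."""
    f, a = float(pts_for), float(pts_against)
    return f ** exponent / (f ** exponent + a ** exponent)

def pythagopat(total_for, total_against):
    """Pythagorean estimate with exponent (A + B)^0.27.
    Assumption: A and B are season point totals, not per-game averages."""
    exponent = (total_for + total_against) ** 0.27
    return pyth(total_for, total_against, exponent)

def bbpro(total_for, total_against, games):
    """Linear estimate: 2.7 wins per point of per-game margin, centered on 41."""
    return ((total_for - total_against) / games * 2.7 + 41) / 82

def corr_gauss(team_scores, opp_scores):
    """Normal CDF of the mean margin divided by the margin's standard deviation.
    Inputs are game-by-game scores. Assumption: sample (n - 1) variance."""
    margins = [t - o for t, o in zip(team_scores, opp_scores)]
    return NormalDist().cdf(mean(margins) / stdev(margins))

# 1950 Anderson totals from the table later in this thread.
print(pyth(5589, 5346, 14), pyth(5589, 5346, 16.5), pythagopat(5589, 5346))
```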
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 786 Location: Toronto
|
Posted: Wed Jan 05, 2005 3:39 pm Post subject: Re: B-R updates (plus a question) |
|
|
jkubatko wrote: | I also have a strange question: How is APBRmetrics pronounced? I've been reading it as "app-burr-metrics" in my head; is that correct? |
I don't know about "correct," but that's how I've been saying it in my head. There's got to be a better term for what we're doing. _________________ ed |
|
jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 4:37 pm Post subject: |
|
|
Quote: | I've done a lot of work on win estimators (I think I'm the one who originally suggested the 14 exponent). |
I didn't know about your work. Did you come about it the same way I did?
Quote: | You can see that DeanO's Correlated Gaussian estimator outperforms the rest, with PythagoPat coming in second. Hollinger's estimator does surprisingly (to me) well. The differences are hardly worth fretting over, I think. |
I agree. Dean Oliver's method is nice, but you need game-by-game scores in order to calculate it. (I have them, but most people don't.)
Looking at the RMSEs you presented in the graph above, I'm wondering if they're a little too high. For example, using an exponent of 14 for all team-seasons in the 1950s, I get an RMSE of 3.18 wins. Your graph shows an RMSE of roughly 4.5 for this time period. _________________ Regards,
Justin Kubatko
Basketball-Reference.com |
|
jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 4:57 pm Post subject: |
|
|
FYI, here are the RMSEs I get for each decade:
Code: |
Decade   Pyth14   Pyth16.5   PythagoPat
1940s    3.424    4.558      2.537
1950s    3.184    4.033      2.871
1960s    3.473    3.276      3.430
1970s    3.168    3.452      3.159
1980s    3.088    3.075      3.067
1990s    3.126    3.561      3.149
2000s    2.579    3.324      2.537
|
_________________ Regards,
Justin Kubatko
Basketball-Reference.com |
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 786 Location: Toronto
|
Posted: Wed Jan 05, 2005 5:02 pm Post subject: |
|
|
jkubatko wrote: | Looking at the RMSEs you presented in the graph above, I'm wondering if they're a little too high. For example, using an exponent of 14 for all team-seasons in the 1950s, I get an RMSE of 3.18 wins. Your graph shows an RMSE of roughly 4.5 for this time period. |
To tell the truth, I don't know why they're so high. It could be a data problem, but I deleted all that and am left only with the graphs. Another difference between your RMSE and mine is that I divided the squared errors by (n - 1), not (n), which would raise the number a little. Another difference is that I multiplied every team-season RMSE by 82, which would raise the figures for some of those 60-game seasons in the 1950s.
I'll repeat the process quickly on the 1950 season:
Code: |
TEAM      oPTS   dPTS   Win%    Pyth14   err      err^2
1950AND   5589   5346   0.578   0.651     0.073   0.0053
1950BAB   4973   5353   0.368   0.263    -0.105   0.0110
1950BOS   5420   5590   0.324   0.394     0.070   0.0049
1950CHS   5352   5243   0.588   0.572    -0.017   0.0003
1950DNN   4817   5530   0.177   0.126    -0.051   0.0026
1950FTW   5390   5297   0.588   0.561    -0.028   0.0008
1950INO   5493   5256   0.609   0.650     0.040   0.0016
1950MNL   5717   5150   0.750   0.812     0.062   0.0038
1950NY    5488   5344   0.588   0.592     0.004   0.0000
1950PHW   4983   5194   0.382   0.359    -0.024   0.0006
1950ROC   5602   5074   0.750   0.800     0.050   0.0025
1950SHE   5108   5443   0.355   0.291    -0.064   0.0040
1950SLB   5010   5202   0.382   0.371    -0.011   0.0001
1950SYR   5429   4908   0.797   0.804     0.007   0.0001
1950TRI   5313   5351   0.453   0.475     0.022   0.0005
1950WAT   4921   5264   0.306   0.280    -0.026   0.0007
1950WCP   5201   5265   0.471   0.457    -0.013   0.0002
SUM/(N-1)  = 0.0024
SQRT       = 0.0493
* 82 games = 4.04 |
Still a little low, but at least back in the ballpark. What can I say? _________________ ed |
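The 1950 calculation in the post above can be reproduced with a short Python sketch. The team rows are copied from that table; the names `SEASON_1950`, `pyth14`, and `rmse_scaled` are made up for illustration. Because the Win% column is rounded to three decimals, the result should land close to, but not necessarily exactly on, the 4.04 shown in the table.

```python
import math

# (team, points scored, points allowed, actual Win%) from the 1950 table.
SEASON_1950 = [
    ("AND", 5589, 5346, 0.578), ("BAB", 4973, 5353, 0.368),
    ("BOS", 5420, 5590, 0.324), ("CHS", 5352, 5243, 0.588),
    ("DNN", 4817, 5530, 0.177), ("FTW", 5390, 5297, 0.588),
    ("INO", 5493, 5256, 0.609), ("MNL", 5717, 5150, 0.750),
    ("NY", 5488, 5344, 0.588), ("PHW", 4983, 5194, 0.382),
    ("ROC", 5602, 5074, 0.750), ("SHE", 5108, 5443, 0.355),
    ("SLB", 5010, 5202, 0.382), ("SYR", 5429, 4908, 0.797),
    ("TRI", 5313, 5351, 0.453), ("WAT", 4921, 5264, 0.306),
    ("WCP", 5201, 5265, 0.471),
]

def pyth14(pts_for, pts_against):
    """Pythagorean expected winning percentage with exponent 14."""
    return pts_for ** 14 / (pts_for ** 14 + pts_against ** 14)

def rmse_scaled(rows, games=82):
    """RMSE of (Pyth14 - Win%), dividing by n - 1, then scaled to `games`."""
    sq_errors = [(pyth14(pf, pa) - wpct) ** 2 for _, pf, pa, wpct in rows]
    return math.sqrt(sum(sq_errors) / (len(sq_errors) - 1)) * games

print(round(rmse_scaled(SEASON_1950), 2))
```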
|
jkubatko
Joined: 05 Jan 2005 Posts: 702 Location: Columbus, OH
|
Posted: Wed Jan 05, 2005 5:19 pm Post subject: |
|
|
Quote: | Another difference is that I multiplied every team-season RMSE by 82, which would raise the figures for some of those 60-game seasons in the 1950s. |
Okay, that has to be it. I find the squared differences between actual wins and expected wins, then calculate the RMSE from those figures. We're doing it two different ways. (Your way is probably better.) _________________ Regards,
Justin Kubatko
Basketball-Reference.com |
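To make the difference between the two approaches concrete, here is a small Python sketch. The team-seasons below are hypothetical numbers chosen only for illustration, the function names are made up, and dividing by n in the wins-based version is an assumption (the post does not say which divisor was used).

```python
import math

def rmse_in_wins(actual_wpct, expected_wpct, games, ddof=0):
    """Square the gap between actual and expected wins for each team-season,
    using that season's own schedule length, then take the root mean."""
    sq = [((a - e) * g) ** 2
          for a, e, g in zip(actual_wpct, expected_wpct, games)]
    return math.sqrt(sum(sq) / (len(sq) - ddof))

def rmse_scaled_to_82(actual_wpct, expected_wpct, ddof=1):
    """Compute the RMSE in winning-percentage terms, then multiply by 82."""
    sq = [(a - e) ** 2 for a, e in zip(actual_wpct, expected_wpct)]
    return math.sqrt(sum(sq) / (len(sq) - ddof)) * 82

# Hypothetical team-seasons, all with 60-game schedules (illustrative only).
actual = [0.600, 0.450, 0.700, 0.300]
expected = [0.640, 0.420, 0.660, 0.350]
games = [60, 60, 60, 60]

print(rmse_in_wins(actual, expected, games))
print(rmse_scaled_to_82(actual, expected))
```

With the same divisor, the scaled figure is exactly 82/60 times the wins-based figure for 60-game seasons, and dividing by n - 1 instead of n pushes it up a little further.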
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 786 Location: Toronto
|
Posted: Wed Jan 05, 2005 5:37 pm Post subject: |
|
|
jkubatko wrote: | Quote: | I've done a lot of work on win estimators (I think I'm the one who originally suggested the 14 exponent). |
I didn't know about your work. Did you come about it the same way I did? |
Hah! No, I worked it out the old-fashioned, non-technical way: by manually plugging in numbers until the errors were minimized. Logarithms scared me to death (still do). I am only now coming to terms with logistic regression, which I've been using to find matchup probabilities (i.e., Pr(team A beating team B)), and which gives slightly better results than the old log5 method. That will be the subject of a post in the near future, I think. _________________ ed |
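For reference, the log5 method mentioned in the post above is usually written as follows. This is the standard log5 formula as commonly attributed to Bill James, not anything taken from Ed's unpublished logistic-regression work, and the function name is made up for illustration.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's overall
    winning percentage against the league."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

# A .750 team against a .500 team is expected to win 75% of the time.
print(log5(0.75, 0.5))
```

The formula is symmetric in the sense that `log5(p_a, p_b) + log5(p_b, p_a)` equals 1, and a team facing a .500 opponent keeps its own winning percentage.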
|
Kevin Pelton Site Admin
Joined: 30 Dec 2004 Posts: 979 Location: Seattle
|
Posted: Thu Jan 06, 2005 12:00 am Post subject: |
|
|
1. The difference is small enough that I favor John's point-differential method. Though I do feel bad for Pythagoras.
2. APE-burr-metrics. If you guys think I'm changing the name of this place (well, at least the URL), you're way off.
3. I just want to publicly thank Justin for doing an outstanding job with B-R.com. The things you've added are really producing a great deal of added value for the site and making a great resource for us apbrmetricians. |
|