This is Google's cache of viewtopic.php?t=12&sid=1a48f98a4d90d89b0c39466c1383d93c. It is a snapshot of the page as it appeared on Apr 2, 2011 04:32:00 GMT. The current page could have changed in the meantime.

APBRmetrics :: View topic - B-R updates (plus a question)
APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

B-R updates (plus a question)
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 11:17 am    Post subject: B-R updates (plus a question)

I posted this to APBR_analysis, so my apologies to those who see this twice.

I wanted to let everyone know that I recently made some updates to Basketball-Reference.com. One major change includes adding Dean Oliver's Offensive Rating, Defensive Rating, Player Wins, and Player Losses to the player pages. Please see John Stockton's page for an example:

http://www.basketball-reference.com/players/s/stockjo01.html

The statistics mentioned are in the "Other" section of the player pages.

I also have a strange question: How is APBRmetrics pronounced? I've been reading it as "app-burr-metrics" in my head; is that correct?
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
kjb



Joined: 03 Jan 2005
Posts: 865
Location: Washington, DC

Posted: Wed Jan 05, 2005 11:47 am

I have a question about the Individual wins and losses section -- which I think is a great idea to include. But, it looks like you're saying that Stockton's career record is 184 wins and 4 losses, which doesn't seem compatible with the year-by-year records.
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 11:54 am

WizardsKev wrote:
I have a question about the Individual wins and losses section -- which I think is a great idea to include. But, it looks like you're saying that Stockton's career record is 184 wins and 4 losses, which doesn't seem compatible with the year-by-year records.


It's 184-24. Please check out the glossary:

http://www.basketball-reference.com/about/glossary.html

If that doesn't clear things up then please let me know.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
kjb



Joined: 03 Jan 2005
Posts: 865
Location: Washington, DC

Posted: Wed Jan 05, 2005 12:50 pm

jkubatko wrote:
WizardsKev wrote:
I have a question about the Individual wins and losses section -- which I think is a great idea to include. But, it looks like you're saying that Stockton's career record is 184 wins and 4 losses, which doesn't seem compatible with the year-by-year records.


It's 184-24. Please check out the glossary:

http://www.basketball-reference.com/about/glossary.html

If that doesn't clear things up then please let me know.


184-24 makes more sense, but when I load the page, the 2 in "24" is missing.

Like I said, I think it's a great idea to include these. I've begun computing them for this season -- it's great to have them available for other seasons.

Sorta related query -- how is the exponent in the Pythagorean formula arrived at? Dean once used 16.5. It's now been adjusted down to 14, and I've seen some suggestions that the "best" number may actually be lower. Is there a formula that can spit out a value based on pace, or is it just trying numbers until you get one that "works"?
kjb



Joined: 03 Jan 2005
Posts: 865
Location: Washington, DC

Posted: Wed Jan 05, 2005 1:06 pm

Ignore what I just posted. :) I was looking ALL THE WAY at the bottom under the "Player Wins" section. I now realize that it's merely a listing of wins with what I think is his rank in the league for that season in personal wins. I'm guessing that the 184.0-4 at the end is saying that Stockton had 184 personal wins, which was 4th most in the league during his career. Correct?
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 1:17 pm

Quote:
184-24 makes more sense, but when I load the page, the 2 in "24" is missing.


That's really weird. I checked it on Firefox and IE and they both displayed 23.9 for career losses. What browser are you using?

Quote:
Sorta related query -- how is the exponent in the Pythagorean formula arrived at? Dean once used 16.5. It's now been adjusted down to 14, and I've seen some suggestions that the "best" number may actually be lower. Is there a formula that can spit out a value based on pace, or is it just trying numbers until you get one that "works"?


The functional form of my model is:

Code:
log(WPct / (1 - WPct)) = B1*log(tmPTS / oppPTS)


where WPct = Team Winning Percentage, tmPTS = Team Points Scored, and oppPTS = Opponent Points Scored. When I fit this model to numerous random samples of team-seasons, the estimated value of the parameter B1 is always around 14. Substituting into the formula above and solving for WPct yields:

Code:
WPct = exp(14*log(tmPTS / oppPTS)) / (1 + exp(14*log(tmPTS / oppPTS)))


which simplifies to:

Code:
WPct = tmPTS^14 / (tmPTS^14 + oppPTS^14)


I think the exponent that works best at a particular time depends on the scoring environment. I have found 14 to work well for almost all environments.
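
A minimal Python sketch of the algebra above. It checks numerically that the logistic form and the power form give the same WPct, and shows one simple way to estimate B1: least squares through the origin on the log-odds. The fitting method, the function names, and the sample seasons are assumptions for illustration; the post does not say how the model was actually fit, and the seasons here are synthetic, generated with a known exponent of 14.

```python
import math

def pyth_power(tm_pts, opp_pts, exponent=14.0):
    """Power form: tmPTS^k / (tmPTS^k + oppPTS^k)."""
    return tm_pts ** exponent / (tm_pts ** exponent + opp_pts ** exponent)

def pyth_logistic(tm_pts, opp_pts, b1=14.0):
    """Logistic form: exp(z) / (1 + exp(z)) with z = B1 * log(tmPTS / oppPTS)."""
    z = b1 * math.log(tm_pts / opp_pts)
    return math.exp(z) / (1.0 + math.exp(z))

def fit_b1(seasons):
    """Estimate B1 by least squares through the origin.

    seasons: iterable of (win_pct, tm_pts, opp_pts) with 0 < win_pct < 1.
    Regresses log(WPct / (1 - WPct)) on log(tmPTS / oppPTS) with no intercept.
    """
    num = 0.0
    den = 0.0
    for win_pct, tm_pts, opp_pts in seasons:
        x = math.log(tm_pts / opp_pts)
        y = math.log(win_pct / (1.0 - win_pct))
        num += x * y
        den += x * x
    return num / den

# Synthetic team-seasons (not real data): points totals with win
# percentages generated from the power form using exponent 14.
points = [(8200, 8000), (7900, 8100), (8500, 8050), (7700, 8300)]
synthetic = [(pyth_power(tm, opp, 14.0), tm, opp) for tm, opp in points]

b1_hat = fit_b1(synthetic)
print(pyth_power(5589, 5346), pyth_logistic(5589, 5346))
print(b1_hat)
```

On real team-seasons the fitted value will vary from sample to sample, which is the point of fitting many random samples.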
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 1:18 pm

WizardsKev wrote:
Ignore what I just posted. :) I was looking ALL THE WAY at the bottom under the "Player Wins" section. I now realize that it's merely a listing of wins with what I think is his rank in the league for that season in personal wins. I'm guessing that the 184.0-4 at the end is saying that Stockton had 184 personal wins, which was 4th most in the league during his career. Correct?


Yes, you got it. The Leaderboards section shows Year-Lg-Value-Rank.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
Ed Küpfer



Joined: 30 Dec 2004
Posts: 786
Location: Toronto

Posted: Wed Jan 05, 2005 3:30 pm

jkubatko wrote:
Code:
WPct = tmPTS^14 / (tmPTS^14 + oppPTS^14)


I think the exponent that works best at a particular time depends on the scoring environment. I have found 14 to work well for almost all environments.


I've done a lot of work on win estimators (I think I'm the one who originally suggested the 14 exponent). The best one so far is known as PythagoPat or something like that. The equation stays the same, but the exponent is

Code:
((OffPts + DefPts)^0.27)

which adjusts for the scoring environment. (Read more than you'd ever want to know about win estimators at Patriot's website.)

I may as well post these here. Here are the battered remains of an unfinished study I did once.

Code:

Pyth14     - A^14 / (A^14 + B^14)
Pyth16.5   - A^16.5 / (A^16.5 + B^16.5)
PythagoPat - A^[(A + B)^0.27] / {A^[(A + B)^0.27] + B^[(A + B)^0.27]}
BBPro      - [(A - B)/Games * 2.7 + 41]/82
CorrGauss  - NORMSDIST{(A - B) / SQRT [VAR(A) + VAR(B) - 2*COVAR(A,B)]}

where A = Offensive points per game and B = Defensive points per game


The RMSE of each win estimator over time:
[graph not preserved]

The Mean Absolute Deviation, by actual win percentage of each team:
[graph not preserved]

You can see that DeanO's Correlated Gaussian estimator outperforms the rest, with PythagoPat coming in second. Hollinger's estimator does surprisingly (to me) well. The differences are hardly worth fretting over, I think.
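
The closed-form estimators listed above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code used for the study: the function names are invented here, the BBPro line divides by Games, so that function takes season point totals plus a games count, and the sample team is hypothetical. Note that the PythagoPat exponent (A + B)^0.27 changes with the scale of A and B, so the inputs must be on whatever scale the 0.27 was calibrated for.

```python
def pyth(a, b, exponent):
    """Fixed-exponent Pythagorean estimate: A^k / (A^k + B^k)."""
    return a ** exponent / (a ** exponent + b ** exponent)

def pythagopat(a, b, power=0.27):
    """Pythagorean estimate whose exponent is (A + B)^power, as written in the post."""
    return pyth(a, b, (a + b) ** power)

def bbpro(pts_for, pts_against, games):
    """[(A - B) / Games * 2.7 + 41] / 82, with A and B taken as season totals (assumption)."""
    margin_per_game = (pts_for - pts_against) / games
    return (margin_per_game * 2.7 + 41) / 82

# Hypothetical team: scores 100.0 and allows 97.0 per game over 82 games.
a, b, games = 100.0, 97.0, 82
print("Pyth14    ", pyth(a, b, 14))
print("Pyth16.5  ", pyth(a, b, 16.5))
print("PythagoPat", pythagopat(a, b))
print("BBPro     ", bbpro(a * games, b * games, games))
```

The Correlated Gaussian estimator is left out here because it needs game-by-game scores rather than season aggregates.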
_________________
ed
Ed Küpfer



Joined: 30 Dec 2004
Posts: 786
Location: Toronto

Posted: Wed Jan 05, 2005 3:39 pm    Post subject: Re: B-R updates (plus a question)

jkubatko wrote:
I also have a strange question: How is APBRmetrics pronounced? I've been reading it as "app-burr-metrics" in my head; is that correct?


I don't know about "correct," but that's how I've been saying it in my head. There's got to be a better term for what we're doing.
_________________
ed
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 4:37 pm

Quote:
I've done a lot of work on win estimators (I think I'm the one who originally suggested the 14 exponent).


I didn't know about your work. Did you come about it the same way I did?

Quote:
You can see that DeanO's Correlated Gaussian estimator outperforms the rest, with PythagoPat coming in second. Hollinger's estimator does surprisingly (to me) well. The differences are hardly worth fretting over, I think.


I agree. Dean Oliver's method is nice, but you need game-by-game scores in order to calculate it. (I have them, but most people don't.)
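
To make the data requirement concrete, here is a rough Python sketch of the Correlated Gaussian formula as listed earlier in the thread, NORMSDIST{(A - B) / SQRT[VAR(A) + VAR(B) - 2*COVAR(A,B)]}. It relies on the identity Var(A - B) = Var(A) + Var(B) - 2*Cov(A, B), so it works from the per-game margins directly and uses the sample variance throughout. The function name and the game scores are hypothetical, and this is not necessarily how Dean Oliver computes it.

```python
from statistics import NormalDist, mean, variance

def corr_gauss(pts_for, pts_against):
    """Estimate win percentage from game-by-game scores.

    pts_for / pts_against: points scored and allowed in each game, same order.
    Returns the standard normal CDF of mean margin / standard deviation of margin.
    """
    margins = [f - a for f, a in zip(pts_for, pts_against)]
    if len(margins) < 2:
        raise ValueError("need at least two games to estimate a variance")
    var_margin = variance(margins)  # equals Var(A) + Var(B) - 2*Cov(A, B)
    if var_margin == 0:
        raise ValueError("margins have zero variance")
    return NormalDist().cdf(mean(margins) / var_margin ** 0.5)

# Hypothetical five-game sample, for illustration only.
scored = [102, 95, 110, 99, 104]
allowed = [98, 97, 101, 99, 100]
print(corr_gauss(scored, allowed))
```

Season totals alone cannot supply the variance term, which is why this estimator needs the full game log.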

Looking at the RMSEs you presented in the graph above, I'm wondering if they're a little too high. For example, using an exponent of 14 for all team-seasons in the 1950s, I get an RMSE of 3.18 wins. Your graph shows an RMSE of roughly 4.5 for this time period.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 4:57 pm

FYI, here are the RMSEs I get for each decade:

Code:

Decade   Pyth 14  Pyth 16.5  Pyth Pat

1940s     3.424     4.558     2.537
1950s     3.184     4.033     2.871
1960s     3.473     3.276     3.430
1970s     3.168     3.452     3.159
1980s     3.088     3.075     3.067
1990s     3.126     3.561     3.149
2000s     2.579     3.324     2.537

_________________
Regards,
Justin Kubatko
Basketball-Reference.com
Ed Küpfer



Joined: 30 Dec 2004
Posts: 786
Location: Toronto

Posted: Wed Jan 05, 2005 5:02 pm

jkubatko wrote:
Looking at the RMSEs you presented in the graph above, I'm wondering if they're a little too high. For example, using an exponent of 14 for all team-seasons in the 1950s, I get an RMSE of 3.18 wins. Your graph shows an RMSE of roughly 4.5 for this time period.


To tell the truth, I don't know why they're so high. It could be a data problem, but I deleted all that and am left only with the graphs. One difference between your RMSE and mine is that I divided the squared errors by (n - 1), not (n), which would raise the number a little. Another difference is that I multiplied every team-season RMSE by 82, which would raise the figures for some of those 60-game seasons in the 1950s.

I'll repeat the process quickly on the 1950 season:

Code:
TEAM    oPTS    dPTS    Win%    Pyth14   err    err^2
1950AND 5589    5346    0.578   0.651   0.073  0.0053
1950BAB 4973    5353    0.368   0.263  -0.105  0.0110
1950BOS 5420    5590    0.324   0.394   0.070  0.0049
1950CHS 5352    5243    0.588   0.572  -0.017  0.0003
1950DNN 4817    5530    0.177   0.126  -0.051  0.0026
1950FTW 5390    5297    0.588   0.561  -0.028  0.0008
1950INO 5493    5256    0.609   0.650   0.040  0.0016
1950MNL 5717    5150    0.750   0.812   0.062  0.0038
1950NY  5488    5344    0.588   0.592   0.004  0.0000
1950PHW 4983    5194    0.382   0.359  -0.024  0.0006
1950ROC 5602    5074    0.750   0.800   0.050  0.0025
1950SHE 5108    5443    0.355   0.291  -0.064  0.0040
1950SLB 5010    5202    0.382   0.371  -0.011  0.0001
1950SYR 5429    4908    0.797   0.804   0.007  0.0001
1950TRI 5313    5351    0.453   0.475   0.022  0.0005
1950WAT 4921    5264    0.306   0.280  -0.026  0.0007
1950WCP 5201    5265    0.471   0.457  -0.013  0.0002
                                               
                                    SUM/(N-1)= 0.0024
                                    SQRT     = 0.0493

                                   * 82 games=   4.04

Still a little low, but at least back in the ballpark. What can I say?
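
For anyone who wants to rerun the 1950 check, a short Python sketch using the oPTS, dPTS, and Win% columns from the table above. The variable and function names are mine; the steps follow the table: Pyth14 error per team, sum of squared errors over (N - 1), square root, then times 82.

```python
import math

# (team, oPTS, dPTS, Win%) copied from the 1950 table above.
SEASON_1950 = [
    ("1950AND", 5589, 5346, 0.578), ("1950BAB", 4973, 5353, 0.368),
    ("1950BOS", 5420, 5590, 0.324), ("1950CHS", 5352, 5243, 0.588),
    ("1950DNN", 4817, 5530, 0.177), ("1950FTW", 5390, 5297, 0.588),
    ("1950INO", 5493, 5256, 0.609), ("1950MNL", 5717, 5150, 0.750),
    ("1950NY", 5488, 5344, 0.588), ("1950PHW", 4983, 5194, 0.382),
    ("1950ROC", 5602, 5074, 0.750), ("1950SHE", 5108, 5443, 0.355),
    ("1950SLB", 5010, 5202, 0.382), ("1950SYR", 5429, 4908, 0.797),
    ("1950TRI", 5313, 5351, 0.453), ("1950WAT", 4921, 5264, 0.306),
    ("1950WCP", 5201, 5265, 0.471),
]

def pyth14(pts_for, pts_against):
    """Pythagorean expected win percentage with exponent 14."""
    return pts_for ** 14 / (pts_for ** 14 + pts_against ** 14)

def rmse_per_82(rows):
    """Square root of SUM(err^2) / (N - 1), scaled to an 82-game season."""
    sq_errors = [(pyth14(o, d) - win_pct) ** 2 for _, o, d, win_pct in rows]
    return 82 * math.sqrt(sum(sq_errors) / (len(sq_errors) - 1))

rmse_wins = rmse_per_82(SEASON_1950)
print(round(rmse_wins, 2))
```

Because Win% is only given to three decimals, the last digit may differ slightly from the hand-computed figure in the table.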
_________________
ed
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Wed Jan 05, 2005 5:19 pm

Quote:
Another difference is that I multiplied every team-season RMSE by 82, which would raise the figures for some of those 60-game seasons in the 1950s.


Okay, that has to be it. I find the squared differences between actual wins and expected wins, then calculate the RMSE from those figures. We're doing it two different ways. (Your way is probably better.)
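
A small Python sketch of the two conventions side by side, to show why they disagree for short seasons. The team-seasons are hypothetical, the function names are invented here, and dividing by n in the wins-based version is an assumption, since the thread only says that one of us divided by (n - 1).

```python
import math

# Hypothetical 60-game team-seasons: (games, actual wins, expected win pct).
seasons = [(60, 36, 0.65), (60, 22, 0.30), (60, 45, 0.80), (60, 18, 0.26)]

def rmse_in_wins(rows):
    """Errors measured in actual wins over the games really played, divided by n."""
    sq = [(wins - games * exp_pct) ** 2 for games, wins, exp_pct in rows]
    return math.sqrt(sum(sq) / len(sq))

def rmse_scaled_to_82(rows):
    """Errors measured in win percentage, divided by (n - 1), then times 82."""
    sq = [(wins / games - exp_pct) ** 2 for games, wins, exp_pct in rows]
    return 82 * math.sqrt(sum(sq) / (len(sq) - 1))

print(rmse_in_wins(seasons))
print(rmse_scaled_to_82(seasons))
```

For seasons shorter than 82 games the scaled version is always the larger of the two, since both the 82/games factor and the (n - 1) divisor push it up.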
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
Ed Küpfer



Joined: 30 Dec 2004
Posts: 786
Location: Toronto

Posted: Wed Jan 05, 2005 5:37 pm

jkubatko wrote:
Quote:
I've done a lot of work on win estimators (I think I'm the one who originally suggested the 14 exponent).


I didn't know about your work. Did you come about it the same way I did?


Hah! No, I worked it out the old-fashioned, non-technical way: by manually plugging in numbers until the errors were minimized. Logarithms scared me to death (still do). I am only now coming to terms with logistic regression, which I've been using to find matchup probabilities (i.e., Pr(team A beating team B)), and which gives slightly better results than the old log5 method. That's the subject of a post in the near future, I think.
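
For reference, the log5 method mentioned above is usually written as Pr(A beats B) = (pA - pA*pB) / (pA + pB - 2*pA*pB), where pA and pB are the two teams' winning percentages. A minimal Python sketch follows; the function name and the sample percentages are illustrative, and the logistic-regression alternative is not shown because the thread gives no details of that model.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's win percentage."""
    denom = p_a + p_b - 2 * p_a * p_b
    if denom == 0:
        raise ValueError("undefined when both teams are at 0.0 or both at 1.0")
    return (p_a - p_a * p_b) / denom

# Hypothetical matchup: a .750 team against a .250 team.
print(log5(0.75, 0.25))
```

Against a .500 opponent the formula returns the team's own winning percentage, which is a quick sanity check.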
_________________
ed
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 979
Location: Seattle

Posted: Thu Jan 06, 2005 12:00 am

1. The difference is small enough that I favor John's point-differential method. Though I do feel bad for Pythagoras.

2. APE-burr-metrics. If you guys think I'm changing the name of this place (well, at least the URL), you're way off.

3. I just want to publicly thank Justin for doing an outstanding job with B-R.com. The things you've added are really producing a great deal of added value for the site and making a great resource for us apbrmetricians.
All times are GMT - 5 Hours
Page 1 of 3