APBRmetrics

Ilardi · Joined: 15 May 2008 Posts: 265 Location: Lawrence, KS

I've enjoyed seeing draft projections this year from John Hollinger (and others), but have been wondering about the noise level inherent in the underlying regression models. John, when you derived your projected 3rd-year PER ratings, what were the typical standard errors of estimate? In other words, if we constructed a 95% confidence interval around your projections (e.g., Beasley at a PER of 19.19), what would those intervals look like?

Any information you - or others - can provide on this issue would be appreciated.

bballfan72031 · Joined: 13 Feb 2005 Posts: 54

I've always thought using confidence intervals would be the best way to analyze prospects statistically.

I don't want to hijack the thread, but if anyone knows how to calculate confidence intervals in Excel (preferably on individual outputs in regression analysis, such as TREND), any help would be immensely appreciated.
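For reference, the interval around a single predicted value from a one-variable least-squares fit (the kind of output TREND returns) can be sketched as follows. This is a minimal illustration and not anyone's actual method in this thread: the function name `prediction_interval` and the sample numbers are made up, and the default critical value of 1.96 is a large-sample normal approximation. With small samples, the Student t quantile with n - 2 degrees of freedom should be passed in as `crit` instead.

```python
import math

def prediction_interval(xs, ys, x0, crit=1.96):
    """Approximate interval for one new observation at x0 from a simple
    least-squares line. crit is the critical value (assumption: 1.96)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Standard error of estimate: residual spread with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y_hat = intercept + slope * x0
    # The leading 1 covers noise in the new observation; the other two terms
    # cover uncertainty in the fitted line, which grows away from x_bar.
    half_width = crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat, y_hat + half_width

# Made-up illustrative data, not real player numbers.
low, mid, high = prediction_interval([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 6)
print(f"{low:.2f} .. {mid:.2f} .. {high:.2f}")
```

The same terms can be built in a spreadsheet from the slope, intercept, and standard error of estimate, so the calculation does not depend on Python.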

ErichDoerr · Joined: 06 Jul 2008 Posts: 15

Ilardi,
I've done some work trying to recreate the Hollinger 3rd-year PER projection. My initial takeaway is that the data set size alone suggests a high noise level. In a previous thread here, John seemed to have the same concern and was looking forward to additional data points to improve accuracy.

From his articles, it appears John used data from the 2002 draft onward, which provides a sample size of ~230 players with NCAA backgrounds going to the NBA. From that group, only 115 had NBA playing time in the third year subsequent to their draft class, and 30 of them logged under 500 minutes. Given these requirements, I came up with a list of 86 players.

Breaking it down further, John took this group and split it among Bigs, Wings, and Points, giving me 27, 47, and 12 samples respectively. While regression analysis isn't my forte, these small sample sizes and the 16 explanatory variables may be enough in themselves to give an experienced statistician a general idea of the level of noise present.
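To put rough numbers on that concern, here is a small sketch of the residual degrees of freedom each positional regression would have. It assumes an ordinary least-squares fit with an intercept and all 16 variables in every model, which the thread does not confirm; the helper name `residual_df` is hypothetical.

```python
N_VARIABLES = 16  # explanatory variables mentioned above
samples = {"bigs": 27, "wings": 47, "points": 12}  # counts from this post

def residual_df(n_players, n_variables=N_VARIABLES):
    """Residual degrees of freedom for OLS with an intercept (assumption)."""
    return n_players - n_variables - 1

for group, n in samples.items():
    df = residual_df(n)
    status = "estimable" if df > 0 else "not estimable with all 16 variables"
    print(f"{group}: n={n}, residual df={df} ({status})")
```

Under those assumptions the bigs are left with 10 residual degrees of freedom, the wings with 30, and the points with fewer observations than parameters, so a 12-player sample could not support all 16 variables at once.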

I have also done some work regressing the same 16 variables against John's listed results with some success, though my work is incomplete.

John, in case you happen to stop by, did you normalize height to position played or any other stat?

John Hollinger · Joined: 14 Feb 2005 Posts: 175

I actually included a bunch of "failures" in the sample as well -- if you only include players who played significant minutes in their third year the results will be biased as all hell. I put in anybody who got drafted plus everybody whose stats made them seem like OK prospects, so I had 100+ players at big and wing and something like 70 or 80 at the point. (all this work is sitting on my other computer so unfortunately I don't have it in front of me)

Obviously, how I handle the players who don't make it is very important to the method. For those who got a cup of coffee in the league but no more I gave them a PER of 9 or 9.5; for those who never even got a whiff I gave them a 7.5, I think. I didn't do anything special with height -- it was just one of the variables, expressed as inches above or below 6-0.
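A minimal sketch of how that coding might look. The function names are hypothetical, and the fill-in values are assumptions taken from the approximate figures above (9.0 is picked from the "9 or 9.5" range, and 7.5 is given from memory), so this is an illustration and not the actual procedure.

```python
BRIEF_NBA_STINT_PER = 9.0  # assumption: the post says "9 or 9.5"
NEVER_PLAYED_PER = 7.5     # assumption: the post says "7.5 I think"

def height_over_six_feet(height):
    """Convert a 'feet-inches' string to inches above or below 6-0."""
    feet, inches = height.split("-")
    return int(feet) * 12 + int(inches) - 72

def target_per(third_year_per, played_in_nba):
    """Regression target: the real third-year PER when one exists,
    otherwise a fixed fill-in value for players who did not stick."""
    if third_year_per is not None:
        return third_year_per
    return BRIEF_NBA_STINT_PER if played_in_nba else NEVER_PLAYED_PER

print(height_over_six_feet("6-9"))   # 9
print(height_over_six_feet("5-11"))  # -1
print(target_per(None, played_in_nba=False))  # 7.5
```

Keeping the fill-in values as named constants makes it easy to check how sensitive the fitted model is to them.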

John Hollinger · Joined: 14 Feb 2005 Posts: 175

Sorry for the delay, finally dug out the laptop and have the standard errors:

Point guards: 2.66
Wings: 2.82
Bigs: 3.04

Ilardi · Joined: 15 May 2008 Posts: 265 Location: Lawrence, KS

Many thanks.

So, it looks like there's a fair degree of "noise" in the predictive model right now, yes? For example, the 95% confidence interval around Beasley's projected 3rd year PER would range roughly from 13 to 25.
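The arithmetic behind that range can be sketched as below. It assumes Beasley is projected with the bigs model (standard error 3.04) and that errors are roughly normal. It also uses the standard error of estimate as the only source of uncertainty, ignoring uncertainty in the fitted coefficients, so a full prediction interval would be somewhat wider. The function name `projection_interval` is hypothetical.

```python
from statistics import NormalDist

# Standard errors of estimate posted above, by position group.
STANDARD_ERRORS = {"point": 2.66, "wing": 2.82, "big": 3.04}

def projection_interval(projected_per, position, level=0.95):
    """Approximate interval: projection +/- z * standard error of estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * STANDARD_ERRORS[position]
    return projected_per - half_width, projected_per + half_width

low, high = projection_interval(19.19, "big")
print(f"{low:.1f} to {high:.1f}")  # 13.2 to 25.1
```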

I wonder if these standard errors might not come down some in the future with more players in the database - or perhaps additional variables. Do you think adding athleticism/anthropometric data (standing reach, vertical leap, sprint, wingspan, etc.) from the pre-draft combine would be useful at all?

John Hollinger · Joined: 14 Feb 2005 Posts: 175

I have no doubt that the method will improve with more years of data. Right now I'm only working with six drafts, plus the past four are still basically works in progress. Then when you get down to the number of big men with talent like Beasley or Oden, you're down to a very small number very quickly.

I got the anthropometric data from the Orlando camp (standing reach, vertical, etc.) and started playing with it, but it's hard to incorporate because not everybody goes to the pre-draft camp, so it's a blank for the majority of players in the database.

tawtaw · Joined: 25 Jun 2008 Posts: 28 Location: Oregon

John, are you using the draftexpress data base for that info?

I'm not sure how detailed it is as you go back six years, but it's really good for the more recent drafts.

John Hollinger · Joined: 14 Feb 2005 Posts: 175

I've got my sources, I'll leave it at that.

Perhaps a couple years down the road it will be useful because it seems like they're taking more measurements from more people. So far it's not, at least not in any way that I can divine.

tawtaw · Joined: 25 Jun 2008 Posts: 28 Location: Oregon

Yeah, I could see how it'd be problematic if you didn't have the info for the majority of guys.

I'd really like to see a study someday about how well the pre-draft physical numbers do when it comes to predicting success.

I'd be willing to bet a PER-based analysis that factored in game pace and schedule strength of college prospects would predict NBA success every bit as well as combine numbers and draft position, if not better.

ErichDoerr · Joined: 06 Jul 2008 Posts: 15

John, have you considered using Game Score per minute to possibly increase your sample basis?

John Hollinger · Joined: 14 Feb 2005 Posts: 175

Actually I create a PER for the entire NCAA.

jemagee · Joined: 05 Nov 2005 Posts: 129

I wonder, from a fan's point of view (of the Sixers and Speights), how much your thoughts on Speights this season change after the season-ending injury to Jason Smith (someone I'm not fond of having on the Sixers roster; I wish the scout who thought he's a 'starter' had convinced his team to trade for him).

ErichDoerr · Joined: 06 Jul 2008 Posts: 15

John, perhaps I made a bad assumption or should clarify.

I thought your limitation in going back to NCAA seasons prior to 2002 was a lack of advanced NCAA data, including NCAA PER numbers. If this was the case, it may be possible to get the box score stats and assess a larger player pool based on Game Score.

If that is not your limitation in getting a bigger data pool, the suggestion has no bearing and I apologize for not being clearer in my earlier post.