2009 NESSIS Presentations & Videos
Ryan J. Parker
Posted: Sat Oct 24, 2009 2:49 pm    Post subject: 2009 NESSIS Presentations & Videos

They're now available online: presentations & videos.
_________________
I am a basketball geek.
fundamentallysound
Posted: Sat Oct 24, 2009 3:37 pm

watching the Wayne Winston video - heard his bit on Multicollinearity and wondered about ways of fixing it.

There's a paper written about a multicollinearity fix, here:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1339926

I don't know enough about regressions and stats to know what it all means, but it might be useful for you all here.
deepak
Posted: Sat Oct 24, 2009 5:56 pm

I'm definitely no statistician, but here's a thought on addressing the multicollinearity issue.

For each "time segment" with 10 players on the floor, the regression proposed by Dan Rosenbaum went like this:

MARGIN = b0 + b1X1 + b2X2 + . . . + bkXk + e

For 5 home players n=h1,h2,h3,h4,h5, each Xn=1 (so it sums to 5). For 5 road players m=r1,r2,r3,r4,r5, each Xm=-1 (so it sums to -5). It seems like what we want to do is somehow take into account how well each player played in the time segment, and adjust those coefficients accordingly to change the credit/penalty they get towards the margin.

I'll just consider a positive MARGIN (home team outscored road team) and the Xn for home players right now. I'd just like someone to let me know if my thinking is way off on this.

Take an example of a particular time segment in which the 5 home players had the following "box score rating" (let's say SPM), in increasing order: -7, -2, +1, +4, and +10. This means the worst player on the home team played as a -7 according to the box score, and the best player played as a +10. I'd calculate their adjusted coefficients as Xn = 1.48, 1.19, 1.01, 0.84, 0.48, for n=h1,h2,h3,h4,h5. Where:

Xn = 1 + (average(SPM_i, i=h1,h2,h3,h4,h5) - SPM_n) / (SPM_h5 - SPM_h1)

So in the regression the worst performing home player will get the highest positive coefficient (forcing his APM value lower?), and the best performing home player will get the lowest positive coefficient (forcing his APM value higher?).

Am I breaking any rules by fudging those coefficients like that? Would this have the desired effect of ultimately giving more credit to the players that performed better by the boxscore?
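Here's a quick Python sketch of that calculation, just to make it concrete (the function name is made up; the SPM values are the example above):

Code:
def adjusted_coefficients(spm):
    # The proposed tweak: scale the usual +1 home-player indicators by box score
    # performance, Xn = 1 + (avg SPM of the five - SPM_n) / (max SPM - min SPM)
    avg = sum(spm) / len(spm)
    spread = max(spm) - min(spm)
    return [1 + (avg - s) / spread for s in spm]

print([round(x, 2) for x in adjusted_coefficients([-7, -2, 1, 4, 10])])
# [1.48, 1.19, 1.01, 0.84, 0.48]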
Crow
Posted: Sun Oct 25, 2009 3:10 am

Did Wayne Winston go into any detail discussing the differences between his ratings (and method) vs basketballvalue?

Using 18 player cases I could quickly assemble where he gave out his marks, the average difference from basketballvalue is about 2.2. Five cases had a difference of more than 2.5: four of them around a 4-point difference, and then Dirk at an 8.5-point difference. In 4 of those 5 cases WW was higher; for the rest it is almost an even split which was higher. Take away these biggest variances and the rest differed by less than 1 on average.

I guess Dirk could be an outlier but it is an interesting extreme case. Hard to imagine method differences, which I assumed were at the margin, causing such a large difference. But maybe.
Different methods will report somewhat different answer sets.



This is an off-the-cuff question, but rather than finding the Adjusted +/- solution that fits the data the very best of all possibilities (for one method), what if you looked at the top 10 solutions, or 100, or whatever useful maximum, or all within x distance of the best solution, and took the average or weighted average, or reported the range? Is there a precedent or support for doing this? Would this provide something akin to the value of a Monte Carlo simulation?

And then perhaps going further and blending the results from runs of somewhat different methods? Would this help improve confidence in the average Adjusted +/- numbers? Is relying on the single best solution too small, weak, or unique a base for several hundred ratings?


Last edited by Crow on Sun Oct 25, 2009 2:26 pm; edited 5 times in total
fundamentallysound
Posted: Sun Oct 25, 2009 3:33 am

Crow wrote:
Did Wayne Winston go into any detail discussing the differences between his ratings (and method) vs basketballvalue?

Using 18 player cases I could quickly assemble where he gave out his marks, the average difference from basketballvalue is about 2.2. Five cases had a difference of more than 2.5: four of them around a 4-point difference, and then Dirk at an 8.5-point difference. In 4 of those 5 cases WW was higher; for the rest it is almost an even split which was higher. Take away these biggest variances and the rest differed by less than 1 on average.

I guess Dirk could be an outlier but it is an interesting extreme case. Hard to imagine method differences, which I assumed were at the margin, causing such a large difference. But maybe.



Yes, he did mention that they calculated it differently. He explicitly mentioned the example of Dirk and essentially said that Dirk's APM was being punished because Devean George sucked. They managed to figure out a way to fix that to show how great Dirk is. He didn't really get into how they did that, but it sounds like they swapped out George's actual numbers for a more negative value to show how bad he was.

Additionally, as a more general matter, he said that they don't use a standard regression to come up with their numbers, but instead solve the data as a system of equations using the least squares method.
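For what it's worth, an ordinary regression fit is itself the least-squares solution of the stint-level system of equations, so that description by itself is the same math; any difference would have to come from whatever constraints, weights, or data handling get layered on top. A toy check in Python with made-up data (nothing to do with his actual model):

Code:
import numpy as np

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 8))    # made-up on/off design matrix
y = X @ rng.normal(size=8) + rng.normal(size=200)  # made-up stint margins

beta_system = np.linalg.lstsq(X, y, rcond=None)[0]   # "system of equations" least squares
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)         # textbook OLS normal equations
print(np.allclose(beta_system, beta_ols))            # True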
Crow
Posted: Sun Oct 25, 2009 3:48 am

Thanks for the response.
gabefarkas
Posted: Mon Oct 26, 2009 7:57 am

deepak_e wrote:
I'm definitely no statistician, but here's a thought on addressing the multicollinearity issue.

For each "time segment" with 10 players on the floor, the regression proposed by Dan Rosenbaum went like this:

MARGIN = b0 + b1X1 + b2X2 + . . . + bkXk + e

For 5 home players n=h1,h2,h3,h4,h5, each Xn=1 (so it sums to 5). For 5 road players m=r1,r2,r3,r4,r5, each Xm=-1 (so it sums to -5). It seems like what we want to do is somehow take into account how well each player played in the time segment, and adjust those coefficients accordingly to change the credit/penalty they get towards the margin.

I'll just consider a positive MARGIN (home team outscored road team) and the Xn for home players right now. I'd just like someone to let me know if my thinking is way off on this.

Take an example of a particular time segment in which the 5 home players had the following "box score rating" (let's say SPM), in increasing order: -7, -2, +1, +4, and +10. This means the worst player on the home team played as a -7 according to the box score, and the best player played as a +10. I'd calculate their adjusted coefficients as Xn = 1.48, 1.19, 1.01, 0.84, 0.48, for n=h1,h2,h3,h4,h5. Where:

Xn = 1 + (average(SPM_i, i=h1,h2,h3,h4,h5) - SPM_n) / (SPM_h5 - SPM_h1)

So in the regression the worst performing home player will get the highest positive coefficient (forcing his APM value lower?), and the best performing home player will get the lowest positive coefficient (forcing his APM value higher?).

Am I breaking any rules by fudging those coefficients like that? Would this have the desired effect of ultimately giving more credit to the players that performed better by the boxscore?

I'm too tired on a Monday morning to get into this. It's an interesting idea, but I think you're double-dipping.
MTamada or TPRyan - you can probably explain it better than I can, anyway.
tpryan
Posted: Mon Oct 26, 2009 4:04 pm

I drop in to see what is happening and I see that I have a call to duty from Gabe. :D I am busy with a variety of things at the moment but I will try to look at this in a day or two. I also want to read that paper that fundamentallysound linked.
mtamada
Posted: Tue Oct 27, 2009 8:26 pm

Ditto.

Grrr, the power strip on my computer went dead just as I was hitting "submit". Here's what I think I wrote (and was lost when my PC went black):

This is double-dipping in a sense, but any system that combines APM with SPM is going to be double-dipping from the +/- and the box scores.

So, the idea is not to treat all five players on the floor equally, but to use SPM or box score stats to try to estimate who made the most contributions in that time frame, and use those results to weight or pre-judge the data before we put it into the regression? Might be a helpful idea. I don't like the formula at first glance, but I have not thought it through. But the underlying idea might be a good one.

I conjecture that we'll get the usual pros and cons -- the result might over-reward a stat-hungry gunner who hits the offensive glass instead of getting back on defense, or a player who goes for the steal instead of playing solid team defense. But maybe the benefits outweigh the disadvantages.

If a fivesome has a successful APM, but mainly via good defense rather than via their own scoring (e.g. if your fivesome plays against Chris Paul, Kobe, LeBron, Duncan, and Dwight Howard for four minutes, and finishes that 4-minute period with a score of 2-2, they did mighty well, but you don't want to give disproportionate credit to the one guy on your fivesome who sank a field goal), then we might want to give less weight to the SPM weights and rely more on ordinary APM. Whereas if they were outscored 10-14 during that stint, you might want to put more weight on the SPM stats, since that group didn't seem to be doing much on defense anyway, but had okay contributions on offense, which likely can be measured by the box score stats better than the defensive contributions can be.
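If someone wanted to prototype that weighting, one crude rule (entirely made up here, just to show the shape of it) would be to blend the plain +1 indicator with the SPM-adjusted coefficient using a weight that grows with how much scoring happened in the stint:

Code:
def blended_coefficient(spm_adjusted_x, stint_points, full_trust_points=20):
    # Low-scoring stint: stay close to the plain +1 indicator (ordinary APM).
    # High-scoring stint: lean on the SPM-adjusted coefficient.
    # full_trust_points is an arbitrary knob; home players only, as in the example above.
    w = min(stint_points / full_trust_points, 1.0)
    return (1 - w) * 1.0 + w * spm_adjusted_x

print(blended_coefficient(1.48, stint_points=4))   # defensive stint: ~1.10
print(blended_coefficient(1.48, stint_points=24))  # high-scoring stint: 1.48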
mtamada
Posted: Wed Oct 28, 2009 12:09 am

fundamentallysound wrote:
watching the Wayne Winston video - heard his bit on Multicollinearity and wondered about ways of fixing it.

There's a paper written about a multicollinearity fix, here:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1339926

I don't know enough about regressions and stats to know what it all means, but it might be useful for you all here.


I didn't like it. I'm okay with his notion of not trying to find a unique true expression of the impact of X on Y (which could be defined as either of two quantities -- let's write them as "b1" and "b1 + b2d", from his equations 7 and 9). But his suggested "primary regression" (from which we omit the correlated predictor variable, X2) then simply gives us an estimate of b1 + b2d, and what are we supposed to do with that? It's not an estimate of b1, and it has a misleadingly low standard error because that regression has ignored the effects of X2. The secondary regression does look at X2, but again I think gives us a misleadingly low standard error because X1 is not in it.
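For anyone following along, the b1 + b2d result is just the standard omitted-variable algebra (textbook material, not something specific to the paper): if the true model is Y = b0 + b1X1 + b2X2 + e, and the collinearity is X2 = dX1 + u with u unrelated to X1, then substituting gives

Y = b0 + (b1 + b2d)X1 + (b2u + e)

so regressing Y on X1 alone estimates the slope b1 + b2d, and its reported standard error describes how precisely b1 + b2d is pinned down, not b1, which understates the real uncertainty about b1 itself.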
gabefarkas
Posted: Wed Oct 28, 2009 7:22 am

mtamada wrote:
So, the idea is not to treat all five players on the floor equally, but to use SPM or box score stats to try to estimate who made the most contributions in that time frame, and use those results to weight or pre-judge the data before we put it into the regression? Might be a helpful idea. I don't like the formula at first glance, but I have not thought it through. But the underlying idea might be a good one.

I conjecture that we'll get the usual pros and cons -- the result might over-reward a stat-hungry gunner who hits the offensive glass instead of getting back on defense, or a player who goes for the steal instead of playing solid team defense. But maybe the benefits outweigh the disadvantages.

Is that notion of pre-judging the data -- dare I say it -- Bayesian?
deepak
Posted: Wed Oct 28, 2009 12:24 pm

mtamada wrote:
So, the idea is not to treat all five players on the floor equally, but to use SPM or box score stats to try to estimate who made the most contributions in that time frame, and use those results to weight or pre-judge the data before we put it into the regression? Might be a helpful idea. I don't like the formula at first glance, but I have not thought it through. But the underlying idea might be a good one.


Yes, in a nutshell. And while SPM may not be an accurate reflection of who did what in the time frame, I would think it does a better job than just assuming equal contribution from all players. So to some extent, at least, I would expect the overall ratings to be more accurate when aggregated over hundreds/thousands of time frames during the course of the season.

One difficulty with this approach, of course, is tracking box score stats per time frame. That increases the data you're looking at several times over. And is taking SPM into account for each time segment before the regression an improvement over just mixing APM and SPM at the end, as Dan Rosenbaum did? It seems like it should be, but maybe it turns out not to make much of a difference. I don't know.
Crow
Posted: Mon Nov 02, 2009 10:18 pm

deepak_e wrote:
And is taking SPM into account for each time segment before the regression an improvement over just mixing APM and SPM at the end, as Dan Rosenbaum did? It seems like it should be, but maybe it turns out not to make much of a difference. I don't know.


I've asked that question a number of times over the last few years, but I am not technically qualified to fully answer it. I'm glad you are raising it too and pushing it further. I comment hoping that it will continue to be pushed even farther, and that at least some of what follows might be helpful to anyone more capable who is interested in doing so.

I think at the broad-brush level the answer is to do both and see how they compare, both to the other blend and to the pure versions of each.

If a single number isn't the right quest, then let's go for a set of numbers to think about and see how they agree or vary, and by how much.



As for deepak_e's formula itself, I also think it could be a step up from the assumption of equal credit to all as input, or from agnosticism at this stage. As I have said before, I've suggested some form of credit-splitting rules a la the early Protrade system (is that now used at all by the Thunder?). It would be interesting to see some version of this run.

Maybe you could assume some minimum (40-50%?) or maximum (65-75%?) share of the credit for play actions going to the individual directly involved, using SPM - perhaps just on offense (if you have respect for / faith in SPM) - and then let APM assign the rest of the value, where you are less sure to whom it should go.
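A rough sketch of what that could look like, in Python (the 40%/75% bounds are the ones floated above; everything else is just illustrative):

Code:
def split_credit(stint_margin, spm_share_estimate, lo=0.40, hi=0.75):
    # Clamp the share of the stint margin that SPM is allowed to hand out directly;
    # whatever remains is left for the Adjusted +/- regression to allocate.
    share = min(max(spm_share_estimate, lo), hi)
    return share * stint_margin, (1 - share) * stint_margin

print(split_credit(stint_margin=8, spm_share_estimate=0.9))  # (6.0, 2.0)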

Somewhat related to assigning this value: what is the average absolute value of the difference between player SPM and APM? I ask this wondering whether it would help choose how much of the "proper credit" is for non-boxscore activity and should be found by APM, leaving what can be directly assigned via SPM to be handled with that tool, achieving a better fit to the play-by-play data.

I know that, across the range of plays, any system of credit-splitting values won't be uniquely appropriate for every play, but you have to approximate and round off.

And in the end we're going to end up with player ratings on the order of great, very good, average, below average, and well below average (or at least that is how they will probably boil down and be interpreted and used), and anything stated more precisely is probably an overstatement.



And then what if you took a set of player values - pure SPM, pure APM, SPM-APM mixed at the input stage, and/or SPM-APM blended in post-production - and searched by some method for a set of numbers that best explained these different sets of numbers? (Does that get back into multi-level regression?)



I'd still be interested in reactions to my earlier comment on a different way to frame "the answer" for pure Adjusted ratings:

"... rather than finding the Adjusted +/- solution that fits the data the very best of all possibilities ..., what if you looked at the top solutions ... within x distance of the best solution and took the average or weighted average? Is there are precedent or support for doing this?"


I noticed recently that Gabe said in this thread
http://sonicscentral.com/apbrmetrics/viewtopic.php?t=2354&start=30

"The error from MC comes from the fact that MC causes coefficient estimates to vary erratically when you change the data only slightly."

and

"What does this mean? Well, first the good news: it means that the perfect relationship between X1 and X2 didn't stop us from obtaining a set of coefficient estimates that are a good fit to the data. Second, the bad news: it means that we can't make any interpretations about any one set of coefficient estimates as reflecting the true effects of the different PV's, since many different sets of coefficient estimates provide the same good fit."

These strike me as related and consistent points about the nature of the problem with the single unique Adjusted +/- "solution" for a league of hundreds of players.
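To make those quoted points concrete, here is a toy illustration with made-up numbers (nothing to do with actual lineup data): two nearly identical predictor columns, very different coefficient pairs fitting almost equally well, and the fitted pair drifting when the rows used change:

Code:
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.01, size=500)       # nearly a copy of x1
y = 3 * x1 + 2 * x2 + rng.normal(size=500)       # "true" coefficients: 3 and 2
X = np.column_stack([x1, x2])

def rss(b):                                      # residual sum of squares for a coefficient pair
    r = y - X @ b
    return round(float(r @ r), 1)

# Very different coefficient pairs, nearly identical fit:
print(rss(np.array([3.0, 2.0])), rss(np.array([5.0, 0.0])), rss(np.array([0.0, 5.0])))

# Refitting on overlapping subsets of the rows moves the estimated pair around
# far more than it would if x1 and x2 were not nearly collinear:
for rows in (slice(0, 400), slice(100, 500)):
    print(np.round(np.linalg.lstsq(X[rows], y[rows], rcond=None)[0], 2))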

But is my suggestion a reasonable one, or does it suggest other thoughts? I'd appreciate hearing from mtamada or tp ryan or others if they are willing and deem it worthwhile. Conceptually I thought it had enough promise to toss out for more knowledgeable feedback. It just doesn't seem that a single unique solution is going to be fully satisfactory, and maybe there is a better way to settle in on an estimate.



I'd also be interested in tp ryan's reaction to the paper fundamentallysound linked to. I skimmed it and can't judge it technically, but thought the idea of separate runs for players might have some promise and applicability. (I'm not sure if you could do that for teams one by one too, as I suggested before - after preparing a first cut of league-wide player Adjusted +/- ratings - to improve the Adjusted ratings among teammates and the fit with actual team performance beyond the present fit, but maybe that could help too? A different form of multicollinearity, and an opportunity to do multiple stages of regression to attack it and fine-tune the output values?)


And any further comment on the advantages and disadvantages of Winston's least-squares method versus the way the other APM models do it?


Last edited by Crow on Wed Nov 04, 2009 2:56 am; edited 3 times in total
gabefarkas
Posted: Tue Nov 03, 2009 8:45 am

Crow wrote:
These strike me as related and consistent points about the nature of the problem with the "solution".

But is my suggestion a reasonable one or does it suggest other thoughts? I'd appreciate hearing from mtamada or tp ryan or others if they are willing and deem it worthwhile. Conceptually I thought it had enough promise to toss out for more knowledgeable feedback.
I think I followed enough to see the relatedness to what you wrote above. However, I'm not sure I caught exactly what your proposed "solution" is. Can you please clarify/summarize, if possible? Thanks!
Crow
Posted: Tue Nov 03, 2009 3:19 pm

rather than finding the Adjusted +/- solution that fits the data the very best of all possibilities

what if you also looked at the next top solutions within x distance of the best solution (those Adjusted solution sets with just a little bigger standard error than the very best solution set)

and took the average or weighted average for the player Adjusted +/- values found by the set of top solutions, instead of the values found by just the very best one?



That is still the summary of my "answer", brief and conceptual, slightly revised to try to improve clarity.
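To be explicit about one way that could be mechanized (a sketch only: here the candidate solutions come from a ridge-penalty path and "distance" is measured by residual sum of squares, but bootstrap resamples or different model variants could feed the candidate pool just as well):

Code:
import numpy as np

def near_best_average(X, y, penalties=np.logspace(-3, 2, 50), tol=1.01):
    # Generate candidate Adjusted +/- solutions, keep every candidate whose fit
    # is within tol of the best fit, and average the kept solutions.
    p = X.shape[1]
    candidates = [np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
                  for lam in penalties]
    fits = [float(np.sum((y - X @ b) ** 2)) for b in candidates]
    best = min(fits)
    kept = [b for b, f in zip(candidates, fits) if f <= tol * best]
    return np.mean(kept, axis=0), len(kept)

Averaging the kept solutions trades a little fit for stability in the individual player values, which is the point of the exercise.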