APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

Player Stats Search

gabefarkas
Joined: 31 Dec 2004
Posts: 879
Location: Durham, NC

Posted: Mon Jul 16, 2007 8:11 am

Along these lines, is there any possibility of doing a career similarity, not just player-season?

Ben
Joined: 13 Jan 2005
Posts: 202
Location: Iowa City

Posted: Mon Jul 16, 2007 2:10 pm

Just came across this thread - it looks great, Justin.

jkubatko
Joined: 05 Jan 2005
Posts: 508
Location: Columbus, OH

Posted: Mon Jul 16, 2007 2:54 pm

Re: the similarity scores, it's not as easy as it may sound. Calculating the sim scores on the fly is too time-intensive, and a table with every possible player/season combination would be ginormous (hey, Merriam-Webster says it's a word now). I'll have to think about this.
_________________
Regards,
Justin Kubatko
Basketball Stats!

THWilson
Joined: 19 Jul 2005
Posts: 126
Location: phoenix

Posted: Mon Jul 16, 2007 4:51 pm

jkubatko wrote:
Re: the similarity scores, it's not as easy as it may sound. Calculating the sim scores on the fly is too time-intensive, and a table with every possible player/season combination would be ginormous (hey, Merriam-Webster says it's a word now). I'll have to think about this.


I don't know if this is feasible, but just a thought. Since sim scores are based (primarily) on standard deviations away from the mean for that season, couldn't you create a table which doesn't have any of the actual player figures, but just their standard deviations above or below the mean? You could add in height and then there would be all the necessary fields on that table. Then to calculate on the fly you'd just have to reference this one similarity-table. Would that solve the time intensity issue?
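A minimal sketch of the pre-calculated table Tom describes, assuming z-scores are computed within each season using the population standard deviation. The player names, the three stat columns and all values in `RAW` are made up for illustration; a real table would carry the site's actual stat columns (plus height, as suggested).

```python
from statistics import mean, pstdev

# Made-up per-game numbers for illustration only: (player, season) -> {stat: value}
RAW = {
    ("Player A", 2007): {"pts": 25.0, "reb": 5.0, "ast": 6.0},
    ("Player B", 2007): {"pts": 15.0, "reb": 10.0, "ast": 2.0},
    ("Player C", 2007): {"pts": 20.0, "reb": 6.0, "ast": 4.0},
    ("Player D", 2006): {"pts": 22.0, "reb": 7.0, "ast": 3.0},
    ("Player E", 2006): {"pts": 18.0, "reb": 9.0, "ast": 5.0},
}


def build_zscore_table(raw):
    """Return {(player, season): {stat: z}}, standardizing each stat within its own season."""
    table = {}
    seasons = {season for (_, season) in raw}
    for season in sorted(seasons):
        rows = {key: stats for key, stats in raw.items() if key[1] == season}
        stat_names = next(iter(rows.values())).keys()
        for stat in stat_names:
            values = [stats[stat] for stats in rows.values()]
            mu, sigma = mean(values), pstdev(values)  # population SD (an assumption)
            for key, stats in rows.items():
                # A stat with no spread that season gets z = 0 instead of dividing by zero.
                z = (stats[stat] - mu) / sigma if sigma else 0.0
                table.setdefault(key, {})[stat] = z
    return table
```

Under this proposal, a table like this is the only part built ahead of time: one row per player-season, one column per variable.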

Ed Küpfer
Joined: 30 Dec 2004
Posts: 616
Location: Toronto

Posted: Mon Jul 16, 2007 5:03 pm

THWilson wrote:
Since sim scores are based (primarily) on standard deviations away from the mean for that season, couldn't you create a table which doesn't have any of the actual player figures, but just their standard deviations above or below the mean? You could add in height and then there would be all the necessary fields on that table.


That would be 11 variables for each player season.

Quote:
Then to calculate on the fly you'd just have to reference this one similarity-table. Would that solve the time intensity issue?


Since there are about 500 players/season, the similarity distance matrix would be 15,000 x 15,000. That is pretty big.
_________________
ed
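To put the size argument in numbers, here is the arithmetic as a small function. It uses Ed's figure of about 15,000 player-seasons and the 11 variables mentioned above; the 4 bytes per value is an assumed storage size for illustration, not anything the site is known to use.

```python
def pairwise_storage(n_rows, n_features, bytes_per_value=4):
    """Compare the size of a full distance matrix with a per-row z-score table.

    bytes_per_value=4 assumes single-precision floats; it is only illustrative.
    """
    full_matrix = n_rows * n_rows                 # every row against every row
    upper_triangle = n_rows * (n_rows - 1) // 2   # distinct pairs, since d(a, b) == d(b, a)
    zscore_table = n_rows * n_features            # one z-score per variable per row
    return {
        "full_matrix_entries": full_matrix,
        "upper_triangle_entries": upper_triangle,
        "zscore_table_entries": zscore_table,
        "full_matrix_bytes": full_matrix * bytes_per_value,
    }
```

With `pairwise_storage(15_000, 11)` the full matrix has 225,000,000 entries (900,000,000 bytes at 4 bytes each), or 112,492,500 if only one triangle is kept, while the z-score table holds 165,000 values.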

THWilson
Joined: 19 Jul 2005
Posts: 126
Location: phoenix

Posted: Tue Jul 17, 2007 10:41 am

Ed Küpfer wrote:

Quote:
Then to calculate on the fly you'd just have to reference this one similarity-table. Would that solve the time intensity issue?


Since there are about 500 players/season, the similarity distance matrix would be 15,000 x 15,000. That is pretty big.


I'm not saying a matrix.

My suggestion was a z-score database on which you could use the exact same steps that Justin is already using to pull up the per-game or total counts for the two players, plus one more step to calculate the differences, like he does here: http://www.basketball-reference.com/about/similar.html

Rather than doing the whole thing on the fly, or pre-calculating the whole thing, he would have one step pre-calculated (the z-scores), and the second step done at the time of the query (the difference summation). I don't think that would be especially large or especially long running. Make sense?

Ed Küpfer
Joined: 30 Dec 2004
Posts: 616
Location: Toronto

Posted: Tue Jul 17, 2007 3:10 pm

THWilson wrote:
Rather than doing the whole thing on the fly, or pre-calculating the whole thing, he would have one step pre-calculated (the z-scores), and the second step done at the time of the query (the difference summation). I don't think that would be especially large or especially long running. Make sense?


I don't know if I understand you. You can have all the differences pre-calculated, but you'd still have to store them in that big ass matrix. If you want to compare 2 or more players, you'd need to either calculate the differences on the fly or look up the calculated differences from that player x player matrix.

One way to make it quicker is to store a list of the x most similar players for each player (as shown on each player page), but that wouldn't allow you to compare the differences between two arbitrary players.
_________________
ed
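A rough sketch of the "x most similar players" list, assuming each player-season is already a vector of standardized stats and that similarity is Euclidean distance (the distance Ed describes later in the thread). Only k names are kept per row, so storage grows as n times k rather than n squared, but building the lists still compares every pair once. `top_k_similar` is a hypothetical helper name.

```python
import heapq
import math


def top_k_similar(vectors, k):
    """For each key, keep only the k nearest other keys by Euclidean distance.

    vectors: {key: tuple of standardized stats}. Keys must be sortable so that
    ties in distance are broken deterministically.
    """
    result = {}
    for key, vec in vectors.items():
        # Distance from this row to every other row; computed once at build time.
        candidates = (
            (math.dist(vec, other_vec), other_key)
            for other_key, other_vec in vectors.items()
            if other_key != key
        )
        result[key] = [other_key for _, other_key in heapq.nsmallest(k, candidates)]
    return result
```

The stored lists answer "who is most like X?" instantly, but a pair that appears in neither player's list has no stored score, which is the limitation Ed points out.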

THWilson
Joined: 19 Jul 2005
Posts: 126
Location: phoenix

Posted: Tue Jul 17, 2007 5:50 pm

Ed Küpfer wrote:
I don't know if I understand you. You can have all the differences pre-calculated, but you'd still have to store them in that big ass matrix.


Don't pre-calculate the differences, pre-calculate the z-scores. Calculate (and sum) the differences between z-scores at the time that the players are selected, at the time of the query.
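A minimal sketch of the query-time step, assuming the z-scores already sit in a table keyed by (player, season). The hypothetical `sim_difference` reads "difference summation" as a sum of absolute z-score differences; the actual formula on the Basketball-Reference page linked above may weight or combine the stats differently.

```python
# Hypothetical pre-calculated z-scores: (player, season) -> {stat: z}
Z_TABLE = {
    ("Player A", 2007): {"pts": 1.2, "reb": -0.8, "ast": 1.3},
    ("Player B", 2006): {"pts": 1.0, "reb": -0.5, "ast": 0.9},
}


def sim_difference(z_table, season_a, season_b):
    """Query-time step: fetch two rows and sum the absolute z-score differences.

    The sum of absolute differences is an assumed scoring rule; smaller means
    more similar. Only two rows are read, so no player x player matrix is needed.
    """
    za, zb = z_table[season_a], z_table[season_b]
    return sum(abs(za[stat] - zb[stat]) for stat in za)
```

For the two made-up rows this comes to roughly 0.9 (0.2 + 0.3 + 0.4), and the work per query is one pass over the stat columns.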

Ed Küpfer
Joined: 30 Dec 2004
Posts: 616
Location: Toronto

Posted: Tue Jul 17, 2007 7:59 pm

THWilson wrote:
Don't pre-calculate the differences, pre-calculate the z-scores. Calculate (and sum) the differences between z-scores at the time that the players are selected, at the time of the query.


I get the feeling we're talking past each other, because I can't really make sense of that. I want to work it out because similarity stuff is really interesting to me.

Consider a list of 30 North American cities. Say you want to calculate the 10 closest neighbours for all 30 cities. To do that you take their latitude and longitude and find the distance between each pair of cities. You'll end up with a 30x30 matrix of distances. The 10 closest neighbours are then the 10 smallest distances in each city's row (ignoring the zero distance from a city to itself).
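Ed's city example as a small sketch, using six cities (roughly where the posters in this thread live) instead of thirty, and one nearest neighbour instead of ten. Coordinates are approximate, and distance is treated as flat Euclidean distance in degrees, the same simplification as the 2-D version of the example.

```python
import math

# Approximate (latitude, longitude) in degrees; six cities stand in for thirty.
CITIES = {
    "Toronto": (43.65, -79.38),
    "Durham": (35.99, -78.90),
    "Columbus": (39.96, -83.00),
    "Phoenix": (33.45, -112.07),
    "Iowa City": (41.66, -91.53),
    "Atlanta": (33.75, -84.39),
}


def distance_matrix(points):
    """Build the full n x n matrix of pairwise Euclidean distances."""
    names = list(points)
    matrix = [[math.dist(points[a], points[b]) for b in names] for a in names]
    return names, matrix


def nearest_neighbours(names, matrix, k):
    """Read the k smallest off-diagonal distances from each city's row."""
    result = {}
    for i, name in enumerate(names):
        row = sorted((d, names[j]) for j, d in enumerate(matrix[i]) if j != i)
        result[name] = [other for _, other in row[:k]]
    return result
```

Building the matrix takes n squared distance calls; reading off the neighbours is just a sort of each row. That split is the same whether the rows are cities or player-seasons.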

But wait -- distance between cities is 3-dimensional because of the curvature of the earth and because of each city's altitude (not a big difference in this example, but just go with me on this). To calculate the 3-dimensional distance, you do it the same way as the 2-D version: Euclidean distance is sqrt((Lat_city1 - Lat_city2)^2 + (Long_city1 - Long_city2)^2 + (Alt_city1 - Alt_city2)^2). The result is a distance metric -- close neighbours will have smaller distances than far neighbours.
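The formula translates directly into code. The general version just keeps adding squared differences, which is how the same idea extends from three coordinates to a player-season's full set of stats. Both function names are made up for illustration.

```python
import math


def distance_3d(city1, city2):
    """The formula above written out term by term; each city is (lat, long, alt)."""
    lat1, long1, alt1 = city1
    lat2, long2, alt2 = city2
    return math.sqrt((lat1 - lat2) ** 2 + (long1 - long2) ** 2 + (alt1 - alt2) ** 2)


def distance_nd(a, b):
    """The same formula for any number of coordinates, e.g. 11 stats per player-season."""
    if len(a) != len(b):
        raise ValueError("both points need the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```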

That is how Justin calculates similarity. The z-score transformation precedes the distance calculations (and in my experience, is not a necessary step). Standardising numbers into z-scores is not a computationally intensive act -- the hard work is coming up with the distance matrix, which is the last step before finding similar players.
_________________
ed

jkubatko
Joined: 05 Jan 2005
Posts: 508
Location: Columbus, OH

Posted: Tue Jul 17, 2007 9:08 pm

Ed, I think you guys are talking past each other. I believe that Tom is suggesting that I just display the sim score between the two seasons and not the rank. I think it was Neil that suggested I display the sim score and the rank, which would be a much more difficult proposition (as you have demonstrated).
_________________
Regards,
Justin Kubatko
Basketball Stats!
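A sketch of why showing only the score is much cheaper than showing the rank, assuming Euclidean distance on stored stat vectors (all names here are hypothetical). The score needs only the two rows involved; the rank needs the distance from the target to every other player-season so it can count how many are closer.

```python
import math


def sim_score(vec_a, vec_b):
    """Score only: one distance between two stored rows."""
    return math.dist(vec_a, vec_b)


def sim_rank(target, candidate, vectors):
    """1-based rank of candidate among all other rows, ordered by distance to target.

    Needs a distance from target to every row, i.e. len(vectors) computations.
    Rows tied with the candidate are not counted as closer.
    """
    base = vectors[target]
    d = math.dist(base, vectors[candidate])
    closer = sum(
        1
        for key, vec in vectors.items()
        if key not in (target, candidate) and math.dist(base, vec) < d
    )
    return closer + 1
```

With roughly 15,000 player-seasons, `sim_score` does one distance per query while `sim_rank` does about 15,000, which is the gap between the two proposals.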

davis21wylie2121
Joined: 13 Oct 2005
Posts: 373
Location: Atlanta, GA

Posted: Tue Jul 17, 2007 9:19 pm

jkubatko wrote:
Ed, I think you guys are talking past each other. I believe that Tom is suggesting that I just display the sim score between the two seasons and not the rank. I think it was Neil that suggested I display the sim score and the rank, which would be a much more difficult proposition (as you have demonstrated).


So is it feasible as long as you don't show the rank?

jkubatko
Joined: 05 Jan 2005
Posts: 508
Location: Columbus, OH

Posted: Wed Jul 18, 2007 8:57 am

davis21wylie2121 wrote:
So is it feasible as long as you don't show the rank?


Feasible, yeah, but it's going to be some time before I can think about adding it, as I have some other projects that need my attention right now. I'll add it to my novel-length list of things to add to the site. :-)
_________________
Regards,
Justin Kubatko
Basketball Stats!

All times are GMT - 5 Hours