APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

New website: BasketballValue.com
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Tue Oct 03, 2006 1:24 pm

gabefarkas wrote:
Ugh, cherokee why use R? Do you have SPSS? Menu driven is always better, IMO.


?

Menu driven packages might be easier, but I can't imagine anyone thinking they'd be better. Command line packages like R might overwhelm a stats beginner, but their flexibility, ability to handle large datasets, and mind-boggling diversity of functions make having one of these installed on your computer a must. On graphing ability alone, R tops SPSS in a hundred different ways.
_________________
ed
Mark



Joined: 20 Aug 2005
Posts: 670

Posted: Tue Oct 03, 2006 11:45 pm

I looked briefly at the files and saw the play-by-play organized with game clock data. It would seem the dataset could be used to compute various fast-break stats (team and player points off the break, shot frequency, FG% and TS%, turnovers, etc.) by noting the time difference between the ending action of one play and the start of the next.

I would find it interesting to hear more about that element of the game if anyone cared to analyze it leaguewide for a season. It would seem an important addition to pace discussions.
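A rough sketch of the idea in R, for anyone who wants to try it. Everything here is an assumption for illustration: the column names (clock, team, event, pts), the event labels, the made-up rows, and the 7-second window are not taken from the basketballvalue.com files. A shot is flagged as a fast-break attempt when the shooting team gained the ball on the previous row (defensive rebound or steal) and shot within the window.

```r
# Hypothetical play-by-play rows; clock is seconds remaining in the period.
pbp <- data.frame(
  clock = c(700, 698, 693, 675, 673, 655, 653, 649),
  team  = c("A", "B", "B", "A", "B", "B", "A", "A"),
  event = c("miss", "def_rebound", "make", "miss",
            "def_rebound", "miss", "def_rebound", "make"),
  pts   = c(0, 0, 2, 0, 0, 0, 0, 2),
  stringsAsFactors = FALSE
)

fb_window <- 7  # assumed cutoff in seconds, not an established definition

n          <- nrow(pbp)
prev_event <- c(NA, pbp$event[-n])    # event on the previous row
prev_team  <- c(NA, pbp$team[-n])     # team on the previous row
elapsed    <- c(NA, -diff(pbp$clock)) # seconds since the previous row

# A shot counts as a fast-break attempt if the same team just gained
# possession and shot within the window. Period boundaries are ignored here.
is_shot <- pbp$event %in% c("make", "miss")
pbp$fastbreak <- is_shot &
  prev_event %in% c("def_rebound", "steal") &
  prev_team == pbp$team &
  elapsed <= fb_window

# Fast-break attempts and points by team.
fb     <- pbp[pbp$fastbreak, ]
fb_fga <- table(fb$team)
fb_pts <- tapply(fb$pts, fb$team, sum)
print(fb_pts)
```

With real data the same flag could feed FG%, TS%, and turnover counts, but the possession-change logic would need to handle inbounds plays, fouls, and period breaks.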
WizardsKev



Joined: 03 Jan 2005
Posts: 460
Location: Washington, DC

Posted: Wed Oct 04, 2006 9:01 am

basketballvalue wrote:
WizardsKev wrote:
I find the way you're counting possessions sorta confusing. We're all accustomed to "per 100 possessions" stats. It would make more sense to me to see the numbers broken down by offensive performance (pts per 100 offensive possessions), defensive performance (pts per 100 defensive possessions), and then have a net +/- per 100 possessions.


Sorry for the delay in responding to all the posts, I'll try and address all the various points.

The reason I approached the problem as I have described is that I'm ultimately planning on producing adjusted +/- stats based on the results of all the "mini-games" that a matchup represents. I have to think this through a little more, but I fear that breaking it up into 100 offensive possessions disconnected from 100 defensive possessions might make that task harder. I'm having a hard enough time as it is. :) At the same time, I see the point of breaking it up as you and deepak_e have described. I can't promise I'll do this right away, but I'll look into it.

Thanks,
Aaron


Believe me, I appreciate the difficulty inherent in what you're trying to do. It was a major effort to design a database to hold the defensive data I was tracking, and I wasn't even doing the programming on it. We used pbp data and had similar identification struggles, including some games where players were misidentified. I'll be interested to see the results of your work. Good luck. ;)
_________________
If you can't explain it simply, you don't understand it well enough.

-- Albert Einstein
basketballvalue



Joined: 07 Mar 2006
Posts: 13

Posted: Sun Oct 08, 2006 10:07 pm

cherokee_ACB wrote:
basketballvalue wrote:
So, what the database shows is that the same player ID can be either of these names on Miami, but the true name is "S. O'Neal" since that's more specific.


I noticed and appreciate that. It's good to have a single name for each player, and be able to match it with the different names used in the pbps. What I was asking is for such PlayerTrueName to be unique. As it is now, your database uses J. Jones for both James Jones and Jumaine Jones, so you are forced to look at/work with numerical ids to differentiate them. I'm just saying this would be helpful, but isn't really so important (82games is worse in this respect). I can live with what you have now.


Ah, that is a good point. It probably won't happen right away, but I'll work to make those names unique, too.

Thanks,
Aaron
basketballvalue



Joined: 07 Mar 2006
Posts: 13

Posted: Sun Oct 29, 2006 10:04 am

All,

Sorry I've been away for a while; my day job has been keeping me quite busy. However, I have done some work on basketballvalue.com, and there is now data on individual players as well. Please check it out and let me know what you think.

I still have a little work to do to start back up for the upcoming season, and I'll be out of town for some family responsibilities early this week, so I expect that 2006-2007 data won't be available right after Tuesday's games conclude. My apologies in advance, and I will be working to make that data available as soon as possible on basketballvalue.com.

I expect that a key question people will have is around weighting when players play for two different teams. This weighting is based on minutes, not possessions, because I do not want to overweight time played on a fast paced team. As an example, consider a player who played every minute of 41 games for Phoenix (most possessions in 2005-2006) and every minute of 41 games for Memphis (fewest possessions in 2005-2006). In this case, I believe that the "performance", however it is defined, of this player over the course of the entire season should be a 50-50 mix of their performance on each team. That result is achieved by weighting on minutes, not possessions.

Please note, I recognize that in many other situations basing stats on possessions, not minutes, is preferred. However, I think in this particular calculation weighting on minutes is more appropriate. I look forward to hearing the community's thoughts on why you agree or disagree. I'm particularly interested as I'll use this weighting approach more as I break the information down further.
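To make the difference concrete, here is a small R sketch of the Phoenix/Memphis example. The pace figures and the per-team ratings are made-up placeholders (assumptions for illustration, not real 2005-2006 numbers); only the setup of 41 full games on each team comes from the example above.

```r
# A player who plays every minute of 41 games for each team.
minutes <- c(PHX = 41 * 48, MEM = 41 * 48)

# Hypothetical possessions per minute and hypothetical per-team ratings.
pace   <- c(PHX = 2.0, MEM = 1.8)
rating <- c(PHX = 4, MEM = -2)

# Weighting on minutes gives each team exactly half the weight.
w_minutes <- minutes / sum(minutes)

# Weighting on possessions tilts toward the faster-paced team.
possessions   <- minutes * pace
w_possessions <- possessions / sum(possessions)

by_minutes     <- weighted.mean(rating, minutes)
by_possessions <- weighted.mean(rating, possessions)

print(rbind(w_minutes, w_possessions))
print(c(by_minutes = by_minutes, by_possessions = by_possessions))
```

Under these placeholder numbers the minutes-weighted figure is the plain 50-50 average, while the possession-weighted figure is pulled toward the Phoenix rating.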

Thanks,
Aaron
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Sun Oct 29, 2006 11:17 am

basketballvalue wrote:
I have done some work on basketballvalue.com, and there is now data on individual players as well. Please check it out and let me know what you think.


Um, wow?

Sorry, I'm just speechless. That there is some amazing work, Aaron.
_________________
ed
deepak_e



Joined: 26 Apr 2006
Posts: 200

Posted: Sun Oct 29, 2006 2:08 pm

Aaron, again I appreciate your work on this. Can you explain why you refer to On Court stats as "Simple" while Off Court and Difference are "Weighted"?
basketballvalue



Joined: 07 Mar 2006
Posts: 13

Posted: Mon Oct 30, 2006 12:15 pm

Thanks for the feedback you two.

deepak_e wrote:
Can you explain why you refer to On Court stats as "Simple" while Off Court and Difference are "Weighted"?


The "Simple" stats are simply adding up the totals for each stat. For example, Peja's total plus minus was 72 when you add up his time for Indiana with his time in Sacramento. You can see this at http://www.basketballvalue.com/teamplayers.php?team=IND. His +/- per 200 possessions from simply adding up his possessions on both teams and +/- on both teams was 1.41.

However, the weighted stats, as I indicated earlier, are based on minutes. For people who played on only one team, it's the same as the simple stats. For people like Peja, however, it's a little different. As you can see on the site, his weighted +/- per 200 is 1.46.

Since this number is higher than 1.41, it implies that his +/- per 200 on the court for the team with the slower pace was higher, as the slower paced team is underweighted when you simply add up points and possessions. Simply adding it up is the same as weighting on possessions, while what I'm doing is weighting on minutes, as I mentioned earlier.

http://www.basketballvalue.com/player.php?id=67 confirms this: he played 1454 minutes for Indiana, where he had a 5.63, and 1148 minutes for Sacramento, where he had a -3.82. Indiana was slightly slower in terms of possessions per minute last year, 3.92 vs. Sacramento's 3.98.
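The arithmetic can be checked in a few lines of R using only the figures quoted in this post. One assumption to flag: the possession-weighted line uses team pace times minutes as a stand-in for his actual on-court possessions, so it only approximates the site's "simple" number and will not reproduce 1.41 exactly.

```r
# Figures quoted above for Peja's 2005-2006 season.
peja <- data.frame(
  team      = c("IND", "SAC"),
  minutes   = c(1454, 1148),
  pm200     = c(5.63, -3.82),  # on-court +/- per 200 possessions
  team_pace = c(3.92, 3.98),   # team possessions per minute
  stringsAsFactors = FALSE
)

# Weighted stat: average the per-team figures, weighting on minutes.
by_minutes <- weighted.mean(peja$pm200, peja$minutes)
print(round(by_minutes, 2))  # 1.46, matching the weighted figure on the site

# Rough possession weighting (assumption: team pace approximates the
# possessions he was actually on the court for).
approx_poss <- peja$minutes * peja$team_pace
by_poss     <- weighted.mean(peja$pm200, approx_poss)
print(round(by_poss, 2))
```

The possession-weighted value comes out below the minutes-weighted one, which is the direction described above: the faster team (Sacramento, where his number was negative) gets a bit more weight when possessions are simply added up.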

Incidentally, http://www.basketballvalue.com/player.php?id=67 shows the simple on court and off court stats of each team he played on; those are listed under Total Stats By Team, Player of Interest.

I hope that clarifies things a little better. Please let me know if you have more questions.

Thanks,
Aaron
gabefarkas



Joined: 31 Dec 2004
Posts: 506
Location: NYC

Posted: Wed Nov 01, 2006 10:02 am

Ed Küpfer wrote:
gabefarkas wrote:
Ugh, cherokee why use R? Do you have SPSS? Menu driven is always better, IMO.


?

Menu driven packages might be easier, but I can't imagine anyone thinking they'd be better. Command line packages like R might overwhelm a stats beginner, but their flexibility, ability to handle large datasets, and mind-boggling diversity of functions make having one of these installed on your computer a must. On graphing ability alone, R tops SPSS in a hundred different ways.


I would challenge you to find 5 things that R can do, but SPSS cannot.
_________________
Statistics are like a woman's bikini. What it reveals can be fascinating, but what it conceals is ultimately critical!
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Wed Nov 01, 2006 10:35 am

gabefarkas wrote:
I would challenge you to find 5 things that R can do, but SPSS cannot.


1.

2. Quantile regression.

3. Fuzzy clustering -- actually, SPSS omits many clustering algorithms, but fuzzy clustering is what I use the most.

4.

5. All manner of Bayesian functions in model fitting. Take a look here -- each one of those packages has functions that SPSS lacks.
_________________
ed
gabefarkas



Joined: 31 Dec 2004
Posts: 506
Location: NYC

Posted: Wed Nov 01, 2006 10:23 pm

Ed Küpfer wrote:


1.

2. Quantile regression.

3. Fuzzy clustering -- actually, SPSS omits many clustering algorithms, but fuzzy clustering is what I use the most.

4.

5. All manner of Bayesian functions in model fitting. Take a look here -- each one of those packages has functions that SPSS lacks.


1) Go to "Graphs" then "Interactive" and you can build those one-by-one.

2) SPSS can do quartile regression (is that good enough?) using Analyze ==> Regression ==> and then choosing the model type that you want to specify. In the window, go to "Statistics" and "Options" and you can specify it as part of the analysis

3) I haven't investigated this, and if I weren't getting married in 10 days I would look for it, because I feel like I've seen the ability to do that.

4) What the crap is that? Earl Grey or Chamomile?

5) You might have me there. Can I get back to you in about 2 weeks? (see #3)
_________________
Statistics are like a woman's bikini. What it reveals can be fascinating, but what it conceals is ultimately critical!
KnickerBlogger



Joined: 30 Dec 2004
Posts: 143

Posted: Thu Nov 02, 2006 1:27 pm

I bet that in R you can't, from a single command line, load up a .csv (or other data set created independently) and have it spit out a jpg (or gif) graph (scatter plot).

Go ahead & prove me wrong! :)
_________________
KnickerBlogger.Net - now indispensable!
Ed Küpfer



Joined: 30 Dec 2004
Posts: 522
Location: Toronto

Posted: Thu Nov 02, 2006 2:03 pm

KnickerBlogger wrote:
I bet that in R you can't, from a single command line, load up a .csv (or other data set created independently) and have it spit out a jpg (or gif) graph (scatter plot).

Go ahead & prove me wrong! :)


No time for frills now, but the basics are:

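Code:

# Read the team offense and defense tables (forward slashes avoid
# backslash-escape errors in R string literals)
off <- read.csv("C:/offense.csv", header = TRUE)
def <- read.csv("C:/defense.csv", header = TRUE)

# Points per 100 possessions, with possessions = fga + 0.44*fta - or + to
ortg <- with(off, pts / (fga + 0.44 * fta - or + to) * 100)
drtg <- with(def, pts / (fga + 0.44 * fta - or + to) * 100)

# Draw empty axes, then use the team names as the plotting symbols
png("C:/plot.png")
plot(ortg, drtg, type = "n")
text(ortg, drtg, def$team, cex = 0.5)
dev.off()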


which generates this:



"offense.csv" and "defense.csv" are Doug's stats, saved by Excel as csvs. The plot function allows for labels and titles and such; the text function actually puts the data labels on the plot. I don't know much about text manipulation in R, but I'm sure there's a way of truncating the team names. A plot grid has to be added between the plot and text lines, but that is easy enough to do.
_________________
ed
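On the two loose ends mentioned above (truncating the team names and adding a grid), one possible sketch in base R follows. The helper names short_label and draw_ratings are hypothetical, as are the sample team names and ratings; the format of the team column in Doug's files is not shown in the thread, so the three-letter truncation is only an assumption about what would look reasonable.

```r
# Hypothetical helper: keep the first n characters of each name, upper-cased.
short_label <- function(team, n = 3) {
  toupper(substr(team, 1, n))
}

# Hypothetical helper: draws on whatever graphics device is currently open.
draw_ratings <- function(ortg, drtg, team) {
  plot(ortg, drtg, type = "n",
       xlab = "Offensive rating", ylab = "Defensive rating")
  grid()                                          # grid goes in before the labels
  text(ortg, drtg, short_label(team), cex = 0.7)  # labels act as the points
  invisible(NULL)
}

# Made-up sample values, only to show the call.
teams <- c("Phoenix Suns", "Memphis Grizzlies", "Toronto Raptors")
ortg  <- c(111.5, 104.2, 106.8)
drtg  <- c(106.1, 102.9, 109.4)
print(short_label(teams))
```

To write an image file, the call to draw_ratings(ortg, drtg, teams) would sit between png(...) and dev.off(), the same way the plot and text lines do in the code above. Base R also has abbreviate() if plain truncation gives clashing labels.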
WizardsKev



Joined: 03 Jan 2005
Posts: 460
Location: Washington, DC

Posted: Thu Nov 02, 2006 5:21 pm

This thread needs to be re-titled to: "Stump Ed Ed Küpfer"
_________________
If you can't explain it simply, you don't understand it well enough.

-- Albert Einstein
FFSBasketball



Joined: 07 Mar 2005
Posts: 175
Location: MD

Posted: Thu Nov 02, 2006 6:24 pm

WizardsKev wrote:
This thread needs to be re-titled to: "Stump Ed Ed Küpfer"


Which is a shame because it's overlooking some of the great work by basketballvalue.com.
_________________
"Statistics: The only science that enables different experts using the same figures to draw different conclusions." - Evan Esar
All times are GMT - 5 Hours
Page 3 of 4

 