APBRmetrics Forum
The statistical revolution will not be televised.
 

Searching for Data
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Mon Jan 28, 2008 7:13 pm    Post subject: Searching for Data

I am looking for some kind of database or webpage where I can find:

For each game played in the current season: date, teams, score, and field goal attempts. Additionally, a general point spread at the time of the game.

Any ideas where I can find this information?

So far the best I have been able to get is date, teams and score for each game.
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Tue Jan 29, 2008 1:13 pm

I have been able to get some of the information.

For games the most useful has been:

http://www.basketball-reference.com/leagues/NBA_2008_games.html

For point spread and O/U info, I can get the games per team (not the best way, but at least I can compile a table from the individual team charts):

http://stats.therx.com/NBA/gamelogs/GameLogs.aspx?TeamId=1

But I am still missing some source where I can find the field goal attempts (other shot data would be very beneficial).

The way Offensive and Defensive efficiencies are computed everywhere is flawed because it does not take into consideration the opposing team's efficiencies (and pace) which is crucial in my opinion.

By using the opposing team efficiencies we can get a normalized efficiency which can then be multiplied by a normalized pace factor to obtain a better predictor.
hoopseng



Joined: 13 Oct 2006
Posts: 54
Location: Basketball Research

Posted: Tue Jan 29, 2008 4:14 pm

Dream- wrote:


The way Offensive and Defensive efficiencies are computed everywhere is flawed because it does not take into consideration the opposing team's efficiencies (and pace) which is crucial in my opinion.



Are you sure?

The opposing team's offensive efficiency is your team's defensive efficiency. Your team's defensive efficiency is the opposing team's offensive efficiency.

That's why league averages are equal to each other. You can find the data here:

http://www.nbastuffer.com/team_efficiency
_________________
http://www.nbastuffer.com
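The symmetry described in this post can be checked with a small sketch. The game records below are made up for illustration, and the code assumes both teams use the same number of possessions in a game; the names `GAMES`, `team_totals` and `per_100` are hypothetical, not taken from any site mentioned in the thread.

```python
from collections import defaultdict

# Made-up games: (team, opponent, team_pts, opp_pts, possessions).
# Assumption: both teams use the same number of possessions in a game.
GAMES = [
    ("A", "B", 98, 91, 95),
    ("B", "C", 104, 110, 100),
    ("C", "A", 89, 97, 90),
]

def team_totals(games):
    """Accumulate points scored, points allowed and possessions per team."""
    totals = defaultdict(lambda: {"scored": 0, "allowed": 0, "poss": 0})
    for team, opp, team_pts, opp_pts, poss in games:
        for name, pts_for, pts_against in ((team, team_pts, opp_pts),
                                           (opp, opp_pts, team_pts)):
            totals[name]["scored"] += pts_for
            totals[name]["allowed"] += pts_against
            totals[name]["poss"] += poss
    return dict(totals)

def per_100(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions

totals = team_totals(GAMES)
league_poss = sum(t["poss"] for t in totals.values())
# Every point scored by one team is a point allowed by another,
# so the two league-wide figures are built from the same total.
league_off = per_100(sum(t["scored"] for t in totals.values()), league_poss)
league_def = per_100(sum(t["allowed"] for t in totals.values()), league_poss)

for name, t in sorted(totals.items()):
    print(name, round(per_100(t["scored"], t["poss"]), 1),
          round(per_100(t["allowed"], t["poss"]), 1))
print("league", round(league_off, 1), round(league_def, 1))
```

Individual teams differ in offence and defence, but the league-wide offensive and defensive figures come out identical by construction.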
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Tue Jan 29, 2008 4:34 pm

I could be wrong, but the sites I have seen before seem to compute efficiency as a simple average of points scored per 100 possessions.

But wouldn't points scored against a weak defensive team count less than points scored against a strong defensive team?

This is why I think you cannot just do a simple average.

Assume 3 idealized teams:

Team A has a real offensive efficiency of 100 points per 100 possessions.
Team B has a real defensive efficiency of 90 points per 100 possessions.
Team C has a real defensive efficiency of 110 points per 100 possessions.

What would be the expected points scored by A vs. B in 100 possessions? (surely less than 100, right?)
What would be the expected points scored by A vs. C in 100 possessions? (surely more than 100, right?)

If A scores the expected points against B or C, then its Off Eff should stay the same even when the points scored are above or below 100. But with the usual method, the Off Eff for A will change unless A scores 100 points in both cases.

So I think we need to normalize the Off/Def efficiencies.

PS. Very nice website you have!
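The three-team example above can be turned into a small numeric sketch. The post does not say how expected points should be computed, so the multiplicative model below (offence times opposing defence, divided by an assumed league average of 100) is an assumption made purely for illustration, as are the function names and the schedule.

```python
LEAGUE_AVG = 100.0  # assumed league-average points per 100 possessions

def expected_points(off_eff, opp_def_eff, league_avg=LEAGUE_AVG):
    """Assumed multiplicative model: scale the offence by the opposing defence."""
    return off_eff * opp_def_eff / league_avg

def adjusted_rating(points_per_100, opp_def_eff, league_avg=LEAGUE_AVG):
    """Rescale an observed game to what it would be worth against an average defence."""
    return points_per_100 * league_avg / opp_def_eff

# Hypothetical schedule: Team A (real OE 100) plays B (DE 90) twice and
# C (DE 110) once, scoring exactly the expected amount every time.
schedule = [90.0, 90.0, 110.0]  # opponents' defensive efficiencies
observed = [expected_points(100.0, de) for de in schedule]

simple_avg = sum(observed) / len(observed)
adjusted_avg = sum(adjusted_rating(p, de)
                   for p, de in zip(observed, schedule)) / len(schedule)

print(round(simple_avg, 2))    # drifts away from 100 with an uneven schedule
print(round(adjusted_avg, 2))  # stays at the assumed real value
```

Under these assumptions the simple average moves with the schedule even though Team A played exactly to expectation, while the opponent-adjusted average does not.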
gabefarkas



Joined: 31 Dec 2004
Posts: 972
Location: Durham, NC

Posted: Tue Jan 29, 2008 5:52 pm

Dream- wrote:
I could be wrong, but the sites I have seen before seem to compute efficiency as a simple average of points scored per 100 possessions.

But wouldn't points scored against a weak defensive team count less than points scored against a strong defensive team?

How would you determine that a defense is "weak" in the first place? A natural way would be to see how well other teams score against them, right? Specifically, if other teams score more points per 100 possessions against them, right?

Dream- wrote:
This is why I think you cannot just do a simple average.

Assume 3 idealized teams:

Team A has a real offensive efficiency of 100 points per 100 possessions.
Team B has a real defensive efficiency of 90 points per 100 possessions.
Team C has a real defensive efficiency of 110 points per 100 possessions.

Define "real" in this case. Do you mean "underlying" or do you mean "in actuality"? If it's the former, I would posit there is no reliable way of determining it without using an instrumental variable or overspecifying, as far as I know. If it's the latter, then I would think a simple average would be their "real" efficiency.

Dream- wrote:
What would be the expected points scored by A vs. B in 100 possessions? (surely less than 100, right?)
What would be the expected points scored by A vs. C in 100 possessions? (surely more than 100, right?)

If A scores the expected points against B or C, then its Off Eff should stay the same even when the points scored are above or below 100. But with the usual method, the Off Eff for A will change unless A scores 100 points in both cases.

So I think we need to normalize the Off/Def efficiencies.

Normalizing makes sense, but regression to the mean might make more sense, no?
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Tue Jan 29, 2008 6:13 pm

gabefarkas wrote:

How would you determine that a defense is "weak" in the first place? A natural way would be to see how well other teams score against them, right? Specifically, if other teams score more points per 100 possessions against them, right?

Define "real" in this case. Do you mean "underlying" or do you mean "in actuality"? If it's the former, I would posit there is no reliable way of determining it without using an instrumental variable or overspecifying, as far as I know. If it's the latter, then I would think a simple average would be their "real" efficiency.


Your first two questions are actually the same thing, so let me answer that.

My line of thought involves not determining the efficiency of a given team, but assuming a priori that there is some actual efficiency value that may or may not be known through calculations. This value is what I mean by "real". We may not be able to determine it, but if we assume this idealized team situation we can make some inferences from it.

My argument is that if you don't normalize the efficiencies, you are not getting to the true efficiency. A 100 OE team vs a 90 DE team is expected to score under 100 points per 100 poss. But the current efficiency computation method contradicts this notion because it will change the OE of the first team unless said team scores exactly 100 points per 100 poss.

In other words, scoring against a weaker def. team is easier, so if you score more points against it, it may not mean that you have raised your efficiency, but rather that you are just playing against a weaker defense. So those points scored should be worth less than points scored against stronger defenses. (and the converse must be true as well)

For example, if A has a real OE of 100, and B a real DE of 90 and C a real DE of 110, one could say that scoring a point against B is about 1.1 times as difficult (100/90) as scoring against a team with a DE of 100. In the same way, scoring a point against C is about 0.9 times as difficult (100/110) as it is against a team with a DE of 100.
Mountain



Joined: 13 Mar 2007
Posts: 437

Posted: Tue Jan 29, 2008 9:27 pm

Would pages like this do?
http://hosted.stats.com/nba/teamstats.asp?teamno=02&btnGo=Go&type=ologs
using text to columns at the hyphen?
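The "text to columns at the hyphen" step can also be done in code. This is a minimal sketch that assumes the shooting cells arrive as "made-attempted" strings such as "38-85"; the row below is invented, and the exact layout of the linked page is not confirmed here.

```python
def split_made_attempted(cell):
    """Split a 'made-attempted' cell such as '38-85' into two integers."""
    made, attempted = cell.strip().split("-", 1)
    return int(made), int(attempted)

# Hypothetical row copied from a team game log.
row = {"FG": "38-85", "3P": "6-17", "FT": "20-25"}
parsed = {stat: split_made_attempted(cell) for stat, cell in row.items()}
print(parsed["FG"])  # (made, attempted)
```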
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Tue Jan 29, 2008 9:38 pm

Mountain wrote:
Would pages like this do?
http://hosted.stats.com/nba/teamstats.asp?teamno=02&btnGo=Go&type=ologs
using text to columns at the hyphen?


Yeah those work nicely. I also found:

http://www.statfox.com/nba/nbateam~teamid~UTAH~season~2008~log~1.htm

The only problem with these pages is that I would need to get one page per team and then compile the data from those 30 pages into a single per-game database.

I am still hopeful to find something like this:

http://www.basketball-reference.com/leagues/NBA_2008_games.html

but with the extra data I need.
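Compiling the 30 per-team pages into one per-game table could look roughly like the sketch below. It assumes each team log has already been parsed into rows with the hypothetical fields `date`, `opp`, `home`, `pts` and `fga`; the sample rows are made up and do not come from any of the linked sites.

```python
def merge_team_logs(team_logs):
    """Combine per-team logs into one row per game.

    team_logs maps a team code to rows such as
    {"date": "2008-01-28", "opp": "SEA", "home": True, "pts": 98, "fga": 85}.
    Each game appears in two logs; the key (date, home, away) pairs them up.
    """
    games = {}
    for team, rows in team_logs.items():
        for r in rows:
            home, away = (team, r["opp"]) if r["home"] else (r["opp"], team)
            game = games.setdefault(
                (r["date"], home, away),
                {"date": r["date"], "home": home, "away": away},
            )
            side = "home" if r["home"] else "away"
            game[side + "_pts"] = r["pts"]
            game[side + "_fga"] = r["fga"]
    return sorted(games.values(), key=lambda g: (g["date"], g["home"]))

# Made-up sample: the same game seen from both teams' logs.
logs = {
    "UTA": [{"date": "2008-01-28", "opp": "SEA", "home": True, "pts": 98, "fga": 85}],
    "SEA": [{"date": "2008-01-28", "opp": "UTA", "home": False, "pts": 90, "fga": 80}],
}
merged = merge_team_logs(logs)
print(merged)
```

Keying on date plus home and away team means a game is stored once no matter which team's log it was read from.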
DLew



Joined: 13 Nov 2006
Posts: 72

Posted: Tue Jan 29, 2008 10:40 pm

You seem to be arguing that strength of schedule is important. I think everyone agrees, but generally it more or less averages out over many games. Even if a team's schedule was non-average, the efficiency numbers are still descriptively accurate: they tell you what the team's points divided by possessions for the season was. I don't think anyone is arguing that points divided by possessions tells you exactly a team's true offensive strength, but it tends to be in the ballpark.
gabefarkas



Joined: 31 Dec 2004
Posts: 972
Location: Durham, NC

Posted: Tue Jan 29, 2008 11:40 pm

Dream- wrote:
My line of thought involves not determining the efficiency of a given team, but assuming a priori that there is some actual efficiency value that may or may not be known through calculations. This value is what I mean by "real". We may not be able to determine it, but if we assume this idealized team situation we can make some inferences from it.

This part sounded like you were going for a Bayesian approach, which I've thought about in the past too. However, the rest of the post is more about normalizing and adjusting, which I think is much more challenging to implement. Have you thought about just a strictly Bayesian approach?
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Tue Jan 29, 2008 11:59 pm

I have thought of Bayesian approaches, and also have a neural net approach in mind.

But I think I will try this first. I think a converging system (similar to Elo perhaps) could be used to bring the efficiencies to their stable value.

Once I have the appropriate data I can run some tests.

Perhaps I am just naive in thinking that this approach will be substantially better than just using averaging... but I'll try anyway.
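One way a converging system might look is a simple fixed-point iteration: adjust each offence for the defences it faced, adjust each defence for the offences it faced, and repeat until the numbers stop moving. This is not Elo and not necessarily what the poster built; the multiplicative model, the function name `fit_ratings` and the synthetic data are all assumptions for illustration.

```python
def fit_ratings(games, iterations=100, tol=1e-9):
    """Iteratively opponent-adjust ratings.

    games: list of (offence_team, defence_team, points_per_100).
    Assumed model: points = off[a] * dfn[d] / league.
    Returns (off, dfn, league).
    """
    league = sum(p for _, _, p in games) / len(games)
    teams = {t for a, d, _ in games for t in (a, d)}
    off = {t: league for t in teams}
    dfn = {t: league for t in teams}
    for _ in range(iterations):
        new_off, new_dfn = {}, {}
        for t in teams:
            vals = [p * league / dfn[d] for a, d, p in games if a == t]
            new_off[t] = sum(vals) / len(vals) if vals else league
        for t in teams:
            vals = [p * league / new_off[a] for a, d, p in games if d == t]
            new_dfn[t] = sum(vals) / len(vals) if vals else league
        change = max(abs(new_off[t] - off[t]) + abs(new_dfn[t] - dfn[t])
                     for t in teams)
        off, dfn = new_off, new_dfn
        if change < tol:
            break
    return off, dfn, league

# Synthetic round robin generated from made-up "real" ratings.
true_off = {"A": 110.0, "B": 100.0, "C": 95.0}
true_dfn = {"A": 105.0, "B": 95.0, "C": 100.0}
games = [(a, d, true_off[a] * true_dfn[d] / 100.0)
         for a in true_off for d in true_dfn if a != d]
off, dfn, league = fit_ratings(games)
```

On noise-free synthetic data like this, the fitted ratings reproduce every observed game; the scale split between offence and defence is arbitrary in this model, so only the product of the two is meaningful. Real game data is noisy, so behaviour there would need testing.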
Mountain



Joined: 13 Mar 2007
Posts: 437

Posted: Wed Jan 30, 2008 12:19 pm

With these team pages you don't have to manually break FGs from FGAs into separate columns.

http://www.basketball-reference.com/fc/tgl.cgi?team=SEA&year=2008

30 cut and pastes isn't that bad.
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Wed Jan 30, 2008 2:04 pm

Mountain wrote:
With these team pages you don't have to manually break FGs from FGAs into separate columns.

http://www.basketball-reference.com/fc/tgl.cgi?team=SEA&year=2008

30 cut and pastes isn't that bad.


Oh that's a nice format.

The data collection will be done programmatically every day, so no cut and paste. (30 cut and pastes every day could get boring after a few days.)

I'll ask the people from Basketball-reference if it would be possible to compile these numbers on the season game summary, so that only one file has to be retrieved. That would be fantastic. (I also wonder if they have access to point spread and Over/Under Totals data).
Chicago76



Joined: 06 Nov 2005
Posts: 77

Posted: Thu Jan 31, 2008 12:20 am

I may be wrong, but it sounds to me like you're trying to develop a predictive model composed of SOS-adjusted O and D ratings of teams. Voros McCracken of DIPS fame did this a few years back for international soccer. If this is the case, why not look at the last few years of data to develop the system and test first?

You can get previous years' data using a link similar to the one provided above: http://www.basketball-reference.com/fc/tgl.cgi?team=SEA&year=2007

You could calculate your adjusted O and D Rtg chronologically to determine the predictive ability of your model on the next day vs. the traditional non-SOS adjusted O and D ratings. The ultimate indicator of whether SOS significantly influences the ratings would be in the predictive power of your metric vs. the traditional one.
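A chronological test of this kind can be sketched as a walk-forward loop: each game is predicted using only games from earlier dates, and the errors are averaged. The record layout, the function names and the three sample games are hypothetical; `season_average` stands in for the traditional unadjusted rating, and an SOS-adjusted predictor with the same signature could be scored the same way for comparison.

```python
def walk_forward_mae(games, predict):
    """Mean absolute error of next-game predictions.

    games: chronologically sorted dicts with 'date' (ISO string), 'team',
    'opp' and 'pts'. predict(history, game) must use only earlier games.
    """
    errors = []
    for i, game in enumerate(games):
        history = [g for g in games[:i] if g["date"] < game["date"]]
        if not history:
            continue  # nothing to base a prediction on yet
        errors.append(abs(predict(history, game) - game["pts"]))
    return sum(errors) / len(errors) if errors else float("nan")

def season_average(history, game):
    """Baseline: the team's mean points so far (all teams if it has not played)."""
    own = [g["pts"] for g in history if g["team"] == game["team"]]
    pool = own or [g["pts"] for g in history]
    return sum(pool) / len(pool)

# Made-up sample games for one team.
sample = [
    {"date": "2007-11-01", "team": "A", "opp": "B", "pts": 100},
    {"date": "2007-11-02", "team": "A", "opp": "C", "pts": 110},
    {"date": "2007-11-03", "team": "A", "opp": "B", "pts": 90},
]
baseline_mae = walk_forward_mae(sample, season_average)
print(baseline_mae)
```

Running both predictors through the same loop gives two error figures on identical games, which is the comparison suggested above.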

I suspect adjusting for opponent quality would be an improvement. How much is the question.

The biggest problem may be that the effect of injuries may dwarf any meaningful difference between the two.

This is very interesting. Keep us posted.
Dream-



Joined: 26 Jan 2008
Posts: 13

Posted: Thu Jan 31, 2008 1:13 am

Chicago76 wrote:
I may be wrong, but it sounds to me like you're trying to develop a predictive model composed of SOS-adjusted O and D ratings of teams. Voros McCracken of DIPS fame did this a few years back for international soccer.


I am not familiar with the McCracken system, but I am indeed developing a strength-based model.

Quote:
If this is the case, why not look at the last few years of data to develop the system and test first?
...
You could calculate your adjusted O and D Rtg chronologically to determine the predictive ability of your model on the next day vs. the traditional non-SOS adjusted O and D ratings. The ultimate indicator of whether SOS significantly influences the ratings would be in the predictive power of your metric vs. the traditional one.


This is what I am doing. I have already tested data from past seasons, but using unadjusted strengths. This gives pretty good predictions for win-loss and point differentials, but it is not as good for predicting actual scores, which is why I am trying to change the model.

Quote:
This is very interesting. Keep us posted.


It is very interesting indeed. I am already finding surprising results with respect to home court advantage and how rested teams are.

I am adjusting the strengths individually for these factors because it seems that different teams react very differently to them.

I am also finding that some teams are incredibly hard to predict (with a 50% hit rate) while others are very consistent (a 95% hit rate). And it is not the ones at the very top or bottom of the standings.
All times are GMT - 5 Hours
Page 1 of 2

 