Author |
Message |
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Mon Jan 28, 2008 7:13 pm Post subject: Searching for Data |
|
|
I am looking for some kind of database or webpage where I can find:
For each game played in the current season: date, teams, score, and field goal attempts. Additionally, a general point spread at the time of the game.
Any ideas where I can find this information?
So far the best I have been able to get is date, teams and score for each game. |
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Tue Jan 29, 2008 1:13 pm Post subject: |
|
|
I have been able to get some of the information.
For games the most useful has been:
http://www.basketball-reference.com/leagues/NBA_2008_games.html
For point spread and O/U info, I can get the games per team (not the best way, but at least I can compile a table from the individual team charts):
http://stats.therx.com/NBA/gamelogs/GameLogs.aspx?TeamId=1
But I am still missing some source where I can find the field goal attempts (other shot data would be very beneficial).
The way Offensive and Defensive efficiencies are computed everywhere is flawed because it does not take into consideration the opposing team's efficiencies (and pace) which is crucial in my opinion.
By using the opposing team efficiencies we can get a normalized efficiency which can then be multiplied by a normalized pace factor to obtain a better predictor. |
|
|
|
hoopseng
Joined: 13 Oct 2006 Posts: 54 Location: Basketball Research
|
Posted: Tue Jan 29, 2008 4:14 pm Post subject: |
|
|
Dream- wrote: |
The way Offensive and Defensive efficiencies are computed everywhere is flawed because it does not take into consideration the opposing team's efficiencies (and pace) which is crucial in my opinion.
|
Are you sure?
The opposing team's offensive efficiency is your team's defensive efficiency. Your team's defensive efficiency is the opposing team's offensive efficiency.
That's why league averages are equal to each other. You can find the data here:
http://www.nbastuffer.com/team_efficiency _________________ http://www.nbastuffer.com |
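The symmetry described in this post can be checked on a toy game log. The snippet below is only a sketch: the teams, scores, and possession counts are made up, and `league_efficiencies` is a hypothetical helper, not something taken from any of the linked sites. It shows that league-wide offensive and defensive efficiency come out equal because every point scored is also a point allowed.

```python
# Hypothetical game log: (team, opponent, points_scored, possessions).
# All numbers are invented for illustration.
games = [
    ("A", "B", 102, 95),
    ("B", "A", 98, 95),
    ("A", "C", 110, 100),
    ("C", "A", 105, 100),
    ("B", "C", 90, 92),
    ("C", "B", 96, 92),
]


def league_efficiencies(rows):
    """Return league-wide (offensive, defensive) points per 100 possessions."""
    # Offensive side: every team's own points and possessions.
    pts_for = sum(pts for _, _, pts, _ in rows)
    poss_for = sum(poss for _, _, _, poss in rows)

    # Defensive side: points a team allowed are the points its opponent
    # scored, so we re-key the same rows by the defending team.
    allowed = {}
    for _, opp, pts, poss in rows:
        totals = allowed.setdefault(opp, [0, 0])
        totals[0] += pts
        totals[1] += poss
    pts_against = sum(t[0] for t in allowed.values())
    poss_against = sum(t[1] for t in allowed.values())

    return 100 * pts_for / poss_for, 100 * pts_against / poss_against


league_off, league_def = league_efficiencies(games)
print(league_off, league_def)  # the two values are identical
```

Note that this only says the league totals match. It does not by itself settle whether an individual team's number should be adjusted for the opponents it happened to face.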
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Tue Jan 29, 2008 4:34 pm Post subject: |
|
|
I could be wrong but the sites I have seen before seem to compute efficiency as a simple average of points made per 100 possessions.
But wouldn't points scored against a weak defensive team count less than points scored against a strong defensive team?
This is why I think you cannot just do a simple average.
Assume 3 idealized teams:
Team A has a real offensive efficiency of 100 points per 100 possessions.
Team B has a real Defensive efficiency of 90 points per 100 possessions.
Team C has a real Defensive efficiency of 110 points per 100 possessions.
What would be the expected points scored by A vs. B in 100 possessions? (surely less than 100, right?)
What would be the expected points scored by A vs. C in 100 possessions? (surely more than 100, right?)
If A scores the expected points against B or C, then its Off Eff should stay the same even when the points scored are above or below 100. But with the usual method, the Off Eff for A will change unless A scores 100 points in both cases.
So I think we need to normalize the Off/Def efficiencies.
PS. Very nice website you have! |
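One way to make the A/B/C example concrete is sketched below. It assumes a multiplicative model (expected points = offensive efficiency x opponent defensive efficiency / league average) and a league average of 100; neither assumption comes from the post, and the function names are hypothetical. Under those assumptions A is expected to score 90 per 100 possessions against B and 110 against C, and dividing the opponent's defense back out recovers A's 100 in both cases.

```python
LEAGUE_AVG = 100.0  # assumed league-average points per 100 possessions


def expected_points_per_100(off_eff, opp_def_eff, league_avg=LEAGUE_AVG):
    """Expected points per 100 possessions under a multiplicative model.

    The multiplicative form is an assumption made for illustration.
    """
    return off_eff * opp_def_eff / league_avg


def opponent_adjusted_off_eff(points_per_100, opp_def_eff, league_avg=LEAGUE_AVG):
    """Scale observed scoring by how far the opponent's defense is from average."""
    return points_per_100 * league_avg / opp_def_eff


# Team A (OE 100) against Team B (DE 90) and Team C (DE 110).
a_vs_b = expected_points_per_100(100.0, 90.0)
a_vs_c = expected_points_per_100(100.0, 110.0)

# If A scores exactly what is expected, its adjusted OE is unchanged.
adj_vs_b = opponent_adjusted_off_eff(a_vs_b, 90.0)
adj_vs_c = opponent_adjusted_off_eff(a_vs_c, 110.0)
print(a_vs_b, a_vs_c, adj_vs_b, adj_vs_c)
```

An additive model (offense plus defense minus league average) would give the same 90 and 110 here, so this example alone does not distinguish between the two forms.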
|
|
|
gabefarkas
Joined: 31 Dec 2004 Posts: 972 Location: Durham, NC
|
Posted: Tue Jan 29, 2008 5:52 pm Post subject: |
|
|
Dream- wrote: | I could be wrong but the sites I have seen before seem to compute efficiency as a simple average of points made per 100 possessions.
But wouldn't points scored against a weak defensive team count less than points scored against a strong defensive team? |
How would you determine that a defense is "weak" in the first place? A natural way would be to see how well other teams score against them, right? Specifically, by checking whether other teams score more points per 100 possessions against them, right?
Dream- wrote: | This is why I think you cannot just do a simple average.
Assume 3 idealized teams:
Team A has a real offensive efficiency of 100 points per 100 possessions.
Team B has a real Defensive efficiency of 90 points per 100 possessions.
Team C has a real Defensive efficiency of 110 points per 100 possessions. |
Define "real" in this case. Do you mean "underlying" or do you mean "in actuality"? If it's the former, I would posit there is no reliable way of determining it without using an instrumental variable or overspecifying, as far as I know. If it's the latter, then I would think a simple average would be their "real" efficiency.
Dream- wrote: | What would be the expected points scored by A vs. B in 100 possessions? (surely less than 100, right?)
What would be the expected points scored by A vs. C in 100 possessions? (surely more than 100, right?)
If A scores the expected points against B or C, then its Off Eff should stay the same even when the points scored are above or below 100. But with the usual method, the Off Eff for A will change unless A scores 100 points in both cases.
So I think we need to normalize the Off/Def efficiencies. |
Normalizing makes sense, but regression to the mean might make more sense, no? |
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Tue Jan 29, 2008 6:13 pm Post subject: |
|
|
gabefarkas wrote: |
How would you determine that a defense is "weak" in the first place? A natural way would be to see how well other teams score against them, right? Specifically, if other teams score more points per 100 possessions against them, right?
Define "real" in this case. Do you mean "underlying" or do you mean "in actuality"? If it's the former, I would posit there is no reliable way of determining it without using an instrumental variable or overspecifying, as far as I know. If it's the latter, then I would think a simple average would be their "real" efficiency. |
Your first two questions are actually the same thing, so let me answer that.
My line of thought involves not determining the efficiency of a given team, but assuming a priori that there is some actual efficiency value that may or may not be known through calculations. This value is what I mean by "real". We may not be able to determine it, but if we assume this idealized team situation we can make some inferences from it.
My argument is that if you don't normalize the efficiencies, you are not getting to the true efficiency. A 100 OE team vs a 90 DE team is expected to score under 100 points per 100 poss. But the current efficiency computation method contradicts this notion because it will change the OE of the first team unless said team scores exactly 100 points per 100 poss.
In other words, scoring against a weaker def. team is easier, so if you score more points against it, it may not mean that you have raised your efficiency, but rather that you are just playing against a weaker defense. So those points scored should be worth less than points scored against stronger defenses. (and the converse must be true as well)
For example, if A has a real OE of 100, B a real DE of 90, and C a real DE of 110, one could say that scoring a point against B is about 1.1 times as difficult as scoring against a team with a DE of 100. In the same way, scoring a point against C is about 0.9 times as difficult as it is against a team with a DE of 100. |
|
|
|
Mountain
Joined: 13 Mar 2007 Posts: 437
|
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
|
|
|
DLew
Joined: 13 Nov 2006 Posts: 72
|
Posted: Tue Jan 29, 2008 10:40 pm Post subject: |
|
|
You seem to be arguing that strength of schedule is important. I think everyone agrees, but generally it more or less averages out over many games. Even if a team's schedule was non-average, the efficiency numbers are still descriptively accurate: they tell you what the team's points divided by possessions for the season was. I don't think anyone is arguing that points divided by possessions tells you exactly a team's true offensive strength, but it tends to be in the ballpark. |
|
|
|
gabefarkas
Joined: 31 Dec 2004 Posts: 972 Location: Durham, NC
|
Posted: Tue Jan 29, 2008 11:40 pm Post subject: |
|
|
Dream- wrote: | My line of thought involves not determining the efficiency of a given team, but assuming a priori that there is some actual efficiency value that may or may not be known through calculations. This value is what I mean by "real". We may not be able to determine it, but if we assume this idealized team situation we can make some inferences from it. |
This part sounded like you were going for a Bayesian approach, which I've thought about in the past too. However, the rest of the post is more about normalizing and adjusting, which I think is much more challenging to implement. Have you thought about just a strictly Bayesian approach? |
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Tue Jan 29, 2008 11:59 pm Post subject: |
|
|
I have thought of Bayesian approaches, and also have a neural net approach in mind.
But I think I will try this first. I think a converging system (similar to Elo perhaps) could be used to bring the efficiencies to their stable value.
Once I have the appropriate data I can run some tests.
Perhaps I am just naive in thinking that this approach will be substantially better than just using averaging... but I'll try anyway. |
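A converging system of the kind mentioned in this post could look roughly like the sketch below. It is not Elo and not necessarily what the poster built: it is a plain fixed-point iteration on an assumed multiplicative model, where each team's offensive rating is re-estimated from its scoring scaled by its opponents' current defensive ratings, and vice versa. The function names, the input layout, and the rescaling step that pins the average offensive rating to the league average are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_adjusted_ratings(games, n_iter=50):
    """Iteratively estimate opponent-adjusted ratings.

    games: list of (offense_team, defense_team, points_per_100).
    Assumed model: points ~= off[o] * dfn[d] / league_avg.
    """
    league_avg = sum(p for _, _, p in games) / len(games)
    teams = sorted({t for o, d, _ in games for t in (o, d)})
    off = {t: league_avg for t in teams}
    dfn = {t: league_avg for t in teams}

    for _ in range(n_iter):
        # Offense: average of points, scaled up against stingy defenses
        # and down against generous ones.
        sums, counts = defaultdict(float), defaultdict(int)
        for o, d, p in games:
            sums[o] += p * league_avg / dfn[d]
            counts[o] += 1
        for t in sums:
            off[t] = sums[t] / counts[t]

        # Defense: same idea, using the freshly updated offensive ratings.
        sums, counts = defaultdict(float), defaultdict(int)
        for o, d, p in games:
            sums[d] += p * league_avg / off[o]
            counts[d] += 1
        for t in sums:
            dfn[t] = sums[t] / counts[t]

        # The model only fixes off * dfn, so pin the overall scale by
        # forcing the mean offensive rating to equal the league average.
        scale = league_avg / (sum(off.values()) / len(off))
        off = {t: v * scale for t, v in off.items()}
        dfn = {t: v / scale for t, v in dfn.items()}

    return off, dfn, league_avg


def predict(off, dfn, league_avg, offense_team, defense_team):
    """Predicted points per 100 possessions under the assumed model."""
    return off[offense_team] * dfn[defense_team] / league_avg


# Tiny made-up example: three teams, each facing the other two.
demo_games = [
    ("A", "B", 90.0), ("A", "C", 110.0),
    ("B", "A", 104.0), ("B", "C", 112.0),
    ("C", "A", 97.0), ("C", "B", 88.0),
]
demo_off, demo_dfn, demo_avg = fit_adjusted_ratings(demo_games)
print(demo_off, demo_dfn)
```

With real game logs the ratings will not reproduce every result exactly, so the useful check is the one raised elsewhere in the thread: whether predictions from these ratings beat predictions from plain averages.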
|
|
|
Mountain
Joined: 13 Mar 2007 Posts: 437
|
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Wed Jan 30, 2008 2:04 pm Post subject: |
|
|
Oh that's a nice format.
The data collecting will be done programmatically every day, so no cut and paste. (30 cut and pastes every day could get boring after a few days.)
I'll ask the people from Basketball-reference if it would be possible to compile these numbers on the season game summary, so that only one file has to be retrieved. That would be fantastic. (I also wonder if they have access to point spread and Over/Under Totals data). |
|
|
|
Chicago76
Joined: 06 Nov 2005 Posts: 77
|
Posted: Thu Jan 31, 2008 12:20 am Post subject: |
|
|
I may be wrong, but it sounds to me like you're trying to develop a predictive model comprised of SOS-adjusted O and D ratings of teams. Voros McCracken of DIPS fame did this a few years back for international soccer. If this is the case, why not look at the last few years of data to develop the system and test first?
You can get previous years' data using a link similar to the one provided above: http://www.basketball-reference.com/fc/tgl.cgi?team=SEA&year=2007
You could calculate your adjusted O and D Rtg chronologically to determine the predictive ability of your model on the next day vs. the traditional non-SOS adjusted O and D ratings. The ultimate indicator of whether SOS significantly influences the ratings would be in the predictive power of your metric vs. the traditional one.
I suspect adjusting for opponent quality would be an improvement. How much is the question.
The biggest problem may be that the effect of injuries may dwarf any meaningful difference between the two.
This is very interesting. Keep us posted. |
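The chronological test suggested in this post can be sketched as a walk-forward loop: predict each game using only earlier games, record the error, then update. The code below is a minimal illustration with hypothetical names and a made-up three-game log. It includes only the unadjusted baseline (a running average of a team's points per 100 possessions); an SOS-adjusted model would be plugged in through the same `predict`/`update` pair and compared on the same error measure.

```python
def walk_forward_mae(games, predict, update, state):
    """Mean absolute error of next-game predictions, in date order.

    games: list of (offense_team, defense_team, points_per_100), sorted by date.
    predict(state, offense_team, defense_team) -> float, or None if no history.
    update(state, game) folds the finished game into the state.
    """
    errors = []
    for game in games:
        guess = predict(state, game[0], game[1])
        if guess is not None:
            errors.append(abs(guess - game[2]))
        update(state, game)  # only after predicting, to avoid look-ahead
    return sum(errors) / len(errors) if errors else None


def raw_predict(state, offense_team, defense_team):
    """Baseline: the team's running average, ignoring the opponent."""
    totals = state.get(offense_team)
    return totals[0] / totals[1] if totals else None


def raw_update(state, game):
    totals = state.setdefault(game[0], [0.0, 0])
    totals[0] += game[2]
    totals[1] += 1


# Made-up log: Team A's scoring per 100 possessions in three games.
demo_log = [("A", "B", 100.0), ("A", "C", 110.0), ("A", "B", 90.0)]
baseline_mae = walk_forward_mae(demo_log, raw_predict, raw_update, {})
print(baseline_mae)  # 12.5 for this toy log
```

Running both models through the same loop over several past seasons would show how much, if anything, the opponent adjustment adds.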
|
|
|
Dream-
Joined: 26 Jan 2008 Posts: 13
|
Posted: Thu Jan 31, 2008 1:13 am Post subject: |
|
|
Chicago76 wrote: | I may be wrong, but it sounds to me like you're trying to develop a predictive model comprised of SOS-adjusted O and D ratings of teams. Voros McCracken of DIPS fame did this a few years back for international soccer. |
I am not familiar with the McCracken system, but I am indeed developing a strength-based model.
Quote: | If this is the case, why not look at the last few years of data to develop the system and test first?
...
You could calculate your adjusted O and D Rtg chronologically to determine the predictive ability of your model on the next day vs. the traditional non-SOS adjusted O and D ratings. The ultimate indicator of whether SOS significantly influences the ratings would be in the predictive power of your metric vs. the traditional one.
|
This is what I am doing. I have already tested data from past seasons, but using unadjusted strengths. This gives pretty good predictions for win-loss and point differentials, but it is not as good for predicting actual scores, which is why I am trying to change the model.
Quote: | This is very interesting. Keep us posted. |
It is very interesting indeed. I am already finding surprising results with respect to home court advantage and how rested teams are.
I am adjusting the strengths individually for these factors because it seems that different teams react much differently to them.
I am also finding that some teams are incredibly hard to predict (a 50% hit rate) while others are very consistent (a 95% hit rate). And it is not the ones at the very top or bottom of the standings. |
|
|
|
|