APBRmetrics

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

I hope to add some thoughts to this thread at irregular intervals.

To predict the outcome of a game, Bill James's "log5" method has traditionally been used:
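For reference, here is the standard two-team form of log5 as a minimal Python sketch. The function name `log5` is illustrative, and this is the plain version with no home-court term. The formula in the original post may have included one, and it is not reproduced here.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's
    win% (p_a, p_b) against a league-average opponent.
    Standard log5: (pA - pA*pB) / (pA + pB - 2*pA*pB)."""
    numerator = p_a - p_a * p_b
    denominator = p_a + p_b - 2.0 * p_a * p_b
    return numerator / denominator

# A .600 team against a .400 team.
print(round(log5(0.600, 0.400), 3))
```

Two quick sanity checks: any team facing a .500 opponent keeps its own win%, and two equal teams each win half the time.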

kjb · Joined: 03 Jan 2005 Posts: 865 Location: Washington, DC

How does it do with predicting victory margins?

Dan Rosenbaum · Posted: Fri Jan 07, 2005 12:56 pm

This is very interesting work, Ed. Thanks! I bet injuries and suspensions explain why the formula does poorly in the tails. In those cases the predictions are based on players who are not playing in that given game, so it would not be surprising for the predictions to be off. For that reason I would not be too aggressive in pursuing other functional forms that would predict more wins in the tails. They may fit the data better, but for the wrong reasons.

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

WizardsKev: I don't see why the variables can't be regressed against the win margin instead of simply the win/loss binary outcome. We've seen from the Pythagorean method that win margins tell us something real about the quality of the teams, so this might be worth pursuing. I am in the middle of assembling more data to use as variables (rest days and distance traveled), so it will have to wait a few days.

Dan Rosenbaum: Thanks for the input. You're probably right that I shouldn't waste time on the results of extreme predictions. In the next day or two I'll post a logit model for game predictions, so maybe these extreme predictions will go away!
_________________
ed

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

Using the same game data as above, I used Minitab's binary logistic function to model the outcomes. Logistic regression is basically the same type of thing as linear regression, except that linear regression is used on dependent variables that are continuous (like height, or points per game), while logistic regression is used on outcomes of the win/loss, yes/no, hit/miss variety. My dependent variable was HomeWin (1=yes, 0=no). I'll append the Minitab output at the bottom of this post.
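To make "logistic regression" concrete, here is a minimal pure-Python sketch of a binary logistic fit with a single predictor. `fit_logistic` is a hypothetical helper, and Newton-Raphson is one common way to find the maximum-likelihood coefficients; it is not necessarily what Minitab does internally, and Minitab also handles several predictors, standard errors, and diagnostics, none of which are shown here.

```python
import math

def sigmoid(z):
    """Inverse logit: turns a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson
    on the log-likelihood. Returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (minus the Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            residual = y - p
            weight = p * (1.0 - p)
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (information matrix)^-1 times gradient, 2x2 inverse by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

# Toy data (hypothetical): x is a 0/1 predictor, y is HomeWin.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(round(b0, 3), round(b1, 3))
```

With a single 0/1 predictor the fitted probabilities reproduce the two group win rates (25% and 75% in this toy data), so b0 = ln(1/3) ≈ -1.099 and b1 = 2 ln 3 ≈ 2.197.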

To show how well the logistic model predicted game outcomes in comparison to the log5 method mentioned previously, take a look at the following graph:

A perfect prediction model would lie right along the grey diagonal line. Both models work pretty well, but you can see that the logistic model is slightly better. If you'd like to use it to predict game outcomes, use the following equation:

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

Similar to the analysis above: binary logistic regression, using the following variables:

KnickerBlogger · Joined: 30 Dec 2004 Posts: 180

Sam O · Joined: 14 Jan 2005 Posts: 5 Location: New York, NY

I'm new to this and really enjoying this thread.

My interpretation of HOME% was simply the calculated pyg win% of the home team in question.

My question is how did you expand this formula

KnickerBlogger · Joined: 30 Dec 2004 Posts: 180

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

KnickerBlogger: HOME% and AWAY% are merely estimates of the home and away teams' strengths. I used their Pythagorean win% up to that point in the season, but you can use anything: win-loss%, Gaussian%, whatever. Note that these numbers aren't adjusted for home court advantage; they just represent the strength of whatever team happens to be at home and whatever team happens to be visiting. The equation attempts to answer the question, "What is the probability of a home team win, given team A at home and team B visiting?"
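As an illustration of where HOME% and AWAY% could come from, here is a minimal sketch of a Pythagorean win% from season-to-date points scored and allowed. The function name and the team totals are hypothetical, and the default exponent of 14 is an assumption (a value often used for the NBA); the exponent Ed actually used is not stated in the thread.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected win% = PF^k / (PF^k + PA^k).
    Computed through the ratio PA/PF, which is algebraically identical
    but avoids huge intermediate numbers with season totals.
    The default exponent k = 14 is an assumption."""
    ratio = points_against / points_for
    return 1.0 / (1.0 + ratio ** exponent)

# Hypothetical season-to-date totals for the home and visiting teams.
home_pct = pythagorean_win_pct(4210, 4105)
away_pct = pythagorean_win_pct(3980, 4060)
print(f"HOME% = {home_pct:.3f}, AWAY% = {away_pct:.3f}")
```

Note that, as in the post, these numbers say nothing about who is at home; the home-court effect is left for the regression to pick up.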

Sam O · Joined: 14 Jan 2005 Posts: 5 Location: New York, NY

Wow. The example really helps. Thank you for a great answer.

Sam

Dan Rosenbaum · Posted: Fri Jan 14, 2005 3:49 pm

Very nice work, Ed.

I was trying to get a sense of the magnitude of the effects of rest days. Here is what I came up with using your regression results.

Holding all of the other variables constant at their means, each figure below is the change in the probability of the home team winning when the home team has X days of rest. Remember that all of these are relative to no days of rest.

1 day: 3.4 percentage point increase
2 days: 5.4 percentage point increase
3 days: 6.0 percentage point increase
4 days: 5.0 percentage point increase
5 days: 2.6 percentage point increase
6 days: 1.2 percentage point decrease
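A minimal sketch of how figures like these can be computed from a logit model, assuming (as the comment about a "non-quadratic functional form" suggests) that home rest days entered the model as a quadratic. The coefficients below are hypothetical placeholders chosen only for illustration; they are not the estimates from Ed's regression, so the printed numbers will not match the list.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def rest_effect(days, base_index, b_rest, b_rest_sq):
    """Change in P(home win) with `days` of home rest versus zero days,
    assuming rest enters the logit as b_rest*days + b_rest_sq*days**2.
    base_index is the linear predictor with every other variable held
    at its mean and zero days of rest."""
    rested = logistic(base_index + b_rest * days + b_rest_sq * days ** 2)
    unrested = logistic(base_index)
    return rested - unrested

# Hypothetical values, for illustration only (not Ed's estimates).
BASE = math.log(0.63 / 0.37)  # puts the zero-rest baseline near 63%
B_REST, B_REST_SQ = 0.2, -0.035

for d in range(1, 7):
    pts = 100 * rest_effect(d, BASE, B_REST, B_REST_SQ)
    print(f"{d} day(s): {pts:+.1f} percentage points")
```

With a negative squared term the quadratic forces a single peak and then a decline, which is the shape in the list above; that built-in shape is one reason to try a more flexible specification.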

One thing you may want to consider is a non-quadratic functional form for the days of rest variables. Perhaps you could put in dummy variables for one day of rest, two days of rest, three days of rest, four or more days of rest. You could do the same for both home and away rest days. With all of the data that you have, you probably can get precise estimates for all of these parameters.

And again, great work.

Best wishes,
Dan

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto

Dan Rosenbaum · Posted: Fri Jan 14, 2005 5:30 pm

The omitted group here must be six or more days of rest. Since you probably have so few of those observations, your estimates are imprecise and you are getting really high p-values. I would suggest leaving out HREST0 and AREST0 and redefining HREST4 and AREST4 the following way.

HREST4 = 1 if home team has four or more days of rest, 0 otherwise
AREST4 = 1 if away team has four or more days of rest, 0 otherwise

With this setup, you also leave out HREST5 and AREST5.

This way you will be comparing everything to zero days of rest.
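A minimal sketch of this coding scheme, with zero days of rest as the omitted reference group and four or more days pooled into one indicator. HREST3, HREST4, AREST4 follow the thread's naming; HREST1 and HREST2 (and their AREST counterparts) are assumed by the same pattern, and `rest_dummies` is a hypothetical helper.

```python
def rest_dummies(rest_days, prefix):
    """0/1 indicators for 1, 2, 3, and 4-or-more days of rest.
    Zero days of rest gets no indicator, so it is the reference
    group and each coefficient is measured relative to no rest."""
    if rest_days < 0:
        raise ValueError("rest_days must be non-negative")
    return {
        f"{prefix}1": int(rest_days == 1),
        f"{prefix}2": int(rest_days == 2),
        f"{prefix}3": int(rest_days == 3),
        f"{prefix}4": int(rest_days >= 4),  # 4+ days pooled into one group
    }

# Hypothetical game: home team on 2 days of rest, visitor on 6.
row = {**rest_dummies(2, "HREST"), **rest_dummies(6, "AREST")}
print(row)
```

Because no HREST0 or HREST5 column exists, a team on zero days of rest has all four indicators equal to 0, which is what makes zero rest the baseline.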

In this particular sample, where the dependent variable equals one about 63 percent of the time, the marginal effect evaluated at the mean is the parameter estimate times 0.234.

So a coefficient of 0.24 for your HREST3 variable would imply that, holding the other variables constant, a home team with three days of rest has a 0.234*0.24 = 0.056, or 5.6 percentage points, better chance of winning than a home team on zero days of rest.

0.234 is the approximate value of the logistic PDF (probability density function) evaluated at the mean in this particular sample. For the example given above, where the predicted probability of the home team winning was far less than 63 percent, the marginal effect would be smaller: the parameter estimate times 0.062.

The multiplier is largest (0.25) for a predicted probability of 0.5 and smaller for predicted probabilities close to zero or one.
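The multiplier comes from the derivative of the logistic function: if p = Λ(z), then dp/dz = p(1 − p). A minimal sketch of the rule of thumb described above (the function names are hypothetical). At p = 0.63 it gives about 0.233, close to the 0.234 quoted for this sample; the small gap is consistent with "about 63 percent" being rounded. For a 0/1 dummy this is a first-order approximation, not the exact change in probability.

```python
def logit_multiplier(p):
    """Logistic density at the point where the predicted probability is p.
    Since dΛ/dz = Λ(z) * (1 - Λ(z)), this is simply p * (1 - p)."""
    return p * (1.0 - p)

def approx_marginal_effect(coef, p):
    """First-order approximation to the change in P(home win) from a
    one-unit change in a variable with logit coefficient `coef`."""
    return coef * logit_multiplier(p)

# Multiplier near the sample mean, at its maximum (p = 0.5), and in the tails.
for p in (0.63, 0.5, 0.9, 0.1):
    print(p, round(logit_multiplier(p), 4))

# The HREST3 example: coefficient 0.24 evaluated at p = 0.63.
print(round(approx_marginal_effect(0.24, 0.63), 3))
```

The last line prints 0.056, in line with the 5.6 percentage points worked out above.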

Ed Küpfer · Joined: 30 Dec 2004 Posts: 787 Location: Toronto