12 Comments

  1. Ben

    I like it. Does changing the value slightly have much effect on prediction accuracy?

    Also, just to clarify, you looked at last year’s data and this year’s separately. Nothing from last year influences this year’s predictions – is that correct?

    Do you think looking at more years would improve the model or do you converge on the value of a = 0.2 pretty quickly?

    • DanielM

      Changing either value didn’t change predictive accuracy much, so I don’t think more years will do a whole lot to change things.

      When finding the value of 0.2, I looked at each year separately, but regressed as a whole (the models were independent, but I looked for what would minimize the error overall).

  2. what a bunch a ##…you stats heads think u know it all…there r so many things wrong with this i dont know where to begin…

    just kidding, very interesting stuff. A little hard to digest all at once past midnight though!

  3. This is really cool. A little hard to digest, but if I’m understanding correctly you’re using a Bayesian approach to figure out how well a team is playing “now” (recently). Yes?

    So the question is, how well does this model perform? 🙂

    • DanielM

      Yes, this is the best-fit Bayesian model to predict from the team history in games 1 to N-1 how the team will perform in game N.

      How would you like me to measure it? There’s still a ton of variance not explained by the model. Some could be explained by lineup changes, but still there is a ton not accounted for.

  4. I understand there is a lot potentially unaccounted for. Nonetheless, here’s my initial thinking: Start at a reasonable point in the season (10g in? 20g?) and every night, predict the point differential based on Bayesian differential. I think that’s the old Sagarin approach, no? (I’d adjust for HCA and HCA against back-to-backs.) See how many games it gets right and what the errors are. That seems like a simple place to start.

    Of course, that might be a complicated place to start because it involves updating the rankings after every game, but that’s the merit of something like this anyway, so it seems less useful to test a “final” ranking against a block of games.

    • DanielM

      That’s how I did these rankings–hence the motion chart!

      Except… how do you adjust for opponent? I used the current non-Bayesian adjusted ratings for the opponent adjustment, and then applied the Bayesian model to the already-adjusted game ratings.

  5. Ah right – forgot to activate the flash on that. Really really nice motion chart.

    You aren’t following what I’m saying about testing it, I think (or maybe you are and I’m about to embarrass myself with simplicity). Let me give you an example using Philadelphia. Let’s call t the game number (t=1 is opening night; t=82 is the last game of the regular season).

    At t=18 (dec 3) they are -1.3 when they play the Atlanta Hawks. Atlanta is +1.0. In a linear sense, the expected result is Atlanta to win by 2.3 points. (Again, there might be a better way for those numbers to interact — your math is far ahead of mine.) Atlanta is also playing at home, so they’d need some kind of adjustment (this year on my latest pass it was ~2.5 points — you could estimate the difference per team of course.) So the line should be Atlanta by 4.8 points. Compare that to the final result for a) the difference and b) the W/L result.

    (I picked that game at random, and sure enough Atlanta won by 5 points.) Check it around the league every night like that and see how good it is at predicting games. There are plenty of ways to test these things, but that’s the first one that came into my head.
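    The arithmetic in the example above can be sketched as a small function. This is only an illustration of the linear spread described in the comment; the 2.5-point home-court adjustment is the commenter’s own estimate, not a value from the article.

    ```python
    # Sketch of the single-game check described above, using the numbers
    # from the comment (Philadelphia at Atlanta, t=18).
    def predicted_margin(home_rating, away_rating, hca=2.5):
        """Predicted home-team margin: rating gap plus home-court adjustment."""
        return home_rating - away_rating + hca

    # Atlanta +1.0 at home, Philadelphia -1.3 on the road.
    margin = predicted_margin(home_rating=1.0, away_rating=-1.3)
    print(margin)  # 4.8 -> "Atlanta by 4.8"; the actual result was Atlanta by 5
    ```

    Repeating this for every game on the schedule and tallying both the signed errors and the W/L hit rate gives the backtest proposed above.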

  6. Jerry

    Does the weighting have to be done in the “1/(12)^2, then 1/(12+a)^2” way? I’m currently messing around with 1/(x^b), where b is the number of days this particular game occurred before the current date. x has to be determined by CV. Should this lead to the same result, or is your way definitely better?
    I guess I’ll try that myself in the next couple of days, but figured I might ask anyway.

    • DanielM

      The theoretical weighting for a Bayesian problem should be based on 1/(StdErr)^2.

      Commonly, time weighting is done as x^b (not 1/x^b), where b is the number of time periods ago and x is less than 1. This implies a specific factor increasing the standard error as the time ago increases.

      Suppose our x factor is 0.95, so weighting is of the form 0.95^b, where b is time ago.

      In a Bayesian standard-error formulation, that same weight corresponds to a standard error proportional to 0.95^(-b/2) — i.e., the standard error grows by a factor of e^(-0.5*ln(0.95)) ≈ 1.026 per period. That makes more sense than increasing the standard error with the 12+a, 12+2a, … formulation in the article.

      My formulation in the article is probably not correct; I should have used something like the ones above.

      I recommend using a formulation of x^b, where b is days ago and x<1, found by cross-validation.
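      The weight-to-standard-error relationship above can be sketched directly. This is a minimal illustration assuming x = 0.95, inverse-variance weighting (weight = 1/stderr^2), and b measured in days ago; the function names are mine, not from the article.

      ```python
      import math

      def geometric_weight(b, x=0.95):
          """Weight x**b for a game b periods ago (x < 1, chosen by CV)."""
          return x ** b

      def implied_stderr(weight):
          """Standard error implied by an inverse-variance weight (w = 1/se**2)."""
          return weight ** -0.5

      # Each extra period multiplies the implied standard error by
      # 0.95**-0.5 = e**(-0.5 * ln(0.95)) ~= 1.026.
      ratio = implied_stderr(geometric_weight(2)) / implied_stderr(geometric_weight(1))
      print(ratio, math.exp(-0.5 * math.log(0.95)))  # both ~= 1.0260
      ```

      The same helper makes it easy to compare this geometric scheme against the article’s 12, 12+a, 12+2a, … standard-error sequence and see which one cross-validates better.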
