
Bayesian Golf Ratings and Masters Preview

April 6, 2011

That's right, golf. I'm taking up where Ken Pomeroy left off. A year or two ago, he developed a rating system for golfers: basically, one huge regression over all players and all individual rounds at tournaments. Each round was assigned a level of difficulty, and each player was assigned an overall rating. His numbers, from just prior to the 2009 PGA Championship, are on his website, including odds for each player to win.

I’m attempting to both continue the effort and take it a step further.  I’m creating a Bayesian rating system that best projects out-of-sample (future) performance.

To do this, I compiled all tournaments on the European and PGA tours for this year and the previous two years. I didn't grab any other tour's data, since it was a little harder to get hold of.

Next, I looked at the number of unknowns: ~2000 players and ~200 tournaments (with ~4 rounds each). That's ~2800 unknowns right off the bat, 2000 player ratings plus 800 round difficulties! Ouch.

So I simplified. I chose a subset of "baseline golfers" who had played a large number of rounds over the last 2+ years, across both tours. I constrained these ~80 golfers' ratings to sum to 0, which sets the baseline for each course. I then took the ~800 tournament rounds, split them into 7 chunks, and solved for each chunk and the 80 golfers simultaneously. The 80 golfers could vary amongst themselves, but their ratings had to sum to 0, and the tournament round difficulties were estimated against them. Thus, some rounds were assigned a difficulty of 73, others 68.
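
For the curious, here is a minimal sketch of that kind of constrained regression in R. The data frame name and columns are hypothetical, and it ignores the chunking described above; the point is the sum-to-zero constraint on the baseline golfers, which makes each round coefficient interpretable as that round's difficulty.

    # Hypothetical data: one row per (baseline golfer, round), with columns
    # player, round_id, and strokes.
    scores$player   <- factor(scores$player)
    scores$round_id <- factor(scores$round_id)

    # Sum-to-zero ("deviation") contrasts force the baseline golfers' effects to
    # sum to 0, so each round_id coefficient is the expected score of an average
    # baseline golfer in that round, i.e. the round's difficulty.
    fit <- lm(strokes ~ 0 + round_id + player,
              data = scores,
              contrasts = list(player = "contr.sum"))

    round_difficulty <- coef(fit)[grepl("^round_id", names(coef(fit)))]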

Once I had set a difficulty level for each round in the past 2+ years, it was time to get Bayesian. I didn't do it explicitly the way I have previously. I introduced a weight parameter, slightly less than 1, and weighted each player's result in each round by (weight parameter)^n, where n is the number of weeks since that tournament. I then added in a regression toward a fixed value (A), with a weight (R). All ready, then!
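
In effect, the rating is just an exponentially weighted average of a player's performance versus the round baselines, plus a prior pulling it toward A with weight R. A hedged sketch in R (the function and argument names are mine, not from the original spreadsheet):

    # perf:      a player's results, as (score - round difficulty) for each round
    # weeks_ago: weeks elapsed since each of those rounds was played
    # w:         decay parameter, slightly less than 1
    # A, R:      regression target and its weight, in "rounds worth" of evidence
    bayes_rating <- function(perf, weeks_ago, w, A, R) {
      wts <- w ^ weeks_ago
      (sum(wts * perf) + R * A) / (sum(wts) + R)
    }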

I took the players who had played more than 140 rounds in the past 2+ years and minimized the total squared prediction error over each round in their past 10 tournaments played. This gave me the weight parameter, the fixed value (A), and the regression weight (R).
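
In other words, it is a three-parameter fit. A rough sketch of how that optimization could be set up in R, reusing bayes_rating() from above; the history structure and the starting values are illustrative assumptions, and bounds on the parameters are omitted for brevity:

    # history: a list with one element per qualifying player; each element is a
    # data frame with columns perf (score minus round difficulty), weeks_ago,
    # and holdout (TRUE for rounds in that player's last 10 tournaments).
    sse <- function(par, history) {
      w <- par[1]; A <- par[2]; R <- par[3]
      total <- 0
      for (h in history) {
        for (i in which(h$holdout)) {
          past <- h[h$weeks_ago > h$weeks_ago[i], ]   # only rounds played earlier
          pred <- bayes_rating(past$perf, past$weeks_ago - h$weeks_ago[i], w, A, R)
          total <- total + (h$perf[i] - pred)^2
        }
      }
      total
    }

    fit <- optim(c(w = 0.98, A = 2.5, R = 6), sse, history = history)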

In order to do a prediction of The Masters, I had to find out how much the players varied from round to round. So I calculated sqrt(average(prediction error^2)) over the last 15 tournaments for each player. For the players with the most data, this averaged 2.78. I then regressed each player's standard deviation toward that mean of 2.78 to get a better estimate of his true standard deviation going forward.
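
That regression to the mean can be done with simple variance pooling. A sketch in R; the prior weight n0 is an illustrative assumption, not the value actually used:

    # player_rmse: a player's root-mean-square prediction error over recent rounds
    # n:           number of rounds behind that estimate
    # n0:          strength of the prior, in rounds (illustrative)
    shrink_sd <- function(player_rmse, n, prior_sd = 2.78, n0 = 30) {
      sqrt((n * player_rmse^2 + n0 * prior_sd^2) / (n + n0))
    }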

Well then.  That’s about it!  We’ve got a Bayesian prediction for the next tournament, and a per-round standard deviation.  Perfect for a Monte Carlo!
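
The Monte Carlo itself is simple: for each simulated tournament, draw four independent rounds per player from a normal distribution centered on his rating (the course difficulty shifts everyone equally, so it cancels), and count who posts the low total. A minimal sketch in R; it ignores the cut, ties, and course fit, and uses fewer simulations than the per-million numbers below imply:

    simulate_tournament <- function(rating, sd, n_sims = 100000, n_rounds = 4) {
      n_players <- length(rating)
      wins      <- numeric(n_players)
      rank_sum  <- numeric(n_players)
      for (s in seq_len(n_sims)) {
        totals <- rnorm(n_players, mean = n_rounds * rating, sd = sqrt(n_rounds) * sd)
        wins[which.min(totals)] <- wins[which.min(totals)] + 1
        rank_sum <- rank_sum + rank(totals)
      }
      data.frame(win_pct = 100 * wins / n_sims, avg_rank = rank_sum / n_sims)
    }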

First, the ratings themselves, for the top 100 players of the US/Euro tours:

Rank | Player | Total Rounds | PGA | Euro | Bayesian Rating | Average Rating | Stdev
1 | Martin Kaymer | 188 | 74 | 114 | -1.45 | -1.82 | 2.73
2 | Graeme McDowell | 203 | 80 | 123 | -1.40 | -1.60 | 2.82
3 | Charl Schwartzel | 233 | 68 | 165 | -1.40 | -1.68 | 2.92
4 | Lee Westwood | 186 | 76 | 110 | -1.39 | -1.98 | 2.79
5 | Matt Kuchar | 205 | 201 | 4 | -1.32 | -1.41 | 2.47
6 | Francesco Molinari | 224 | 52 | 172 | -1.31 | -1.71 | 3.00
7 | Luke Donald | 202 | 158 | 44 | -1.21 | -1.44 | 2.70
8 | Steve Stricker | 175 | 171 | 4 | -1.18 | -1.77 | 2.70
9 | Nick Watney | 206 | 194 | 12 | -1.18 | -1.32 | 2.95
10 | Rory McIlroy | 208 | 98 | 110 | -1.17 | -1.68 | 2.89
11 | Paul Casey | 166 | 106 | 60 | -1.11 | -1.64 | 2.50
12 | Phil Mickelson | 193 | 167 | 26 | -1.08 | -1.39 | 2.82
13 | Tiger Woods | 135 | 119 | 16 | -1.04 | -2.20 | 2.84
14 | Louis Oosthuizen | 197 | 40 | 157 | -1.01 | -1.34 | 2.96
15 | Retief Goosen | 236 | 164 | 72 | -0.98 | -1.39 | 2.77
16 | Raphael Jacquelin | 232 | 6 | 226 | -0.91 | -0.98 | 2.56
17 | Thomas Aiken | 196 | 14 | 182 | -0.90 | -1.10 | 2.63
18 | Anders Hansen | 196 | 24 | 172 | -0.88 | -1.19 | 2.61
19 | Peter Hanson | 202 | 48 | 154 | -0.84 | -1.37 | 2.87
20 | Justin Rose | 210 | 176 | 34 | -0.82 | -0.98 | 2.69
21 | Dustin Johnson | 190 | 186 | 4 | -0.81 | -1.13 | 2.86
22 | Hunter Mahan | 209 | 205 | 4 | -0.81 | -1.13 | 2.69
23 | Richard Green | 164 | 6 | 158 | -0.80 | -1.29 | 2.78
24 | Joost Luiten | 145 | 0 | 145 | -0.77 | -1.03 | 2.75
25 | Alvaro Quiros | 210 | 54 | 156 | -0.75 | -1.08 | 2.75
26 | Jamie Donaldson | 209 | 0 | 209 | -0.72 | -1.01 | 2.83
27 | Edoardo Molinari | 145 | 38 | 107 | -0.71 | -1.24 | 2.75
28 | Padraig Harrington | 202 | 139 | 63 | -0.71 | -1.17 | 2.86
29 | Stephen Gallacher | 163 | 8 | 155 | -0.70 | -0.83 | 2.60
30 | Robert Allenby | 192 | 164 | 28 | -0.70 | -1.08 | 2.53
31 | Ernie Els | 237 | 164 | 73 | -0.70 | -1.25 | 2.76
32 | Miguel Angel Jimenez | 226 | 50 | 176 | -0.69 | -1.12 | 2.93
33 | Ian Poulter | 188 | 118 | 70 | -0.69 | -1.15 | 2.80
34 | Anthony Wall | 204 | 6 | 198 | -0.69 | -1.05 | 2.41
35 | Jim Furyk | 182 | 182 | 0 | -0.69 | -1.32 | 2.57
36 | Rickie Fowler | 150 | 144 | 6 | -0.68 | -0.79 | 2.72
37 | Tim Clark | 197 | 175 | 22 | -0.65 | -1.12 | 2.47
38 | David Lynn | 197 | 0 | 197 | -0.65 | -0.93 | 2.83
39 | Chris Wood | 195 | 16 | 179 | -0.63 | -1.02 | 2.62
40 | Martin Laird | 206 | 190 | 16 | -0.62 | -0.36 | 3.04
41 | Charles Howell III | 234 | 234 | 0 | -0.62 | -0.72 | 2.75
42 | Robert-Jan Derksen | 207 | 0 | 207 | -0.61 | -1.01 | 2.40
43 | Jean-Baptiste Gonnet | 194 | 0 | 194 | -0.61 | -0.64 | 2.94
44 | Spencer Levin | 225 | 225 | 0 | -0.59 | -0.55 | 2.88
45 | Bill Haas | 214 | 210 | 4 | -0.59 | -0.68 | 2.82
46 | Gregory Bourdy | 230 | 8 | 222 | -0.58 | -0.76 | 2.27
47 | Ben Crane | 194 | 190 | 4 | -0.57 | -0.89 | 2.72
48 | Ross Fisher | 187 | 70 | 117 | -0.57 | -0.98 | 2.86
49 | Rafael Cabrera-Bello | 227 | 4 | 223 | -0.57 | -0.72 | 2.66
50 | David Toms | 204 | 204 | 0 | -0.56 | -0.92 | 2.64
51 | Kevin Na | 211 | 211 | 0 | -0.56 | -0.90 | 2.52
52 | Sergio Garcia | 188 | 118 | 70 | -0.55 | -1.00 | 2.81
53 | Thongchai Jaidee | 207 | 36 | 171 | -0.55 | -1.09 | 2.75
54 | Vijay Singh | 171 | 169 | 2 | -0.54 | -0.65 | 2.63
55 | Zach Johnson | 200 | 200 | 0 | -0.54 | -1.06 | 2.39
56 | K.J. Choi | 189 | 171 | 18 | -0.53 | -0.79 | 2.60
57 | Matteo Manassero | 96 | 14 | 82 | -0.53 | -0.97 | 3.02
58 | Rory Sabbatini | 223 | 193 | 30 | -0.52 | -0.61 | 3.02
59 | Stewart Cink | 178 | 174 | 4 | -0.52 | -0.68 | 2.56
60 | Bo Van Pelt | 220 | 220 | 0 | -0.51 | -0.80 | 2.56
61 | Darren Clarke | 211 | 30 | 181 | -0.51 | -0.72 | 2.95
62 | Bubba Watson | 174 | 174 | 0 | -0.51 | -0.83 | 2.82
63 | Thomas Bjorn | 168 | 6 | 162 | -0.50 | -0.70 | 2.81
64 | Robert Dinwiddie | 139 | 0 | 139 | -0.50 | -0.63 | 2.78
65 | J.B. Holmes | 203 | 201 | 2 | -0.49 | -0.51 | 2.84
66 | Robert Karlsson | 151 | 64 | 87 | -0.48 | -0.90 | 2.99
67 | Peter Lawrie | 215 | 0 | 215 | -0.47 | -0.97 | 2.84
68 | Brendon de Jonge | 239 | 239 | 0 | -0.47 | -0.43 | 2.86
69 | Gary Woodland | 100 | 100 | 0 | -0.46 | 0.02 | 2.93
70 | Johan Edfors | 195 | 10 | 185 | -0.46 | -0.86 | 3.00
71 | Bradley Dredge | 207 | 4 | 203 | -0.46 | -0.90 | 2.85
72 | Ryan Moore | 190 | 186 | 4 | -0.46 | -0.71 | 2.98
73 | Ignacio Garrido | 229 | 4 | 225 | -0.46 | -0.89 | 2.86
74 | John Senden | 240 | 234 | 6 | -0.46 | -0.74 | 2.69
75 | Simon Dyson | 222 | 26 | 196 | -0.46 | -1.01 | 2.83
76 | Damien McGrane | 245 | 2 | 243 | -0.45 | -0.91 | 2.92
77 | J.J. Henry | 220 | 220 | 0 | -0.44 | -0.51 | 2.74
78 | Brandt Snedeker | 195 | 193 | 2 | -0.42 | -0.65 | 2.88
79 | Gonzalo Fernandez-Castano | 198 | 28 | 170 | -0.42 | -0.93 | 2.90
80 | Paul Lawrie | 181 | 6 | 175 | -0.42 | -0.76 | 2.68
81 | Steve Marino | 205 | 199 | 6 | -0.42 | -0.77 | 2.82
82 | Jaco Van Zyl | 46 | 0 | 46 | -0.42 | -1.32 | 2.78
83 | Brian Gay | 229 | 221 | 8 | -0.42 | -0.58 | 2.49
84 | James Kingston | 187 | 12 | 175 | -0.41 | -0.60 | 2.99
85 | Soren Hansen | 222 | 42 | 180 | -0.41 | -1.00 | 2.31
86 | Geoff Ogilvy | 181 | 151 | 30 | -0.41 | -0.99 | 2.68
87 | Nicolas Colsaerts | 102 | 0 | 102 | -0.40 | -0.83 | 2.78
88 | Gareth Maybin | 216 | 6 | 210 | -0.40 | -0.92 | 3.04
89 | Jonathan Byrd | 194 | 194 | 0 | -0.39 | -0.59 | 2.86
90 | Adam Scott | 172 | 136 | 36 | -0.38 | -0.67 | 2.88
91 | Aaron Baddeley | 195 | 183 | 12 | -0.38 | -0.41 | 2.90
92 | Fredrik Jacobson | 192 | 192 | 0 | -0.38 | -0.55 | 2.69
93 | Jerry Kelly | 211 | 207 | 4 | -0.36 | -0.57 | 2.72
94 | Soren Kjeldsen | 211 | 42 | 169 | -0.36 | -0.89 | 2.68
95 | Scott Verplank | 172 | 172 | 0 | -0.33 | -0.64 | 2.83
96 | Jeev Milkha Singh | 230 | 104 | 126 | -0.33 | -0.64 | 2.77
97 | Danny Willett | 195 | 2 | 193 | -0.33 | -0.95 | 3.12
98 | Webb Simpson | 215 | 215 | 0 | -0.33 | -0.34 | 2.70
99 | Alexander Noren | 188 | 6 | 182 | -0.30 | -0.82 | 2.78
100 | Marc Leishman | 224 | 216 | 8 | -0.30 | -0.39 | 2.88

And here are the Monte Carlo simulation results for this year's Masters. Some players did not have enough data to make a projection; I'll assume none of them win (that would be old champs, a few Asian players, and the amateurs).

Notes on the results:

  • There is no clear favorite, unlike Pomeroy's ratings for the '09 PGA. Tiger has the best average rating over the past 2+ years, but has not done well recently. For the '09 PGA, Tiger had a 25% chance of winning! Nothing like that this year.
  • The European contingent looks really strong.
  • Who is Charl Schshwartszel?  Did I misspell that?  Will we set an all-time record for misspellings of contenders?
  • Tiger and Phil have similar odds? Remember, injuries aren't accounted for explicitly, so Phil playing through arthritis and not doing well may be weighted too heavily.
  • Compare with Jason Sobel’s Rankings.

Rank | Player | Bayesian | Stdev | Wins/Million | Win% | Avg. Rank | 1 in
1 | Charl Schwartzel | -1.40 | 2.92 | 47126 | 4.71% | 28.2 | 21.2
2 | Francesco Molinari | -1.31 | 3.00 | 45299 | 4.53% | 29.7 | 22.1
3 | Graeme McDowell | -1.40 | 2.82 | 42684 | 4.27% | 27.9 | 23.4
4 | Martin Kaymer | -1.45 | 2.73 | 41458 | 4.15% | 26.8 | 24.1
5 | Lee Westwood | -1.39 | 2.79 | 40848 | 4.08% | 27.9 | 24.5
6 | Nick Watney | -1.18 | 2.95 | 35881 | 3.59% | 31.5 | 27.9
7 | Rory McIlroy | -1.17 | 2.89 | 33364 | 3.34% | 31.5 | 30.0
8 | Louis Oosthuizen | -1.01 | 2.96 | 28707 | 2.87% | 34.2 | 34.8
9 | Luke Donald | -1.21 | 2.70 | 27406 | 2.74% | 30.5 | 36.5
10 | Phil Mickelson | -1.08 | 2.82 | 26358 | 2.64% | 32.9 | 37.9
11 | Steve Stricker | -1.18 | 2.70 | 25994 | 2.60% | 30.9 | 38.5
12 | Tiger Woods | -1.04 | 2.84 | 25499 | 2.55% | 33.5 | 39.2
13 | Matt Kuchar | -1.32 | 2.47 | 23285 | 2.33% | 28.2 | 42.9
14 | Retief Goosen | -0.98 | 2.77 | 20961 | 2.10% | 34.4 | 47.7
15 | Peter Hanson | -0.84 | 2.87 | 19204 | 1.92% | 36.9 | 52.1
16 | Dustin Johnson | -0.81 | 2.86 | 18010 | 1.80% | 37.3 | 55.5
17 | Martin Laird | -0.62 | 3.04 | 17665 | 1.77% | 40.5 | 56.6
18 | Paul Casey | -1.11 | 2.50 | 16726 | 1.67% | 31.7 | 59.8
19 | Miguel Angel Jimenez | -0.69 | 2.93 | 16514 | 1.65% | 39.3 | 60.6
20 | Padraig Harrington | -0.71 | 2.86 | 15171 | 1.52% | 39.0 | 65.9
21 | Rory Sabbatini | -0.52 | 3.02 | 14543 | 1.45% | 42.2 | 68.8
22 | Alvaro Quiros | -0.75 | 2.75 | 13577 | 1.36% | 38.2 | 73.7
23 | Justin Rose | -0.82 | 2.69 | 13543 | 1.35% | 37.0 | 73.8
24 | Ian Poulter | -0.69 | 2.80 | 13472 | 1.35% | 39.1 | 74.2
25 | Anders Hansen | -0.88 | 2.61 | 13417 | 1.34% | 35.9 | 74.5
26 | Hunter Mahan | -0.81 | 2.69 | 13323 | 1.33% | 37.1 | 75.1
27 | Robert Karlsson | -0.48 | 2.99 | 13165 | 1.32% | 42.9 | 76.0
28 | Edoardo Molinari | -0.71 | 2.75 | 12721 | 1.27% | 38.8 | 78.6
29 | Ernie Els | -0.70 | 2.76 | 12635 | 1.26% | 39.1 | 79.1
30 | Ryan Moore | -0.46 | 2.98 | 12291 | 1.23% | 43.2 | 81.4
31 | Ross Fisher | -0.57 | 2.86 | 11960 | 1.20% | 41.3 | 83.6
32 | Gary Woodland | -0.46 | 2.93 | 11412 | 1.14% | 43.0 | 87.6
33 | Bill Haas | -0.59 | 2.82 | 11345 | 1.13% | 40.9 | 88.1
34 | Rickie Fowler | -0.68 | 2.72 | 11035 | 1.10% | 39.4 | 90.6
35 | Sergio Garcia | -0.55 | 2.81 | 10576 | 1.06% | 41.6 | 94.6
36 | Bubba Watson | -0.51 | 2.82 | 10154 | 1.02% | 42.3 | 98.5
37 | Brandt Snedeker | -0.42 | 2.88 | 9695 | 0.97% | 43.7 | 103.1
38 | Aaron Baddeley | -0.38 | 2.90 | 9489 | 0.95% | 44.5 | 105.4
39 | Ben Crane | -0.57 | 2.72 | 9197 | 0.92% | 41.2 | 108.7
40 | Adam Scott | -0.38 | 2.88 | 9193 | 0.92% | 44.4 | 108.8
41 | Jonathan Byrd | -0.39 | 2.86 | 8710 | 0.87% | 44.4 | 114.8
42 | Steve Marino | -0.42 | 2.82 | 8629 | 0.86% | 43.8 | 115.9
43 | Jim Furyk | -0.69 | 2.57 | 8468 | 0.85% | 39.1 | 118.1
44 | Anthony Kim | -0.19 | 3.00 | 8198 | 0.82% | 47.6 | 122.0
45 | Robert Allenby | -0.70 | 2.53 | 8027 | 0.80% | 38.8 | 124.6
46 | Y.E. Yang | -0.29 | 2.90 | 7954 | 0.80% | 45.9 | 125.7
47 | David Toms | -0.56 | 2.64 | 7566 | 0.76% | 41.4 | 132.2
48 | Vijay Singh | -0.54 | 2.63 | 7311 | 0.73% | 41.6 | 136.8
49 | K.J. Choi | -0.53 | 2.60 | 6636 | 0.66% | 41.8 | 150.7
50 | Charley Hoffman | -0.13 | 2.94 | 6521 | 0.65% | 48.6 | 153.4
51 | D.A. Points | -0.22 | 2.85 | 6397 | 0.64% | 47.1 | 156.3
52 | Henrik Stenson | 0.08 | 3.12 | 6333 | 0.63% | 51.9 | 157.9
53 | Tim Clark | -0.65 | 2.47 | 6228 | 0.62% | 39.6 | 160.6
54 | Jerry Kelly | -0.36 | 2.72 | 6228 | 0.62% | 44.8 | 160.6
55 | Geoff Ogilvy | -0.41 | 2.68 | 6170 | 0.62% | 44.0 | 162.1
56 | Alex Cejka | -0.15 | 2.90 | 6062 | 0.61% | 48.4 | 165.0
57 | Stewart Cink | -0.52 | 2.56 | 5976 | 0.60% | 42.0 | 167.3
58 | Bo Van Pelt | -0.51 | 2.56 | 5861 | 0.59% | 42.1 | 170.6
59 | Jeff Overton | -0.17 | 2.84 | 5625 | 0.56% | 48.1 | 177.8
60 | Kevin Na | -0.56 | 2.52 | 5624 | 0.56% | 41.3 | 177.8
61 | Mark Wilson | -0.20 | 2.72 | 4540 | 0.45% | 47.5 | 220.3
62 | Fred Couples | 0.18 | 3.00 | 4162 | 0.42% | 53.5 | 240.3
63 | Ricky Barnes | -0.10 | 2.76 | 4128 | 0.41% | 49.3 | 242.2
64 | Stuart Appleby | 0.11 | 2.93 | 3954 | 0.40% | 52.6 | 252.9
65 | Zach Johnson | -0.54 | 2.39 | 3952 | 0.40% | 41.6 | 253.0
66 | Jason Day | -0.25 | 2.61 | 3679 | 0.37% | 46.8 | 271.8
67 | Jhonattan Vegas | -0.02 | 2.78 | 3552 | 0.36% | 50.6 | 281.5
68 | Sean O'Hair | -0.20 | 2.58 | 3088 | 0.31% | 47.8 | 323.8
69 | Camilo Villegas | 0.02 | 2.74 | 3061 | 0.31% | 51.3 | 326.7
70 | Lucas Glover | -0.13 | 2.63 | 3045 | 0.30% | 48.8 | 328.4
71 | Kevin Streelman | -0.01 | 2.59 | 2180 | 0.22% | 50.9 | 458.7
72 | Carl Pettersson | 0.07 | 2.65 | 2107 | 0.21% | 52.3 | 474.6
73 | Trevor Immelman | 0.29 | 2.79 | 1932 | 0.19% | 55.5 | 517.6
74 | Gregory Havret | -0.05 | 2.54 | 1925 | 0.19% | 50.4 | 519.5
75 | Heath Slocum | 0.19 | 2.70 | 1811 | 0.18% | 54.2 | 552.2
76 | Arjun Atwal | 0.42 | 2.85 | 1763 | 0.18% | 57.7 | 567.2
77 | Angel Cabrera | -0.05 | 2.47 | 1559 | 0.16% | 50.3 | 641.4
78 | Ryan Palmer | 0.20 | 2.58 | 1203 | 0.12% | 54.6 | 831.3
79 | Jose Maria Olazabal | 0.97 | 3.07 | 1116 | 0.11% | 64.9 | 896.1
80 | Jason Bohn | 0.30 | 2.57 | 971 | 0.10% | 56.3 | 1029.9
81 | Davis Love III | 0.08 | 2.40 | 857 | 0.09% | 52.8 | 1166.9
82 | Kyung-Tae Kim | 0.75 | 2.78 | 640 | 0.06% | 62.8 | 1562.5
83 | Hiroyuki Fujita | 0.94 | 2.78 | 452 | 0.05% | 65.5 | 2212.4
84 | Ryo Ishikawa | 0.92 | 2.78 | 451 | 0.05% | 65.2 | 2217.3
85 | Tom Watson | 0.92 | 2.78 | 445 | 0.04% | 65.2 | 2247.2
86 | Mike Weir | 1.21 | 3.03 | 0 | 0.00% | 68.2 | 1000000.0
87 | Yuta Ikeda | 1.28 | 2.78 | 0 | 0.00% | 69.8 | 1000000.0
88 | Mark O'Meara | 2.15 | 2.78 | 0 | 0.00% | 78.6 | 1000000.0

Players left out:

  • Nathan Smith
  • David Chung
  • Hideki Matsuyama
  • Jin Jeong
  • Lion Kim
  • Peter Uihlein
  • Ben Crenshaw
  • Craig Stadler
  • Ian Woosnam
  • Larry Mize
  • Sandy Lyle


11 Responses to Bayesian Golf Ratings and Masters Preview

  1. rexfordbuzzsaw on April 7, 2011 at 12:00 am

    First off, this is good stuff; it's nice to see someone with a brain try to rate golfers, as opposed to Jason Sobel.

    For the past couple of years, I’ve calculated a world golf ranking based on standardized scores across the world’s three biggest tours (PGA, NW, EPGA). You can see my full masters rankings here.

    I think you have a couple of problems, though. For one, the European Tour is weighted too highly. I can tell you this because I think I did a similar thing: I used to make no distinction between playing on the European and PGA Tours, and my odds looked a lot like that. In reality there is about a .21 standard deviation difference (~.6 strokes per round) between the PGA and European Tours. It's about .35 for the NW Tour to PGA Tour.

    I think that's the reason you are so high on European Tour players like Schwartzel and Molinari.

    I'd like to know how much of an impact recent play has in your rankings, because just looking at these off the top of my head, I think it might be too much. I know in my rankings a player that is playing well recently should get a max bonus of around .2-.3 strokes per round. That is somewhat empirically derived, but mostly I came up with it through common-sense observation and by comparing my rankings against Vegas odds.

    Finally, I don't think using the standard deviation of the sample is accurate. I'm not positive about this, but from what I've done, a player's true standard deviation correlates directly with the player's average score in relation to the field. There is no correlation in a single player's standard deviation from year to year. Some years Tiger has had a really low standard deviation, other years a really high one. This applies to almost every single player who has played a large sample of rounds in every year since 2002.

    Hope that gives you something to think about and helps going forward.

    • DanielM on April 7, 2011 at 6:30 am

      Thanks for the very informative post!

      1) Theoretically, if I have good connectivity between tours, the European events will be adjusted appropriately automatically. I’m rating each player at each event NOT vs. “other players” but against the “baseline” for that round. This baseline is calculated as the average of how the 80 “baseline golfers” did in that round.

      That group could be only 10 in any given round, and not all of those golfers played on both tours. My regression may not properly capture the difference between the tours in difficulty of round. I’ll look at it a bit more today.

      That said–the 0.6 per round. Is that just a general average? Because the difficulty of each round varies WIDELY based on weather/course/etc.

      2) The best fit for projecting out-of-sample tournament results yielded a weighting that dropped from 1 for the most recent events to about 0.05 for events at the start of 2009. Then, everybody is regressed with about 6.5 rounds (weighted at 1) of +2.5 or so golf. The weighting toward recency seemed a bit strong to me, as well.

      3) Regarding standard deviation: you may be right on this. Perhaps I should just assign everyone a consistent standard deviation? That said, I did regress each player's standard deviation toward the mean pretty hard via Bayesian inference, with the average as the prior.

      Thanks again for your insights!

  2. bradluen on April 7, 2011 at 12:25 am

    Do “Average Rating” and “Bayesian Rating” have the same baseline? If so, that’s a huge regression effect. Though maybe you need a huge regression effect with only two years of data.

    • DanielM on April 7, 2011 at 6:32 am

      Yes, they have the same baseline. There's more than just the regression effect going on, though. There's also the deprecation of the older results. Tiger's oldest results are also his best results, so when they get weighted only 10% as strongly as his most recent results, that drops him a ton. There's also regression to a baseline of +2.5 or so, but there's only 6 rounds of +2.5 added to the weighted average, and a lot of guys have over 200 rounds. That affects the players with few rounds played a lot more.

  3. DanielM on April 7, 2011 at 8:08 am

    I'll probably, if I have time today (I'm busy), try to revise the way I did the 80 baseline golfers to include a larger group and get a better connection between the European and PGA tours. I agree with the various comments (Twitter and here): it does look like the Euro players are rated unusually highly.

  4. rexfordbuzzsaw on April 7, 2011 at 11:23 am

    “That said–the 0.6 per round. Is that just a general average? Because the difficulty of each round varies WIDELY based on weather/course/etc.”

    Yes.

    The first thing I do is standardize the scores against everyone in the field. Obviously, it doesn't matter if the course average is 75 or 69; the player's relation to the field average is all that counts. That's not to say a course plays at the same difficulty for someone playing at 8 a.m. as opposed to 1 p.m. I just hope most of that is randomness and balances out in the long run, which I think it does pretty well.

    The next step is to assign a field difficulty based on everyone’s raw average from above over a three-year period.

    Finally, I take the adjusted z-scores and compare players' rounds across tours. Over about 2,000 rounds each way, players are on average .6 strokes better in relation to the field when they play on the European Tour. However, it does vary, like you said. The Dubai Desert Classic still boasts a stronger field than the Puerto Rico Open even though the European players' raw rankings are inflated.

    • DanielM on April 8, 2011 at 12:12 pm

      I see. I’m taking a different approach, but if we’re each doing it right the answer should be the same. I’m standardizing against a group of “baseline players”, and there should be enough of them that play each round to get a good grasp of how hard the round was. I’m going to re-run the regression with over 140 players as baseline, using a slightly different procedure.

      I wish I weren't getting a "cannot allocate memory for vector" error from R when I try to run it all that way. Apparently, there's no 200-300 MB block of contiguous RAM available?

  5. EvanZ on April 10, 2011 at 8:32 am

    Daniel, you had McIlroy top 10. Pretty good!

    • Ben on April 10, 2011 at 5:35 pm

      Daniel’s looking even better today!

  6. Neil Paine on April 10, 2011 at 5:35 pm

    Schwartzel!!!!!!

  7. EvanZ on April 10, 2011 at 7:18 pm

    lol

    “Who is Charl Schshwartszel? Did I misspell that? Will we set an all-time record for misspellings of contenders?”

