Bayesian Golf Ratings and Masters Preview

April 6, 2011 by Daniel M 11 Comments

That’s right, golf. I’m taking up where Ken Pomeroy left off. A year or two ago, he developed a rating system for golfers: basically, he ran one huge regression over all players and all individual tournament rounds. Each round was assigned a level of difficulty, and each player was assigned an overall rating. His numbers from just before the 2009 PGA Championship are on his website, including odds for each player to win.

I’m attempting to both continue the effort and take it a step further. I’m creating a Bayesian rating system that best projects out-of-sample (future) performance.

To do this, I compiled all tournaments on the European and PGA tours for this year and the previous 2 years. I didn’t grab any other tour’s data, since it was a little harder to get a hold of.

Next, I looked at the number of variables: ~2000 players and ~200 tournaments (with ~4 rounds each, so ~800 rounds). That’s 2800 unknowns right off the bat! Ouch.

So I simplified. I chose a subset of “baseline golfers” who had played a bunch of rounds in the last 2+ years, across both tours. I constrained the ratings of these ~80 golfers to sum to 0, which sets the baseline for each course. I then took the ~800 tournament rounds, split them into 7 chunks, and solved for each chunk and the 80 golfers simultaneously. The 80 golfers could vary amongst themselves, but they had to sum to 0, and the tournament round difficulties were estimated against them. As a result, some rounds were assigned a difficulty of 73, others 68.
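One way to picture this step is the Python sketch below. It is not the regression actually used for the post: it swaps in a simple alternating-averages (backfitting) fit of the same additive model, score = round difficulty + golfer rating, with the baseline golfers' ratings forced to sum to 0. The function name, the data layout, and the toy scores are all made up for illustration.

```python
from collections import defaultdict

def fit_round_difficulty(scores, n_iter=200):
    """scores: list of (player, round_id, strokes) for baseline golfers only.

    Returns (rating, difficulty) dicts where the ratings sum to 0.
    Hypothetical helper, not the author's actual procedure.
    """
    players = sorted({p for p, _, _ in scores})
    rounds = sorted({r for _, r, _ in scores})
    rating = {p: 0.0 for p in players}
    difficulty = {r: 0.0 for r in rounds}
    for _ in range(n_iter):
        # Round difficulty = average of (score - golfer rating) in that round.
        acc = defaultdict(list)
        for p, r, s in scores:
            acc[r].append(s - rating[p])
        for r in rounds:
            difficulty[r] = sum(acc[r]) / len(acc[r])
        # Golfer rating = average of (score - round difficulty) over his rounds.
        acc = defaultdict(list)
        for p, r, s in scores:
            acc[p].append(s - difficulty[r])
        for p in players:
            rating[p] = sum(acc[p]) / len(acc[p])
        # Enforce the sum-to-zero constraint; move the offset into the rounds.
        shift = sum(rating.values()) / len(players)
        for p in players:
            rating[p] -= shift
        for r in rounds:
            difficulty[r] += shift
    return rating, difficulty

# Toy data: three baseline golfers, three rounds, not everyone plays every round.
toy_scores = [
    ("A", "r1", 72), ("A", "r2", 67), ("A", "r3", 69),
    ("B", "r1", 73), ("B", "r3", 70),
    ("C", "r2", 69), ("C", "r3", 71),
]
toy_rating, toy_difficulty = fit_round_difficulty(toy_scores)
print(toy_rating)
print(toy_difficulty)
```

Once the round difficulties are fixed this way, any other golfer's round can be expressed as strokes relative to that round's difficulty, which is the quantity the ratings are built from.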

Once I had set a difficulty level for each round in the past 2+ years, it was time to get Bayesian. I didn’t do it explicitly like I have previously. I introduced a weight parameter, slightly less than 1, and weighted each player’s result in each round by (weight parameter)^n, where n is the number of weeks since that tournament. I then added in a regression toward a fixed value (A), with a weight (R). All ready, then!
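As a rough sketch, the weighting described above amounts to a decayed weighted average with R extra "rounds" of A mixed in. The function and argument names below are invented for illustration. The prior values echo the figures mentioned in the comments (about 6.5 rounds of roughly +2.5), while the 0.975 weekly decay is only an assumed placeholder, not a fitted number from the post.

```python
def bayesian_rating(results, decay, prior_mean, prior_weight):
    """Decay-weighted average of results, regressed toward prior_mean.

    results: list of (weeks_ago, strokes_vs_round_difficulty) pairs.
    decay: the weight parameter, slightly less than 1.
    prior_mean, prior_weight: the fixed value (A) and its weight (R).
    """
    weighted_sum = prior_mean * prior_weight
    total_weight = prior_weight
    for weeks_ago, score in results:
        w = decay ** weeks_ago  # older rounds count for less
        weighted_sum += w * score
        total_weight += w
    return weighted_sum / total_weight

# Hypothetical golfer: four recent rounds and four from about two years ago.
recent = [(1, -0.5), (1, 0.5), (2, -1.0), (2, 0.0)]
old = [(100, -3.0), (100, -2.5), (101, -3.5), (101, -2.0)]
print(bayesian_rating(recent + old, decay=0.975, prior_mean=2.5, prior_weight=6.5))
```

With few or old rounds the prior dominates and the rating sits near A; with many recent rounds the prior barely matters.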

I took the players who have played more than 140 rounds in the past 2+ years, and minimized the squared prediction error for each round in their past 10 tournaments played. This gave me the weight parameter, the fixed value (A), and the weight (R).
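A minimal sketch of that fitting step, under some simplifying assumptions: it uses a plain grid search rather than whatever optimizer was actually used, it holds out the last 10 rounds rather than the last 10 tournaments, and every name and number in it is hypothetical.

```python
from itertools import product

def predict(history, week_now, decay, prior_mean, prior_weight):
    """Decay-weighted, prior-regressed prediction from earlier rounds only."""
    num, den = prior_mean * prior_weight, prior_weight
    for week, score in history:
        w = decay ** (week_now - week)
        num += w * score
        den += w
    return num / den

def squared_error(players, decay, prior_mean, prior_weight, n_holdout=10):
    """Sum of squared out-of-sample errors over each player's last rounds.

    players: dict name -> list of (week, strokes_vs_difficulty), sorted by week.
    Each held-out round is predicted from that player's earlier rounds.
    """
    total = 0.0
    for rounds in players.values():
        start = max(1, len(rounds) - n_holdout)
        for i in range(start, len(rounds)):
            week, actual = rounds[i]
            pred = predict(rounds[:i], week, decay, prior_mean, prior_weight)
            total += (actual - pred) ** 2
    return total

def fit_parameters(players, decays, prior_means, prior_weights):
    """Return the (decay, A, R) grid point with the smallest squared error."""
    grid = product(decays, prior_means, prior_weights)
    return min(grid, key=lambda params: squared_error(players, *params))

# Toy data: one golfer who drifts from about -2 to about 0 over 30 weeks.
toy = {"golfer": [(w, -2.0 + w / 15.0 + (0.5 if w % 2 else -0.5)) for w in range(30)]}
print(fit_parameters(toy, [0.90, 0.95, 0.99], [0.0, 2.5], [1.0, 6.5]))
```

The key point is that every prediction only sees rounds played before the one being predicted, so the fitted parameters are tuned for out-of-sample accuracy.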

In order to make a prediction for The Masters, I had to find out how much the players varied from round to round. So I calculated sqrt(average(predictionerror^2)) for the last 15 tournaments for each player. For the players with the most data, this averaged 2.78. I then regressed each player’s standard deviation toward that mean of 2.78, to get a better estimate of the true standard deviation going forward.
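A small sketch of the two calculations, in Python. The root-mean-square formula is the one given above. The shrinkage form (pooling squared errors with a number of pseudo-rounds at the 2.78 mean) and the `prior_rounds=100` strength are assumptions for illustration only, since the post does not say how hard each player was regressed.

```python
import math

def rms_error(errors):
    """sqrt(average(predictionerror^2)) over a player's rounds."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def shrunk_stdev(errors, prior_sd=2.78, prior_rounds=100):
    """Regress a player's per-round stdev toward prior_sd.

    prior_rounds is a hypothetical strength: the prior counts as that many
    extra rounds whose squared error equals prior_sd ** 2.
    """
    n = len(errors)
    pooled = sum(e * e for e in errors) + prior_rounds * prior_sd ** 2
    return math.sqrt(pooled / (n + prior_rounds))

# Hypothetical prediction errors (actual minus predicted strokes) for one player.
errors = [3.1, -2.4, 0.6, -4.0, 1.9, 2.2, -0.8, -3.3]
print(round(rms_error(errors), 2), round(shrunk_stdev(errors), 2))
```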

Well then. That’s about it! We’ve got a Bayesian prediction for the next tournament, and a per-round standard deviation. Perfect for a Monte Carlo!
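For readers who want to see the shape of such a simulation, here is a minimal Python sketch. It assumes each round is an independent normal draw with the player's Bayesian rating as the mean and his per-round stdev as the spread, four rounds, no cut, and lowest total wins. Those modeling details and the function name are assumptions, not a description of the actual simulation behind the results. The four players and their numbers are taken from the tables in this post, but with a four-man field the win shares will not match the full-field odds.

```python
import random

def simulate_wins(field, n_sims=20000, n_rounds=4, seed=0):
    """Estimate each player's share of wins by simulation.

    field: dict name -> (rating, stdev), both in strokes per round.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in field}
    for _ in range(n_sims):
        best_name, best_total = None, float("inf")
        for name, (rating, stdev) in field.items():
            total = sum(rng.gauss(rating, stdev) for _ in range(n_rounds))
            if total < best_total:  # lowest four-round total wins
                best_name, best_total = name, total
        wins[best_name] += 1
    return {name: count / n_sims for name, count in wins.items()}

field = {
    "Martin Kaymer": (-1.45, 2.73),
    "Charl Schwartzel": (-1.40, 2.92),
    "Matt Kuchar": (-1.32, 2.47),
    "Mark O'Meara": (2.15, 2.78),
}
for name, share in simulate_wins(field).items():
    print(f"{name}: {share:.1%}")
```

One design consequence worth noting: in a large field, a higher standard deviation helps a player's chance of finishing first, which is consistent with Schwartzel (stdev 2.92) showing better win odds than Kaymer (stdev 2.73) despite a slightly worse rating.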

First, the ratings themselves, for the top 100 players of the US/Euro tours:

Rank	Players	Total Rounds	PGA	Euro	Bayesian Rating	Average Rating	Stdev
1	Martin Kaymer	188	74	114	-1.45	-1.82	2.73
2	Graeme McDowell	203	80	123	-1.40	-1.60	2.82
3	Charl Schwartzel	233	68	165	-1.40	-1.68	2.92
4	Lee Westwood	186	76	110	-1.39	-1.98	2.79
5	Matt Kuchar	205	201	4	-1.32	-1.41	2.47
6	Francesco Molinari	224	52	172	-1.31	-1.71	3.00
7	Luke Donald	202	158	44	-1.21	-1.44	2.70
8	Steve Stricker	175	171	4	-1.18	-1.77	2.70
9	Nick Watney	206	194	12	-1.18	-1.32	2.95
10	Rory McIlroy	208	98	110	-1.17	-1.68	2.89
11	Paul Casey	166	106	60	-1.11	-1.64	2.50
12	Phil Mickelson	193	167	26	-1.08	-1.39	2.82
13	Tiger Woods	135	119	16	-1.04	-2.20	2.84
14	Louis Oosthuizen	197	40	157	-1.01	-1.34	2.96
15	Retief Goosen	236	164	72	-0.98	-1.39	2.77
16	Raphael Jacquelin	232	6	226	-0.91	-0.98	2.56
17	Thomas Aiken	196	14	182	-0.90	-1.10	2.63
18	Anders Hansen	196	24	172	-0.88	-1.19	2.61
19	Peter Hanson	202	48	154	-0.84	-1.37	2.87
20	Justin Rose	210	176	34	-0.82	-0.98	2.69
21	Dustin Johnson	190	186	4	-0.81	-1.13	2.86
22	Hunter Mahan	209	205	4	-0.81	-1.13	2.69
23	Richard Green	164	6	158	-0.80	-1.29	2.78
24	Joost Luiten	145	0	145	-0.77	-1.03	2.75
25	Alvaro Quiros	210	54	156	-0.75	-1.08	2.75
26	Jamie Donaldson	209	0	209	-0.72	-1.01	2.83
27	Edoardo Molinari	145	38	107	-0.71	-1.24	2.75
28	Padraig Harrington	202	139	63	-0.71	-1.17	2.86
29	Stephen Gallacher	163	8	155	-0.70	-0.83	2.60
30	Robert Allenby	192	164	28	-0.70	-1.08	2.53
31	Ernie Els	237	164	73	-0.70	-1.25	2.76
32	Miguel Angel Jimenez	226	50	176	-0.69	-1.12	2.93
33	Ian Poulter	188	118	70	-0.69	-1.15	2.80
34	Anthony Wall	204	6	198	-0.69	-1.05	2.41
35	Jim Furyk	182	182	0	-0.69	-1.32	2.57
36	Rickie Fowler	150	144	6	-0.68	-0.79	2.72
37	Tim Clark	197	175	22	-0.65	-1.12	2.47
38	David Lynn	197	0	197	-0.65	-0.93	2.83
39	Chris Wood	195	16	179	-0.63	-1.02	2.62
40	Martin Laird	206	190	16	-0.62	-0.36	3.04
41	Charles Howell III	234	234	0	-0.62	-0.72	2.75
42	Robert-Jan Derksen	207	0	207	-0.61	-1.01	2.40
43	Jean-Baptiste Gonnet	194	0	194	-0.61	-0.64	2.94
44	Spencer Levin	225	225	0	-0.59	-0.55	2.88
45	Bill Haas	214	210	4	-0.59	-0.68	2.82
46	Gregory Bourdy	230	8	222	-0.58	-0.76	2.27
47	Ben Crane	194	190	4	-0.57	-0.89	2.72
48	Ross Fisher	187	70	117	-0.57	-0.98	2.86
49	Rafael Cabrera-Bello	227	4	223	-0.57	-0.72	2.66
50	David Toms	204	204	0	-0.56	-0.92	2.64
51	Kevin Na	211	211	0	-0.56	-0.90	2.52
52	Sergio Garcia	188	118	70	-0.55	-1.00	2.81
53	Thongchai Jaidee	207	36	171	-0.55	-1.09	2.75
54	Vijay Singh	171	169	2	-0.54	-0.65	2.63
55	Zach Johnson	200	200	0	-0.54	-1.06	2.39
56	K.J. Choi	189	171	18	-0.53	-0.79	2.60
57	Matteo Manassero	96	14	82	-0.53	-0.97	3.02
58	Rory Sabbatini	223	193	30	-0.52	-0.61	3.02
59	Stewart Cink	178	174	4	-0.52	-0.68	2.56
60	Bo Van Pelt	220	220	0	-0.51	-0.80	2.56
61	Darren Clarke	211	30	181	-0.51	-0.72	2.95
62	Bubba Watson	174	174	0	-0.51	-0.83	2.82
63	Thomas Bjorn	168	6	162	-0.50	-0.70	2.81
64	Robert Dinwiddie	139	0	139	-0.50	-0.63	2.78
65	J.B. Holmes	203	201	2	-0.49	-0.51	2.84
66	Robert Karlsson	151	64	87	-0.48	-0.90	2.99
67	Peter Lawrie	215	0	215	-0.47	-0.97	2.84
68	Brendon de Jonge	239	239	0	-0.47	-0.43	2.86
69	Gary Woodland	100	100	0	-0.46	0.02	2.93
70	Johan Edfors	195	10	185	-0.46	-0.86	3.00
71	Bradley Dredge	207	4	203	-0.46	-0.90	2.85
72	Ryan Moore	190	186	4	-0.46	-0.71	2.98
73	Ignacio Garrido	229	4	225	-0.46	-0.89	2.86
74	John Senden	240	234	6	-0.46	-0.74	2.69
75	Simon Dyson	222	26	196	-0.46	-1.01	2.83
76	Damien McGrane	245	2	243	-0.45	-0.91	2.92
77	J.J. Henry	220	220	0	-0.44	-0.51	2.74
78	Brandt Snedeker	195	193	2	-0.42	-0.65	2.88
79	Gonzalo Fernandez-Casta	198	28	170	-0.42	-0.93	2.90
80	Paul Lawrie	181	6	175	-0.42	-0.76	2.68
81	Steve Marino	205	199	6	-0.42	-0.77	2.82
82	Jaco Van Zyl	46	0	46	-0.42	-1.32	2.78
83	Brian Gay	229	221	8	-0.42	-0.58	2.49
84	James Kingston	187	12	175	-0.41	-0.60	2.99
85	Soren Hansen	222	42	180	-0.41	-1.00	2.31
86	Geoff Ogilvy	181	151	30	-0.41	-0.99	2.68
87	Nicolas Colsaerts	102	0	102	-0.40	-0.83	2.78
88	Gareth Maybin	216	6	210	-0.40	-0.92	3.04
89	Jonathan Byrd	194	194	0	-0.39	-0.59	2.86
90	Adam Scott	172	136	36	-0.38	-0.67	2.88
91	Aaron Baddeley	195	183	12	-0.38	-0.41	2.90
92	Fredrik Jacobson	192	192	0	-0.38	-0.55	2.69
93	Jerry Kelly	211	207	4	-0.36	-0.57	2.72
94	Soren Kjeldsen	211	42	169	-0.36	-0.89	2.68
95	Scott Verplank	172	172	0	-0.33	-0.64	2.83
96	Jeev Milkha Singh	230	104	126	-0.33	-0.64	2.77
97	Danny Willett	195	2	193	-0.33	-0.95	3.12
98	Webb Simpson	215	215	0	-0.33	-0.34	2.70
99	Alexander Noren	188	6	182	-0.30	-0.82	2.78
100	Marc Leishman	224	216	8	-0.30	-0.39	2.88

And here are the Monte Carlo simulation results, for this year’s Masters. Some players did not have enough data to make a projection; I’ll assume none of them win (that would be old champs, a few Asian players, and the amateurs.)

Notes on the results:

There is no clear favorite, as opposed to Pomeroy’s ratings for the ’09 PGA. Tiger has the best average rating over the past 2+ years, but has not done well recently. For the ’09 PGA, Tiger had a 25% chance of winning! Nothing like that this year.
The European contingent looks really strong.
Who is Charl Schshwartszel? Did I misspell that? Will we set an all-time record for misspellings of contenders?
Tiger and Phil have similar odds? Remember, injuries aren’t accounted for explicitly, so Phil’s playing with arthritis and not doing well may be counted too strongly.
Compare with Jason Sobel’s Rankings.

Rank	Player	Bayesian	Stdev	Wins/Million	Win%	Avg. Rank	1 in
1	Charl Schwartzel	-1.40	2.92	47126	4.71%	28.2	21.2
2	Francesco Molinari	-1.31	3.00	45299	4.53%	29.7	22.1
3	Graeme McDowell	-1.40	2.82	42684	4.27%	27.9	23.4
4	Martin Kaymer	-1.45	2.73	41458	4.15%	26.8	24.1
5	Lee Westwood	-1.39	2.79	40848	4.08%	27.9	24.5
6	Nick Watney	-1.18	2.95	35881	3.59%	31.5	27.9
7	Rory McIlroy	-1.17	2.89	33364	3.34%	31.5	30.0
8	Louis Oosthuizen	-1.01	2.96	28707	2.87%	34.2	34.8
9	Luke Donald	-1.21	2.70	27406	2.74%	30.5	36.5
10	Phil Mickelson	-1.08	2.82	26358	2.64%	32.9	37.9
11	Steve Stricker	-1.18	2.70	25994	2.60%	30.9	38.5
12	Tiger Woods	-1.04	2.84	25499	2.55%	33.5	39.2
13	Matt Kuchar	-1.32	2.47	23285	2.33%	28.2	42.9
14	Retief Goosen	-0.98	2.77	20961	2.10%	34.4	47.7
15	Peter Hanson	-0.84	2.87	19204	1.92%	36.9	52.1
16	Dustin Johnson	-0.81	2.86	18010	1.80%	37.3	55.5
17	Martin Laird	-0.62	3.04	17665	1.77%	40.5	56.6
18	Paul Casey	-1.11	2.50	16726	1.67%	31.7	59.8
19	Miguel Angel Jimenez	-0.69	2.93	16514	1.65%	39.3	60.6
20	Padraig Harrington	-0.71	2.86	15171	1.52%	39.0	65.9
21	Rory Sabbatini	-0.52	3.02	14543	1.45%	42.2	68.8
22	Alvaro Quiros	-0.75	2.75	13577	1.36%	38.2	73.7
23	Justin Rose	-0.82	2.69	13543	1.35%	37.0	73.8
24	Ian Poulter	-0.69	2.80	13472	1.35%	39.1	74.2
25	Anders Hansen	-0.88	2.61	13417	1.34%	35.9	74.5
26	Hunter Mahan	-0.81	2.69	13323	1.33%	37.1	75.1
27	Robert Karlsson	-0.48	2.99	13165	1.32%	42.9	76.0
28	Edoardo Molinari	-0.71	2.75	12721	1.27%	38.8	78.6
29	Ernie Els	-0.70	2.76	12635	1.26%	39.1	79.1
30	Ryan Moore	-0.46	2.98	12291	1.23%	43.2	81.4
31	Ross Fisher	-0.57	2.86	11960	1.20%	41.3	83.6
32	Gary Woodland	-0.46	2.93	11412	1.14%	43.0	87.6
33	Bill Haas	-0.59	2.82	11345	1.13%	40.9	88.1
34	Rickie Fowler	-0.68	2.72	11035	1.10%	39.4	90.6
35	Sergio Garcia	-0.55	2.81	10576	1.06%	41.6	94.6
36	Bubba Watson	-0.51	2.82	10154	1.02%	42.3	98.5
37	Brandt Snedeker	-0.42	2.88	9695	0.97%	43.7	103.1
38	Aaron Baddeley	-0.38	2.90	9489	0.95%	44.5	105.4
39	Ben Crane	-0.57	2.72	9197	0.92%	41.2	108.7
40	Adam Scott	-0.38	2.88	9193	0.92%	44.4	108.8
41	Jonathan Byrd	-0.39	2.86	8710	0.87%	44.4	114.8
42	Steve Marino	-0.42	2.82	8629	0.86%	43.8	115.9
43	Jim Furyk	-0.69	2.57	8468	0.85%	39.1	118.1
44	Anthony Kim	-0.19	3.00	8198	0.82%	47.6	122.0
45	Robert Allenby	-0.70	2.53	8027	0.80%	38.8	124.6
46	Y.E. Yang	-0.29	2.90	7954	0.80%	45.9	125.7
47	David Toms	-0.56	2.64	7566	0.76%	41.4	132.2
48	Vijay Singh	-0.54	2.63	7311	0.73%	41.6	136.8
49	K.J. Choi	-0.53	2.60	6636	0.66%	41.8	150.7
50	Charley Hoffman	-0.13	2.94	6521	0.65%	48.6	153.4
51	D.A. Points	-0.22	2.85	6397	0.64%	47.1	156.3
52	Henrik Stenson	0.08	3.12	6333	0.63%	51.9	157.9
53	Tim Clark	-0.65	2.47	6228	0.62%	39.6	160.6
54	Jerry Kelly	-0.36	2.72	6228	0.62%	44.8	160.6
55	Geoff Ogilvy	-0.41	2.68	6170	0.62%	44.0	162.1
56	Alex Cejka	-0.15	2.90	6062	0.61%	48.4	165.0
57	Stewart Cink	-0.52	2.56	5976	0.60%	42.0	167.3
58	Bo Van Pelt	-0.51	2.56	5861	0.59%	42.1	170.6
59	Jeff Overton	-0.17	2.84	5625	0.56%	48.1	177.8
60	Kevin Na	-0.56	2.52	5624	0.56%	41.3	177.8
61	Mark Wilson	-0.20	2.72	4540	0.45%	47.5	220.3
62	Fred Couples	0.18	3.00	4162	0.42%	53.5	240.3
63	Ricky Barnes	-0.10	2.76	4128	0.41%	49.3	242.2
64	Stuart Appleby	0.11	2.93	3954	0.40%	52.6	252.9
65	Zach Johnson	-0.54	2.39	3952	0.40%	41.6	253.0
66	Jason Day	-0.25	2.61	3679	0.37%	46.8	271.8
67	Jhonattan Vegas	-0.02	2.78	3552	0.36%	50.6	281.5
68	Sean O'Hair	-0.20	2.58	3088	0.31%	47.8	323.8
69	Camilo Villegas	0.02	2.74	3061	0.31%	51.3	326.7
70	Lucas Glover	-0.13	2.63	3045	0.30%	48.8	328.4
71	Kevin Streelman	-0.01	2.59	2180	0.22%	50.9	458.7
72	Carl Pettersson	0.07	2.65	2107	0.21%	52.3	474.6
73	Trevor Immelman	0.29	2.79	1932	0.19%	55.5	517.6
74	Gregory Havret	-0.05	2.54	1925	0.19%	50.4	519.5
75	Heath Slocum	0.19	2.70	1811	0.18%	54.2	552.2
76	Arjun Atwal	0.42	2.85	1763	0.18%	57.7	567.2
77	Angel Cabrera	-0.05	2.47	1559	0.16%	50.3	641.4
78	Ryan Palmer	0.20	2.58	1203	0.12%	54.6	831.3
79	Jose Maria Olazabal	0.97	3.07	1116	0.11%	64.9	896.1
80	Jason Bohn	0.30	2.57	971	0.10%	56.3	1029.9
81	Davis Love III	0.08	2.40	857	0.09%	52.8	1166.9
82	Kyung-Tae Kim	0.75	2.78	640	0.06%	62.8	1562.5
83	Hiroyuki Fujita	0.94	2.78	452	0.05%	65.5	2212.4
84	Ryo Ishikawa	0.92	2.78	451	0.05%	65.2	2217.3
85	Tom Watson	0.92	2.78	445	0.04%	65.2	2247.2
86	Mike Weir	1.21	3.03	0	0.00%	68.2	1000000.0
87	Yuta Ikeda	1.28	2.78	0	0.00%	69.8	1000000.0
88	Mark O'Meara	2.15	2.78	0	0.00%	78.6	1000000.0
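The Win% and “1 in” columns appear to follow directly from Wins/Million. A tiny Python sketch of that arithmetic, with a made-up function name and assuming the simulation ran one million times:

```python
def odds_columns(wins, n_sims=1_000_000):
    """Convert a simulated win count into (win percentage, "1 in N" odds)."""
    win_pct = 100.0 * wins / n_sims
    # A player with zero simulated wins has no finite "1 in" figure.
    one_in = n_sims / wins if wins else float("inf")
    return win_pct, one_in

pct, one_in = odds_columns(47126)  # Charl Schwartzel's row
print(f"{pct:.2f}% or 1 in {one_in:.1f}")
```

The 1000000.0 shown for players with zero wins is presumably a placeholder rather than a real estimate of their odds.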

Players left out:

Nathan Smith
David Chung
Hideki Matsuyama
Jin Jeong
Lion Kim
Peter Uihlein
Ben Crenshaw
Craig Stadler
Ian Woosnam
Larry Mize
Sandy Lyle

11 Comments

rexfordbuzzsaw

April 7, 2011 at 12:00 am

First off, this is good stuff, it’s nice to see someone with a brain try to rate golfers as opposed to Jason Sobel.

For the past couple of years, I’ve calculated a world golf ranking based on standardized scores across the world’s three biggest tours (PGA, NW, EPGA). You can see my full Masters rankings here.

I think you have a couple of problems, though. For one, the European Tour is weighted too highly. I can tell you this because I think I did a similar thing. I used to make no distinction between playing on the European and PGA Tours, and my odds looked a lot like that. In reality there is about a .21 standard deviation difference (~.6 strokes per round) between the PGA and European Tours. It’s about .35 for the NW Tour to PGA Tour.

I think that’s the reason you are so high on European Tour players like Schwartzel and Molinari.

I’d like to know how much of an impact recent play has in your rankings, because just looking at these off the top of my head I think it might be too much. I know in my rankings a player who is playing well recently should get a max bonus of around .2-.3 strokes per round. That is somewhat empirically derived, but mostly I came up with it through common-sense observation and by adjusting Vegas odds to my rankings.

Finally, I don’t think using the standard deviation of the sample is accurate. I’m not positive about this, but from what I’ve done, a player’s true standard deviation correlates directly with the player’s average score in relation to the field. There is no correlation in a single player’s standard deviation from year to year. Some years Tiger has a really low standard deviation, others he’s had a really high one. This applies to almost every player who has played a large sample of rounds in every year since 2002.

Hope that gives you something to think about and helps going forward.

- DanielM
  
  April 7, 2011 at 6:30 am
  
  Thanks for the very informative post!
  
  1) Theoretically, if I have good connectivity between tours, the European events will be adjusted appropriately automatically. I’m rating each player at each event NOT vs. “other players” but against the “baseline” for that round. This baseline is calculated as the average of how the 80 “baseline golfers” did in that round.
  
  That group could be only 10 in any given round, and not all of those golfers played on both tours. My regression may not properly capture the difference between the tours in difficulty of round. I’ll look at it a bit more today.
  
  That said–the 0.6 per round. Is that just a general average? Because the difficulty of each round varies WIDELY based on weather/course/etc.
  
  2) The best fit for projecting out-of-sample tournament results yielded a weighting that dropped from 1 in the most recent events to about 0.05 in the events at the start of 2009. Then, everybody is regressed with about 6.5 rounds (weighted at 1) of +2.5 or so golf. The weighting towards recency seemed a bit strong to me, as well.
  
  3) Regarding standard deviation: you may be right on this. Perhaps I should just assign everyone a consistent standard deviation? That said, I did regress each player’s standard deviation to the mean pretty hard via Bayesian inference with a prior of Avg.
  
  Thanks again for your insights!
  
bradluen

April 7, 2011 at 12:25 am

Do “Average Rating” and “Bayesian Rating” have the same baseline? If so, that’s a huge regression effect. Though maybe you need a huge regression effect with only two years of data.

- DanielM
  
  April 7, 2011 at 6:32 am
  
  Yes, they have the same baseline. There’s more than just the regression effect going on, though. There’s also the depreciation of the older results. Tiger’s oldest results are also his best results, so when they get weighted only 10% as strongly as his most recent results, that drops him a ton. There’s also regression to a baseline of +2.5 or so, but there are only about 6 rounds of +2.5 added to the weighted average, and a lot of guys have over 200 rounds. That affects the players with few rounds played a lot more.
  
DanielM

April 7, 2011 at 8:08 am

I’ll probably, if I have time today (I’m busy), try to revise the way I did the 80 baseline golfers to include a larger group and get a better connection between European and PGA tours. I agree with the various comments (twitter and here): it does look like the Euro players are rated unusually highly.

rexfordbuzzsaw

April 7, 2011 at 11:23 am

“That said–the 0.6 per round. Is that just a general average? Because the difficulty of each round varies WIDELY based on weather/course/etc.”

Yes.

The first thing I do is standardize the scores against everyone in the field. Obviously, it doesn’t matter if the course average is 75 or 69, the player’s relation to the field average is all that counts. That’s not to say a course plays at the same difficulty for someone playing at 8 a.m. as opposed to 1 p.m. I just hope most of that is randomness and balanced out in the long run, which I think it is pretty well.

The next step is to assign a field difficulty based on everyone’s raw average from above over a three-year period.

Finally, I take the adjusted z-scores and compare players’ rounds across tours. Over about 2000 rounds each way, players are on average .6 strokes better in relation to the field when they play on the European Tour. However, it does vary, like you said. The Dubai Desert Classic still boasts a stronger field than the Puerto Rico Open even though the European players’ raw rankings are inflated.

- DanielM
  
  April 8, 2011 at 12:12 pm
  
  I see. I’m taking a different approach, but if we’re each doing it right the answer should be the same. I’m standardizing against a group of “baseline players”, and there should be enough of them that play each round to get a good grasp of how hard the round was. I’m going to re-run the regression with over 140 players as baseline, using a slightly different procedure.
  
  I wish I weren’t getting a “cannot allocate memory for vector” error from R when I try to run it all that way. Apparently there’s no 200-300 MB block of contiguous RAM available?
  
EvanZ

April 10, 2011 at 8:32 am

Daniel, you had McIlroy top 10. Pretty good!

- Ben
  
  April 10, 2011 at 5:35 pm
  
  Daniel’s looking even better today!
  
Neil Paine

April 10, 2011 at 5:35 pm

Schwartzel!!!!!!

EvanZ

April 10, 2011 at 7:18 pm

lol

“Who is Charl Schshwartszel? Did I misspell that? Will we set an all-time record for misspellings of contenders?”


Chosen Stats

Carefully Chosen Sports Stats — Daniel Myers