NFL Week 14 Predictions, Ratings, Optimization and Non-Linear Programming

We finished classes yesterday, so all that is left for the semester is a homework assignment, three projects and three more exams. I have whittled that down to one last project and studying for the exams. But instead of finishing my database project, which is due Sunday, I elected to take a deeper dive into a classroom example for an exam I had yesterday.

The two are not totally unrelated. The third database project involves converting the current NFL season data (mostly scores and teams) into a graph database model, but more on that at a later date.

While formatting the data, I was reminded of an example we worked in Optimization class Monday. The problem involved calculating NFL team ratings using Non-Linear Programming methods (with Excel). The example comes from our textbook, Practical Management Science by Winston and Albright (South-Western Cengage Learning, 4th Ed.).

What is Non-Linear Programming?

Almost verbatim from the book: it is solving an optimization problem in which the objective or the constraints are non-linear functions of the decision variables.

For the layman: it is a problem where you are trying to maximize or minimize something (the objective), but the objective, the constraints, or both involve relationships that cannot be represented by simple straight lines. If you are interested or confused, there is plenty out there to help you understand Linear Programming (start with http://en.wikipedia.org/wiki/Linear_programming).

The jump from linear to non-linear can be something simple. I like their revenue example, where revenue is a price multiplied by quantity sold. Well, quantity sold is a function of price, via a demand function. So revenue is price multiplied by a function of price. Even if demand is linear in price, the product of price and demand is quadratic because it includes a squared price.
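To make the revenue example concrete, here is a small sketch assuming a hypothetical linear demand curve, quantity = 1000 - 20 × price (made-up numbers for illustration, not from the book). Revenue is then 1000p - 20p², a quadratic, and a simple scan over prices finds its peak at p = 1000 / (2 × 20) = 25.

```python
def demand(price: float) -> float:
    """Hypothetical linear demand curve: units sold at a given price."""
    return 1000 - 20 * price

def revenue(price: float) -> float:
    """Revenue = price x quantity; with linear demand this is 1000p - 20p^2 (quadratic)."""
    return price * demand(price)

# Scan prices from $0.00 to $50.00 in 1-cent steps and keep the revenue-maximizing one.
prices = [cents / 100 for cents in range(0, 5001)]
best_price = max(prices, key=revenue)
print(best_price, revenue(best_price))  # 25.0 12500.0
```

A straight line could never have an interior peak like this; the squared price term is what makes it a non-linear problem.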

BACK TO FOOTBALL-THE PROBLEM
The objective was to use NLP to create ratings for NFL teams that best predict the actual point spreads. By point spreads, they mean the actual score differential, not the betting lines.

As you may know, I have a little experience in this. In 1993, I created my own rating system along these lines and have applied it to both high school football (www.sixmanfootball.com) and college tennis (www.texascollegetennis.com). My programs use much of the same logic, but they are more iterative and detailed.

In the classroom example, we set up the spreadsheet the following way.

First you create a table of team names and ratings. The ratings can be left blank or given any starting number, since they are the values Solver will change to minimize the objective. Somewhere near there, you can also add a home field advantage cell and make it a variable cell for Solver to change as well.

Next you create a table of played games, including home team, visiting team, home score and visitor score. From this you get what they call the point spread (score differential): the home score minus the visitor score.

Extending that table, you create an expected point spread = home team rating – visiting team rating + home field advantage. Remember, the ratings and the home field advantage are the variables Solver will manipulate.

Extend the table once more, adding a squared prediction error = (actual point spread – expected point spread)^2.
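The same table columns can be sketched in Python instead of spreadsheet cells, assuming each game is a (home, visitor, home score, visitor score) tuple and the ratings live in a plain dictionary. The team names, ratings and home field advantage below are placeholders, not real values.

```python
from typing import Dict, Tuple

Game = Tuple[str, str, int, int]  # (home team, visiting team, home score, visitor score)

def game_row(game: Game, ratings: Dict[str, float], hfa: float) -> Tuple[int, float, float]:
    """Return (actual point spread, expected point spread, squared error) for one game."""
    home, visitor, home_score, visitor_score = game
    actual = home_score - visitor_score                # score differential, home minus visitor
    expected = ratings[home] - ratings[visitor] + hfa  # home rating - visitor rating + HFA
    return actual, expected, (actual - expected) ** 2

# Placeholder ratings and home field advantage (the cells Solver would change).
ratings = {"Home Team": 22.0, "Road Team": 18.0}
print(game_row(("Home Team", "Road Team", 24, 20), ratings, 3.0))  # (4, 7.0, 9.0)
```

Summing the last value over every game gives the sum of squared errors that Solver minimizes.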

Now you need to set up the Solver model: an objective and a constraint. First, the objective. Yes, the ratings are what we are after, but they are the decision variables Solver changes; the objective is the quantity we want optimized. We set the objective to the sum of the errors or, to be more precise, the sum of squared errors, and tell Solver to minimize it.

Why squared errors? I think the book glosses over this, stating only that it has a long tradition in statistics. Yes, you can use the sum of absolute errors instead. Like the authors, I prefer squared errors.

Think of it this way: an error of 10 gives a squared error of 100, while an error of 5 gives only 25. Twice the miss costs four times as much, so I look at squared errors as a way to ‘punish’ the larger errors, pulling the results even tighter.
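A quick way to see the ‘punishment’ effect is to compare two hypothetical sets of prediction errors with the same total absolute error. Absolute error cannot tell them apart, while squared error prefers the set where the misses are spread evenly over the one with a single big miss.

```python
def sum_abs(errors):
    """Sum of absolute errors: every point of miss counts the same."""
    return sum(abs(e) for e in errors)

def sum_sq(errors):
    """Sum of squared errors: large misses count disproportionately."""
    return sum(e * e for e in errors)

even = [5, 5]        # two moderate misses
lopsided = [10, 0]   # one big miss, one perfect call

print(sum_abs(even), sum_abs(lopsided))  # 10 10
print(sum_sq(even), sum_sq(lopsided))    # 50 100
```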

The final part of setting up the model is to add a constraint, what the authors term normalizing the data, so you get only one result. Without it, there is no single unique answer: adding the same number to every team's rating leaves every expected point spread, and therefore the sum of squared errors, unchanged. The idea is to constrain the mean of the ratings to equal some fixed number. They use 85, as that is what Jeff Sagarin does. Part of the logic is that a perfect team would approach 100, which is easy for people to understand. In my example, I used 20. This makes the results look more like football scores. I wouldn’t normally do this, but the variance in NFL scores is pretty tight.
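For readers without Excel, here is a minimal sketch of the whole model. It is neither the Excel Solver model nor my own rating program: it swaps in a plain coordinate-descent loop that repeatedly sets each rating, and then the home field advantage, to the value that minimizes the sum of squared errors with everything else held fixed. Because shifting every rating by the same amount leaves the errors unchanged, it fits first and then shifts the ratings so their mean equals a target (20, as in my example). The function name `fit_ratings` and the three-team schedule are made up for illustration; the games are built to be perfectly consistent so the answer is easy to check.

```python
from statistics import mean

def fit_ratings(games, teams, target_mean=20.0, iterations=2000):
    """Least-squares team ratings plus home field advantage (hypothetical helper).

    games: list of (home, visitor, home_score, visitor_score) tuples.
    Returns (ratings dict, home field advantage).
    """
    ratings = {team: 0.0 for team in teams}
    hfa = 0.0
    for _ in range(iterations):
        for team in teams:
            # With everything else fixed, the SSE-minimizing rating is the average
            # of the ratings implied by each of this team's games.
            implied = []
            for home, visitor, home_score, visitor_score in games:
                spread = home_score - visitor_score
                if team == home:
                    implied.append(spread + ratings[visitor] - hfa)
                elif team == visitor:
                    implied.append(ratings[home] + hfa - spread)
            if implied:
                ratings[team] = mean(implied)
        # Same idea for home field advantage, averaged over every game.
        hfa = mean((hs - vs) - ratings[h] + ratings[v] for h, v, hs, vs in games)
    # Normalize: shift all ratings so their mean equals target_mean (SSE is unchanged).
    shift = target_mean - mean(ratings.values())
    return {team: r + shift for team, r in ratings.items()}, hfa

games = [
    ("A", "B", 28, 20),  # A at home beats B by 8
    ("B", "C", 24, 16),  # B at home beats C by 8
    ("C", "A", 10, 17),  # C at home loses to A by 7
    ("B", "A", 21, 23),  # B at home loses to A by 2
]
ratings, hfa = fit_ratings(games, ["A", "B", "C"])
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(team, round(rating, 2))
print("HFA", round(hfa, 2))
# A 25.0 / B 20.0 / C 15.0 / HFA 3.0
```

With real NFL data the fit will not be perfect, but the loop and the normalization step work the same way.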

THE SOLUTION-2013 WEEK 14
Instead of running the 2003 or 2009 data as we did in class, I used the current 2013 data I came across. Once you have the data, it really only takes a second and boom, you get ratings. I even added last night’s Jags-Texans game.

Here’s a sample of what you get.

1 Seattle Seahawks 33.28
2 Carolina Panthers 31.55
3 Denver Broncos 31.37
4 San Francisco 49ers 29.56
5 New Orleans Saints 27.49

So what does this mean?

If the Seahawks played the Saints on a neutral field, they should be about a 5.8-point favorite (33.28 - 27.49 = 5.79).

An even better question – How can we use this?

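Well, I just happen to have the current betting lines (from sportbook.com), so let’s compare. In the table, Line and Exp are both from the home team’s point of view (positive means the home team is favored, negative means the visitor is), Diff (raw) = Line - Exp, and Diff/Line is that difference as a percentage of the line.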

Visitor	V-Rat	Home	H-Rat	Line	Exp	Diff (raw)	Diff/Line
Kansas City Chiefs	25.24	Washington Redskins	12.55	-3.5	-9.75	6.25	-179%
Minnesota Vikings	13.30	Baltimore Ravens	18.61	7	8.26	-1.26	-18%
Cleveland Browns	12.72	New England Patriots	24.62	13	14.84	-1.84	-14%
Oakland Raiders	13.73	New York Jets	10.07	2.5	-0.72	3.22	129%
Indianapolis Colts	22.23	Cincinnati Bengals	24.20	6.5	4.92	1.58	24%
Carolina Panthers	31.55	New Orleans Saints	27.49	3.5	-1.11	4.61	132%
Detroit Lions	21.09	Philadelphia Eagles	20.60	3	2.45	0.55	18%
Miami Dolphins	20.43	Pittsburgh Steelers	16.77	3.5	-0.71	4.21	120%
Buffalo Bills	16.00	Tampa Bay Buccaneers	17.43	3	4.37	-1.37	-46%
Tennessee Titans	19.51	Denver Broncos	31.37	13	14.80	-1.80	-14%
St Louis Rams	22.39	Arizona Cardinals	24.27	6	4.83	1.17	19%
New York Giants	15.71	San Diego Chargers	20.12	3.5	7.35	-3.85	-110%
Seattle Seahawks	33.28	San Francisco 49ers	29.56	2.5	-0.78	3.28	131%
Dallas Cowboys	22.11	Chicago Bears	17.92	-1	-1.25	0.25	-25%
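The Exp and Diff columns are simple arithmetic on the ratings. This sketch reproduces the Kansas City at Washington row and the Seattle-Saints neutral-field number. The home field advantage of 2.94 is an assumption: it is back-calculated from the table (the rows imply roughly 2.94 to 2.95, which the text rounds to 2.9). The helper name `line_comparison` is hypothetical.

```python
def line_comparison(visitor_rating, home_rating, line, hfa):
    """Return (expected spread, raw diff, diff as a fraction of the line).

    Spreads are from the home team's point of view: positive means the home team is favored.
    The fraction is None for a pick'em line of 0, where the ratio is undefined.
    """
    expected = home_rating - visitor_rating + hfa
    diff = line - expected
    return expected, diff, (diff / line if line else None)

HFA = 2.94  # assumption: back-calculated from the table above

# Kansas City (25.24) at Washington (12.55), line -3.5
exp_spread, diff, pct = line_comparison(25.24, 12.55, -3.5, HFA)
print(f"Exp {exp_spread:.2f}  Diff {diff:.2f}  Diff/Line {pct:.0%}")
# Exp -9.75  Diff 6.25  Diff/Line -179%

# Neutral field (no home advantage): Seattle 33.28 vs. New Orleans 27.49
print(f"Seattle by {33.28 - 27.49:.2f}")  # Seattle by 5.79
```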

OK, now what does this mean?

Take it for what it’s worth, but it appears to show a few big differences between the betting lines and our program’s expected lines. So we should bet, right? Not so fast, my friends.

It appears the guys making the lines are fading Kansas City after three straight losses. A team that looks like almost a 10-point favorite in our system (even on the road) is giving only 3.5 points.

Another scary game is Carolina getting 3.5 points at New Orleans. The system says the Panthers SHOULD actually be giving a point instead of getting 3.5. I think the thinking goes that the New Orleans home field advantage may be worth a little more than the 2.9 points our system assigns. New Orleans is also what bettors term a very ‘Public’ team, meaning it is very popular among casual bettors compared to Carolina. This may skew the line even further.

To really dig into the evaluation (and keep avoiding my project), I went back a week to see how last week’s games turned out against last week’s ratings.

Visitor	V-Rat	Home	H-Rat	Line	Exp	Diff (raw)	Diff/line	V-Score	H-Score	Correct
Green Bay Packers	19.65	Detroit Lions	19.44	6	2.59	3.41	57%	10	40	0
Oakland Raiders	12.96	Dallas Cowboys	22.35	8	12.18	-4.18	-52%	24	31	0
Pittsburgh Steelers	16.89	Baltimore Ravens	19.52	3	5.42	-2.42	-81%	20	22	0
Tennessee Titans	19.25	Indianapolis Colts	21.25	3.5	4.79	-1.29	-37%	14	22	1
Denver Broncos	30.86	Kansas City Chiefs	25.41	-6	-2.65	-3.35	56%	35	28	0
Jacksonville Jaguars	6.31	Cleveland Browns	14.05	7.5	10.54	-3.04	-40%	32	28	0
Tampa Bay Buccaneers	17.70	Carolina Panthers	30.95	6.5	16.05	-9.55	-147%	6	27	1
Chicago Bears	18.47	Minnesota Vikings	13.05	1	-2.62	3.62	362%	20	23	0
Arizona Cardinals	23.99	Philadelphia Eagles	20.40	3.5	-0.80	4.30	123%	21	24	1
Miami Dolphins	19.78	New York Jets	11.89	2	-5.09	7.09	355%	23	3	1
Atlanta Falcons	15.88	Buffalo Bills	17.29	4.5	4.20	0.30	7%	34	31	1
St Louis Rams	21.78	San Francisco 49ers	29.29	7.5	10.30	-2.80	-37%	13	23	1
New England Patriots	25.66	Houston Texans	12.56	-6.5	-10.30	3.80	-58%	34	31	0
Cincinnati Bengals	24.41	San Diego Chargers	20.14	-1.5	-1.47	-0.03	2%	17	10	0
New York Giants	15.19	Washington Redskins	13.07	1	0.68	0.32	32%	24	17	1
New Orleans Saints	29.49	Seattle Seahawks	31.07	5	4.38	0.62	12%	7	34	0

Grading each game by the side the system favors (the visitor when Diff (raw) is positive, the home team when it is negative), the system went 7-9 against the spread overall, and 4-6 in games where the absolute raw difference was greater than 2.5. In games where Diff/line was above 100% or below -100%, though, it went 3-1. Hmm. Now we are getting somewhere.
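Grading a pick against the spread can be sketched in code too. This assumes the same conventions as the tables: Line and Exp are home-team spreads, Diff (raw) = Line - Exp, a positive Diff means the system likes the visitor and a negative Diff means it likes the home team. The pick covers if the home team’s actual margin lands on the chosen side of the line; a margin exactly equal to the line is a push. Picking by the sign of Diff (raw), not Diff/line, matters when the line is negative, because dividing by a negative line flips the sign. The helper name `grade_pick` is hypothetical.

```python
def grade_pick(line, expected, home_score, visitor_score):
    """Return 1 if the system's side covered, 0 if it did not, None for a push or no edge.

    line and expected are home-team spreads (positive = home team favored).
    """
    diff = line - expected
    margin = home_score - visitor_score  # home team's actual margin of victory
    if diff == 0 or margin == line:
        return None                      # no edge, or a push
    if diff > 0:
        # The line favors the home team more than the system does: take the visitor.
        return 1 if margin < line else 0
    # The system favors the home team more than the line does: take the home team.
    return 1 if margin > line else 0

# Tennessee at Indianapolis: line 3.5, Exp 4.79, final 22-14, so Indy covers
print(grade_pick(3.5, 4.79, 22, 14))   # 1
# Green Bay at Detroit: line 6, Exp 2.59, final 40-10, so Green Bay does not cover
print(grade_pick(6, 2.59, 40, 10))     # 0
```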

So, the system says we should probably look again at the big differences.

Maybe we should consider taking Kansas City, Oakland, Carolina, Miami, San Diego and Seattle, the six games this week where the difference is more than 100% of the line? A $5 parlay on all six would bring you $242.92 in winnings if they ALL beat the spread.

I’m a little concerned about too many road teams. How about we drop it to KC, Oakland, Miami and San Diego? If you bet $5 here, you win $61.52 if all four cover. Sounds good. Done. We will send Prof. Muthuraman a little something if we hit.
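For the parlay math, here is a sketch assuming every leg is priced at the standard -110 (risk $110 to win $100). The book’s actual prices and parlay payout rules may differ, which would move the numbers. The helper names are hypothetical.

```python
from fractions import Fraction

def american_to_decimal(odds: int) -> Fraction:
    """Convert American odds to decimal odds (total return per $1 staked)."""
    if odds < 0:
        return 1 + Fraction(100, -odds)
    return 1 + Fraction(odds, 100)

def parlay_winnings(stake: float, legs: int, odds: int = -110) -> float:
    """Profit (excluding the returned stake) if every leg of the parlay wins."""
    total_return = stake * american_to_decimal(odds) ** legs
    return float(total_return - stake)

print(round(parlay_winnings(5, 6), 2))  # 237.06
print(round(parlay_winnings(5, 4), 2))  # 61.42
```

At flat -110 prices, the six-team and four-team parlays come to about $237.06 and $61.42 in winnings, close to but not exactly the figures quoted from the book.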