We finished classes yesterday, so all that is left for the semester is a homework assignment, three projects and three more exams. I have whittled this down to one last project and studying for the exams. But instead of finishing my database project, which is due Sunday, I elected to take a deeper dive into a classroom example for an exam I had yesterday.
They are not totally unrelated. The third db project involves converting the current NFL season data (mostly scores and teams) into a graph database model, but more on that at a later date.
While formatting the data, I was reminded of an example we worked in Optimization class Monday. The problem involved calculating NFL team ratings using Non-Linear Programming methods (with Excel). The example comes from our textbook, Practical Management Science by Winston and Albright (South-Western Cengage Learning, 4th Ed.).
What is Non-Linear Programming?
Almost verbatim from the book: It is the solving of an optimization problem where the objective or constraints are non-linear functions of the decision variables.
For the layman: It is a problem where there is an objective to maximize or minimize, subject to constraints, and where the objective or at least one constraint cannot be represented by a simple straight line. If you are interested or confused, there is plenty out there to help you understand Linear Programming (start with http://en.wikipedia.org/wiki/Linear_programming).
The jump from linear to non-linear can be something simple. I like their revenue example, where revenue is a price multiplied by quantity sold. Well, quantity sold is a function of price, via a demand function. So revenue is price multiplied by a function of price. Even if demand is linear in price, the product of price and demand is quadratic because it includes a squared price.
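To see the squared price show up, here is a minimal Python sketch of that revenue idea. The linear demand curve and its coefficients (100 and 2) are made up for illustration; they are not from the book.

```python
# Hypothetical linear demand: quantity sold drops 2 units per $1 of price.
# The intercept and slope are illustrative assumptions only.
def demand(price, intercept=100.0, slope=2.0):
    return intercept - slope * price

def revenue(price):
    # price * (intercept - slope * price)
    #   = intercept * price - slope * price**2, which is quadratic in price
    return price * demand(price)

for p in (10, 25, 40):
    print(p, revenue(p))
```

Revenue rises and then falls as the price climbs, which is something a straight line cannot do.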
BACK TO FOOTBALL-THE PROBLEM
The objective was to use NLP to create ratings for NFL teams that best predict the actual point spreads. By point spreads, they mean the actual score differential, not the betting lines.
As you may know, I have a little experience in this. In 1993, I created my own version of this and have applied it to both high school football (www.sixmanfootball.com) and college tennis (www.texascollegetennis.com). The basics of my programs use much of the same logic, but are more iterative and detailed.
In the classroom example, we set up the spreadsheet the following way.
First you create a table of team names and ratings. The ratings can be left blank or given any number, as they will be the values the Solver will change to meet our constraints. Somewhere near there, you can also add a home field advantage cell and make that also a variable cell to be tested in the Solver.
Next you create a table of played games, including home team, visiting team, home score and visitor score. From this you get what they call the point spread (score differential).
Extending that table, you create an expected point spread = home team rating - visiting team rating + home field advantage. Remember, these are the variables that will be manipulated.
Again you extend that table, adding a squared prediction error = (actual point spread - expected point spread)^2.
Now you need to set up the Solver. The first piece is the objective. Yes, the ratings are what we are after, but they are the cells the Solver changes; the objective is the quantity we want optimized. We set it to the sum of the errors or, to be more precise, the sum of squared errors, and we want to minimize it.
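For anyone who thinks better in code than in spreadsheet columns, the setup so far can be sketched in Python roughly like this. The two teams, their scores, the ratings and the home edge are all placeholders I made up, and the function names are my own, not anything from the book or from Excel.

```python
# Each game: (home team, visiting team, home score, visitor score).
# Teams, scores, ratings and the home edge are made-up placeholders.
def predicted_spread(home, visitor, ratings, home_edge):
    return ratings[home] - ratings[visitor] + home_edge

def sum_squared_errors(games, ratings, home_edge):
    total = 0.0
    for home, visitor, home_pts, visitor_pts in games:
        actual = home_pts - visitor_pts  # the score differential
        error = actual - predicted_spread(home, visitor, ratings, home_edge)
        total += error ** 2
    return total

games = [("A", "B", 28, 20), ("B", "A", 21, 23)]
ratings = {"A": 24.0, "B": 20.0}
print(sum_squared_errors(games, ratings, home_edge=3.0))
```

The single number this returns plays the role of the objective cell: the smaller it gets, the better the ratings explain the scores.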
Why squared errors? I think they gloss over this in the book, stating only that it has a long tradition in statistics. Yes, you can use the sum of absolute errors as well. Like the authors, I prefer squared errors.
Think of it this way, when you get a big difference (error), like 10, the squared error is 100. When you get an error of 5, the squared error is only 25. I look at squared errors as a way to ‘punish’ the larger errors, bringing the results even tighter.
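A quick sketch of that 'punishment', using the same two errors of 10 and 5. It just compares how much of the total the bigger miss accounts for under each approach.

```python
errors = [10, 5]

squared = [e ** 2 for e in errors]    # 100 and 25
absolute = [abs(e) for e in errors]   # 10 and 5

# Share of the total that comes from the larger error
share_squared = squared[0] / sum(squared)
share_absolute = absolute[0] / sum(absolute)

print(f"{share_squared:.0%} vs {share_absolute:.0%}")
```

With squared errors the big miss is four-fifths of the total, versus two-thirds with absolute errors, so the Solver works harder to shrink it.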
The final part of setting up the model is to set a constraint, or what the authors term normalizing the ratings, so you only get one result. If you do not do this, there is no single unique answer, because adding the same number to every rating leaves every predicted spread unchanged. The idea is to fix the mean of the ratings at some number. They use 85, as that is what Jeff Sagarin does. Part of the logic is that a perfect team would approach 100, which is an easy scale for people to understand. In my example, I used 20. This makes the results look more like football scores. I wouldn’t normally do this, but the variance in NFL scores is pretty tight.
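If you do not have Excel handy, the whole model can be approximated in a few lines of Python. To be clear, this is not what we did in class: it swaps the Solver for plain gradient descent, and the three teams and their scores are invented so the answer is easy to check. The step size and iteration count are also just values that happen to work for this toy data.

```python
# Toy data, invented for illustration:
# (home team, visiting team, home score, visitor score)
games = [
    ("A", "B", 28, 20),
    ("B", "C", 24, 16),
    ("C", "A", 17, 24),
    ("B", "A", 21, 23),
]

def fit_ratings(games, target_mean=20.0, lr=0.05, steps=2000):
    teams = sorted({team for game in games for team in game[:2]})
    rating = {team: 0.0 for team in teams}
    home_edge = 0.0
    for _ in range(steps):
        # Gradient of the sum of squared errors for the current guess
        grad = {team: 0.0 for team in teams}
        grad_edge = 0.0
        for home, visitor, home_pts, visitor_pts in games:
            actual = home_pts - visitor_pts
            predicted = rating[home] - rating[visitor] + home_edge
            error = actual - predicted
            grad[home] -= 2 * error
            grad[visitor] += 2 * error
            grad_edge -= 2 * error
        for team in teams:
            rating[team] -= lr * grad[team]
        home_edge -= lr * grad_edge
    # Normalizing step: shift every rating by the same amount so the
    # mean hits the target. Predicted spreads do not change.
    shift = target_mean - sum(rating.values()) / len(teams)
    return {team: r + shift for team, r in rating.items()}, home_edge

ratings, home_edge = fit_ratings(games)
print({team: round(r, 2) for team, r in ratings.items()}, round(home_edge, 2))
```

The shift at the end is the whole point of the normalizing constraint: it picks one answer out of the endless family that all predict the same spreads.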
THE SOLUTION-2013 WEEK 14
Instead of running the 2003 or 2009 data, as we did in class, I used the current 2013 data I came across. Once you have the data, it really only takes a second and boom, you get ratings. I even added last night’s Jags-Texans game.
Here’s a sample of what you get.
1 Seattle Seahawks 33.28
2 Carolina Panthers 31.55
3 Denver Broncos 31.37
4 San Francisco 49ers 29.56
5 New Orleans Saints 27.49
So what does this mean?
If the Seahawks played the Saints on a neutral field, they should be about a 5.8-point favorite (33.28 - 27.49 = 5.79).
An even better question – How can we use this?
Well, I just happen to have the current betting lines (from sportbook.com), so let’s compare.
| Visitor | V-Rat | Home | H-Rat | Line | Exp | Diff (raw) | Diff/Line |
| Kansas City Chiefs | 25.24 | Washington Redskins | 12.55 | -3.5 | -9.75 | 6.25 | -179% |
| Minnesota Vikings | 13.30 | Baltimore Ravens | 18.61 | 7 | 8.26 | -1.26 | -18% |
| Cleveland Browns | 12.72 | New England Patriots | 24.62 | 13 | 14.84 | -1.84 | -14% |
| Oakland Raiders | 13.73 | New York Jets | 10.07 | 2.5 | -0.72 | 3.22 | 129% |
| Indianapolis Colts | 22.23 | Cincinnati Bengals | 24.20 | 6.5 | 4.92 | 1.58 | 24% |
| Carolina Panthers | 31.55 | New Orleans Saints | 27.49 | 3.5 | -1.11 | 4.61 | 132% |
| Detroit Lions | 21.09 | Philadelphia Eagles | 20.60 | 3 | 2.45 | 0.55 | 18% |
| Miami Dolphins | 20.43 | Pittsburgh Steelers | 16.77 | 3.5 | -0.71 | 4.21 | 120% |
| Buffalo Bills | 16.00 | Tampa Bay Buccaneers | 17.43 | 3 | 4.37 | -1.37 | -46% |
| Tennessee Titans | 19.51 | Denver Broncos | 31.37 | 13 | 14.80 | -1.80 | -14% |
| St Louis Rams | 22.39 | Arizona Cardinals | 24.27 | 6 | 4.83 | 1.17 | 19% |
| New York Giants | 15.71 | San Diego Chargers | 20.12 | 3.5 | 7.35 | -3.85 | -110% |
| Seattle Seahawks | 33.28 | San Francisco 49ers | 29.56 | 2.5 | -0.78 | 3.28 | 131% |
| Dallas Cowboys | 22.11 | Chicago Bears | 17.92 | -1 | -1.25 | 0.25 | -25% |
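The columns in that table are simple arithmetic. Here is a minimal sketch, assuming my reading of the signs is right (a positive Line or Exp means the home team is favored) and assuming a home field advantage of 2.94 points, which is simply the value that reproduces the table's rounding. The function name is mine.

```python
HOME_EDGE = 2.94  # assumption: the value that reproduces the table's Exp column

def compare(visitor_rating, home_rating, line, home_edge=HOME_EDGE):
    expected = home_rating - visitor_rating + home_edge
    diff = line - expected
    # A pick'em (line of 0) has no meaningful percentage
    pct = round(diff / line * 100) if line != 0 else None
    return round(expected, 2), round(diff, 2), pct

# Kansas City (25.24) at Washington (12.55), line -3.5
print(compare(25.24, 12.55, -3.5))
```

Passing home_edge=0 gives the neutral-field spread between any two teams.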
OK, now what does this mean?
Take it for what you want, but it appears to show us a few big differences between the betting lines and our program’s expected lines. So we should bet, right? Not so fast, my friends.
It appears the guys making the lines are fading Kansas City, based on three straight losses. A team that looks like almost a 10-point favorite (even on the road) in our system is only giving 3.5 points.
Another scary game would be Carolina getting 3.5 points at New Orleans. The system says the Panthers SHOULD actually be giving a point, instead of getting 3.5. I think the line of thinking goes that maybe the New Orleans home field advantage is worth a little more than the 2.9 points our system gives it. New Orleans is also what the bettors term a very ‘Public’ team, meaning they are very popular among the casual bettors, compared to Carolina. This may skew the line even further.
Now, to really get into evaluation a bit more (and avoid my project as well), I decided to go back a week and see how things worked out, based on last week’s ratings.
| Visitor | V-Rat | Home | H-Rat | Line | Exp | Diff (raw) | Diff/Line | V | H | Correct |
| Green Bay Packers | 19.65 | Detroit Lions | 19.44 | 6 | 2.59 | 3.41 | 57% | 10 | 40 | 0 |
| Oakland Raiders | 12.96 | Dallas Cowboys | 22.35 | 8 | 12.18 | -4.18 | -52% | 24 | 31 | 0 |
| Pittsburgh Steelers | 16.89 | Baltimore Ravens | 19.52 | 3 | 5.42 | -2.42 | -81% | 20 | 22 | 0 |
| Tennessee Titans | 19.25 | Indianapolis Colts | 21.25 | 3.5 | 4.79 | -1.29 | -37% | 14 | 22 | 1 |
| Denver Broncos | 30.86 | Kansas City Chiefs | 25.41 | -6 | -2.65 | -3.35 | 56% | 35 | 28 | 1 |
| Jacksonville Jaguars | 6.31 | Cleveland Browns | 14.05 | 7.5 | 10.54 | -3.04 | -40% | 32 | 28 | 0 |
| Tampa Bay Buccaneers | 17.70 | Carolina Panthers | 30.95 | 6.5 | 16.05 | -9.55 | -147% | 6 | 27 | 1 |
| Chicago Bears | 18.47 | Minnesota Vikings | 13.05 | 1 | -2.62 | 3.62 | 362% | 20 | 23 | 0 |
| Arizona Cardinals | 23.99 | Philadelphia Eagles | 20.40 | 3.5 | -0.80 | 4.30 | 123% | 21 | 24 | 1 |
| Miami Dolphins | 19.78 | New York Jets | 11.89 | 2 | -5.09 | 7.09 | 355% | 23 | 3 | 1 |
| Atlanta Falcons | 15.88 | Buffalo Bills | 17.29 | 4.5 | 4.20 | 0.30 | 7% | 34 | 31 | 1 |
| St Louis Rams | 21.78 | San Francisco 49ers | 29.29 | 7.5 | 10.30 | -2.80 | -37% | 13 | 23 | 1 |
| New England Patriots | 25.66 | Houston Texans | 12.56 | -6.5 | -10.30 | 3.80 | -58% | 34 | 31 | 1 |
| Cincinnati Bengals | 24.41 | San Diego Chargers | 20.14 | -1.5 | -1.47 | -0.03 | 2% | 17 | 10 | 1 |
| New York Giants | 15.19 | Washington Redskins | 13.07 | 1 | 0.68 | 0.32 | 32% | 24 | 17 | 1 |
| New Orleans Saints | 29.49 | Seattle Seahawks | 31.07 | 5 | 4.38 | 0.62 | 12% | 7 | 34 | 0 |
Now we are getting somewhere. The system actually went 10-6. In games where the difference between the expected spread and the line was more than 100% of the line (in either direction), it went 3-1. Hmmm. In games where the absolute raw difference was greater than 2.5 points, it was 6-4.
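Those two filtered records are easy to recount. A small sketch, with the Diff (raw), Diff/Line and Correct columns copied from last week's table; the helper name is my own.

```python
# (raw difference, difference as % of line, correct flag) for each game,
# copied in order from last week's table
rows = [
    (3.41, 57, 0), (-4.18, -52, 0), (-2.42, -81, 0), (-1.29, -37, 1),
    (-3.35, 56, 1), (-3.04, -40, 0), (-9.55, -147, 1), (3.62, 362, 0),
    (4.30, 123, 1), (7.09, 355, 1), (0.30, 7, 1), (-2.80, -37, 1),
    (3.80, -58, 1), (-0.03, 2, 1), (0.32, 32, 1), (0.62, 12, 0),
]

def record(rows, keep):
    picked = [row for row in rows if keep(row)]
    wins = sum(row[2] for row in picked)
    return wins, len(picked) - wins

print(record(rows, lambda row: abs(row[1]) > 100))  # more than 100% of the line
print(record(rows, lambda row: abs(row[0]) > 2.5))  # raw difference over 2.5
```

Changing the thresholds in the two lambdas is a quick way to see how sensitive those records are to where you draw the line.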
So, the system says we should probably look again at the big differences.
Maybe we should consider taking Kansas City, Oakland, Carolina, Miami, San Diego and Seattle? A $5 parlay on all six would bring you $242.92 in winnings, if they ALL beat the spread.
I’m a little concerned about too many road teams. How about we drop it to KC, Oakland, Miami and San Diego? If you bet $5 here, you win $61.52 if they all win. Sounds good. Done. We will send Prof. Muthuraman a little something if we hit.