We finished classes yesterday, so all that is left for the semester is a homework assignment, three projects and three more exams. I have whittled this away to only needing to complete one last project and study for the exams. But, instead of finishing my database project, which is due Sunday, I elected to take a deeper dive into a classroom example for an exam I had yesterday.

They are not totally unrelated. The third db project involves converting the current NFL season data (mostly scores and teams) into a graph database model, but more on that at a later date.

While formatting the data, I was reminded of an example we worked in Optimization class Monday. The problem involved calculating NFL team ratings using Non-Linear Programming methods (with Excel). The example comes from our textbook, Practical Management Science by Winston and Albright (South-Western Cengage Learning, 4th Ed.).

What is Non-Linear Programming?

Almost verbatim from the book: It is the solving of an optimization problem where the objective or constraints are non-linear functions of the decision variables.

For the layman: It is a problem where you are trying to maximize or minimize an objective, subject to some constraints, and at least one of those relationships cannot be represented by a simple straight line. If you are interested or confused, there is plenty out there to help you understand Linear Programming (start with http://en.wikipedia.org/wiki/Linear_programming).

The jump from linear to non-linear can be something simple. I like their revenue example, where revenue is a price multiplied by quantity sold. Well, quantity sold is a function of price, via a demand function. So revenue is price multiplied by a function of price. Even if demand is linear in price, the product of price and demand is quadratic because it includes a squared price.
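To see the squared price show up, here is the revenue example worked through with a straight-line demand function. The constants a and b are placeholders I am assuming for illustration; they are not numbers from the book.

```latex
\begin{aligned}
D(p) &= a - b\,p \qquad (a > 0,\ b > 0) \\
R(p) &= p \cdot D(p) \\
     &= p\,(a - b\,p) \\
     &= a\,p - b\,p^{2}
\end{aligned}
```

The last line has a p squared term, so revenue is no longer a straight line in price even though demand is.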

BACK TO FOOTBALL-THE PROBLEM
The objective was to use NLP to create ratings for NFL teams that best predict the actual point spreads. By point spreads, they mean the actual score differential, not the betting lines.

As you may know, I have a little experience in this. In 1993, I created my own version of this and have applied it to both high school football (www.sixmanfootball.com) and college tennis (www.texascollegetennis.com). The basics of my programs use much of the same logic, but are more iterative and detailed.

In the classroom example, we set up the spreadsheet the following way.

First you create a table of team names and ratings. The ratings can be left blank or given any number, as they will be the values the Solver will change to meet our constraints. Somewhere near there, you can also add a home field advantage cell and make that also a variable cell to be tested in the Solver.

Next you create a table of played games, including home team, visiting team, home score and visitor score. From this you get what they call the point spread (score differential).

Extending that table, you create an expected point spread = home team rating – visiting team rating + home field advantage. Remember these are the variables that will be manipulated.

Again you extend that table, adding a squared prediction error = (actual point spread – predicted point spread) ^ 2.
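The two calculated columns are easy to sketch in Python. The function names are my own, and the home field advantage of 2.80 is an assumption, inferred from last week's expected lines later in this post. The ratings and score are from the Tampa Bay at Carolina row in that same table.

```python
def predicted_spread(home_rating, visitor_rating, home_field_advantage):
    """Expected point spread from the home team's point of view."""
    return home_rating - visitor_rating + home_field_advantage


def squared_error(home_score, visitor_score, predicted):
    """Squared gap between the actual score differential and the prediction."""
    actual = home_score - visitor_score
    return (actual - predicted) ** 2


# Tampa Bay (17.70) at Carolina (30.95), final score 6-27.
# 2.80 is an assumed home field advantage, not a number from the book.
pred = predicted_spread(30.95, 17.70, 2.80)
sq_err = squared_error(27, 6, pred)

print(round(pred, 2), round(sq_err, 2))
```

In the spreadsheet, each game row gets these two values, and the squared error column is what gets summed.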

Now you need to set up the Solver. The first piece is the objective. Yes, the ratings are what we are after, but we want them to be optimized. To do this, we set the objective to be the sum of errors or, to be more precise, the sum of squared errors. We want to minimize this.
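In class, the Excel Solver did the minimizing. As a stand-in, here is a minimal sketch that uses plain gradient descent in Python instead. The three teams (A, B, C), their six games, the learning rate and the step count are all made up for illustration, and the function names are mine.

```python
# Hypothetical games: (home, visitor, home_score, visitor_score).
TEAMS = ["A", "B", "C"]
GAMES = [
    ("A", "B", 28, 20),
    ("B", "C", 24, 16),
    ("C", "A", 14, 21),
    ("B", "A", 17, 19),
    ("C", "B", 20, 22),
    ("A", "C", 30, 17),
]


def sum_squared_errors(ratings, hfa, games):
    """The objective: total squared gap between actual and predicted spreads."""
    total = 0.0
    for home, visitor, hs, vs in games:
        err = (hs - vs) - (ratings[home] - ratings[visitor] + hfa)
        total += err ** 2
    return total


def fit_ratings(games, teams, start=20.0, lr=0.01, steps=2000):
    """Nudge ratings and home field advantage downhill on the objective."""
    ratings = {t: start for t in teams}
    hfa = 0.0
    for _ in range(steps):
        grad = {t: 0.0 for t in teams}
        grad_hfa = 0.0
        for home, visitor, hs, vs in games:
            err = (hs - vs) - (ratings[home] - ratings[visitor] + hfa)
            grad[home] -= 2 * err      # raising the home rating shrinks err
            grad[visitor] += 2 * err   # raising the visitor rating grows err
            grad_hfa -= 2 * err
        for t in teams:
            ratings[t] -= lr * grad[t]
        hfa -= lr * grad_hfa
    return ratings, hfa


ratings, hfa = fit_ratings(GAMES, TEAMS)
print({t: round(r, 2) for t, r in ratings.items()}, round(hfa, 2))
print(round(sum_squared_errors(ratings, hfa, GAMES), 6))
```

The toy scores were built to be perfectly consistent, so the sum of squared errors can be driven essentially to zero. Real NFL scores are noisier and will always leave some error behind.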

Why squared errors? I think they gloss over this in the book, stating only that this has a long tradition in statistics. Yes, you can use the sum of absolute (value) errors as well. Like the authors, I prefer squared errors.

Think of it this way: when you get a big difference (error), like 10, the squared error is 100. When you get an error of 5, the squared error is only 25. I look at squared errors as a way to ‘punish’ the larger errors, pulling the results even tighter.
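A tiny comparison makes the ‘punish’ idea concrete. The two sets of errors below are made up: one miss of 10 and a perfect hit, versus two misses of 5.

```python
def sum_abs(errors):
    """Sum of absolute errors."""
    return sum(abs(e) for e in errors)


def sum_sq(errors):
    """Sum of squared errors."""
    return sum(e * e for e in errors)


lopsided = [10, 0]  # one big miss, one perfect prediction
even = [5, 5]       # two medium misses

print(sum_abs(lopsided), sum_abs(even))  # absolute errors treat these the same
print(sum_sq(lopsided), sum_sq(even))    # squared errors penalize the big miss more
```

With absolute errors the two cases tie. With squared errors the lopsided case costs twice as much, so the optimizer prefers spreading the misses out.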

The final part of setting up the model is to set a boundary, or what the authors term normalizing the data, so you only get one result. If you do not do this, there will not be a single unique answer, because adding the same number to every rating changes none of the predicted spreads. The idea is to fix the mean of the ratings at some number. They use 85, as that is what Jeff Sagarin does. Part of the logic is that a perfect team would approach 100, which seems like a pretty logical thing for people to understand. In my example, I used 20. This makes the results look more like football scores. I wouldn’t normally do this, but the variance in NFL scores is pretty tight.
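A quick way to see that the choice of mean is harmless: the sketch below re-centers a set of made-up ratings to a mean of 20 and then 85 and checks that the gaps between teams do not move. The helper name normalize and the ratings are hypothetical. In the spreadsheet this is a Solver constraint rather than an after-the-fact shift, but the effect on the ratings is the same.

```python
def normalize(ratings, target_mean):
    """Shift every rating by the same amount so the mean hits target_mean."""
    current_mean = sum(ratings.values()) / len(ratings)
    shift = target_mean - current_mean
    return {team: r + shift for team, r in ratings.items()}


raw = {"A": 5.0, "B": 0.0, "C": -5.0}  # made-up ratings with mean 0

football_scale = normalize(raw, 20)  # the scale used in this post
sagarin_scale = normalize(raw, 85)   # the scale used in the book

print(football_scale)
print(sagarin_scale)

# The gap between any two teams is identical on both scales.
print(football_scale["A"] - football_scale["C"],
      sagarin_scale["A"] - sagarin_scale["C"])
```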

THE SOLUTION-2013 WEEK 14
Instead of running the 2003 or 2009 data, as we did in class, I ran the current 2013 data that I had come across. Once you have the data, it really only takes a second and boom, you get ratings. I even added last night’s Jags-Texans game.

Here’s a sample of what you get.

1 Seattle Seahawks 33.28
2 Carolina Panthers 31.55
3 Denver Broncos 31.37
4 San Francisco 49ers 29.56
5 New Orleans Saints 27.49

So what does this mean?

If the Seahawks played the Saints on a neutral field, they should be about a 5.8-point favorite (33.28 - 27.49 = 5.79).
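The neutral-field number is just a subtraction of ratings. A short sketch, with the function name being my own:

```python
# Top five ratings from the sample above.
RATINGS = {
    "Seattle Seahawks": 33.28,
    "Carolina Panthers": 31.55,
    "Denver Broncos": 31.37,
    "San Francisco 49ers": 29.56,
    "New Orleans Saints": 27.49,
}


def neutral_field_spread(team_a, team_b, ratings):
    """Points team_a is expected to win by with no home field advantage."""
    return ratings[team_a] - ratings[team_b]


spread = neutral_field_spread("Seattle Seahawks", "New Orleans Saints", RATINGS)
print(round(spread, 2))
```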

An even better question – How can we use this?

Well, I just happen to have the current betting lines (from sportbook.com), so let’s compare.

| Visitor | V-Rat | Home | H-Rat | Line | Exp | Diff (raw) | Diff/Line |
|---|---|---|---|---|---|---|---|
| Kansas City Chiefs | 25.24 | Washington Redskins | 12.55 | -3.5 | -9.75 | 6.25 | -179% |
| Minnesota Vikings | 13.30 | Baltimore Ravens | 18.61 | 7 | 8.26 | -1.26 | -18% |
| Cleveland Browns | 12.72 | New England Patriots | 24.62 | 13 | 14.84 | -1.84 | -14% |
| Oakland Raiders | 13.73 | New York Jets | 10.07 | 2.5 | -0.72 | 3.22 | 129% |
| Indianapolis Colts | 22.23 | Cincinnati Bengals | 24.20 | 6.5 | 4.92 | 1.58 | 24% |
| Carolina Panthers | 31.55 | New Orleans Saints | 27.49 | 3.5 | -1.11 | 4.61 | 132% |
| Detroit Lions | 21.09 | Philadelphia Eagles | 20.60 | 3 | 2.45 | 0.55 | 18% |
| Miami Dolphins | 20.43 | Pittsburgh Steelers | 16.77 | 3.5 | -0.71 | 4.21 | 120% |
| Buffalo Bills | 16.00 | Tampa Bay Buccaneers | 17.43 | 3 | 4.37 | -1.37 | -46% |
| Tennessee Titans | 19.51 | Denver Broncos | 31.37 | 13 | 14.80 | -1.80 | -14% |
| St Louis Rams | 22.39 | Arizona Cardinals | 24.27 | 6 | 4.83 | 1.17 | 19% |
| New York Giants | 15.71 | San Diego Chargers | 20.12 | 3.5 | 7.35 | -3.85 | -110% |
| Seattle Seahawks | 33.28 | San Francisco 49ers | 29.56 | 2.5 | -0.78 | 3.28 | 131% |
| Dallas Cowboys | 22.11 | Chicago Bears | 17.92 | -1 | -1.25 | 0.25 | -25% |

OK, now what does this mean?

Take it for what you want, but it appears to show us a few big differences between the betting lines and our program’s expected lines. So we should bet, right? Not so fast, my friends.

It appears the guys making the lines are fading Kansas City, based on three straight losses. A team that looks like almost a 10-point favorite in our system (even on the road) is only giving 3.5 points.

Another scary game would be Carolina getting 3.5 points at New Orleans. The system says the Panthers SHOULD actually be giving a point, instead of getting 3.5. I think the line of thinking goes that maybe the New Orleans home field advantage is worth a little more than the 2.9 points our system gives it. New Orleans is also what the bettors term a very ‘Public’ team, meaning they are very popular among the casual bettors, compared to Carolina. This may skew the line even further.
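For anyone who wants to rebuild the comparison columns, here is a rough sketch. It assumes my reading of the table: Line and Exp are both from the home team's point of view (positive means the home team is favored), Diff is Line minus Exp, and Diff/Line is that difference as a share of the line. The home field advantage of 2.94 is an assumption backed out of the table values (the system's figure rounds to 2.9), and the function name is mine.

```python
def compare_to_line(visitor_rating, home_rating, line, home_field_advantage):
    """Return (expected spread, raw difference, difference as a share of the line)."""
    expected = home_rating - visitor_rating + home_field_advantage
    diff = line - expected
    ratio = diff / line  # undefined for a pick'em (line of 0)
    return expected, diff, ratio


HFA = 2.94  # assumed value, inferred from the table

# Kansas City (25.24) at Washington (12.55), line -3.5
kc = compare_to_line(25.24, 12.55, -3.5, HFA)

# Carolina (31.55) at New Orleans (27.49), line 3.5
car = compare_to_line(31.55, 27.49, 3.5, HFA)

for expected, diff, ratio in (kc, car):
    print(round(expected, 2), round(diff, 2), f"{ratio:.0%}")
```

Small differences in the second decimal against the table come from the ratings being rounded before I typed them in here.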

Now, to really get into the evaluation a bit more (and avoid my project as well), I decided to go back and see how things worked out last week, based on last week’s ratings.

| Visitor | V-Rat | Home | H-Rat | Line | Exp | Diff (raw) | Diff/Line | V Score | H Score | Correct |
|---|---|---|---|---|---|---|---|---|---|---|
| Green Bay Packers | 19.65 | Detroit Lions | 19.44 | 6 | 2.59 | 3.41 | 57% | 10 | 40 | 0 |
| Oakland Raiders | 12.96 | Dallas Cowboys | 22.35 | 8 | 12.18 | -4.18 | -52% | 24 | 31 | 0 |
| Pittsburgh Steelers | 16.89 | Baltimore Ravens | 19.52 | 3 | 5.42 | -2.42 | -81% | 20 | 22 | 0 |
| Tennessee Titans | 19.25 | Indianapolis Colts | 21.25 | 3.5 | 4.79 | -1.29 | -37% | 14 | 22 | 1 |
| Denver Broncos | 30.86 | Kansas City Chiefs | 25.41 | -6 | -2.65 | -3.35 | 56% | 35 | 28 | 1 |
| Jacksonville Jaguars | 6.31 | Cleveland Browns | 14.05 | 7.5 | 10.54 | -3.04 | -40% | 32 | 28 | 0 |
| Tampa Bay Buccaneers | 17.70 | Carolina Panthers | 30.95 | 6.5 | 16.05 | -9.55 | -147% | 6 | 27 | 1 |
| Chicago Bears | 18.47 | Minnesota Vikings | 13.05 | 1 | -2.62 | 3.62 | 362% | 20 | 23 | 0 |
| Arizona Cardinals | 23.99 | Philadelphia Eagles | 20.40 | 3.5 | -0.80 | 4.30 | 123% | 21 | 24 | 1 |
| Miami Dolphins | 19.78 | New York Jets | 11.89 | 2 | -5.09 | 7.09 | 355% | 23 | 3 | 1 |
| Atlanta Falcons | 15.88 | Buffalo Bills | 17.29 | 4.5 | 4.20 | 0.30 | 7% | 34 | 31 | 1 |
| St Louis Rams | 21.78 | San Francisco 49ers | 29.29 | 7.5 | 10.30 | -2.80 | -37% | 13 | 23 | 1 |
| New England Patriots | 25.66 | Houston Texans | 12.56 | -6.5 | -10.30 | 3.80 | -58% | 34 | 31 | 1 |
| Cincinnati Bengals | 24.41 | San Diego Chargers | 20.14 | -1.5 | -1.47 | -0.03 | 2% | 17 | 10 | 1 |
| New York Giants | 15.19 | Washington Redskins | 13.07 | 1 | 0.68 | 0.32 | 32% | 24 | 17 | 1 |
| New Orleans Saints | 29.49 | Seattle Seahawks | 31.07 | 5 | 4.38 | 0.62 | 12% | 7 | 34 | 0 |

Now we are getting somewhere. The system actually went 10-6. In games where Diff/Line was beyond 100% in either direction, it went 3-1. Hmmm. In games where the absolute raw difference was greater than 2.5, it was 6-4.
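The records can be tallied straight from the Diff (raw), Diff/Line and Correct columns of last week's table. This sketch only counts what is already in those columns. It does not try to re-derive the Correct flags, and the function name is mine.

```python
# (diff_raw, diff_over_line_pct, correct) for each of last week's games,
# copied in table order.
LAST_WEEK = [
    (3.41, 57, 0), (-4.18, -52, 0), (-2.42, -81, 0), (-1.29, -37, 1),
    (-3.35, 56, 1), (-3.04, -40, 0), (-9.55, -147, 1), (3.62, 362, 0),
    (4.30, 123, 1), (7.09, 355, 1), (0.30, 7, 1), (-2.80, -37, 1),
    (3.80, -58, 1), (-0.03, 2, 1), (0.32, 32, 1), (0.62, 12, 0),
]


def record(games):
    """Return (wins, losses) from the Correct flags."""
    wins = sum(correct for _, _, correct in games)
    return wins, len(games) - wins


overall = record(LAST_WEEK)
big_pct = record([g for g in LAST_WEEK if abs(g[1]) > 100])  # Diff/Line beyond 100%
big_raw = record([g for g in LAST_WEEK if abs(g[0]) > 2.5])  # raw difference beyond 2.5

print(overall, big_pct, big_raw)
```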

So, the system says we should probably look again at the big differences.

Maybe we should consider taking Kansas City, Oakland, Carolina, Miami, San Diego and Seattle? A $5 wager would bring you $242.92 in winnings, if they ALL beat the spread.

I’m a little concerned about too many road teams. How about we drop it to KC, Oakland, Miami and San Diego? If you bet $5 here, you win $61.52 if they all win. Sounds good. Done. We will send Prof. Muthuraman a little something if we hit.