Saturday marks the running of the 2021 NCAA D1 Cross Country National Championships in Tallahassee, with Northern Arizona on the brink of its fifth team title in six years.

It has been a while since I have publicly published any new models in any sport, but yesterday on Twitter, Citius Magazine posted about a video they had done with Isaac Wood of The Wood Report on his prediction.

Being relatively new to the world of collegiate track and cross country, I had no idea who Isaac was and immediately went and subscribed to his website to see what he had built. I also like to see how others model sport and Isaac has an interesting website.

After seeing their tweet and checking out the website and his simulator, I began to wonder what I could produce before Saturday’s meet. My initial thought was to create my own individual runner ratings and simulate from there, but to be honest, that is something I have been thinking about for a few months now and it is just too big a project. What I could do is take Isaac Wood’s individual runner rankings and try to expand on the team result.

 

PROJECT OVERVIEW

I decided I would build a quick Monte Carlo simulator using the top-7 runners for each team racing Saturday, based on The Wood Report, and calculate probabilities for how each team will do. I also figured I might as well look at individual runners and how the top-10 may look. (HUGE CAVEAT – I am not including any of the individual runners who are not competing within the team standings. I just didn’t have enough time to build everything from scratch.)

 

EDITOR’S NOTE – While writing this, I realized that most of you couldn’t care less about the methodology section, so please feel free to skip all of that and see what I came up with. I understand. I do these types of projects mostly to share my thinking so I can improve my methods at a later date as questions and better data come in.

METHODOLOGY

The first step is to take the 31 teams and find the ratings for each team’s top-7 runners. With such a short time horizon and no real chance to build my own ratings, I have to combine a little art with the science. The main weakness of doing something like this is that I have no idea how Isaac Wood (and his PhD student) created these ratings and really no idea about the variability of each individual runner.

Cross Country is such a great event because every course is different every day. Hills, terrain, altitude, temperature and humidity vary even from day-to-day on the same course. I am not even going to get into teams racing at a variety of distances leading up to this weekend or where individual runners were within their training when they raced various events. That is not being captured here.

We have what we have. Each runner has been given a rating that from my point of view appears to be really solid.

What I can do is simulate variability. This is where art blends with science.

Let’s take BYU’s Connor Mantz with a rating of 9.97. Sure, he’s a favorite, but how much? In reality, we would have all of the variables I mentioned above already baked into his rating, plug Saturday’s expected variables into a formula, and see what his expected time would be. We would then do that for every runner and build the projected results.

But I don’t have that. I have a bunch of individual ratings. Wood simply uses those to build a final result. Instead of doing that, I prefer to throw some variability into the ratings and run the race thousands of times.

How much variability and where do you model this?

Back to Mantz. Sure, he’s a favorite, but there are several guys who can win this race. Some people will have the race of their lives while others will struggle for one reason or another (think of the three H’s: hills, heat and humidity).

I took the rating of the top runner for each participating school and found the standard deviation across those ratings. I did the same for each of the other roster spots, second through seventh. Not surprisingly, as you go from the first runner for each team to the seventh, the variability explodes. This makes sense. Depth is where this thing is won.

I finally settled on a number closer to the standard deviation of the top runners for each team and used it for every runner in the field. This isn’t the best method, as each runner should have their own variance, but it’ll do.
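For anyone who wants to try the same thing, here is a minimal sketch of the spread-by-roster-spot calculation described above. The team names and ratings are made up for illustration (they are not The Wood Report’s numbers), and the sketch assumes a higher rating means a faster runner.

```python
from statistics import stdev

# Hypothetical ratings (NOT The Wood Report's actual numbers): each team's
# seven runners. A higher rating is assumed to mean a faster runner.
ratings = {
    "Team A": [9.9, 9.6, 9.4, 9.1, 8.9, 8.5, 8.0],
    "Team B": [9.7, 9.5, 9.0, 8.8, 8.2, 7.9, 7.1],
    "Team C": [9.8, 9.2, 9.1, 8.6, 8.4, 7.6, 6.9],
}

def spread_by_position(ratings):
    """Return the sample standard deviation of ratings at each roster spot."""
    # Order every roster from best to worst so spot 1 is each team's top runner.
    rosters = [sorted(roster, reverse=True) for roster in ratings.values()]
    # zip(*rosters) groups all the #1 runners, then all the #2 runners, etc.
    return [stdev(spot) for spot in zip(*rosters)]

spreads = spread_by_position(ratings)
for spot, sd in enumerate(spreads, start=1):
    print(f"Runner #{spot}: standard deviation {sd:.3f}")
```

With real ratings, the first value in `spreads` would be the natural starting point for the single field-wide number described above.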

FINALLY, I use a function to generate a random number within +/- 1 ‘standard deviation’ for each runner to figure out their ‘speed’ for the race, then rank the runners and score the race. To understand this better, imagine that some runners will run better than their rating and some worse, but they won’t always run right at their rating. Odds are they will be somewhere close. Of course there will be outliers, but let’s assume they stay within about one standard deviation of their rating (in a normal distribution, roughly 34% of outcomes fall within one standard deviation on each side of the mean). I also don’t want to talk about outliers too much, as this may send shivers down Chris Chavez’s spine. #TeamGladwell. Just kidding.

I do this 10,000 times.

That is, I simulate the race 10,000 times and see how it all plays out. This should give us a pretty good indication of the probability each team has to finish this weekend in a particular place.
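Here is a rough sketch of what a simulation along these lines could look like in Python. It is not the exact code behind the numbers in this post, and it leans on several assumptions: the ratings are invented, the race-day noise is drawn from a normal distribution (`rng.uniform(-sd, sd)` would be the other reading of “within +/- 1 standard deviation”), the race is scored the standard cross country way (sum of each team’s first five finishers, low score wins), and ties are broken by the sixth runner’s place as a simplification.

```python
import random
from collections import Counter

# Hypothetical ratings, higher is better (NOT The Wood Report's numbers).
ratings = {
    "Team A": [9.9, 9.6, 9.4, 9.1, 8.9, 8.5, 8.0],
    "Team B": [9.7, 9.5, 9.0, 8.8, 8.2, 7.9, 7.1],
    "Team C": [9.8, 9.2, 9.1, 8.6, 8.4, 7.6, 6.9],
}

def simulate_race(ratings, sd, rng):
    """Run one race and return the team names ordered from first to last."""
    # Add random noise to every rating to get a race-day 'speed'.
    field = [
        (rating + rng.gauss(0.0, sd), team)
        for team, roster in ratings.items()
        for rating in roster
    ]
    field.sort(reverse=True)  # fastest 'speed' crosses the line first

    # Record the finishing places earned by each team's runners.
    places = {team: [] for team in ratings}
    for place, (_, team) in enumerate(field, start=1):
        places[team].append(place)

    # Team score = sum of the first five places; lowest score wins.
    scores = {team: sum(p[:5]) for team, p in places.items()}
    # Simplified tie-break: the better sixth runner takes the tie.
    return sorted(scores, key=lambda t: (scores[t], places[t][5]))

def simulate_many(ratings, sd, n_sims=10_000, seed=0):
    """Count how often each team finishes in each team place."""
    rng = random.Random(seed)
    finishes = {team: Counter() for team in ratings}
    for _ in range(n_sims):
        for place, team in enumerate(simulate_race(ratings, sd, rng), start=1):
            finishes[team][place] += 1
    return finishes

finishes = simulate_many(ratings, sd=0.3, n_sims=1_000)
for team, counts in finishes.items():
    print(team, [counts[place] for place in (1, 2, 3)])
```

The `sd` argument plays the role of the single field-wide variability number, so making it larger spreads the titles around more evenly.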

TEAM RESULTS

After running the simulation 10,000 times, the overwhelming favorite is Northern Arizona, which wins the title 48.78% of the time. Oklahoma State captures the title 22.64% of the time, with Iowa State winning 12% of the time.

The table below shows how many times, out of 10,000 simulations, each team finished in each of the top five places.

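| SCHOOL | 1ST | 2ND | 3RD | 4TH | 5TH |
| --- | --- | --- | --- | --- | --- |
| Northern Arizona | 4878 | 2479 | 1350 | 722 | 380 |
| Oklahoma State | 2264 | 2368 | 1978 | 1410 | 978 |
| Iowa State | 1200 | 1728 | 1782 | 1688 | 1501 |
| Colorado | 705 | 1257 | 1643 | 1782 | 1651 |
| Notre Dame | 573 | 1128 | 1538 | 1725 | 1772 |
| BYU | 280 | 637 | 946 | 1329 | 1702 |
| Stanford | 92 | 370 | 670 | 1093 | 1519 |
| Tulsa | 8 | 33 | 93 | 251 | 495 |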

How good is Northern Arizona? They have over an 87% chance to finish in the top-3.
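The percentages here are just the place counts divided by the 10,000 simulations. A small sketch of that arithmetic, using Northern Arizona’s counts from the table above (the function name `prob_top_k` is only for illustration):

```python
N_SIMS = 10_000

# Northern Arizona's place counts, taken from the team table above.
nau_counts = {1: 4878, 2: 2479, 3: 1350, 4: 722, 5: 380}

def prob_top_k(counts, k, n_sims=N_SIMS):
    """Share of simulations in which the team finished in places 1 through k."""
    return sum(counts.get(place, 0) for place in range(1, k + 1)) / n_sims

win = prob_top_k(nau_counts, 1)
top3 = prob_top_k(nau_counts, 3)
print(f"Win: {win:.2%}, top-3: {top3:.2%}")  # Win: 48.78%, top-3: 87.07%
```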

Possibly the more interesting part of all of this is the fact that after the top two, the teams are very bunched together. The probabilities for Iowa State, Colorado, Notre Dame, BYU and Stanford are very close. Fighting through the end will be key and I wonder if something that was mentioned on the podcast could be a factor. Will the course favor track runners over those ‘mudders’ like Colorado?

This is even clearer when we look at the average team finish below.

And don’t ignore Tulsa! They actually won the whole thing 8 times out of 10,000. Sure, that’s only 0.08% of the time, but there’s a chance.

Here’s a table of each team’s AVERAGE FINISH within the simulation.

| SCHOOL | AVERAGE TEAM FINISH |
| --- | --- |
| Northern Arizona | 1.99 |
| Oklahoma State | 2.99 |
| Iowa State | 3.80 |
| Colorado | 4.33 |
| Notre Dame | 4.49 |
| BYU | 5.30 |
| Stanford | 5.82 |
| Tulsa | 7.40 |
| Oregon | 9.93 |
| Air Force | 11.86 |
| Arkansas | 12.11 |
| Furman | 12.87 |
| Washington | 13.03 |
| Wake Forest | 13.51 |
| Wisconsin | 14.79 |
| Gonzaga | 15.27 |
| Ole Miss | 15.36 |
| Alabama | 18.19 |
| Texas | 20.51 |
| Harvard | 21.20 |
| Southern Utah | 21.23 |
| Portland | 23.53 |
| Syracuse | 24.00 |
| North Carolina | 24.01 |
| Butler | 24.28 |
| Florida State | 24.76 |
| Princeton | 25.32 |
| Georgetown | 26.06 |
| Minnesota | 28.16 |
| Michigan | 29.05 |
| Michigan State | 30.85 |
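An average finish like the ones above is a weighted mean of finishing places across all of the simulated races. A minimal sketch, using made-up counts because the full 31-place breakdown for each team is not shown in this post:

```python
def average_finish(place_counts):
    """Weighted mean of finishing places, weighted by how often each occurred."""
    total_races = sum(place_counts.values())
    weighted_places = sum(place * n for place, n in place_counts.items())
    return weighted_places / total_races

# Hypothetical team: 1st in 5,000 races, 2nd in 3,000, 3rd in 2,000.
example_counts = {1: 5000, 2: 3000, 3: 2000}
avg = average_finish(example_counts)
print(f"Average finish: {avg:.2f}")  # Average finish: 1.70
```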

I find it interesting to see the Big Ten Conference anchoring the bottom. If this pans out, then maybe the committee overvalued their quality. If they perform much better than the model suggests, then I would suspect the Wood model rated those teams too low.

INDIVIDUALS

Now, remember, I did not include the true individuals, meaning the runners competing in the event without their teams.

Here are the top-10 runners based on the simulation.

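| INDIVIDUAL | SCHOOL | 1ST | 2ND | 3RD | 4TH | 5TH | 6TH | 7TH | 8TH | 9TH | 10TH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Connor Mantz | BYU | 4074 | 1696 | 975 | 715 | 576 | 498 | 433 | 335 | 263 | 192 |
| Adiaan Wildschutt | Florida State | 2272 | 1839 | 1096 | 834 | 646 | 551 | 460 | 442 | 431 | 339 |
| Wesley Kiptoo | Iowa State | 1840 | 1727 | 1193 | 826 | 688 | 593 | 501 | 480 | 435 | 401 |
| Eduardo Herrera | Colorado | 436 | 885 | 994 | 781 | 672 | 643 | 572 | 525 | 497 | 477 |
| Abdihamid Nur | Northern Arizona | 415 | 932 | 988 | 805 | 715 | 599 | 603 | 542 | 491 | 450 |
| Nico Young | Northern Arizona | 397 | 853 | 896 | 782 | 685 | 570 | 567 | 531 | 527 | 490 |
| Charles Hicks | Stanford | 187 | 505 | 764 | 716 | 646 | 579 | 589 | 538 | 543 | 513 |
| Casey Clinger | BYU | 165 | 496 | 656 | 724 | 638 | 660 | 520 | 516 | 502 | 504 |
| Cooper Teare | Oregon | 82 | 312 | 513 | 620 | 621 | 559 | 550 | 522 | 496 | 456 |
| Ahmed Muhumed | Florida State | 47 | 210 | 397 | 532 | 568 | 505 | 505 | 512 | 503 | 511 |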

The race is expected to be very close and Mantz (BYU), Wildschutt (FSU) and Kiptoo (ISU) are the clear favorites, but all of the big names are there. Once you venture past those first three, it appears to be wide open.

My model actually had 18 different runners who won the title at least once (way to go, Ky Robinson!) and 21 total who grabbed second at least one time.

I hope you enjoyed this and I understand this was a long and winding road, but I enjoyed diving into this for a day or so and seeing what the numbers showed. Best of luck to all of the runners, especially the ones I know… (go BTR!)