Forecasting 2019 NFL Rookie RB Success: 3-Year Model

May 01, 2019


The 2019 NFL Draft is now in the books, giving us dozens of new offensive players to consider for our fantasy teams. In this article, I will discuss some of the key rookie running backs to focus on going into 2019 fantasy drafts.

While our friends over at Dynasty League Football have not yet released post-draft rookie ADP, it is nearly a sure thing that Josh Jacobs will be the top rookie running back taken in fantasy drafts. He was the only running back taken in the first round of the NFL draft, and he lands in a great spot: the Raiders have no strong incumbent back for him to compete with (Marshawn Lynch having retired again).


Below, I will discuss the three different models that I use to forecast running back success from college to the NFL. Jacobs receives outstanding scores (above 70% success odds) from two of the models. He receives only a mediocre score (36% success odds) from a machine learning model, which probably does not like his lack of dominant market share of rushing attempts while at Alabama. However, given the full weight of the evidence, especially the draft capital invested in Jacobs, the opportunity available, and the overall philosophy of the Raiders offense, he looks like a safe bet.

Other drafters will no doubt agree, meaning that the best running back prospect will require a top pick in rookie drafts if you want to land him. Below, we will discuss some players that might be available at a discount.

Forecasting Running Back Success

In previous seasons, we have predicted running back success using a combination of two models. The first is a standard statistical model (built using logistic regression) and the second is a more sophisticated machine learning model. We averaged the predictions of the two models to get an overall prediction.

While the combined model has identified a few great values (James Conner comes to mind), it has not matched the runaway success of the wide receiver model, which seems to identify one or two substantially undervalued players every year. So, this year, we built a new model that aims to improve upon some of the weaknesses of the current models.

New Forecasting Model

The most obvious conceptual problem with the existing models is that they look at all running backs through the same lens, combining statistics about college production, athleticism, and draft pick in the same way to get a prediction. However, we know that teams do not look at all running backs the same way. A 240-pound running back is evaluated for his ability to run up the middle and pound the ball in at the goal line, while a smaller back may be evaluated more on his ability to catch passes out of the backfield.

The new model starts by putting running backs into one of three buckets based on their weight and receiving production in college (specifically, their career market share of team receptions). For each bucket, it uses a different set of variables to predict success.

Big Bruisers: The first group contains backs with an average weight of 230 pounds. For these backs, the model looks for workhorse production in college, measured using rushing yards and market share of team yards from scrimmage in their final year. The model also gives a small bump for receiving touchdowns, which rewards true "three-down backs", the most elite prospects.
Pass Catchers: The backs in this group weigh 204 pounds on average and account for about 12.5% of their team's receptions over their entire college career. (The final-year figure is likely well above 15%.) For these backs, already screened based on their receiving production, the model looks solely at athleticism, as measured by the 20-yard short shuttle and the height-adjusted speed score.
Shifty Runners: The final group is made up of lightweight backs (averaging 206 pounds) who did not catch a lot of passes (their average market share of team receptions is 6%, only slightly more than the big bruisers). For these backs, the model also looks solely at athleticism, this time using the 3-cone drill and speed score as the measures.
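
As a rough illustration of the bucketing step, here is a minimal Python sketch that assigns a back to one of the three groups and lists the variables named above for each group. The article reports only group averages, so the two cutoffs (220 pounds and a 9% career share of team receptions) are placeholder assumptions, not the model's actual thresholds. The function and field names are hypothetical as well.

```python
# Illustrative cutoffs only: the article gives group averages, not thresholds.
HEAVY_WEIGHT_LBS = 220.0     # assumed weight cutoff for a "big bruiser"
RECEIVING_SHARE_MIN = 0.09   # assumed career reception share for a "pass catcher"

# Variables the article names for each group's model (field names are made up).
BUCKET_FEATURES = {
    "big_bruiser": [
        "final_year_rushing_yards",
        "final_year_scrimmage_yards_share",
        "receiving_touchdowns",
    ],
    "pass_catcher": ["short_shuttle_20yd", "height_adjusted_speed_score"],
    "shifty_runner": ["three_cone", "speed_score"],
}


def assign_bucket(weight_lbs, career_reception_share):
    """Return the group name for a back, checking weight before receiving share."""
    if weight_lbs >= HEAVY_WEIGHT_LBS:
        return "big_bruiser"
    if career_reception_share >= RECEIVING_SHARE_MIN:
        return "pass_catcher"
    return "shifty_runner"


if __name__ == "__main__":
    # Group averages quoted above; the 5% share for the bruiser is a guess.
    for weight, share in [(230, 0.05), (204, 0.125), (206, 0.06)]:
        bucket = assign_bucket(weight, share)
        print(weight, share, bucket, BUCKET_FEATURES[bucket])
```

Checking weight first means a heavy back who also catches passes still lands with the big bruisers, where the receiving-touchdown bump can reward him. That ordering is a design choice of this sketch, not something the article specifies.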

The particular variables and weights used in each case were found using (L1-regularized) logistic regression. I kept the number of variables small in each case because we are working with even less data than before when building this model: each group contains only about one-third of the drafted running backs since 2000 (the latter already being a small group).
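
To show why L1 regularization suits a small data set, here is a minimal pure-Python sketch of L1-regularized logistic regression fitted by proximal gradient descent (a gradient step followed by soft-thresholding). This is not the code or data behind the model above: the solver, the penalty strength, and the toy data are all assumptions chosen for illustration. In the toy data, the first feature carries signal and the second is pure noise, and the penalty drives the noise weight to exactly zero.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is the L1 proximal step."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Minimize mean log-loss + lam * sum(|w|); the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data (invented): x1 is informative, x2 is unrelated to the outcome.
X, y = [], []
for x1 in (1.0, -1.0):
    for x2 in (1.0, -1.0):
        hits = 3 if x1 > 0 else 1  # 75% success when x1 = +1, 25% otherwise
        for k in range(4):
            X.append([x1, x2])
            y.append(1 if k < hits else 0)

if __name__ == "__main__":
    weights, intercept = fit_l1_logistic(X, y)
    print("weights:", weights, "intercept:", intercept)
```

With a larger penalty (for example `lam=0.3` on this toy data) both weights are zeroed out, which is the behavior that keeps the variable count small when there are few examples to learn from.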

It is important to note that, even though the variables used for pass catchers and shifty running backs are similar (just athleticism), the models are very different because different weights are placed on each variable. For shifty runners (lightweight, non-receiving backs), the overall odds of success are lower and more weight is placed on draft pick. Overall, this group does not contain a lot of hits. To get a good score, the running back needs to be drafted early.

The model for big bruisers also puts extra weight on draft pick. Heavy backs taken in late rounds, even with a lot of production, are not usually successful.

Pass-catching backs are notably different from the other groups. Their overall success odds are the highest (if every other metric is average), and those odds drop off less quickly if the player is taken later in the draft.
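
The difference between the groups can be pictured as a different baseline and a different draft-pick slope in each group's logistic curve. The sketch below is purely illustrative: the coefficients are made up to mimic the qualitative pattern described above, and using the logarithm of the draft pick is an assumption, since the article does not say how draft pick enters the model.

```python
import math

# Hypothetical (intercept, slope on log draft pick) per group. These numbers
# are invented to show the shape of the curves, not taken from the model.
HYPOTHETICAL_COEFFICIENTS = {
    "pass_catcher": (1.0, -0.3),
    "big_bruiser": (0.8, -0.6),
    "shifty_runner": (0.4, -0.6),
}


def success_odds(bucket, draft_pick):
    """Logistic curve in log(draft pick), all other metrics held at average."""
    intercept, slope = HYPOTHETICAL_COEFFICIENTS[bucket]
    z = intercept + slope * math.log(draft_pick)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for pick in (10, 50, 150):
        row = {name: round(success_odds(name, pick), 2)
               for name in HYPOTHETICAL_COEFFICIENTS}
        print(pick, row)
```

A flatter slope for pass catchers is what "drops off less quickly" means here: moving from an early pick to a late one costs that group less than it costs a big bruiser.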

One example in this last group is Alvin Kamara. He was given only a 43% chance of success from the earlier statistical model, but now he falls into the pass-catching group and gets a new score of 69%. Obviously, that looks a lot better given how well he has turned out. (And note that this model was not trained using Kamara as an example, so that is still an out-of-sample test.)

Given how much nicer this new model looks, should we throw away the old models and just use this one? No. The best option is to add it into the mix, averaging it with the other options. One of the very few general lessons from the machine learning literature is that averaging different models is almost always better than any individual model. (Plus, for now, I'd like to see how the model does for a couple of seasons before putting all my faith in it.)
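
For concreteness, here is a minimal sketch of the averaging step, assuming a simple unweighted mean of the three models' success probabilities (the article does not say whether any weighting is applied). The model names and the example numbers are hypothetical, loosely shaped like a prospect who scores above 70% in two models and 36% in the third.

```python
def combined_success_odds(model_odds):
    """Unweighted mean of per-model success probabilities (assumed equal weights)."""
    if not model_odds:
        raise ValueError("need at least one model prediction")
    return sum(model_odds.values()) / len(model_odds)


# Hypothetical scores for illustration only.
example = {
    "logistic_regression": 0.72,
    "machine_learning": 0.36,
    "bucketed_model": 0.74,
}

if __name__ == "__main__":
    print(f"combined success odds: {combined_success_odds(example):.1%}")
```

One low score pulls the average down without sinking it, which is the practical appeal of combining models that make different kinds of mistakes.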

Combining the three models into one average, here are a few running backs that look like values in this year's draft class:

For those interested, there is a table at the end that shows the predictions of each model on all players drafted in the first six rounds. (The seventh-round draft picks are fighting for a roster spot.) In general, the combined model should be the most accurate, but it is interesting to look at players that score very well in one model or another.