Welcome back to Take Week on The F5. Here’s what you need to know:
Every day this week I’m publishing different takes from two anonymous friends of The F5.
Here’s what we got today:
First, rethinking credit and where credit is due in the NBA.
Then, an unapologetic case against the eye test in favor of building better models.
- Owen Phillips
You Can’t Handle The Unregularized Truth
I’m often asked who I think should be the MVP in the NBA. Typically, when giving out season awards, like the MVP, we want to focus on the descriptive metrics. Stuff that describes what happened on the court, without worrying whether it was sustainable or not. We don’t care that a player “ran good” on midrange shots beyond what some shot quality model says. The shots went in. End of story.
But we still need to worry about credit attribution. Consider Steph Curry hitting a three. He deserves some credit for that, but how much? Do his teammates deserve some credit? Does the defense? Does his coach?
In baseball, it’s easier. A player hits a home run, and while we can quibble over how much of it was luck vs. skill, there is a somewhat definite value to that home run. Maybe you need to adjust for opponent and park, but the player’s teammates didn’t really contribute much to the home run itself (although they obviously affect how many runs the home run was worth).
RAPM
In basketball, we often use RAPM (Regularized Adjusted Plus Minus) as a starting point for this stuff. This is very strange if you think about it for a bit. For readers not familiar, at a high level, RAPM treats each player on the court as a feature in a linear regression model, with the target variable being the number of points scored. There are 10 players on the court, so 10 binary features are active at a time. You collect every 10-man combination that played together in a season, you see how many points they scored, you fit the model, and you get a value for each player. Boom, you’re done. What’s so weird about this?
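For the curious, here’s a minimal toy sketch of that setup. Every number below is made up (a 30-player league, random lineups, noisy stint margins); the point is just the structure: one row per stint, +1/-1 indicators for who’s on the court, and a plain linear regression on top.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_players, n_stints = 30, 2000              # toy league, made-up sizes
true_skill = rng.normal(0, 3, n_players)    # hidden "true" player values

X = np.zeros((n_stints, n_players))
y = np.zeros(n_stints)
for row in range(n_stints):
    on_court = rng.choice(n_players, size=10, replace=False)
    home, away = on_court[:5], on_court[5:]
    X[row, home] = 1.0                      # +1 for the five home players
    X[row, away] = -1.0                     # -1 for the five away players
    # stint margin = home skill minus away skill, plus a lot of noise
    y[row] = true_skill[home].sum() - true_skill[away].sum() + rng.normal(0, 12)

apm = LinearRegression().fit(X, y)          # plain OLS on the lineup indicators
print(apm.coef_[:5])                        # each coefficient is a raw per-player value
```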
Well, the issue is that what I just described isn’t actually RAPM. It’s APM (Adjusted Plus Minus). You see, when the ancients were inventing APM, they realized that even with 70,000 unique 10-man combinations, the linear regression approach above would give oddball, wacky results. There’s just not enough data in a single season for a simple linear regression to yield results that “make sense”. You get results like LeBron being worth +18 one year and +3 the next. So our analytics forefathers came up with a solution, namely regularization, which is just a way of taking all those player values from the end of that APM/linear regression calculation above and shrinking them towards the mean somewhat. There’s all sorts of math around this, but at a high level, you’re just pushing all the APM coefficients towards zero, and that gives you RAPM.
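In practice, that regularization is usually ridge regression. Reusing the toy X and y from the sketch above, the only change from APM to RAPM is a penalty that shrinks every coefficient towards zero (the alpha value here is arbitrary):

```python
from sklearn.linear_model import Ridge

# Same toy design matrix and targets as the APM sketch above; the ridge
# penalty pulls every player's coefficient towards zero.
rapm = Ridge(alpha=2000.0).fit(X, y)   # alpha = shrinkage strength (made up)
print(rapm.coef_[:5])                  # same players, tamer numbers
```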
But how much are you pushing those APM coefficients towards zero anyway? And why don’t those initial APM results “make sense” in the first place? Well, the answer to both is predictive power. If you just run a simple APM calculation, you quickly find that while it fits the variance within your sample as tightly as possible, it doesn’t do such a great job predicting performance in future seasons or stints. And so you move to RAPM as your fix. As traditionally done, you pick a regularization amount for the APM calculation that maximizes out-of-sample predictive power over your data.
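Concretely, “maximizes out-of-sample predictive power” usually means cross-validation: try a grid of shrinkage amounts and keep whichever one predicts held-out stints best. A rough sketch, again reusing the toy data above and an arbitrary grid of candidate penalties:

```python
from sklearn.linear_model import RidgeCV

# Candidate shrinkage strengths are arbitrary; cross-validation picks the
# one that predicts held-out stints best.
alphas = [10, 100, 500, 1000, 2000, 5000]
rapm_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print(rapm_cv.alpha_)   # the "most predictive" amount of regularization
```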
Is RAPM predictive or descriptive?
This is very strange, especially for a stat that we’re ostensibly using for backward-looking explanatory value. Some people take this a step further and time-decay RAPM to make it even more explicitly predictive, but as you can see, even vanilla RAPM is explicitly designed, from the ground up, to be predictive.
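For a sense of what that time decay looks like, here’s a rough sketch continuing the toy example: each stint gets down-weighted by how long ago it happened. The recency values and half-life below are invented; in practice they would come from game dates.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy recency values (days since each stint) and an exponential decay;
# a stint from ~2 months ago counts for roughly half as much.
days_ago = rng.integers(0, 180, size=len(y))
weights = np.exp(-days_ago / 90.0)
rapm_decay = Ridge(alpha=2000.0).fit(X, y, sample_weight=weights)
```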
So let’s go back to that Steph Curry three mentioned above. There are 3 points of credit to go around, so how does RAPM divvy it up? Well, it depends on what those 10 players on the court did in *every other possession they played that season*. If the other 9 guys go on to have terrible seasons after that three was hit, then the amount of credit attributed to Steph vs. his teammates for that 3-pointer changes too. We’re not just recording what happened on the court and then fighting over how much it’s worth. We’re first deciding how predictive it is, and only then do we move to credit allocation.
This, needless to say, does not happen in baseball. If you hit a home run, there are lots of different ways to divvy up credit for it, and you can debate whether a grand slam deserves more credit than a solo home run. But what emphatically does not happen in most baseball metrics is that you lose some credit for a home run you hit because your teammates play great the next month. That’s exactly what RAPM does. And it does this for a very good reason: because it’s predictive. But that makes it a very weird choice to use for what’s ostensibly a backward-looking question.
Box priors
Further, as I understand how Estimated Plus Minus (EPM), LEBRON, and most other single-season metrics work, they will often add a box score or tracking-data prior to the RAPM calculation, to help the metric stabilize faster and better disentangle players who often only share the court. The idea is that on/off data is very noisy in small sample sizes, but other types of data stabilize faster, so instead of regressing players to the mean (zero) in the RAPM calculation, we should regress them to some sort of more informed box prior. So after you fit a bunch of box score stats onto *long-term* RAPM (less noisy), you get a “statistical plus minus” (SPM) for each player, and you use that as your prior for each player.
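Here’s a rough sketch of that two-step idea, continuing the toy example above. The box_stats and long_term_rapm arrays are hypothetical stand-ins, and the “shrink towards a prior instead of zero” step uses one common trick: run the ridge on the residual after subtracting the prior’s predictions, then add the prior back.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical per-player inputs: made-up box score columns and a noisy
# multi-year RAPM value for each of the 30 toy players.
box_stats = rng.normal(size=(n_players, 6))
long_term_rapm = true_skill + rng.normal(0, 1, n_players)

# Step 1: fit box stats onto long-term RAPM -> a "statistical plus minus" prior.
spm_model = LinearRegression().fit(box_stats, long_term_rapm)
prior = spm_model.predict(box_stats)          # one SPM value per player

# Step 2: shrink towards the prior instead of zero. Ridge towards a prior m
# is the same as ridge towards zero on the residual y - X @ m.
resid = y - X @ prior
adjustment = Ridge(alpha=2000.0).fit(X, resid).coef_
blended = prior + adjustment                  # RAPM with an informed (box) prior
```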
Again, this is a great technique if we’re trying to be predictive. But it introduces all sorts of additional epistemological headaches for our MVP exercise. Box score stats are a highly imperfect way of capturing NBA player performance, so these SPMs will often functionally rely on statistical relationships which exist in the data, and *may be predictive*, but are very dubiously descriptive. For example, there’s a famous article on the now-defunct 538 about an NBA steal being worth 9 points. This is, of course, impossible in a “this steal led to 9 points” sense (even with a stop, a fast break, and a bucket), but it reflects a real relationship you see in the aggregate data. This is because of some combination of the inherent value of the steal, plus some degree of “basketball IQ” captured in the steal, plus a bunch of other effects (read the article). And when you fit your box score data onto your RAPM data to create your SPM model, you’re going to capture all of those effects into your SPM, which is all well and good for predictive power purposes, but is very confusing in a “credit attribution” sense.
Back to APM
And so, I’m left in a fairly puzzled place when I think about the MVP. The best metrics we have available that are designed for this sort of thing (e.g., single-season EPM and LEBRON) have a *lot* of explicitly predictive elements to them. This is probably a good thing. I certainly don’t have any better ideas for how to handle this question. But it leaves me pretty unsatisfied all the same. If we run these models without any of the predictive elements designed to make sure they yield answers that align with our priors, then, well, we risk generating models that lack credibility to outsiders. But I also worry that by running these weird mostly-predictive-but-don’t-worry-not-too-predictive models, we’re sort of deluding ourselves.
But if you want a principled answer, I say screw it. Get rid of the regularization and stick with simple single-season APM. The regularization is the point where we introduce all these quasi-predictive elements into our MVP debates and get ourselves all turned around. We might get some surprising results that challenge our conventional wisdom, but at least we'd be measuring what actually happened on the court.
- Anonymous Python
Build Better Models, Not Better Excuses
There's a narrative I keep hearing: "Analytics don't know everything. You have to watch the games, too."
Analytics practitioners are often quick to embrace this line of thinking. I understand why. It sounds inclusive and non-confrontational, and it resonates with most people -- everyone has takes, very few can build good models. It sounds so reasonable and gives off an air of intellectual humility, but too often it becomes a lazy excuse to avoid doing the hard analytics work.
Instead, make your models so good that they're undeniable. Make them predictive and interpretable. Make them answer the right questions. You don't want your models to coexist in some wishy-washy harmony with the eye test. You want them to systematically improve decision-making whether that's with player evaluation, in-game strategy, or even betting. Arbitrarily weighting gut feelings because "models don't capture everything" doesn't complement analytics; it undermines their raison d'être.
Part of my frustration is that casual or traditional basketball observers often misunderstand analytics. Fans and insiders cling to outdated or simplistic metrics like assists per game or PER, despite significant advancements. You could build a sophisticated model that accurately quantifies passing skill, yet people will still use assists per game to argue who is good at passing. And for PER, why does anyone still mention it when objectively superior metrics like DARKO and EPM exist? The primitive stats get cited because most people arguing player A over player B just like player A better and cherry-pick whatever numbers they can find to support their argument. So people discredit analytics by picking on the worst of it, and that leads to some analytics people embracing the narrative I mentioned up top.
Ultimately, analytics should be used to make decisions, and the process behind those decisions should be grounded in clarity and intellectual honesty. When we listen to the Watch The Games crowd, we let some immeasurable noise factor into decision-making. Indeed, there are genuinely valuable insights from attentive, non-quantitative basketball watchers. But the point is to learn from them by turning their observations into measurable inputs that produce outputs we can evaluate quantitatively.
So don't give in to lazy thinking. Just build better models.
- Anonymous Jellyfish