In the spirit of the holidays I’m making this post free for everyone. Plus, I’ve written a similar tutorial before. That one was free at the time so it only feels fair to make this one free as well. If you like the tutorial or know someone who might, consider subscribing or sharing.
5 Things The F5 Can’t Live Without
Before we jump into the tutorial I want to express my gratitude for five websites and pages that I use almost every day during the NBA season. Here they are in no particular order.
plaintextsports.com
I check this site so often I finally a shortcut to it on my phone’s homepage. It’s probably my single most visited website. I love that it has no ads and wastes no space. The box scores are really easy on the eyes, too. 10/10 can’t live out.
rotowire.com/basketball/nba-lineups.php
Every morning I look at this page to see which teams are playing that day and which players are likely injured. I usually look at this page to determine what game I’m going to watch later that day. If I know a star player is going to be out for a game I wanted to see I’ll watch something else.
wikihoops.com and stats.inpredictable.com/nba/preCap.php
If you have NBA league pass and live on the east coast at least one of these sites should be bookmarked. I’m old and can’t stay up past 10:30PM so I’ll often watch games the next morning. These two sites tells me which games were entertaining from the night before without spoiling the score.
basketball-reference.com/friv/dailyleaders.fcgi
I follow the NBA on a day-to-day basis and I often I just want to know, “who had a good game yesterday.” Instead of looking at the box scores for every game, I go to this page on basketball-reference. It’s just a straight ranking of every player’s performance from the night before sorted by Game Score.
pbpstats.com
This is the poor man’s version of Cleaning the Glass (CTG). It has a lot of the same stuff that CTG has but it’s free. I venture to say that ~90% of NBA public analytics products are reliant on pbpstats.com for their data. Darryl Blackport, the creator of the website, deserves the NBA analytics version of the congressional medal of honor.
Comet Plot Tutorial
Full code at the end of the post.
The tutorial is going to focus on recreating the plot below. It’s called a Comet Plot and I first discovered them by following the work of Dr. Cédric Scherer.
Like Dumbbell Plots, Comet Plots are a way to visualize the difference between two states (in this case, last season vs. this season). I prefer Comet Plots over Dumbbell Plots because I think it’s a little more obvious which side is meant to be the start and which side is meant to be the end.
We’ll begin by loading our packages and establishing a custom theme that we’ll use for the plot.
library(tidyverse)
library(jsonlite)
library(ggtext)
library(ggforce)
library(extrafont)
# Custom ggplot2 theme
theme_f5 <- function () {
theme_minimal(base_size = 9, base_family="Roboto") %+replace%
theme(
plot.background = element_rect(fill = 'floralwhite', color = "floralwhite"),
panel.grid.minor = element_blank(),
plot.subtitle = element_text(color = 'gray65', hjust = 0, margin=margin(0,0,10,0))
)
}
The data for our chart comes from pbpstats.com, specifically the Misc. table under the Team Stats Page. What were going to do is right click on the page, navigate over to the Network tab and then find the link that starts with nba/Season=2024-25
.
Open that link in a new tab and it should bring you to this page.
That’s the raw data that we’re going to feed into our R session.
The function below takes season
as a parameter and returns a dataframe with each team’s abbreviation and the percentage of possessions each team has spent in the bonus.
# function for getting team data from pbpstats.com
get_data <- function(season) {
url <- paste0("https://api.pbpstats.com/get-totals/nba?Season=", season, "&SeasonType=Regular%2BSeason&StartType=All&GroupBy=Season&Type=Team")
x <- fromJSON(url)
# select variables we care about
df <- x$multi_row_table_data %>%
select(TeamAbbreviation, OffPoss, PenaltyOffPoss) %>%
mutate(bonus_pct = PenaltyOffPoss / OffPoss,
season = season)
return(df)
}
# get data from 2023-24 and 2024-25
dat <- map_df(c("2023-24", "2024-25"), get_data)
The data should look something like this
Next, we’re going to wrangle our data into a “wide” format where we have a column for each team’s time spent in the bonus in 2023-24 and in 2024-25.
df <- dat %>%
# select relevant variables
select(TeamAbbreviation, bonus_pct, season) %>%
# pivot data wide
pivot_wider(names_from = season, values_from = bonus_pct) %>%
# rename columns
rename(abbr = TeamAbbreviation,
bonus_pct_24 = `2023-24`,
bonus_pct_25 = `2024-25`) %>%
# create an indicator variable that tells us if a team is spending more or less time in bonus
mutate(more_less = ifelse(bonus_pct_25 - bonus_pct_24 >= 0, "More Often", "Less Often"),
# convert abbr to a factor and order data by bonus time in 2024-25
abbr = as.factor(abbr),
abbr = fct_reorder(abbr, bonus_pct_25))
Now our data should look something like this
We can begin to plot our data now. We’re going to use the geom_link()
function from the {ggforce
} package to create the shape of the comet. The part that creates the tapered tail effect is the linewidth = after_stat(index)
portion of the code. You can increase or decrease the n = 1000
part to sharpen the effect of the taper if you like.
p <- df %>%
ggplot() +
geom_link(
aes(x = bonus_pct_24, xend = bonus_pct_25,
y = abbr, yend = abbr,
color = more_less, linewidth = after_stat(index)), n = 1000)
p
Next, we’re going to add some points at the fat end of each comet.
p <- p +
geom_point(
aes(x = bonus_pct_25, y = abbr, color = more_less),
shape = 21,
fill = "white",
size = 3.5
)
p
Next we’re going to adjust our colors, change the scale of the taper on the comet, adjust our x-axis, and call our custom theme from earlier
p <- p +
scale_color_manual(values = c("#E64B35FF", "#00A087FF")) +
scale_linewidth(range = c(.01, 4)) +
scale_x_continuous(limits = c(.175, .325), breaks = seq(0, .35, .025), labels = scales::percent_format(.1)) +
coord_cartesian(clip = 'off') +
theme_f5()
p
Last thing we’ll do is hide the legends and use the {ggtext
} package to add colored text in our title, which will double as our legend.
p <- p +
theme(legend.position = 'none',
plot.title = element_markdown(size = 11, face = 'bold', family = 'Roboto'),
plot.title.position = 'plot') +
labs(title = "Which Teams Are Spending
<span style='color:#00A087FF'>**More**</span> or
<span style='color:#E64B35FF'>**Less**</span> Time In the Bonus vs. Last Season?",
subtitle = "Teams sorted by percentage of possessions spent in the bonus in 2024-25",
x = "Percentage of Possessions in Bonus",
y = "")
p
Full code
library(tidyverse)
library(jsonlite)
library(ggtext)
library(ggforce)
library(extrafont)
# Custom ggplot2 theme
theme_f5 <- function () {
theme_minimal(base_size = 9, base_family="Roboto") %+replace%
theme(
plot.background = element_rect(fill = 'floralwhite', color = "floralwhite"),
panel.grid.minor = element_blank(),
plot.subtitle = element_text(color = 'gray65', hjust = 0, margin=margin(0,0,10,0))
)
}
get_data <- function(season) {
url <- paste0("https://api.pbpstats.com/get-totals/nba?Season=", season, "&SeasonType=Regular%2BSeason&StartType=All&GroupBy=Season&Type=Team")
x <- fromJSON(url)
df <- x$multi_row_table_data %>%
select(TeamAbbreviation, OffPoss, PenaltyOffPoss) %>%
mutate(bonus_pct = PenaltyOffPoss / OffPoss,
season = season)
return(df)
}
dat <- map_df(c("2023-24", "2024-25"), get_data)
df <- dat %>%
# select relevant variables
select(TeamAbbreviation, bonus_pct, season) %>%
# pivot data wide
pivot_wider(names_from = season, values_from = bonus_pct) %>%
# rename columns
rename(abbr = TeamAbbreviation,
bonus_pct_24 = `2023-24`,
bonus_pct_25 = `2024-25`) %>%
# create an indicator variable that tells us if a team is spending more or less time in bonus
mutate(more_less = ifelse(bonus_pct_25 - bonus_pct_24 >= 0, "More Often", "Less Often"),
# convert abbr to a factor and order data by bonus time in 2024-25
abbr = as.factor(abbr),
abbr = fct_reorder(abbr, bonus_pct_25))
p <- df %>%
ggplot() +
geom_link(
aes(x = bonus_pct_24, xend = bonus_pct_25,
y = abbr, yend = abbr,
color = more_less, linewidth = after_stat(index)), n = 1000) +
geom_point(
aes(x = bonus_pct_25, y = abbr, color = more_less),
shape = 21,
fill = "white",
size = 3.5
) +
scale_color_manual(values = c("#E64B35FF", "#00A087FF")) +
scale_linewidth(range = c(.01, 4)) +
scale_x_continuous(limits = c(.175, .325), breaks = seq(0, .35, .025), labels = scales::percent_format(.1)) +
coord_cartesian(clip = 'off') +
theme_f5() +
theme(legend.position = 'none',
plot.title = element_markdown(size = 11, face = 'bold', family = 'Roboto'),
plot.title.position = 'plot') +
labs(
title = "Which Teams Are Spending
<span style='color:#00A087FF'>**More**</span> or
<span style='color:#E64B35FF'>**Less**</span> Time In the Bonus vs. Last Season?",
subtitle = "Teams sorted by percentage of possessions spent in the bonus in 2024-25",
x = "Percentage of Possessions in Bonus",
y = "")
ggsave(plot = p, "bonus_pct_change.png", h = 6, w = 6, dpi = 1000, device = grDevices::png)
I've given league personnel tours through my hard drive before. There is usually a part during the beginning of each one where I say this looks complicated now but will all make sense later.
After reading this post, I think I finally know what they feel in their head when I say that..."Yeah, right!"
This was a fantastic look behind the curtain. Although it's not something I'm skilled at, it was fun to see the process you go through to create.