Loading the data
This week's Tidy Tuesday dataset (2023-09-19) has information about CRAN packages. It records the cross-package connections between developers, as declared in each package's DESCRIPTION file. Let's begin by loading the data:
pacman::p_load(tidyverse, gt, reactable, reactablefmtr)
cran_20230905 <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-19/cran_20230905.csv')
Let's take a peek at the data:
cran_20230905 %>%
  head(5)
# A tibble: 5 × 67
Package Version Priority Depends Imports LinkingTo Suggests Enhances License
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A3 1.0.0 <NA> R (>= … <NA> <NA> randomF… <NA> GPL (>…
2 AalenJoh… 1.0 <NA> <NA> <NA> <NA> knitr, … <NA> GPL (>…
3 AATtools 0.0.2 <NA> R (>= … magrit… <NA> <NA> <NA> GPL-3
4 ABACUS 1.0.0 <NA> R (>= … ggplot… <NA> rmarkdo… <NA> GPL-3
5 abaseque… 0.1.0 <NA> <NA> <NA> <NA> <NA> <NA> GPL-3
# ℹ 58 more variables: License_is_FOSS <lgl>, ...
There are a lot of columns in the dataset, so we need to do some cleaning before we can use it. We keep only the Imports and Package columns, split each comma-separated Imports field into one row per imported package, and strip the version constraints in parentheses along with any spaces. Finally, we add a tally column of 1s so that we can sum the dependencies per package, and drop rows with missing values:
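packages <- cran_20230905 %>%
  select(from = Imports, to = Package) %>%
  # one row per imported package
  mutate(from = strsplit(from, ",")) %>%
  unnest(from) %>%
  # drop version constraints such as "(>= 1.0.0)", then any spaces
  mutate(from = gsub("\\s*\\([^\\)]+\\)", "", from)) %>%
  mutate(from = str_replace_all(from, fixed(" "), "")) %>%
  # tally column used for aggregation later
  mutate(n = 1) %>%
  drop_na()
packages
# A tibble: 94,281 × 3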
from to n
<chr> <chr> <dbl>
1 magrittr AATtools 1
2 dplyr AATtools 1
3 doParallel AATtools 1
4 foreach AATtools 1
5 ggplot2 ABACUS 1
6 shiny ABACUS 1
7 httr abbyyR 1
8 XML abbyyR 1
9 curl abbyyR 1
10 readr abbyyR 1
# ℹ 94,271 more rows
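To see what the cleaning steps do to a single Imports field, here is a minimal base R sketch. The input string is made up for illustration and is not taken from the dataset; the regular expression is the same one used in the pipeline above.

```r
# Hypothetical Imports field, invented for illustration
imports <- "magrittr, dplyr (>= 1.0.0), ggplot2 (>= 3.4.0)"

# Split on commas: one element per imported package
parts <- strsplit(imports, ",")[[1]]

# Remove version constraints in parentheses, e.g. " (>= 1.0.0)"
parts <- gsub("\\s*\\([^\\)]+\\)", "", parts)

# Remove the remaining spaces
parts <- gsub(" ", "", parts, fixed = TRUE)

parts
```

This leaves just the three package names, which is the shape the from column needs.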
We are left with three columns: from, to and n. The from column holds the package being imported, and to holds the package that imports it. For instance, AATtools imports magrittr. Summing n for each value of from gives us the 25 most-imported packages:
n_25 <- packages %>%
  group_by(from) %>%
  summarize(total = sum(n)) %>%
  ungroup() %>%
  arrange(-total) %>%
  slice_head(n = 25) %>%
  rename(Package = from,
         `Total Cited Dependencies` = total)
n_25
# A tibble: 25 × 2
Package `Total Cited Dependencies`
<chr> <dbl>
1 stats 5028
2 utils 3138
3 dplyr 3004
4 methods 2951
5 ggplot2 2847
6 Rcpp 2425
7 graphics 2058
8 rlang 1843
9 magrittr 1725
10 stringr 1456
# ℹ 15 more rows
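As an aside, the same kind of tally can be written more compactly with dplyr's count(), which counts rows per group without needing a helper column of 1s. The sketch below uses a tiny made-up edge list (the package names pkgA, pkgB and pkgC are hypothetical) rather than the real data, so treat it as an illustration of the idea, not a replacement for the pipeline above.

```r
library(dplyr)

# Toy edge list, invented for illustration, in the same from/to shape
edges <- tibble(
  from = c("dplyr", "ggplot2", "dplyr", "rlang", "dplyr", "ggplot2"),
  to   = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC", "pkgC")
)

# count() tallies rows per value of `from`; sort = TRUE orders descending
top <- edges %>%
  count(from, sort = TRUE, name = "total") %>%
  slice_head(n = 2)

top
```

Here dplyr comes first with a total of 3, followed by ggplot2 with 2.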
Data Visualisation
With our data aggregated the way we want it, we can now proceed to visualise it. For this post we are going to explore interactive tables from the reactablefmtr package. Let’s start with a table that draws a data bar inside each numeric cell:
reactable(
  n_25,
  pagination = FALSE,
  defaultColDef = colDef(
    cell = data_bars(n_25,
                     round_edges = TRUE,
                     border_style = "solid",
                     border_color = "gold",
                     border_width = ".8px",
                     text_position = "above",
                     number_fmt = scales::comma)
  )
)
Alternatively, we can create a bubble chart to visualise the same patterns:
n_25 %>%
  reactable(
    defaultColDef = colDef(
      align = 'center',
      cell = bubble_grid(
        data = .,
        number_fmt = scales::comma,
        min_value = -5000,
        max_value = 6000
      )
    )
  )
That’s it for this post! Just some interactive tables with Tidy Tuesday data!