  • TidyTuesday: season vignette formats

    For the 2022-03-15 #tidytuesday, we're working with data compiled by Robert Flight. The data reflect vignette uploads to CRAN and Bioconductor. I wanted to focus on the seasonal nature of uploads, so I used a spiral plot. This was a great opportunity to use the spiralize and ComplexHeatmap packages by Zuguang Gu. I had to rely heavily on grid functionality to add the title, subtitles, and caption. I found these posts particularly helpful. Note: I used the zoo package to calculate the 7-day rolling averages. All code is available on GitHub.
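
A 7-day rolling average with zoo might look roughly like the sketch below. The `uploads` vector is made-up illustration data, and the right-aligned window with `NA` padding is an assumption, not necessarily the setting used in the post.

```r
library(zoo)

# Hypothetical daily upload counts (illustration only, not the real data)
uploads <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

# 7-day rolling mean; align = "right" averages each day with the six
# days before it, and fill = NA pads the first six positions
rolling_avg <- rollmean(uploads, k = 7, fill = NA, align = "right")

print(rolling_avg)
```

With `align = "right"`, the first six values are `NA` because a full 7-day window is not yet available.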

  • 2 TidyTuesdays

    The last two weeks of #tidytuesday have both involved data that can be mapped spatially. They were a good opportunity to get more familiar with showing information on US states or European countries.

    Alternative fuel sources in the US: The data for 2022-03-01 are fueling stations throughout the US that offer alternatives to gasoline or diesel. I used the usmap package to help plot this one. Code for this graphic is here

    Erasmus exchange program: The data for 2022-03-08 come from the Erasmus+ exchange program, which allows students to travel to other countries. I decided to look at which countries received more students than they sent away.…

  • My first #TidyTuesday

    I've enjoyed lurking on the #tidytuesday hashtag on Twitter. For those unfamiliar: every Tuesday a new dataset is provided, and folks are encouraged to practice their data visualization skills, especially within the tidyverse. For Black History Month, the goal is to recreate some of the iconic images that W.E.B. Du Bois created for the 1900 Paris Exposition. For this week, the goal is to recreate “Valuation of Town and City Property Owned by Georgia Negroes” (plate 21). Overall, I'm pretty happy with how this turned out. Here's a sneak peek at the final product. You can find all of the code for these plots

  • Cluster mean centering in tidyverse

    I'm re-analyzing some old datasets (e.g., from pilots I ran for my dissertation in 2015) and find myself wanting to re-run some multilevel models. However, the first time I did this, I used grand mean centering. That means I combined the within-cluster and between-cluster effects into a single parameter estimate (see Curran and Bauer for a great summary). Instead, I want to cluster mean center. That means calculating the mean of the variable within each cluster, then subtracting each cluster's mean from the individual scores in that cluster. Then you include both the cluster means and the cluster mean centered scores in the regression. The coefficient on…
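
The centering steps described above can be sketched with dplyr as follows. This is a minimal illustration under assumptions: the toy data and the column names (`cluster`, `x`, `x_cluster_mean`, `x_cwc`) are hypothetical and may differ from what the full post uses.

```r
library(dplyr)

# Toy data (hypothetical): two clusters with two observations each
df <- tibble(
  cluster = c("a", "a", "b", "b"),
  x       = c(1, 3, 10, 14)
)

centered <- df |>
  group_by(cluster) |>
  mutate(
    x_cluster_mean = mean(x),            # between-cluster component
    x_cwc          = x - x_cluster_mean  # within-cluster (centered) score
  ) |>
  ungroup()

print(centered)
```

Both `x_cluster_mean` and `x_cwc` would then enter the model as separate predictors, so the within-cluster and between-cluster effects get their own coefficients.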

  • Visualize an interaction with ggplot

    I've had to do this enough times (and have to look it up each time) that I decided to memorialize it here. The issue: I have a two-way repeated measures design and I want to visualize all four cells. I'd like one plot to contain the individual responses as well as the cell means. But I also want to link individuals together. The solution: Plot the individual differences within each level of one of the factors using separate lines for each subject, plus an additional line for the cell means. Here's a simple demo (with a bonus example of how to simulate such a dataset).
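
One way the approach described above could look is sketched below. It is not the post's own demo: the simulated effect sizes, the number of subjects, and the names `sim`, `cell_means`, `factor_a`, and `factor_b` are all assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
n_subj <- 20  # hypothetical number of subjects

# Simulate a 2 x 2 repeated measures design: every subject sees all four cells
sim <- expand_grid(
  subject  = factor(seq_len(n_subj)),
  factor_a = c("a1", "a2"),
  factor_b = c("b1", "b2")
) |>
  group_by(subject) |>
  mutate(subj_intercept = rnorm(1)) |>  # one random intercept per subject
  ungroup() |>
  mutate(
    response = 5 + subj_intercept +
      0.5 * (factor_a == "a2") +
      0.5 * (factor_b == "b2") +
      1.0 * (factor_a == "a2" & factor_b == "b2") +  # interaction effect
      rnorm(n(), sd = 0.5)
  )

# Cell means for the four cells of the design
cell_means <- sim |>
  group_by(factor_a, factor_b) |>
  summarise(response = mean(response), .groups = "drop")

# One faint line per subject within each level of factor_a,
# plus a highlighted line connecting the cell means
p <- ggplot(sim, aes(x = factor_b, y = response)) +
  geom_line(aes(group = subject), alpha = 0.3) +
  geom_point(alpha = 0.3) +
  geom_line(data = cell_means, aes(group = 1), colour = "red") +
  geom_point(data = cell_means, colour = "red", size = 3) +
  facet_wrap(~ factor_a)

print(p)
```

Faceting on one factor and putting the other on the x-axis keeps each subject's two responses linked within a panel, while the red line shows how the cell means change.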