There are five more sets like that in the data. For each I needed to reshape such that there was a column for time and a column identifying the type of measure. Something like this…
time affirm
1 3
2 2
3 2.33
4 3
5 3
1 4.33
2 4
...
The original approach
My first attempt was pretty verbose but it worked. You can tell my mastery of pivot_longer() did not exceed much beyond the basics. Yes, I knew how to use the names_transform argument to extract the numbers from the column labels, but that was about it.
P_ID time age gender male white race affirm explor appSUM emoSUM infoSUM
1 1 1 18 1 1 0 2 3.000000 2.333333 NA NA NA
2 1 2 18 1 1 0 2 2.000000 2.666667 9 7 8
3 1 3 18 1 1 0 2 2.333333 2.333333 11 9 9
4 1 4 18 1 1 0 2 3.000000 3.000000 6 10 8
5 1 5 18 1 1 0 2 3.000000 3.333333 NA NA NA
racematch
1 NA
2 0
3 1
4 1
5 NA
This did the trick and I was able to move on, but it didn’t sit right with me. I thought I remembered Jacob demonstrating in his data wrangling workshop how to reshape datasets similar to this using just one call to pivot_longer(). So today I decided to revisit this code and see what I could do with it.
The elegant approach
After some fiddling around with the pivot_longer() function and reading the {tidyr} “Pivoting” vignette, I was finally able to implement this in a much more elegant fashion. (I should have referred to Jacob’s workshop materials, but I didn’t have them handy and decided to see if I could figure it out on my own. Of course, once I did it, I then had to go download his materials and check if this approach is what he demonstrated, and it is!)
This is for my benefit so I remember how this works:
cols = !c(1:6) says to reshape all columns but the first six
names_to = c("time", ".value") uses the keyword .value which “indicates that the corresponding component of the column name defines the name of the output column containing the cell values” (from the help page).
names_pattern = "(w[1-5])(.+)" defines the pattern of the column names: “the letter w followed by a number 1-5, (w[1-5]), and then everything else, (.+).
names_transform = list(time = readr::parse_number) says to transform the time column to a number using the parse_number function.
In the grand scheme, I’m not sure this makes a difference. Most people don’t care about data wrangling but rather the analysis. Although my first attempt was inefficient, it was easy for me to verify that each set of columns was successfully reshaped. I don’t know if I’ll ever get to a place where it becomes routine for me to reshape data like this casually using one call to pivot_longer(). But I thought it might be of interest to see what pivot_longer() is capable of.