Using dplyr and pipes to rename variables

One of the primary things that slows me down in R is data management. In a typical analysis, I’ll import a raw csv file, perform some data management tasks (anonymizing survey responses, renaming variables, recoding, etc.), and export a new csv file with the cleaned data. I still struggle to do this as efficiently in R as I can in Stata.

This is where dplyr comes to the rescue, at least for renaming variables. With the pipe operator (%>%), I can rename a whole set of variables in a way that is surprisingly readable. You can also chain several dplyr commands in a row, knocking out multiple data management tasks at once.

Here’s a quick example. First, I’ll load dplyr and create a test dataframe. For grins, I’ll use Purdue and New Orleans Saints star Drew Brees’ NFL passing stats from 2006–2015, which I happen to have sitting on my hard drive:


library(dplyr)

breesus <- data.frame(
    V1 = c(2006,2007,2008,2009,2010,2011,2012,2013,2014,2015),
    V2 = c(356,440,413,363,448,468,422,446,456,428),
    V3 = c(554,652,635,514,658,657,670,650,659,627),
    V4 = c(26,28,34,34,33,46,43,39,33,32)
)

Now, use the pipe operator to rename all four variables to something more descriptive. Note that rename() takes new_name = old_name pairs. Notice how readable and straightforward this is.

breesus <- breesus %>%
    rename(year = V1) %>%
    rename(completions = V2) %>%
    rename(attempts = V3) %>%
    rename(touchdowns = V4)

Check the column names of breesus to see if it worked:

names(breesus)
## [1] "year"        "completions" "attempts"    "touchdowns"

UPDATE 2017-03-23

Of course, you don’t need a separate pipe and rename() call for each variable. Starting from the original dataframe, a single rename() with comma-separated pairs stays readable with less typing:

breesus <- breesus %>%
    rename(year = V1,
           completions = V2,
           attempts = V3,
           touchdowns = V4)

I try to get less dumb about R each year :)
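
The same pattern extends beyond renaming. Here is a rough sketch of the "stack a bunch of dplyr commands" idea, covering the rename-clean-export workflow from the top of the post. It uses only the first three seasons of the data so the block runs on its own. The names raw, cleaned, completion_pct, and out_path are made up for illustration, and the temporary file stands in for wherever you would actually save the cleaned csv.

```r
library(dplyr)

# First three seasons of the stats above, still with the raw V1-V4 names
raw <- data.frame(
    V1 = c(2006, 2007, 2008),
    V2 = c(356, 440, 413),
    V3 = c(554, 652, 635),
    V4 = c(26, 28, 34)
)

# One pipeline: rename, add a column, drop a row, keep a few columns
cleaned <- raw %>%
    rename(year = V1,
           completions = V2,
           attempts = V3,
           touchdowns = V4) %>%
    mutate(completion_pct = 100 * completions / attempts) %>%
    filter(year >= 2007) %>%
    select(year, completion_pct, touchdowns)

# Export the cleaned data (a temp file here, purely as a placeholder path)
out_path <- tempfile(fileext = ".csv")
write.csv(cleaned, out_path, row.names = FALSE)

names(cleaned)
```

Each verb takes the dataframe from the previous step, so the whole cleanup reads top to bottom in the order it happens.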

And, for the fun of it:

library(ggplot2)

ggplot(breesus, aes(x = year, y = 100 * completions/attempts)) +
  geom_line(linetype = "dashed") +
  geom_point(aes(color = touchdowns), size = 3) +
  ylab("Completion Percentage") +
  xlab("Year") +
  scale_x_continuous(breaks = c(2006,2007,2008,2009,2010,2011,2012,2013,2014,2015)) +