Highcharting Jobs Friday

2018-08-09

by Jonathan Regenstein

Today, in honor of last week’s jobs report from the Bureau of Labor Statistics (BLS), we will visualize jobs data with ggplot2 and then, more extensively with highcharter. Our aim is to explore highcharter and its similarity with ggplot and to create some nice interactive visualizations. In the process, we will cover how to import BLS data from FRED and then wrangle it for visualization. We won’t do any modeling or statistical analysis today, though it wouldn’t be hard to extend this script into a forecasting exercise. One nice thing about today’s code flow is that it can be refreshed and updated on each BLS release date.

Let’s get to it!

We will source our data from FRED and will use the tq_get() function from tidyquant which enables us to import many data series at once in tidy, tibble format. We want to get total employment numbers, ADP estimates, and the sector-by-sector numbers that make up total employment. Let’s start by creating a tibble to hold the FRED codes and more intuitive names for each data series.

library(tidyverse)
library(tidyquant)

codes_names_tbl <- tribble(
        ~ symbol, ~ better_names,
        "NPPTTL", "ADP Estimate",
        "PAYEMS", "Nonfarm Employment",
        "USCONS", "Construction",
        "USTRADE",   "Retail/Trade",
        "USPBS",  "Prof/Bus Serv",
        "MANEMP",    "Manufact",
        "USFIRE",    "Financial",
        "USMINE",   "Mining",
        "USEHS",    "Health Care",
        "USWTRADE",    "Wholesale Trade",
        "USTPU",    "Transportation",
        "USINFO",    "Info Sys",
        "USLAH",    "Leisure",
        "USGOVT",    "Gov",
        "USSERV",    "Other Services"
)

Now we pass the symbol column to tq_get().

fred_empl_data <- 
  tq_get(codes_names_tbl$symbol,                         
         get = "economic.data",             
         from = "2007-01-01")

We have our data but look at the symbol column.

fred_empl_data %>% 
  group_by(symbol) %>% 
  slice(1)

# A tibble: 15 x 3
# Groups:   symbol [15]
   symbol   date         price
   <chr>    <date>       <dbl>
 1 MANEMP   2007-01-01  14008 
 2 NPPTTL   2007-01-01 115437.
 3 PAYEMS   2007-01-01 137497 
 4 USCONS   2007-01-01   7725 
 5 USEHS    2007-01-01  18415 
 6 USFIRE   2007-01-01   8389 
 7 USGOVT   2007-01-01  22095 
 8 USINFO   2007-01-01   3029 
 9 USLAH    2007-01-01  13338 
10 USMINE   2007-01-01    706 
11 USPBS    2007-01-01  17834 
12 USSERV   2007-01-01   5467 
13 USTPU    2007-01-01  26491 
14 USTRADE  2007-01-01  15443.
15 USWTRADE 2007-01-01   5969.

The symbols are the FRED codes, which are unrecognizable unless you have memorized how those codes map to more intuitive names. Let’s replace them with the better_names column of codes_names_tbl. We will do this with a left_join(). (This explains why I labeled our original column as symbol - it makes the left_join() easier.) Special thanks to Jenny Bryan for pointing out this code flow!

fred_empl_data %>% 
  left_join(codes_names_tbl, 
            by = "symbol" ) %>% 
  select(better_names, everything(), -symbol) %>% 
  group_by(better_names) %>% 
  slice(1)

# A tibble: 15 x 3
# Groups:   better_names [15]
   better_names       date         price
   <chr>              <date>       <dbl>
 1 ADP Estimate       2007-01-01 115437.
 2 Construction       2007-01-01   7725 
 3 Financial          2007-01-01   8389 
 4 Gov                2007-01-01  22095 
 5 Health Care        2007-01-01  18415 
 6 Info Sys           2007-01-01   3029 
 7 Leisure            2007-01-01  13338 
 8 Manufact           2007-01-01  14008 
 9 Mining             2007-01-01    706 
10 Nonfarm Employment 2007-01-01 137497 
11 Other Services     2007-01-01   5467 
12 Prof/Bus Serv      2007-01-01  17834 
13 Retail/Trade       2007-01-01  15443.
14 Transportation     2007-01-01  26491 
15 Wholesale Trade    2007-01-01   5969.

That looks much better, but we now have a column called price, that holds the monthly employment observations, and a column called better_names, that holds the more intuitive group names. Let’s change those column names to employees and sector.

fred_empl_data <- 
fred_empl_data %>% 
  left_join(codes_names_tbl, 
            by = "symbol" ) %>% 
  select(better_names, everything(), -symbol) %>% 
  rename(employees = price, sector = better_names)

head(fred_empl_data)

# A tibble: 6 x 3
  sector       date       employees
  <chr>        <date>         <dbl>
1 ADP Estimate 2007-01-01   115437.
2 ADP Estimate 2007-02-01   115527.
3 ADP Estimate 2007-03-01   115647 
4 ADP Estimate 2007-04-01   115754.
5 ADP Estimate 2007-05-01   115809.
6 ADP Estimate 2007-06-01   115831.

fred_empl_data has the names and organization we want, but it still has the raw number of employees per month. We want to visualize the month-to-month change in jobs numbers, which means we need to perform a calculation on our data and store it in a new column. We use mutate() to create the new column and calculate monthly change with value - lag(value, 1). We are not doing any annualizing or seasonality work here - it’s a simple substraction. For yearly change, it would be value - lag(value, 12).

empl_monthly_change <- 
  fred_empl_data  %>% 
  group_by(sector) %>% 
  mutate(monthly_change = employees - lag(employees, 1)) %>% 
  na.omit()

Our final data object empl_monthly_change is tidy, has intuitive names in the group column, and has the monthly change that we wish to visualize. Let’s build some charts.

We will start at the top and use ggplot to visualize how total non-farm employment (Sorry farmers. Your jobs don’t count, I guess) has changed since 2007. We want an end-user to quickly glance at the chart and find the months with positive jobs growth and negative jobs growth. That means we want months with positive jobs growth to be one color, and those with negative jobs growth to be another color. There is more than one way to accomplish this, but I like to create new columns and then add geoms based on those columns. (Check out this post by Freddie Mac’s Len Kiefer for another way to accomplish this by nesting ifelse statements in ggplot's aesthetics. In fact, if you like data visualization, check out all the stuff that Len writes.)

Let’s walk through how to create columns for shading by positive or negative jobs growth. First, we are looking at total employment here, so we call filter(sector == "Nonfarm Employment") to get only total employment.

Next, we create two new columns with mutate(). The first is called col_pos and is formed by if_else(monthly_change > 0, monthly_change,...). That logic is creating a column that holds the value of monthly change if monthly change is positive, else it holds NA. We then create another column called col_neg using the same logic.

empl_monthly_change %>% 
  filter(sector == "Nonfarm Employment") %>% 
   mutate(col_pos = 
           if_else(monthly_change > 0, 
                  monthly_change, as.numeric(NA)),
         col_neg = 
           if_else(monthly_change < 0, 
                  monthly_change, as.numeric(NA))) %>% 
  dplyr::select(sector, date, col_pos, col_neg) %>% 
  head()

# A tibble: 6 x 4
# Groups:   sector [1]
  sector             date       col_pos col_neg
  <chr>              <date>       <dbl>   <dbl>
1 Nonfarm Employment 2007-02-01      85      NA
2 Nonfarm Employment 2007-03-01     214      NA
3 Nonfarm Employment 2007-04-01      59      NA
4 Nonfarm Employment 2007-05-01     153      NA
5 Nonfarm Employment 2007-06-01      77      NA
6 Nonfarm Employment 2007-07-01      NA     -30

Have a qucik look at the col_pos and col_neg columns and make sure they look right. col_pos should have only positive and NA values, col_neg shoud have only negative and NA values.

Now we can visualize our monthly changes with ggplot, adding a separate geom for those new columns.

empl_monthly_change %>% 
  filter(sector == "Nonfarm Employment") %>% 
   mutate(col_pos = 
           if_else(monthly_change > 0, 
                  monthly_change, as.numeric(NA)),
         col_neg = 
           if_else(monthly_change < 0, 
                  monthly_change, as.numeric(NA))) %>%
  ggplot(aes(x = date)) +
  geom_col(aes(y = col_neg),
               alpha = .85,
               fill = "pink",
               color = "pink") +
  geom_col(aes(y = col_pos),
               alpha = .85,
               fill = "lightgreen",
               color = "lightgreen") +
  ylab("Monthly Change (thousands)") +
  labs(title = "Monthly Private Employment Change",
       subtitle = "total empl, since 2008",
       caption = "inspired by @lenkiefer") +
  scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        plot.caption = element_text(hjust=0))

That plot is nice, but it’s static! Hover on it and you’ll see what I mean.

Let’s head to highcharter and create an interactive chart that responds when we hover on it. By way of brief background, highcharter is an R hook into the fantastic highcharts JavaScript library. It’s free for personal use but a license is required for commercial use.

One nice feature of highcharter is that we can use very similar aesthetic logic to what we used for ggplot. It’s not identical, but it’s similar and let’s us work with tidy data.

Before we get to the highcharter logic, we will add one column to our tibble to hold the color scheme for our positive and negative monthly changes. Notice how this is different from the ggplot flow above where we create one column to hold our positive changes for coloring and one column to hold our negative changes for coloring.

I want to color positive changes light blue and negative changes pink, and put the rgb codes for those colors directly in the new column. The rgb code for light blue is “#6495ed” and for pink is “#ffe6ea”. Thus we use ifelse to create a column called color_of_bars that holds “#6495ed” (light blue) when monthly_change is postive and “#ffe6ea” (pink) when it’s negative.

total_employ_hc <- 
  empl_monthly_change %>% 
  filter(sector == "Nonfarm Employment") %>% 
  mutate(color_of_bars = ifelse(monthly_change > 0, "#6495ed", "#ffe6ea"))

head(total_employ_hc)

# A tibble: 6 x 5
# Groups:   sector [1]
  sector             date       employees monthly_change color_of_bars
  <chr>              <date>         <dbl>          <dbl> <chr>        
1 Nonfarm Employment 2007-02-01    137582             85 #6495ed      
2 Nonfarm Employment 2007-03-01    137796            214 #6495ed      
3 Nonfarm Employment 2007-04-01    137855             59 #6495ed      
4 Nonfarm Employment 2007-05-01    138008            153 #6495ed      
5 Nonfarm Employment 2007-06-01    138085             77 #6495ed      
6 Nonfarm Employment 2007-07-01    138055            -30 #ffe6ea

Now we are ready to start the highcharter flow.

We start by calling hchart to pass in our data object. Note the similarity to ggplot where we started with ggplot.

Now, intead of waiting for a call to geom_col, we set type = "column" to let hchart know that we are building a column chart. Next, we use hcaes(x = date, y = monthly_change, color = color_of_bars) to specify our aesthetics. Notice how we can control the colors of the bars from values in the color_of_bars column.

We also supply a name = "monthly change" because we want monthly change to appear when a user hovers on the chart. That wasn’t a consideration with ggplot.

library(highcharter)
hchart(total_employ_hc,  
                type = "column", 
                pointWidth = 5,
                hcaes(x = date,
                      y = monthly_change,
                      color = color_of_bars),
                name = "monthly change") %>%
  hc_title(text = "Monthly Employment Change") %>%
  hc_xAxis(type = "datetime") %>%
  hc_yAxis(title = list(text = "monthly change (thousands)")) %>%
  hc_exporting(enabled = TRUE)

Let’s stay in the highcharter world and visualize how each sector changed in the most recent month, which is July of 2018.

First, we isolate the most recent month by filtering on the last date. We also don’t want the ADP Estimate and filter that out as well.

empl_monthly_change %>% 
filter(date == (last(date))) %>%
filter(sector != "ADP Estimate")

# A tibble: 14 x 4
# Groups:   sector [14]
   sector             date       employees monthly_change
   <chr>              <date>         <dbl>          <dbl>
 1 Nonfarm Employment 2018-07-01   149128           157  
 2 Construction       2018-07-01     7242            19  
 3 Retail/Trade       2018-07-01    15944             7.1
 4 Prof/Bus Serv      2018-07-01    21019            51  
 5 Manufact           2018-07-01    12751            37  
 6 Financial          2018-07-01     8568            -5  
 7 Mining             2018-07-01      735            -4  
 8 Health Care        2018-07-01    23662            22  
 9 Wholesale Trade    2018-07-01     5982.           12.3
10 Transportation     2018-07-01    27801            15  
11 Info Sys           2018-07-01     2772             0  
12 Leisure            2018-07-01    16371            40  
13 Gov                2018-07-01    22334           -13  
14 Other Services     2018-07-01     5873            -5

That filtered flow has the data we want, but we have two more tasks. First, we want to arrange this data so that it goes from smallest to largest. If we did not do this, our chart would still “work”, but the column heights would not progress from lowest to highest.

Second, we need to create another column to hold colors for negative and positive values, with the same ifelse() logic as we used before.

emp_by_sector_recent_month <- 
  empl_monthly_change  %>% 
  filter(date == (last(date))) %>%
  filter(sector != "ADP Estimate") %>% 
  arrange(monthly_change) %>% 
  mutate(color_of_bars = if_else(monthly_change > 0, "#6495ed", "#ffe6ea"))

Now we pass that object to hchart, set type = "column", and choose our hcaes values. We want to label the x-axis with the different sectors and do that with hc_xAxis(categories = emp_by_sector_recent_month$sector).

last_month <- lubridate::month(last(empl_monthly_change$date),
                                 label = TRUE, 
                                 abbr = FALSE)

hchart(emp_by_sector_recent_month,  
                type = "column", 
                pointWidth = 20,
                hcaes(x = sector,
                      y = monthly_change,
                      color = color_of_bars),
                showInLegend = FALSE) %>% 
  hc_title(text = paste(last_month, "Employment Change", sep = " ")) %>%
  hc_xAxis(categories = emp_by_sector_recent_month$sector) %>%
  hc_yAxis(title = list(text = "Monthly Change (thousands)"))

Finally, let’s compare the ADP Estimates to the actual Nonfarm payroll numbers since 2017. We start with filtering again.

adp_bls_hc <- 
  empl_monthly_change %>% 
  filter(sector == "ADP Estimate" | sector == "Nonfarm Employment") %>% 
  filter(date >= "2017-01-01")

We create a column to hold different colors, but our logic is not whether a reading is positive or negative. We want to color the ADP and BLS reports differently.

adp_bls_hc <- 
  adp_bls_hc %>% 
  mutate(color_of_bars = 
           ifelse(sector == "ADP Estimate", "#ffb3b3", "#4d94ff"))

head(adp_bls_hc)

# A tibble: 6 x 5
# Groups:   sector [1]
  sector       date       employees monthly_change color_of_bars
  <chr>        <date>         <dbl>          <dbl> <chr>        
1 ADP Estimate 2017-01-01   123253.           245. #ffb3b3      
2 ADP Estimate 2017-02-01   123533.           280. #ffb3b3      
3 ADP Estimate 2017-03-01   123655            122. #ffb3b3      
4 ADP Estimate 2017-04-01   123810.           155. #ffb3b3      
5 ADP Estimate 2017-05-01   124012.           202. #ffb3b3      
6 ADP Estimate 2017-06-01   124166.           154. #ffb3b3

tail(adp_bls_hc)

# A tibble: 6 x 5
# Groups:   sector [1]
  sector             date       employees monthly_change color_of_bars
  <chr>              <date>         <dbl>          <dbl> <chr>        
1 Nonfarm Employment 2018-02-01    148125            324 #4d94ff      
2 Nonfarm Employment 2018-03-01    148280            155 #4d94ff      
3 Nonfarm Employment 2018-04-01    148455            175 #4d94ff      
4 Nonfarm Employment 2018-05-01    148723            268 #4d94ff      
5 Nonfarm Employment 2018-06-01    148971            248 #4d94ff      
6 Nonfarm Employment 2018-07-01    149128            157 #4d94ff

And now we pass that object to our familiar hchart flow.

hchart(adp_bls_hc, 
       type = 'column', 
       hcaes(y = monthly_change,
             x = date,
             group = sector, 
             color = color_of_bars),
       showInLegend = FALSE
       ) %>% 
  hc_title(text = "ADP v. BLS") %>%
  hc_xAxis(type = "datetime") %>%
  hc_yAxis(title = list(text = "monthly change (thousands)")) %>%
  hc_add_theme(hc_theme_flat()) %>%
  hc_exporting(enabled = TRUE)

That’s all for today. Try revisiting this script on September 7th, when the next BLS jobs data is released, and see if any new visualizations or code flows come to mind.

See you next time and happy coding!