In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed with those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way.
Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive. If you don’t want to tax your machine too much, you can always subset down the data before running the script. With that, let’s do this!
I won’t go through the code again in detail since we covered it last time, but our first step is to get the tickers and IPO years for a large universe of companies. Here is the code to retrieve the tickers for all companies that went public between 2004 and 2014 - and have survived (meaning today’s analysis is going to have serious survivor bias).
nasdaq <-
tq_exchange("NASDAQ")
amex <-
tq_exchange("AMEX")
nyse <-
tq_exchange("NYSE")
company_ipo_sector <-
nasdaq %>%
bind_rows(amex) %>%
bind_rows(nyse) %>%
select(symbol, company, ipo.year, sector) %>%
filter(!is.na(ipo.year) & ipo.year < 2015)
And here is what we get:
company_ipo_sector %>%
head()
# A tibble: 6 x 4
symbol company ipo.year sector
<chr> <chr> <dbl> <chr>
1 PIH 1347 Property Insurance Holdings, Inc. 2014 Finance
2 FLWS 1-800 FLOWERS.COM, Inc. 1999 Consumer Services
3 VNET 21Vianet Group, Inc. 2011 Technology
4 TWOU 2U, Inc. 2014 Technology
5 JOBS 51job, Inc. 2004 Technology
6 ABIL Ability Inc. 2014 Capital Goods
There are about 1,830 tickers in this data frame and it conveniently includes a column called ipo.year
, which we’ll make us of later in this post. Our next step is to download the price histories of these tickers from tiingo
.
We’ll need to divide our tickers into smaller subsets and pull in the data in pieces so we don’t hit our tiingo
API limits.
tickers_2004_2006 <-
company_ipo_sector %>%
distinct(company, .keep_all = TRUE) %>%
filter(!is.na(sector) & between(ipo.year, 2004, 2006)) %>%
group_by(ipo.year) %>%
pull(symbol)
tickers_2007_2009 <-
company_ipo_sector %>%
distinct(company, .keep_all = TRUE) %>%
filter(!is.na(sector) & between(ipo.year, 2007, 2009)) %>%
group_by(ipo.year) %>%
pull(symbol)
tickers_2010_2012 <-
company_ipo_sector %>%
distinct(company, .keep_all = TRUE) %>%
filter(!is.na(sector) & between(ipo.year, 2010, 2012)) %>%
group_by(ipo.year) %>%
pull(symbol)
tickers_2013_2014 <-
company_ipo_sector %>%
distinct(company, .keep_all = TRUE) %>%
filter(!is.na(sector) & between(ipo.year, 2013, 2014)) %>%
group_by(ipo.year) %>%
pull(symbol)
We now have 4 sets of tickers. Let’s pass them to the riingo
function one at a time and set the resample_frequency
to monthly
, so we’re pulling in 12 prices per year per company instead of ~252.
prices_riingo_2004_2006 <-
tickers_2004_2006 %>%
riingo_prices(start_date = "2004-01-01", resample_frequency = "monthly") %>%
group_by(ticker)
prices_riingo_2007_2009 <-
tickers_2007_2009 %>%
riingo_prices(start_date = "2007-01-01", resample_frequency = "monthly") %>%
group_by(ticker)
prices_riingo_2010_2012 <-
tickers_2010_2012 %>%
riingo_prices(start_date = "2010-01-01", resample_frequency = "monthly") %>%
group_by(ticker)
prices_riingo_2013_2014 <-
tickers_2013_2014 %>%
riingo_prices(start_date = "2013-01-01", resample_frequency = "monthly") %>%
group_by(ticker)
prices_riingo_full <-
prices_riingo_2004_2006 %>%
bind_rows(prices_riingo_2007_2009) %>%
bind_rows(prices_riingo_2010_2012) %>%
bind_rows(prices_riingo_2013_2014)
We have our prices, now let’s add a column of monthly returns with a call to mutate(monthly_returns = close/lag(close) - 1)
.
prices_riingo_full <-
prices_riingo_full %>%
group_by(ticker) %>%
mutate(monthly_returns = close/lag(close) - 1)
One more step, I won’t bore you with the details but after a lot of slogging, I discovered a few errors in my data, where the ipo.year
came after the first year in which the ticker was traded. That’s probably because of some equities being relisted on other exchanges or changing ticker names, but suffice it to say, it jacked up my portfolio analysis! I’m going to remove those problematic tickers with the code below. First, we filter(date == min(date))
to get the first day for which we have returns. Then we isolate the year of that first trade with mutate(first_trade = year(date))
. If that year doesn’t match ipo.year
, we pull()
the ticker.
tickers_remove_mismatch_dates <-
prices_riingo_full %>%
left_join(company_ipo_sector, by = c("ticker" = "symbol")) %>%
filter(date == min(date)) %>%
select(ticker, date, ipo.year) %>%
mutate(first_trade = year(date)) %>%
filter(ipo.year != first_trade) %>%
pull(ticker)
We now have a vector of the tickers to remove, but we haven’t removed them yet. I separated these two steps so I could take a peak at those tickers first, and maybe come back to later to address this issue in a better way.
Let’s go ahead and remove them (I’m going to remove PGTI
also, it was showing up duplicate downstream in my code).
prices_riingo_full <-
prices_riingo_full %>%
filter(!(ticker %in% tickers_remove_mismatch_dates) & ticker != 'PGTI')
A final step before saving: we’ll left_join()
the company_ipo_sector
tibble so that we have company name and ipo.year
as part of this object.
prices_riingo_full <-
prices_riingo_full %>%
left_join(company_ipo_sector, by = c("ticker" = "symbol"))
To avoid going through that process each time we revisit this project, we can save this price/returns data as a .RDS file.
write_rds(prices_riingo_full, "prices_riingo_full.RDS")
But, another exciting way to save and share data is the new pins
package (see intro to pins here). Pins
is a package that makes it easy to save and share data. I’m going to use RStudio Connect for that, but you could use GitHub or kaggle as well.
First, we register our Connect server.
library(pins)
pins::board_register(server = "path to your RStudio Connect server here", key = "your API key here", board = "rsconnect")
Now, we’re ready to pin
this data up to Connect where we, or any of our colleagues, could pull down this data in the future and take advantage of all the gymnastics we just performed to get this in shape.
pins::pin(
x = prices_riingo_full,
name = "ipo_riingo_prices_pins",
description = "Monthly price history of IPOs from 2004 - 2014.",
board = "rsconnect"
)
For the curious, here’s how to retrieve that data from RStudio Connect.
ipo_riingo_prices_pins <-
pins::pin_get(name = "ipo_riingo_prices_pins",
board = "rsconnect")
Now, back to our ultimate goal: we want to see how portfolios that allocated dollars to IPOs each year would have performed. That is, we’ll build a portfolio for 2004, 2005…through to 2014, 11 portfolios consisting solely of the companies that IPO’d in a specific year. Obviously, we are dealing with selection bias, meaning if we had implemented this strategy, we’d have invested in lots of companies that went to zero and delisted. Those dead companies didn’t make it to today’s project because we pulled the tickers of companies currently listed on the AMEX, NYSE and NASDAQ. So don’t try this at home and nothing in this post is investment advice.
Let’s start by building a portfolio of stocks that IPO’d in 2006, assuming we invest equally in each company. We’ll select()
a few columns needed for the analysis and then filter(ipo.year == 2006)
. Next we use the built-in tq_portfolio()
function from tidyquant
wherein we specify the assets_col
as our tickers, the returns_col
as monthly_returns
and paste together a name that incorporates 2006
. We also set it to rebalance_on = "months"
so that each month we make sure to maintain an equal weighting to each company. We do not need to specify a weights
argument because equal weighting is the default.
prices_riingo_full %>%
select(ticker, date, monthly_returns, ipo.year) %>%
filter(ipo.year == 2006) %>%
group_by(ticker) %>%
filter(!is.na(monthly_returns)) %>%
tq_portfolio(assets_col = ticker,
returns_col = monthly_returns,
col_rename = paste(2006, "_port_returns", sep = ""),
wealth.index = F,
rebalance_on = "months") %>%
tail()
# A tibble: 6 x 2
date `2006_port_returns`
<dttm> <dbl>
1 2019-05-31 00:00:00 0.261
2 2019-06-28 00:00:00 0.0533
3 2019-07-31 00:00:00 0.0346
4 2019-08-30 00:00:00 -0.00789
5 2019-09-30 00:00:00 -0.00539
6 2019-10-31 00:00:00 0.0233
# slice(1:3, (n()-3):n())
That code works for 2006, but now we want to scale it for more general use. We can turn the code flow into a function that takes year
as an argument and then apply the function to all of our IPO returns on a year-by-year basis. I’m also going to add a second argument, show_growth
, that can be set to TRUE
or FALSE
if we wish output the growth of a dollar instead of monthly returns.
ipo_by_year_portfolios <- function(year, show_growth = F){
prices_riingo_full %>%
select(ticker, date, monthly_returns, ipo.year) %>%
filter(ipo.year == year) %>%
tq_portfolio(assets_col = ticker,
returns_col = monthly_returns,
col_rename = paste(year, "_port_returns", sep = ""),
wealth.index = show_growth,
rebalance_on = "months")
}
Let’s test that function on 2006 and confirm we get the same result.
ipo_by_year_portfolios(2006) %>%
tail()
# A tibble: 6 x 2
date `2006_port_returns`
<dttm> <dbl>
1 2019-05-31 00:00:00 0.261
2 2019-06-28 00:00:00 0.0533
3 2019-07-31 00:00:00 0.0346
4 2019-08-30 00:00:00 -0.00789
5 2019-09-30 00:00:00 -0.00539
6 2019-10-31 00:00:00 0.0233
And now, we want to apply that function to every year in our data set and mash the results together into one data frame.
We’ll start by creating a vector of years from 2004 to 2014.
years_numeric <- seq(2004, 2014, by = 1)
And now, we map()
our function across that vector of years. If we just ran map(years_numeric, ipo_by_year_portfolios)
, it would result in 11 lists of portfolio returns, one for each year of our IPOs. I want a data frame with all the results stored together. Let’s use reduce(left_join)
for that task.
returns_each_year_ipo_portfolios <-
map(years_numeric, ipo_by_year_portfolios) %>%
reduce(left_join)
returns_each_year_ipo_portfolios %>%
head()
# A tibble: 6 x 12
date `2004_port_retu… `2005_port_retu… `2006_port_retu…
<dttm> <dbl> <dbl> <dbl>
1 2004-02-27 00:00:00 0 NA NA
2 2004-03-31 00:00:00 -0.00296 NA NA
3 2004-04-30 00:00:00 -0.00141 NA NA
4 2004-05-31 00:00:00 0.00703 NA NA
5 2004-06-30 00:00:00 -0.0194 NA NA
6 2004-07-30 00:00:00 -0.0472 NA NA
# … with 8 more variables: `2007_port_returns` <dbl>,
# `2008_port_returns` <dbl>, `2009_port_returns` <dbl>,
# `2010_port_returns` <dbl>, `2011_port_returns` <dbl>,
# `2012_port_returns` <dbl>, `2013_port_returns` <dbl>,
# `2014_port_returns` <dbl>
We now have one data frame with a date column that runs from the beginning of 2004, the first year for which we pulled in IPO tickers, to the today’s date, the last date for which we have returns for these companies. Then we have one column with the returns for each yearly portfolio constructed by equal weighting each company that IPO’d from 2004 to 2014. Since each column represents a year, we start out with values for the 2004 column but the rest have NA for all of 2004. Then in 2005, the 2005 portfolio springs into existence, but the rest of the columns have NA. And then each year another column gets values as new companies IPO in those years.
We constructed this so that each year we allocated fresh money to the new IPOs, equal weighted, with no regard to the past or any other market conditions. We didn’t have to do any of that, of course. We could use a very similar construction to tweak our existing portfolio based on what happened last year, or look at the past performance of basket of funds in a sector and then allocate to certain tickers going forward. In that case, we would map our function to first create a signal year-by-year, then run our allocation algorithm based on the signal from the previous year. Also note that we’re doing something a bit unusual by constructing each portfolio by year and then doing nothing beyond a monthly rebalance: not quite set it and forget it, but pretty close.
Let’s dig in to our new portfolio returns object by calculating the Sharpe Ratio for each of these IPO portfolios. We currently have our data in wide, decidedly un-tidy format so let’s pivot_longer()
this data to make it tidy (long time readers or dplyers might note that we would have used gather()
for this a few months ago - pivot_longer()
has replaced gather()
in the latest release of dplyr
). We are treating each yearly portfolio as a discrete return stream so we can goup_by()
the portfolio_by_year
column. From there we call tq_performance()
, same as if these return streams were from ETFs or any other asset. Our result should be one Sharpe ratio for each year, or 11 Sharpe Ratios.
port_sharpes <-
returns_each_year_ipo_portfolios %>%
pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
group_by(portfolio_by_year) %>%
arrange(portfolio_by_year, date) %>%
filter(!is.na(monthly_return)) %>%
tq_performance(Ra = monthly_return,
performance_fun = SharpeRatio,
Rf = 0,
FUN= "StdDev") %>%
`colnames<-`(c("portfolio_by_year", "port_sharpe"))%>%
add_column(year = years_numeric)
port_sharpes
# A tibble: 11 x 3
# Groups: portfolio_by_year [11]
portfolio_by_year port_sharpe year
<chr> <dbl> <dbl>
1 2004_port_returns 0.234 2004
2 2005_port_returns 0.192 2005
3 2006_port_returns 0.249 2006
4 2007_port_returns 0.190 2007
5 2008_port_returns 0.142 2008
6 2009_port_returns 0.220 2009
7 2010_port_returns 0.279 2010
8 2011_port_returns 0.152 2011
9 2012_port_returns 0.309 2012
10 2013_port_returns 0.182 2013
11 2014_port_returns 0.218 2014
Let’s port this data to plotly
and create a bar chart, where the height of each bar is the portfolio’s Sharpe Ratio.
(
port_sharpes %>%
ggplot(aes(x = year, y = port_sharpe, fill = portfolio_by_year)) +
geom_col(width = .4) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "", y = "sharpe by year", title = "IPO Portfolio Sharpe Ratios") +
theme_minimal()
) %>%
ggplotly()
Here is the exact same data, except displayed as a point chart. Notice how defaults to a slightly different scale, with the y-axis not extending all the way to 0. That makes the poor performance of 2008 seem even more dramatic. Were the companies that IPO’d in 2008 all dogs, or is it just a bad idea to invest in IPOs when the market is in a sharp correction because of a financial crisis?
(
port_sharpes %>%
ggplot(aes(x = year, y = port_sharpe, fill = portfolio_by_year)) +
geom_point(size = 2) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "", y = "sharpe by year", title = "IPO Portfolio Sharpe Ratios") +
theme_minimal()
) %>%
ggplotly()
If you’re an avid reader of these posts, it won’t surprise you to learn that we’re going to calculate the rolling Sharpe Ratio as well. We can use the same code as last time and the blazing fast roll
package.
library(roll)
returns_each_year_ipo_portfolios %>%
pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
group_by(portfolio_by_year) %>%
arrange(portfolio_by_year, date) %>%
mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs = T),
rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs = T),
rolling_sharpe = rolling_mean/rolling_sd) %>%
na.omit() %>%
head()
# A tibble: 6 x 6
# Groups: portfolio_by_year [1]
date portfolio_by_ye… monthly_return rolling_mean
<dttm> <chr> <dbl> <dbl>
1 2004-07-30 00:00:00 2004_port_retur… -0.0472 -0.0107
2 2004-08-31 00:00:00 2004_port_retur… 0.00956 -0.00906
3 2004-09-30 00:00:00 2004_port_retur… 0.0493 -0.000354
4 2004-10-29 00:00:00 2004_port_retur… 0.00395 0.000539
5 2004-11-30 00:00:00 2004_port_retur… 0.131 0.0211
6 2004-12-31 00:00:00 2004_port_retur… 0.0737 0.0367
# … with 2 more variables: rolling_sd <dbl>, rolling_sharpe <dbl>
And let’s port these rolling calculations straight to plotly
.
(
returns_each_year_ipo_portfolios %>%
pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
group_by(portfolio_by_year) %>%
arrange(portfolio_by_year, date) %>%
mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs = T),
rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs = T),
rolling_sharpe = rolling_mean/rolling_sd,
date = ymd(date)) %>%
na.omit() %>%
ggplot(aes(x = date, y = rolling_sharpe, color = portfolio_by_year)) +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
geom_line() +
theme_minimal()
) %>%
ggplotly()
Try double clicking on 2004_port_returns
and then clicking on 2008_port_returns
to isolate just those two. Here’s what you should see:
(
returns_each_year_ipo_portfolios %>%
pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
group_by(portfolio_by_year) %>%
arrange(portfolio_by_year, date) %>%
mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs = T),
rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs = T),
rolling_sharpe = rolling_mean/rolling_sd,
date = ymd(date)) %>%
na.omit() %>%
filter(portfolio_by_year %in% c("2004_port_returns", "2008_port_returns")) %>%
ggplot(aes(x = date, y = rolling_sharpe, color = portfolio_by_year)) +
geom_line() +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
geom_line() +
theme_minimal()
) %>%
ggplotly()
Recall that the 2004 portfolio has a Sharpe Ratio that is ~2x the Sharpe Ratio of the 2008 portfolio. But, look at how dramatically the 2004 portfolio plummeted through March of 2008. Would you have been able to ride out that storm? I dunno if I would have.
And with that…we’re done!
Next week we’ll add a short post on how to add a benchmark to today’s work, since the performance of these or any portfolio isn’t of much use without a benchmark. Thanks for reading and see you next time!
If you like this sort of code through check out my book, Reproducible Finance with R.
Not specific to finance, but I’ve been using tons of code learned at Business Science University course.
I’m also going to be posting weekly code snippets on linkedin, connect with me there if you’re keen for some R finance stuff.
Thanks for reading and see you next time!
You may leave a comment below or discuss the post in the forum community.rstudio.com.