Plumber Logging

The plumber R package is used to expose R functions as API endpoints. Due to plumber’s incredible flexibility, most major API design decisions are left up to the developer. One important consideration when developing APIs is how to log information about API requests and responses. This information can be used to determine how plumber APIs are performing and how they are being utilized. An example of logging API requests in plumber is included in the package documentation.

Tech Dividends, Part 1

In a previous post, we explored the dividend history of stocks included in the S&P 500. Today, we’ll extend that analysis to cover the Nasdaq because, well, because in the previous post I said I would do that. We’ll also explore a different source for dividend data, do some string cleaning and check out ways to customize a tooltip in plotly. Bonus feature: we’ll get into some animation too.

Validating Type I and II Errors in A/B Tests in R

In this post, we seek to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests, in order to gain an appreciation for “peeking”, one of the major problems plaguing the analysis of A/B tests today. To better understand what “peeking” is, it helps to first understand how to properly run a test. We will focus on the case of testing whether there is a difference between the conversion rates cr_a and cr_b for groups A and B.
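
To make the idea of peeking concrete, here is a minimal sketch, written in Python rather than R and not taken from the post itself. It simulates A/A tests, where cr_a and cr_b are truly equal, so every rejection is a type I error. It then compares a single test at the planned sample size with repeatedly testing as data arrives. The pooled two-proportion z-test, the sample sizes, the peeking interval, and the function names are all illustrative assumptions.

```python
import math
import random


def z_test_rejects(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-sided pooled two-proportion z-test; True if H0 (cr_a == cr_b) is rejected."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False  # no variation observed yet, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return abs(z) > z_crit


def simulate_aa_tests(n_sims=500, n_per_group=1000, peek_every=100,
                      true_rate=0.10, seed=42):
    """Return (fixed-horizon, peeking) false-positive rates over simulated A/A tests."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        rejected_at_any_peek = False
        for n in range(1, n_per_group + 1):
            # Both groups share the same true conversion rate (hypothetical value).
            conv_a += int(rng.random() < true_rate)
            conv_b += int(rng.random() < true_rate)
            if n % peek_every == 0 and z_test_rejects(conv_a, n, conv_b, n):
                rejected_at_any_peek = True
        # The properly run test looks only once, at the planned sample size.
        rejected_at_end = z_test_rejects(conv_a, n_per_group, conv_b, n_per_group)
        fixed_hits += int(rejected_at_end)
        # The peeker stops at the first "significant" look; the final look counts too.
        peeking_hits += int(rejected_at_any_peek or rejected_at_end)
    return fixed_hits / n_sims, peeking_hits / n_sims


if __name__ == "__main__":
    fixed_rate, peeking_rate = simulate_aa_tests()
    print(f"fixed-horizon false-positive rate: {fixed_rate:.3f}")
    print(f"peeking false-positive rate:       {peeking_rate:.3f}")
```

Because the peeking analyst rejects whenever any look is significant, the peeking rate can never fall below the fixed-horizon rate on the same data, and with many looks it typically ends up well above the nominal 5% level.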

June 2019 "Top 40" R Packages

Approximately 136 new packages stuck to CRAN in June. (This number is difficult to nail down with certainty because packages may be removed from CRAN after sitting there for a few days.) Here are my picks for the June “Top 40” in ten categories: Computational Methods, Data, Finance, Genomics, Machine Learning, Science and Medicine, Statistics, Time Series, Utilities, and Visualization. Computational Methods: cppRouting v1.1 provides functions to calculate distances, shortest paths and isochrones on weighted graphs using several variants of Dijkstra’s algorithm.

An R Users Guide to JSM 2019

If you are like me, and rather last minute about making a plan to get the most out of a large conference, you are just starting to think about JSM 2019, which will begin in just a few days. My plans always begin with an attempt to sleuth out the R-related sessions. While in the past it took quite a bit of work to identify talks that were likely backed by R-based calculations, this is clearly no longer the case.

Three Strategies for Working with Big Data in R

For many R users, it’s obvious why you’d want to use R with big data, but not so obvious how. In fact, many people (wrongly) believe that R just doesn’t work very well for big data. In this article, I’ll share three strategies for thinking about how to use big data in R, as well as some examples of how to execute each of them. By default, R runs only on data that can fit into your computer’s memory.

Dividend Sleuthing with R

Welcome to a mid-summer edition of Reproducible Finance with R. Today, we’ll explore the dividend histories of some stocks in the S&P 500. By way of history for all you young tech IPO and crypto investors out there: way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and then return some of that free cash to investors in the form of dividends.

Imagine Your Data Before You Collect It

This post introduces the fabricatr package, whose role in the DeclareDesign suite of packages is to simulate data structures and variables. fabricatr helps you to think about your data before you start analysis or even collect it.

May 2019: "Top 40" New CRAN Packages

Two hundred twenty-two new packages made it to CRAN in May, and it was more of an effort than usual to select the “Top 40”. Nevertheless, here they are in nine categories: Computational Methods, Data, Machine Learning, Mathematics, Medicine, Science, Statistics, Utilities and Visualization.

A Gentle Introduction to tidymodels

Recently, I had the opportunity to showcase tidymodels in workshops and talks. Because of my vantage point as a user, I figured it would be valuable to share what I have learned so far. Let’s begin by framing where tidymodels fits in our analysis projects. The diagram above is based on the R for Data Science book, by Wickham and Grolemund. The version in this article illustrates what step each package covers.
