Two Big Ideas from JSM 2018

The Joint Statistical Meetings offer an astounding number of talks; it is impossible for any one attendee to see more than a small portion of what is going on. Even so, a diligent attendee ought to come away with more than a few good ideas. The following are two big ideas that I took from the conference. Session 149, an invited panel on Theory versus Practice featuring an all-star team of panelists (Edward George, Trevor Hastie, Elizaveta Levina, John Petkau, Nancy Reid, Richard J. Samworth, Robert Tibshirani, Larry Wasserman and Bin Yu), covered a lot of ground and wove a rich tapestry of ideas.

June 2018: Top 40 New Packages

Approximately 144 new packages stuck to CRAN in June. The fact that 31 of these are specialized to particular scientific disciplines or analyses provides some evidence for my hypothesis that working scientists are actively adopting R. Below are my Top 40 picks for June, organized into the categories of Computational Methods, Data, Data Science, Economics, Science, Statistics, Time Series, Utilities and Visualizations. The Data packages, especially rtrek and opensensmapr, look like they have some interesting new data to explore.

JSM 2018 Itinerary

JSM 2018 is almost here! Usually around this time, I comb through the entire program manually, making an itinerary for myself. But this year I decided to try something new: a programmatic way of going through the program, and then building a Shiny app that helps me better navigate the online program. The resulting app is below. (I might tweak it a bit further after this post goes live, depending on the feedback I receive.)

REST APIs and Plumber

Moving R resources from development to production can be a challenge, especially when the resource isn’t something like a Shiny application or R Markdown document that can be easily published and consumed. Consider, as an example, a customer success model created in R. This model is responsible for taking customer data and returning a predicted outcome, such as the likelihood that the customer will churn. Once this model is developed and validated, there needs to be some way for its output to be leveraged by other systems and individuals within the company.
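
One way to expose such a model is the plumber package, which turns annotated R functions into HTTP endpoints. A minimal sketch, in which the endpoint path, the `tenure` parameter, and the saved model object are all hypothetical:

```r
# plumber.R -- a hypothetical churn-prediction endpoint
# model <- readRDS("churn_model.rds")  # load the validated model once at startup

#* Return the predicted churn probability for a customer
#* @param tenure Customer tenure in months
#* @post /predict
function(tenure) {
  newdata <- data.frame(tenure = as.numeric(tenure))
  list(churn_prob = predict(model, newdata, type = "response"))
}
```

Running `plumber::plumb("plumber.R")$run(port = 8000)` would then serve the model as a REST API, so other systems can POST customer data to `/predict` and consume the prediction as JSON.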

CVXR: A Direct Standardization Example

In our first blog post, we introduced CVXR, an R package for disciplined convex optimization, and showed how to model and solve a non-negative least squares problem using its interface. This time, we will tackle a non-parametric estimation example, which features new atoms as well as more complex constraints.

Direct Standardization

Consider a set of observations \((x_i, y_i)\) drawn non-uniformly from an unknown distribution. We know the expected value of the columns of \(X\), denoted by \(b \in {\mathbf R}^n\), and want to estimate the true distribution of \(y\).
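
In CVXR-style code, one common formulation of direct standardization is a maximum-entropy estimate of the sampling weights, constrained to match the known column expectations. A sketch under that assumption (variable names are illustrative, and `X` and `b` are taken as given):

```r
library(CVXR)

# X: matrix whose rows are the observations x_i; b: known column expectations
w <- Variable(nrow(X))                  # one weight per observation
objective <- Maximize(sum(entr(w)))     # maximize the entropy of the weights
constraints <- list(w >= 0,
                    sum(w) == 1,        # weights form a probability distribution
                    t(X) %*% w == b)    # match the known expectations
result <- solve(Problem(objective, constraints))
weights <- result$getValue(w)           # reweight y by these to estimate its distribution
```

The `entr` atom is one of the "new atoms" such an example exercises; the equality and simplex constraints are the "more complex constraints."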

Monte Carlo Shiny: Part Three

In previous posts, we covered how to run a Monte Carlo simulation and how to visualize the results. Today, we will wrap that work into a Shiny app wherein a user can build a custom portfolio, and then choose a number of simulations to run and a number of months to simulate into the future. A link to that final Shiny app is here and here is a snapshot:
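
The two user inputs described above can be sketched as a small Shiny app. The simulation here is a stand-in normal-returns model, not the post's actual portfolio logic:

```r
library(shiny)

ui <- fluidPage(
  numericInput("sims",   "Number of simulations", value = 50,  min = 1),
  numericInput("months", "Months to simulate",    value = 120, min = 1),
  plotOutput("sim_plot")
)

server <- function(input, output) {
  output$sim_plot <- renderPlot({
    # Illustrative only: one column of simulated cumulative growth per run
    paths <- replicate(input$sims,
                       cumprod(1 + rnorm(input$months, mean = 0.005, sd = 0.04)))
    matplot(paths, type = "l", lty = 1,
            xlab = "Month", ylab = "Portfolio growth")
  })
}

shinyApp(ui, server)
```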

Solver Interfaces in CVXR

Introduction

In our previous blog post, we introduced CVXR, an R package for disciplined convex optimization. The package allows one to describe an optimization problem with Disciplined Convex Programming rules using high-level mathematical syntax. Passing this problem definition (along with a list of constraints, if any) to the solve function transforms it into a form that can be handed off to a solver. The default installation of CVXR comes with two (imported) open source solvers:
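
Choosing among the available solvers is done through the `solver` argument of `solve()`. A sketch with a toy least-squares problem (the problem itself is illustrative):

```r
library(CVXR)

x <- Variable(2)
prob <- Problem(Minimize(sum_squares(x - c(1, 2))))

installed_solvers()                       # list the solvers CVXR can see
res_ecos <- solve(prob, solver = "ECOS")  # explicitly pick a solver
res_scs  <- solve(prob, solver = "SCS")

res_ecos$getValue(x)                      # both solvers should agree closely
```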

A First Look at NIMBLE

Writing a domain-specific language (DSL) is a powerful and fairly common method for extending the R language. Both ggplot2 and dplyr, for example, are DSLs. (See Hadley’s chapter in Advanced R for some elaboration.) In this post, I take a first look at NIMBLE (Numerical Inference for Statistical Models using Bayesian and Likelihood Estimation), a DSL for formulating and efficiently solving statistical models in general, and Bayesian hierarchical models in particular.
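
To give a flavour of the DSL, here is a minimal sketch of a NIMBLE model; the data and priors are illustrative, not drawn from the post:

```r
library(nimble)

# BUGS-like model code: a normal mean with weakly informative priors
code <- nimbleCode({
  for (i in 1:N) {
    y[i] ~ dnorm(mu, sd = sigma)
  }
  mu ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 10)
})

set.seed(42)
model <- nimbleModel(code,
                     constants = list(N = 20),
                     data = list(y = rnorm(20, mean = 5)),
                     inits = list(mu = 0, sigma = 1))
```

The `nimbleCode()` block is the DSL proper; `nimbleModel()` compiles it into an object that NIMBLE's algorithms (MCMC and others) can work with efficiently.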

May 2018: “Top 40” New Packages

While looking over the 215 or so new packages that made it to CRAN in May, I was delighted to find several packages devoted to subjects a little bit out of the ordinary; for instance, bioacoustics analyzes audio recordings, freegroup looks at some abstract mathematics, RQEntangle computes quantum entanglement, stemmatology analyzes textual traditions, and treedater estimates clock rates for evolutionary models. I take this as evidence that R is expanding beyond its traditional strongholds of statistics and finance as people in other fields with serious analytic and computational requirements become familiar with the language.

Reading and analysing log files in the RRD database format

I have frequent conversations with R champions and systems administrators responsible for R, in which they ask how they can measure and analyze the usage of their servers. Among the many solutions to this problem, one of my favourites is to use an RRD database and RRDtool. From Wikipedia: RRDtool (round-robin database tool) aims to handle time series data such as network bandwidth, temperatures or CPU load. The data is stored in a circular buffer based database, thus the system storage footprint remains constant over time.
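
One way to get such files into R is the rrd package, which reads RRD archives directly into tidy data frames. A minimal sketch, assuming the rrd package is installed and using a hypothetical file path:

```r
library(rrd)

path <- "/var/log/rrd/cpu-0.rrd"   # hypothetical RRD file collected by a server
describe_rrd(path)                 # summarize the round-robin archives in the file
dat <- read_rrd(path)              # import every archive for analysis in R
```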
