Two Big Ideas from JSM 2018

The Joint Statistical Meetings offer an astounding number of talks; it is impossible for any one attendee to see more than a small portion of what is going on. Even so, a diligent attendee ought to come away with more than a few good ideas. The following are two big ideas that I took from the conference. Session 149, an invited panel on Theory versus Practice featuring an all-star team of panelists (Edward George, Trevor Hastie, Elizaveta Levina, John Petkau, Nancy Reid, Richard J. Samworth, Robert Tibshirani, Larry Wasserman and Bin Yu), covered a lot of ground and wove a rich tapestry of ideas.

June 2018: Top 40 New Packages

Approximately 144 new packages stuck to CRAN in June. The fact that 31 of these are specialized to particular scientific disciplines or analyses provides some evidence for my hypothesis that working scientists are actively adopting R. Below are my Top 40 picks for June, organized into the categories of Computational Methods, Data, Data Science, Economics, Science, Statistics, Time Series, Utilities and Visualizations. The Data packages, especially rtrek and opensensmapr, look like they have some interesting new data to explore.

JSM 2018 Itinerary

JSM 2018 is almost here! Usually around this time, I comb through the entire program manually, making an itinerary for myself. But this year I decided to try something new: a programmatic way of going through the program, and then building a Shiny app that helps me better navigate the online program. The resulting app is below. (I might tweak it a bit further after this post goes live, depending on the feedback I receive.)

REST APIs and Plumber

Moving R resources from development to production can be a challenge, especially when the resource isn’t something like a Shiny application or R Markdown document that can be easily published and consumed. Consider, as an example, a customer success model created in R. This model is responsible for taking customer data and returning a predicted outcome, such as the likelihood that the customer will churn. Once this model is developed and validated, there needs to be some way for its output to be leveraged by other systems and individuals within the company.
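
One way to expose such a model is the plumber package, which turns annotated R functions into HTTP endpoints. A minimal sketch, in which the endpoint path, the `tenure` parameter, and the saved model object are all hypothetical:

```r
# plumber.R -- a hypothetical churn-prediction endpoint
# model <- readRDS("churn_model.rds")  # load the validated model once at startup

#* Return the predicted churn probability for a customer
#* @param tenure Customer tenure in months
#* @post /predict
function(tenure) {
  newdata <- data.frame(tenure = as.numeric(tenure))
  list(churn_prob = predict(model, newdata, type = "response"))
}
```

Running `plumber::plumb("plumber.R")$run(port = 8000)` would then serve the model as a REST API, so other systems can POST customer data to `/predict` and consume the prediction as JSON.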

CVXR: A Direct Standardization Example

In our first blog post, we introduced CVXR, an R package for disciplined convex optimization, and showed how to model and solve a non-negative least squares problem using its interface. This time, we will tackle a non-parametric estimation example, which features new atoms as well as more complex constraints.

Direct Standardization

Consider a set of observations \((x_i, y_i)\) drawn non-uniformly from an unknown distribution. We know the expected value of the columns of \(X\), denoted by \(b \in {\mathbf R}^n\), and want to estimate the true distribution of \(y\).
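
In CVXR-style code, one common formulation of direct standardization is a maximum-entropy estimate of the sampling weights, constrained to match the known column expectations. A sketch under that assumption (variable names are illustrative, and `X` and `b` are taken as given):

```r
library(CVXR)

# X: matrix whose rows are the observations x_i; b: known column expectations
w <- Variable(nrow(X))                  # one weight per observation
objective <- Maximize(sum(entr(w)))     # maximize the entropy of the weights
constraints <- list(w >= 0,
                    sum(w) == 1,        # weights form a probability distribution
                    t(X) %*% w == b)    # match the known expectations
result <- solve(Problem(objective, constraints))
weights <- result$getValue(w)           # reweight y by these to estimate its distribution
```

The `entr` atom is one of the "new atoms" such an example exercises; the equality and simplex constraints are the "more complex constraints."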

Monte Carlo Shiny: Part Three

In previous posts, we covered how to run a Monte Carlo simulation and how to visualize the results. Today, we will wrap that work into a Shiny app wherein a user can build a custom portfolio, and then choose a number of simulations to run and a number of months to simulate into the future. A link to that final Shiny app is here and here is a snapshot:
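
The two user inputs described above can be sketched as a small Shiny app. The simulation here is a stand-in normal-returns model, not the post's actual portfolio logic:

```r
library(shiny)

ui <- fluidPage(
  numericInput("sims",   "Number of simulations", value = 50,  min = 1),
  numericInput("months", "Months to simulate",    value = 120, min = 1),
  plotOutput("sim_plot")
)

server <- function(input, output) {
  output$sim_plot <- renderPlot({
    # Illustrative only: one column of simulated cumulative growth per run
    paths <- replicate(input$sims,
                       cumprod(1 + rnorm(input$months, mean = 0.005, sd = 0.04)))
    matplot(paths, type = "l", lty = 1,
            xlab = "Month", ylab = "Portfolio growth")
  })
}

shinyApp(ui, server)
```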

Solver Interfaces in CVXR

Introduction

In our previous blog post, we introduced CVXR, an R package for disciplined convex optimization. The package allows one to describe an optimization problem with Disciplined Convex Programming rules using high-level mathematical syntax. Passing this problem definition (along with a list of constraints, if any) to the solve function transforms it into a form that can be handed off to a solver. The default installation of CVXR comes with two (imported) open source solvers:
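
Choosing among the available solvers is done through the `solver` argument of `solve()`. A sketch with a toy least-squares problem (the problem itself is illustrative):

```r
library(CVXR)

x <- Variable(2)
prob <- Problem(Minimize(sum_squares(x - c(1, 2))))

installed_solvers()                       # list the solvers CVXR can see
res_ecos <- solve(prob, solver = "ECOS")  # explicitly pick a solver
res_scs  <- solve(prob, solver = "SCS")

res_ecos$getValue(x)                      # both solvers should agree closely
```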

A First Look at NIMBLE

Writing a domain-specific language (DSL) is a powerful and fairly common method for extending the R language. Both ggplot2 and dplyr, for example, are DSLs. (See Hadley’s chapter in Advanced R for some elaboration.) In this post, I take a first look at NIMBLE (Numerical Inference for Statistical Models using Bayesian and Likelihood Estimation), a DSL for formulating and efficiently solving statistical models in general, and Bayesian hierarchical models in particular.
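
To give a flavour of the DSL, here is a minimal sketch of a NIMBLE model; the data and priors are illustrative, not drawn from the post:

```r
library(nimble)

# BUGS-like model code: a normal mean with weakly informative priors
code <- nimbleCode({
  for (i in 1:N) {
    y[i] ~ dnorm(mu, sd = sigma)
  }
  mu ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 10)
})

set.seed(42)
model <- nimbleModel(code,
                     constants = list(N = 20),
                     data = list(y = rnorm(20, mean = 5)),
                     inits = list(mu = 0, sigma = 1))
```

The `nimbleCode()` block is the DSL proper; `nimbleModel()` compiles it into an object that NIMBLE's algorithms (MCMC and others) can work with efficiently.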

May 2018: “Top 40” New Packages

While looking over the 215 or so new packages that made it to CRAN in May, I was delighted to find several packages devoted to subjects a little bit out of the ordinary; for instance, bioacoustics analyzes audio recordings, freegroup looks at some abstract mathematics, RQEntangle computes quantum entanglement, stemmatology analyzes textual traditions, and treedater estimates clock rates for evolutionary models. I take this as evidence that R is expanding beyond its traditional strongholds of statistics and finance as people in other fields with serious analytic and computational requirements become familiar with the language.

Reading and analysing log files in the RRD database format

I have frequent conversations with R champions and systems administrators responsible for R, in which they ask how they can measure and analyze the usage of their servers. Among the many solutions to this problem, one of my favourites is to use an RRD database and RRDtool. From Wikipedia: RRDtool (round-robin database tool) aims to handle time series data such as network bandwidth, temperatures or CPU load. The data is stored in a circular buffer based database, thus the system storage footprint remains constant over time.
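
One way to get such files into R is the rrd package, which reads RRD archives directly into tidy data frames. A minimal sketch, assuming the rrd package is installed and using a hypothetical file path:

```r
library(rrd)

path <- "/var/log/rrd/cpu-0.rrd"   # hypothetical RRD file collected by a server
describe_rrd(path)                 # summarize the round-robin archives in the file
dat <- read_rrd(path)              # import every archive for analysis in R
```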
