ROC Curves

I have been thinking about writing a short post on R resources for working with receiver operating characteristic (ROC) curves, but first I thought it would be nice to review the basics. In contrast to the usual (usual for data scientists anyway) machine learning point of view, I’ll frame the topic closer to its historical origins as a portrait of practical decision theory. ROC curves were invented during WWII to help radar operators decide whether the signal they were getting indicated the presence of an enemy aircraft or was just noise.

A Look Back on 2018: Part 1

Welcome to Reproducible Finance 2019! It’s a new year and a new beginning: the Earth has completed one more trip around the sun, which means it’s time to look back on the previous January-to-December cycle.

2018 R Views Review and Highlights

2018 was a good year for R Views. With a total of sixty-three posts for the year, we exceeded the pace of one post per week. But it wasn’t quantity we were shooting for. Our main goal was, and continues to be, featuring thoughtful commentary on topics of interest to the R Community and in-depth technical elaboration of R language applications. Before highlighting a few of my favorite posts for 2018, I would like to express my profound gratitude to our guest bloggers (R Community members who are not employed at RStudio), our regular RStudio contributors who sparkled with creativity while meeting committed deadlines, and you, our readers, who made it all worthwhile.

Rolling Origins and Fama French

Today, we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets. In the machine learning prediction world, these two data sets are often called training data and testing data, but we’re not going to do any machine learning prediction today. We’ll stay with our good ol’ Fama French regression models for the reasons explained last time: the goal is to explore a new method of sampling our data, and I prefer to do that in the context of a familiar model and data set.

November 2018: “Top 40” New Packages

Having absorbed an average of 181 new packages each month over the last 28 months, CRAN is still growing at a pretty amazing rate. The following plot shows the number of new packages since I started keeping track in August 2016. This November, 171 new packages stuck to CRAN. Here is my selection for the “Top 40” organized into the categories: Computational Methods, Data, Finance, Machine Learning, Marketing Analytics, Science, Statistics, Utilities and Visualization.

Statistics in Glaucoma: Part III

This blog post is the third installment of a three-part series that introduces the role of statistical methods in glaucoma disease management, and the importance of software in glaucoma research. Part I provides an introduction to glaucoma and the use of visual fields for diagnosis purposes. Part II provides a case study applying a novel Bayesian method to learn about glaucoma progression and its use clinically. Finally, Part III details future directions for statistics in glaucoma, and the importance of accessible software for use in clinical practice.

Rsampling Fama French

Today we will continue our work on Fama French factor models, but more as a vehicle to explore some of the awesome stuff happening in the world of tidy models. For new readers who want to get familiar with Fama French before diving into this post, see here where we covered importing and wrangling the data, here where we covered rolling models and visualization, my most recent post here where we covered managing many models, and if you’re into Shiny, this flexdashboard.

Statistics in Glaucoma: Part II

This blog post is the second installment of a three-part series that introduces the role of statistical methods in glaucoma disease management, and the importance of software in glaucoma research. Part I provides an introduction to glaucoma and the use of visual fields for diagnosis purposes. Part II provides a case study applying a novel Bayesian method to learn about glaucoma progression and its use clinically. Finally, Part III details future directions for statistics in glaucoma, and the importance of accessible software for use in clinical practice.

Statistics in Glaucoma: Part I

This blog post is the first installment of a three-part series that introduces the role of statistical methods in glaucoma disease management, and the importance of software in glaucoma research. Part I provides an introduction to glaucoma and the use of visual fields for diagnosis purposes. Part II will provide a case study, applying a novel Bayesian method to learn about glaucoma progression and its use clinically. Finally, Part III will provide some details of future directions for statistics in glaucoma and the importance of accessible software for use in clinical practice.

October 2018: “Top 40” New Packages

One hundred eighty-five new packages made it to CRAN in October. Here are my picks for the “Top 40” in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization.

Computational Methods

compboost v0.1.0: Provides a C++ implementation of component-wise boosting written to obtain high run-time performance and full memory control. The vignette shows how to use the package.

RcppEnsmallen v0.1.10.0.1: Implements an interface to the C++ based Ensmallen mathematical optimization library that provides a simple set of abstractions for writing an objective function to optimize.
