Community and Collaboration: Writing Our Book in the Open

In this introductory post, I preview a series of four blog posts to accompany a book my co-authors and I wrote, Data Science in Education Using R (DSIEUR). In the coming weeks, we’ll write about the people and tools in the R community that inspired us to write DSIEUR.

Modern Rule-Based Models

Machine learning models come in many shapes and sizes. While deep learning models currently have the lion’s share of coverage, there are many other classes of models that are effective across many different problem domains. This post gives a short summary of several rule-based models that are closely related to tree-based models (but are less widely known).

An R View into Epidemiology

If you have been trying to make sense of the various published COVID-19 models but are not an epidemiologist, you may want to acquire some background knowledge. One good way to do this is to find a handful of relevant R packages with documentation and pointers to source material. This post provides a guided search and turns up some R packages that may be of interest to novices and experts alike.

Congratulations Class of 2020!

A short address to the BSc and MSc Statistics Graduates of the Cal State East Bay Class of 2020

Some Upcoming R Related, Virtual Events

A short list of upcoming R-related virtual conferences and events.

Greg Wilson Wins ACM Influential Educator Award

An interview with RStudio’s Greg Wilson, who won this year’s ACM Influential Educator Award.

March 2020: "Top 40" New CRAN Packages

Two hundred ninety-six new packages made it to CRAN in March. Here are my “Top 40” picks in ten categories: Computational Methods, Data, Machine Learning, Mathematics, Medicine, Science, Statistics, Time Series, Utilities, and Visualization.

Computational Methods

celltrackR v0.3.1: Provides a methodology to analyze cells that move in a two- or three-dimensional space. While the methodology has been developed for cell trajectory analysis, it is applicable to anything that moves including animals, people, or vehicles.

10 Commands to Get Started with Git

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn.

Git and its online extensions like GitHub, Bitbucket, and GitLab are essential tools for data science. While the emphasis is often on collaboration, Git can also be very useful to the solo practitioner. The RStudio IDE offers Git functionality through a convenient web-based interface (see the “Git” tab), as well as command-line interaction with Git (through the “Console” tab, or through the “Git” tab’s “More”->“Shell” menu option).

The Case for tidymodels

If you are a data scientist with a built-out set of modeling tools that you know well, and which are almost always adequate for getting your work done, it is probably difficult for you to imagine what would induce you to give them up. Changing out what works is a task that rarely generates much enthusiasm. Nevertheless, in this post, I would like to point out a few features of tidymodels that could help even experienced data scientists make the case to give tidymodels a try.

State Unemployment Claims

In today’s Reproducible Finance post, we will explore state-level unemployment claims, which are released every Thursday. The last few weeks have shown huge spikes in those claims, of course, due to the coronavirus and statewide lockdown orders, and it got me wondering how these times will look to data scientists in the future. Let’s start by importing unemployment insurance claims data for Georgia. This data series is reported by all 50 states.
