March 2019: "Top 40" New CRAN Packages

By my count, two hundred and thirty-three packages stuck to CRAN last month. I have tried to capture something of the diversity of the offerings by selecting packages in ten categories: Computational Methods, Data, Machine Learning, Medicine, Science, Shiny, Statistics, Time Series, Utilities, and Visualization. The Shiny category contains packages that expand on Shiny capabilities, not just packages that implement a Shiny application. It is not clear whether this is going to be a new cottage industry or not.

A Few Old Books

Greg Wilson is a data scientist and professional educator at RStudio. My previous column looked at a few new books about R. In this one, I’d like to explore a few books about programming that people coming from data science backgrounds may not have stumbled upon. The first is Michael Nygard’s Release It!, which more than lives up to its subtitle, “Design and Deploy Production-Ready Software”. Most of us can write programs that work for us on our machines; this book explores what it takes to create software that will work reliably for other people, on machines you’ve never met, long after you’ve moved on to your next project.

Reproducible Environments

Great data science work should be reproducible. The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a handful of important tools will make reproducible work in R much easier for data scientists.

Setting up RStudio Server on a Cloud for Collaboration and Reproducibility

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When setting up R and RStudio Server on a cloud Linux instance, some thought should be given to implementing a workflow that facilitates collaboration and ensures R project reproducibility. There are many possible workflows to accomplish this. In this post, we offer an “opinionated” solution based on what we have found to work in a production environment.

On Meeting Data Journalists

“I’d rather do data than date”. I overheard this while eavesdropping on a conversation among three female data journalists while waiting for an elevator at the IRE-CAR (Investigative Reporters and Editors - Computer-Assisted Reporting) conference last month. I would like to think the remark was overloaded with hyperbole, but maybe not. Most of the attendees at this conference were motivated, tenacious, and highly skilled data hounds, the kind of investigative journalists who pry information from government databases through persistent requests, legal leverage, and SQL expertise.

How to share R visualizations in Microsoft PowerPoint

Hadrien Dykiel is an RStudio Customer Success Engineer. Microsoft PowerPoint is often the de facto choice for creating presentation slides, especially at larger companies. In many organizations, it comes pre-installed on workstations and pretty much everybody knows how to use it. This can make it an effective medium for sharing information, since most folks are comfortable with it. Unfortunately, valuable time is often lost manually creating slides. R developers often find themselves copying and pasting their results into presentation decks.

RInside Help in Testing

A problem that arises when building R interfaces to C/C++ libraries involves testing: how to go about replicating the existing C/C++ tests in R without undue effort. If the C/C++ tests are simple and small enough, they can be manually translated. However, when there are many tests, and each test initializes its own large data structures, the task becomes a chore. We faced this problem with a recent release of ECOSolveR, a solver package crucial to our larger package CVXR.

February 2019: “Top 40” New CRAN Packages

One hundred and fifty-one new packages arrived at CRAN in February. Here are my “Top 40” picks organized into eight categories: Bioinformatics, Data, Machine Learning, Medicine, Statistics, Time Series, Utilities and Visualization. Bioinformatics Cascade v1.7: Implements a modeling tool allowing gene selection, reverse engineering, and prediction in cascade networks. See Jung et al. (2014) for details, along with a Package Introduction and a vignette on re-analysis. Result of reverse engineering a TH1 network countfitteR v1.

How to Avoid Publishing Credentials in Your Code

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When accessing an API or database in R, it is often necessary to provide credentials such as a login name and password. You may find yourself being prompted with something like this: When writing an R script that requires a user to provide credentials, you will want a way to have the script prompt the user or, better yet, programmatically provide the credentials in the R script.

The reticulate package solves the hardest problem in data science: people

Andrew Mangano is the Director of eCommerce Analytics at Albertsons Companies. Part I - Modelling The reticulate package integrates Python within R and, when used with RStudio 1.2, brings the two languages together like never before. Much more important than the technical details of how it all works is the impact that it has on both individuals and teams by enabling data scientists who speak different languages to collaborate seamlessly on a project.
