Player Data for the 2018 FIFA World Cup

The World Cup starts today! The tournament, which runs from June 14 through July 15, is probably the most popular sporting event in the world. If you are a soccer fan, you know that learning about the players and their teams, and talking about it all with your friends, greatly enhances the experience. In this post, I will show you how to gather and explore data for the 736 players from the 32 teams at the 2018 FIFA World Cup.

Monte Carlo Part Two

In a previous post, we reviewed how to set up and run a Monte Carlo (MC) simulation of future portfolio returns and the growth of a dollar. Today, we will run that simulation many, many times and then visualize the results.
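Running the simulation many times amounts to repeating the single-path draw and collecting every path. The sketch below is written in Python rather than R and assumes normally distributed, independent returns; the helpers `simulate_growth` and `run_simulations`, the choice of 51 paths, and the parameter values are all hypothetical. It summarizes the ending values instead of plotting them, to stay within the standard library.

```python
import random
import statistics

def simulate_growth(mean_return, sd_return, n_periods, rng):
    """One simulated path of the growth of $1 with normally distributed returns."""
    growth = [1.0]  # each path starts with one dollar
    for _ in range(n_periods):
        # Compound the previous value by one randomly drawn period return
        growth.append(growth[-1] * (1 + rng.gauss(mean_return, sd_return)))
    return growth

def run_simulations(n_sims, mean_return, sd_return, n_periods, seed=None):
    """Repeat the single-path simulation n_sims times with one seeded generator."""
    rng = random.Random(seed)
    return [simulate_growth(mean_return, sd_return, n_periods, rng)
            for _ in range(n_sims)]

# Hypothetical parameters: 51 paths of 120 monthly returns each
sims = run_simulations(n_sims=51, mean_return=0.005, sd_return=0.04,
                       n_periods=120, seed=1)
endings = [p[-1] for p in sims]  # ending value of each simulated path
print(f"min {min(endings):.2f}, median {statistics.median(endings):.2f}, "
      f"max {max(endings):.2f}")
```

An odd number of simulations means the median ending value belongs to an actual simulated path, which is convenient if you later want to plot the best, worst, and median paths.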

Monte Carlo

Today, we change gears from our previous work on Fama French and run a Monte Carlo (MC) simulation of future portfolio returns. Monte Carlo relies on repeated random sampling. We will sample based on two parameters: the mean and standard deviation of portfolio returns. Our long-term goal (long-term == over the next two or three blog posts) is to build a Shiny app that allows an end user to build a custom portfolio, simulate returns, and visualize the results.
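As a rough illustration of sampling from those two parameters, here is a minimal sketch written in Python rather than R. It assumes each period's return is an independent draw from a normal distribution with a given mean and standard deviation, and compounds the draws into the growth of a dollar. The function name `simulate_growth` and the monthly parameter values are hypothetical, not taken from the post.

```python
import random

def simulate_growth(mean_return, sd_return, n_periods, seed=None):
    """Simulate one path of the growth of $1.

    Each period's return is an independent draw from a normal distribution
    with the given mean and standard deviation (an assumption of this sketch).
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    growth = [1.0]  # the path starts with one dollar
    for _ in range(n_periods):
        period_return = rng.gauss(mean_return, sd_return)
        growth.append(growth[-1] * (1 + period_return))  # compound the return
    return growth

# Hypothetical monthly parameters, not taken from the post
path = simulate_growth(mean_return=0.005, sd_return=0.04, n_periods=120, seed=42)
print(f"Value of $1 after {len(path) - 1} months: {path[-1]:.2f}")
```

Fixing the seed makes a path reproducible, which helps when comparing runs; setting the standard deviation to zero reduces the path to plain compounding at the mean return.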

Exploring R Packages with cranly

In a previous post, I showed a very simple example of using the R function tools::CRAN_package_db() to analyze information about CRAN packages. CRAN_package_db() extracts the metadata CRAN stores on all of its 12,000-plus packages and arranges it into a “database”: actually a complicated data frame in which some columns have vectors or lists as entries. It’s simple to run the function, and it doesn’t take very long on my MacBook Air.

April 2018: “Top 40” New Packages

Below are my “Top 40” picks from the approximately 212 new packages that made it to CRAN in April. They are organized into ten categories: Computational Methods, Data, Data Science, Machine Learning, Music, Science, Statistics, Time Series, Utilities, and Visualizations.

Computational Methods

diffeqr v0.1.1: Provides an interface to DifferentialEquations.jl, which offers high-performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more.

Enterprise Dashboards with R Markdown

This is the second post in a series on enterprise dashboards. See our previous post, Enterprise-ready dashboards with Shiny Databases. We have been living with spreadsheets for so long that most office workers think it is obvious that spreadsheets generated with programs like Microsoft Excel make it easy to understand data and communicate insights. Everyone in a business, from the newest intern to the CEO, has had some experience with spreadsheets.

2018 R Conferences

rstudio::conf 2018 and the New York R Conference are both behind us, but we are rushing headlong into the season for conferences focused on the R language and its applications. The European R Users Meeting (eRum) begins this coming Monday, May 14th, in Budapest with three days of workshops and talks. Headlined by R Core member Martin Mächler and fellow keynote speakers Achim Zeileis, Nathalie Villa-Vialaneix, Stefano Maria Iacus, and Roger Bivand, the program features an outstanding array of accomplished speakers including RStudio’s own Barbara Borges Ribeiro, Andrie de Vries, and Lionel Henry.

Rolling Fama French

In a previous post, we reviewed how to import the Fama French 3-Factor data, wrangle that data, and then regress our portfolio returns on the factors. Please have a look at that previous post, as the following work builds upon it.
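For readers who want the regression step in concrete form, here is a minimal sketch written in Python rather than R. It fits excess portfolio return = alpha + beta_mkt (Mkt-RF) + beta_smb SMB + beta_hml HML + error by ordinary least squares, and a rolling variant refits the same model on each trailing window. The factor names follow the Fama French 3-factor data; the helpers `ols`, `fama_french_3`, and `rolling_fama_french` are hypothetical, and the data below is made up, so this is an illustration under those assumptions, not the post's code.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows that already include a leading 1.0 for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting: fine for a
    handful of regressors, not a numerically robust general solver.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]  # scale pivot row so pivot == 1
        for r in range(k):
            if r != col:  # eliminate this column from every other row
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def fama_french_3(portfolio, rf, mkt_rf, smb, hml):
    """Regress excess portfolio returns on Mkt-RF, SMB and HML.

    Returns (alpha, beta_mkt, beta_smb, beta_hml).
    """
    excess = [p - r for p, r in zip(portfolio, rf)]
    X = [[1.0, m, s, h] for m, s, h in zip(mkt_rf, smb, hml)]
    return tuple(ols(X, excess))


def rolling_fama_french(portfolio, rf, mkt_rf, smb, hml, window):
    """Refit the 3-factor regression on each trailing window of `window` periods."""
    return [fama_french_3(portfolio[end - window:end], rf[end - window:end],
                          mkt_rf[end - window:end], smb[end - window:end],
                          hml[end - window:end])
            for end in range(window, len(portfolio) + 1)]


# Made-up factor data; the portfolio is built from known coefficients
# (alpha 0.002, betas 1.1, 0.3, -0.2) with no noise, so the fit should recover them.
mkt_rf = [0.0, 0.02, 0.0, 0.0, -0.01, 0.015]
smb = [0.0, 0.0, 0.01, 0.0, 0.005, -0.003]
hml = [0.0, 0.0, 0.0, 0.008, -0.002, 0.004]
rf = [0.001] * 6
portfolio = [r + 0.002 + 1.1 * m + 0.3 * s - 0.2 * h
             for r, m, s, h in zip(rf, mkt_rf, smb, hml)]

print(fama_french_3(portfolio, rf, mkt_rf, smb, hml))
print(rolling_fama_french(portfolio, rf, mkt_rf, smb, hml, window=5))
```

With real monthly data you would use far more observations per window; the tiny, noise-free sample here only checks that the fit recovers the coefficients it was built from.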

March 2018: “Top 40” New Package Picks

By my count, just over 200 new packages made it to CRAN and stuck during March. The trend toward specialized, and sometimes downright esoteric, science packages continues: I counted 40 new packages in this class. Most, but not all, of these focus on bioscience applications; the foreSIGHT package profiled below, for example, focuses on climate science. I was also pleased to see two new packages (not from RStudio) in the Data Science category, h2o4gpu and onnx, built on the reticulate package for interfacing with Python.

An Introduction to Greta

I was surprised by greta. I had assumed that the tensorflow and reticulate packages would eventually enable R developers to look beyond deep learning applications and exploit the TensorFlow platform to create all manner of production-grade statistical applications. But I wasn’t thinking Bayesian. After all, Stan is probably everything a Bayesian modeler could want. Stan is a powerful, production-level probability distribution modeling engine with a slick R interface, deep documentation, and a dedicated development team.
