tidyposterior's Bayesian Approach to Model Comparison

A task common to many machine learning workflows is to compare the performance of several models with respect to some metric such as accuracy or area under the ROC curve. Standard practice is to try out several different algorithms on a training data set and see which works best. Unfortunately, all too often, after this work has been done, model selection comes down to “eyeballing” several different ROC curves. If you find eyeballing a little too informal, then take a look at the tidyposterior package (part of the tidymodels universe).

IPO Portfolios and a Benchmark

In two previous posts, we explored IPOs and IPO returns by sector and year since 2004 and then examined the returns of portfolios constructed by investing in IPOs each year. In today’s post, we will add a benchmark so that we can compare our IPO portfolios to something besides themselves. Next time, we will delve into return attribution to visualize how individual equities have contributed to portfolios over time.

In-Database Logistic Regression with R

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. In a previous article, we illustrated how to calculate xgboost model predictions in-database. This was referenced and incorporated into tidypredict. After learning more about what the tidypredict team is up to, I discovered another tidyverse package called modeldb that fits models in-database. It currently supports linear regression and k-means clustering, so I thought I would provide an example of how to do in-database logistic regression.

Introducing sortable to add drag-and-drop to your shiny apps

You can use the sortable package to add drag-and-drop behaviour to shiny apps.

October 2019: "Top 40" New R Packages

Two hundred twenty-three new packages made it to CRAN in October. Here are my “Top 40” picks in ten categories: Computational Methods, Data, Genomics, Machine Learning, Mathematics, Medicine, Pharmacology, Statistics, Utilities, and Visualization.

Computational Methods

admmDensestSubmatrix v0.1.0: Implements a method to identify the densest sub-matrix in a given or sampled binary matrix. See Bombina et al. (2019) for the technical details and the vignette for examples.

mbend v1.2.3: Provides functions to “bend” non-positive-definite (symmetric) matrices to positive-definite matrices using weighted and unweighted methods.

IPO Exploration Part Two

In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed with those IPOs have performed. We will need to grab the price histories of the tickers, form portfolios, calculate their performance, and then rank those performances in some way. Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive.

A comparison of methods for predicting clothing classes using the Fashion MNIST dataset in RStudio and Python (Part 1)

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics, she applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this series of blog posts, I will compare different machine and deep learning methods to predict clothing categories from images using the Fashion MNIST data. In this first blog of the series, we will explore and prepare the data for analysis.

A First Look at Confidence Distributions

Using a probability distribution to characterize uncertainty is at the core of statistical inference. So, it seems natural to try to summarize the information about the parameters in statistical models with probability distributions.

Sept 2019: "Top 40" New R Packages

One hundred and thirteen new packages made it to CRAN in September. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Economics, Machine Learning, Statistics, Time Series, Utilities, and Visualization.

Computational Methods

eRTG3D v0.6.2: Provides functions to create realistic random trajectories in a 3-D space between two given fixed points (conditional empirical random walks), based on empirical distribution functions extracted from observed trajectories (training data), and thus reflect the geometrical movement characteristics of the mover.

IPO Exploration

Inspired by recent headlines like “Fear Overtakes Greed in IPO Market after WeWork Debacle” and “This Year’s IPO Class is Least Profitable since the Tech Bubble”, today we’ll explore historical IPO data, and next time we’ll look at the performance of IPO-driven portfolios constructed during the ten-year period from 2004 to 2014. I’ll admit, I’ve often wondered how a portfolio that allocated money to new IPOs each year might perform, since this must be the ultimate example of a few headline-gobbling whales dominating the collective consciousness.
