Michael's Site

New MI Feature: Flux Statistics

The Viya 2024.04 release includes a brand-new MI feature: statistics that summarize the missing data pattern. An important choice when building an imputation model is which variables to include. One way to aid this selection is to use summary statistics such as influx and outflux, as proposed by van Buuren. In his words: “Influx and outflux are summaries of the missing data pattern intended to aid in the construction of imputation models. Keeping everything else constant, variables with high influx and outflux are preferred. Realize that outflux indicates the potential (and not actual) contribution to impute other variables” ...
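
As a rough illustration, both statistics can be computed directly from the missingness indicators. Following van Buuren’s definitions as I understand them, with r_ij = 1 when variable j is observed in row i: the influx of variable j counts the (row, variable) pairs in which j is missing and the other variable is observed, divided by the total number of observed cells; the outflux counts the pairs in which j is observed and the other variable is missing, divided by the total number of missing cells. The Python sketch below uses a hypothetical helper named flux (my name, not a SAS or library function), represents missing values as None, and is not how Viya computes these statistics.

```python
# Influx and outflux per variable, a sketch following van Buuren's definitions.
# Missing values are represented as None; this is NOT the Viya implementation.


def flux(data):
    """Return {column index: (influx, outflux)} for a list of equal-length rows."""
    n_cols = len(data[0])
    # r[i][j] = 1 if row i has an observed value for variable j, else 0
    r = [[0 if v is None else 1 for v in row] for row in data]
    n_obs = sum(sum(row) for row in r)   # total number of observed cells
    n_mis = len(r) * n_cols - n_obs      # total number of missing cells

    result = {}
    for j in range(n_cols):
        # Pairs (row i, variable k) with y_ij missing and y_ik observed
        into_j = sum((1 - row[j]) * row[k] for row in r for k in range(n_cols))
        # Pairs (row i, variable k) with y_ij observed and y_ik missing
        out_of_j = sum(row[j] * (1 - row[k]) for row in r for k in range(n_cols))
        influx = into_j / n_obs if n_obs else 0.0
        outflux = out_of_j / n_mis if n_mis else 0.0
        result[j] = (influx, outflux)
    return result


example = [[1, 2, None], [3, None, None], [5, 6, 7], [None, 8, 9]]
for col, (inf, out) in flux(example).items():
    print(f"variable {col}: influx={inf:.3f}, outflux={out:.3f}")
```

In this toy data set, the third variable has the highest influx (its missing values are well connected to observed values of the other variables), while the first has the highest outflux (its observed values could help impute the others).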

Observations on Obstetrics and EBM

We live in the age of evidence-based medicine and an increasing willingness on the part of patients to review medical guidance and actively participate in their care. This includes such personal and emotional areas as pregnancy and birth. Some popular tools to help would-be parents include Emily Oster’s famous “Expecting Better” and evidencebasedbirth.com. ...

Takeaways from 'On the uses and abuses of regression models'

This weekend I found an interesting new preprint by Carlin and Moreno-Betancur on arXiv titled “On the Uses and Abuses of Regression Models”, so I had to check it out. The article focuses on the medical literature, where, in my own experience as well, regressions often seem to be run almost automatically and then interpreted according to the question the authors want answered rather than according to how the model was constructed. “Garbage can” regressions to find “important risk factors” abound, as do repeated fittings of simple models in an attempt to describe a joint distribution. One of my favorite examples of the latter to show in class is a 2008 paper by Wang et al. that to date has been cited more than 1,000 times. The paper analyzes NHANES data with the aim of predicting the prevalence of obesity in the US. The authors want to describe how different subgroups of Americans, that is, the different genders and ethnicities, fare. Instead of fitting a joint model, they fit multiple separate linear models. This leads to fun results in their Table 2, where all Americans of all races and ethnicities will be obese by 2048, yet all men won’t be obese until 2051. Mexican-American men fare the best, as they somehow escape being part of all Americans and won’t reach 100% prevalence until 2126. ...
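
To see why separately fitted straight lines produce such results, here is a toy calculation with made-up numbers (hypothetical, not the estimates from Wang et al.). If overall prevalence is the population-weighted average of the subgroup prevalences (assuming fixed population shares), the overall line can only reach 100% in a year when some subgroup line is already above 100%, unless all subgroups cross in the same year. An unbounded linear model has no trouble producing such impossible values.

```python
# Toy illustration with made-up numbers (NOT the estimates from Wang et al.).
# Each subgroup's prevalence is modeled as a straight line in calendar year.

BASE_YEAR = 2000

# Hypothetical subgroups: (population share, prevalence in BASE_YEAR, increase per year)
groups = {
    "men": (0.5, 0.28, 0.014),
    "women": (0.5, 0.34, 0.016),
}


def prevalence(intercept, slope, year):
    """Linear trend; nothing stops it from exceeding 1.0 (i.e. 100%)."""
    return intercept + slope * (year - BASE_YEAR)


def year_reaching(intercept, slope, target=1.0):
    """Calendar year in which the linear trend hits the target prevalence."""
    return BASE_YEAR + (target - intercept) / slope


# Overall prevalence as the share-weighted average of the subgroup lines
overall_intercept = sum(share * a for share, a, _ in groups.values())
overall_slope = sum(share * b for share, _, b in groups.values())
overall_year = year_reaching(overall_intercept, overall_slope)

# Subgroup prevalences in the year the overall line reaches 100%
at_overall = {name: prevalence(a, b, overall_year) for name, (_, a, b) in groups.items()}

print(f"overall line reaches 100% in {overall_year:.0f}")
for name, p in at_overall.items():
    print(f"  {name}: {p:.1%} in that year")
```

With these invented inputs, the overall line reaches 100% around 2046, when the men’s line is still at roughly 92% and the women’s line is already at roughly 108%.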

Calling R From SAS

The statistics literature is filled with example code and sample data in R. Sometimes I find myself wanting to work through some provided sample data and compare the output from R with that of equivalent SAS code. In this post, I’ll show how to connect R and SAS so that you can load and execute R code straight from within SAS. ...

Automatic Suspend in Fedora 38

For a while now I’ve used an old iMac running Fedora Workstation as a simple home server. It has worked well, but only now, with the EOL of Fedora 37, did I get around to upgrading from Fedora 36 to Fedora 38. ...

Reproducibility by Sharing Code

Whenever I speak with students, I emphasize the need to share as much code and data as is feasible to enable reproducibility. The fact that a large amount of research is not reproducible is a big issue that has gained a lot of traction in the two decades since Ioannidis published his influential paper.

Some Basic SQL Joins

A non-technical friend recently asked me for help with a merge problem. They had two separate data pulls of electronic medical records based on specific study parameters. The set of people in the database who fit the study parameters changed between the data pulls, for example because people aged into or out of the study, or because new diagnoses were added to their records that caused them to be either newly included or newly excluded. Let’s call the older data set A and the newer data set B. The goal was to get all those entries from B that don’t also show up in A. The data sets were pulled by a staff data scientist at that company who, despite their title, said they couldn’t figure out how to remove those entries from B that were already in A. Barring any special circumstances, this is a fairly standard problem, so let’s look at a couple of tools we could use to solve it. ...
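
One standard way to get the rows of B that are not in A is an anti-join. Below is a minimal sketch using SQLite through Python’s built-in sqlite3 module; the table layout, the patient_id key, and the data are hypothetical, and the sketch assumes each person has a stable identifier in both pulls, which real extracts may not provide. The tools used in the post itself may differ.

```python
import sqlite3

# Hypothetical example: A is the older data pull, B the newer one.
# Assumes patient_id identifies the same person in both pulls.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (patient_id INTEGER, age INTEGER);
    CREATE TABLE b (patient_id INTEGER, age INTEGER);
    INSERT INTO a VALUES (1, 45), (2, 38), (3, 71);
    INSERT INTO b VALUES (2, 38), (3, 71), (4, 30), (5, 62);
""")

# Anti-join via LEFT JOIN: rows of B without a match in A get NULLs
# in A's columns, so keep exactly those rows.
new_in_b = con.execute("""
    SELECT b.patient_id, b.age
    FROM b
    LEFT JOIN a ON a.patient_id = b.patient_id
    WHERE a.patient_id IS NULL
    ORDER BY b.patient_id
""").fetchall()

# The same anti-join written with NOT EXISTS
new_in_b_alt = con.execute("""
    SELECT patient_id, age
    FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.patient_id = b.patient_id)
    ORDER BY patient_id
""").fetchall()

print(new_in_b)
con.close()
```

If two rows should only count as the same when every column matches, SQLite’s EXCEPT (SELECT * FROM b EXCEPT SELECT * FROM a) is a shorter alternative, but it compares whole rows and also removes duplicate rows from the result.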

Univariate Missing Data with PROC MI

Chapter 3 of van Buuren’s Flexible Imputation of Missing Data presents a variety of methods for imputing univariate missing data. This post will summarize these techniques and show how to implement them in SAS. ...
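
Predictive mean matching is among the univariate methods van Buuren covers. The sketch below is a simplified Python version for a single, fully observed predictor, not the SAS implementation from the post. It fits ordinary least squares to the complete cases, finds the observed cases whose predicted values are closest to each incomplete case’s prediction, and imputes the observed value of a randomly chosen donor. Unlike a proper multiple-imputation method, it does not draw the regression parameters, so it ignores their uncertainty; the function name pmm_impute and the default of five donors are my own assumptions.

```python
import random


def pmm_impute(x, y, donors=5, seed=None):
    """Impute None entries of y by predictive mean matching on one predictor x.

    Simplified sketch: x must be fully observed, and the least-squares fit is
    used directly (no parameter draws), so it understates uncertainty.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

    # Ordinary least squares on the complete cases
    n = len(observed)
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)  # observed values are kept as they are
            continue
        target = predict(xi)
        # Donor pool: observed cases whose predicted means are closest to target
        pool = sorted(observed, key=lambda pair: abs(predict(pair[0]) - target))[:donors]
        # Impute the actually observed y of one randomly chosen donor
        completed.append(rng.choice(pool)[1])
    return completed


x = [1, 2, 3.2, 4, 5.5, 6]
y = [2.1, 3.9, None, 8.2, None, 12.1]
print(pmm_impute(x, y, donors=3, seed=42))
```

Because every imputed value is an actually observed value, PMM cannot produce impossible values such as negative counts, which is one reason it is a popular default.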

SUC - a Slack Clone for Modern Unix

I love simple CLI tools and am a big fan of the Unix philosophy. Recently I came across The Dam, a public Unix server that implements a clever tool they termed suc - the Simple Unix Chat. Essentially, it applies the Unix philosophy to create a simple chat tool that can be used on any modern Unix server. The core of it consists of just a few lines of Bash. Check out the documentation for it here.

Explore C Code With GNU Tools

This post will introduce three GNU tools to help you explore your C code: ctags, cscope, and cflow. The first two can help you navigate your code as you work on it and can be used directly within Vim. Cflow, on the other hand, produces call graphs that chart the control flow in a project, which is particularly helpful if you are new to the codebase. ...