PROC MI Added to SASPy

I’m excited to announce that the new SASPy v4.6.0 release includes a pull request of mine that adds PROC MI to the SAS/STAT procedures directly exposed in SASPy. This procedure allows you to analyze missing data patterns and create imputations for missing data. ...

Feb 6, 2023 · 3 min · 546 words · D. Michael Senter

Missing Data Mechanisms

Understanding whether a variable’s missingness from a dataset is related to the underlying value of the data is a key concept in the field of missing data analysis. We distinguish three broad categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In his book Statistical Rethinking, McElreath gives an amusing example to illustrate this concept: he considers variants of a dog eating homework and how the dog chooses, if at all, to eat the homework. The examples he gives show substantial shifts in observed values, which make for a good illustration of the types of problems you might encounter. A lecture corresponding to the example from the book can be found on YouTube. In this post, I will first briefly review the different missing data mechanisms before implementing McElreath’s examples in SAS. ...

Jan 3, 2023 · 7 min · 1386 words · D. Michael Senter

CSV2DS

CSV2DS is a new program I wrote in Go to help me create minimum working examples for SAS that can be shared as a single SAS script.

Nov 23, 2022 · 3 min · 577 words · D. Michael Senter

SAS Markdown for Reproducibility

One of the coolest packages for R is knitr. Essentially, it allows you to combine explanatory writing, such as a paper or blog post, directly with your analysis code in a Markdown document. When the target document is compiled (‘knitted’), the R code in the document is run and the results inserted into the final document. The target document could be an HTML or a PDF file, for example. This is great for many reasons. You have a regular report you want to run, but the data updates? Just re-knit and your entire report is updated. No more separate running of the code followed by copying the results into whatever software you use to build the report itself. This makes it not just less cumbersome, but less error prone. It also improves reproducibility. Somebody wants to see your work, perhaps because they are unsure of your results or they want to extend your work? You can share the markdown file and the other party can see exactly what code was used to generate what part of your report or paper. ...

Nov 11, 2022 · 5 min · 964 words · D. Michael Senter

Loading Zillow Housing Data in SAS

Zillow is a well-known website widely used by those searching for a home or curious to find out the value of their current home. What you may not know is that Zillow has a dedicated research page. To make their website work optimally, they churn through tons of data on the American housing market. They share insights they gleaned via zillow.com/research. If you visit their research website you’ll notice they have a data page where you can download some really cool data sets for your own research. They even have an API with which you can load data directly, but you’ll have to register for access. In this post, we’ll look at how to load the CSV files that are available for direct download into SAS for analysis. ...

Aug 1, 2022 · 5 min · 892 words · D. Michael Senter

The INDSNAME Option in SAS

I frequently find myself needing to concatenate data sets while still being able to tell which data set each row originally came from. Introductory SAS courses tend to teach the IN= data set option, for a workflow similar to this: ...

Apr 20, 2022 · 1 min · 159 words · D. Michael Senter

Working with the Census API Directly from SAS

A post showing how PROC HTTP and LIBNAME JSON can be used to directly work with the Census API from SAS.

Apr 13, 2022 · 5 min · 1049 words · D. Michael Senter

Cleaning up a Date String with RegEx in SAS

Sometimes we have to deal with manually entered data, which usually needs to be cleaned for consistency: errors inevitably creep in when data is typed in by hand, and different people rarely enter data the same way. ...

Sep 29, 2021 · 4 min · 711 words · D. Michael Senter

From Proc Import to a Data Step with Regex

I find myself needing to import CSV files with a relatively large number of columns. In many cases, proc import works surprisingly well in giving me what I want. But sometimes, I need to do some work while reading in the file and it would be nice to just use a data step to do so, but I don’t want to type it in by hand. That’s when a combination of proc import and some regex substitution can come in handy. ...

Jul 29, 2021 · 3 min · 489 words · D. Michael Senter

Making INPUT and LABEL Statements with AWK

I am currently working with a database provided by the North Carolina Department of Public Safety that consists of several fixed-width files. Each of these has an associated codebook that gives the internal variable name, a label of the variable, its data type, as well as the start column and the length of the fields for each column. To import the data sets into SAS, I could copy and paste part of that data into my INPUT and LABEL statements, but that gets tedious pretty fast when dealing with dozens of lines. And since I have multiple data sets like that, I didn’t really want to do it that way. In this post I show how a simple command-line script can be written to deal with this problem. ...

Jul 6, 2021 · 4 min · 836 words · D. Michael Senter