PROC MI Added to SASPy

I’m excited to announce that the new SASPy v4.6.0 release includes a pull request of mine that adds PROC MI to the SAS/STAT procedures directly exposed in SASPy. This procedure allows you to analyze missing data patterns and create imputations for missing data.

Syntax

PROC MI is accessed via the mi function, which has been added to the SASstat class. As with other procedures, each SAS statement in MI is passed as a keyword argument to the function, and the argument's name matches the SAS syntax (see note 1):

PROC MI <options>;
  BY variables;
  CLASS variables;
  EM <options>;
  FCS <options>;
  FREQ variable;
  MCMC <options>;
  MNAR options;
  MONOTONE <options>;
  TRANSFORM transform (variables</ options>) <…transform (variables</ options>)>;
  VAR variables;

Here is the corresponding function signature in Python:

def mi(self, data: ('SASdata', str) = None,
        by: (str, list) = None,
        cls: (str, list) = None,
        em: str = None,
        fcs: str = None,
        freq: str = None,
        mcmc: str = None,
        mnar: str = None,
        monotone: str = None,
        transform: str = None,
        var: str = None,
        procopts: str = None,
        stmtpassthrough: str = None,
        **kwargs: dict) -> 'SASresults':

Statements like EM or MCMC, which can stand alone in SAS, are called with an empty string argument in Python.
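
To make the mapping from keyword arguments to SAS statements concrete, here is a minimal pure-Python sketch. The build_proc_mi helper is hypothetical and written only for illustration; it is not how SASPy generates code internally. It simply shows how an empty string can yield a stand-alone statement and how cls stands in for CLASS.

```python
def build_proc_mi(data, procopts="", **statements):
    """Hypothetical helper: render keyword arguments as PROC MI statements."""
    lines = [f"proc mi data={data} {procopts}".rstrip() + ";"]
    for name, value in statements.items():
        if value is None:
            continue  # statement was not requested
        # cls is the Python-side spelling of the SAS CLASS statement
        keyword = "class" if name == "cls" else name
        # An empty string leaves just the bare statement, e.g. "em;"
        lines.append(f"    {keyword} {value}".rstrip() + ";")
    lines.append("run;")
    return "\n".join(lines)


code = build_proc_mi("sashelp.heart", procopts="nimpute=20",
                     em="", var="Cholesterol Height Weight")
print(code)
```

In SASPy itself, the corresponding call would be stat.mi(data='sashelp.heart', em="", var="Cholesterol Height Weight", procopts="nimpute=20").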

Basic Example

To use the new MI functionality, make sure you have updated to the newest SASPy release. In addition to starting a SAS Session as per usual, you will also want to enable access to the SAS/STAT procedures:

import saspy

sas = saspy.SASsession()  # loads a session using your default profile
stat = sas.sasstat()      # gives access to SAS/STAT procedures

Once these session objects are loaded, you can start using the mi function with stat.mi. The simplest possible call invokes MI on a built-in data set with all defaults: stat.mi(data='sashelp.heart'). The default uses the EM method with 25 imputations. For best results, store the output in a SASresults object. From there you can access the SAS log associated with the function call (LOG) as well as all ODS output, using the ODS table names in all caps.

A more realistic use might look something like this:

ods = stat.mi(data='sashelp.heart', em="outem=outem",
              var="Cholesterol Height Smoking Weight",
              procopts="simple nimpute=20 out=imp")

This is equivalent to running the following in SAS:

proc mi data=sashelp.heart simple nimpute=20 out=imp;
    em outem=outem;
    var Cholesterol Height Smoking Weight;
run;

This call applies the EM statement while imputing values for the Cholesterol, Height, Smoking, and Weight variables. The simple option displays univariate statistics and correlations. The outem= option saves a data set containing the MLE computed by the EM algorithm to work.outem. The imputed data sets are saved to work.imp, which contains the additional variable _IMPUTATION_ holding the imputation number. This variable can be used as a BY variable in other procedures, and the results can later be pooled using PROC MIANALYZE.
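
For readers who want a feel for what the pooling step does, the standard combining rules for multiple imputation (Rubin's rules) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a substitute for PROC MIANALYZE: the pool_rubin function is a hypothetical helper, and the input numbers are made up.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)               # number of imputations (needs m >= 2)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up per-imputation results, for illustration only
est, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, total_var)
```

The pooled estimate is the average across imputations, while the total variance adds a between-imputation term so that the uncertainty due to the missing data is not ignored.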

The resulting ods object for our example exposes the following ODS outputs to your Python instance, in addition to the log:

['CORR', 'EMESTIMATES', 'EMINITESTIMATES', 'EMPOSTESTIMATES', 'MISSPATTERN', 'MODELINFO', 'PARAMETERESTIMATES', 'UNIVARIATE', 'VARIANCEINFO']

See the SAS documentation for details. To use the imputed data with Python tools, create a SAS data object. We’ll also print the first few entries so we can see what it looks like:

imputed = sas.sasdata(table="imp", libref="work")
imputed.head()
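
Once the imputed data is on the Python side, analyses are usually run separately for each imputation. The sketch below shows that grouping step in plain Python so it runs without a SAS connection. The mean_by_imputation helper and the toy rows are hypothetical; with real data you might obtain the rows from something like imputed.to_df().to_dict("records"), and the exact capitalization of the imputation column after transfer is an assumption you should check.

```python
from collections import defaultdict
from statistics import mean


def mean_by_imputation(rows, column, key="_IMPUTATION_"):
    """Average `column` within each imputation number found under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {imp: mean(values) for imp, values in sorted(groups.items())}


# Made-up toy rows standing in for work.imp; not real sashelp.heart values
toy_rows = [
    {"_IMPUTATION_": 1, "Weight": 150.0},
    {"_IMPUTATION_": 1, "Weight": 170.0},
    {"_IMPUTATION_": 2, "Weight": 152.0},
    {"_IMPUTATION_": 2, "Weight": 172.0},
]
per_imputation = mean_by_imputation(toy_rows, "Weight")
print(per_imputation)
```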

  1. One exception is the SAS CLASS statement, which is implemented as cls because class is a reserved keyword in Python.

D. Michael Senter
Research Statistician Developer

My research interests include data analytics and missing data.
