Using Git with SAS Studio

Git is a widely used version control system that lets users track their software development in both public and private repositories. It is also increasingly used to store data in text formats; see, for example, the New York Times COVID-19 data set. This post briefly demonstrates how to clone a GitHub repository and pull updates from it using the git functions that are built into SAS Studio.

Git functionality has been built into SAS Studio for a while, so there are two slightly different iterations of the git functions. The examples in this post use the versions compatible with SAS Studio 3.8, which is the current version available at SAS OnDemand for Academics. All git functions share a prefix. In older versions such as SAS Studio 3.8 the prefix is gitfn_, followed by a git command such as “clone” or “pull”. In SAS Studio 5, the prefix has been simplified to git_. Most git functions have the same name in both versions, so the only difference is the prefix. A complete table of the old and new versions of the git functions is available in the documentation.

We use the git functions by calling them in an otherwise empty DATA step, in the following format:

data _null_;
    /* use your git functions here */
run;

Cloning a Repo

To clone a repo from GitHub we use gitfn_clone. At minimum it takes two arguments: the URL of the repository of interest and the path to an empty folder. You can have SAS create the folder for you by setting OPTIONS DLCREATEDIR and then assigning a LIBNAME to that path. The basic syntax for the clone is as follows:

data _null_;
    rc = gitfn_clone (
     "&repoURL.",    /* URL to repo */
     "&targetDIR."); /* folder to put repo in */
    put rc=;         /* equals 0 if successful */
run;
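
If the target folder does not exist yet, it can be created before the clone. The following is a minimal sketch of the OPTIONS DLCREATEDIR approach mentioned above. The path and the libref name repodir are hypothetical placeholders, not values from this post.

%LET targetDIR=/home/your_user_id/covid-19-data; /* hypothetical path, adjust to your account */

options dlcreatedir;           /* create a missing folder when a libref points to it */
libname repodir "&targetDIR."; /* assigning the libref creates the folder */
libname repodir clear;         /* the libref itself is no longer needed */
options nodlcreatedir;         /* restore the default behavior */

Note that DLCREATEDIR creates only the last folder in the path, so the parent folder must already exist.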

It doesn’t matter if the URL you use ends in “.git” or not. In other words, the following two macro variable definitions work the same way:

%LET repoURL=https://github.com/nytimes/covid-19-data;
/* works the same as */
%LET repoURL=https://github.com/nytimes/covid-19-data.git;

You can also use password-based authentication to pull in private repositories:

data _null_;
    rc = gitfn_clone (
     "&repoURL.",      /* URL to repo */
     "&targetDIR.",    /* folder to put repo in */
     "&githubUSER.",   /* your GitHub username */
     "&githubPASSW."); /* your GitHub password */
    put rc=;           /* equals 0 if successful */
run;

NOTE: GitHub no longer accepts account passwords for Git operations over HTTPS (support ended in August 2021), so you will need to use a personal access token or OAuth token in place of the password, or switch to SSH keys. To access a repository using an SSH key, use the following:

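data _null_;
    rc = gitfn_clone (
     "&repoURL.",     /* URL to repo */
     "&targetDIR.",   /* folder to put repo in */
     "&sshUSER.",     /* SSH user name */
     "&sshPASSW.",    /* password for the SSH key */
     "&sshPUBkey.",   /* path to the public key file */
     "&sshPRIVkey."); /* path to the private key file */
    put rc=;          /* equals 0 if successful */
run;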
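
The SSH example relies on macro variables that are not defined in this post. One possible setup is sketched below. Every value is a placeholder or an assumption: the key file locations and account names are hypothetical, the user name git and the SSH-style URL follow GitHub's usual SSH conventions, and your own setup may differ.

%LET repoURL=git@github.com:your_account/your_private_repo.git; /* hypothetical repo, SSH-style URL */
%LET targetDIR=/home/your_user_id/your_private_repo;            /* hypothetical empty folder */
%LET sshUSER=git;                                               /* assumption: GitHub's SSH user name */
%LET sshPASSW=your-key-passphrase;                              /* placeholder for the key's password */
%LET sshPUBkey=/home/your_user_id/.ssh/id_rsa.pub;              /* hypothetical public key path */
%LET sshPRIVkey=/home/your_user_id/.ssh/id_rsa;                 /* hypothetical private key path */

Because these values end up in plain text in your program, avoid saving them in a file that is itself committed to a shared repository.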

Pulling in Updates

It is just as easy to pull in updates to a local repository by using gitfn_pull("&repoDIR."). This also works with SSH keys for private repositories:

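data _null_;
    rc = gitfn_pull (
     "&repoDIR.",     /* folder containing the local repo */
     "&sshUSER.",     /* SSH user name */
     "&sshPASSW.",    /* password for the SSH key */
     "&sshPUBkey.",   /* path to the public key file */
     "&sshPRIVkey."); /* path to the private key file */
    put rc=;          /* print the return code to the log */
run;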
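
For a public repository, the one-argument form is enough. The sketch below also branches on the return code so that a failed pull stands out in the log. It assumes that gitfn_pull, like gitfn_clone, returns 0 on success, and the message texts are only illustrative.

data _null_;
    rc = gitfn_pull("&repoDIR."); /* folder containing the local repo */
    if rc = 0 then put "NOTE: Pull succeeded.";
    else put "WARNING: Pull did not succeed. " rc=;
run;

Starting the messages with NOTE: and WARNING: makes SAS highlight them like its own log messages.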

Other Functions

SAS also offers other built-in git functions, including _diff, _status, _push, and _commit. For a complete list, see the SAS documentation.

D. Michael Senter
Research Statistician Developer

My research interests include data analytics and missing data.
