pandas

Some Basic SQL Joins

A non-technical friend recently asked me for help with a merge problem. They had two separate data pulls of electronic medical records based on specific study parameters. The set of people in the database who fit the study parameters changed in between the data pulls, for example by having people age into our out of a study, or by having new diagnoses added to their records that cause them to either be newly included or excluded.

North Carolina Housing Data

A popular beginners machine learning problem is the prediction of housing prices. A frequently used data set for this purpose uses housing prices in California along some additional gathered through the 1990 Census.

Teacher Salaries

What do you do when your data table is in PDF format? Let's use tabula-py to extract teacher salary information from PDFs directly into Pandas dataframes. We'll also use some regex to clean up the results.

Accessing Census Data via API

The Census Bureau makes an incredible amount of data available online. In this post, I will summarize how to get access to this data via Python by using the Census Bureau’s API.