Basics of Web Scraping with Python

Michael Senter


Goals for Today

  • Understand what tools and methods are available.

  • Be able to create a new project using Python and Jupyter.

  • Be able to edit existing code snippets to gather data.


Python

  • easy to learn, reads like “pseudocode”
  • widely used in a variety of fields
  • many books, websites, etc. to help you learn

print("Hello, world!")

Data Sources


CSV/Excel Downloads



Johns Hopkins Dashboard

The Johns Hopkins data is published on GitHub and is updated regularly.
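
As a rough sketch of how such a GitHub-hosted CSV could be pulled into Python, the snippet below uses only the standard library. The URL, the column names, and the helper names parse_csv and download_csv are assumptions for illustration and are not taken from these slides; the sample numbers are made up so the example runs without a network connection.

```python
import csv
import io
import urllib.request

# Assumed URL of one Johns Hopkins time-series file on GitHub (not given in the slides).
JHU_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)


def parse_csv(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def download_csv(url):
    """Fetch a CSV file over HTTP and parse it (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))


# Made-up sample in the assumed column layout, standing in for a real download.
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",Examplestan,0.0,0.0,0,3\n"
)
rows = parse_csv(sample)
print(rows[0]["Country/Region"], rows[0]["1/23/20"])

# With network access, the real file could be loaded like this:
# rows = download_csv(JHU_URL)
```

All values come back as strings, so counts need an int() conversion before any arithmetic.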


Using SAS

filename outfile "~/import-data-nyt.sas";

/* download official SAS script to above filename */
proc http url="https://raw.githubusercontent.com/sassoftware/covid-19-sas/master/Data/import-data-nyt.sas" 
  method="get" out=outfile;
run;

/* run the downloaded script */
%include "~/import-data-nyt.sas";
/* state and county level data are now in memory */
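
For comparison, once the state-level data is available as CSV text, a similar summary might be produced in Python. This is a minimal sketch, not a translation of the SAS script: the URL, the column names (date, state, fips, cases, deaths), and the function latest_cases_by_state are assumptions, and the sample rows are invented.

```python
import csv
import io

# Assumed location of the New York Times state-level file (not given in the slides).
NYT_STATES_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
)


def latest_cases_by_state(csv_text):
    """Return {state: cases} using the most recent date seen for each state."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        state = row["state"]
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if state not in latest or row["date"] >= latest[state][0]:
            latest[state] = (row["date"], int(row["cases"]))
    return {state: cases for state, (_, cases) in latest.items()}


# Invented rows in the assumed layout, standing in for the downloaded file.
sample = (
    "date,state,fips,cases,deaths\n"
    "2020-03-01,Washington,53,10,1\n"
    "2020-03-02,Washington,53,15,2\n"
    "2020-03-02,Oregon,41,3,0\n"
)
summary = latest_cases_by_state(sample)
print(summary)
```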