Basics of Web Scraping with Python
Michael Senter
Goals for Today
Understand what tools and methods are available.
Be able to create a new project using Python and Jupyter.
Be able to edit existing code snippets to gather data.
Python
- easy to learn, reads like “pseudocode”
- widely used in a variety of fields
- many books, websites, etc. to help you learn
print("Hello, world!")
Data Sources
CSV/Excel Downloads
COVID-Related Data
Johns Hopkins Dashboard
The Johns Hopkins data is published on GitHub and is updated regularly.
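Because the data sits on GitHub as plain CSV files, a few lines of Python are enough to fetch and parse one. The following is a minimal sketch using only the standard library, not the dashboard's own tooling. The helper names download_text and parse_csv are made up for illustration, and the sample rows are invented so the parsing step can be tried without a network connection.

```python
import csv
import io
import urllib.request


def download_text(url, timeout=30):
    """Fetch a URL and return its body decoded as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_csv(text):
    """Turn CSV text into a list of dicts, one per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


# Invented sample data (not real figures) standing in for a downloaded file.
sample = "region,date,count\nA,2020-03-01,5\nB,2020-03-01,7\n"

rows = parse_csv(sample)
total = sum(int(row["count"]) for row in rows)  # CSV values arrive as strings
print(len(rows), total)
```

To work with a real file, the two helpers can be chained as parse_csv(download_text(csv_url)), where csv_url is a placeholder for the "raw" link of whichever CSV file in the repository is of interest.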
Using SAS
filename outfile "~/import-data-nyt.sas";
/* download official SAS script to above filename */
proc http
  url="https://raw.githubusercontent.com/sassoftware/covid-19-sas/master/Data/import-data-nyt.sas"
  method="get"
  out=outfile;
run;
/* run the downloaded script */
%include "~/import-data-nyt.sas";
/* state and county level data are now in memory */