Basics of Web Scraping with Python

Abstract

Data acquisition is a key step in research. In this workshop, we will consider how to effectively access publicly available data sets. We will discuss how to find and load data published in CSV/Excel formats. We will learn how to use pandas to parse HTML tables. We will discuss some best practices for data acquisition and storage.
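
As a minimal sketch of the loading steps described above, the snippet below reads a small CSV and an HTML table with pandas and then joins them on a shared column. The inline data and the column names (county, year, population, area_sq_mi) are hypothetical placeholders. With real sources you would pass a URL or file path to pd.read_csv, pd.read_excel or pd.read_html instead of a StringIO buffer. Note that pd.read_html also needs an HTML parser backend such as lxml (or BeautifulSoup4 with html5lib) to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content standing in for a published data file.
# For a real source, pass its URL or local path to pd.read_csv;
# pd.read_excel works the same way for .xlsx files.
CSV_TEXT = """county,year,population
Alpha,2019,1200
Beta,2019,3400
"""

# Hypothetical HTML page containing a single <table>.
HTML_TEXT = """
<html><body>
<table>
  <thead><tr><th>county</th><th>area_sq_mi</th></tr></thead>
  <tbody>
    <tr><td>Alpha</td><td>50.5</td></tr>
    <tr><td>Beta</td><td>120.0</td></tr>
  </tbody>
</table>
</body></html>
"""

# Load the CSV into a DataFrame.
population = pd.read_csv(StringIO(CSV_TEXT))

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(HTML_TEXT))
area = tables[0]

# Combine the two sources on their shared key column.
combined = population.merge(area, on="county")
print(combined)
```

Because pd.read_html returns a list, you index into it (tables[0] here) to pick the table you need; on real pages with many tables, its match argument can restrict the result to tables containing a given string.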

Date
Jul 30, 2020 1:00 PM — 2:00 PM
Location
Online

This workshop covers data acquisition and basic data preparation, with a focus on using Python in Jupyter notebooks. To avoid having to install Python locally during the workshop, we will use an Azure Notebooks project. The example files are located here.

Please note that the free Azure Notebooks service will only be available until early October. To continue using Python and Jupyter notebooks after that, you may want to consider a local installation. For Windows and Mac users, I recommend Anaconda. For continued cloud usage, you may consider CoCalc; note that you will need a CoCalc subscription for your notebooks to be able to download data from external sources.

D. Michael Senter
Research Statistician Developer

My research interests include data analytics and missing data.