Extracting and Analyzing Web and Social Media Data: A Short Course

A 3-Day Livestream Seminar Taught by Monica Alexander, Ph.D.


The growth of data available through websites, including social media websites such as Facebook, presents an opportunity for researchers to explore new data sources. Web and social media data show potential particularly for studying online communities and networks, temporary and permanent geographic mobility, and the sharing and spread of information (and misinformation).

Many websites permit the extraction of certain types of data through Application Programming Interfaces (APIs). APIs make it possible to extract large amounts of information about website activity, essentially in real time. These data can then be processed, cleaned, and manipulated in a range of statistical analyses.

In this seminar, you will learn to collect and process data from Spotify, Genius Lyrics, and Facebook’s Advertising Platform, as well as the basics of webscraping. You will also learn how to analyze Twitter data and text data from scientific journal articles. We will cover extracting data from APIs, geocoding, static and interactive mapping, and introductory text analysis methods, including sentiment analysis and topic models. We will do all of our coding in R using the tidyverse style.

Starting September 28, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions. Live captions can be translated into a variety of languages including Spanish, Korean, and Italian.

More Details About the Course Content

This course focuses on how to extract and use common sources of social media data. You will learn to use APIs to extract information, format that information into datasets that can be used in various analyses, and analyze the available data using a range of methods. We will be working with data from Spotify, Genius Lyrics, Twitter, Facebook’s Advertising Platform, and text data from scientific journal articles.

The course covers the whole workflow: data extraction, cleaning and preparation, geocoding and mapping, plotting and visualization, and text analysis. You will learn practical skills to take with you into future analyses and projects.

Computing

You are strongly encouraged to use a computer with the most recent version of R installed. You are also encouraged to download and install RStudio, a front-end for R that makes it easier to work with. This software is free and available for Windows, Mac, and Linux platforms. The course will be fairly coding-heavy, so basic familiarity with R and RStudio is assumed.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

Who Should Register?

This course is designed for those who have an interest in using social media data in social research.

Outline

Day 1

  • Introduction
    • Types of social media data
    • Ways of getting social media data (webscraping, APIs)
    • Motivating research
    • Ethical concerns
    • Limitations
  • Revision of tidyverse coding concepts
    • Important tidyverse functions
    • Graphing in ggplot
  • Extracting Spotify/Genius data
    • Artist discographies
    • Song characteristics
    • Lyrics

Day 2

  • Extracting Facebook Ads data
    • Basic demographics
    • More detailed interests
  • Geocoding
    • Exact location and self-reported locations
    • Google API, tidygeocoder
  • Mapping
    • Static (ggmap)
    • Interactive (leaflet)
  • Comparison with gold-standard data
    • Merging and comparing population distributions

Day 3

  • Methods for text analysis overview
  • Text cleaning
    • Stop words
    • Stemming
  • Descriptive text analysis
    • tf-idf
    • n-grams
  • Sentiment analysis
  • Topic models
    • LDA
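
As a rough illustration of the Day 3 text-cleaning and descriptive steps (stop words, n-grams, tf-idf), here is a minimal sketch. The seminar itself uses R and the tidyverse; this sketch is written in Python with only the standard library, purely for illustration, and is not the course's code. The stop-word list, the two toy "lyrics," and the function names are all hypothetical. The tf-idf definition used here (term share within a document times the natural log of the inverse document frequency) is one common choice among several.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"the", "a", "and", "of", "in", "is", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def bigrams(tokens):
    """Return consecutive token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Compute tf-idf for a dict mapping document name -> list of tokens.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Made-up example texts, not real lyrics.
raw = {
    "song_a": "The rain in the city is the sound of home.",
    "song_b": "Home is a city of lights and rain and rain.",
}
docs = {name: tokenize(text) for name, text in raw.items()}
scores = tf_idf(docs)

for name, terms in scores.items():
    top_term = max(terms, key=terms.get)
    print(name, "most distinctive term:", top_term)
```

Terms that appear in every document get a score of zero under this definition, so the highest-scoring terms are the ones that distinguish one document from the rest.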

Reviews of Extracting and Analyzing Web and Social Media Data

“Everything about this course was great and the instructor was incredibly knowledgeable! This was one of the only courses I have seen offering how to access social media data.” 
  Kathryn Franklin, UMass Boston 

“Monica is one of the best instructors; she is exceptional! I appreciated how detailed, helpful, and quickly Dr. Alexander responded all queries.” 
  Towhid Islam, University of Guelph 

Seminar Information

Thursday, September 28 – Saturday, September 30, 2023

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.