Extracting, Manipulating, and Analyzing Social Media Data: A Short Course

A 3-Day Livestream Seminar Taught by Monica Alexander, Ph.D.

The growth of social media websites like Facebook and Twitter presents an opportunity for researchers to explore new data sources. Social media data show particular potential for studying online communities and networks, temporary and permanent geographic mobility, and the sharing and spread of information (and misinformation).

Most social media websites permit the extraction of certain types of data through Application Programming Interfaces (APIs). APIs make it possible to extract large amounts of information about website activity, essentially in real time. These data can then be processed, cleaned, and used in a range of statistical analyses.

In this seminar, you will learn to collect, process, and analyze data from Twitter and Facebook’s Advertising Platform. We will cover extracting data from APIs, geocoding, static and interactive mapping, and introductory text analysis methods, including sentiment analysis and topic models. We will do all of our coding in R using the tidyverse style.

Starting February 23, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions, each including hands-on exercises, separated by a 1-hour break. You are encouraged to join the lectures live, but you will have the opportunity to view the recorded sessions later in the day if you are unable to attend at the scheduled time.

*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 

Closed captioning is available for all live and recorded sessions.

More Details About the Course Content

This course focuses on how to extract and use common sources of social media data. You will learn to use APIs to extract information, format that information into datasets that can be used in various analyses, and analyze the available data using a range of different methods. We will focus on working with Twitter data, but will also investigate data from Facebook’s Advertising Platform.

The course covers the whole workflow: data extraction, cleaning and preparation, geocoding and mapping, plotting and visualization, and text analysis. You will learn practical skills to take with you into future analyses and projects.

Computing

You are strongly encouraged to use a computer with the most recent version of R installed. You are also encouraged to download and install RStudio, a front-end that makes R easier to work with. Both R and RStudio are free and available for Windows, Mac, and Linux platforms. The course will be fairly coding-heavy, so basic familiarity with R and RStudio is assumed.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

Who Should Register?

This course is designed for those who have an interest in using social media data in social research.

Outline

Day 1

  • Introduction
    • Types of social media data
    • Ways of getting social media data (web scraping, APIs)
    • Motivating research
    • Ethical concerns
    • Limitations
  • Extracting Twitter data
    • Timelines/users
    • Topics/hashtags
    • Locations

Day 2

  • Geocoding
    • Exact location and self-reported locations
    • Google API, tidygeocoder
  • Mapping
    • Static (ggmap)
    • Interactive (leaflet)
  • Extracting Facebook Ads data
    • Migrant populations
  • Comparison with gold-standard data
    • Obtaining census data
    • Merging and comparing population distributions

Day 3

  • Obtaining text data
    • Tweets
    • Newspapers
  • Text cleaning
    • Stop words
    • Stemming
  • Descriptive text analysis
    • tf-idf
    • n-grams
  • Sentiment analysis
  • Topic models
    • LDA
    • CTM/STM

Seminar Information

Thursday, February 23 – Saturday, February 25, 2023

Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.