Extracting, Manipulating, and Analyzing Social Media Data: A Short Course
A 3-Day Livestream Seminar Taught by Monica Alexander, Ph.D.
The growth of social media websites like Facebook and Twitter presents an opportunity for researchers to explore new data sources. Social media data show particular potential for studying online communities and networks, temporary and permanent geographic mobility, and the sharing and spread of information (and misinformation).
Most social media websites permit the extraction of certain types of data through Application Programming Interfaces (APIs). APIs make it possible to extract large amounts of information about website activity, essentially in real time. These data can then be processed, cleaned, and used in a range of statistical analyses.
In this seminar, you will learn to collect and process data from Spotify, Genius Lyrics, and Facebook’s Advertising Platform. In addition to these data sources, you will learn how to analyze Twitter data and text data from scientific journal articles. We will cover extracting data from APIs, geocoding, static and interactive mapping, and an introduction to text analysis methods, including sentiment analysis and topic models. We will do all of our coding in R using the tidyverse style.
Starting February 23, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions.
More Details About the Course Content
This course focuses on how to extract and use common sources of social media data. You will learn to use APIs to extract information, format that information into datasets that can be used in various analyses, and analyze the available data using a range of methods. We will be working with data from Spotify, Genius Lyrics, Twitter, Facebook’s Advertising Platform, and text data from scientific journal articles.
The course will cover the whole workflow: extracting data, data cleaning and preparation, geocoding and mapping, plotting and visualization, and text analysis. You will learn practical skills to take with you in future analyses and projects.
You are strongly encouraged to use a computer with the most recent version of R installed. You are also encouraged to download and install RStudio, a front-end for R that makes it easier to work with. This software is free and available for Windows, Mac, and Linux platforms. The course will be fairly coding-heavy, so basic familiarity with R and RStudio is assumed.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who Should Register?
This course is designed for those who have an interest in using social media data in social research.
- Types of social media data
- Ways of getting social media data (webscraping, APIs)
- Motivating research
- Ethical concerns
- Revision of tidyverse coding concepts
- Important tidyverse functions
- Graphing in ggplot
- Extracting Spotify/Genius data
- Artist discographies
- Song characteristics
- Extracting Facebook Ads data
- Basic demographics
- More detailed interests
- Exact location and self-reported locations
- Google API, tidygeocoder
- Static (ggmap)
- Interactive (leaflet)
- Comparison with gold-standard data
- Merging and comparing population distributions
- Methods for text analysis overview
- Text cleaning
- Stop words
- Descriptive text analysis
- Sentiment analysis
- Topic models
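To give a flavor of the text analysis topics listed above, here is a minimal sketch in R of tokenizing text, removing stop words, counting words, and scoring sentiment. It is an illustration only, not course material: the `lyrics` data frame and its contents are made up, and the choice of the tidytext package with the "bing" sentiment lexicon is an assumption about one common tidyverse-style approach.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for text extracted from an API
lyrics <- tibble(
  song = c("song_a", "song_b"),
  text = c("I love the happy sunshine", "The sad rain makes me cry")
)

# Text cleaning: split into one lowercase word per row,
# then drop common stop words such as "the"
tokens <- lyrics %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Descriptive text analysis: word frequencies across all songs
word_counts <- tokens %>%
  count(word, sort = TRUE)

# Sentiment analysis: keep words found in the "bing" lexicon
# and count positive and negative words per song
sentiment_counts <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(song, sentiment)

print(word_counts)
print(sentiment_counts)
```

Keeping the text in a one-word-per-row data frame means the same dplyr verbs used for other data (joins, counts, grouping) apply directly to text.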
Thursday, February 23 – Saturday, February 25, 2023
Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).
10:00am-12:30pm
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.