Data Visualization Using Stata: A Short Course
A 4-Day Livestream Seminar Taught by Trenton Mize, Ph.D.
Understanding data and effectively presenting model results are challenges that data analysts face almost every day. There is seldom a more effective solution than a well-thought-out visualization. Problems in the data are easily identified; complex effects are quickly summarized; effect sizes and variability are immediately clear. In this seminar, we will cover best practices for accurately representing data as well as many specific approaches to data exploration, model diagnostics, and model presentation.
The primary focus is on the applied analyst’s “bread and butter” types of visualizations: those that will be useful in almost every research project. However, we also cover more advanced visualization methods.
Starting July 23, we are offering this seminar as a 4-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
More Details About the Course Content
Topics covered range from exploratory data analysis techniques to methods for presenting complex model results. Applied exercises will help you implement the techniques we cover in Stata. Additional template Stata code will be provided, allowing you to reproduce all workshop examples.
The seminar will use Stata, which is widely used to clean, examine, model, and visualize data. The data and model visualization capabilities of Stata are impressive yet vastly underutilized by most users. This seminar will teach you best practices for data visualization in general, along with specific ways to implement them in Stata.
Computing
This course will use Stata for the examples and exercises. Stata version 18 will be used for the examples, but the exercises can also be done with versions 14-17.
The lecture slides are accompanied by a full set of Stata replication files. To replicate the instructor’s examples, you should have Stata already installed on your computer when the course begins. Basic familiarity with Stata is highly desirable, but even novice Stata coders should be able to follow the presentation and do the exercises. For users new to Stata, an “Introduction to Stata” guide will be provided before the seminar begins which covers the basics of getting started using Stata.
If you’d like to familiarize yourself with Stata basics before the seminar begins, we recommend following along with a “getting started” video like the one here.
Seminar participants who are not yet ready to purchase Stata could take advantage of StataCorp’s 30-day software return policy.
Who Should Register?
If you use data, you can benefit from this seminar. Stata is a flexible and powerful tool for visualizing your data to better understand data and statistical models. This seminar is for anyone who wants to learn tools for creating effective and attractive data visualizations.
No statistical background is required for the seminar—but a working knowledge of regression or ANOVA is helpful to understand some of the more advanced visualization techniques we cover.
Outline
Day 1
- Why visualize data?
- The science of effective data visualization
- Introduction to data visualization in Stata
- Common options universal to most graphs
- Plots of univariate distributions
- Histograms; Kernel density plots
- Overlays for group comparisons
- Box (and whisker) plots
- Pies, bars, and dots
- Perceptual accuracy and choosing plots
- Why you shouldn’t use pie charts
- Bar charts
- Stacked bar charts; group comparisons
- Dot plots
- Confidence intervals and standard errors
- Visual tools for conveying uncertainty
Day 2
- General data visualization rules and guidelines
- Axis range rules
- 3D graphics
- Using color well
- Nominal vs ordinal palettes
- Color blindness-proofing your graphs
- Figures that work in color or black and white
- Fonts
- Graphics file formats
- Graph schemes
- Confidence intervals and inferring statistical significance
- Plots of bivariate relationships
- Scatterplots
- Options for continuous and nominal variables
- Scatterplot smoothing
- Lowess
- Incorporating covariates
- Local polynomial smoothing
- Plotting change over time (or other continuous variable)
- Slopegraphs
- Ridgeline plots
- Alluvial/Sankey diagrams
Day 3
- Multilevel and longitudinal data
- Grouped data
- Multiple levels
- Combining multiple graphs into a single figure; small multiples
- Spaghetti plots
- Visualizing model results
- Coefficient plots
- Comparing across models and/or groups
- Plots of model predictions
- Adding distributional information to plots
- Univariate and group-specific
- Interaction effects
- Nominal x nominal interactions
- Nominal x continuous interactions
- Continuous x continuous interactions
- Group comparisons
- Marginal effects plots
- Models with a single predicted outcome
- Models with multiple outcomes
- Ideal types
Day 4
- Maps
- Map projection options; pros and cons
- Choropleth maps
- Area vs population issues in visualization
- World, countries, states, and counties
- Visualizing covariate balance
- Balanceplots
- Experimental data
- Causal inference matching methods
- Model diagnostics
- Residuals
- Influence
- Added-variable plots
Reviews of Data Visualization Using Stata
“This seminar was accessible yet sophisticated. I liked that we were provided ample resources to engage with the content even further.”
Ráchael Powers, University of South Florida
“Trent does a great job of explaining things, giving examples, and thinking through scenarios that people ask about.”
Brandon Crawford, Indiana University
“The instructor was incredibly knowledgeable and organized. I liked how he was able to explain complicated concepts in an accessible way.”
Raj Kumar, Icahn School of Medicine at Mount Sinai
Seminar Information
Tuesday, July 23 – Friday, July 26, 2024
Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).
10:30am-12:30pm (convert to your local time)
1:30pm-3:00pm
Payment Information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.