Data Visualization Using Stata

A 4-Day Livestream Seminar Taught by Trenton Mize, Ph.D.

Understanding data and effectively presenting model results are challenges that data analysts face almost every day. There is seldom a more effective solution than a well-thought-out visualization. Problems in the data are easily identified; complex effects are quickly summarized; effect sizes and variability are immediately clear. In this seminar, we will cover best practices for accurately representing data as well as many specific approaches to data exploration, model diagnostics, and model presentation.

The primary focus is on the applied analyst’s “bread and butter” types of visualizations: those that will be useful in nearly every research project. However, we also cover more advanced visualization methods.

Starting August 16, we are offering this seminar as a 4-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions that include hands-on exercises, separated by a 1-hour break. Participants are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if they are unable to attend at the scheduled time.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.

More Details About the Course Content

Topics covered range from exploratory data analysis techniques to methods for presenting complex model results. Applied exercises will help participants implement the techniques we cover in Stata. Additional template Stata code will be provided to workshop participants, allowing everyone to reproduce all workshop examples.

The seminar will use Stata. Stata is widely used to clean, examine, model, and visualize data. The data and model visualization capabilities of Stata are impressive yet vastly underutilized by most users. This seminar will teach attendees best practices for data visualization in general, along with specific ways to implement them in Stata.


Although the vast majority of the methods taught in the seminar can be implemented in almost any statistical software package, we will use Stata exclusively for course exercises and examples.

Stata version 17 will be used for the examples, but the exercises can also be done with versions 14-16.

The lecture slides are accompanied by a full set of Stata replication files. To replicate the instructor’s examples, you should have Stata already installed on your computer when the course begins. No previous experience with Stata is needed, however, because all necessary code will be provided.

If you’d like to familiarize yourself with Stata basics before the seminar begins, we recommend following along with a “getting started” video like the one here.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or their 30-day software return policy.

Who Should Register?

If you use data, you can benefit from this seminar. Stata is a flexible and powerful tool for visualizing data, helping you better understand both your data and your statistical models. This seminar is for anyone who wants to learn tools for creating effective and attractive data visualizations.

No statistical background is required for the seminar, but a working knowledge of regression or ANOVA will help you understand some of the more advanced visualization techniques we cover.

Previous experience with Stata is helpful but not required.


Day 1

  • Why visualize data?
    • The science of effective data visualization
  • Introduction to data visualization in Stata
    • Common options universal to most graphs
  • Plots of univariate distributions
    • Histograms; Kernel density plots
      • Overlays for group comparisons
    • Box (and whisker) plots
  • Pies, bars, and dots
    • Perceptual accuracy and choosing plots
      • Why you shouldn’t use pie charts
    • Bar charts
      • Stacked bar charts; group comparisons
    • Dot plots
  • Confidence intervals and standard errors
    • Visual tools for conveying uncertainty

Day 2

  • General data visualization rules and guidelines
    • Axis range rules
    • 3D graphics
    • Using color well
      • Nominal vs ordinal palettes
      • Color blindness-proofing your graphs
      • Figures that work in color or black and white
    • Fonts
    • Graphics file formats
    • Graph schemes
    • Confidence intervals and inferring statistical significance
  • Plots of bivariate relationships
    • Scatterplots
      • Options for continuous and nominal variables
      • Scatterplot smoothing
        • Lowess
          • Incorporating covariates
        • Local polynomial smoothing
      • Plotting change over time (or another continuous variable)
        • Slopegraphs
        • Ridgeline plots
        • Alluvial/Sankey diagrams

Day 3

  • Multilevel and longitudinal data
    • Grouped data
    • Multiple levels
    • Combining multiple graphs into a single figure; small multiples
    • Spaghetti plots
  • Visualizing model results
    • Coefficient plots
      • Comparing across models and/or groups
    • Plots of model predictions
      • Adding distributional information to plots
        • Univariate and group-specific
      • Interaction effects
        • Nominal x nominal interactions
        • Nominal x continuous interactions
        • Continuous x continuous interactions
      • Group comparisons
      • Marginal effects plots
        • Models with a single predicted outcome
        • Models with multiple outcomes
      • Ideal types

Day 4

  • Maps
    • Map projection options; pros and cons
    • Choropleth maps
    • Area vs population issues in visualization
    • World, countries, states, and counties
  • Visualizing covariate balance
    • Balanceplots
      • Experimental data
      • Causal inference matching methods
    • Model diagnostics
      • Residuals
      • Influence
      • Added-variable plots

Seminar information

Tuesday, August 16, 2022 – Friday, August 19, 2022

Each day will follow this schedule:

10:30am-12:30pm ET (New York time): Live session via Zoom

1:30pm-3:00pm ET: Live session via Zoom

Payment Information

The fee of $895 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.